Customizing Text Generation in TwinWeaver
This tutorial demonstrates how to customize every textual component of the instruction generation pipeline. TwinWeaver provides extensive configuration options to tailor the generated text prompts to your specific use case, language preferences, or model requirements.
We will cover:
- Preamble & Introduction Text - Customizing the opening text of patient records
- Demographics Section - Modifying how constant/static data is introduced
- Event Day Formatting - Changing how visit days and time intervals are described
- Time Units - Switching between days and weeks
- Genetic Data Formatting - Customizing genetic event tags and text
- Forecasting Prompts - Modifying value prediction task descriptions
- Time-to-Event Prompts - Customizing survival/event prediction text
- QA/Binning Prompts - Changing the descriptions of the QA task, which predicts value bins
- Multi-Task Prompts - Customizing multi-task instruction formatting
- Event Category Overrides - Fine-grained control over specific event types
import pandas as pd
from twinweaver import (
    DataManager,
    Config,
    DataSplitterForecasting,
    DataSplitterEvents,
    ConverterInstruction,
    DataSplitter,
)
Load Example Data
First, let's load the example data to use throughout this tutorial.
# Load data - generated example data
df_events = pd.read_csv("../../example_data/events.csv")
df_constant = pd.read_csv("../../example_data/constant.csv")
df_constant_description = pd.read_csv("../../example_data/constant_description.csv")
Part 1: Default Configuration
Let's first see the default text generation to understand what we're customizing. We'll set up a minimal config and generate an example.
# Create default config
config_default = Config()
# Required settings for instruction mode
config_default.split_event_category = "lot"
config_default.event_category_forecast = ["lab"]
config_default.event_category_events_prediction_with_naming = {
    "death": "death",
    "progression": "next progression",
}
config_default.constant_columns_to_use = ["birthyear", "gender", "histology", "smoking_history"]
config_default.constant_birthdate_column = "birthyear"
# Setup data manager with default config
dm_default = DataManager(config=config_default)
dm_default.load_indication_data(
    df_events=df_events, df_constant=df_constant, df_constant_description=df_constant_description
)
dm_default.process_indication_data()
dm_default.setup_unique_mapping_of_events()
dm_default.setup_hold_out_sets(validation_split=0.1, test_split=0.1)
dm_default.infer_var_types()
# Setup splitters and converter
data_splitter_events_default = DataSplitterEvents(
    dm_default,
    config=config_default,
    max_length_to_sample=pd.Timedelta(weeks=104),
    min_length_to_sample=pd.Timedelta(weeks=1),
)
data_splitter_events_default.setup_variables()
data_splitter_forecasting_default = DataSplitterForecasting(
    data_manager=dm_default,
    config=config_default,
    max_forecasted_trajectory_length=pd.Timedelta(days=90),
)
data_splitter_forecasting_default.setup_statistics()
data_splitter_default = DataSplitter(data_splitter_events_default, data_splitter_forecasting_default)
converter_default = ConverterInstruction(
    nr_tokens_budget_total=8192,
    config=config_default,
    dm=dm_default,
    variable_stats=data_splitter_forecasting_default.variable_stats,
)
# Generate example with default text
patientid = dm_default.all_patientids[4]
patient_data_default = dm_default.get_patient_data(patientid)
forecasting_splits, events_splits, reference_dates = data_splitter_default.get_splits_from_patient_with_target(
    patient_data_default,
)
p_converted_default = converter_default.forward_conversion(
    forecasting_splits=forecasting_splits[0],
    event_splits=events_splits[0],
)
print("=" * 80)
print("DEFAULT INSTRUCTION OUTPUT:")
print("=" * 80)
print(p_converted_default["instruction"])
Part 2: Fully Customized Text Generation
Now let's create a completely customized configuration, changing every textual element. This demonstrates all the available customization options.
2.1 Preamble and Introduction Text
The preamble_text is the very first text that appears in the generated instruction, introducing the patient record format.
# Create custom config
config_custom = Config()
# Required settings
config_custom.split_event_category = "lot"
config_custom.event_category_forecast = ["lab"]
config_custom.event_category_events_prediction_with_naming = {
    "death": "mortality",  # Custom name for death event
    "progression": "disease progression",  # Custom name for progression
}
config_custom.constant_columns_to_use = ["birthyear", "gender", "histology", "smoking_history"]
config_custom.constant_birthdate_column = "birthyear"
# ============================================================================
# CUSTOMIZING PREAMBLE TEXT
# ============================================================================
# This is the opening text that introduces the patient record
config_custom.preamble_text = (
    "📋 ELECTRONIC HEALTH RECORD SUMMARY\n"
    "This document contains a chronological summary of a patient's medical journey. "
    "The record begins with baseline demographics, followed by a timeline of clinical encounters. "
    "Laboratory values use standardized LOINC coding."
)
2.2 Demographics Section Text
The constant_text introduces the static/demographic data section.
# ============================================================================
# CUSTOMIZING DEMOGRAPHICS SECTION
# ============================================================================
# Text that introduces the demographics/constant data
config_custom.constant_text = "\n\n👤 PATIENT DEMOGRAPHICS:\n"
2.3 Event Day Formatting
These settings control how clinical visits and time intervals are described.
# ============================================================================
# CUSTOMIZING EVENT DAY TEXT
# ============================================================================
# Text for the first visit/encounter
config_custom.first_day_text = "\n🏥 INITIAL ENCOUNTER:\nDuring the baseline visit, the following was documented:\n"
# Preamble before each subsequent visit (appears before the time delta)
config_custom.event_day_preamble = "\n📅 "
# Text describing time elapsed since previous visit
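# Note: the unit ("weeks") is spelled out literally in this string, so keep it in sync
# with delta_time_unit; set_delta_time_unit() updates the time-related prompts automatically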
config_custom.event_day_text = " weeks after the previous encounter, a follow-up visit recorded:\n"
# Text appended after listing events for a day
config_custom.post_event_text = ".\n"
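As a rough mental model of how these four fragments fit together, the sketch below assembles one follow-up visit line. The `render_visit` helper, the `", "` separator between events, and the concatenation order are assumptions inferred from the comments above, not TwinWeaver's actual implementation; the event strings are made-up placeholders.

```python
from types import SimpleNamespace

# Stand-in for Config holding only the fields from this section (hypothetical).
cfg = SimpleNamespace(
    event_day_preamble="\n📅 ",
    event_day_text=" weeks after the previous encounter, a follow-up visit recorded:\n",
    post_event_text=".\n",
)


def render_visit(cfg, delta, events):
    """Assumed order: preamble, time delta, day text, events, closing text."""
    body = ", ".join(events)  # separator is an assumption
    return f"{cfg.event_day_preamble}{delta}{cfg.event_day_text}{body}{cfg.post_event_text}"


visit_line = render_visit(cfg, 3, ["hemoglobin is 11.2", "weight is 70"])
print(visit_line)
```

Seen this way, event_day_text needs a leading space because the numeric delta is placed directly in front of it.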
2.4 Time Units
You can switch between days and weeks for time intervals. Use set_delta_time_unit() to update all related prompts automatically.
# ============================================================================
# CUSTOMIZING TIME UNITS
# ============================================================================
# Option 1: Use the helper method (updates all time-related prompts)
# config_custom.set_delta_time_unit("days", unit_sing="day")
# Option 2: Set directly (if you want different phrasing)
config_custom.delta_time_unit = "weeks"
# The time unit appears in several prompts - you can customize each:
config_custom.forecasting_prompt_var_time = " over the upcoming weeks "
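When the unit is set directly, every string that names it (event_day_text, forecasting_prompt_var_time, forecasting_tte_prompt_mid) has to agree with the numbers that are printed next to it. The sketch below only illustrates the underlying arithmetic with the standard library; `UNIT_LENGTH` and `delta_in_unit` are hypothetical names and are not part of TwinWeaver.

```python
from datetime import timedelta

# Hypothetical lookup from unit name to its length; not part of TwinWeaver.
UNIT_LENGTH = {"days": timedelta(days=1), "weeks": timedelta(weeks=1)}


def delta_in_unit(delta, unit):
    """Express a time gap as a (float) number of the given unit."""
    return delta / UNIT_LENGTH[unit]


gap = timedelta(days=21)
in_weeks = delta_in_unit(gap, "weeks")  # 3.0
in_days = delta_in_unit(gap, "days")    # 21.0
print(in_weeks, in_days)
```

The same gap reads as 3 or 21 depending on the unit, which is why a prompt that still says "weeks" after a switch to days would mislead the model.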
2.5 Genetic Data Formatting
Control how genetic/molecular data is tagged and displayed.
# ============================================================================
# CUSTOMIZING GENETIC DATA FORMATTING
# ============================================================================
# Tags used to wrap genetic information in the text
config_custom.genetic_tag_opening = "[MOLECULAR: "
config_custom.genetic_tag_closing = "]"
# Text shown when no genetic data is available
config_custom.genetic_empty_text = "🧬 No molecular/genetic testing data available."
# Value to skip when converting genetic events (often 'present' is implied)
config_custom.genetic_skip_text_value = "present"
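To make the role of the tags and the skip value concrete, here is a minimal sketch of one plausible rendering rule. The `render_genetic` function and the example event name are hypothetical; how TwinWeaver actually joins name and value is not shown in this tutorial.

```python
def render_genetic(name, value, tag_open, tag_close, skip_value):
    """Wrap a genetic event in tags; omit the value when it equals skip_value."""
    text = name if value == skip_value else f"{name} {value}"
    return f"{tag_open}{text}{tag_close}"


# The event name and values below are illustrative placeholders.
implied = render_genetic("EGFR mutation", "present", "[MOLECULAR: ", "]", "present")
explicit = render_genetic("EGFR mutation", "absent", "[MOLECULAR: ", "]", "present")
print(implied)
print(explicit)
```

Under this assumption, a value equal to genetic_skip_text_value is dropped because it adds no information, while any other value is kept.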
2.6 Forecasting Prompts (Value Prediction)
These settings control the task prompts for predicting future values.
# ============================================================================
# CUSTOMIZING FORECASTING PROMPTS
# ============================================================================
# Main forecasting task prompt
config_custom.forecasting_fval_prompt_start = (
    "\n🔮 PREDICTION TASK - LABORATORY VALUES:\n"
    "Based on the patient history above, predict the expected values for the following "
    "laboratory parameters at each future time point:\n"
)
# Summary section introducing last known values
config_custom.forecasting_prompt_summarized_start = "\n📊 REFERENCE VALUES (most recent measurements):\n"
# Text used when first day is overridden/truncated
config_custom.forecasting_firstday_override = (
    "\n⚠️ Note: Some early events may have been omitted due to context limits. Available history begins with:\n"
)
# Summary of last observed genetic events
config_custom.forecasting_prompt_summarized_genetic = "\n\n🧬 MOLECULAR STATUS (last observed):\n"
# Summary of most recent treatment line
config_custom.forecasting_prompt_summarized_lot = "\n💊 CURRENT TREATMENT REGIMEN:\n"
2.7 Time-to-Event Prompts (Survival Analysis)
These settings control the task prompts for predicting whether events occur within a time horizon.
# ============================================================================
# CUSTOMIZING TIME-TO-EVENT PROMPTS
# ============================================================================
# Start of TTE prediction prompt
config_custom.forecasting_tte_prompt_start = (
    "\n⏱️ OUTCOME PREDICTION TASK:\nDetermine whether follow-up data was censored (incomplete) "
)
# Middle section specifying time horizon
config_custom.forecasting_tte_prompt_mid = " weeks from the last documented visit, and whether the event occurred: "
# End section with output format instructions
config_custom.forecasting_tte_prompt_end = (
    ".\n📝 Format your response as: 'PREDICTION: [event_name] - Censored: [YES/NO], Occurred: [YES/NO]'"
)
# Target/answer formatting
config_custom.target_prompt_start = "\nPREDICTION: {event_name} - "
config_custom.target_prompt_censor_true = "Censored: YES."
config_custom.target_prompt_censor_false = "Censored: NO, "
config_custom.target_prompt_before_occur = ""
config_custom.target_prompt_occur = "Occurred: YES."
config_custom.target_prompt_not_occur = "Occurred: NO."
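The target fragments only make sense in combination, so the sketch below shows one way they could be stitched into an answer string. The `render_target` helper and its branching (a censored answer stops after the censoring text) are assumptions based on the field names, not TwinWeaver's confirmed logic.

```python
from types import SimpleNamespace

# Stand-in for Config with the values set in this section (hypothetical container).
cfg = SimpleNamespace(
    target_prompt_start="\nPREDICTION: {event_name} - ",
    target_prompt_censor_true="Censored: YES.",
    target_prompt_censor_false="Censored: NO, ",
    target_prompt_before_occur="",
    target_prompt_occur="Occurred: YES.",
    target_prompt_not_occur="Occurred: NO.",
)


def render_target(cfg, event_name, censored, occurred):
    """Assumed assembly: start, censoring text, then the outcome if not censored."""
    text = cfg.target_prompt_start.format(event_name=event_name)
    if censored:
        return text + cfg.target_prompt_censor_true
    outcome = cfg.target_prompt_occur if occurred else cfg.target_prompt_not_occur
    return text + cfg.target_prompt_censor_false + cfg.target_prompt_before_occur + outcome


censored_answer = render_target(cfg, "mortality", censored=True, occurred=False)
observed_answer = render_target(cfg, "mortality", censored=False, occurred=True)
print(censored_answer)
print(observed_answer)
```

If the assembly works like this, a censored answer carries no "Occurred" part, so the format instruction in forecasting_tte_prompt_end should be worded to allow for that case.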
2.8 QA/Binning Prompts
These settings control the QA task, which asks the model to predict the bin (value range) that each future value will fall into.
# ============================================================================
# CUSTOMIZING QA/BINNING PROMPTS
# ============================================================================
# QA task prompt
config_custom.qa_prompt_start = (
    "\n📊 CLASSIFICATION TASK - VALUE RANGES:\n"
    "For each variable below, predict which range (bin) the future value will fall into "
    "at each time point:"
)
# Text introducing available bins
config_custom.qa_bins_start = "\t➡️ Available categories: "
2.9 Multi-Task Prompts
When multiple tasks are combined in one prompt, these settings control the formatting.
# ============================================================================
# CUSTOMIZING MULTI-TASK PROMPTS
# ============================================================================
# Introduction to multi-task section
config_custom.task_prompt_start = (
    "\n==================================\n"
    "📋 MULTI-TASK INSTRUCTIONS\n"
    "Complete each task below. Label each response with the task number.\n"
    "==================================\n\n"
)
# Template for each task introduction
config_custom.task_prompt_each_task = "📌 TASK #{task_nr}: "
# End of task prompts section
config_custom.task_prompt_end = "\n" + "-" * 50 + "\n"
# Task type labels
config_custom.task_prompt_forecasting = "Value Forecasting"
config_custom.task_prompt_forecasting_qa = "Value Range Classification"
config_custom.task_prompt_events = "Outcome Prediction"
config_custom.task_prompt_custom = "Custom Analysis"
# Target/answer formatting for multi-task
config_custom.task_target_start = "\n✅ TASK #{task_nr} RESPONSE: "
config_custom.task_target_end = "\n"
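Both task templates carry a `{task_nr}` placeholder, and a custom template that drops it would produce unnumbered tasks. The sketch below assumes the placeholder is filled with `str.format` and that numbering starts at 1; neither detail is confirmed by this tutorial.

```python
# Templates and labels copied from the settings above.
task_prompt_each_task = "📌 TASK #{task_nr}: "
task_target_start = "\n✅ TASK #{task_nr} RESPONSE: "
labels = ["Value Forecasting", "Outcome Prediction"]

# Guard against a custom template that lost its placeholder.
for template in (task_prompt_each_task, task_target_start):
    assert "{task_nr}" in template, "template must keep the {task_nr} placeholder"

# Assumed substitution mechanism: str.format with 1-based task numbers.
headers = [
    task_prompt_each_task.format(task_nr=nr) + label
    for nr, label in enumerate(labels, start=1)
]
answers = [task_target_start.format(task_nr=nr) for nr in range(1, len(labels) + 1)]
print(headers)
print(answers)
```

Keeping the prompt and answer templates numbered the same way lets the model's responses be matched back to their tasks.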
2.10 Event Category Overrides
For fine-grained control, you can override how specific event categories or individual events are rendered.
# ============================================================================
# CUSTOMIZING EVENT CATEGORY PREAMBLES
# ============================================================================
# Override the introductory text for specific event categories
# Structure: {event_category: preamble_string}
config_custom.event_category_preamble_mapping_override = {
    "lab": "🔬 Laboratory Results: ",
    "drug": "💊 Medications: ",
    "condition": "🩺 Diagnoses/Conditions: ",
    "lot": "📋 Treatment Line: ",
    "vitals": "📈 Vital Signs: ",
}
# ============================================================================
# CUSTOMIZING SPECIFIC EVENT RENDERING
# ============================================================================
# Override how specific events are rendered in text
# Structure: {event_category: {event_name: {"full_replacement_string": str, "reverse_string_value": str}}}
# This allows complete control over how individual events appear in the generated text
config_custom.event_category_and_name_replace_override = {
    "death": {
        "death": {
            "full_replacement_string": "⚠️ PATIENT DECEASED",
            "reverse_string_value": "deceased",
        }
    },
    "progression": {
        "progression": {
            "full_replacement_string": "📈 Disease progression documented",
            "reverse_string_value": "progressed",
        }
    },
}
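The nested structure reads as a two-level lookup: category first, then event name. The sketch below illustrates that lookup with a fallback for events that have no override. `render_event` and its `"name is value"` fallback format are hypothetical, and the sketch does not use reverse_string_value.

```python
# Same structure as event_category_and_name_replace_override, shortened.
overrides = {
    "death": {
        "death": {
            "full_replacement_string": "⚠️ PATIENT DECEASED",
            "reverse_string_value": "deceased",
        }
    },
}


def render_event(category, name, value, overrides):
    """Return the override text if one exists, else a placeholder default."""
    entry = overrides.get(category, {}).get(name)
    if entry is not None:
        return entry["full_replacement_string"]
    return f"{name} is {value}"  # hypothetical default rendering


overridden = render_event("death", "death", "yes", overrides)
fallback = render_event("lab", "hemoglobin", "11.2", overrides)
print(overridden)
print(fallback)
```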
2.11 Additional Text Formatting Options
# ============================================================================
# ADDITIONAL FORMATTING OPTIONS
# ============================================================================
# Number of decimal places for numeric values
config_custom.decimal_precision = 1
# Whether to always include the first visit (even with token constraints)
config_custom.always_keep_first_visit = True
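For intuition on decimal_precision, the sketch below formats one number at two precisions using Python's fixed-point format specifier. Whether TwinWeaver rounds exactly this way is an assumption; `format_value` is a hypothetical helper.

```python
def format_value(value, decimal_precision):
    """Format a number with a fixed number of decimal places."""
    return f"{value:.{decimal_precision}f}"


one_decimal = format_value(11.2478, 1)
two_decimals = format_value(11.2478, 2)
print(one_decimal, two_decimals)
```

Lower precision shortens every numeric value in the prompt, which can matter when the token budget is tight.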
Part 3: Generate Output with Custom Text
Now let's set up the pipeline with our customized config and see the difference.
# Setup data manager with custom config
dm_custom = DataManager(config=config_custom)
dm_custom.load_indication_data(
    df_events=df_events, df_constant=df_constant, df_constant_description=df_constant_description
)
dm_custom.process_indication_data()
dm_custom.setup_unique_mapping_of_events()
dm_custom.setup_hold_out_sets(validation_split=0.1, test_split=0.1)
dm_custom.infer_var_types()
# Setup splitters and converter with custom config
data_splitter_events_custom = DataSplitterEvents(
    dm_custom,
    config=config_custom,
    max_length_to_sample=pd.Timedelta(weeks=104),
    min_length_to_sample=pd.Timedelta(weeks=1),
)
data_splitter_events_custom.setup_variables()
data_splitter_forecasting_custom = DataSplitterForecasting(
    data_manager=dm_custom,
    config=config_custom,
    max_forecasted_trajectory_length=pd.Timedelta(days=90),
)
data_splitter_forecasting_custom.setup_statistics()
data_splitter_custom = DataSplitter(data_splitter_events_custom, data_splitter_forecasting_custom)
converter_custom = ConverterInstruction(
    nr_tokens_budget_total=8192,
    config=config_custom,
    dm=dm_custom,
    variable_stats=data_splitter_forecasting_custom.variable_stats,
)
# Generate example with custom text
patientid = dm_custom.all_patientids[4]
patient_data_custom = dm_custom.get_patient_data(patientid)
forecasting_splits_custom, events_splits_custom, reference_dates_custom = (
    data_splitter_custom.get_splits_from_patient_with_target(patient_data_custom)
)
p_converted_custom = converter_custom.forward_conversion(
    forecasting_splits=forecasting_splits_custom[0],
    event_splits=events_splits_custom[0],
)
print("=" * 80)
print("CUSTOMIZED INSTRUCTION OUTPUT:")
print("=" * 80)
print(p_converted_custom["instruction"])
print("=" * 80)
print("CUSTOMIZED ANSWER OUTPUT:")
print("=" * 80)
print(p_converted_custom["answer"])
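To see exactly which lines changed between the default and customized prompts, a line-based diff can help. The sketch below uses the standard-library `difflib` on two short stand-in strings; in practice you would pass the two "instruction" strings generated above. `diff_prompts` is a hypothetical helper, and the sample text is a placeholder.

```python
import difflib


def diff_prompts(default_text, custom_text):
    """Return unified-diff lines between two prompt strings."""
    return list(
        difflib.unified_diff(
            default_text.splitlines(),
            custom_text.splitlines(),
            fromfile="default",
            tofile="custom",
            lineterm="",
        )
    )


# Stand-in strings; replace with the two generated instruction strings.
default_text = "Starting with demographic data:\nbirthyear is 1950"
custom_text = "👤 PATIENT DEMOGRAPHICS:\nbirthyear is 1950"
diff_lines = diff_prompts(default_text, custom_text)
print("\n".join(diff_lines))
```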
Part 4: Quick Reference - All Text Configuration Options
Here's a comprehensive table of all text customization options available in Config:
| Setting | Description | Default |
|---|---|---|
| Patient Record Introduction | | |
| preamble_text | Opening text introducing the patient record | "The following is a patient..." |
| constant_text | Text introducing demographics section | "\n\nStarting with demographic data:\n" |
| Visit/Event Day Text | | |
| first_day_text | Text for the first visit | "\nOn the first visit..." |
| event_day_preamble | Preamble before subsequent visits | "\n" |
| event_day_text | Text for subsequent visits with time delta | " weeks later..." |
| post_event_text | Text after listing day's events | ".\n" |
| Time Units | | |
| delta_time_unit | Time unit for intervals | "weeks" |
| forecasting_prompt_var_time | Time description in forecasting | " the future weeks " |
| Genetic Data | | |
| genetic_tag_opening | Opening tag for genetic data | " |
| genetic_tag_closing | Closing tag for genetic data | "" |
| genetic_empty_text | Text when no genetic data | "No genetic data available." |
| genetic_skip_text_value | Value to skip in genetic events | "present" |
| Forecasting Task Prompts | | |
| forecasting_fval_prompt_start | Main forecasting task introduction | "\nYour task is to predict..." |
| forecasting_prompt_summarized_start | Last values summary intro | "\nThe last values..." |
| forecasting_firstday_override | Text when first day truncated | "\nThe following events..." |
| forecasting_prompt_summarized_genetic | Genetic summary intro | "\n\n\nHere we repeat..." |
| forecasting_prompt_summarized_lot | Treatment line summary intro | "\nThe most recent line..." |
| Time-to-Event Prompts | | |
| forecasting_tte_prompt_start | TTE task introduction | "\nYour task is to predict..." |
| forecasting_tte_prompt_mid | TTE time horizon text | " weeks from..." |
| forecasting_tte_prompt_end | TTE format instructions | ".\nPlease provide..." |
| target_prompt_start | TTE answer format start | "\nHere is the prediction..." |
| target_prompt_censor_true | Text for censored events | "censored." |
| target_prompt_censor_false | Text for non-censored events | "not censored " |
| target_prompt_before_occur | Conjunction before occurrence | "and " |
| target_prompt_occur | Text for occurred events | "occurred." |
| target_prompt_not_occur | Text for non-occurred events | "did not occur." |
| QA/Binning Prompts | | |
| qa_prompt_start | QA task introduction | "\nYour task is to predict..." |
| qa_bins_start | Bins list introduction | "\tThe possible bins are: " |
| Multi-Task Prompts | | |
| task_prompt_start | Multi-task section intro | "\nYou will now have..." |
| task_prompt_each_task | Template for each task | "Task {task_nr} is " |
| task_prompt_end | End of task prompts | "" |
| task_prompt_forecasting | Forecasting task label | "forecasting:" |
| task_prompt_forecasting_qa | QA task label | "forecasting QA:" |
| task_prompt_events | Events task label | "time to event prediction:" |
| task_prompt_custom | Custom task label | " a custom task:" |
| task_target_start | Multi-task answer format | "Task {task_nr} is " |
| task_target_end | End of task answer | "" |
| Overrides | | |
| event_category_preamble_mapping_override | Custom preambles per category | None |
| event_category_and_name_replace_override | Custom rendering per event | None |
| decimal_precision | Decimal places for numbers | 2 |
Summary
This tutorial demonstrated how to customize every aspect of TwinWeaver's text generation through the Config class. Key takeaways:
- Preamble and introduction text sets the context for the patient record
- Event day formatting controls how clinical visits are described temporally
- Time units can be switched between days and weeks using set_delta_time_unit()
- Genetic data formatting uses customizable tags and placeholder text
- Task prompts (forecasting, TTE, QA) can be fully rewritten for different LLM styles
- Multi-task formatting allows structured output for complex prediction tasks
- Category overrides provide fine-grained control over specific event types
Use these customization options to:
- Adapt prompts for different language models
- Translate prompts to other languages
- Add visual formatting (emojis, separators) for clarity
- Match specific institutional or research requirements
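When adapting or translating many fields at once, it can be convenient to keep the new strings in a dictionary and apply them in one pass. The sketch below does this on a stand-in object and rejects unknown field names, since a misspelled attribute would otherwise be set silently on an ordinary Python object. `apply_overrides` is a hypothetical helper, `SimpleNamespace` stands in for Config, and whether Config itself rejects unknown attributes is not covered here.

```python
from types import SimpleNamespace


def apply_overrides(config, overrides):
    """Set each override on config, failing loudly on unknown field names."""
    for name, value in overrides.items():
        if not hasattr(config, name):
            raise AttributeError(f"Unknown config field: {name}")
        setattr(config, name, value)
    return config


# Stand-in for Config with two of the fields from the reference table.
config = SimpleNamespace(
    preamble_text="The following is a patient...",
    genetic_empty_text="No genetic data available.",
)

# Example translation of one field into German.
german = {"genetic_empty_text": "Keine genetischen Daten verfügbar."}
apply_overrides(config, german)
print(config.genetic_empty_text)
```

A call such as `apply_overrides(config, {"preambel_text": "..."})` raises an error instead of quietly leaving the real preamble unchanged.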