Customizing the Summarized Row with set_custom_summarized_row_fn
The summarized row is a compact summary section inserted into the instruction prompt just before the task questions. By default, it includes:
- The most recent genetic information
- The most recent Line of Therapy (LoT) start
- The last known value for each target variable being forecasted
In many use cases, you may want to customize this section — for example, to:
- Add domain-specific summaries (e.g., latest vital signs, risk scores)
- Simplify or remove sections that aren't relevant to your dataset
- Change the formatting or language of the summary
TwinWeaver's ConverterInstruction class exposes the set_custom_summarized_row_fn() method to let you plug in your own logic.
This notebook demonstrates:
- Generating output with the default summarized row
- Writing and applying a custom summarized row function
- Comparing the results side by side
Setup
We follow the same setup as the main data preparation notebook.
import pandas as pd
from twinweaver import (
    DataManager,
    Config,
    DataSplitterForecasting,
    DataSplitterEvents,
    ConverterInstruction,
    DataSplitter,
)
Load Data
# Load data - generated example data
df_events = pd.read_csv("../../example_data/events.csv")
df_constant = pd.read_csv("../../example_data/constant.csv")
df_constant_description = pd.read_csv("../../example_data/constant_description.csv")
Configuration
config = Config()
# Required settings
config.split_event_category = "lot"
config.event_category_forecast = ["lab"]
config.event_category_events_prediction_with_naming = {
    "death": "death",
    "progression": "next progression",
}
config.constant_columns_to_use = ["birthyear", "gender", "histology", "smoking_history"]
config.constant_birthdate_column = "birthyear"
Data Manager & Splitters
dm = DataManager(config=config)
dm.load_indication_data(df_events=df_events, df_constant=df_constant, df_constant_description=df_constant_description)
dm.process_indication_data()
dm.setup_unique_mapping_of_events()
dm.setup_hold_out_sets(validation_split=0.1, test_split=0.1)
dm.infer_var_types()
# Setup splitters
data_splitter_events = DataSplitterEvents(
    dm,
    config=config,
    max_length_to_sample=pd.Timedelta(weeks=104),
    min_length_to_sample=pd.Timedelta(weeks=1),
)
data_splitter_events.setup_variables()
data_splitter_forecasting = DataSplitterForecasting(
    data_manager=dm,
    config=config,
    max_forecasted_trajectory_length=pd.Timedelta(days=90),
)
data_splitter_forecasting.setup_statistics()
data_splitter = DataSplitter(data_splitter_events, data_splitter_forecasting)
Generate Splits for a Patient
We pick a patient and generate splits so we can compare the default and custom summarized rows on the same data.
patientid = dm.all_patientids[4]
patient_data = dm.get_patient_data(patientid)
forecasting_splits, events_splits, reference_dates = data_splitter.get_splits_from_patient_with_target(
    patient_data,
)
print(f"Patient: {patientid}")
print(f"Number of forecasting split groups: {len(forecasting_splits)}")
print(f"Number of event split groups: {len(events_splits)}")
Step 1: Default Summarized Row
First, let's create a converter with the default summarized row function and generate an instruction to see what it looks like.
converter_default = ConverterInstruction(
    nr_tokens_budget_total=8192,
    config=config,
    dm=dm,
    variable_stats=data_splitter_forecasting.variable_stats,
)
p_default = converter_default.forward_conversion(
    forecasting_splits=forecasting_splits[0],
    event_splits=events_splits[0],
)
print("=" * 80)
print("DEFAULT INSTRUCTION (last 2500 chars to see the summarized row + tasks):")
print("=" * 80)
print(p_default["instruction"][-2500:])
Step 2: Define a Custom Summarized Row Function
Now let's write a custom function. The function must follow this signature:
def my_custom_fn(self, events_df: pd.DataFrame, combined_meta: dict) -> str:
    ...
Parameters:
- self: the ConverterInstruction instance (gives you access to self.config, self.dm, etc.)
- events_df: a DataFrame of the patient's events up to the split date
- combined_meta: a dict with keys:
  - "dates_per_variable": maps target variable names → list of future dates being forecasted
  - "variable_name_mapping": maps variable names → descriptive names
Returns: a string that will be inserted into the instruction prompt between the event history and the task questions.
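Before registering a function, it can help to call it directly on a small hand-built input. The sketch below is one way to do that without a full TwinWeaver setup. It makes several assumptions: the column names "date", "event_name", and "event_value", the variable "hemoglobin", and all values are made up for illustration, and a SimpleNamespace stands in for the ConverterInstruction instance, exposing only the config attributes the function reads.

```python
from types import SimpleNamespace

import pandas as pd

# Stand-in for the converter: only the config attributes used below exist.
# The column names are hypothetical, not TwinWeaver defaults.
stub_self = SimpleNamespace(
    config=SimpleNamespace(
        date_col="date",
        event_name_col="event_name",
        event_value_col="event_value",
    )
)

# Hand-built event history (illustrative values only).
events_df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]),
        "event_name": ["hemoglobin", "hemoglobin", "platelets"],
        "event_value": ["12.1", "11.4", "250"],
    }
)

# Same two keys as described in the parameter list above.
combined_meta = {
    "dates_per_variable": {"hemoglobin": [pd.Timestamp("2020-04-01")]},
    "variable_name_mapping": {"hemoglobin": "Hemoglobin (g/dL)"},
}


def latest_values_summary(self, events_df, combined_meta):
    """Return one line per target variable with its most recent value."""
    lines = ["Latest values:"]
    ordered = events_df.sort_values(self.config.date_col)
    for var_name in sorted(combined_meta["dates_per_variable"]):
        label = combined_meta["variable_name_mapping"].get(var_name, var_name)
        rows = ordered[ordered[self.config.event_name_col] == var_name]
        if rows.empty:
            value = "not available"
        else:
            value = rows[self.config.event_value_col].iloc[-1]
        lines.append(f"- {label}: {value}")
    return "\n".join(lines) + "\n"


summary = latest_values_summary(stub_self, events_df, combined_meta)
print(summary)
```

Calling the function this way checks the return type and formatting early; the real events_df passed by the converter may contain additional columns.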
Example 1: Minimal custom summary
This example creates a simple summary that lists only the last known lab values, skipping genetic and LoT information entirely.
def custom_summarized_row_minimal(self, events_df, combined_meta):
    """
    A minimal custom summarized row that only shows the last known value
    for each target variable being forecasted.
    """
    ret = "\nSummary of latest known values:\n"
    dates_per_variable = combined_meta.get("dates_per_variable", {})
    variable_name_mapping = combined_meta.get("variable_name_mapping", {})
    if not dates_per_variable:
        ret += "\tNo target variables to summarize.\n"
        return ret
    # Sort events by date to get the most recent values
    sorted_events = events_df.sort_values(self.config.date_col)
    for var_name in sorted(dates_per_variable.keys()):
        descriptive_name = variable_name_mapping.get(var_name, var_name)
        var_data = sorted_events[sorted_events[self.config.event_name_col] == var_name]
        if not var_data.empty:
            last_val = var_data[self.config.event_value_col].iloc[-1]
            ret += f"\t- {descriptive_name}: {last_val}\n"
        else:
            ret += f"\t- {descriptive_name}: not available\n"
    return ret
Apply the Custom Function
Use set_custom_summarized_row_fn() to register our custom function on the converter.
converter_custom = ConverterInstruction(
    nr_tokens_budget_total=8192,
    config=config,
    dm=dm,
    variable_stats=data_splitter_forecasting.variable_stats,
)
# Register the custom summarized row function
converter_custom.set_custom_summarized_row_fn(custom_summarized_row_minimal)
p_custom = converter_custom.forward_conversion(
    forecasting_splits=forecasting_splits[0],
    event_splits=events_splits[0],
)
print("=" * 80)
print("CUSTOM INSTRUCTION (last 1500 chars):")
print("=" * 80)
print(p_custom["instruction"][-1500:])
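Printing the tail of each prompt works for a quick look, but a line-level diff makes the changed section easier to spot. The following is a minimal sketch using the standard library's difflib. The two strings are placeholders, not real TwinWeaver output; in the notebook you would pass p_default["instruction"] and p_custom["instruction"] instead. The helper name diff_instructions is hypothetical.

```python
import difflib

# Placeholder prompts; substitute the real instruction strings in practice.
default_instruction = (
    "<event history>\n"
    "<default summarized row>\n"
    "<task questions>\n"
)
custom_instruction = (
    "<event history>\n"
    "<custom summarized row>\n"
    "<task questions>\n"
)


def diff_instructions(a: str, b: str) -> str:
    """Return a unified diff of two prompts, compared line by line."""
    diff = difflib.unified_diff(
        a.splitlines(),
        b.splitlines(),
        fromfile="default",
        tofile="custom",
        lineterm="",
    )
    return "\n".join(diff)


diff_text = diff_instructions(default_instruction, custom_instruction)
print(diff_text)
```

Lines prefixed with "-" appear only in the default prompt and lines prefixed with "+" only in the custom one, so the summarized row should be the only region that differs.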
Step 3: A More Advanced Custom Function
This example builds a richer summary that includes:
- A count of total events in the patient history
- The most recent treatment/drug event
- The last known values for target variables (with trend direction)
def custom_summarized_row_advanced(self, events_df, combined_meta):
    """
    A more advanced custom summarized row that includes event counts,
    the most recent drug, and last known target values with trend indicators.
    """
    ret = "\n--- Patient Summary ---\n"
    # 1. Total event count by category
    category_counts = events_df[self.config.event_category_col].value_counts()
    ret += "Event counts: "
    ret += ", ".join([f"{cat}={count}" for cat, count in category_counts.items()])
    ret += "\n"
    # 2. Most recent drug/treatment
    drug_events = events_df[events_df[self.config.event_category_col] == "drug"]
    if not drug_events.empty:
        last_drug = drug_events.sort_values(self.config.date_col).iloc[-1]
        ret += f"Most recent treatment: {last_drug[self.config.event_descriptive_name_col]}\n"
    else:
        ret += "Most recent treatment: none recorded\n"
    # 3. Target variable last values with simple trend
    dates_per_variable = combined_meta.get("dates_per_variable", {})
    variable_name_mapping = combined_meta.get("variable_name_mapping", {})
    if dates_per_variable:
        ret += "Latest lab values:\n"
        sorted_events = events_df.sort_values(self.config.date_col)
        for var_name in sorted(dates_per_variable.keys()):
            descriptive_name = variable_name_mapping.get(var_name, var_name)
            var_data = sorted_events[sorted_events[self.config.event_name_col] == var_name]
            if len(var_data) >= 2:
                try:
                    prev_val = float(var_data[self.config.event_value_col].iloc[-2])
                    last_val = float(var_data[self.config.event_value_col].iloc[-1])
                    trend = "↑" if last_val > prev_val else ("↓" if last_val < prev_val else "→")
                    ret += f"\t{descriptive_name}: {last_val} ({trend})\n"
                except (ValueError, TypeError):
                    last_val = var_data[self.config.event_value_col].iloc[-1]
                    ret += f"\t{descriptive_name}: {last_val}\n"
            elif len(var_data) == 1:
                last_val = var_data[self.config.event_value_col].iloc[-1]
                ret += f"\t{descriptive_name}: {last_val}\n"
            else:
                ret += f"\t{descriptive_name}: not available\n"
    ret += "--- End Summary ---\n"
    return ret
# Apply the advanced custom function
converter_advanced = ConverterInstruction(
    nr_tokens_budget_total=8192,
    config=config,
    dm=dm,
    variable_stats=data_splitter_forecasting.variable_stats,
)
converter_advanced.set_custom_summarized_row_fn(custom_summarized_row_advanced)
p_advanced = converter_advanced.forward_conversion(
    forecasting_splits=forecasting_splits[0],
    event_splits=events_splits[0],
)
print("=" * 80)
print("ADVANCED CUSTOM INSTRUCTION (last 2000 chars):")
print("=" * 80)
print(p_advanced["instruction"][-2000:])
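The trend indicator in the advanced function is the part most likely to hit odd inputs, since event values may be stored as strings or be non-numeric. One option is to pull that logic into a small helper that can be tested on its own. The sketch below is an illustration, not part of TwinWeaver; the name trend_arrow is hypothetical, and treating NaN as "no trend" is an added assumption that the original function does not make.

```python
import math


def trend_arrow(prev_val, last_val) -> str:
    """Return an arrow for the change from prev_val to last_val.

    Returns an empty string when either value cannot be read as a number.
    """
    try:
        prev_num = float(prev_val)
        last_num = float(last_val)
    except (ValueError, TypeError):
        return ""
    # Assumption: NaN carries no trend information, so no arrow is shown.
    if math.isnan(prev_num) or math.isnan(last_num):
        return ""
    if last_num > prev_num:
        return "↑"
    if last_num < prev_num:
        return "↓"
    return "→"


for prev_val, last_val in [("11.4", "12.1"), (250, 180), ("7", 7.0), ("neg", "pos")]:
    print(f"{prev_val!r} to {last_val!r}: {trend_arrow(prev_val, last_val)!r}")
```

Inside a summarized row function, the helper would replace the nested conditional expression, and an empty return value signals that the line should be written without a trend suffix.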
Step 4: Error Handling
TwinWeaver validates the function signature when you call set_custom_summarized_row_fn(). The function must have self as the first parameter, followed by at least two more parameters (events_df and combined_meta).
Here's what happens if you pass an invalid function:
# This will raise a TypeError because the signature is wrong (missing 'self')
def bad_fn(events_df, combined_meta):
    return "This won't work"
try:
    converter_default.set_custom_summarized_row_fn(bad_fn)
except TypeError as e:
    print(f"Caught expected error: {e}")
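To make the signature rule concrete, here is a rough sketch of how such a check could be written with inspect.signature. This is not TwinWeaver's implementation, and its actual checks and error messages may differ; the function check_summarized_row_signature and both example functions are hypothetical.

```python
import inspect


def check_summarized_row_signature(fn) -> None:
    """Raise TypeError unless fn looks like (self, events_df, combined_meta, ...)."""
    params = list(inspect.signature(fn).parameters.values())
    if not params or params[0].name != "self":
        raise TypeError("first parameter must be named 'self'")
    if len(params) < 3:
        raise TypeError("expected at least two parameters after 'self'")


def valid_fn(self, events_df, combined_meta):
    return ""


def missing_self_fn(events_df, combined_meta):
    return ""


check_summarized_row_signature(valid_fn)  # passes silently

try:
    check_summarized_row_signature(missing_self_fn)
except TypeError as e:
    print(f"Rejected: {e}")
```

A check like this only inspects parameter names and counts. It cannot tell whether the function returns a string or reads columns that exist, which is why runtime failures surface later.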
If the function has the correct signature but raises an error at runtime (e.g., accesses a column that doesn't exist), the error will be raised during forward_conversion() or forward_conversion_inference().
# This has the correct signature but will fail at runtime
def broken_fn(self, events_df, combined_meta):
    return events_df["nonexistent_column"]  # KeyError!
converter_broken = ConverterInstruction(
    nr_tokens_budget_total=8192,
    config=config,
    dm=dm,
    variable_stats=data_splitter_forecasting.variable_stats,
)
converter_broken.set_custom_summarized_row_fn(broken_fn)
try:
    converter_broken.forward_conversion(
        forecasting_splits=forecasting_splits[0],
        event_splits=events_splits[0],
    )
except KeyError as e:  # the KeyError from broken_fn propagates out of forward_conversion()
    print(f"Caught runtime error: {e}")
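When prompts are generated for many patients, a single malformed record can otherwise abort the whole run. One possible mitigation, sketched below, is to wrap the custom function so that it logs the problem and returns a fallback string. This is a design choice rather than a recommendation from TwinWeaver: failing fast is often preferable during development. The wrapper name with_fallback and the fallback text are hypothetical, and a plain dict stands in for events_df to keep the example dependency-free.

```python
import functools
import logging

logger = logging.getLogger(__name__)


def with_fallback(fn, fallback_text="\nSummary unavailable.\n"):
    """Wrap a summarized row function so runtime errors yield fallback_text."""

    @functools.wraps(fn)
    def wrapper(self, events_df, combined_meta):
        try:
            result = fn(self, events_df, combined_meta)
        except (KeyError, ValueError, TypeError) as exc:
            logger.warning("Custom summarized row failed: %r", exc)
            return fallback_text
        # The summarized row must be a string; guard against other return types.
        if not isinstance(result, str):
            logger.warning("Custom summarized row returned %s", type(result).__name__)
            return fallback_text
        return result

    return wrapper


def broken_fn(self, events_df, combined_meta):
    return events_df["nonexistent_column"]  # raises KeyError


safe_fn = with_fallback(broken_fn)

# A dict stands in for the DataFrame; a missing key raises KeyError as well.
result = safe_fn(None, {"some_column": [1, 2, 3]}, {})
print(repr(result))
```

Because the wrapper keeps self as its first parameter, followed by events_df and combined_meta, it has the same shape as the function it wraps.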
Summary
| Step | What | How |
|---|---|---|
| 1 | Define a function | def my_fn(self, events_df, combined_meta) -> str: |
| 2 | Register it | converter.set_custom_summarized_row_fn(my_fn) |
| 3 | Generate prompts | converter.forward_conversion(...) uses your custom function |
Key points:
- The function signature must start with (self, events_df, combined_meta, ...)
- self gives access to the full ConverterInstruction instance (config, data manager, etc.)
- events_df contains all patient events up to the split date
- combined_meta["dates_per_variable"] tells you which variables are being forecasted
- combined_meta["variable_name_mapping"] maps variable names to descriptive names
- The returned string is inserted between the event history and the task questions in the instruction
- The custom function also works with forward_conversion_inference() for inference prompts