Customizing the Summarized Row with `set_custom_summarized_row_fn`

The summarized row is a compact summary section inserted into the instruction prompt just before the task questions. By default, it includes:

- The most recent genetic information
- The most recent Line of Therapy (LoT) start
- The last known value for each target variable being forecasted

In many use cases, you may want to customize this section — for example, to:

- Add domain-specific summaries (e.g., latest vital signs, risk scores)
- Simplify or remove sections that aren't relevant to your dataset
- Change the formatting or language of the summary

TwinWeaver's ConverterInstruction class exposes the set_custom_summarized_row_fn() method to let you plug in your own logic.

This notebook demonstrates:

- Generating output with the default summarized row
- Writing and applying a custom summarized row function
- Comparing the results side by side

Setup

We follow the same setup as the main data preparation notebook.

import pandas as pd

from twinweaver import (
    DataManager,
    Config,
    DataSplitterForecasting,
    DataSplitterEvents,
    ConverterInstruction,
    DataSplitter,
)

Load Data

# Load data - generated example data
df_events = pd.read_csv("../../example_data/events.csv")
df_constant = pd.read_csv("../../example_data/constant.csv")
df_constant_description = pd.read_csv("../../example_data/constant_description.csv")

Configuration

config = Config()

# Required settings
config.split_event_category = "lot"
config.event_category_forecast = ["lab"]
config.event_category_events_prediction_with_naming = {
    "death": "death",
    "progression": "next progression",
}
config.constant_columns_to_use = ["birthyear", "gender", "histology", "smoking_history"]
config.constant_birthdate_column = "birthyear"

Data Manager & Splitters

dm = DataManager(config=config)
dm.load_indication_data(df_events=df_events, df_constant=df_constant, df_constant_description=df_constant_description)
dm.process_indication_data()
dm.setup_unique_mapping_of_events()
dm.setup_hold_out_sets(validation_split=0.1, test_split=0.1)
dm.infer_var_types()

# Setup splitters
data_splitter_events = DataSplitterEvents(
    dm,
    config=config,
    max_length_to_sample=pd.Timedelta(weeks=104),
    min_length_to_sample=pd.Timedelta(weeks=1),
)
data_splitter_events.setup_variables()

data_splitter_forecasting = DataSplitterForecasting(
    data_manager=dm,
    config=config,
    max_forecasted_trajectory_length=pd.Timedelta(days=90),
)
data_splitter_forecasting.setup_statistics()

data_splitter = DataSplitter(data_splitter_events, data_splitter_forecasting)

Generate Splits for a Patient

We pick a patient and generate splits so we can compare the default and custom summarized rows on the same data.

patientid = dm.all_patientids[4]
patient_data = dm.get_patient_data(patientid)

forecasting_splits, events_splits, reference_dates = data_splitter.get_splits_from_patient_with_target(
    patient_data,
)
print(f"Patient: {patientid}")
print(f"Number of forecasting split groups: {len(forecasting_splits)}")
print(f"Number of event split groups: {len(events_splits)}")

Step 1: Default Summarized Row

First, let's create a converter with the default summarized row function and generate an instruction to see what it looks like.

converter_default = ConverterInstruction(
    nr_tokens_budget_total=8192,
    config=config,
    dm=dm,
    variable_stats=data_splitter_forecasting.variable_stats,
)

p_default = converter_default.forward_conversion(
    forecasting_splits=forecasting_splits[0],
    event_splits=events_splits[0],
)

print("=" * 80)
print("DEFAULT INSTRUCTION (last 2500 chars to see the summarized row + tasks):")
print("=" * 80)
print(p_default["instruction"][-2500:])

Step 2: Define a Custom Summarized Row Function

Now let's write a custom function. The function must follow this signature:

def my_custom_fn(self, events_df: pd.DataFrame, combined_meta: dict) -> str:
    ...

Parameters:

- self: the ConverterInstruction instance (gives you access to self.config, self.dm, etc.)
- events_df: a DataFrame of the patient's events up to the split date
- combined_meta: a dict with keys:
  - "dates_per_variable": maps target variable names → list of future dates being forecasted
  - "variable_name_mapping": maps variable names → descriptive names

Returns: a string that will be inserted into the instruction prompt between the event history and the task questions.
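
To make this contract concrete, here is a minimal sketch of a function with the required signature, called directly on hand-built inputs. The contents of `combined_meta` (the variable name `hemoglobin`, its descriptive name, and the dates) are made up for illustration, and `SimpleNamespace` merely stands in for the `ConverterInstruction` instance; in a real run TwinWeaver supplies all three arguments.

```python
from datetime import date
from types import SimpleNamespace


def summarize_targets(self, events_df, combined_meta):
    """List each forecast target with the number of dates to be predicted."""
    dates_per_variable = combined_meta.get("dates_per_variable", {})
    variable_name_mapping = combined_meta.get("variable_name_mapping", {})

    lines = ["Forecast targets:"]
    for var_name in sorted(dates_per_variable):
        # Fall back to the raw variable name if no descriptive name exists.
        label = variable_name_mapping.get(var_name, var_name)
        n_dates = len(dates_per_variable[var_name])
        lines.append(f"\t- {label}: {n_dates} future date(s)")
    return "\n".join(lines) + "\n"


# Hypothetical inputs, shaped like the parameter description above.
example_meta = {
    "dates_per_variable": {
        "hemoglobin": [date(2024, 1, 8), date(2024, 1, 22)],
    },
    "variable_name_mapping": {"hemoglobin": "Hemoglobin (g/dL)"},
}
stand_in_self = SimpleNamespace()  # placeholder for the converter instance

# events_df is not used by this function, so None is passed for the demo.
example_summary = summarize_targets(stand_in_self, None, example_meta)
print(example_summary)
```

Calling the function by hand like this is a quick way to check that it returns a plain string before registering it.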

Example 1: Minimal custom summary

This example creates a simple summary that lists only the last known lab values, skipping genetic and LoT information entirely.

def custom_summarized_row_minimal(self, events_df, combined_meta):
    """
    A minimal custom summarized row that only shows the last known value
    for each target variable being forecasted.
    """
    ret = "\nSummary of latest known values:\n"

    dates_per_variable = combined_meta.get("dates_per_variable", {})
    variable_name_mapping = combined_meta.get("variable_name_mapping", {})

    if not dates_per_variable:
        ret += "\tNo target variables to summarize.\n"
        return ret

    # Sort events by date to get the most recent values
    sorted_events = events_df.sort_values(self.config.date_col)

    for var_name in sorted(dates_per_variable.keys()):
        descriptive_name = variable_name_mapping.get(var_name, var_name)
        var_data = sorted_events[sorted_events[self.config.event_name_col] == var_name]
        if not var_data.empty:
            last_val = var_data[self.config.event_value_col].iloc[-1]
            ret += f"\t- {descriptive_name}: {last_val}\n"
        else:
            ret += f"\t- {descriptive_name}: not available\n"

    return ret

Apply the Custom Function

Use set_custom_summarized_row_fn() to register our custom function on the converter.

converter_custom = ConverterInstruction(
    nr_tokens_budget_total=8192,
    config=config,
    dm=dm,
    variable_stats=data_splitter_forecasting.variable_stats,
)

# Register the custom summarized row function
converter_custom.set_custom_summarized_row_fn(custom_summarized_row_minimal)

p_custom = converter_custom.forward_conversion(
    forecasting_splits=forecasting_splits[0],
    event_splits=events_splits[0],
)

print("=" * 80)
print("CUSTOM INSTRUCTION (last 1500 chars):")
print("=" * 80)
print(p_custom["instruction"][-1500:])
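
Before wiring a function into a converter, it can help to call it directly on a small hand-made DataFrame. The sketch below does this for a condensed variant of the minimal summary. The column names (`date`, `event_name`, `event_value`), the `SimpleNamespace` stand-in for `self`, and the toy lab values are assumptions for illustration, not TwinWeaver defaults.

```python
from types import SimpleNamespace

import pandas as pd


def latest_values_summary(self, events_df, combined_meta):
    """Condensed variant of custom_summarized_row_minimal, for offline checks."""
    cfg = self.config
    names = combined_meta.get("variable_name_mapping", {})
    ordered = events_df.sort_values(cfg.date_col)

    ret = "\nSummary of latest known values:\n"
    for var_name in sorted(combined_meta.get("dates_per_variable", {})):
        rows = ordered[ordered[cfg.event_name_col] == var_name]
        value = rows[cfg.event_value_col].iloc[-1] if not rows.empty else "not available"
        ret += f"\t- {names.get(var_name, var_name)}: {value}\n"
    return ret


# Stand-in for the converter: only the config attributes the function reads.
fake_self = SimpleNamespace(
    config=SimpleNamespace(
        date_col="date", event_name_col="event_name", event_value_col="event_value"
    )
)

# Toy events, deliberately out of date order to exercise the sort.
toy_events = pd.DataFrame(
    {
        "date": pd.to_datetime(["2024-02-01", "2024-01-01"]),
        "event_name": ["hemoglobin", "hemoglobin"],
        "event_value": ["11.8", "12.4"],
    }
)
toy_meta = {
    "dates_per_variable": {
        "hemoglobin": [pd.Timestamp("2024-03-01")],
        "platelets": [pd.Timestamp("2024-03-01")],
    },
    "variable_name_mapping": {"hemoglobin": "Hemoglobin"},
}

toy_summary = latest_values_summary(fake_self, toy_events, toy_meta)
print(toy_summary)
```

The toy data covers both branches: `hemoglobin` has a most recent value, while `platelets` has no events and falls back to "not available".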

Step 3: A More Advanced Custom Function

This example builds a richer summary that includes:

- A count of total events in the patient history
- The most recent treatment/drug event
- The last known values for target variables (with trend direction)

def custom_summarized_row_advanced(self, events_df, combined_meta):
    """
    A more advanced custom summarized row that includes event counts,
    the most recent drug, and last known target values with trend indicators.
    """
    ret = "\n--- Patient Summary ---\n"

    # 1. Total event count by category
    category_counts = events_df[self.config.event_category_col].value_counts()
    ret += "Event counts: "
    ret += ", ".join([f"{cat}={count}" for cat, count in category_counts.items()])
    ret += "\n"

    # 2. Most recent drug/treatment
    drug_events = events_df[events_df[self.config.event_category_col] == "drug"]
    if not drug_events.empty:
        last_drug = drug_events.sort_values(self.config.date_col).iloc[-1]
        ret += f"Most recent treatment: {last_drug[self.config.event_descriptive_name_col]}\n"
    else:
        ret += "Most recent treatment: none recorded\n"

    # 3. Target variable last values with simple trend
    dates_per_variable = combined_meta.get("dates_per_variable", {})
    variable_name_mapping = combined_meta.get("variable_name_mapping", {})

    if dates_per_variable:
        ret += "Latest lab values:\n"
        sorted_events = events_df.sort_values(self.config.date_col)

        for var_name in sorted(dates_per_variable.keys()):
            descriptive_name = variable_name_mapping.get(var_name, var_name)
            var_data = sorted_events[sorted_events[self.config.event_name_col] == var_name]

            if len(var_data) >= 2:
                try:
                    prev_val = float(var_data[self.config.event_value_col].iloc[-2])
                    last_val = float(var_data[self.config.event_value_col].iloc[-1])
                    trend = "↑" if last_val > prev_val else ("↓" if last_val < prev_val else "→")
                    ret += f"\t{descriptive_name}: {last_val} ({trend})\n"
                except (ValueError, TypeError):
                    last_val = var_data[self.config.event_value_col].iloc[-1]
                    ret += f"\t{descriptive_name}: {last_val}\n"
            elif len(var_data) == 1:
                last_val = var_data[self.config.event_value_col].iloc[-1]
                ret += f"\t{descriptive_name}: {last_val}\n"
            else:
                ret += f"\t{descriptive_name}: not available\n"

    ret += "--- End Summary ---\n"
    return ret

# Apply the advanced custom function
converter_advanced = ConverterInstruction(
    nr_tokens_budget_total=8192,
    config=config,
    dm=dm,
    variable_stats=data_splitter_forecasting.variable_stats,
)

converter_advanced.set_custom_summarized_row_fn(custom_summarized_row_advanced)

p_advanced = converter_advanced.forward_conversion(
    forecasting_splits=forecasting_splits[0],
    event_splits=events_splits[0],
)

print("=" * 80)
print("ADVANCED CUSTOM INSTRUCTION (last 2000 chars):")
print("=" * 80)
print(p_advanced["instruction"][-2000:])
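
The trend logic in `custom_summarized_row_advanced` can be factored into a small helper, which shortens the summary function and makes the comparison easy to test on its own. The helper name `trend_arrow` is hypothetical and not part of TwinWeaver. Unlike the inline version, this sketch also treats NaN as non-comparable, on the assumption that missing lab values may appear as NaN in the events DataFrame.

```python
import math


def trend_arrow(prev_val, last_val):
    """Return an arrow for the change between two values, or None if either
    value cannot be interpreted as a finite comparison (non-numeric or NaN)."""
    try:
        prev_num = float(prev_val)
        last_num = float(last_val)
    except (ValueError, TypeError):
        return None
    if math.isnan(prev_num) or math.isnan(last_num):
        return None
    if last_num > prev_num:
        return "↑"
    if last_num < prev_num:
        return "↓"
    return "→"


print(trend_arrow("10.5", "12.0"))
print(trend_arrow(7, 7))
print(trend_arrow("positive", "negative"))
```

Inside the summary function, a `None` result would simply mean "print the value without a trend marker".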

Step 4: Error Handling

TwinWeaver validates the function signature when you call set_custom_summarized_row_fn(). The function must have self as the first parameter, followed by at least two more parameters (events_df and combined_meta).

Here's what happens if you pass an invalid function:

# This will raise a TypeError because the signature is wrong (missing 'self')
def bad_fn(events_df, combined_meta):
    return "This won't work"


try:
    converter_default.set_custom_summarized_row_fn(bad_fn)
except TypeError as e:
    print(f"Caught expected error: {e}")
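
A check of this kind can be written with the standard library's `inspect` module. The sketch below is an assumption about how such validation might look, not TwinWeaver's actual implementation; `check_summarized_row_signature` is a hypothetical helper you could run on your own functions before registering them.

```python
import inspect


def check_summarized_row_signature(fn):
    """Raise TypeError unless fn looks like fn(self, events_df, combined_meta, ...)."""
    if not callable(fn):
        raise TypeError("custom summarized row function must be callable")

    params = list(inspect.signature(fn).parameters.values())
    if not params or params[0].name != "self":
        raise TypeError("first parameter must be named 'self'")
    if len(params) < 3:
        raise TypeError(
            "expected at least two parameters after 'self' "
            "(events_df and combined_meta)"
        )


def good_fn(self, events_df, combined_meta):
    return "ok"


def missing_self(events_df, combined_meta):
    return "not ok"


check_summarized_row_signature(good_fn)  # passes silently

try:
    check_summarized_row_signature(missing_self)
except TypeError as err:
    print(f"Rejected: {err}")
```

Note that a check like this only inspects parameter names and counts. It cannot tell whether the function body will succeed on real data.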

If the function has the correct signature but raises an error at runtime (e.g., accesses a column that doesn't exist), the error will be raised during forward_conversion() or forward_conversion_inference().

# This has the correct signature but will fail at runtime
def broken_fn(self, events_df, combined_meta):
    return events_df["nonexistent_column"]  # KeyError!


converter_broken = ConverterInstruction(
    nr_tokens_budget_total=8192,
    config=config,
    dm=dm,
    variable_stats=data_splitter_forecasting.variable_stats,
)
converter_broken.set_custom_summarized_row_fn(broken_fn)

try:
    converter_broken.forward_conversion(
        forecasting_splits=forecasting_splits[0],
        event_splits=events_splits[0],
    )
except KeyError as e:
    print(f"Caught runtime error: {e}")
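
If you would rather degrade gracefully than abort prompt generation, one option is to wrap the custom function so that failures produce a fallback string. The `with_fallback` helper below is a hypothetical sketch, not a TwinWeaver feature. The wrapper keeps an explicit `(self, events_df, combined_meta)` parameter list so that it has the expected signature itself. Swallowing errors can hide real bugs, so the sketch logs the exception.

```python
import logging

logger = logging.getLogger(__name__)


def with_fallback(fn, fallback="\nSummary unavailable.\n"):
    """Wrap a summarized row function so runtime errors yield a fallback string."""

    def wrapped(self, events_df, combined_meta):
        try:
            result = fn(self, events_df, combined_meta)
        except Exception:
            # Log the full traceback so the failure is not silently lost.
            logger.exception("Custom summarized row function failed")
            return fallback
        # Guard against functions that return something other than a string.
        return result if isinstance(result, str) else fallback

    return wrapped


def broken_summary(self, events_df, combined_meta):
    return events_df["nonexistent_column"]  # KeyError on the empty dict used below


safe_summary = with_fallback(broken_summary)
# A plain dict stands in for the events DataFrame in this demo.
print(repr(safe_summary(None, {}, {})))
```

Assuming registration only inspects the signature, the wrapped function could then be passed to `set_custom_summarized_row_fn()` in place of the original.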

Summary

| Step | What | How |
|------|------|-----|
| 1 | Define a function | `def my_fn(self, events_df, combined_meta) -> str:` |
| 2 | Register it | `converter.set_custom_summarized_row_fn(my_fn)` |
| 3 | Generate prompts | `converter.forward_conversion(...)` uses your custom function |

Key points:

- The function signature must start with (self, events_df, combined_meta, ...)
- self gives access to the full ConverterInstruction instance (config, data manager, etc.)
- events_df contains all patient events up to the split date
- combined_meta["dates_per_variable"] tells you which variables are being forecasted
- combined_meta["variable_name_mapping"] maps variable names to descriptive names
- The returned string is inserted between the event history and the task questions in the instruction
- The custom function also works with forward_conversion_inference() for inference prompts
