TwinWeaver Examples

This directory contains examples demonstrating how to use TwinWeaver for various tasks, including data preparation, inference, and fine-tuning.

Data Preprocessing

data_preprocessing/raw_data_preprocessing.ipynb: Start here if you have raw clinical data. Shows how to transform raw EHR exports into the three TwinWeaver dataframes (df_events, df_constant, df_constant_description), including handling death events and other time-to-event outcomes.
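To show the general shape of this transformation, here is a minimal, stdlib-only sketch. It is not the notebook's code. All column names (patient_id, event_category, event_name, event_value, and so on) and the raw export layout are hypothetical placeholders. The sketch converts a wide per-patient export and a lab table into a long-format event list, a one-row-per-patient constant table, and a description of the constant columns. Death is recorded as an ordinary dated event so that it can later serve as a time-to-event outcome.

```python
from datetime import date

# Hypothetical raw exports (NOT a real EHR or TwinWeaver schema).
raw_patients = [
    {"patient_id": "p1", "sex": "F", "birth_year": 1960,
     "diagnosis_date": date(2020, 1, 5), "death_date": date(2021, 3, 2)},
    {"patient_id": "p2", "sex": "M", "birth_year": 1955,
     "diagnosis_date": date(2019, 6, 1), "death_date": None},  # alive at data cut-off
]
raw_labs = [
    {"patient_id": "p1", "date": date(2020, 2, 1), "test": "hemoglobin", "value": 11.8},
    {"patient_id": "p2", "date": date(2019, 7, 3), "test": "hemoglobin", "value": 13.1},
]


def build_tables(patients, labs):
    """Return (events, constant, constant_description) as lists of row dicts."""
    events = []
    # Each lab result becomes one dated event row (long format).
    for lab in labs:
        events.append({"patient_id": lab["patient_id"], "date": lab["date"],
                       "event_category": "lab", "event_name": lab["test"],
                       "event_value": lab["value"]})
    for p in patients:
        events.append({"patient_id": p["patient_id"], "date": p["diagnosis_date"],
                       "event_category": "diagnosis", "event_name": "primary diagnosis",
                       "event_value": "yes"})
        # Death is stored as a regular dated event, usable as a time-to-event outcome.
        if p["death_date"] is not None:
            events.append({"patient_id": p["patient_id"], "date": p["death_date"],
                           "event_category": "death", "event_name": "death",
                           "event_value": "yes"})
    # Order chronologically within each patient.
    events.sort(key=lambda e: (e["patient_id"], e["date"]))
    # Time-invariant attributes: exactly one row per patient.
    constant = [{"patient_id": p["patient_id"], "sex": p["sex"],
                 "birth_year": p["birth_year"]} for p in patients]
    # Human-readable description of each constant column.
    constant_description = [
        {"variable": "sex", "description": "Sex recorded at registration"},
        {"variable": "birth_year", "description": "Year of birth"},
    ]
    return events, constant, constant_description


events, constant, constant_description = build_tables(raw_patients, raw_labs)
```

Each list of row dicts can be wrapped with `pandas.DataFrame(rows)` to obtain dataframes. For the exact columns TwinWeaver expects in df_events, df_constant, and df_constant_description, refer to the notebook itself.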

01_data_preparation_for_training.ipynb: A basic example showing how to convert data for a single patient using the instruction setup with a custom dataset.
02_inference_prompt_preparation.ipynb: Demonstrates how to run inference using TwinWeaver.
03_end_to_end_llm_finetuning.ipynb: A comprehensive end-to-end guide on fine-tuning an LLM for medical forecasting. It covers data processing, QLoRA fine-tuning, and inference.

Located in the advanced/ directory, these examples cover more specific use cases.

training_individual_splitters.ipynb: Demonstrates data preparation using individual data splitters for more granular control.
training_custom_split_events.ipynb: Shows how to customize split events and forecast different event categories.
training_forecasting_splitter_only.ipynb: Forecasting-only example showing training data generation using only the DataSplitterForecasting (no event splitter).
inference_individual_splitters.py: A Python script showing how to run inference using the individual splitter setup.
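The splitter examples above revolve around cutting a patient timeline at a "split event" and forecasting events of some category after it. The following plain-Python sketch illustrates that idea under stated assumptions. It is not TwinWeaver's DataSplitterForecasting or event-splitter API. The function split_timeline, the tuple layout of events, and the fixed-day horizon are all hypothetical.

```python
from datetime import date, timedelta
from typing import List, Tuple

# Hypothetical event record: (date, category, name, value). Not TwinWeaver's schema.
Event = Tuple[date, str, str, str]


def split_timeline(events: List[Event], split_category: str,
                   target_category: str, horizon_days: int) -> List[dict]:
    """Build one (history, target) example per occurrence of a split event.

    history: all events on or before the split date (model input).
    target:  events of target_category in (split_date, split_date + horizon].
    """
    events = sorted(events)  # chronological order (date is the first tuple field)
    examples = []
    for split_date, category, _, _ in events:
        if category != split_category:
            continue
        horizon_end = split_date + timedelta(days=horizon_days)
        history = [e for e in events if e[0] <= split_date]
        target = [e for e in events
                  if split_date < e[0] <= horizon_end and e[1] == target_category]
        examples.append({"split_date": split_date, "history": history, "target": target})
    return examples


# Hypothetical timeline: split at each therapy start, forecast labs over 90 days.
timeline = [
    (date(2020, 1, 1), "diagnosis", "lung cancer", "yes"),
    (date(2020, 1, 10), "therapy", "line 1 start", "drug A"),
    (date(2020, 2, 1), "lab", "hemoglobin", "12.1"),
    (date(2020, 3, 15), "lab", "hemoglobin", "11.4"),
    (date(2020, 9, 1), "therapy", "line 2 start", "drug B"),
    (date(2020, 10, 1), "lab", "hemoglobin", "10.9"),
]
examples = split_timeline(timeline, split_category="therapy",
                          target_category="lab", horizon_days=90)
```

Changing split_category changes where timelines are cut. Changing target_category changes what is forecast. This loosely mirrors the two knobs that training_custom_split_events.ipynb is described as customizing.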

customizing_text_generation.ipynb: A comprehensive tutorial on customizing every textual component of the instruction generation pipeline, including preambles, event formatting, time units, genetic data tags, forecasting prompts, and more.
custom_summarized_row.ipynb: Shows how to customize the summarized row section of the instruction prompt using set_custom_summarized_row_fn(). Includes minimal and advanced examples, plus error handling guidance.

prepare_pretraining_data.py: A script to prepare data for the pretraining phase.
end_to_end_llm_training_with_pretrain.ipynb: An end-to-end example for training LLMs on full patient histories without a specific task, useful for developing models that can generate synthetic patients or embeddings.

tte_probability_inference.ipynb: Demonstrates how to estimate probabilities for time-to-event outcomes (e.g., death, disease progression) using a fine-tuned LLM served via vLLM. Scores three mutually exclusive completions per patient and derives softmax probabilities from length-normalised log-probabilities. Includes evaluation across multiple time horizons. Requires a fine-tuned model and a GPU with enough memory for vLLM.
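The scoring step described for tte_probability_inference.ipynb can be illustrated with a short sketch. Each candidate completion's token log-probabilities are averaged (length normalisation), and a softmax is taken across the mutually exclusive candidates. The completion labels and log-probability values below are hypothetical. Obtaining the per-token log-probabilities from a vLLM-served model is outside the scope of this sketch.

```python
import math
from typing import Dict, List


def length_normalised_logprob(token_logprobs: List[float]) -> float:
    """Mean log-probability per token of one completion."""
    if not token_logprobs:
        raise ValueError("a completion must contain at least one token")
    return sum(token_logprobs) / len(token_logprobs)


def completion_probabilities(scored: Dict[str, List[float]]) -> Dict[str, float]:
    """Softmax over length-normalised log-probabilities of mutually exclusive completions."""
    scores = {label: length_normalised_logprob(lps) for label, lps in scored.items()}
    max_score = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - max_score) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Hypothetical per-token log-probs for three mutually exclusive completions of one patient.
example = {
    "event_occurred": [-0.2, -1.0],            # mean -0.6
    "event_not_occurred": [-0.5, -0.3, -0.4],  # mean -0.4
    "censored": [-2.0],                        # mean -2.0
}
probs = completion_probabilities(example)
```

Length normalisation keeps longer completions from being penalised simply for containing more tokens. Without it, summed log-probabilities would systematically favour the shortest candidate.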

The following examples are located in the integrations/ directory.

meds_data_import.ipynb: Shows how to import and work with data in the MEDS format.
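As a rough illustration of how MEDS data relates to TwinWeaver's split between time-varying events and constant patient attributes, here is a sketch. It assumes the core MEDS columns subject_id, time, code, and numeric_value, with static measurements carrying a null time. Two further assumptions are made for illustration only: that a code's prefix before the first "//" can serve as an event category, and that the output column names (patient_id, event_category, event_name, event_value) are placeholders. Neither reflects the notebook's actual mapping.

```python
from collections import defaultdict
from datetime import datetime

# Toy rows using the core MEDS columns; the codes are illustrative only.
meds_rows = [
    {"subject_id": 1, "time": None, "code": "GENDER//F", "numeric_value": None},
    {"subject_id": 1, "time": datetime(2020, 1, 5), "code": "DIAGNOSIS//ICD10//C34",
     "numeric_value": None},
    {"subject_id": 1, "time": datetime(2020, 2, 1), "code": "LAB//HGB", "numeric_value": 11.8},
    {"subject_id": 2, "time": None, "code": "GENDER//M", "numeric_value": None},
    {"subject_id": 2, "time": datetime(2019, 7, 3), "code": "LAB//HGB", "numeric_value": 13.1},
]


def meds_to_tables(rows):
    """Split MEDS rows into dated events and per-subject constant attributes."""
    events, constant = [], defaultdict(dict)
    for r in rows:
        # Assumed convention: text before the first "//" is the category.
        category, _, name = r["code"].partition("//")
        if r["time"] is None:
            # Static measurement (null time): treat as a constant attribute.
            constant[r["subject_id"]][category] = name
        else:
            events.append({"patient_id": r["subject_id"], "date": r["time"],
                           "event_category": category, "event_name": name,
                           "event_value": r["numeric_value"]})
    events.sort(key=lambda e: (e["patient_id"], e["date"]))
    return events, dict(constant)


events, constant = meds_to_tables(meds_rows)
```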

example_data/: Contains the generator script and sample CSV files (events, constants, etc.) used by the examples.