TwinWeaver Examples
This directory contains examples showing how to use TwinWeaver for tasks such as data preparation, inference, and fine-tuning.
Data Preprocessing
- data_preprocessing/raw_data_preprocessing.ipynb: Start here if you have raw clinical data. Shows how to transform raw EHR exports into the three TwinWeaver dataframes (df_events, df_constant, df_constant_description), including handling death events and other time-to-event outcomes.
Basic Examples
- 01_data_preparation_for_training.ipynb: A basic example showing how to convert data for a single patient using the instruction setup with a custom dataset.
- 02_inference_prompt_preparation.ipynb: Demonstrates how to run inference using TwinWeaver.
- 03_end_to_end_llm_finetuning.ipynb: A comprehensive end-to-end guide on fine-tuning an LLM for medical forecasting. It covers data processing, QLoRA fine-tuning, and inference.
Advanced Examples
Located in the advanced/ directory, these examples cover more specific use cases.
Custom Splitting (advanced/custom_splitting/)
- training_individual_splitters.ipynb: Demonstrates data preparation using individual data splitters for more granular control.
- training_custom_split_events.ipynb: Shows how to customize split events and forecast different event categories.
- training_forecasting_splitter_only.ipynb: Forecasting-only example showing training data generation using only the DataSplitterForecasting (no event splitter).
- inference_individual_splitters.py: A Python script showing how to run inference using the individual splitter setup.
Custom Output (advanced/custom_output/)
- customizing_text_generation.ipynb: A comprehensive tutorial on customizing every textual component of the instruction generation pipeline, including preambles, event formatting, time units, genetic data tags, forecasting prompts, and more.
- custom_summarized_row.ipynb: Shows how to customize the summarized row section of the instruction prompt using set_custom_summarized_row_fn(). Includes minimal and advanced examples, plus error handling guidance.
Pretraining (advanced/pretraining/)
- prepare_pretraining_data.py: A script to prepare data for the pretraining phase.
- end_to_end_llm_training_with_pretrain.ipynb: An end-to-end example for training LLMs on full patient histories without a specific task, useful for developing models that can generate synthetic patients or embeddings.
TTE Probability Inference (advanced/tte_inference/)
- tte_probability_inference.ipynb: Demonstrates how to estimate probabilities for time-to-event outcomes (e.g., death, disease progression) using a fine-tuned LLM served via vLLM. Scores three mutually exclusive completions per patient and derives softmax probabilities from length-normalised log-probabilities. Includes evaluation across multiple time horizons. Requires a fine-tuned model and a GPU with enough memory for vLLM.
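The scoring step described above can be illustrated with a minimal sketch in plain Python. It is not the notebook's code: the function name, the completion labels, and the log-probability values are all hypothetical, and it assumes that "length-normalised" means the mean per-token log-probability of each completion. In practice the per-token log-probabilities would come from the served model.

```python
import math


def completion_probabilities(token_logprobs_by_completion):
    """Turn per-token log-probabilities of candidate completions into probabilities.

    token_logprobs_by_completion: dict mapping a completion label to the list
    of log-probabilities of its tokens (hypothetical input format).
    """
    # Length-normalise: use the mean log-probability per token so that
    # longer completions are not penalised simply for having more tokens.
    scores = {
        label: sum(logprobs) / len(logprobs)
        for label, logprobs in token_logprobs_by_completion.items()
    }
    # Softmax over the normalised scores; subtracting the maximum first
    # keeps the exponentials numerically stable.
    top = max(scores.values())
    exps = {label: math.exp(score - top) for label, score in scores.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


# Made-up values for one patient; the three labels are illustrative only.
example = {
    "occurred": [-0.2, -0.4, -0.3],
    "not_occurred": [-1.0, -1.2],
    "censored": [-2.0, -2.5, -1.5, -2.0],
}
probs = completion_probabilities(example)
for label, p in probs.items():
    print(f"{label}: {p:.3f}")
```

Because the completions are treated as mutually exclusive, the resulting values sum to one and can be read as a probability for each outcome at a given time horizon.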
Integrations
Located in the integrations/ directory.
- meds_data_import.ipynb: Shows how to import and work with data in the MEDS format.
Data
- example_data/: Contains the generator script and sample CSV files (events, constants, etc.) used by the examples.