Dataset Format¶
TwinWeaver expects three primary dataframes (or CSV files) as input. Example files can be found in examples/example_data/.
1. Longitudinal Events (events.csv)¶
Contains time-varying clinical data where each row represents a single event.
| Column | Description |
|---|---|
patientid |
Unique identifier for the patient |
date |
Date of the event |
event_descriptive_name |
Human-readable name used in the text output |
event_category |
(Optional) Category (e.g., lab, drug) |
event_name |
(Optional) Specific event identifier |
event_value |
Value associated with the event |
meta_data |
(Optional) Additional metadata |
source |
(Optional) Source of the data - e.g. events or genetic |
Example:
patientid,date,event_descriptive_name,event_category,event_name,event_value,meta_data,source
patient_001,2024-01-15,Hemoglobin,lab,HGB,12.5,,clinical
patient_001,2024-01-15,White Blood Cells,lab,WBC,7.2,,clinical
patient_001,2024-02-01,Chemotherapy Started,treatment,CHEMO,1,,clinical
2. Patient Constants (constant.csv)¶
Contains static patient information (demographics, baseline characteristics). One row per patient.
| Column | Description |
|---|---|
patientid |
Unique identifier for the patient |
birthyear |
(example) Patient's year of birth |
gender |
(example) Patient's gender |
... |
Any other static patient attributes |
Example:
patientid,birthyear,gender,diagnosis_stage
patient_001,1965,Female,Stage II
patient_002,1978,Male,Stage III
3. Constant Descriptions (constant_description.csv)¶
Maps columns in the constant table to human-readable descriptions for the text prompt.
| Column | Description |
|---|---|
variable |
Name of the column in the constant table |
comment |
Description of the variable for the text prompt |
Example:
variable,comment
birthyear,Year of birth
gender,Patient gender
diagnosis_stage,Cancer stage at diagnosis
Loading Data¶
Data can be loaded as pandas DataFrames directly:
import pandas as pd
from twinweaver import DataManager, Config
# Load your data
df_events = pd.read_csv("events.csv")
df_constant = pd.read_csv("constant.csv")
df_constant_description = pd.read_csv("constant_description.csv")
# Initialize DataManager
config = Config()
dm = DataManager(config=config)
dm.load_indication_data(
df_events=df_events,
df_constant=df_constant,
df_constant_description=df_constant_description
)
See the Data Preparation Tutorial for a complete walkthrough.