Data Splitter (Main)
twinweaver.instruction.data_splitter
Classes
DataSplitter
Combines both data splitters into one interface for easier usage. For more advanced use cases, the individual data splitters can still be used directly.
Source code in twinweaver/instruction/data_splitter.py
Functions
get_splits_from_patient_inference
get_splits_from_patient_inference(
patient_data,
inference_type="both",
forecasting_override_variables_to_predict=None,
events_override_category=None,
events_override_observation_time_delta=None,
)
Generates a single split for inference based on the latest available data.
This method assumes the inference should occur at the last recorded date in the patient's event history. It generates a single split (forecasting, events, or both) anchored at this date. This is typically used for generating predictions on new data. Target values will not be available or filtered.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `patient_data` | `dict` | Dictionary containing the patient's data. 'events' dataframe must be present. | required |
| `inference_type` | `str` | The type of inference task to generate: 'forecasting', 'events', or 'both'. Defaults to "both". | `'both'` |
| `forecasting_override_variables_to_predict` | `list[str]` | List of variables to generate forecasts for. If None, variables might be sampled (though sampling behavior depends on implementation when no target is present). | `None` |
| `events_override_category` | `str` | The event category to predict (e.g., 'death', 'progression'). | `None` |
| `events_override_observation_time_delta` | `Timedelta` | The time window for the event prediction. | `None` |
Returns:
| Type | Description |
|---|---|
| `tuple` | A tuple containing: (1) `forecast_split` (`DataSplitterForecastingOption` or `None`): the generated forecasting option, or `None` if `inference_type` is 'events'; (2) `events_split` (`DataSplitterEventsOption` or `None`): the generated event prediction option, or `None` if `inference_type` is 'forecasting'. |
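The return contract above can be illustrated with a small stand-in. This is only a sketch under stated assumptions: `sketch_inference_splits` is a hypothetical helper, not part of twinweaver; events are modeled as a plain list of dicts with an assumed `date` key rather than the real 'events' dataframe; and the returned dicts merely stand in for the `DataSplitterForecastingOption` and `DataSplitterEventsOption` objects.

```python
from datetime import date


def sketch_inference_splits(events, inference_type="both"):
    """Hypothetical stand-in mirroring the documented return contract."""
    if inference_type not in {"forecasting", "events", "both"}:
        raise ValueError(f"unknown inference_type: {inference_type!r}")

    # Anchor the single split at the last recorded event date.
    anchor = max(row["date"] for row in events)

    # Plain dicts stand in for the real option objects (assumption).
    forecast_split = {"task": "forecasting", "split_date": anchor}
    events_split = {"task": "events", "split_date": anchor}

    # The task that was not requested comes back as None.
    if inference_type == "forecasting":
        events_split = None
    elif inference_type == "events":
        forecast_split = None
    return forecast_split, events_split


# Toy event history; the column name "date" is an assumption.
events = [
    {"date": date(2023, 1, 5), "event": "lab"},
    {"date": date(2023, 3, 2), "event": "visit"},
    {"date": date(2023, 2, 11), "event": "lab"},
]

forecast_split, events_split = sketch_inference_splits(events, "forecasting")
```

Callers that pass anything other than "both" should therefore check the unused element for `None` before using it.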
Source code in twinweaver/instruction/data_splitter.py
get_splits_from_patient_with_target
get_splits_from_patient_with_target(
patient_data,
max_num_splits_per_lot=1,
forecasting_nr_samples_per_split=1,
events_max_nr_samples_per_split=1,
forecasting_filter_outliers=False,
forecasting_override_categories_to_predict=None,
forecasting_override_variables_to_predict=None,
forecasting_override_split_dates=None,
events_override_category=None,
events_override_observation_time_delta=None,
)
Generates both forecasting and event prediction splits for a patient, ensuring proper alignment.
This method uses the forecasting splitter to determine candidate split dates (based on Line of Therapy (LoT) or overrides), which are then passed to the event prediction splitter so that both tasks use the same anchor points in time. This is critical for multi-task setups and for consistent evaluation.
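The alignment idea can be sketched independently of the library. The example below is a simplified illustration, not twinweaver's implementation: `pick_split_dates` and `build_aligned_tasks` are hypothetical helpers, each LoT is reduced to an assumed (start, end) date window, and plain dicts stand in for the real split groups. The point it shows is that split dates are chosen once and then reused for both task types.

```python
import random
from datetime import date, timedelta


def pick_split_dates(lot_windows, max_num_splits_per_lot=1, seed=0):
    """Pick up to max_num_splits_per_lot random dates inside each LoT window."""
    rng = random.Random(seed)
    picked = []
    for start, end in lot_windows:
        # Every calendar day in the window is a candidate split date.
        candidates = [start + timedelta(days=d) for d in range((end - start).days + 1)]
        k = min(max_num_splits_per_lot, len(candidates))
        picked.extend(sorted(rng.sample(candidates, k)))
    return picked


def build_aligned_tasks(lot_windows, max_num_splits_per_lot=1):
    """Choose split dates once, then build both task lists from them."""
    split_dates = pick_split_dates(lot_windows, max_num_splits_per_lot)
    forecasting = [{"task": "forecasting", "split_date": d} for d in split_dates]
    events = [{"task": "events", "split_date": d} for d in split_dates]
    return forecasting, events, split_dates


# Two toy LoT windows (assumed data).
lots = [
    (date(2023, 1, 1), date(2023, 1, 31)),
    (date(2023, 3, 1), date(2023, 3, 31)),
]
forecasting_tasks, events_tasks, split_dates = build_aligned_tasks(lots, 2)
```

Because both lists are derived from the same `split_dates`, the forecasting task and the event task at a given index always share an anchor date.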
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `patient_data` | `dict` | Dictionary containing the patient's data ('events' and 'constant'). | required |
| `max_num_splits_per_lot` | `int` | Maximum number of random split dates to select per Line of Therapy. Defaults to 1. | `1` |
| `forecasting_nr_samples_per_split` | `int` | Number of forecasting task variants (variable subsets) to generate per split date. Defaults to 1. | `1` |
| `events_max_nr_samples_per_split` | `int` | Maximum number of event prediction tasks to generate per split date. Defaults to 1. | `1` |
| `forecasting_filter_outliers` | `bool` | Whether to apply outlier filtering (e.g., 3-sigma) to target values in forecasting tasks. Defaults to False. | `False` |
| `forecasting_override_categories_to_predict` | `list[str]` | Force forecasting of all variables in these categories. | `None` |
| `forecasting_override_variables_to_predict` | `list[str]` | Force forecasting of these specific variables. | `None` |
| `forecasting_override_split_dates` | `list[datetime]` | Force usage of these specific split dates. | `None` |
| `events_override_category` | `str` | Force event prediction for this specific event category. | `None` |
| `events_override_observation_time_delta` | `Timedelta` | Force a specific prediction window duration for event tasks. | `None` |
Returns:
| Type | Description |
|---|---|
| `tuple` | A tuple containing three elements: (1) `forecasting_splits` (`list[DataSplitterForecastingGroup]`): list of generated forecasting split groups; (2) `events_splits` (`list[DataSplitterEventsGroup]`): list of generated event prediction split groups, corresponding to the forecasting splits; (3) `reference_dates` (`pd.DataFrame`): DataFrame containing the split dates and LoT dates used. |
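The `forecasting_filter_outliers` description mentions 3-sigma filtering as an example. One common reading of that rule is sketched below; it is an assumption about the general technique, not a description of how twinweaver filters targets. `filter_three_sigma` is a hypothetical helper that uses the population standard deviation.

```python
from statistics import mean, pstdev


def filter_three_sigma(values, n_sigma=3.0):
    """Keep values within n_sigma population standard deviations of the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing can be an outlier.
        return list(values)
    return [v for v in values if abs(v - mu) <= n_sigma * sigma]


# 21 ordinary target values plus one extreme measurement (toy data).
targets = [9.0, 10.0, 11.0] * 7 + [100.0]
filtered = filter_three_sigma(targets)
```

Note that with very few values a single extreme point inflates the standard deviation enough to survive the filter, so the rule is only meaningful on reasonably sized samples.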