Data Splitter (Main)
twinweaver.instruction.data_splitter
Classes
DataSplitter
Combines both data splitters into one interface for easier usage. For more advanced use cases, the individual data splitters can still be used directly.
At least one of data_splitter_events or data_splitter_forecasting must be provided. When only one splitter is supplied, the methods return None or empty results for the missing task type.
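The contract described above (at least one splitter required, None for the missing task type) can be pictured with a small stand-in. This is a hypothetical sketch, not TwinWeaver's implementation: `FacadeSketch`, its `split` method, the callable splitters, and the choice of `ValueError` are invented for illustration. Only the keyword names `data_splitter_events` and `data_splitter_forecasting` come from the text above.

```python
from typing import Any, Callable, Optional, Tuple

# Hypothetical stand-in: a "splitter" here is just a callable taking patient data.
Splitter = Callable[[dict], Any]


class FacadeSketch:
    """Illustrative facade over two optional splitters (not the library class)."""

    def __init__(
        self,
        data_splitter_events: Optional[Splitter] = None,
        data_splitter_forecasting: Optional[Splitter] = None,
    ) -> None:
        # Mirror the documented rule: at least one splitter must be provided.
        # Raising ValueError is an assumption made for this sketch.
        if data_splitter_events is None and data_splitter_forecasting is None:
            raise ValueError("Provide at least one of the two splitters.")
        self.data_splitter_events = data_splitter_events
        self.data_splitter_forecasting = data_splitter_forecasting

    def split(self, patient_data: dict) -> Tuple[Any, Any]:
        # A missing splitter yields None for its task type.
        forecasting = (
            self.data_splitter_forecasting(patient_data)
            if self.data_splitter_forecasting is not None
            else None
        )
        events = (
            self.data_splitter_events(patient_data)
            if self.data_splitter_events is not None
            else None
        )
        return forecasting, events
```

Returning a fixed-shape tuple with None in the unused slot keeps calling code identical whether one or both task types are configured.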
Source code in twinweaver/instruction/data_splitter.py
Functions
get_splits_from_patient_inference
get_splits_from_patient_inference(
patient_data,
inference_type="both",
forecasting_override_variables_to_predict=None,
events_override_category=None,
events_override_observation_time_delta=None,
)
Generates a single split for inference based on the latest available data.
This method assumes that inference occurs at the last recorded date in the patient's event history. It generates a single split (forecasting, events, or both) anchored at this date, which is the typical setup for generating predictions on new data. Because no data exists after this date, target values are not available and no target filtering is applied.
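The anchoring rule can be illustrated with plain Python. This is a minimal sketch under stated assumptions: the records, the `date` and `event` keys, and the helper `last_recorded_date` are hypothetical, and a list of dicts stands in for the 'events' dataframe whose actual column names are not given in this section.

```python
from datetime import date

# Hypothetical event records; the real 'events' entry is a dataframe and
# its column names are not specified here.
events = [
    {"date": date(2023, 1, 5), "event": "lab"},
    {"date": date(2023, 3, 12), "event": "visit"},
    {"date": date(2023, 2, 20), "event": "lab"},
]


def last_recorded_date(records):
    """Return the most recent date among the records (the inference anchor)."""
    if not records:
        raise ValueError("No events available to anchor the inference split.")
    return max(r["date"] for r in records)


anchor = last_recorded_date(events)

# Everything up to and including the anchor is input history. Nothing after
# the anchor exists yet, so an inference split carries no target values.
history = [r for r in events if r["date"] <= anchor]
```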
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `patient_data` | `dict` | Dictionary containing the patient's data. 'events' dataframe must be present. | required |
| `inference_type` | `str` | The type of inference task to generate: 'forecasting', 'events', or 'both'. Defaults to "both". | `'both'` |
| `forecasting_override_variables_to_predict` | `list[str]` | List of variables to generate forecasts for. If None, variables might be sampled (though sampling behavior depends on implementation when no target is present). | `None` |
| `events_override_category` | `str` | The event category to predict (e.g., 'death', 'progression'). | `None` |
| `events_override_observation_time_delta` | `Timedelta` | The time window for the event prediction. | `None` |
Returns:
| Type | Description |
|---|---|
| `tuple` | A tuple containing: 1. `forecast_split` (`DataSplitterForecastingOption` or `None`): the generated forecasting option, or `None` if `inference_type` is 'events'. 2. `events_split` (`DataSplitterEventsOption` or `None`): the generated event prediction option, or `None` if `inference_type` is 'forecasting'. |
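Since either element of the returned pair can be `None`, calling code usually needs a guard before using it. The helper below is a hypothetical sketch (the name `available_tasks` is not part of the library) that works on any two-element tuple shaped like the documented return value.

```python
from typing import Any, List, Optional, Tuple


def available_tasks(splits: Tuple[Optional[Any], Optional[Any]]) -> List[str]:
    """Name the tasks present in a (forecast_split, events_split) pair."""
    forecast_split, events_split = splits
    tasks = []
    if forecast_split is not None:
        tasks.append("forecasting")
    if events_split is not None:
        tasks.append("events")
    return tasks
```

A caller could unpack the method's result, pass it to this helper, and run only the downstream steps for the tasks it lists.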
Source code in twinweaver/instruction/data_splitter.py
get_splits_from_patient_with_target
get_splits_from_patient_with_target(
patient_data,
max_num_splits_per_split_event=1,
forecasting_nr_samples_per_split=1,
events_max_nr_samples_per_split=1,
forecasting_filter_outliers=False,
forecasting_override_categories_to_predict=None,
forecasting_override_variables_to_predict=None,
forecasting_override_split_dates=None,
events_override_category=None,
events_override_observation_time_delta=None,
)
Generates both forecasting and event prediction splits for a patient, ensuring proper alignment.
This method uses the forecasting splitter to determine candidate split dates (based on Line of Therapy (LoT) or overrides) and passes them to the event prediction splitter, so that both tasks use the same anchor points in time. This alignment is critical for multitask setups and for consistent evaluation.
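The alignment idea can be sketched independently of the library: pick the split dates once, then build both task lists from that single list. Everything named here (`pick_split_dates`, the candidate dates, the dict-shaped tasks, the fixed seed) is a hypothetical illustration and not the splitters' actual logic.

```python
import random
from datetime import date


def pick_split_dates(candidates_per_lot, max_per_lot=1, seed=0):
    """Sample up to max_per_lot candidate dates for each line of therapy."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    chosen = []
    for candidates in candidates_per_lot:
        k = min(max_per_lot, len(candidates))
        chosen.extend(sorted(rng.sample(candidates, k)))
    return chosen


# Hypothetical candidate dates, one list per line of therapy.
candidates_per_lot = [
    [date(2022, 1, 10), date(2022, 2, 14)],
    [date(2022, 6, 1)],
]

split_dates = pick_split_dates(candidates_per_lot, max_per_lot=1)

# Both tasks are derived from the same list, so their anchors cannot drift apart.
forecasting_tasks = [{"task": "forecasting", "split_date": d} for d in split_dates]
events_tasks = [{"task": "events", "split_date": d} for d in split_dates]
```

Sampling dates separately for each task would risk the two task types seeing different amounts of patient history, which makes their results hard to compare.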
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `patient_data` | `dict` | Dictionary containing the patient's data ('events' and 'constant'). | required |
| `max_num_splits_per_split_event` | `int` | Maximum number of random split dates to select per Line of Therapy. Defaults to 1. | `1` |
| `forecasting_nr_samples_per_split` | `int` | Number of forecasting task variants (variable subsets) to generate per split date. Defaults to 1. | `1` |
| `events_max_nr_samples_per_split` | `int` | Maximum number of event prediction tasks to generate per split date. Defaults to 1. | `1` |
| `forecasting_filter_outliers` | `bool` | Whether to apply outlier filtering (e.g., 3-sigma) to target values in forecasting tasks. Defaults to False. | `False` |
| `forecasting_override_categories_to_predict` | `list[str]` | Force forecasting of all variables in these categories. | `None` |
| `forecasting_override_variables_to_predict` | `list[str]` | Force forecasting of these specific variables. | `None` |
| `forecasting_override_split_dates` | `list[datetime]` | Force usage of these specific split dates. | `None` |
| `events_override_category` | `str` | Force event prediction for this specific event category. | `None` |
| `events_override_observation_time_delta` | `Timedelta` | Force a specific prediction window duration for event tasks. | `None` |
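The description of `forecasting_filter_outliers` mentions 3-sigma filtering as an example. A generic 3-sigma filter looks roughly like the sketch below. The function `filter_three_sigma` is hypothetical, and whether the library uses the population or sample standard deviation, or a different rule altogether, is not stated in this section.

```python
from statistics import mean, pstdev


def filter_three_sigma(values, n_sigma=3.0):
    """Keep values within n_sigma population standard deviations of the mean."""
    if len(values) < 2:
        return list(values)
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing can be an outlier.
        return list(values)
    return [v for v in values if abs(v - mu) <= n_sigma * sigma]


# Nineteen ordinary readings and one extreme value (made-up numbers).
readings = [10.0] * 19 + [1000.0]
cleaned = filter_three_sigma(readings)
```

One caveat of this rule: with the population standard deviation, no value in a sample of n points can lie more than sqrt(n - 1) standard deviations from the mean, so on ten or fewer points a 3-sigma filter never removes anything.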
Returns:
| Type | Description |
|---|---|
| `tuple` | A tuple containing three elements: 1. `forecasting_splits` (`list[DataSplitterForecastingGroup]` or `None`): list of generated forecasting split groups, or `None` if no forecasting splitter is set. 2. `events_splits` (`list[DataSplitterEventsGroup]` or `None`): list of generated event prediction split groups, or `None` if no events splitter is set. 3. `reference_dates` (`pd.DataFrame`): DataFrame containing the split dates and LoT dates used. |
Source code in twinweaver/instruction/data_splitter.py