MEDS Importer
twinweaver.utils.meds_importer
Functions
convert_meds_to_dtc
convert_meds_to_dtc(
df_codes,
df_data,
df_split,
prefer_text_value_over_numeric=True,
no_value_default="occurred",
event_category_mapping=None,
default_category="generic_event",
)
Converts raw medical data into a format compatible with the digital_twin_converter package.
This function takes multiple DataFrames containing patient data, medical codes, and patient splits, and transforms them into three distinct DataFrames: one for static (constant) patient data, one for descriptions of that static data, and one for time-series event data.
The function separates events with timestamps from those without. Events
without timestamps are treated as static patient characteristics and are
pivoted into a wide-format DataFrame (converted_constant). Events with
timestamps are formatted into a long-format DataFrame (converted_events).
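The split described above can be sketched with plain pandas. This is a minimal illustration on toy data, not the function's actual implementation: it assumes a missing `time` marks a static event, and it pivots on `text_value` only. How the real function chooses the static value, or merges `df_split`, is not shown in this page.

```python
import pandas as pd

# Toy MEDS-style events (illustrative data only).
# Rows with a missing `time` are treated as static characteristics.
df_data = pd.DataFrame(
    {
        "subject_id": [1, 1, 1, 2, 2],
        "code": ["sex", "lab_hb", "lab_hb", "sex", "dx_flu"],
        "time": pd.to_datetime(
            [None, "2021-01-01", "2021-02-01", None, "2021-03-05"]
        ),
        "text_value": ["F", None, None, "M", None],
        "numeric_value": [None, 12.5, 13.1, None, None],
    }
)

# Separate events without a timestamp from events with one.
is_static = df_data["time"].isna()
static_rows = df_data[is_static]
timed_rows = df_data[~is_static]

# Wide format: one row per subject, one column per static code.
# Assumption: the static value is taken from `text_value`.
converted_constant = static_rows.pivot(
    index="subject_id", columns="code", values="text_value"
).reset_index()
converted_constant.columns.name = None

# Long format: one row per time-stamped event, ordered per subject.
converted_events = timed_rows.sort_values(["subject_id", "time"]).reset_index(
    drop=True
)
```

Here `converted_constant` ends up with the columns `subject_id` and `sex`, while `converted_events` keeps one row for each of the three time-stamped events.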
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df_codes` | `DataFrame` | DataFrame containing the mapping from medical codes to their descriptions. Must contain 'code' and 'description' columns. | required |
| `df_data` | `DataFrame` | The primary DataFrame with event data for each patient. Must include 'subject_id', 'code', 'time', 'text_value', and 'numeric_value' columns. | required |
| `df_split` | `DataFrame` | DataFrame containing static patient information, such as train/test split assignments. Must include a 'subject_id' column. | required |
| `prefer_text_value_over_numeric` | `bool` | If True, when an event has both a text and a numeric value, the text value is used. If False, the numeric value is prioritized. Defaults to True. | `True` |
| `no_value_default` | `str` | The default value to assign to an event's 'event_value' if it has a timestamp but no associated text or numeric value. Defaults to "occurred". | `'occurred'` |
| `event_category_mapping` | `dict` | A dictionary to map event codes ('event_name') to event categories. If None, no specific category mapping is applied. Defaults to None. | `None` |
| `default_category` | `str` | The category to assign to events that are not found in the | `'generic_event'` |
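The value and category rules described in the table can be illustrated per event. The helpers below, `resolve_event_value` and `resolve_event_category`, are hypothetical names written for this sketch and are not part of `twinweaver`. The sketch also assumes that passing `None` as the mapping leaves every event in the default category, which the description above does not confirm.

```python
import math


def _is_missing(value):
    """Treat None and float NaN as 'no value' (assumption for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def resolve_event_value(
    text_value,
    numeric_value,
    prefer_text_value_over_numeric=True,
    no_value_default="occurred",
):
    """Hypothetical helper: pick the value for one time-stamped event."""
    has_text = not _is_missing(text_value)
    has_numeric = not _is_missing(numeric_value)
    if has_text and has_numeric:
        # Both present: the flag decides which one wins.
        return text_value if prefer_text_value_over_numeric else numeric_value
    if has_text:
        return text_value
    if has_numeric:
        return numeric_value
    # Neither present: fall back to the placeholder value.
    return no_value_default


def resolve_event_category(
    event_name, event_category_mapping=None, default_category="generic_event"
):
    """Hypothetical helper: look up a category, else use the default."""
    mapping = event_category_mapping or {}
    return mapping.get(event_name, default_category)


value_text = resolve_event_value("high", 7.2)
value_numeric = resolve_event_value(
    "high", 7.2, prefer_text_value_over_numeric=False
)
value_fallback = resolve_event_value(None, float("nan"))
category_mapped = resolve_event_category("lab_hb", {"lab_hb": "lab"})
category_default = resolve_event_category("dx_flu", {"lab_hb": "lab"})
```

With these inputs, the text value wins by default, the numeric value wins when the flag is `False`, and an event with neither value receives `"occurred"`.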
Returns:
| Name | Type | Description |
|---|---|---|
| `converted_constant` | `DataFrame` | A DataFrame with one row per patient, containing static features. This includes data from |
| `converted_constant_description` | `DataFrame` | A DataFrame providing human-readable descriptions for each column in the |
| `converted_events` | `DataFrame` | A long-format DataFrame containing all time-stamped events, structured for time-series analysis. Includes columns like 'patientid', 'date', 'event_name', 'event_value', and 'event_category'. |
Warns:
| Type | Description |
|---|---|
| `Warning` | Prints a warning to the console if duplicate rows are detected in the final |
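A duplicate-row check of this kind could look like the sketch below. It is an assumption-laden illustration: `warn_on_duplicates` is a hypothetical helper, the description above does not name the frame that is checked, and the real function may print rather than call `warnings.warn`. The events frame is used here only as an example.

```python
import warnings

import pandas as pd


def warn_on_duplicates(df, name):
    """Hypothetical helper: warn if `df` contains fully duplicated rows."""
    # `duplicated()` flags every repeat of a row after its first occurrence.
    n_duplicates = int(df.duplicated().sum())
    if n_duplicates > 0:
        warnings.warn(f"{n_duplicates} duplicate row(s) found in {name}.")
    return n_duplicates


# Toy events frame with one repeated row (illustrative data only).
events = pd.DataFrame(
    {
        "patientid": [1, 1, 2],
        "date": pd.to_datetime(["2021-01-01", "2021-01-01", "2021-03-05"]),
        "event_name": ["lab_hb", "lab_hb", "dx_flu"],
        "event_value": ["12.5", "12.5", "occurred"],
    }
)

# Capture the warning so the example can inspect it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    n_found = warn_on_duplicates(events, "events")
```

Running the same check on your own output frames before training is a cheap way to spot source rows that were exported twice.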
Source code in twinweaver/utils/meds_importer.py