MEDS Importer
twinweaver.utils.meds_importer
Functions
convert_meds_to_dtc
```python
convert_meds_to_dtc(
    df_codes,
    df_data,
    df_split,
    prefer_text_value_over_numeric=True,
    no_value_default="occurred",
    event_category_mapping=None,
    default_category="generic_event",
)
```
Converts raw medical data into a format compatible with the digital_twin_converter package.
This function takes three DataFrames (medical code descriptions, patient event data, and patient splits) and transforms them into three DataFrames: one for static (constant) patient data, one for descriptions of that static data, and one for time-series event data.
The function separates events with timestamps from those without. Events
without timestamps are treated as static patient characteristics and are
pivoted into a wide-format DataFrame (converted_constant). Events with
timestamps are formatted into a long-format DataFrame (converted_events).
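The split between static and time-stamped events can be sketched in plain Python. This is not the library's implementation: it uses lists of dicts in place of DataFrames and None in place of pandas NaN/NaT, and the helper name split_and_pivot is hypothetical. It also assumes that a static row's cell value is its text_value, falling back to numeric_value, which the documentation does not specify. A characteristic a patient lacks is simply an absent key here, where a real wide DataFrame would hold a missing value.

```python
from collections import defaultdict
from typing import Any


def split_and_pivot(rows: list[dict[str, Any]]) -> tuple[dict[Any, dict[str, Any]], list[dict[str, Any]]]:
    """Split MEDS-like rows into static (wide) and timestamped (long) parts.

    Hypothetical helper for illustration only. Assumes a static row's cell value
    is its text_value, falling back to numeric_value.
    """
    static_wide: dict[Any, dict[str, Any]] = defaultdict(dict)
    timed: list[dict[str, Any]] = []
    for row in rows:
        if row["time"] is None:
            # No timestamp: a static characteristic, stored as one "column" per code.
            value = row["text_value"] if row["text_value"] is not None else row["numeric_value"]
            static_wide[row["subject_id"]][row["code"]] = value
        else:
            # Timestamp present: keep the row as a long-format event.
            timed.append(row)
    return dict(static_wide), timed


# Toy rows shaped like df_data (subject_id, code, time, text_value, numeric_value).
rows = [
    {"subject_id": 1, "code": "SEX", "time": None, "text_value": "F", "numeric_value": None},
    {"subject_id": 1, "code": "HEIGHT_CM", "time": None, "text_value": None, "numeric_value": 165.0},
    {"subject_id": 1, "code": "LAB//HGB", "time": "2020-01-05", "text_value": None, "numeric_value": 12.3},
    {"subject_id": 2, "code": "SEX", "time": None, "text_value": "M", "numeric_value": None},
]
static_wide, timed = split_and_pivot(rows)
print(static_wide)  # {1: {'SEX': 'F', 'HEIGHT_CM': 165.0}, 2: {'SEX': 'M'}}
print(len(timed))   # 1
```

Each patient ends up with one entry of static features, mirroring the one-row-per-patient shape of converted_constant, while the single timestamped row stays in long form.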
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| df_codes | DataFrame | DataFrame containing the mapping from medical codes to their descriptions. Must contain 'code' and 'description' columns. | required |
| df_data | DataFrame | The primary DataFrame with event data for each patient. Must include 'subject_id', 'code', 'time', 'text_value', and 'numeric_value' columns. | required |
| df_split | DataFrame | DataFrame containing static patient information, such as train/test split assignments. Must include a 'subject_id' column. | required |
| prefer_text_value_over_numeric | bool | If True, when an event has both a text and a numeric value, the text value is used. If False, the numeric value is prioritized. Defaults to True. | True |
| no_value_default | str | The default value to assign to an event's 'event_value' if it has a timestamp but no associated text or numeric value. Defaults to "occurred". | 'occurred' |
| event_category_mapping | dict | A dictionary to map event codes ('event_name') to event categories. If None, no specific category mapping is applied. Defaults to None. | None |
| default_category | str | The category to assign to events that are not found in the | 'generic_event' |
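The value and category rules in the parameter table can be sketched as two small functions. The names resolve_event_value and resolve_category are hypothetical. Two behaviors are assumptions rather than documented facts: that a lone text or numeric value is used regardless of prefer_text_value_over_numeric, and that event_category_mapping=None sends every event to default_category.

```python
from typing import Any, Optional


def resolve_event_value(
    text_value: Optional[str],
    numeric_value: Optional[float],
    prefer_text_value_over_numeric: bool = True,
    no_value_default: str = "occurred",
) -> Any:
    """Pick an event_value following the documented precedence (sketch)."""
    has_text = text_value is not None
    has_num = numeric_value is not None
    if has_text and has_num:
        # Both present: the flag decides which one wins.
        return text_value if prefer_text_value_over_numeric else numeric_value
    # Assumption: a single available value is used as-is.
    if has_text:
        return text_value
    if has_num:
        return numeric_value
    # Timestamped event with neither value: fall back to the placeholder.
    return no_value_default


def resolve_category(
    event_name: str,
    event_category_mapping: Optional[dict[str, str]] = None,
    default_category: str = "generic_event",
) -> str:
    """Look up an event's category, falling back to default_category (sketch)."""
    # Assumption: no mapping means every event gets the default category.
    if event_category_mapping is None:
        return default_category
    return event_category_mapping.get(event_name, default_category)


print(resolve_event_value("positive", 1.0))                                        # positive
print(resolve_event_value("positive", 1.0, prefer_text_value_over_numeric=False))  # 1.0
print(resolve_event_value(None, None))                                             # occurred
print(resolve_category("LAB//HGB", {"LAB//HGB": "lab"}))                           # lab
print(resolve_category("DX//I10", {"LAB//HGB": "lab"}))                            # generic_event
```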
Returns:

| Name | Type | Description |
|---|---|---|
| converted_constant | DataFrame | A DataFrame with one row per patient, containing static features. This includes data from |
| converted_constant_description | DataFrame | A DataFrame providing human-readable descriptions for each column in the |
| converted_events | DataFrame | A long-format DataFrame containing all time-stamped events, structured for time-series analysis. Includes columns like 'patientid', 'date', 'event_name', 'event_value', and 'event_category'. |
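To make the long-format layout of converted_events concrete, the sketch below turns timestamped rows into records carrying the documented columns. The helper to_long_events is hypothetical, and the column correspondence subject_id → patientid, time → date, code → event_name is an assumption inferred from the column lists above (the parameter table does equate event codes with 'event_name'). For brevity it hard-codes the text-over-numeric preference.

```python
from typing import Any, Optional

# Columns documented for converted_events.
EVENT_COLUMNS = ["patientid", "date", "event_name", "event_value", "event_category"]


def to_long_events(
    timed_rows: list[dict[str, Any]],
    event_category_mapping: Optional[dict[str, str]] = None,
    default_category: str = "generic_event",
    no_value_default: str = "occurred",
) -> list[dict[str, Any]]:
    """Build converted_events-style records from timestamped MEDS-like rows (sketch).

    Assumed mapping, not confirmed by the docs:
    subject_id -> patientid, time -> date, code -> event_name.
    """
    mapping = event_category_mapping or {}
    events = []
    for row in timed_rows:
        # Text wins over numeric, as with prefer_text_value_over_numeric=True.
        if row["text_value"] is not None:
            value = row["text_value"]
        elif row["numeric_value"] is not None:
            value = row["numeric_value"]
        else:
            value = no_value_default
        # One record per event: this is what makes the output "long".
        events.append({
            "patientid": row["subject_id"],
            "date": row["time"],
            "event_name": row["code"],
            "event_value": value,
            "event_category": mapping.get(row["code"], default_category),
        })
    return events


timed = [
    {"subject_id": 1, "code": "LAB//HGB", "time": "2020-01-05", "text_value": None, "numeric_value": 12.3},
    {"subject_id": 1, "code": "PROC//BIOPSY", "time": "2020-02-01", "text_value": None, "numeric_value": None},
]
events = to_long_events(timed, event_category_mapping={"LAB//HGB": "lab"})
for e in events:
    print([e[c] for c in EVENT_COLUMNS])
# [1, '2020-01-05', 'LAB//HGB', 12.3, 'lab']
# [1, '2020-02-01', 'PROC//BIOPSY', 'occurred', 'generic_event']
```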
Warns:

| Type | Description |
|---|---|
| Warning | Prints a warning to the console if duplicate rows are detected in the final |
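A duplicate check like the one described under Warns could look as follows. With pandas, DataFrame.duplicated() (default keep="first") is the usual tool; this stdlib sketch counts rows the same way, flagging every repeat after a row's first occurrence. The function name count_duplicate_rows and the message text are hypothetical and need not match what the library prints.

```python
from typing import Any


def count_duplicate_rows(rows: list[dict[str, Any]], label: str = "output") -> int:
    """Count exact duplicate rows and print a console warning if any exist (sketch)."""
    seen: set[tuple] = set()
    duplicates = 0
    for row in rows:
        # Sort by column name so rows with equal content but different key order match.
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    if duplicates:
        # Hypothetical wording; the library's actual message may differ.
        print(f"Warning: {duplicates} duplicate row(s) found in {label}.")
    return duplicates


rows = [
    {"patientid": 1, "date": "2020-01-05", "event_name": "LAB//HGB", "event_value": 12.3},
    {"patientid": 1, "date": "2020-01-05", "event_name": "LAB//HGB", "event_value": 12.3},
    {"patientid": 2, "date": "2020-03-10", "event_name": "LAB//HGB", "event_value": 11.8},
]
n = count_duplicate_rows(rows, label="converted_events")
# prints: Warning: 1 duplicate row(s) found in converted_events.
```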