⚕️ Interpretable Clinical Decision Rules ⚕️
Validating and deriving clinical-decision rules. Work-in-progress.
This is a collaborative repository intended to validate and derive clinical-decision rules. We use a unified pipeline across a variety of contributed datasets to vet previous modeling practices for clinical decision rules. Additionally, we hope to externally validate the rules under study here with data from UCSF.
Rule derivation datasets
Dataset | Task | Size | References | Processed |
---|---|---|---|---|
iai_pecarn | Predict intra-abdominal injury requiring acute intervention before CT | 12,044 patients, 203 with IAI-I | 📄, 🔗 | ✅ |
tbi_pecarn | Predict traumatic brain injuries before CT | 42,412 patients, 376 with ciTBI | 📄, 🔗 | ❌ |
csi_pecarn | Predict cervical spine injury in children | 3,314 patients, 540 with CSI | 📄, 🔗 | ❌ |
tig_pecarn | Predict bacterial/non-bacterial infections in febrile infants from RNA transcriptional biosignatures | 279 patients, ? with infection | 🔗 | ❌ |
exxagerate | Predict 30-day mortality for acute exacerbations of chronic obstructive pulmonary disease (AECOPD) | 1,696 patients, 17 mortalities | 📄, 🔗 | ❌ |
heart_disease_uci | Predict heart disease presence from basic attributes / screening | 920 patients, 509 with heart disease | 📄, 🔗 | ❌ |
Research paper 📄, Data download link 🔗
Datasets are all tabular (or at least have interpretable input features), reasonably large (e.g. have at least 100 positive and negative cases), and have a binary outcome. For PECARN datasets, please read and agree to the research data use agreement on the PECARN website.
Possible data sources: PECARN datasets | Kaggle datasets | MDCalc | UCI | OpenML | MIMIC | UCSF De-ID. We may later expand to other high-stakes datasets (e.g. COMPAS, loan risk).
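As a quick sanity check of the dataset criteria above (binary outcome, at least 100 positive and 100 negative cases), a candidate dataset could be screened with a small helper like the sketch below. The column name and threshold logic here are illustrative only, not part of the repo's API:

```python
import pandas as pd

def meets_criteria(df: pd.DataFrame, outcome_col: str) -> bool:
    """Check the repo's dataset criteria: a binary (0/1) outcome
    with at least 100 positive and 100 negative cases."""
    counts = df[outcome_col].value_counts()
    is_binary = set(df[outcome_col].unique()) <= {0, 1}
    return bool(is_binary and counts.get(0, 0) >= 100 and counts.get(1, 0) >= 100)

# toy example: 150 negatives, 120 positives -> meets the criteria
df = pd.DataFrame({'outcome': [0] * 150 + [1] * 120})
print(meets_criteria(df, 'outcome'))  # -> True
```

A dataset failing this check (e.g. only 17 positive cases, as in exxagerate) may still be usable but will make validation much noisier.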
Contributing checklist
To contribute a new project (e.g. a new dataset + modeling), create a pull request following the steps below. The easiest way to do this is to copy-paste an existing project (e.g. iai_pecarn) into a new folder and then edit that one.
Helpful docs: Collaboration details | Lab writeup | Slides
- [ ] Repo set up
  - [ ] Create a fork of this repo (see tutorial on forking/merging here)
  - [ ] Install the repo as shown below
  - [ ] Select a dataset - once you've selected, open an issue in this repo with the name of the dataset + a brief description so others don't work on the same dataset
  - [ ] Assign a `project_name` to the new project (e.g. `iai_pecarn`)
- [ ] Data preprocessing
  - [ ] Download the raw data into `data/{project_name}/raw`
    - Don't commit any very large files
  - [ ] Copy the template files from `rulevetting/projects/iai_pecarn` to a new folder `rulevetting/projects/{project_name}`
  - [ ] Rewrite the functions in `dataset.py` for processing the new dataset (e.g. see the dataset for iai_pecarn)
    - [ ] Document any judgement calls you aren't sure about using the `dataset.get_judgement_calls_dictionary` function
    - See the template file for documentation of each function, or the API documentation
  - Notebooks / helper functions are optional; all files should be within `rulevetting/projects/{project_name}`
- [ ] Data description
  - [ ] Describe each feature in the processed data in a file named `data_dictionary.md`
  - [ ] Summarize the data and the prediction task in a file named `readme.md`. This should include basic details of data collection (who, how, when, where), why the task is important, and how a clinical decision rule may be used in this context. It should also include your names/affiliations.
- [ ] Modeling
  - [ ] Baseline model - implement `baseline.py` for predicting given a baseline rule (e.g. from the existing paper)
    - Should override the model template in a class named `Baseline`
  - [ ] New model - implement `model_best.py` for making predictions using your newly derived best model
    - Should also override the model template in a class named `Model`
- [ ] Lab writeup (see instructions)
  - [ ] Save the writeup into `writeup.pdf` and include source files
    - Should contain details on exploratory analysis, modeling, validation, comparisons with baseline, etc.
- [ ] Submitting
  - [ ] Ensure that all tests pass by running `pytest --project {project_name}` from the repo directory
  - [ ] Open a pull request and it will be reviewed / merged
- [ ] Reviewing submissions
  - [ ] Each pull request will be reviewed by others before being merged
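For the modeling step, a minimal `baseline.py` might look like the sketch below. This is illustrative only: the model template's exact interface isn't reproduced here, so the `predict` signature, the DataFrame input, and the feature names are all assumptions. Many published clinical decision rules are OR-rules (flag high-risk if any risk factor is present), which is what this toy `Baseline` implements:

```python
import numpy as np
import pandas as pd

class Baseline:
    """Hypothetical baseline OR-rule: a patient is high-risk if any
    of a fixed set of binary risk features is present. The real class
    should override the repo's model template instead."""

    def __init__(self, risk_features=('abd_trauma', 'gcs_below_14')):
        # feature names are made up for illustration; real ones come
        # from the processed dataset's data_dictionary.md
        self.risk_features = list(risk_features)

    def predict(self, df: pd.DataFrame) -> np.ndarray:
        # positive iff any risk feature fires
        return (df[self.risk_features].sum(axis=1) > 0).astype(int).values

# usage on a toy processed dataset
df = pd.DataFrame({'abd_trauma': [1, 0, 0], 'gcs_below_14': [0, 1, 0]})
print(Baseline().predict(df))  # -> [1 1 0]
```

Keeping the rule in one place like this makes it easy to compare against `model_best.py` under the same test harness.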
Installation
Note: requires Python 3.7 and pytest (for running the automated tests). It is best practice to create a venv or pipenv for this project.

```bash
python -m venv rule-env
source rule-env/bin/activate
```

Then, clone the repo and install the package and its dependencies.

```bash
git clone https://github.com/Yu-Group/rule-vetting
cd rule-vetting
pip install -e .
```

Now run the automatic tests to ensure everything works.

```bash
pytest --project iai_pecarn
```

To use with jupyter, you might have to add this venv as a jupyter kernel.

```bash
python -m ipykernel install --user --name=rule-env
```
Clinical Trial Datasets
Dataset | Task | Size | References | Processed |
---|---|---|---|---|
bronch_pecarn | Effectiveness of oral dexamethasone for acute bronchiolitis | 600 patients, 50% control | 📄, 🔗 | ❌ |
gastro_pecarn | Impact of Emergency Department Probiotic Treatment of Pediatric Gastroenteritis | 886 patients, 50% control | 📄, 🔗 | ❌ |
Research paper 📄, Data download link 🔗
Reference
Background reading
- Be familiar with the imodels package
- See the TRIPOD statement on medical reporting
- See the Veridical data science paper
Related packages
- imodels: rule-based modeling
- veridical-flow: stability-based analysis
- gplearn: symbolic regression/classification
- pygam: generalized additive models
- interpretml: boosting-based gam
Updates
- For updates, star the repo, see this related repo, or follow @csinva_
- Please make sure to give authors of original datasets appropriate credit!
- Contributing: pull requests very welcome!
Related open-source collaborations
- The imodels package maintains many of the rule-based models here
- Inspired by the BIG-bench effort.
- See also NL-Augmenter and NLI-Expansion
"""
.. include:: ../readme.md
"""
import os
from os.path import join as oj
MRULES_PATH = os.path.dirname(os.path.abspath(__file__))
REPO_PATH = os.path.dirname(MRULES_PATH)
DATA_PATH = oj(REPO_PATH, 'data')
PROJECTS_PATH = oj(MRULES_PATH, 'projects')
AUTOGLUON_CACHE_PATH = oj(DATA_PATH, 'autogluon_cache')
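Downstream code can build per-project paths from these constants. A standalone sketch (the constants are stubbed here so the snippet runs outside the package; in the repo you would import them from `rulevetting` instead):

```python
import os
from os.path import join as oj

# stand-ins for the package constants above, so this snippet is self-contained
REPO_PATH = os.getcwd()
DATA_PATH = oj(REPO_PATH, 'data')

# raw files for a project live under data/{project_name}/raw
project_name = 'iai_pecarn'
raw_dir = oj(DATA_PATH, project_name, 'raw')
print(raw_dir.endswith(oj('data', 'iai_pecarn', 'raw')))  # -> True
```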
Sub-modules
- rulevetting.api
- rulevetting.projects
- rulevetting.templates