Health Tech Databases
To participate fully in NYU Langone’s Health Tech Datathon, you need access to the Massachusetts Institute of Technology (MIT) Laboratory for Computational Physiology’s (LCP) databases and a functioning analysis environment. Below are the steps for accessing the data and a short primer on working with the Medical Information Mart for Intensive Care (MIMIC-III).
MIMIC-III contains data associated with 53,423 distinct hospital admissions to the critical care units of the Beth Israel Deaconess Medical Center from 2001 to 2012. The data are highly granular and include regular physiologic monitoring, all deidentified laboratory results, administered treatments ranging from infusions to ventilator settings, deidentified clinical notes, and more.
See more information about MIMIC-III and read our article “MIMIC-III, a freely accessible critical care database” in Scientific Data, which documents the MIMIC-III scope and structure.
Requesting Access to the MIMIC-III Database
To work with these data, you must first complete the CITI course “Data or Specimens Only Research.” Register “Massachusetts Institute of Technology Affiliates” as your affiliation. When you complete the course, you will receive a certificate; save it, because you will need to submit it later.
Next, create an account on PhysioNet.
After you have an account, you can submit your CITI certificate. Approval takes several business days. See more information on the process.
Setting Up the Database
The raw data may be downloaded in compressed, comma-separated value format directly from PhysioNet after your access has been approved.
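Before loading anything into a database, it can help to peek at a downloaded file. The following is a minimal sketch, using only the Python standard library, of reading a gzip-compressed CSV file. The file name DEMO_ADMISSIONS.csv.gz, its three columns, and the helper read_gzipped_csv are hypothetical and exist only for illustration; the sketch writes its own tiny synthetic file so that it runs without access to MIMIC-III.

```python
import csv
import gzip
import os
import tempfile


def read_gzipped_csv(path, limit=None):
    """Return rows of a gzip-compressed CSV file as a list of dicts.

    Hypothetical helper for illustration; `limit` caps the number of rows read.
    """
    rows = []
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        for i, row in enumerate(csv.DictReader(handle)):
            if limit is not None and i >= limit:
                break
            rows.append(row)
    return rows


# Write a tiny synthetic file (NOT real MIMIC-III data) so the sketch is runnable.
demo_path = os.path.join(tempfile.mkdtemp(), "DEMO_ADMISSIONS.csv.gz")
with gzip.open(demo_path, mode="wt", newline="", encoding="utf-8") as handle:
    writer = csv.writer(handle)
    writer.writerow(["SUBJECT_ID", "HADM_ID", "ADMISSION_TYPE"])
    writer.writerow([1, 100, "EMERGENCY"])
    writer.writerow([2, 101, "ELECTIVE"])

demo_rows = read_gzipped_csv(demo_path, limit=5)
print(demo_rows[0]["ADMISSION_TYPE"])
```

Note that the csv module returns every value as a string, so numeric columns need explicit conversion before analysis.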
The LCP recommends loading the data into a database engine, namely PostgreSQL. PostgreSQL has versions for Windows, Mac OS X, and Linux. It is freely available and open-source.
Other database engines may be used: the LCP provides setup scripts for PostgreSQL and for a number of other popular engines. These instructions assume PostgreSQL.
Following installation of the database engine, these scripts can be used to load the downloaded raw data into the database, and to build indexes for fast querying.
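To illustrate what loading data and building an index involve, here is a rough sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. It is not the LCP's scripts, which should be used for a real installation. The table layout, the index name, the helper load_and_index, and the rows are all hypothetical and synthetic.

```python
import sqlite3


def load_and_index(conn, rows):
    """Create a small table, bulk-insert rows, and index the lookup column.

    Hypothetical helper; sqlite3 stands in for PostgreSQL in this sketch.
    """
    conn.execute(
        "CREATE TABLE admissions ("
        "subject_id INTEGER, hadm_id INTEGER, admission_type TEXT)"
    )
    conn.executemany("INSERT INTO admissions VALUES (?, ?, ?)", rows)
    # An index on subject_id lets per-patient lookups avoid a full table scan.
    conn.execute("CREATE INDEX admissions_idx_subject ON admissions (subject_id)")
    conn.commit()


conn = sqlite3.connect(":memory:")
synthetic_rows = [
    (1, 100, "EMERGENCY"),
    (1, 102, "ELECTIVE"),
    (2, 101, "ELECTIVE"),
]
load_and_index(conn, synthetic_rows)

# List the indexes that now exist on the table.
index_names = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'admissions'"
    )
]

# Query all admissions for one (synthetic) patient.
hadm_ids = [
    hadm_id
    for (hadm_id,) in conn.execute(
        "SELECT hadm_id FROM admissions WHERE subject_id = ? ORDER BY hadm_id",
        (1,),
    )
]
print(index_names, hadm_ids)
```

The same two steps, bulk load followed by index creation, apply to PostgreSQL, although the loading commands and data types differ.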
Detailed instructions for installing MIMIC-III on PostgreSQL are available on the MIMIC website.
Working with MIMIC-III
There are various ways to interface with MIMIC-III. Two of the most popular are Jupyter notebooks with Python and RStudio with R.
RStudio may be installed for free. Jupyter and Python can be installed together using Anaconda, or Jupyter can be added to an existing Python installation using pip.
Both R and Python provide rich libraries for interacting with SQL databases, and after data have been extracted, both provide powerful platforms for analysis, visualization, and model building.
The general workflow is to define your question, identify the necessary variables, write SQL code to extract the data, and then clean, explore, visualize, and model with R or Python.
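The workflow above can be sketched end to end in a few lines of Python. The example question (median length of stay by admission type), the table layout, and all rows below are hypothetical and synthetic. The built-in sqlite3 module stands in for a PostgreSQL connection so that the sketch runs anywhere; against a real MIMIC-III installation, a PostgreSQL driver such as psycopg2 would take its place.

```python
import sqlite3
from collections import defaultdict
from datetime import datetime
from statistics import median

# Stand-in database with synthetic rows (NOT real MIMIC-III data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE admissions ("
    "hadm_id INTEGER, admission_type TEXT, admittime TEXT, dischtime TEXT)"
)
conn.executemany(
    "INSERT INTO admissions VALUES (?, ?, ?, ?)",
    [
        (100, "EMERGENCY", "2101-01-01 00:00:00", "2101-01-03 00:00:00"),
        (101, "EMERGENCY", "2101-02-01 00:00:00", "2101-02-05 00:00:00"),
        (102, "ELECTIVE", "2101-03-01 00:00:00", "2101-03-02 12:00:00"),
        (103, "ELECTIVE", "2101-03-10 00:00:00", None),  # missing discharge time
    ],
)

# Steps 1-3: question and variables are fixed; SQL extracts only what is needed.
query = """
    SELECT admission_type, admittime, dischtime
    FROM admissions
    WHERE dischtime IS NOT NULL
"""


def length_of_stay_days(admit, discharge):
    """Return the stay duration in days from two timestamp strings."""
    fmt = "%Y-%m-%d %H:%M:%S"
    delta = datetime.strptime(discharge, fmt) - datetime.strptime(admit, fmt)
    return delta.total_seconds() / 86400


# Step 4: clean (rows without a discharge time were filtered in SQL) and summarize.
stays = defaultdict(list)
for admission_type, admit, discharge in conn.execute(query):
    stays[admission_type].append(length_of_stay_days(admit, discharge))

median_stay = {kind: median(days) for kind, days in stays.items()}
print(median_stay)
```

Keeping the filtering in SQL and the summary statistics in Python mirrors the division of labor described above; visualization and modeling would follow from the extracted values.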
See R and Python notebooks containing projects from start to finish as examples.