
What Data Does deciphEHR Have?
Available Data
1. Whole Genome Sequencing (WGS)
- All participants have undergone genomic sequencing. Our current data are derived from imputed low-coverage WGS, which yields high-quality genotypes for single-nucleotide variants (SNVs) with an allele frequency of 0.1%-0.5% and higher, and for insertions or deletions (indels) with an allele frequency of 5%-10% and higher, resulting in high-quality calls for 111 million variants.
- We are currently generating high-coverage WGS and long-read sequencing data for a subset of the samples; these will become increasingly available over time.
2. Clinical Data
- Our dataset includes de-identified longitudinal clinical data. These data enable linking genotypes with the progression of diseases and the impact of various treatments over time.
- For clinical data not included in our de-identified dataset, researchers will need to apply for their own independent IRB protocol.
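As an illustration of the allele-frequency ranges quoted for the imputed low-coverage WGS data, the following is a minimal Python sketch of how a researcher might restrict an analysis to variants in the high-quality range. It is not part of the deciphEHR tooling: the `Variant` record, the field names, and the function names are hypothetical, the cutoffs take the conservative upper end of each quoted range (0.5% for SNVs, 10% for indels), and "allele frequency" is assumed here to mean minor allele frequency.

```python
from dataclasses import dataclass

# Assumed cutoffs: the conservative (upper) end of the ranges quoted above.
SNV_MIN_MAF = 0.005    # 0.5%
INDEL_MIN_MAF = 0.10   # 10%


@dataclass
class Variant:
    """Hypothetical variant record; not the actual deciphEHR schema."""
    chrom: str
    pos: int
    ref: str
    alt: str
    maf: float  # minor allele frequency as a fraction between 0 and 0.5


def is_snv(v: Variant) -> bool:
    # A single-base substitution; everything else is treated as an indel
    # in this sketch.
    return len(v.ref) == 1 and len(v.alt) == 1


def in_high_quality_range(v: Variant) -> bool:
    # Apply the SNV cutoff to SNVs and the indel cutoff to all other variants.
    cutoff = SNV_MIN_MAF if is_snv(v) else INDEL_MIN_MAF
    return v.maf >= cutoff


# Made-up example records, for illustration only.
variants = [
    Variant("chr1", 1000, "A", "G", 0.02),     # SNV above the SNV cutoff
    Variant("chr1", 2000, "C", "T", 0.0004),   # SNV below the SNV cutoff
    Variant("chr2", 3000, "AT", "A", 0.03),    # indel below the indel cutoff
    Variant("chr2", 4000, "G", "GCA", 0.25),   # indel above the indel cutoff
]

kept = [v for v in variants if in_high_quality_range(v)]
```

Choosing the upper end of each range is the stricter option; a study that needs rarer variants could lower the cutoffs toward 0.1% and 5%, at the cost of including genotypes closer to the edge of the stated quality range.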
Universal Consent
Our data are acquired under the Universal Consent protocol, an umbrella consent protocol and form initiated by the Center for Biospecimen Research & Development (CBRD) at NYU Langone Health. Clinical blood count test tubes remain in the lab for a few days in case they are needed clinically (e.g., to repeat a test) and are then routinely discarded. For consented participants, we collect the residual blood instead of discarding it.
Accessing the Data
To access the deciphEHR dataset, researchers must apply through our "Apply for Access" page. Only NYU Langone researchers and clinicians are eligible to apply.