phenotyping


Methods to identify gene-disease associations have primarily relied on clinical trials or observational cohorts and, more recently, on Electronic Medical Record (EMR)-linked DNA biobanks. At Vanderbilt, we have used an EMR-linked DNA biobank called BioVU to derive case and control populations, using data within the EMR to define clinical phenotypes. Genetic data from these EMR-linked association studies are redeposited into BioVU for future EMR-linked studies. This has opened the possibility of "reverse GWAS," or phenome-wide association studies (PheWAS).

We replicated known genetic associations for five diseases. We genotyped the first 10,000 samples accrued into BioVU (the Vanderbilt EMR-associated DNA biobank) for twenty-one loci that had been associated with five common diseases (reported odds ratios 1.14-2.36) in at least two previous studies. We developed automated phenotype identification algorithms that used natural language processing (NLP) techniques (to identify key findings, medication names, and family history), billing code queries, and structured data elements (such as laboratory results) to identify cases (n=70-698) and controls (n=808-3818).
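As a rough illustration of how billing codes, NLP-extracted medications, and laboratory results might be combined into a single phenotype rule, here is a minimal Python sketch. It is not the algorithm used in the study: the disease (type 2 diabetes), the code prefix, the medication list, the HbA1c threshold, and the names `PatientRecord` and `classify` are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PatientRecord:
    """Hypothetical, simplified view of one patient's EMR data."""
    billing_codes: set = field(default_factory=set)   # e.g. ICD-9 codes
    medications: set = field(default_factory=set)     # e.g. names extracted from notes by NLP
    labs: dict = field(default_factory=dict)          # most recent structured lab values


# Illustrative criteria only (assumed, not taken from the study).
CASE_CODE_PREFIXES = ("250",)                 # ICD-9 250.xx: diabetes mellitus
CASE_MEDICATIONS = {"metformin", "glipizide"}
HBA1C_THRESHOLD = 6.5                         # percent


def classify(record: PatientRecord) -> Optional[str]:
    """Return "case", "control", or None when the evidence is ambiguous."""
    has_code = any(c.startswith(CASE_CODE_PREFIXES) for c in record.billing_codes)
    has_med = bool(record.medications & CASE_MEDICATIONS)
    hba1c = record.labs.get("hba1c")
    has_lab = hba1c is not None and hba1c >= HBA1C_THRESHOLD

    evidence = sum([has_code, has_med, has_lab])
    if has_code and evidence >= 2:
        return "case"        # billing code plus at least one supporting source
    if evidence == 0:
        return "control"     # no sign of the disease in any source
    return None              # ambiguous: exclude from both groups
```

Excluding ambiguous records, rather than forcing them into one group, is a design choice in this sketch that trades sample size for cleaner case and control populations.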


About our team:

We are a team of investigators who seek to advance basic informatics methods and to apply advanced methods to "understand" unstructured, and sometimes inaccurate, biomedical text and electronic medical record data.

For basic science applications, we are primarily interested in natural language processing methods, terminology development, and the integration of disparate data sources (such as coded and free-text sources).
