PheWAS - phenome-wide association studies

Methods to identify gene-disease associations have primarily relied on clinical trials or observational cohorts and, more recently, on DNA biobanks linked to electronic medical records (EMRs).  At Vanderbilt, we have used an EMR-linked DNA biobank called BioVU to derive case and control populations, using data within the EMR to define clinical phenotypes.  Genetic data generated for these EMR-linked association studies are redeposited into BioVU for future studies.  This has opened the possibility of "reverse GWAS," or phenome-wide association studies (PheWAS): whereas a genome-wide association study (GWAS) tests many genetic variants against a single phenotype, a PheWAS tests a genetic variant against many phenotypes.

PheWAS using ICD9 codes
Our EMR-based PheWAS uses a custom-developed grouping of International Classification of Diseases, 9th revision (ICD9) codes.  These groupings loosely follow the 3-digit (category) and section groupings defined within the ICD9 code system itself, but depart from them where useful; for example, all hypertension codes (401-405) form a single grouping.  Each custom PheWAS code group also has an associated control group that excludes patients with related conditions (e.g., a patient with Graves' disease cannot be a control for thyroiditis).  
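To make the case/control logic concrete, here is a minimal Python sketch. It assumes a simplified, hypothetical representation in which each PheWAS code has numeric ICD9 ranges that define cases and ranges that exclude controls. The thyroid ranges below are illustrative rather than copied from the code translation file, and V and E codes are simply ignored.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

# An inclusive numeric ICD9 range, e.g. (240.0, 246.99).
Range = Tuple[float, float]


@dataclass
class PhewasCode:
    """Hypothetical, simplified PheWAS code definition (not the real file format)."""
    name: str
    case_ranges: List[Range]     # any ICD9 code in these ranges makes a patient a case
    exclude_ranges: List[Range]  # any ICD9 code in these ranges disqualifies a control


def icd9_value(code: str) -> Optional[float]:
    """Numeric value of an ICD9 code; None for V/E codes, which this sketch ignores."""
    try:
        return float(code)
    except ValueError:
        return None


def any_in_ranges(codes: Set[str], ranges: List[Range]) -> bool:
    """True if at least one numeric ICD9 code falls inside one of the ranges."""
    for code in codes:
        value = icd9_value(code)
        if value is not None and any(lo <= value <= hi for lo, hi in ranges):
            return True
    return False


def classify(patients: Dict[str, Set[str]], phewas_code: PhewasCode) -> Dict[str, str]:
    """Label each patient as 'case', 'control', or 'excluded' for one PheWAS code."""
    status = {}
    for pid, codes in patients.items():
        if any_in_ranges(codes, phewas_code.case_ranges):
            status[pid] = "case"
        elif any_in_ranges(codes, phewas_code.exclude_ranges):
            status[pid] = "excluded"  # related condition: neither case nor control
        else:
            status[pid] = "control"
    return status


# Illustrative only: thyroiditis (ICD9 245.x), with other thyroid disorders
# (240-246, which includes Graves' disease at 242.0x) excluded from controls.
thyroiditis = PhewasCode("thyroiditis", [(245.0, 245.99)], [(240.0, 246.99)])
patients = {
    "p1": {"245.2"},   # thyroiditis -> case
    "p2": {"242.00"},  # Graves' disease -> excluded
    "p3": {"401.9"},   # hypertension only -> control
    "p4": {"V70.0"},   # V code ignored -> control
}
print(classify(patients, thyroiditis))
# {'p1': 'case', 'p2': 'excluded', 'p3': 'control', 'p4': 'control'}
```

Checking the case ranges before the exclusion ranges matters in this sketch, because the case range sits inside the broader exclusion range.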
Our original PheWAS using ICD9 codes, run on records from BioVU, replicated previously known gene-disease associations for four of seven diseases (reference 1).  Replicated associations included those for multiple sclerosis, rheumatoid arthritis, Crohn's disease, and ischemic heart disease.  The original PheWAS used 744 clinical case groups.   

The files necessary to perform PheWAS are available below:

  • code translation file: This file groups similar ICD9 codes into "phewas codes". It also defines the control exclusion ranges ("phewas_exclude_range") for each phewas code.
  • Perl script: This script takes as input tab-delimited genotype files, a file containing all ICD9 codes for each individual, and a file with the race and gender of each individual. Its options are documented in the header of the script.
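As a hedged illustration of the per-code association step (not a reimplementation of the Perl script, whose statistical test is not specified here), the Python sketch below assumes genotypes have been reduced to a hypothetical carrier/non-carrier flag and that each patient has a status of "case", "control", or "excluded" for a given PheWAS code. It applies a Pearson chi-square test to the resulting 2x2 table and ignores the race and gender covariates the script accepts.

```python
import math
from typing import Dict, Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p-value) with 1 degree of freedom.
    """
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        return 0.0, 1.0  # an empty row or column carries no information
    chi2 = n * (a * d - b * c) ** 2 / margins
    # For 1 df, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def count_table(status: Dict[str, str], carrier: Dict[str, bool]) -> Tuple[int, int, int, int]:
    """Return (case carriers, case non-carriers, control carriers, control non-carriers).

    Patients marked 'excluded' or lacking a genotype are skipped.
    """
    a = b = c = d = 0
    for pid, label in status.items():
        if label == "excluded" or pid not in carrier:
            continue
        if label == "case":
            if carrier[pid]:
                a += 1
            else:
                b += 1
        else:  # control
            if carrier[pid]:
                c += 1
            else:
                d += 1
    return a, b, c, d


chi2, p = chi_square_2x2(30, 20, 20, 30)
print(f"chi2={chi2:.2f} p={p:.4f}")  # chi2=4.00 p=0.0455
```

Run across hundreds or thousands of PheWAS codes, p-values like this would typically be judged against a multiple-testing threshold such as a Bonferroni correction.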

UPDATE 5/2013
  • code_translation_updated.txt: This file contains the latest PheWAS code groupings (~1600 code groups), now arranged hierarchically.  A Boolean "rollup" value indicates whether a code can be rolled up into the parent code above it (e.g., "427.3" can be rolled up to "427").  Note: the currently available Perl script does not support rollup.  Please use the PheWAS R package instead, which supports the newest hierarchy and also provides graphing options.
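Rollup can be sketched in Python as propagating each case code up the hierarchy for as long as its rollup flag is true. This assumes, hypothetically, that a code's parent is obtained by dropping its last decimal digit ("427.3" to "427"), which matches the example above but may not hold for every code in the file; the code "427.31" and the flag values are invented for illustration.

```python
from typing import Dict, Optional, Set


def parent_code(code: str) -> Optional[str]:
    """Hypothetical parent rule: drop the last decimal digit ("427.31" -> "427.3" -> "427")."""
    if "." not in code:
        return None  # a 3-digit code has no parent in this sketch
    return code[:-1].rstrip(".")


def roll_up(case_codes: Set[str], rollup: Dict[str, bool]) -> Set[str]:
    """Return case_codes plus every ancestor reachable through codes whose rollup flag is True."""
    result = set(case_codes)
    pending = list(case_codes)
    while pending:
        code = pending.pop()
        if not rollup.get(code, False):
            continue  # this code does not roll up to its parent
        parent = parent_code(code)
        if parent is not None and parent not in result:
            result.add(parent)
            pending.append(parent)
    return result


rollup_flags = {"427.31": True, "427.3": True, "250.1": False}
print(sorted(roll_up({"427.31", "250.1"}, rollup_flags)))
# ['250.1', '427', '427.3', '427.31']
```

In this sketch, a patient who is a case for a child code is also counted as a case for each ancestor it rolls up to, while a code flagged False (here "250.1") stays where it is.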


PheWAS by Josh Denny, MD MS is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.

Key references:
  1. Denny JC, Ritchie MD, Basford M, Pulley J, Bastarache L, Brown-Gentry K, Wang D, Masys DR, Roden DM, Crawford DC. PheWAS: Demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics. 2010 Mar 24. PMID: 20335276
  2. Denny JC, Crawford DC, Ritchie MD, Bielinski SJ, Basford MA, Bradford Y, Chai HS, Bastarache L, Zuvich R, Peissig P, Carrell D, Ramirez AH, Pathak J, Wilke RA, Rasmussen L, Wang X, Pacheco JA, Kho AN, Hayes MG, Weston N, Matsumoto M, Kopp PA, Newton KM, Jarvik GP, Li R, Manolio TA, Kullo IJ, Chute CG, Chisholm RL, Larson EB, McCarty CA, Masys DR, Roden DM, de Andrade M. Variants near FOXE1 are associated with hypothyroidism and other thyroid conditions: using electronic medical records for genome- and phenome-wide studies. Am J Hum Genet. 2011 Oct 7;89(4):529-42.
  3. Ritchie MD, Denny JC, Zuvich RL, Crawford DC, Schildcrout JS, Bastarache L, Ramirez AH, Mosley JD, Pulley JM, Basford MA, Bradford Y, Rasmussen LV, Pathak J, Chute CG, Kullo IJ, McCarty CA, Chisholm RL, Kho AN, Carlson CS, Larson EB, Jarvik GP, Sotoodehnia N; Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) QRS Group, Manolio TA, Li R, Masys DR, Haines JL, Roden DM. Genome- and phenome-wide analyses of cardiac conduction identifies markers of arrhythmia risk. Circulation. 2013 Apr 2;127(13):1377-85.
