PheWAS - phenome-wide association studies

Methods to identify gene-disease associations primarily rely on clinical trials or observational cohorts and, more recently, on Electronic Medical Record (EMR)-linked DNA biobanks.  At Vanderbilt, we have used an EMR-linked DNA biobank called BioVU to derive case and control populations, using data within the EMR to define clinical phenotypes.  Genetic data generated for these EMR-linked association studies are redeposited into BioVU for future EMR-linked studies.  This has opened the possibility of "reverse GWAS," or "phenome-wide association studies" (PheWAS): instead of testing many genetic variants against one disease, a PheWAS tests a given variant against many phenotypes.

PheWAS using ICD9 codes
Our EMR-based PheWAS uses a custom-developed grouping of International Classification of Disease, 9th edition (ICD9) codes.  These groupings loosely follow the 3-digit (category) and section groupings defined within the ICD9 code system itself, but vary to include, for example, all hypertension codes (401-405) as one grouping.  Each custom PheWAS code group also has an associated control group that excludes patients with other related conditions (e.g., a patient with Graves disease cannot be a control for thyroiditis).
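The case/control logic described above can be sketched in a few lines of Python. This is only an illustration, not the code used in our scripts: the group names, the `GROUPS` table, and the helper functions are hypothetical, and the thyroid exclusion range (240-246) is an assumed example rather than a value taken from the code translation file.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical groupings for illustration only. The hypertension range
# (401-405) comes from the text; the thyroid exclusion range is an assumption.
GROUPS: Dict[str, Dict[str, Tuple[float, float]]] = {
    "hypertension": {"case": (401.0, 405.99), "exclude": (401.0, 405.99)},
    "thyroiditis": {"case": (245.0, 245.99), "exclude": (240.0, 246.99)},
}


def in_range(code: str, low: float, high: float) -> bool:
    """Return True if a numeric ICD9 code falls within [low, high]."""
    try:
        value = float(code)
    except ValueError:
        # V and E codes (e.g. "V10.1") are not numeric; treat as no match.
        return False
    return low <= value <= high


def classify(patient_codes: List[str], group: str) -> Optional[str]:
    """Label a patient as 'case', 'control', or None (excluded) for a group."""
    spec = GROUPS[group]
    if any(in_range(c, *spec["case"]) for c in patient_codes):
        return "case"
    if any(in_range(c, *spec["exclude"]) for c in patient_codes):
        # Related condition present: not a case, but not a valid control.
        return None
    return "control"
```

Under these assumed ranges, a patient whose only code is 242.0 (Graves disease) is neither a case nor a control for thyroiditis, while a patient with only hypertension codes is a thyroiditis control.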
Our original PheWAS in 2010, which used ICD9 codes from records in BioVU, the Vanderbilt DNA biobank, replicated previously known gene-disease associations for 4 of 7 diseases (see publication).  Replicated associations included multiple sclerosis, rheumatoid arthritis, Crohn's disease, and ischemic heart disease.  The original PheWAS had 744 clinical case groups.

A 2013 study using a revised model, with 1645 phenotypes arranged hierarchically, analyzed 3144 SNPs, replicated 210/751 associations (including 66% of those with adequate sample size), and noted 63 new, potentially pleiotropic associations.  An online catalog of these results is available.

The files necessary to perform PheWAS are available below:

  • code translation file (original 2010 PheWAS): This file groups ICD9 codes into "phewas codes" of like ICD9 codes. It also defines control ranges ("phewas_exclude_range") for each "phewas code".
  • A Perl script that takes as its input tab-delimited genotype files, a file containing all ICD9 codes for each individual, and a file with race and gender for each individual. Its options are documented in the header of the script.
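
For readers unfamiliar with what a phenome-wide scan computes, the following Python sketch shows the general idea: one variant is tested against every phenotype in turn. It does not reproduce the Perl script, whose statistical model and options are not described on this page. The function names, the carrier/non-carrier coding of genotypes, and the unadjusted 2x2 chi-square test are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Optional, Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 table.

    a = carrier cases, b = carrier controls,
    c = non-carrier cases, d = non-carrier controls.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # an empty row or column: no test possible
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, the upper-tail p-value is erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def phewas_scan(
    carrier: Dict[str, bool],
    status: Dict[str, Dict[str, Optional[bool]]],
) -> List[Tuple[str, float, float]]:
    """Test one variant against every phenotype; return rows sorted by p-value.

    carrier: person id -> True if the person carries the variant (assumed coding).
    status: phenotype -> person id -> True (case), False (control), None (excluded).
    """
    results = []
    for phenotype, labels in status.items():
        a = b = c = d = 0
        for person, is_case in labels.items():
            if is_case is None or person not in carrier:
                continue  # skip excluded or ungenotyped individuals
            if carrier[person] and is_case:
                a += 1
            elif carrier[person]:
                b += 1
            elif is_case:
                c += 1
            else:
                d += 1
        chi2, p = chi_square_2x2(a, b, c, d)
        results.append((phenotype, chi2, p))
    return sorted(results, key=lambda row: row[2])
```

Because a scan of this kind runs many tests, the resulting p-values need a multiple-testing correction before any association is interpreted.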

UPDATE 5/2013
  • code_translation_updated.txt: This file contains the latest PheWAS code groupings (~1600 code groups), now arranged hierarchically.  A Boolean value "rollup" defines whether the code can be rolled up to the parent code above it (e.g., "427.3" can be rolled up to "427").  Note: Rollup functionality is not supported in the Perl script currently available.  Please use the PheWAS R package, which supports the newest hierarchy and also provides graphing options.
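
A minimal Python sketch of the rollup idea follows. It assumes that a code's parent is obtained by dropping its last digit after the decimal point, which matches the "427.3" to "427" example but is otherwise an assumption about the hierarchy. The function names and the `rollup` dictionary are hypothetical stand-ins for the flag stored in the file.

```python
from typing import Dict, Optional, Set


def parent_of(code: str) -> Optional[str]:
    """Return the parent code, or None for a top-level code.

    Assumption: the parent is formed by dropping the last decimal digit,
    e.g. "427.31" -> "427.3" -> "427".
    """
    if "." not in code:
        return None
    head, tail = code.split(".", 1)
    return head if len(tail) <= 1 else f"{head}.{tail[:-1]}"


def expand_with_rollup(codes: Set[str], rollup: Dict[str, bool]) -> Set[str]:
    """Add every ancestor reachable through codes flagged rollup=True."""
    expanded = set(codes)
    for code in codes:
        current = code
        # Climb only while the current code is allowed to roll up.
        while rollup.get(current, False):
            parent = parent_of(current)
            if parent is None:
                break
            expanded.add(parent)
            current = parent
    return expanded
```

With this sketch, a patient assigned "427.3" also counts toward "427" when the rollup flag for "427.3" is true, and counts only toward "427.3" when it is false.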

Link to R PheWAS Package
 - this is currently the preferred method to run PheWAS, since it allows for adjustment and supports the hierarchical model of the PheWAS codes.


Creative Commons License
PheWAS by Josh Denny, MD MS is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.

Key references:
  1. Denny JC, Ritchie MD, Basford M, Pulley J, Bastarache L, Brown-Gentry K, Wang D, Masys DR, Roden DM, Crawford DC. PheWAS: Demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics. 2010 Mar 24. PMID: 20335276
  2. Denny JC, Crawford DC, Ritchie MD, Bielinski SJ, Basford MA, Bradford Y, Chai HS, Bastarache L, Zuvich R, Peissig P, Carrell D, Ramirez AH, Pathak J, Wilke RA, Rasmussen L, Wang X, Pacheco JA, Kho AN, Hayes MG, Weston N, Matsumoto M, Kopp PA, Newton KM, Jarvik GP, Li R, Manolio TA, Kullo IJ, Chute CG, Chisholm RL, Larson EB, McCarty CA, Masys DR, Roden DM, de Andrade M. Variants near FOXE1 are associated with hypothyroidism and other thyroid conditions: using electronic medical records for genome- and phenome-wide studies. Am J Hum Genet. 2011 Oct 7;89(4):529-42.
  3. Ritchie MD, Denny JC, Zuvich RL, Crawford DC, Schildcrout JS, Bastarache L, Ramirez AH, Mosley JD, Pulley JM, Basford MA, Bradford Y, Rasmussen LV, Pathak J, Chute CG, Kullo IJ, McCarty CA, Chisholm RL, Kho AN, Carlson CS, Larson EB, Jarvik GP, Sotoodehnia N; Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) QRS Group, Manolio TA, Li R, Masys DR, Haines JL, Roden DM. Genome- and phenome-wide analyses of cardiac conduction identifies markers of arrhythmia risk. Circulation. 2013 Apr 2;127(13):1377-85.
  4. Denny JC, Bastarache L, Ritchie MD, Carroll RJ, Zink R, Mosley JD, Field JR, Pulley JM, Ramirez AH, Bowton E, Basford MA, Carrell DS, Peissig PL, Kho AN, Pacheco JA, Rasmussen LV, Crosslin DR, Crane PK, Pathak J, Bielinski SJ, Pendergrass SA, Xu H, Hindorff LA, Li R, Manolio TA, Chute CG, Chisholm RL, Larson EB, Jarvik GP, Brilliant MH, McCarty CA, Kullo IJ, Haines JL, Crawford DC, Masys DR, Roden DM. Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data. Nat Biotechnol. 2013 Dec;31(12):1102-10.
