PheWAS - phenome-wide association studies


Methods to identify gene-disease associations primarily rely on clinical trials or observational cohorts and, more recently, on Electronic Medical Record (EMR)-linked DNA biobanks.  At Vanderbilt, we have used an EMR-linked DNA biobank called BioVU to derive case and control populations, using data within the EMR to define clinical phenotypes.  Genetic data for these EMR-linked association studies are redeposited into BioVU for future EMR-linked studies.  This has opened the possibility of "reverse GWAS" or "phenome-wide association studies" (PheWAS), in which a genetic variant is tested for association across many phenotypes.


PheWAS using ICD9 codes
Our EMR-based PheWAS uses a custom-developed grouping of International Classification of Disease, 9th edition (ICD9) codes.  These groupings loosely follow the 3-digit (category) and section groupings defined within the ICD9 code system itself, but vary to include, for example, all hypertension codes (401-405) as one grouping.  Each custom PheWAS code group also has an associated control group that excludes patients with other related conditions (e.g., a patient with Graves disease cannot be a control for thyroiditis).
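
The case/control logic described above can be sketched roughly as follows. This is a minimal illustration in Python, not the logic of phewas.pl or the R package: the group names, the numeric ranges, and the helpers `in_range` and `classify` are all assumptions made for the example, and real analyses may apply further criteria that are not shown here.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical, simplified groupings for illustration only. These are NOT
# the contents of the real code translation file.
GROUPS = {
    # All hypertension codes (401-405) treated as one group.
    "hypertension": {"case_range": (401.0, 405.99), "exclude_range": (401.0, 405.99)},
    # Thyroiditis cases, with a wider (assumed) exclusion range covering other
    # thyroid disorders such as Graves disease (ICD9 242.0).
    "thyroiditis": {"case_range": (245.0, 245.99), "exclude_range": (240.0, 246.99)},
}


def in_range(icd9: str, bounds: Tuple[float, float]) -> bool:
    """Return True if a numeric ICD9 code falls within inclusive bounds."""
    try:
        value = float(icd9)
    except ValueError:
        # Non-numeric codes (for example V codes) are ignored in this sketch.
        return False
    low, high = bounds
    return low <= value <= high


def classify(patient_codes: Iterable[str], group: str) -> Optional[str]:
    """Return 'case', 'control', or None when the patient is excluded."""
    spec = GROUPS[group]
    codes = list(patient_codes)
    if any(in_range(c, spec["case_range"]) for c in codes):
        return "case"
    if any(in_range(c, spec["exclude_range"]) for c in codes):
        return None  # has a related condition, so cannot serve as a control
    return "control"
```

Under these assumed ranges, a patient whose only code is 242.0 (Graves disease) is neither a case nor a control for thyroiditis, while a patient with only hypertension codes remains an eligible thyroiditis control.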
 
Our original PheWAS in 2010 using ICD9 codes replicated previously known gene-disease associations for 4/7 diseases (see publication) using records from BioVU, the Vanderbilt DNA biobank.  Replicated associations included multiple sclerosis, rheumatoid arthritis, Crohn's disease, and ischemic heart disease. The original PheWAS had 744 clinical case groups.   

A 2013 study using a revised model with 1645 hierarchically arranged phenotypes analyzed 3144 SNPs, replicated 210 of 751 associations (including 66% of those with adequate sample size), and noted 63 new, potentially pleiotropic associations.  See http://phewascatalog.org for an online catalog of these results.

The files necessary to perform PheWAS are available below:

  • code translation file (original 2010 PheWAS): This file groups ICD9 codes into "phewas codes" of like ICD9 codes. It also defines control ranges ("phewas_exclude_range") for each "phewas code".
  • phewas.pl: A Perl script that takes as its input tab-delimited genotype files, a file containing all ICD9 codes for each individual, and a file with race and gender for each individual. Its options are documented in the header of the script.

UPDATE 5/2013
  • code_translation_updated.txt: This file contains the latest PheWAS code groupings (~1600 code groups), now arranged hierarchically.  A Boolean value "rollup" defines whether the code can be rolled up to the parent code above it (e.g., "427.3" can be rolled up to "427").  Note: Rollup functionality is not supported in the Perl script currently available.  Please use the PheWAS R package, which supports the newest hierarchy and also provides graphing options.
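
The rollup behavior can be illustrated with a short Python sketch. This is an assumption-laden example rather than the R package's implementation: the helpers `parent` and `roll_up` are hypothetical, the parent of a code is assumed to be obtained by dropping its last digit, and the flags are passed in as a plain dictionary instead of being read from the translation file.

```python
from typing import Dict, Iterable, Optional, Set


def parent(code: str) -> Optional[str]:
    """Return the assumed parent of a code, e.g. '427.3' -> '427'."""
    if "." not in code:
        return None  # already a top-level code
    trimmed = code[:-1]         # drop the last digit
    return trimmed.rstrip(".")  # '427.' becomes '427'


def roll_up(codes: Iterable[str], rollup_flags: Dict[str, bool]) -> Set[str]:
    """Add parent codes for every code whose rollup flag is true.

    Rolling up continues through the hierarchy only while each code along
    the way also has its flag set (an assumption made for this sketch).
    """
    expanded = set(codes)
    for code in list(expanded):
        current = code
        while rollup_flags.get(current, False):
            up = parent(current)
            if up is None:
                break
            expanded.add(up)
            current = up
    return expanded
```

With the flag for "427.3" set to true, a patient assigned "427.3" would also count toward "427"; with the flag false, only "427.3" is kept.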

Link to R PheWAS Package
 - this is currently the preferred method to run PheWAS, since it allows for covariate adjustment and supports the hierarchical model of the PheWAS codes.


 

Creative Commons License
PheWAS by Josh Denny, MD MS is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.


Key references:
  1. Denny JC, Ritchie MD, Basford M, Pulley J, Bastarache L, Brown-Gentry K, Wang D, Masys DR, Roden DM, Crawford DC. PheWAS: Demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics. 2010 Mar 24. PMID: 20335276
  2. Denny JC, Bastarache L, Ritchie MD, Carroll RJ, Zink R, Mosley JD, Field JR, Pulley JM, Ramirez AH, Bowton E, Basford MA, Carrell DS, Peissig PL, Kho AN, Pacheco JA, Rasmussen LV, Crosslin DR, Crane PK, Pathak J, Bielinski SJ, Pendergrass SA, Xu H, Hindorff LA, Li R, Manolio TA, Chute CG, Chisholm RL, Larson EB, Jarvik GP, Brilliant MH, McCarty CA, Kullo IJ, Haines JL, Crawford DC, Masys DR, Roden DM. Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data. Nat Biotechnol. 2013 Dec;31(12):1102-10.
  3. Carroll RJ, Bastarache L, Denny JC. R PheWAS: data analysis and plotting tools for phenome-wide association studies in the R environment. Bioinformatics. 2014 Aug 15;30(16):2375-6.


Selected other publications using PheWAS

  1. Denny JC, Crawford DC, Ritchie MD, Bielinski SJ, Basford MA, Bradford Y, Chai HS, Bastarache L, Zuvich R, Peissig P, Carrell D, Ramirez AH, Pathak J, Wilke RA, Rasmussen L, Wang X, Pacheco JA, Kho AN, Hayes MG, Weston N, Matsumoto M, Kopp PA, Newton KM, Jarvik GP, Li R, Manolio TA, Kullo IJ, Chute CG, Chisholm RL, Larson EB, McCarty CA, Masys DR, Roden DM, de Andrade M. Variants near FOXE1 are associated with hypothyroidism and other thyroid conditions: using electronic medical records for genome- and phenome-wide studies. Am J Hum Genet. 2011 Oct 7;89(4):529-42.
  2. Ritchie MD, Denny JC, Zuvich RL, Crawford DC, Schildcrout JS, Bastarache L, Ramirez AH, Mosley JD, Pulley JM, Basford MA, Bradford Y, Rasmussen LV, Pathak J, Chute CG, Kullo IJ, McCarty CA, Chisholm RL, Kho AN, Carlson CS, Larson EB, Jarvik GP, Sotoodehnia N; Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) QRS Group, Manolio TA, Li R, Masys DR, Haines JL, Roden DM. Genome- and phenome-wide analyses of cardiac conduction identifies markers of arrhythmia risk. Circulation. 2013 Apr 2;127(13):1377-85
  3. Cronin RM, Field JR, Bradford Y, Shaffer CM, Carroll RJ, Mosley JD, Bastarache L, Edwards TL, Hebbring SJ, Lin S, Hindorff LA, Crane PK, Pendergrass SA, Ritchie MD, Crawford DC, Pathak J, Bielinski SJ, Carrell DS, Crosslin DR, Ledbetter DH, Carey DJ, Tromp G, Williams MS, Larson EB, Jarvik GP, Peissig PL, Brilliant MH, McCarty CA, Chute CG, Kullo IJ, Bottinger E, Chisholm R, Smith ME, Roden DM, Denny JC. Phenome-wide association studies demonstrating pleiotropy of genetic variants within FTO with and without adjustment for body mass index. Front Genet. 2014 Aug 5;5:250.
  4. Shameer K, Denny JC, Ding K, Jouni H, Crosslin DR, de Andrade M, Chute CG, Peissig P, Pacheco JA, Li R, Bastarache L, Kho AN, Ritchie MD, Masys DR, Chisholm RL, Larson EB, McCarty CA, Roden DM, Jarvik GP, Kullo IJ. A genome- and phenome-wide association study to identify genetic variants influencing platelet count and volume and their pleiotropic effects. Hum Genet. 2014 Jan;133(1):95-109.
  5. Liao KP, Kurreeman F, Li G, Duclos G, Murphy S, Guzman R, Cai T, Gupta N, Gainer V, Schur P, Cui J, Denny JC, Szolovits P, Churchill S, Kohane I, Karlson EW, Plenge RM. Associations of autoantibodies, autoimmune risk alleles, and clinical diagnoses from the electronic medical records in rheumatoid arthritis cases and non-rheumatoid arthritis controls. Arthritis Rheum. 2013 Mar;65(3):571-81

