Next-generation GWAS for complex disease research

Genome-wide association studies (GWAS) use tag SNPs distributed across the genome to test for association with disease. They follow a hypothesis-free approach, interrogating the majority of common SNPs in the human genome, and are designed to identify genetic variants that contribute to complex diseases. In the past five years, more than 100 complex diseases and traits have been studied by GWAS, and numerous susceptibility genes and loci have been identified.

However, large-scale GWAS based on SNP genotyping have identified only a small fraction of the heritable variation of these diseases[1]. One explanation is that many rare variants (minor allele frequency, MAF < 5%), which are not included on common genotyping platforms, contribute substantially to the genetic variation of these diseases[2]. Recently, exome sequencing of 200 individuals from Denmark[3] uncovered more deleterious rare variants than expected. This further supports the view that much of the heritable variation affecting fitness is caused by low-frequency mutations, which are often overlooked by genotyping-based studies but can be detected by resequencing.
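To make the MAF < 5% cutoff concrete, the following minimal sketch computes the minor allele frequency of a biallelic SNP from genotype counts and flags the variant as rare under that threshold. The function names, the genotype-count input format and the 0.05 default are illustrative assumptions, not part of any particular pipeline.

```python
def minor_allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Return the minor allele frequency (MAF) of a biallelic SNP.

    n_aa, n_ab and n_bb are hypothetical counts of individuals with
    genotypes AA, AB and BB; each diploid individual carries two alleles.
    """
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    if total_alleles == 0:
        raise ValueError("no genotyped individuals")
    freq_a = (2 * n_aa + n_ab) / total_alleles
    # The minor allele is whichever allele is less frequent.
    return min(freq_a, 1.0 - freq_a)


def is_rare(maf: float, threshold: float = 0.05) -> bool:
    """Classify a variant as rare using the MAF < 5% definition in the text."""
    return maf < threshold


if __name__ == "__main__":
    # 1,000 individuals: 960 AA, 38 AB, 2 BB -> 42 B alleles out of 2,000.
    maf = minor_allele_frequency(960, 38, 2)
    print(f"MAF = {maf:.3f}, rare = {is_rare(maf)}")  # prints: MAF = 0.021, rare = True
```

A variant like this one is typically absent from, or poorly tagged by, common genotyping arrays, which is why sequencing is needed to observe it directly.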

Finding the missing heritability by Next-Generation GWAS

Next-generation GWAS refers to GWAS based on next-generation sequencing. By combining high-throughput sequencing with genotyping, it can uncover novel causative genetic mutations of human diseases. Massively parallel sequencing of exomes and of targeted regions (for example, regions implicated by previous GWAS) are two promising and effective approaches for finding the missing heritability of complex diseases, because they capture variation beyond common SNPs.

Two strategies based on next-generation GWAS can be used to discover novel, low-frequency causative mutations associated with complex human diseases.

Protocol I: Exome sequencing and genotyping validation

In the first stage of this two-stage design, we suggest applying exome sequencing to hundreds of cases and hundreds of controls and selecting associated SNPs by comparing allele frequency estimates between the two groups. In the second stage, the best candidate SNPs from the first stage are validated by genotyping in a larger set of individuals. This protocol is cost-effective and can detect rare SNPs that are not captured by any of the major genotyping platforms.
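The stage-1 selection step can be sketched as an allelic case-control test: for each SNP, compare the alternate-allele frequency in cases with that in controls, then keep the top-ranked SNPs for stage-2 genotyping. This is a minimal sketch, assuming per-SNP allele counts have already been called from the exome data; the AlleleCounts record, the function names and the top_k cutoff are hypothetical. It uses the standard allelic chi-square test (1 degree of freedom); for very rare variants with small expected counts, an exact test such as Fisher's is usually more appropriate than this approximation.

```python
import math
from dataclasses import dataclass


@dataclass
class AlleleCounts:
    """Hypothetical per-SNP summary from stage-1 exome sequencing."""
    snp_id: str
    case_alt: int        # alternate alleles observed in cases
    case_total: int      # total alleles called in cases (2 x number of cases)
    control_alt: int     # alternate alleles observed in controls
    control_total: int   # total alleles called in controls


def allelic_chi2_p(s: AlleleCounts) -> float:
    """P-value of the allelic 2x2 chi-square test (1 df, no continuity correction)."""
    table = [
        [s.case_alt, s.case_total - s.case_alt],
        [s.control_alt, s.control_total - s.control_alt],
    ]
    n = s.case_total + s.control_total
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    if 0 in row or 0 in col:
        return 1.0  # empty group or monomorphic SNP: no association information
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def select_candidates(snps, top_k):
    """Rank SNPs by stage-1 p-value and keep the top_k for stage-2 genotyping."""
    return sorted(snps, key=allelic_chi2_p)[:top_k]


if __name__ == "__main__":
    # 500 cases and 500 controls -> 1,000 alleles per group (illustrative numbers).
    snps = [
        AlleleCounts("snp1", 30, 1000, 8, 1000),
        AlleleCounts("snp2", 12, 1000, 11, 1000),
        AlleleCounts("snp3", 50, 1000, 48, 1000),
    ]
    for s in select_candidates(snps, top_k=1):
        print(s.snp_id, f"p = {allelic_chi2_p(s):.2e}")  # snp1 ranks first
```

Ranking by p-value is only one possible selection rule; in practice the number of SNPs carried into stage 2 would be set by the genotyping budget and the desired replication power.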

Protocol II: Genome-wide genotyping and targeted region sequencing

In the first stage, genome-wide genotyping of case and control samples is used to identify candidate loci. In the second stage, a custom-designed capture chip is used to enrich these candidate loci or targeted regions, which are then sequenced in a large-scale sample to verify the candidate loci and identify disease-associated mutations.
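The hand-off between the two stages, turning stage-1 association hits into regions for the capture chip, can be sketched as follows. This is a minimal sketch under stated assumptions: the Hit record and target_regions function are hypothetical, the default p-value cutoff is the commonly used genome-wide significance threshold of 5 × 10^-8 (a looser cutoff could be chosen to keep more candidates), and the 100 kb flank on each side of a hit is an arbitrary illustrative value rather than a recommended design parameter.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """A stage-1 genome-wide genotyping result (hypothetical input format)."""
    chrom: str
    pos: int          # 1-based position of the genotyped SNP
    p_value: float


def target_regions(hits, p_threshold=5e-8, flank=100_000):
    """Turn significant stage-1 SNPs into merged capture intervals.

    Each SNP with p < p_threshold is padded by `flank` bp on each side,
    and overlapping or adjacent intervals on the same chromosome are merged.
    Returns (chrom, start, end) tuples in 1-based inclusive coordinates.
    """
    intervals = sorted(
        (h.chrom, max(1, h.pos - flank), h.pos + flank)
        for h in hits
        if h.p_value < p_threshold
    )
    merged = []
    for chrom, start, end in intervals:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2] + 1:
            # Extend the previous interval instead of adding a new one.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


if __name__ == "__main__":
    hits = [
        Hit("chr1", 1_000_000, 1e-9),
        Hit("chr1", 1_150_000, 3e-8),
        Hit("chr1", 5_000_000, 0.2),   # not significant, dropped
        Hit("chr2", 200_000, 4e-10),
    ]
    print(target_regions(hits))
    # [('chr1', 900000, 1250000), ('chr2', 100000, 300000)]
```

Merging nearby hits into a single interval avoids capturing the same sequence twice when several associated SNPs fall within one locus.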


[1] Manolio TA, Collins FS, Cox NJ, et al. Finding the missing heritability of complex diseases. Nature. 2009 Oct;461(7265):747-53.

[2] McClellan J, King MC. Genetic heterogeneity in human disease. Cell. 2010 Apr;141(2):210-7.

[3] Li Y, Vinckenbosch N, Tian G, et al. Resequencing of 200 human exomes identifies an excess of low-frequency non-synonymous coding variants. Nat Genet. 2010 Nov;42(11):969-72.