Exome Sequencing

Exome sequencing selectively targets the most functionally relevant DNA sequences that encode proteins, allowing the identification of novel genes associated with both Mendelian disorders and common diseases.

BGI is highly experienced in exome sequencing and analysis for humans, plants, and animals. To date, we have sequenced more than 45,000 human exomes. In addition, we provide a mouse exome sequencing service using the Agilent Mouse All Exon kit, and we have developed an exome sequencing platform for monkeys based on our research from the Chinese rhesus macaque and cynomolgus macaque genome projects.


  1. Rich experience in exome sequencing
  2. Rapid turnaround at our Hong Kong and CHOP (Philadelphia, PA) facilities
  3. High quality data (See examples of data generated by BGI)
  4. Affordable pricing
  5. Strong analytical capabilities, backed by 1,000 bioinformaticians (advanced analysis available for Mendelian disorder, complex disease, cancer, and population genetics research)

Customer Testimonial:

"Knome has been working with BGI for well over three years. Specifically, BGI has been resequencing whole human genomes and exomes for us and it is work they perform exceedingly well. Partnering with BGI has been a positive experience for us and we highly recommend them."
-Ari Kiirikki, VP of Knome

BGI has successfully completed numerous exome sequencing projects, including a Danish study of 1,000 patient samples and 1,000 controls that aimed to find rare SNPs associated with metabolic disorders such as obesity and hypertension.

Frequent Mutations of Genes Encoding Ubiquitin-mediated Proteolysis Pathway Components in Clear Cell Renal Cell Carcinoma. Nature Genetics. 44:17-9 (2012).

The study sequenced the whole exomes of ten clear cell renal cell carcinomas (ccRCCs) and screened ~1,100 genes in 88 additional ccRCCs. Frequent mutations were detected in the ubiquitin-mediated proteolysis pathway (UMPP). The findings highlight the potential contribution of the UMPP to ccRCC tumorigenesis through activation of the hypoxia regulatory network.

Exome Sequencing Identifies NMNAT1 Mutations as a Cause of Leber Congenital Amaurosis. Nature Genetics. 44:972-4 (2012).

The exome of an individual with Leber congenital amaurosis (LCA) was sequenced, identifying nonsense (c.507G>A, p.Trp169*) and missense (c.769G>A, p.Glu257Lys) mutations in NMNAT1, which encodes an enzyme in the nicotinamide adenine dinucleotide (NAD) biosynthesis pathway and is implicated in protection against axonal degeneration. NMNAT1 mutations were also found in ten other individuals with LCA, all of whom carry the p.Glu257Lys variant.
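
The two reported changes can be checked against the standard genetic code. The sketch below is illustrative only: it assumes cDNA positions are counted from the first base of the start codon, and because the reference codons are not given here, it tries every codon that encodes the reference amino acid and carries the reference base at the affected position. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_position(cdna_pos):
    """Map a 1-based cDNA position to (1-based codon number, 0-based offset in codon)."""
    return (cdna_pos - 1) // 3 + 1, (cdna_pos - 1) % 3


def consequences(cdna_pos, ref_base, alt_base, ref_aa):
    """Amino acids that can result from the substitution, over all compatible reference codons."""
    _, offset = codon_position(cdna_pos)
    outcomes = set()
    for codon, aa in CODON_TABLE.items():
        if aa == ref_aa and codon[offset] == ref_base:
            mutated = codon[:offset] + alt_base + codon[offset + 1:]
            outcomes.add(CODON_TABLE[mutated])
    return outcomes


print(codon_position(507), consequences(507, "G", "A", "W"))
print(codon_position(769), consequences(769, "G", "A", "E"))
```

Position 507 is the third base of codon 169, where TGG (Trp) becomes the stop codon TGA; position 769 is the first base of codon 257, where either glutamate codon (GAA or GAG) becomes a lysine codon.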

Single-Cell Exome Sequencing Reveals Single-Nucleotide Mutation Characteristics of a Kidney Tumor. Cell. 148:886-95 (2012).

To better understand the intratumoral genetics of clear cell renal cell carcinoma (ccRCC), single-cell exome sequencing was carried out on a ccRCC tumor and its adjacent kidney tissue. The pilot study demonstrates that ccRCC may be more genetically complex than previously thought and provides information that can lead to new ways to investigate individual tumors, with the goal of developing more effective cellular targeted therapies.

An Integrated Map of Genetic Variation from 1,092 Human Genomes. Nature. 491:56-65 (2012).

By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource that helps to understand the genetic contribution to disease. Up to 98% of accessible single nucleotide polymorphisms at a frequency of 1% in related populations were captured, enabling analysis of common and low-frequency variants in individuals from diverse, including admixed, populations.

Sequencing of 50 Human Exomes Reveals Adaptation to High Altitude. Science. 329:75 (2010).


The exomes of 50 ethnic Tibetans were sequenced to 18X coverage per individual. Genes showing population-specific allele frequency changes, which represent strong candidates for altitude adaptation, were identified. One single-nucleotide polymorphism (SNP) at EPAS1 shows a 78% frequency difference between Tibetan and Han samples, representing the fastest allele frequency change observed at any human gene to date. This research may help in preventing and treating high-altitude hypoxia.


Standard Bioinformatics Analysis

  1. Data filtering (removing adaptors, contamination, and low-quality reads from raw reads)
  2. Read alignment to the human reference genome (UCSC build hg19) using BWA
  3. Assessment of sequencing quality
  4. SNP and InDel calling
  5. SNP and InDel annotation (annotate each SNP to the corresponding gene functional units in RefGene database, including nucleotide and amino acid changes)
  6. SNP and InDel validation and comparison with the dbSNP database, the 1000 Genomes Project database, publicly available exome databases (ESP), and the YH database
  7. SNP function prediction and sequence conservation analysis
  8. Statistical analysis of SNPs and InDels in each functional unit
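
As an illustration of the data filtering step (step 1), the sketch below drops reads that contain an adaptor sequence, too many N bases, or too many low-quality bases. It is a minimal example and not BGI's pipeline: the adaptor string, the Phred+33 encoding, and all thresholds are assumptions chosen for the example, and the function names are hypothetical.

```python
ADAPTER = "AGATCGGAAGAGC"  # assumed adaptor prefix, for illustration only


def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - offset for ch in quality]


def keep_read(sequence, quality, adapter=ADAPTER,
              min_q=20, max_low_fraction=0.5, max_n_fraction=0.1):
    """Return True if the read passes the adaptor, N-content, and quality filters."""
    if adapter in sequence:
        return False
    if sequence.count("N") > max_n_fraction * len(sequence):
        return False
    low_quality = sum(1 for q in phred_scores(quality) if q < min_q)
    return low_quality <= max_low_fraction * len(sequence)


def filter_fastq(lines):
    """Yield (header, sequence, quality) for passing reads from 4-line FASTQ records."""
    for header, seq, _plus, qual in zip(*[iter(lines)] * 4):
        seq, qual = seq.strip(), qual.strip()
        if keep_read(seq, qual):
            yield header.strip(), seq, qual


example = [
    "@read1", "ACGTACGTAC", "+", "IIIIIIIIII",           # high quality: kept
    "@read2", "ACGTAGATCGGAAGAGCACG", "+", "I" * 20,     # contains adaptor: dropped
    "@read3", "ACGTACGTAC", "+", "##########",           # low quality: dropped
]
print([header for header, _, _ in filter_fastq(example)])
```

A production workflow would normally trim adaptors rather than only discard reads and would process paired-end files together; those details are omitted here.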

General Advanced Analysis

  1. Non-coding SNP calling, annotation, and statistical analysis
  2. Non-coding InDel calling, annotation, and statistical analysis

Population Genetics Advanced Analysis

  1. Population-level SNP calling, annotation, and statistical analysis
  2. Quality control (QC) for population SNPs
  3. Genotype imputation based on reference panel
  4. Sample QC
  5. Population structure analysis
  6. Selection signal detection, especially for recent selection events, based on iHS and XP-EHH tests with validation using DDAF, Fst, and Tajima's D methods
  7. Pathway analyses for the candidate selected genes
  8. Haplotype analysis
  9. Population history inference
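
Two of the validation statistics named in step 6 can be illustrated from allele frequencies alone. The sketch below computes a basic two-population Fst from expected heterozygosities and the difference in derived allele frequency (DDAF). It is a simplified illustration with equal population weights and no sample-size correction, not the estimator used in any particular pipeline; the example frequencies are hypothetical.

```python
def heterozygosity(p):
    """Expected heterozygosity at a biallelic site with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst_two_populations(p1, p2):
    """Fst = (H_T - H_S) / H_T for two equally weighted populations."""
    p_mean = (p1 + p2) / 2.0
    h_total = heterozygosity(p_mean)
    if h_total == 0:
        return 0.0  # site is fixed for the same allele in both populations
    h_within = (heterozygosity(p1) + heterozygosity(p2)) / 2.0
    return (h_total - h_within) / h_total


def delta_daf(p_target, p_reference):
    """Difference in derived allele frequency between two populations."""
    return p_target - p_reference


# Hypothetical derived allele frequencies in two populations.
print(round(fst_two_populations(0.9, 0.1), 4), round(delta_daf(0.9, 0.1), 4))
```

Haplotype-based tests such as iHS and XP-EHH require phased genotypes and are not covered by this sketch.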

Cancer Advanced Analysis

  1. Preliminary identification of the paired tumor-normal samples based on MassARRAY (this service item is recommended prior to sequencing)
  2. Somatic SNV calling, annotation, and statistical analysis
  3. Somatic InDel calling, annotation, and statistical analysis
  4. Somatic exonic CNV detection from the paired tumor-normal samples using ExomeCNV software
  5. Somatic SNV/InDel annotation against the COSMIC database
  6. SNV function prediction and sequence conservation analysis
  7. Non-synonymous annotation for mutated genes against the Cancer Gene Census database
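
The idea behind somatic SNV calling from paired tumor-normal samples (step 2) can be sketched as a simple filter on allele counts: a candidate somatic variant is well supported in the tumor and essentially absent from the matched normal. This is a deliberately naive illustration, not the calling method used by BGI; the class, function names, and thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    """Reference and alternate read counts at one site in a tumor-normal pair."""
    tumor_ref: int
    tumor_alt: int
    normal_ref: int
    normal_alt: int


def vaf(ref_count, alt_count):
    """Variant allele fraction; 0.0 when the site has no coverage."""
    depth = ref_count + alt_count
    return alt_count / depth if depth else 0.0


def is_candidate_somatic(site, min_depth=10, min_tumor_vaf=0.10, max_normal_vaf=0.02):
    """Flag sites with variant support in the tumor but not in the matched normal."""
    tumor_depth = site.tumor_ref + site.tumor_alt
    normal_depth = site.normal_ref + site.normal_alt
    if tumor_depth < min_depth or normal_depth < min_depth:
        return False  # too little coverage to judge either sample
    return (vaf(site.tumor_ref, site.tumor_alt) >= min_tumor_vaf
            and vaf(site.normal_ref, site.normal_alt) <= max_normal_vaf)


print(is_candidate_somatic(SiteCounts(40, 20, 60, 0)))   # variant only in tumor
print(is_candidate_somatic(SiteCounts(30, 30, 30, 28)))  # also in normal: germline
```

Dedicated somatic callers additionally model sequencing error, tumor purity, and strand bias, which is why fixed thresholds like these serve only as a first approximation.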

Complex Disease Advanced Analysis

  1. Sample design and power calculation (during the project design stage)
  2. Population-level SNP calling, annotation, and statistical analysis
  3. Quality control (QC) for population SNPs
  4. LD-based genotype calling
  5. Sample QC
  6. Single site SNP association test
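
A single-site association test (step 6) can be illustrated with an allelic chi-square test on a 2x2 table of allele counts in cases and controls. The sketch below is one common choice of test, shown as an assumption rather than the specific test used in this service; the counts in the example are hypothetical.

```python
import math


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square (1 degree of freedom) on a 2x2 allele-count table.

    Returns (statistic, p_value). The p-value uses the identity that the
    survival function of a 1-df chi-square at x equals erfc(sqrt(x / 2)).
    """
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0, 1.0  # a row or column is empty, so the site is uninformative
    statistic = n * (a * d - b * c) ** 2 / denominator
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts: 100 cases and 100 controls, two alleles per individual.
chi2, p = allelic_chi_square(case_alt=60, case_ref=140, control_alt=40, control_ref=160)
print(round(chi2, 3), round(p, 4))
```

Because a study tests many sites, per-site p-values need correction for multiple testing before any association is reported.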

Mendelian Disorder Advanced Analysis

Please contact technical support (bgiseq_MD@genomics.org.cn) for details.

Exome Capture Arrays:

BGI currently has the capacity to process 800 exome capture samples per week. We mainly use two exome capture strategies, both of which deliver high-level performance and substantial savings on sequencing: NimbleGen SeqCap EZ (biotinylated DNA oligonucleotide probes) and the Agilent SureSelect system (biotinylated RNA probes). Both capture all exons in solution through a simple, scalable workflow with stringent built-in quality controls.

We offer the following exome capture arrays:

Human Exon Capture Array

Design | Capture Targets (Mb) (Regions Covered by Probes) | Database Used to Select Primary Targets
Agilent SureSelect Human All V3 | | CCDS Sep 2009 + miRBase V14 + GENCODE + Sanger
Agilent SureSelect Human All V4 | | CCDS Mar 2011 + miRBase V17 + GENCODE + RefSeq Mar 2011
Agilent SureSelect Human All V4+UTRs | | CCDS Mar 2011 + miRBase V17 + GENCODE + RefSeq Mar 2011
NimbleGen SeqCap EZ Exome V2.0 | | CCDS Sep 2009 + miRBase V14, Sep 2009 + RefSeq Jan 2010
NimbleGen SeqCap EZ Exome V3.0 | | CCDS Apr 2011 + miRBase V15 + GENCODE + RefSeq Jun 2011

Model Animals Capture Array

Exome capture kit | Capture Targets | Insert size
Agilent SureSelect Mouse All Exon | 50 Mb | 150-200 bp
NimbleGen Monkey All Exome | 50 Mb | 200-300 bp

Sample Requirements:

For the genomic DNA samples:

  1. Purity: OD260/280 = 1.8-2.0, with no degradation or RNA contamination
  2. Sample concentration: ≥ 37.5 ng/μl
  3. Quantity: at least 1 μg (2.5 μg gDNA recommended)
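
The numeric requirements above can be checked before shipping a sample. The sketch below is a hypothetical helper, not a BGI tool: it assumes the sample volume is known so that total quantity can be derived as concentration times volume, and it does not cover degradation or RNA contamination, which need a separate check such as gel electrophoresis.

```python
def check_gdna_sample(od_ratio, conc_ng_per_ul, volume_ul):
    """Return (total_ug, problems) for a genomic DNA sample.

    Thresholds follow the listed requirements: OD260/280 of 1.8-2.0,
    concentration of at least 37.5 ng/ul, and at least 1 ug in total.
    """
    problems = []
    if not 1.8 <= od_ratio <= 2.0:
        problems.append("OD260/280 outside 1.8-2.0")
    if conc_ng_per_ul < 37.5:
        problems.append("concentration below 37.5 ng/ul")
    total_ug = conc_ng_per_ul * volume_ul / 1000.0  # ng -> ug
    if total_ug < 1.0:
        problems.append("total quantity below 1 ug")
    return total_ug, problems


print(check_gdna_sample(od_ratio=1.85, conc_ng_per_ul=50, volume_ul=60))
print(check_gdna_sample(od_ratio=1.60, conc_ng_per_ul=30, volume_ul=20))
```

At the minimum concentration of 37.5 ng/μl, roughly 27 μl is needed to reach 1 μg and roughly 67 μl to reach the recommended 2.5 μg.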

Turnaround Time:

The standard turnaround time can be as short as ~40 working days for exome sequencing of 100 samples at 50X coverage. This includes library construction, sequencing, and standard bioinformatics analysis.
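
To give a sense of scale for a project of this size, the sketch below estimates the sequencing output needed for a given sample count, mean coverage, and capture target size. The 50 Mb target size and the on-target fraction used in the example are assumptions for illustration; actual values depend on the capture kit and the run.

```python
def on_target_gigabases(samples, mean_coverage, target_mb):
    """Bases that must land on target: samples x coverage x target size (in Gb)."""
    return samples * mean_coverage * target_mb / 1000.0


def raw_gigabases(samples, mean_coverage, target_mb, on_target_fraction):
    """Raw output needed when only a fraction of sequenced bases falls on target."""
    if not 0 < on_target_fraction <= 1:
        raise ValueError("on_target_fraction must be in (0, 1]")
    return on_target_gigabases(samples, mean_coverage, target_mb) / on_target_fraction


# 100 samples at 50X on a hypothetical 50 Mb target, assuming half the bases are on target.
print(on_target_gigabases(100, 50, 50))
print(raw_gigabases(100, 50, 50, on_target_fraction=0.5))
```

Under these assumptions the project needs 250 Gb on target and 500 Gb of raw output, before accounting for duplicate or filtered reads.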