Whole Genome Resequencing

A primary use of next-generation sequencing (NGS) technology is resequencing human genomes. Because NGS can sequence whole genomes, large-scale comparative studies can now be performed to understand how genetic differences affect health and disease. For resequencing, BGI uses short paired-end reads from libraries with different insert sizes to detect and analyze genomic variation as comprehensively as possible.

Workflow

Genomic DNA is extracted from a sample and randomly fragmented. Fragments of the desired length are gel-purified, adapters are ligated, and DNA clusters are prepared; the resulting library is then sequenced on the Illumina Genome Analyzer (GA) using paired-end sequencing (see figure below). In addition to providing sequence information, this paired-end strategy allows large structural variations to be identified and reduces the influence of repeat sequences on the assembly process.

Figure: Paired-end sequencing
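Because fragments are size-selected before sequencing, the two reads of a pair are expected to map a roughly fixed distance apart, and pairs that deviate from that distance hint at structural variation. The sketch below is illustrative only: it assumes signed template lengths (such as the TLEN field of SAM records, where 0 means unavailable) have already been collected, and the helper names and the median ± 3 MAD cutoff are hypothetical choices, not BGI's pipeline.

```python
from statistics import median


def insert_size_bounds(template_lengths, k=3.0):
    """Estimate (median, lower, upper) insert-size bounds from paired alignments.

    Hypothetical helper: uses the median and median absolute deviation (MAD),
    which are robust to the few extreme pairs produced by structural variants.
    """
    # TLEN is signed (negative for the rightmost mate) and 0 when unavailable.
    sizes = [abs(t) for t in template_lengths if t != 0]
    med = median(sizes)
    mad = median(abs(s - med) for s in sizes)
    return med, med - k * mad, med + k * mad


def flag_discordant(template_lengths, k=3.0):
    """Return indices of pairs whose insert size falls outside the estimated bounds."""
    _, lower, upper = insert_size_bounds(template_lengths, k)
    return [
        i
        for i, t in enumerate(template_lengths)
        if t != 0 and not lower <= abs(t) <= upper
    ]


# Usage: six pairs near 500 bp and one spanning a possible deletion.
print(flag_discordant([500, -510, 490, 505, -495, 2000, 500]))  # [5]
```

Since libraries with different insert sizes are sequenced, bounds like these would normally be estimated separately for each library rather than pooled.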

Bioinformatics

Comparison of different genome sequences allows the detection and annotation of genomic variations, including single nucleotide polymorphisms (SNPs), insertions and deletions (indels), and various structural variations (SVs). In addition, having a variety of resequenced genomes allows evolutionary analyses to be performed at the population level.

Figure: Bioinformatics analysis in whole-genome resequencing

Data analysis includes:

  1. Basic bioinformatics analysis: a summary of data output and the distribution of per-base sequencing depth and genome coverage.
  2. Assembly of consensus sequences: based on alignment to a reference sequence, a Bayesian model is used to identify the most probable genotype at each locus for each sequenced sample.
  3. SNP detection and distribution: based on the consensus sequence, loci where the identified genotype differs from the reference are extracted and filtered to generate a high-confidence SNP dataset. A summary of SNPs in coding and untranslated regions is provided to help assess which SNPs are likely to alter gene function. Where gene annotation information is available, the SNPs can be further annotated.
  4. Indel detection and distribution: after the short reads of each individual are aligned to the reference genome, the relationships between paired-end reads are compared to detect short indels.
  5. Structural variation detection and distribution: detectable structural variations include deletions, duplications, inversions, and transpositions, among others. Where gene annotation information is available, these can also be annotated.
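Item 1 reports per-base sequencing depth and coverage. A minimal sketch of that calculation, assuming read alignments have already been reduced to 0-based, half-open intervals on a single reference sequence; the function names are hypothetical, and a production pipeline would normally take these numbers from a dedicated alignment toolkit rather than Python lists.

```python
from collections import Counter
from itertools import accumulate


def per_base_depth(read_intervals, ref_length):
    """Per-position read depth from 0-based, half-open [start, end) alignment intervals."""
    diff = [0] * (ref_length + 1)
    for start, end in read_intervals:
        start, end = max(start, 0), min(end, ref_length)  # clip to the reference
        if start < end:
            diff[start] += 1  # read coverage begins here
            diff[end] -= 1    # and stops just before here
    # A running sum of the difference array gives the depth at each position.
    return list(accumulate(diff[:ref_length]))


def coverage_summary(depth, min_depth=1):
    """Mean depth, fraction of positions with depth >= min_depth, and depth histogram."""
    n = len(depth)
    mean_depth = sum(depth) / n
    covered_fraction = sum(1 for d in depth if d >= min_depth) / n
    return mean_depth, covered_fraction, Counter(depth)


depth = per_base_depth([(0, 5), (3, 8)], ref_length=10)
print(depth)                    # [1, 1, 1, 2, 2, 1, 1, 1, 0, 0]
print(coverage_summary(depth))  # (1.0, 0.8, Counter({1: 6, 2: 2, 0: 2}))
```

The difference-array approach touches each read only twice, so its cost grows with the number of reads plus the reference length rather than with their product.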
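Item 2 describes choosing the genotype with the highest probability at each locus under a Bayesian model. The sketch below shows that idea for a single diploid locus under a deliberately simple error model and prior; the model, the prior weight `theta`, and the function names are assumptions made for illustration, not the model BGI uses.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"


def genotype_posteriors(read_bases, phred_quals, ref_base, theta=0.001):
    """Posterior probability of each diploid genotype at one locus.

    Illustrative model (an assumption, not BGI's published model):
    - each base is read with error e = 10^(-Q/10), errors spread evenly over the
      other three bases;
    - a read is drawn from either allele of the genotype with probability 1/2;
    - prior weight theta**k for a genotype carrying k non-reference alleles.
    """
    log_post = {}
    for a1, a2 in combinations_with_replacement(BASES, 2):
        non_ref = (a1 != ref_base) + (a2 != ref_base)
        log_p = non_ref * math.log(theta)  # log prior (unnormalized)
        for base, q in zip(read_bases, phred_quals):
            err = min(10 ** (-q / 10), 0.75)  # cap so that log() never sees 0
            p1 = 1 - err if base == a1 else err / 3
            p2 = 1 - err if base == a2 else err / 3
            log_p += math.log(0.5 * p1 + 0.5 * p2)  # log likelihood of this read
        log_post[a1 + a2] = log_p
    # Normalize with log-sum-exp to avoid underflow at high depth.
    top = max(log_post.values())
    total = sum(math.exp(v - top) for v in log_post.values())
    return {gt: math.exp(v - top) / total for gt, v in log_post.items()}


def call_genotype(read_bases, phred_quals, ref_base, theta=0.001):
    """Return the maximum a posteriori genotype and its posterior probability."""
    post = genotype_posteriors(read_bases, phred_quals, ref_base, theta)
    best = max(post, key=post.get)
    return best, post[best]


print(call_genotype("AAAAAGGGGG", [30] * 10, ref_base="A"))
```

With five A reads and five G reads at Q30 over an A reference, the heterozygous genotype AG is called; working in log space keeps the products of many small per-read probabilities from underflowing.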
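The filtering and coding/UTR summary in item 3 can be sketched as follows. Everything here is hypothetical: the `Call` record layout, the thresholds (minimum quality 20, depth between 4 and 100), and the feature list standing in for a real gene annotation file. The upper depth limit reflects a common practice of discarding sites in collapsed repeats or copy-number gains, which tend to attract excess reads.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Call:
    """Hypothetical per-site genotype call from the consensus step."""
    chrom: str
    pos: int         # 1-based reference position
    ref: str         # reference base
    genotype: str    # diploid genotype, e.g. "AG"
    quality: float   # Phred-scaled genotype quality
    depth: int       # number of reads covering the site


def filter_snps(calls, min_qual=20, min_depth=4, max_depth=100):
    """Keep non-reference calls that pass quality and depth thresholds."""
    return [
        c for c in calls
        if c.genotype != c.ref * 2             # differs from homozygous reference
        and c.quality >= min_qual
        and min_depth <= c.depth <= max_depth  # very high depth often means repeats
    ]


def classify_region(call, features):
    """Return the kind of the first feature containing the call, else 'other'.

    features: (chrom, start, end, kind) tuples, 1-based and inclusive,
    e.g. kind = "CDS" or "UTR".
    """
    for chrom, start, end, kind in features:
        if chrom == call.chrom and start <= call.pos <= end:
            return kind
    return "other"


def summarize_by_region(snps, features):
    """Count SNPs per region type (coding, UTR, other)."""
    return Counter(classify_region(s, features) for s in snps)


calls = [
    Call("chr1", 100, "A", "AG", 50, 20),
    Call("chr1", 200, "C", "CC", 60, 25),  # homozygous reference: dropped
    Call("chr1", 300, "G", "TT", 10, 30),  # low quality: dropped
    Call("chr1", 400, "T", "CT", 40, 30),
]
features = [("chr1", 50, 150, "CDS"), ("chr1", 350, 450, "UTR")]
snps = filter_snps(calls)
print([s.pos for s in snps])                # [100, 400]
print(summarize_by_region(snps, features))  # Counter({'CDS': 1, 'UTR': 1})
```

A linear scan over features keeps the sketch short; for genome-scale annotation, sorted features with binary search or an interval tree would be the usual choice.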
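Item 4 describes detecting short indels from the alignments. The sketch below uses a different, widely used signal than the paired-end comparison described there: the insertion (`I`) and deletion (`D`) operations recorded in the CIGAR string of each aligned read in SAM format. The function names and the minimum support of two reads are assumptions for illustration.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance along the reference sequence.
REF_CONSUMING = set("MDN=X")


def indels_from_cigar(ref_start, cigar):
    """List (kind, ref_pos, length) for each insertion or deletion in one alignment.

    ref_start is the 1-based leftmost aligned position (the SAM POS field).
    For a deletion, ref_pos is the first deleted reference base; for an
    insertion, it is the reference base immediately before the inserted bases.
    """
    pos = ref_start
    events = []
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "I":
            events.append(("INS", pos - 1, length))
        elif op == "D":
            events.append(("DEL", pos, length))
        if op in REF_CONSUMING:
            pos += length
    return events


def supported_indels(alignments, min_support=2):
    """Count identical indel events across reads; keep those seen min_support times."""
    counts = Counter(
        event
        for ref_start, cigar in alignments
        for event in indels_from_cigar(ref_start, cigar)
    )
    return {event: n for event, n in counts.items() if n >= min_support}


alignments = [(100, "10M2D5M3I10M"), (102, "8M2D10M"), (90, "30M")]
print(indels_from_cigar(100, "10M2D5M3I10M"))  # [('DEL', 110, 2), ('INS', 116, 3)]
print(supported_indels(alignments))            # {('DEL', 110, 2): 2}
```

Requiring support from more than one read is a simple guard against sequencing errors that happen to look like indels.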
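For item 5, a common read-pair heuristic classifies each discordant pair by mate orientation and spacing, assuming a standard forward-reverse (FR) Illumina paired-end library. In this sketch, the `ReadPair` record is hypothetical, and `lower` and `upper` are hypothetical insert-size bounds measured on the same scale as the distance between mate start positions. Real structural variation callers cluster many supporting pairs before reporting an event, which this sketch omits.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Hypothetical minimal record for one aligned read pair."""
    chrom1: str
    pos1: int        # leftmost mapped position of mate 1
    reverse1: bool   # True if mate 1 maps to the reverse strand
    chrom2: str
    pos2: int
    reverse2: bool


def classify_pair(pair, lower, upper):
    """Label a pair by the SV type its mapping suggests (FR library assumed)."""
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"  # translocation or transposition candidate
    # Orient the pair so 'left' is the mate with the smaller position.
    if pair.pos1 <= pair.pos2:
        left_rev, right_rev = pair.reverse1, pair.reverse2
    else:
        left_rev, right_rev = pair.reverse2, pair.reverse1
    if left_rev == right_rev:
        return "inversion"    # both mates on the same strand
    if left_rev:
        return "duplication"  # reverse-forward: tandem duplication candidate
    span = abs(pair.pos2 - pair.pos1)
    if span > upper:
        return "deletion"     # mates map farther apart than the library allows
    if span < lower:
        return "insertion"    # mates map closer together than expected
    return "concordant"


print(classify_pair(ReadPair("chr1", 1000, False, "chr1", 5000, True), 300, 600))  # deletion
```

Orientation is checked before distance because a same-strand or reverse-forward pair is informative regardless of how far apart the mates map.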