de novo Sequencing 

De novo sequencing provides the first genome sequence of an organism. Next-generation sequencing (NGS) technology, which produces an enormous volume of data at relatively low cost (in some cases, more than one billion short reads per instrument run), has opened the way for researchers to obtain whole-genome data for organisms previously considered a lower priority. The availability of this variety of whole-genome data has made possible large-scale genomic studies that were unimaginable just a few years ago. As one of the leading research institutes in genomics, BGI has sequenced several plant and animal genomes, including rice, silkworm, cucumber, and panda, the latter two using only NGS technology.

Workflow

Bioinformatics

For bioinformatics analyses, BGI uses its in-house software, SOAPdenovo, to carry out de novo assembly of all newly sequenced genomes (see below).

 

Short read sequence assembly

Standard Data Analysis Includes:

  1. Basic genome information: genome size, GC content, average heterozygosity, repeat information
  2. Genome sequencing results: sequencing data handling (image analysis, base calling, and sequence analysis) and a sequencing data summary
  3. Genome assembly results: contig size, contig number, scaffold size, and scaffold number (from N50 to N90)
  4. Genome assembly results evaluation: euchromatic region coverage, gene region coverage, sequencing depth, and genome GC content analysis
  5. Genome annotation results: repeats analysis and annotation, protein-coding gene annotation (including gene structure prediction and gene function annotation), non-coding RNA gene annotation (including annotation of microRNA, tRNA, rRNA, and other ncRNA), and transposon and tandem repeats annotation
  6. Comparative genomics and evolution analysis: chromosome structure variation detection in specific genome regions, specific gene detection, rapidly evolving gene detection, synteny block detection, and gene family analysis
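
Two of the summary statistics listed above, GC content (item 1) and the N50 to N90 contig or scaffold sizes (item 3), can be illustrated with a short sketch. This is a minimal Python illustration under stated assumptions, not BGI's pipeline or part of SOAPdenovo: the function names `gc_content` and `nx` and the contig lengths are hypothetical, and Nx is taken in its usual sense as the length L such that sequences of length L or longer together cover at least x% of the total assembly length.

```python
from typing import Dict, Iterable


def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T); N and other codes are ignored."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    total = gc + at
    return gc / total if total else 0.0


def nx(lengths: Iterable[int], x: int) -> int:
    """Return Nx: the length of the sequence at which the running total,
    taken from longest to shortest, first reaches x% of the assembly length."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    if total == 0:
        return 0
    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * x:
            return length
    return 0


# Hypothetical contig lengths (in bp), for illustration only.
contig_lengths = [100, 80, 60, 40, 20]

# N50, N60, N70, N80, N90, as in the "from N50 to N90" summary.
summary: Dict[str, int] = {f"N{x}": nx(contig_lengths, x) for x in range(50, 100, 10)}

if __name__ == "__main__":
    print(summary)
    print(gc_content("ATGCNN"))
```

The same `nx` function applies to scaffold lengths. In practice the lengths would be read from the assembler's FASTA output rather than hard-coded.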