Bioinformatics Software

BGI’s mission is to provide a variety of bioinformatics tools that meet the needs of researchers carrying out large-scale biological analyses and/or analyzing large-scale genome sequencing data. BGI’s computational biologists have developed software for many different applications, several of which are incorporated into the BGI software platform. BGI’s proprietary software includes:


SOAP (Short Oligonucleotide Alignment Program) – For gapped and ungapped alignment of short oligonucleotides onto reference sequences. This program was designed to handle the huge numbers of short reads generated by massively parallel sequencing on the next-generation Illumina-Solexa platform.
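
As a rough illustration of how ungapped short-read alignment with a mismatch budget can work, the sketch below uses a seed-and-verify strategy based on the pigeonhole principle: if a read is cut into m + 1 non-overlapping seeds and aligns with at most m mismatches, at least one seed must match the reference exactly. This is a generic technique, not SOAP's actual index layout; the function names `build_seed_index` and `align_ungapped` are hypothetical, and gapped alignment is not covered.

```python
from collections import defaultdict


def build_seed_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every length-k substring (seed) of the reference to its start positions."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)


def align_ungapped(read: str, reference: str, index: dict[str, list[int]],
                   k: int, max_mismatches: int) -> list[tuple[int, int]]:
    """Return sorted (start, mismatches) pairs for every ungapped hit within the budget.

    The read is cut into max_mismatches + 1 disjoint seeds of length k; by the
    pigeonhole principle any hit with <= max_mismatches mismatches contains at
    least one exactly matching seed, so no such hit is missed.
    """
    if k < 1 or (max_mismatches + 1) * k > len(read):
        raise ValueError("read too short for this seed length and mismatch budget")
    checked: set[int] = set()
    hits: dict[int, int] = {}
    for i in range(max_mismatches + 1):
        offset = i * k
        seed = read[offset:offset + k]
        for seed_pos in index.get(seed, []):
            start = seed_pos - offset  # candidate alignment start on the reference
            if start < 0 or start + len(read) > len(reference) or start in checked:
                continue
            checked.add(start)
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits[start] = mismatches
    return sorted(hits.items())
```

With the toy reference `GATTACAGATTACACCC`, seeds of length 3 and a budget of one mismatch, the read `CATTACA` still aligns at positions 0 and 7 with one mismatch each: its first seed `CAT` occurs nowhere, but its second seed `TAC` matches exactly.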

EXON_CAPTURE_PIPELINE – Whole-genome exon trapping analysis software.

ReAS – Software to recover ancestral sequences of transposable elements from unassembled reads of whole-genome shotgun sequencing.

RePS (repeat-masked Phrap with scaffolding) – RePS is a whole-genome shotgun (WGS) sequence assembler. It identifies repeated k-mer sequences and masks them in the WGS reads prior to assembly. The established software Phrap is used to compute meaningful error probabilities for each base, and clone-end-pairing information is used to construct scaffolds that order and orient the contigs. The updated version of RePS incorporates some of the read-clustering ideas introduced by Phusion.
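
The repeat-masking step can be sketched as follows: k-mers that occur far more often across the read set than genome coverage would predict are treated as repeats, and every base they cover is replaced with `N` before assembly. The fixed cutoff `max_count` is an illustrative assumption (a real pipeline would derive it from expected coverage), and `mask_repeats` is a hypothetical name, not part of RePS.

```python
from collections import Counter


def mask_repeats(reads: list[str], k: int, max_count: int) -> list[str]:
    """Replace with 'N' every base covered by a k-mer seen more than max_count times."""
    # Count every k-mer across the whole read set.
    counts: Counter[str] = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1

    masked = []
    for read in reads:
        bases = list(read)
        for i in range(len(read) - k + 1):
            if counts[read[i:i + k]] > max_count:
                # Over-represented k-mer: treat it as a repeat and mask all k bases.
                bases[i:i + k] = "N" * k
        masked.append("".join(bases))
    return masked
```

For example, with reads `AAAACGT`, `AAAAGGC` and `TTTAAAA`, k = 4 and a cutoff of 2, the k-mer `AAAA` appears three times and is masked in all three reads, giving `NNNNCGT`, `NNNNGGC` and `TTTNNNN`.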

Maq (Mapping and Assembly with Quality) – This builds assemblies by mapping short reads to reference sequences. Maq was previously known as mapass2.

Variant Calling

SOAPsnp – SOAPsnp is a member of SOAP (Short Oligonucleotide Analysis Package). It is a resequencing utility that assembles a consensus sequence for the genome of a newly sequenced individual from the alignment of raw sequencing reads against a known reference. SNPs can then be identified by comparing the consensus sequence with the reference.
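
SOAPsnp itself scores genotypes with a Bayesian model that takes base quality into account; the sketch below deliberately swaps that for a much simpler quality-weighted vote, only to show the overall flow of building a consensus from aligned reads and then comparing it with the reference. The pileup format (a reference position mapped to a list of `(base, phred_quality)` pairs) and the function names are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical pileup: reference position -> list of (base, Phred quality) from aligned reads.
Pileup = dict[int, list[tuple[str, int]]]


def call_consensus(pileup: Pileup, reference: str) -> str:
    """Pick, at each reference position, the base with the largest summed Phred quality."""
    consensus = []
    for pos in range(len(reference)):
        scores: dict[str, int] = defaultdict(int)
        for base, qual in pileup.get(pos, []):
            scores[base] += qual
        # 'N' marks positions with no read coverage; ties go to the first base seen.
        consensus.append(max(scores, key=scores.__getitem__) if scores else "N")
    return "".join(consensus)


def find_snps(consensus: str, reference: str) -> list[tuple[int, str, str]]:
    """List (position, reference_base, consensus_base) where a covered base differs."""
    return [
        (pos, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference, consensus))
        if alt != "N" and alt != ref
    ]
```

A voting rule like this has no notion of heterozygous genotypes or prior SNP rates, which is exactly what a probabilistic model such as SOAPsnp's adds.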

SOAPdenovo – SOAPdenovo is a novel short-read assembly method that can build a de novo draft assembly of human-sized genomes. The program was specially designed to assemble Illumina Genome Analyzer (GA) short reads. It creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective manner.
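
SOAPdenovo is built on the de Bruijn graph approach to short-read assembly. The sketch below shows only the core idea: reads are cut into k-mers, each distinct k-mer becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix, and contigs are read off as maximal non-branching paths. Error correction, reverse-complement merging, repeat resolution and scaffolding, all of which a real assembler needs, are omitted, and the function names are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads: list[str], k: int) -> dict[str, set[str]]:
    """Nodes are (k-1)-mers; each distinct k-mer adds an edge prefix -> suffix."""
    graph: dict[str, set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def contigs(graph: dict[str, set[str]]) -> list[str]:
    """Spell out maximal non-branching paths (isolated cycles are ignored)."""
    in_degree: dict[str, int] = defaultdict(int)
    for successors in graph.values():
        for succ in successors:
            in_degree[succ] += 1
    nodes = set(graph) | set(in_degree)

    def simple(node: str) -> bool:
        # A 1-in/1-out node lies inside a contig rather than at its boundary.
        return in_degree[node] == 1 and len(graph.get(node, ())) == 1

    result = []
    for node in nodes:
        if simple(node):
            continue
        for succ in graph.get(node, ()):
            path, current = node + succ[-1], succ
            while simple(current):
                current = next(iter(graph[current]))
                path += current[-1]
            result.append(path)
    return sorted(result)
```

Two overlapping reads `ACGTTG` and `GTTGCA` with k = 4 collapse into the single contig `ACGTTGCA`. With reads `AAGCT` and `AAGCC` and k = 3, the graph forks after `GC`, so the sketch stops there and reports three contigs (`AAGC`, `GCC`, `GCT`) rather than guessing a path.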

Solar – This processes alignment results, especially for intron alignment.

Annotations & Variants

MIREAP – This is used to identify both known and novel microRNAs from small RNA libraries deeply sequenced with Illumina-Solexa, 454, or SOLiD technology.

FGF (Fishing Gene Family) – This finds gene families, plots phylogenetic trees, and provides evolutionary information on gene duplication.

Reporting & Visualization

SVBP – This provides reliability testing and visualization of results for sequence assemblies.

WEGO (Web Gene Ontology Annotation Plot) – WEGO is a useful tool for plotting Gene Ontology (GO) annotation results, especially for comparative genomics.
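
The counting behind a GO annotation plot can be approximated as below: for one data set, compute the percentage of genes annotated to each GO term, then repeat per data set to compare them side by side. The sketch assumes the usual GO convention that a gene annotated to a term also counts toward all of that term's ancestors; the parent map and the term IDs used with it are made up, `go_term_percentages` is a hypothetical name rather than WEGO's own code, and plotting is left out.

```python
from collections import Counter


def ancestors(term: str, parents: dict[str, list[str]]) -> set[str]:
    """Return the term plus every ancestor reachable through the parent map."""
    seen: set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def go_term_percentages(annotations: dict[str, set[str]],
                        parents: dict[str, list[str]]) -> dict[str, float]:
    """Percentage of genes in one data set annotated to each GO term (with ancestors)."""
    counts: Counter[str] = Counter()
    for terms in annotations.values():
        expanded: set[str] = set()
        for term in terms:
            expanded |= ancestors(term, parents)
        counts.update(expanded)  # each gene counted at most once per term
    n_genes = len(annotations)
    return {term: 100 * c / n_genes for term, c in counts.items()}
```

Using percentages rather than raw counts lets data sets of different sizes share one plot, which matters for the comparative use described above.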

Data Comparison

HIBAIS – Ancestor deduction software based on HapMap.

SOLEXA-MRNATAG_PIPELINE – Digital gene expression (DGE) software based on Illumina-Solexa sequencing data.
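
At its simplest, tag-based digital gene expression means counting how many sequenced tags map to each gene and normalizing by library size. The sketch below assumes a precomputed exact-match `tag_to_gene` lookup (hypothetical; real pipelines build it from a reference transcript set and must deal with ambiguous tags) and reports tags per million mapped tags.

```python
from collections import Counter


def tags_per_million(tags: list[str], tag_to_gene: dict[str, str]) -> dict[str, float]:
    """Count tags per gene and scale to tags per million mapped tags."""
    # Tags missing from the lookup are treated as unmapped and dropped.
    counts = Counter(tag_to_gene[tag] for tag in tags if tag in tag_to_gene)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}
```

Normalizing by mapped tags rather than all sequenced tags is one design choice among several; either way, the scaling makes libraries of different depth comparable.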

CAT (Cross-species Alignment Tool) – Aligns mRNA sequences to mammalian genomes across species.

KaKs_Calculator – This calculates nonsynonymous (Ka) and synonymous (Ks) substitution rates.
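
KaKs_Calculator offers several estimation methods. One of the simplest, the Nei-Gojobori (NG) counting method, can be sketched as below: count synonymous and nonsynonymous sites for each codon, count synonymous and nonsynonymous differences between aligned codons (averaging over mutational pathways when codons differ at more than one position), then apply the Jukes-Cantor correction. This is a simplified stand-alone illustration, not KaKs_Calculator's code. It assumes the standard genetic code and gap-free, in-frame alignments without stop codons, and it counts changes to stop codons as nonsynonymous when computing sites, a convention that varies between implementations.

```python
import math
from itertools import permutations

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order at each position; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_sites(codon: str) -> float:
    """Synonymous sites: fraction of single-base changes at each position that keep the amino acid."""
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    sites += 1 / 3
    return sites


def count_differences(c1: str, c2: str) -> tuple[float, float]:
    """Synonymous and nonsynonymous differences, averaged over mutational pathways."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    paths = []
    for order in permutations(positions):
        current, syn, nonsyn, via_stop = c1, 0, 0, False
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            via_stop = via_stop or CODE[nxt] == "*"
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                nonsyn += 1
            current = nxt
        paths.append((syn, nonsyn, via_stop))
    # Prefer pathways that avoid intermediate stop codons; fall back to all of them.
    usable = [p for p in paths if not p[2]] or paths
    return (sum(p[0] for p in usable) / len(usable),
            sum(p[1] for p in usable) / len(usable))


def jukes_cantor(p: float) -> float:
    """Correct a proportion of differences for multiple hits."""
    return math.inf if p >= 0.75 else -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(seq1: str, seq2: str) -> tuple[float, float]:
    """Return (Ka, Ks) for two aligned, gap-free coding sequences."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must have equal length, a multiple of 3")
    syn_sites = nonsyn_sites = syn_diffs = nonsyn_diffs = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if "*" in (CODE[c1], CODE[c2]):
            raise ValueError(f"stop codon at nucleotide {i}")
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        syn_sites += s
        nonsyn_sites += 3 - s
        sd, nd = count_differences(c1, c2)
        syn_diffs += sd
        nonsyn_diffs += nd
    return (jukes_cantor(nonsyn_diffs / nonsyn_sites),
            jukes_cantor(syn_diffs / syn_sites))
```

The selection ratio is then Ka/Ks. Values well below 1 are usually read as purifying selection and values above 1 as a sign of possible positive selection. With very short sequences the proportions are noisy, and once a proportion reaches 0.75 the Jukes-Cantor correction is undefined, which the sketch reports as infinity.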