BGI Cloud

Drawing on powerful cloud computing and storage capabilities, BGI Cloud provides next-generation sequencing data storage and analysis to users who need to process large amounts of data while avoiding the problems associated with data storage, delivery, and analysis. We help you set up a personalized bioinformatics platform on the cloud. With the high-performance bioinformatics software and large number of databases available on cloud platforms, researchers can capture the target information they need much more quickly and effectively.

Offline delivery is complicated, and data often need to be uploaded or downloaded repeatedly. A cloud-based delivery service overcomes these problems: data are delivered to customers over the network and analyzed online, and the results are shown in the browser. Customers therefore only need to download a small set of result files whenever they need them.

BGI Cloud is dedicated to delivering an easy-to-use, flexible, “one-stop” intelligent system that efficiently processes massive datasets for global basic research and clinical application in “trans-omics” fields such as genomics, transcriptomics, proteomics, and metabolomics.

Comprehensive bioinformatics solution

[Figure 1: BGI Cloud]

Building on a tremendous breadth of experience acquired in bioinformatics in recent years, BGI Cloud offers comprehensive bioinformatics solutions that cover hardware, software, and customer service, the last of which includes professional maintenance and training (Figure 1). Depending on the circumstances, the hardware can be placed on the cloud, in a local lab, or in both as a hybrid system.

BGI Cloud provides a variety of bioinformatics software, databases, and professional maintenance and training services to satisfy different needs. Software and databases are grouped according to research needs. For global basic research, the group contains many common software programs and databases in different fields (e.g., evolution analysis). For special research in bioinformatics (e.g., epigenomics studies), the software and databases are more personalized and targeted to a given field. For special research in applied fields (e.g., basic medical sciences), a robust pipeline or a well-designed solution plan is provided (e.g., from QC through mapping to SNP/CNV calling).
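The "QC to mapping to SNP calling" flow mentioned above can be pictured with a toy sketch. This is only an illustration of how such stages chain together, not BGI Cloud's actual pipeline: the function names, the quality threshold, the brute-force mapper, and the sample reads are all hypothetical, and real pipelines use dedicated aligners and variant callers.

```python
def quality_control(reads, min_mean_quality=20):
    """QC stage: keep reads whose mean Phred score (Phred+33 encoding) meets the threshold."""
    kept = []
    for sequence, quality in reads:
        scores = [ord(ch) - 33 for ch in quality]
        if sum(scores) / len(scores) >= min_mean_quality:
            kept.append((sequence, quality))
    return kept


def map_read(sequence, reference):
    """Mapping stage (toy): return the reference offset with the fewest mismatches."""
    best_offset, best_mismatches = 0, len(sequence) + 1
    for offset in range(len(reference) - len(sequence) + 1):
        window = reference[offset:offset + len(sequence)]
        mismatches = sum(a != b for a, b in zip(sequence, window))
        if mismatches < best_mismatches:
            best_offset, best_mismatches = offset, mismatches
    return best_offset


def call_snps(sequence, offset, reference):
    """SNP stage (toy): report (0-based position, reference base, read base) per mismatch."""
    return [
        (offset + i, reference[offset + i], base)
        for i, base in enumerate(sequence)
        if reference[offset + i] != base
    ]


def run_pipeline(reads, reference):
    """Chain the three stages and collect the distinct candidate SNPs."""
    snps = set()
    for sequence, _quality in quality_control(reads):
        offset = map_read(sequence, reference)
        snps.update(call_snps(sequence, offset, reference))
    return sorted(snps)


# Hypothetical toy data: one high-quality read with a single mismatch,
# and one low-quality read that the QC stage discards.
reference = "GATTACAGGCTA"
reads = [
    ("TTAGAG", "IIIIII"),  # 'I' encodes Phred 40
    ("GGCTA", "!!!!!"),    # '!' encodes Phred 0
]
variants = run_pipeline(reads, reference)
print(variants)
```

Each stage consumes the output of the previous one, which is what lets a platform swap in field-specific tools without changing the overall flow.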

Cloud-based platform for analysis

BGI Cloud provides a web-based platform for users to perform bioinformatics analyses. The incorporation of multiple well-recognized databases and fully integrated data analysis workflows offers a great opportunity for users to analyze their data much more thoroughly. The well-designed and intuitive interface saves users from dealing with a terminal interface filled with complicated code and provides a convenient data management system. BGI Cloud provides a simple and convenient user experience that makes bioinformatics analysis easier to achieve.

Cloud-based data delivery

Cloud-based data delivery shortens the time needed to transmit data and lays the foundation for analysis in the cloud.

Species database construction

BGI Cloud offers construction of species databases in addition to sequencing and de novo genome assembly services. The species database displays multilevel information for a species, from genome to transcriptome and proteome. The species database also provides other key features, such as sequence search and data and results visualization capabilities. The constructed database is a complete research platform that facilitates rapid and efficient species-specific data analysis.

[Figure 2: Species database construction]

BGI Cloud has successfully constructed databases for a large number of species, including panda, silkworm, monkey, millet, snapdragon, oyster, and Pestalotiopsis (Figure 2). BGI Cloud has also constructed many comprehensive human databases, such as the cancer genome database (cancerdb.genomics.org.cn).

Core technologies

High-speed data exchange

The high-speed data exchange technology Aspera fasp™ is ~10-100 times faster than conventional FTP, accelerating your analysis and making it easier to send and receive genomics data from anywhere in the world. It truly makes “Big Genomics Data” exchange a reality.
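To get a feel for what a 10 to 100 times speed-up means in practice, the short calculation below estimates transfer times for a dataset. The 100 GB dataset size and the 10 Mbit/s effective FTP throughput are assumptions chosen purely for illustration; real throughput depends on the network path, and the helper name `transfer_hours` is hypothetical.

```python
def transfer_hours(size_gb, throughput_mbps):
    """Hours needed to move size_gb gigabytes at throughput_mbps megabits per second."""
    size_megabits = size_gb * 1000 * 8  # decimal units: 1 GB = 8000 megabits
    return size_megabits / throughput_mbps / 3600


dataset_gb = 100.0    # assumed dataset size
baseline_mbps = 10.0  # assumed effective FTP throughput

ftp_hours = transfer_hours(dataset_gb, baseline_mbps)
fast_hours_low = transfer_hours(dataset_gb, baseline_mbps * 10)    # 10x speed-up
fast_hours_high = transfer_hours(dataset_gb, baseline_mbps * 100)  # 100x speed-up

print(f"FTP baseline: {ftp_hours:.1f} h")
print(f"10x faster:   {fast_hours_low:.1f} h")
print(f"100x faster:  {fast_hours_high:.2f} h")
```

Under these assumptions, a transfer that takes most of a day over FTP drops to a couple of hours or less.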

Flexible green cloud computing

The flexible green cloud computing framework leverages Apache™ Hadoop™ MapReduce open-source software, which enables flexible and efficient distribution of computational tasks across a cluster of computer nodes. Built on this framework, SOAPhecate and SOAPgaea, two tools for genome de novo assembly and resequencing analysis, respectively, employ extremely effective algorithms for genome assembly, alignment, and variant detection, considerably reducing computing resource costs and running time. For example, with 96 Hecate cores, the genome coverage increases by 84% in 42 hours, resulting in a saving of 28 hours and a price reduction of over 30% compared with SOAPdenovo running on a single server.
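The MapReduce pattern that underlies this framework can be illustrated with a small, single-process sketch. It counts k-mers across reads in three steps (map, shuffle, reduce); on a real Hadoop cluster the map and reduce steps would run in parallel on different nodes. This is not how SOAPhecate or SOAPgaea are implemented; the function names and the toy reads are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_kmers(read, k):
    """Map step: emit a (k-mer, 1) pair for every k-mer in one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs):
    """Shuffle step: group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def count_kmers(reads, k):
    """Run map, shuffle, and reduce in sequence on a list of reads."""
    mapped = chain.from_iterable(map_kmers(read, k) for read in reads)
    return reduce_counts(shuffle(mapped))


reads = ["ACGTAC", "GTACGT"]  # hypothetical toy reads
counts = count_kmers(reads, 3)
print(counts)
```

Because each read is mapped independently and each key is reduced independently, both steps can be spread across as many nodes as are available, which is the property the framework exploits.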

High-performance GPU computing

High-performance graphics processing unit (GPU) computing is used to process large-scale bioinformatics datasets that involve complicated, intensive computation. GPU computing not only reduces hardware cost, energy consumption, and data center space, but it also achieves much higher performance and requires fewer research cycles than traditional cloud computing. GPU-based computing performance surpasses that of CPU-based computing, as shown in the figure below.
[Figure 3: GPU computing performance]

Data security

BGI is certified to ISO/IEC 27001 and protects customer data through its comprehensive data management system and rigorous encryption of account information and portable devices (Figure 4).
[Figure 4: Data security]