DNBseq TM SERVICE OVERVIEW Plant and Animal Whole Genome Re-Sequencing

Size: px
Start display at page:

Download "DNBseq TM SERVICE OVERVIEW Plant and Animal Whole Genome Re-Sequencing"

Transcription

1 TM SERVICE OVERVIEW Plant and Animal Whole Genome Re-Sequencing Plant and animal whole genome re-sequencing (WGRS) involves sequencing the entire genome of a plant or animal and comparing the sequence to that of a known reference genome. Re-sequencing of the plant and animal genome will identify genetic variations such as SNPs and Indels and discover other genetic changes of the sequenced species. It has been used for the identification of functional genes and markers of important traits to facilitate molecular breeding and to improve agricultural production and conservation. We care for your samples from the start through to the result reporting. Highly experienced laboratory professionals follow strict quality procedures to ensure the integrity of your results. BGI Plant and Animal Whole Genome Re-Sequencing services are executed with the sequencing technology. Sample Preparation and Services Library preparation 100bp paired-end sequencing Raw data, standard and customized data analysis Available data storage and bioinformatics applications Guaranteed 90% of clean bases with quality score of Q20 Standard sequencing coverage of 10-30X is recommended for the study of individuals and 5-10X for population studies T Typical 40 working days from sample QC acceptance to filtered raw data availability Expedited services are available.contact your local BGI specialist for details

2 Sequencing Technology is an innovative high-throughput sequencing solution, developed by BGI s Complete Genomics subsidiary in Silicon Valley. The system is powered by combinatorial Probe-Anchor Synthesis (cpas), linear isothermal Rolling-Circle Replication and DNA Nanoballs (DNB TM ) technology, followed by high-resolution digital imaging. The combination of linear amplification and DNB technology reduces the error rate while enhancing the signal. The size of the DNB is controlled in such a way that only one DNB is bound per active site on the flow cell. This densely patterned array technology provides optimal sequencing accuracy and increases flow cell utilization. Besides clean data output, BGI offers a range of standard and customized bioinformatics pipelines for your whole genome re-sequencing project. Reports and output data files are delivered in industry standard file formats: FASTQ, BAM, VCF,.xls,.png Advanced Analysis Data Filtering Alignment Statistics of data production Assembly of consensus sequences SNP calling, annotation and statistics Variation detection (Indel, SV, CNV) Population evolution analysis Point mutation detection (wild vs. mutant) Point mutation detection (MutMap) Linkage map construction and QTL mapping HapMap construction GWAS analysis BSA analysis Further customization of bioinformatics analysis to suit your unique project is available. Please contact your BGI technical representative for details.

3 We can process your gdna samples from a variety of species, with the following general requirements: Regular Samples Low Input Samples DNA Amount and Concentration Intact genomic DNA 1µg, Concentration 12.5ng/µl Intact genomic DNA 200ng, Concentration 2.5ng/µl Minimum Sample Volume 15µl 15µl 1 soybean sample and 1 mouse sample were used to validate the sequencing technology. sequencing datasets from 2 technical replicates of each species were generated and compared to datasets from the Illumina 4000 sequencing system. Libraries were prepared according to the manufacturer s protocols. The sequencing technology generated high quality PE100 data that is comparable to that from the 4000 (Table 1). At similar sequencing output between different platforms of each species, over 95% of bases score above Q20 while the genome mapping rate and the unique mapping rate of data and data are at the comparative level. Notably, the clean data rates of all replicates are higher than those of replicates, generating more filtered bases for the following analysis. Moreover, the duplication rates of reads of both species are constantly lower than those of reads. Since the duplicated reads are usually skipped for downstream variant analysis, the will generate more valid data for variant analysis than the 4000 at the same data output. Table 1. Data quality for soybean and mouse samples (averages from 2 technical replicates). Species Soybean Mouse Sequencer Raw data amount (Gb) Clean data amount (Gb) Clean rate (%) Clean read Q20 (%) Mapping rate (%) Unique mapping rate (%) Duplication rate (%) Mismatch rate (%) Average sequencing depth Coverage (%) Coverage at least 4X (%) Coverage at least 10X (%) Coverage at least 20X (%)

4 Mouse sample data performance 1 mouse liver DNA sample was sequenced by both and 4000 systems. 2 technical replicates for and 2 for are included in the experiment. Both and data show high mapping rates, while less bias is found in the read distribution of datasets at low GC content (Fig. 1, 2 and 3). Moreover, duplication rates are significantly lower (Fig. 1), due to less PCR bias from s rolling circle replication, resulting in more uniform distribution of reads for variant analysis. 100% 80% 60% 40% 20% _1 _2 0% Mapping Rate Unique Mapping Rate Duplication Rate Figure 1. Mapping rate and duplication rate of 2 datasets and 2 datasets with comparable data output of 70Gb. 100% 80% 60% 40% 20% _1 _2 0% Coverage 1x Coverage 4x Coverage 10x Coverage 20x Figure 2. Sequencing coverage of 2 datasets and 2 datasets with comparable data output of 70Gb. Figure 3. Normalized read coverage by GC content (duplication reads excluded). It shows a less biased distributionof reads at low GC content when compared to read distribution.

5 The demonstrated equivalent variant calling reproducibility to, with 93.63% and 76.64% for SNP and Indel calling, respectively. The concordant rates of SNP and Indel calling between platforms were 91.17% and 77.96%, respectively (Fig. 4 and Fig. 5). 147, % 4,624,513 Concordant SNPs 91.17% 300, % _1 4,639,014 4,772,466 _2 4,639,471 4,785,071 total SNPs 4,791,741 total SNPs 4,946, % 93.23% Figure 4. The SNP reproducibility and cross-platform consistency statistics from 2 replicates and 2 replicates. 194, % 1,209,499 Concordant Indels % 147, % _1 1,299,372 1,250,891 _2 1,307,108 1,269,891 total Indels 1,477,897 total Indels 1,414, % 78.25% Figure 5. The Indel reproducibility and cross-platform consistency statistics from 2 replicates and 2 replicates.

6 Request for Information or Quotation specific project requirements or for expert advice on experiment design, from sample to bioinformatics. International H BGI Americas One Broadway, 14th Floor Cambridge, MA 02142, USA Tel: BGI Europe Ole Maaløes Vej 3, DK-2200 Copenhagen N, Denmark Tel: BGI Asia-Pacific 16 Dai Fu Street, Tai Po Industrial Estate, New Territories, Hong Kong Tel: Copyright 2018 BGI. The BGI logo is a trademark of BGI. All rights reserved. All brand and product names are trademarks or registered trademarks of their respective holders. Information, descriptions and specifications in this publication are subject to change without notice. Published September All Services and Solutions are for research use only.