Next Generation Sequencing

Next Generation Sequencing Complete Report

Catalogue # and Service:
IR16001 rRNA depletion (human, mouse, or rat)
IR11081 Total RNA Sequencing (80 million reads, 2x75 bp PE)
Xxxxxxx - xxxxxxxxxxxxxxxxxxxxxx

Customer: OOOOO
Order #: OOOOO
Operator: OOOOO
Sequencer: NextSeq 500 (Illumina)
Completion Date: OOOOO

# Viking Way, Richmond BC, Canada V6V 2J5
Tel:
Fax:

Report Content

Service Summary
Quality Control of Received Samples
Library Construction
Cluster Generation and Sequencing
Data Access
Bioinformatics Analysis
  Overview
  Workflow
  Results
    Sequencing and mapping results
    Sample similarity analysis
    Principal component analysis
    Gene expression distribution
    Differential expression analysis
    Differential gene clustering
  Methods
  Files
  References
Acknowledgement

Service Summary

A total of 20 RNA samples were received, and their quality was checked with the Agilent 2100 Bioanalyzer. All samples passed abm's internal quality control. The samples were subjected to rRNA depletion, followed by fragmentation, first- and second-strand cDNA synthesis, adenylation of 3' ends, adapter ligation, DNA fragment enrichment, and real-time PCR quantification. The project was completed in one sequencing run, and BCL files were converted to FASTQ format immediately after the run. The paired-end reads for all samples (over 80 million) can be downloaded from the link provided in the Data Access section. The username and password are given below the link in the same section.

4 Quality Control of Received Samples RNA Integrity Number Check Agilent Bioanalyzer RNA 6000 Pico Kit Sample RIN Pass/Fail SRR Pass SRR Pass SRR Pass SRR Pass SRR Pass SRR Pass SRR Pass SRR Pass SRR Pass SRR Pass SRR Pass SRR Pass SRR Pass SRR Pass SRR Pass SRR Pass SRR Pass SRR Pass SRR Pass SRR Pass Page 4 of 15

Library Construction

Ribo-Zero Human/Mouse/Rat (Illumina)
  rRNA depletion
TruSeq Stranded mRNA LT (Illumina)
  Synthesize first-strand cDNA
  Synthesize second-strand cDNA
  Adenylate 3' ends
  Ligate adapters
  Enrich DNA fragments
Agilent 2100 Bioanalyzer (Agilent Technologies)
  Assess the size distribution of the amplified DNA
KAPA SYBR FAST qPCR Kit (KAPA Biosystems)
  Quantify library concentration

Sample | Library Size (Average)
SRR |
SRR |
SRR |
SRR |
SRR |
SRR |
SRR |
SRR |
SRR |
SRR |
SRR |
SRR |
SRR |
SRR |
SRR |
SRR |
SRR |
SRR |
SRR |
SRR |

Cluster Generation and Sequencing

NextSeq 500 (Illumina)
  Cluster generation and 2-channel sequencing
BCL to FASTQ generation (Illumina)
  Onboard instrument software / local server

Sample | Reads
SRR |
SRR |
SRR |
SRR |
SRR |
SRR |
SRR |
SRR |
SRR |
SRR |
SRR |
SRR |
SRR |
SRR |
SRR |
SRR |
SRR |
SRR |
SRR |
SRR |

Data Access

Please note that raw data are kept for a limited time. They will be removed from the FTP server 3 months after the release date. All data are available for download at:

ftp.abmgood.com
Username: OOOOOO
Password: *********

*Note: Do not copy and paste the username or password; type the login information manually. Downloading works best with an FTP client such as FileZilla.
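If you prefer a script to FileZilla, Python's standard `ftplib` module can perform the download. This is a minimal sketch with several assumptions:

- All files sit in a single flat directory on the server; subdirectories are not handled.
- The function name `download_all`, the remote directory, and the local folder name are hypothetical.
- The credentials are placeholders for the ones supplied with your order.

```python
"""Minimal FTP download sketch (assumes one flat remote directory)."""
import ftplib
from pathlib import Path, PurePosixPath


def download_all(ftp: ftplib.FTP, remote_dir: str, dest: Path) -> list[Path]:
    """Download every file listed in remote_dir into dest; return local paths."""
    dest.mkdir(parents=True, exist_ok=True)
    ftp.cwd(remote_dir)
    saved = []
    for name in ftp.nlst():
        # Some servers return full paths from NLST; keep only the file name.
        base = PurePosixPath(name).name
        if base in (".", ".."):
            continue
        local = dest / base
        with open(local, "wb") as fh:
            # Binary RETR transfer; each received chunk is written to the file.
            ftp.retrbinary(f"RETR {name}", fh.write)
        saved.append(local)
    return saved


# Example usage (placeholder credentials; type your own, as advised above):
# with ftplib.FTP("ftp.abmgood.com") as ftp:
#     ftp.login(user="YOUR_USERNAME", passwd="YOUR_PASSWORD")
#     download_all(ftp, "/", Path("abm_ngs_data"))
```

The function accepts an already logged-in connection rather than opening one itself. This keeps the credentials out of the helper and makes it easy to reuse with a different server.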

Bioinformatics Analysis

Overview
1. Workflow
2. Results
   2.1. Sequencing and mapping results
   2.2. Sample similarity analysis
   2.3. Principal component analysis
   2.4. Gene expression distribution
   2.5. Differential expression analysis
   2.6. Differential gene clustering
3. Methods
4. Data Files
5. References

1. Workflow

abm's in-house bioinformatics pipeline is updated as new technologies and algorithms become available. The RNA-seq workflow applies current mapping, transcript assembly, and read-counting algorithms (HISAT2, StringTie, and DESeq2; see Methods). These aim for better sensitivity and specificity than the traditional TopHat, Cufflinks, and Cuffdiff workflow.
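The per-sample part of this workflow can be sketched as a sequence of command lines. This is an illustration under stated assumptions, not abm's actual pipeline:

- The tool order follows the Methods section (FastQC, HISAT2, StringTie).
- The index prefix, annotation file, thread count, and output names are placeholders.
- The intermediate `samtools sort` step and HISAT2's `--dta` option are assumptions.
- The `Sample` type and `build_commands` helper are hypothetical.

```python
"""Hypothetical per-sample command lines for a HISAT2 + StringTie workflow.

Flags are standard options of each tool; abm's exact parameters are unknown.
"""
from dataclasses import dataclass


@dataclass
class Sample:
    name: str
    read1: str  # R1 FASTQ path
    read2: str  # R2 FASTQ path


def build_commands(s: Sample, index: str, gtf: str, threads: int = 8) -> list[list[str]]:
    t = str(threads)
    sam, bam = f"{s.name}.sam", f"{s.name}.sorted.bam"
    return [
        # 1. Read QC report for both mates (the "qc" directory must already exist).
        ["fastqc", "-o", "qc", s.read1, s.read2],
        # 2. Spliced alignment; --dta tailors alignments for transcript assemblers.
        ["hisat2", "-p", t, "--dta", "-x", index,
         "-1", s.read1, "-2", s.read2, "-S", sam],
        # 3. Coordinate-sort the alignments before assembly.
        ["samtools", "sort", "-@", t, "-o", bam, sam],
        # 4. Assemble transcripts (-o) and write per-gene abundances incl. FPKM/TPM (-A).
        ["stringtie", bam, "-p", t, "-G", gtf,
         "-o", f"{s.name}.gtf", "-A", f"{s.name}.gene_abund.tab"],
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)`. Building argument lists rather than shell strings avoids quoting problems with file names.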

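2. Results

2.1. Sequencing and mapping results

Table. Sequencing QC and mapping results

Sample Name | R1 Q20% | R2 Q20% | R1 Q30% | R2 Q30% | R1 GC% | R2 GC% | Mapping rate
SRR |  |  |  |  |  |  | %
SRR |  |  |  |  |  |  | %
SRR |  |  |  |  |  |  | %
SRR |  |  |  |  |  |  | %
SRR |  |  |  |  |  |  | %
SRR |  |  |  |  |  |  | %
SRR |  |  |  |  |  |  | %
SRR |  |  |  |  |  |  | %
SRR |  |  |  |  |  |  | %
SRR |  |  |  |  |  |  | %
SRR |  |  |  |  |  |  | %
SRR |  |  |  |  |  |  | %
SRR |  |  |  |  |  |  | %
SRR |  |  |  |  |  |  | %
SRR |  |  |  |  |  |  | %
SRR |  |  |  |  |  |  | %
SRR |  |  |  |  |  |  | %
SRR |  |  |  |  |  |  | %
SRR |  |  |  |  |  |  | %
SRR |  |  |  |  |  |  | %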
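The Q20%, Q30%, and GC% columns summarize base quality and base composition for each read file (R1 or R2). A minimal sketch of how such values can be computed from FASTQ text follows. It assumes Phred+33 quality encoding and counts a base toward Q20 or Q30 when its score is at least 20 or 30. The helper names are hypothetical, and FastQC's own report is organized differently.

```python
"""Per-file read QC metrics (Q20%, Q30%, GC%) from FASTQ text.

Assumptions: Phred+33 quality encoding; a base counts toward Q20/Q30
when its quality score is >= 20 / >= 30.
"""
from typing import Dict, Iterable, Iterator, Tuple


def fastq_records(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (sequence, quality) pairs from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield seq, qual


def qc_metrics(records: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Return percentages of bases at Q>=20, Q>=30, and G/C content."""
    bases = q20 = q30 = gc = 0
    for seq, qual in records:
        bases += len(seq)
        gc += sum(1 for b in seq.upper() if b in "GC")
        for ch in qual:
            q = ord(ch) - 33  # decode Phred+33
            q20 += int(q >= 20)
            q30 += int(q >= 30)
    if bases == 0:
        raise ValueError("no bases found")
    return {
        "Q20%": 100.0 * q20 / bases,
        "Q30%": 100.0 * q30 / bases,
        "GC%": 100.0 * gc / bases,
    }
```

For a gzipped file, pass the lines of `gzip.open("S1_R1.fastq.gz", "rt")` to `fastq_records` and run the result through `qc_metrics`; each mate file is summarized separately.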

2.2. Sample similarity analysis

Figure. Sample similarity based on gene expression. Darker shading indicates greater similarity between two samples. Similarity was computed from the edit distance between samples, using the expression value of each gene as a feature.
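The caption names edit distance, which is normally defined for strings. For numeric expression vectors, a common choice is Euclidean distance on log-transformed values. The sketch below assumes that metric and does not reproduce abm's exact computation.

- Each sample is treated as a vector of per-gene values.
- Pairwise distances are converted to a 0 to 1 similarity suitable for shading.
- The log2(x + 1) transform and the function names are assumptions.

```python
import numpy as np


def sample_distances(expr: np.ndarray) -> np.ndarray:
    """Pairwise Euclidean distances between samples.

    expr: genes x samples matrix of expression values (e.g., FPKM).
    Values are log2(x + 1) transformed first so that a few highly
    expressed genes do not dominate the distance (an assumption).
    """
    x = np.log2(np.asarray(expr, dtype=float) + 1.0).T  # samples x genes
    diff = x[:, None, :] - x[None, :, :]  # all pairwise differences
    return np.sqrt((diff ** 2).sum(axis=-1))


def similarity(dist: np.ndarray) -> np.ndarray:
    """Rescale distances to [0, 1] similarity (1 = identical) for shading."""
    if dist.max() == 0:
        return np.ones_like(dist)
    return 1.0 - dist / dist.max()
```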

2.3. Principal component analysis

Figure. Principal component analysis (PCA). PCA identifies components that capture successively smaller shares of the variance, with the first component capturing the most. How the samples cluster on the first two components usually indicates how much variance exists between samples.
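The report's PCA was run in R. The idea can be sketched in Python with a singular value decomposition of the centered samples x genes matrix. The squared singular values give each component's variance, already sorted from largest to smallest, which is why the first component carries the most variance. The log2(x + 1) transform and the function name `pca` are assumptions.

```python
import numpy as np


def pca(expr: np.ndarray, n_components: int = 2):
    """PCA of samples via SVD.

    expr: genes x samples matrix. Returns (scores, ratio) where scores is
    samples x n_components and ratio is the explained-variance ratio of
    every component, in decreasing order.
    """
    x = np.log2(np.asarray(expr, dtype=float) + 1.0).T  # samples x genes
    x = x - x.mean(axis=0)  # center each gene across samples
    u, s, _vt = np.linalg.svd(x, full_matrices=False)  # s is sorted descending
    var = s ** 2
    ratio = var / var.sum()
    scores = u[:, :n_components] * s[:n_components]  # sample coordinates
    return scores, ratio
```

Plotting `scores[:, 0]` against `scores[:, 1]` gives the familiar PC1/PC2 scatter. Labeling the axes with `ratio[0]` and `ratio[1]` shows how much of the variance the plot actually represents.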

2.4. Gene expression distribution

Figure. Distribution of gene expression based on FPKM values. Boxes span the 1st to the 3rd quartile, and the whiskers cover 95% of the values. FPKM and TPM values for each sample are provided in Data File 3.
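FPKM and TPM both normalize for feature length and sequencing depth, but in a different order. FPKM divides by depth and then by length; TPM divides by length first and rescales so that each sample sums to one million.

The sketch below uses the textbook count-based definitions. StringTie itself estimates these values from read coverage over assembled transcripts, so its numbers will differ. It also approximates total mapped fragments by the sum of the supplied counts. The box statistics assume the whiskers span the central 95% of values (2.5th to 97.5th percentile) on a log2(FPKM + 1) scale. All function names are hypothetical.

```python
import numpy as np


def fpkm(counts, lengths_bp):
    """FPKM = fragments * 1e9 / (length in bp * total mapped fragments).

    Assumption: total mapped fragments ~= sum of the supplied counts.
    """
    c = np.asarray(counts, dtype=float)
    length = np.asarray(lengths_bp, dtype=float)
    return c * 1e9 / (length * c.sum())


def tpm(counts, lengths_bp):
    """TPM: length-normalize first, then scale so the sample sums to 1e6."""
    rate = np.asarray(counts, dtype=float) / np.asarray(lengths_bp, dtype=float)
    return rate / rate.sum() * 1e6


def box_summary(values):
    """Box-plot statistics on log2(x + 1): quartiles for the box and the
    2.5th / 97.5th percentiles for whiskers covering the central 95%."""
    v = np.log2(np.asarray(values, dtype=float) + 1.0)
    lo, q1, med, q3, hi = np.percentile(v, [2.5, 25, 50, 75, 97.5])
    return {"whisker_low": lo, "q1": q1, "median": med,
            "q3": q3, "whisker_high": hi}
```

Because TPM values always sum to the same total in every sample, they are easier to compare across samples than FPKM.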

2.5. Differential expression analysis

Figure. Distribution of expression changes shown as an MA plot, which plots the log fold change (M) against the log average expression (A). Red dots indicate genes whose expression differs with an FDR-adjusted p-value < 0.1.
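An MA plot needs two numbers per gene:

- M, the log2 ratio between two conditions.
- A, the average of their log2 values.

The red-dot rule needs p-values adjusted for multiple testing with the Benjamini-Hochberg (BH) procedure. Below is a minimal sketch of both, using textbook definitions with a pseudocount of 1. DESeq2's own `plotMA` uses the mean of normalized counts and its own fold-change estimates, so this sketch only illustrates the idea, and the helper names are hypothetical.

```python
import numpy as np


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)  # p_(i) * n / i
    # Enforce monotonicity: running minimum from the largest p-value down.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adj = np.empty(n)
    adj[order] = np.minimum(ranked, 1.0)
    return adj


def ma_values(mean_a, mean_b, pseudocount=1.0):
    """Return (M, A): log2 fold change of B over A and log2 average."""
    a = np.log2(np.asarray(mean_a, dtype=float) + pseudocount)
    b = np.log2(np.asarray(mean_b, dtype=float) + pseudocount)
    return b - a, 0.5 * (a + b)


# Genes drawn in red on the MA plot: FDR-adjusted p-value below 0.1.
# significant = bh_adjust(pvalues) < 0.1
```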

2.6. Differential gene clustering

Figure. Clustering of genes with fold-change > 2 and FDR <
The list of differentially expressed genes can be found in Data File 4.

3. Methods

Read QC was performed with FastQC, and reads were aligned to the reference genome with HISAT2 [1]. Transcripts were assembled and expression levels estimated as FPKM and TPM with StringTie [2]. Read counts were normalized across samples with the DESeq2 algorithm, and differentially expressed genes were identified with DESeq2 [3] using fold-change >= 2 and FDR <
Statistical analyses for the PCA and MA plots were performed in R. The differentially expressed genes were functionally annotated with the KEGG [4], GO, and OMIM databases.
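DESeq2's default normalization is the median-of-ratios method, which proceeds in four steps:

1. Compute each gene's geometric mean across samples.
2. Divide each count by that mean.
3. Take each sample's median ratio as its size factor.
4. Divide raw counts by the size factors to put samples on a common scale.

The analysis itself ran in R. This Python sketch only re-expresses the idea, the function names are hypothetical, and genes with a zero count in any sample are skipped as in the original method.

```python
import numpy as np


def size_factors(counts) -> np.ndarray:
    """Median-of-ratios size factors for a genes x samples count matrix."""
    c = np.asarray(counts, dtype=float)
    expressed = np.all(c > 0, axis=1)  # skip genes with a zero in any sample
    log_c = np.log(c[expressed])
    log_geo_mean = log_c.mean(axis=1, keepdims=True)  # per-gene geometric mean (log)
    # Median over genes of log(count / geometric mean), one value per sample.
    return np.exp(np.median(log_c - log_geo_mean, axis=0))


def normalize(counts) -> np.ndarray:
    """Divide each sample's counts by its size factor."""
    c = np.asarray(counts, dtype=float)
    return c / size_factors(c)
```

Using a median makes the size factors robust to the minority of genes that are genuinely differentially expressed. A fold-change cutoff of 2 corresponds to |log2 fold change| >= 1 on the normalized scale.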

4. Data Files

Data File 1: FASTQ files
Data File 2: Assembled transcript structures (GTF format)
Data File 3: FPKM and TPM values for each gene (Excel file)
Data File 4: Differentially regulated genes (Excel file)

5. References

1. Kim D, Langmead B, Salzberg SL. HISAT: a fast spliced aligner with low memory requirements. Nature Methods.
2. Pertea M, Pertea GM, Antonescu CM, Chang TC, Mendell JT, Salzberg SL. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nature Biotechnology.
3. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology.
4. Kanehisa M, Sato Y, Kawashima M, Furumichi M, Tanabe M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Research.

Acknowledgement

Thank you for choosing abm as your sequencing service provider. Our goal is to provide you with the best customer experience in the world. Please do not hesitate to contact us if you need further assistance analyzing your data or with other next generation sequencing services. We are grateful to be part of your scientific exploration and look forward to serving you again.

Yours truly,
Jennie Kwan
abm NGS Program Director