CNV and variant detection for human genome resequencing data - for biomedical researchers (II)
1 CNV and variant detection for human genome resequencing data - for biomedical researchers (II) Chuan-Kun Liu 劉傳崑 Senior Manager, National Center for Genome Medicine bioit@ncgm.sinica.edu.tw
2 Abstract Common NGS Data Analysis Pipelines Data Output of NGS Platforms Quality Check and Read Trimming Sequence Alignment Variant Detection HiPipe Project
3 Common NGS Data Analysis Pipelines
4 Common NGS data analysis pipelines (DNA) Format Conversion and Demultiplexing (CASAVA) Quality Check (FastX / FastQC) Sequence Alignment (BWA) de novo Assembly (Velvet) Sequence Markduplicate (Sambamba) Variant calling (Freebayes) Sequence Realignment (GATK) Sequence Markduplicate (GATK) Variants Annotation (VarioWatch) Multi-sample Variant calling (GATK) LOH Detection (ExomeCNV or NCGM) CNV Detection (ExomeCNV) Translocation (SVDetect) Variants Annotation (VarioWatch) Visualization (Circos)
5 Common NGS data analysis pipelines (RNA) Format Conversion and Demultiplexing (CASAVA) Quality Check (FastX / FastQC) Sequence Alignment (Bowtie2) Sequence Alignment (Bowtie) de novo assembly (Trinity) Differential expression (Cufflinks) Sequence Realignment (GATK) Gene-fusion Detection (TopHat-fusion) miRNA expression analysis (miRDeep2) Visualization (CummeRbund) Variant calling (GATK) Gene Annotation (VarioWatch) Gene Annotation (VarioWatch) Variants Annotation (VarioWatch)
6 NGS data analysis pipelines (Others) ChIP RNA IP Methylation Aptamer Virus integration Metagenomics
7 Data Output of NGS Platforms
8 Data output of NGS platforms Illumina HiSeq Platform (BCL files) Roche GS FLX+ (Standard Flowgram Format (SFF) files) Thermo Fisher Ion Proton (FASTQ files) FASTQ PacBio RS II (FASTQ files)
9 Format Conversion and Demultiplexing
Illumina HiSeq 2000 (multiplexing BCL files) to FASTQ by samples
2:N:0:GCCAAT
ATGCAAATAAACTAGAAAATCTAGAAGAAATGGAGAAATTCCTGGACACAC
+
CCCFFFFFHHHGHJIIJJIJJJIIJJIJJIJJJFIIJAJJIJIJJJJI???
Phred Quality Score    Probability of incorrect base call    Base call accuracy
10                     1 in 10                               90%
20                     1 in 100                              99%
30                     1 in 1,000                            99.9%
40                     1 in 10,000                           99.99%
50                     1 in 100,000                          99.999%
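The Phred table on this slide follows from the standard definition Q = -10 * log10(P), where P is the probability that a base call is wrong. A minimal Python sketch of that conversion is given here; the function names are illustrative and do not come from the slides.

```python
import math


def phred_to_error_prob(q):
    """Probability that a base call with Phred score q is incorrect."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred score corresponding to an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Reproduce the rows of the Phred table for Q10 through Q50.
    for q in (10, 20, 30, 40, 50):
        p = phred_to_error_prob(q)
        print(f"Q{q}: 1 in {round(1 / p)} incorrect, accuracy {100 * (1 - p):.3f}%")
```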
10 ASCII code
11 Three described FASTQ variants
Description, OBF name                   ASCII characters (Range, Offset)    Quality score (Type, Range)
Sanger standard*, fastq-sanger          33-126, 33                          PHRED, 0 to 93
Solexa/early Illumina*, fastq-solexa    59-126, 64                          Solexa, -5 to 62
Illumina 1.3+*, fastq-illumina          64-126, 64                          PHRED, 0 to 62
Illumina 1.8+, fastq-sanger             33-126, 33                          PHRED, 0 to 93
*Nucleic Acids Res April; 38(6):
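As a small illustration of how the ASCII offset is used, the sketch below decodes the quality string shown on the earlier FASTQ slide, assuming the fastq-sanger convention (offset 33) used by Illumina 1.8+ output. The function name and the default offset are choices made for this example only.

```python
def decode_qualities(quality_string, offset=33):
    """Convert a FASTQ quality string to a list of integer quality scores.

    offset=33 corresponds to fastq-sanger / Illumina 1.8+;
    offset=64 corresponds to fastq-solexa and fastq-illumina (1.3+).
    """
    return [ord(ch) - offset for ch in quality_string]


# Quality line of the example read shown on the FASTQ slide.
quality_string = "CCCFFFFFHHHGHJIIJJIJJJIIJJIJJIJJJFIIJAJJIJIJJJJI???"
scores = decode_qualities(quality_string)

if __name__ == "__main__":
    print("lowest score:", min(scores))
    print("highest score:", max(scores))
    print("bases at or above Q30:", sum(s >= 30 for s in scores), "of", len(scores))
```

Decoding the same string with offset 64 would give negative PHRED scores, which is one quick hint that a file is not in an offset-64 format.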
12 Quality Check and Read Trimming
13 FastQC Project Home Quality Check Merge fastq files into one file Java environment Interactive UI
14 Trimming (When) When to trim reads Format conversion Before alignment Upon alignment Trimming Trimming by lane Trimming by reads
15 Trimming (How) How to trim reads Specify the bases to be trimmed (e.g. n5y*n5) Specify the lower bound threshold of base quality and trim
2:N:0:GCCAAT
ATGCAAATAAACTAGAAAATCTAGAAGAAATGGAGAAATTCCTGGACACAC
+
CCCFFFFFHHHGHJIIJJIJJJIIJJIJJIJJJFIIJAJJIJIJJJJI???
16 A tool for read trimming Trimmomatic : A flexible read trimming tool for Illumina NGS data Project Home Adaptor trimming Quality trimming Using Sliding Window strategy Cuts a read when the average base quality within a sliding window drops below the lower bound threshold
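The sliding-window idea can be sketched in a few lines of Python. This is a simplified illustration of the strategy described on the slide, not Trimmomatic's actual implementation; the function name, the window size of 4 and the threshold of 20 are assumptions chosen for the example.

```python
def sliding_window_trim(seq, quals, window=4, min_avg=20):
    """Cut a read at the first window whose mean quality is below min_avg.

    seq   : read bases as a string
    quals : per-base integer quality scores, same length as seq
    Returns the retained (seq, quals) prefix. Reads shorter than the
    window are returned unchanged in this simplified version.
    """
    if len(seq) != len(quals):
        raise ValueError("seq and quals must have the same length")
    for start in range(0, len(quals) - window + 1):
        window_scores = quals[start:start + window]
        if sum(window_scores) / window < min_avg:
            # Keep everything before the first failing window.
            return seq[:start], quals[:start]
    return seq, quals


if __name__ == "__main__":
    bases = "ACGTACGTAC"
    scores = [30, 30, 30, 30, 30, 30, 5, 5, 5, 5]
    print(sliding_window_trim(bases, scores))
```

Averaging over a window makes the cut less sensitive to a single low-quality base than a per-base threshold would be.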
17 Trimming Threshold Bases above Q30 85% (2 x 50 bp) 80% (2 x 100 bp) Q30 or Q24 or Q5? Is it necessary to trim reads?
18 Sequence Alignment
19 Tools for Sequence Alignment Open source BWA aln, BWA mem Bowtie, Bowtie2 STAR Commercial CASAVA, Isaac (Illumina) CLC Genomics Workbench
20 Human reference sequence NCBI36 (Aug, 2005) the last assembly produced by Human Genome Project (HGP) hg18, Ensembl release 54 GRCh37 (Feb, 2009) The first assembly submitted by Genome Reference Consortium (GRC) hg19, Ensembl release 55~75 (v75 Feb, 2014) GRCh38 (Dec, 2013) hg38, Ensembl release 76+ (v76 Aug, 2014)
21 GRCh37 Primary assembly Chromosome assembly (chr1-22, X, Y, Mt) Unlocalized sequence Unplaced sequence Alternate loci A sequence that provides an alternate representation of a locus found in a largely haploid assembly. (MHC region, UGT2B17, MAPT) Patches A contig sequence that is released outside of the full assembly release cycle.
22 Choose your reference genome Human_g1k_v37 The reference sequence provided by the 1000 Genomes Project* Unlocalized and unplaced sequences are included (full primary assembly) Mitochondrial sequence was replaced with the revised Cambridge Reference Sequence (rCRS; AC: NC_012920) (Apr 30, 2010)** Male or female *ftp://ftp.ncbi.nih.gov/1000genomes/ftp/technical/reference/readme.human_g1k_v37.fasta.txt **
23 SAM/BAM format Sequence Alignment/Map (SAM) format text file BAM format (binary file) to reduce the file size
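Because SAM is tab-delimited text, a single alignment record can be inspected with plain string handling. The sketch below splits the 11 mandatory SAM fields into a named tuple; the example record is invented for illustration, and in practice BAM files are read through a dedicated library or samtools rather than parsed by hand.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    """The 11 mandatory fields of a SAM alignment line."""
    qname: str   # read name
    flag: int    # bitwise flag
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # CIGAR string
    rnext: str   # reference name of the mate
    pnext: int   # position of the mate
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # base qualities (ASCII, offset 33)


def parse_sam_line(line):
    """Parse one non-header SAM line; optional tags are ignored."""
    f = line.rstrip("\n").split("\t")
    return SamRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]),
                     f[5], f[6], int(f[7]), int(f[8]), f[9], f[10])


# Hypothetical record, made up for this example.
example = "read1\t99\t1\t10468\t60\t5M\t=\t10600\t137\tACGTA\tIIIII"
rec = parse_sam_line(example)

if __name__ == "__main__":
    is_duplicate = bool(rec.flag & 0x400)  # 0x400 marks PCR/optical duplicates
    print(rec.rname, rec.pos, rec.cigar, "duplicate:", is_duplicate)
```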
24 Coverage Whole genome sequencing: mean coverage Whole exome sequencing: over 90% of the target region is covered at 0.2x mean coverage Custom panel: PCR-based (~ mean coverage), capture-based (~ whole exome sequencing)
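The two coverage measures on this slide can be written out as simple arithmetic. The sketch below assumes mean coverage is estimated as total sequenced bases divided by target size, and that per-base depths are already available as a list; the function names and all numbers are hypothetical.

```python
def mean_coverage(num_reads, read_length, target_size):
    """Expected mean coverage: total sequenced bases / target size."""
    return num_reads * read_length / target_size


def fraction_covered(depths, min_depth):
    """Fraction of target positions with depth >= min_depth."""
    return sum(d >= min_depth for d in depths) / len(depths)


if __name__ == "__main__":
    # Hypothetical whole genome run: 450 million pairs of 2 x 100 bp reads
    # against a genome of roughly 3.1 Gb.
    print(round(mean_coverage(2 * 450_000_000, 100, 3_100_000_000), 1))

    # Hypothetical per-base depths over a tiny exome target.
    depths = [0, 12, 35, 80, 95, 110, 120, 130, 140, 150]
    mean_depth = sum(depths) / len(depths)
    # Share of the target covered at 0.2x the mean coverage or better.
    print(fraction_covered(depths, 0.2 * mean_depth))
```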
25 TruSeq Exome Enrichment Kit
26 Variant Detection
27 DNA Variant Detection Variant Detection Variants Alignment Result Sequencing Reads Reference Sequence (GRCh37) A C G T
28 Variant Detection Input - BAM files Output - VCF files Tools Germline mutation Genome Analysis Toolkit (GATK) (MAF > 5%) FreeBayes Somatic mutation MuTect Somatic Variant Caller (MAF > 1%)
29 Genome Analysis Toolkit (GATK) Project Home Maintenance - Broad Institute License Version before Free for all users (the MIT license) Version after 2.4 Free for academics Fee for commercial use
30 GATK Best Practices 1. Pre-processing Mark duplicates Realign indels Recalibrate bases 2. Variant discovery Call variants Filter variants 3. Callset refinement Refine genotypes Annotate variants Evaluate variants
31 Mark Duplicates
AGGGAAACCACACAGGCTTCTTAGGCCATTGGAAT
  GGAAACCACACAGGCTT---AGGCCATTGGAA
   GAAACCACACAGGCTT---AGGCCATTGGAAT
    AAACCACACAGGCTT---AGGCCATTGG
    AAACCACACAGGCTT---AGGCCATTGG
       CCACACAGGCTT---AGGCCATTGGAA
        CACACAGGCTT---AGGCCATTGGAAT
32 Mark Duplicates The same DNA fragments may be sequenced several times The resulting duplicate reads are not informative and should not be counted as additional evidence for or against a putative variant The process only marks the reads; it does not remove them It should NOT be applied to amplicon sequencing data
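A toy version of duplicate marking is sketched below: reads that share chromosome, start position and strand are treated as copies of one fragment, the copy with the highest summed base quality is kept as the representative, and the rest are flagged but not removed. This illustrates the idea only and is not the algorithm used by GATK or Sambamba, which also consider mate positions and clipping. The dictionary keys and the data are invented for the example.

```python
from collections import defaultdict


def mark_duplicates(reads):
    """Return a list of booleans, True where a read is flagged as duplicate.

    reads: list of dicts with keys "chrom", "pos", "strand", "qual_sum".
    Reads are only flagged, never dropped, mirroring the slide.
    """
    groups = defaultdict(list)
    for idx, read in enumerate(reads):
        groups[(read["chrom"], read["pos"], read["strand"])].append(idx)

    flags = [False] * len(reads)
    for indices in groups.values():
        # Keep the read with the highest summed base quality as representative.
        best = max(indices, key=lambda i: reads[i]["qual_sum"])
        for i in indices:
            flags[i] = (i != best)
    return flags


# Hypothetical reads: the first two start at the same position and strand.
example_reads = [
    {"chrom": "1", "pos": 100, "strand": "+", "qual_sum": 900},
    {"chrom": "1", "pos": 100, "strand": "+", "qual_sum": 950},
    {"chrom": "1", "pos": 100, "strand": "-", "qual_sum": 800},
    {"chrom": "1", "pos": 250, "strand": "+", "qual_sum": 700},
]

if __name__ == "__main__":
    print(mark_duplicates(example_reads))
```

In amplicon data every read from an amplicon starts at the same position, so this grouping would flag almost all reads, which is why the step is skipped for that data type.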
33 Realign Indels (Before)
AGGGAAACCACACAGGCTTCTTAGGCCATTGGAAT
  GGAAACCACACAGGCTT---AGGCCATTGGAA
   GAAACCACACAGGC---TTAGGCCATTGGAAT
    AAACCACACAGGCT---TAGGCCATTGG
    AAACCACACAGGCTT---AGGCCATTGG
       CCACACAGGC---TTAGGCCATTGGAA
        CACACAGG---CTTAGGCCATTGGAAT
34 Realign Indels (After)
AGGGAAACCACACAGGCTTCTTAGGCCATTGGAAT
  GGAAACCACACAGGCTT---AGGCCATTGGAA
   GAAACCACACAGGCTT---AGGCCATTGGAAT
    AAACCACACAGGCTT---AGGCCATTGG
    AAACCACACAGGCTT---AGGCCATTGG
       CCACACAGGCTT---AGGCCATTGGAA
        CACACAGGCTT---AGGCCATTGGAAT
35 Realignment Reads that align on the edges of indels often get mapped with mismatching bases that might look like evidence for SNPs but are actually mapping artifacts.
36 FreeBayes Project Home Maintenance Erik Garrison, Gabor Marth License Free for all users (the MIT license) Pipeline using FreeBayes SpeedSeq ( Varpipe (
37 VCF format (1) Summary and description of fields (2) Basic info of variants (3) Variants info by individual
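The three parts of a VCF file named on this slide map onto the `##` meta lines, the eight fixed columns, and the per-sample genotype columns. A minimal sketch of splitting a VCF along those lines follows; the function name and the example records are made up for illustration, and real projects would normally use an established VCF library.

```python
def parse_vcf(lines):
    """Split VCF text into (meta lines, sample names, variant records)."""
    meta, samples, records = [], [], []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            # (1) Summary and description of fields
            meta.append(line)
        elif line.startswith("#CHROM"):
            # Column header; sample names start at the 10th column.
            samples = line.split("\t")[9:]
        elif line:
            f = line.split("\t")
            fmt_keys = f[8].split(":") if len(f) > 8 else []
            # (3) Variant info by individual, keyed by sample name
            genotypes = {
                sample: dict(zip(fmt_keys, value.split(":")))
                for sample, value in zip(samples, f[9:])
            }
            # (2) Basic info of the variant
            records.append({
                "chrom": f[0], "pos": int(f[1]), "id": f[2],
                "ref": f[3], "alt": f[4].split(","),
                "qual": f[5], "filter": f[6], "info": f[7],
                "genotypes": genotypes,
            })
    return meta, samples, records


# Hypothetical two-sample VCF content, invented for this example.
example_vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsampleA\tsampleB",
    "1\t12345\t.\tA\tG\t50\tPASS\tDP=40\tGT:DP\t0/1:22\t1/1:18",
]
meta, samples, records = parse_vcf(example_vcf)

if __name__ == "__main__":
    for rec in records:
        print(rec["chrom"], rec["pos"], rec["ref"], rec["alt"], rec["genotypes"])
```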
38 Output File Size
Human whole genome 30X: Raw reads (FASTQ) 100GB; Alignment results (BAM) 100GB; Variant detection results (VCF) 2GB; Functional analysis of variants (CSV) 1GB
Human whole exome 100X: Raw reads (FASTQ) 20GB; Alignment results (BAM) 20GB; Variant detection results (VCF) 0.1GB; Functional analysis of variants (CSV) 0.1GB
39 HiPipe Project
40 Why HiPipe? Analyzing NGS data is hard for researchers with a biology background: Most analysis tools run only in a Linux environment Few websites provide proper analysis Many tools are applicable only to small genomes It takes a long time to get a result Completing an analysis requires transferring large amounts of data between websites Orchestrating different analysis tools is error-prone
41 User Expectations for NGS Easy to learn and use analysis tools Researchers without computer science background can complete an analysis themselves Doing analysis anywhere, anytime Suitable for any genome size Without file size limitation Upload sequence data upon read sequencing Get an analysis result rapidly Combine different experiment data in one analysis
42 Challenges Familiarity with bioinformatics tools ( 行中 亭均 ) Job scheduling and pipeline development (Louis) Cluster and distributed system management ( 方智 ) Performance tuning Software ( 耀德 仁屏 ) Hardware (MIS Team) Bandwidth Disk I/O User Interface ( 爾瞻 萬嘉 中饋 ) Team coordination ( 傳崑 ) Adam Yao
43 Architecture
44 HiPipe (Basic)
45 HiPipe (Basic) Online since Jun, 2013 Provides 7 popular NGS data analysis pipelines: Whole Genome & Exome Variant Detection Differential Expression Analysis Gene Fusion Detection miRNA Analysis RNA Variant Detection De novo Assembly Exome CNV Detection
46 HiPipe Professional Authentication / Authorization Web-based interface supporting unlimited upload file size and resumable file upload Multi-sample indel realignment and variant detection Cloud / Onsite Providing integrated NGS data analysis and data store service
47 HiPipe Basic vs HiPipe Professional
Product lines: HiPipe, HiPipe Professional
Editions: HiPipe Basic, HiPipe Cloud, HiPipe Cluster, HiPipe Quad, HiPipe Duo, HiPipe Uno
Single sample analysis
Multi-sample analysis
Data storage location: Cloud, Cloud, Local, Local, Local, Local
Nodes
CPU cores
X human whole genome variant detection: 56 mins, 56 mins, 56 mins, 7 hrs, ~12 hrs, ~36 hrs
Expected ship date: 2013 Q Q
48 Authentication / Authorization 1. Select a project 2. Add or remove members of the project
49 Upload files through a browser Unlimited upload file size and resumable file upload
50 Create Project / Analysis
51 Configure settings
52 Choose samples
53 Save settings
54 Confirm settings
55 Launch analysis
56 Multi-sample analysis result Multi-sample indel realignment
57 Integrative Genomics Viewer 1. Integrate and visualize NGS, microarray and annotation data in genomics 2. Load data only in specified regions to reduce physical memory usage 3. Customize the attributes of samples for further filtering 4. Browse multiple regions at the same time 5. It's FREE
58 Browse VCF and BAM in IGV
59 VCF Track I The allele fraction A C G T
60 VCF Track II Heterozygous Homozygous Reference A C G T
61 Zoom in to see alignment results Thorvaldsdóttir H et al. Brief Bioinform 2013;14: The Author(s) Published by Oxford University Press.
62 Browse alignment result in larger region Set to 200kb for browsing most of genes Status Memory usage
63 VarioWatch Functional analysis of Variants
64 VarioWatch Visualized result
65 Performance (Human)
Analysis         Estimated Analysis Time    Note
Whole Genome     56 minutes                 40x Coverage
Exome            10 minutes                 100x Coverage
Transcriptome    1-3 days                   12 * 2 Gb (Paired Sample)
de novo          24 hours                   100 Gb, K-mer = 73
66 NGS data analysis service
Analysis Pipeline                              Samples
DNA    Whole Genome Variant Detection          135
DNA    Whole Exome Variant Detection           1241
DNA    CNV Detection                           68
DNA    LOH Detection                           47
DNA    de novo assembly                        3
RNA    Differential Expression Analysis        153
RNA    Gene-fusion Detection                   32
RNA    miRNA expression analysis               14
RNA    Variant Detection                       43
RNA    de novo assembly                        4
Other  ChIP                                    25
Other  Aptamer                                 25
Other  RNA IP                                  15
Other  Virus integration                       15
Total (2011/10 ~ 2015/06)                      1820
67 Applying for NGS data analysis service Contact us and discuss details in the Pre-Service Meeting Create accounts for the LIMS system and the issue tracking system (JIRA) Create an issue for data analysis in the issue tracking system (JIRA) Confirm the analysis pipeline and the estimated price Transfer the raw data (raw reads) NGS data analysis Provide NGS data analysis results (preserved for 6 months) Confirm the final price Close the case and charge through the LIMS system Discuss results in the Post-Service Meeting
68 HiPipe Basic (for free) HiPipe Cloud beta: apply for a trial account Thanks for your attention!
More informationTranscriptome analysis
Statistical Bioinformatics: Transcriptome analysis Stefan Seemann seemann@rth.dk University of Copenhagen April 11th 2018 Outline: a) How to assess the quality of sequencing reads? b) How to normalize
More informationComparing a few SNP calling algorithms using low-coverage sequencing data
Yu and Sun BMC Bioinformatics 2013, 14:274 RESEARCH ARTICLE Open Access Comparing a few SNP calling algorithms using low-coverage sequencing data Xiaoqing Yu 1 and Shuying Sun 1,2* Abstract Background:
More informationRNAseq Applications in Genome Studies. Alexander Kanapin, PhD Wellcome Trust Centre for Human Genetics, University of Oxford
RNAseq Applications in Genome Studies Alexander Kanapin, PhD Wellcome Trust Centre for Human Genetics, University of Oxford RNAseq Protocols Next generation sequencing protocol cdna, not RNA sequencing
More informationVariant Finding. UCD Genome Center Bioinformatics Core Wednesday 30 August 2016
Variant Finding UCD Genome Center Bioinformatics Core Wednesday 30 August 2016 Types of Variants Adapted from Alkan et al, Nature Reviews Genetics 2011 Why Look For Variants? Genotyping Correlation with
More informationGenomic Data Analysis Services Available for PL-Grid Users
Domain-oriented services and resources of Polish Infrastructure for Supporting Computational Science in the European Research Space PLGrid Plus Domain-oriented services and resources of Polish Infrastructure
More informationAccelerate precision medicine with Microsoft Genomics
Accelerate precision medicine with Microsoft Genomics Copyright 2018 Microsoft, Inc. All rights reserved. This content is for informational purposes only. Microsoft makes no warranties, express or implied,
More information14 March, 2016: Introduction to Genomics
14 March, 2016: Introduction to Genomics Genome Genome within Ensembl browser http://www.ensembl.org/homo_sapiens/location/view?db=core;g=ensg00000139618;r=13:3231547432400266 Genome within Ensembl browser
More informationNextSeq 500 System WGS Solution
NextSeq 500 System WGS Solution An accessible, high-quality whole-genome sequencing solution for any species. Highlights High-Quality, High-Coverage Genome Illumina chemistry offers highest read quality
More informationUsing New ThiNGS on Small Things. Shane Byrne
Using New ThiNGS on Small Things Shane Byrne Next Generation Sequencing New Things Small Things NGS Next Generation Sequencing = 2 nd generation of sequencing 454 GS FLX, SOLiD, GAIIx, HiSeq, MiSeq, Ion
More informationQuality assurance in NGS (diagnostics)
Quality assurance in NGS (diagnostics) Chris Mattocks National Genetics Reference Laboratory (Wessex) Research Diagnostics Quality assurance Any systematic process of checking to see whether a product
More informationIntroducing combined CGH and SNP arrays for cancer characterisation and a unique next-generation sequencing service. Dr. Ruth Burton Product Manager
Introducing combined CGH and SNP arrays for cancer characterisation and a unique next-generation sequencing service Dr. Ruth Burton Product Manager Today s agenda Introduction CytoSure arrays and analysis
More informationFast and Accurate Variant Calling in Strand NGS
S T R A ND LIF E SCIENCE S WH ITE PAPE R Fast and Accurate Variant Calling in Strand NGS A benchmarking study Radhakrishna Bettadapura, Shanmukh Katragadda, Vamsi Veeramachaneni, Atanu Pal, Mahesh Nagarajan
More informationWhy QC? Next-Generation Sequencing: Quality Control. Illumina data format. Fastq format:
Why QC? Next-Generation Sequencing: Quality Control BaRC Hot Topics January 2017 Bioinformatics and Research Computing Whitehead Institute Do you want to include the reads with low quality base calls?
More informationVariation detection based on second generation sequencing data. Xin LIU Department of Science and Technology, BGI
Variation detection based on second generation sequencing data Xin LIU Department of Science and Technology, BGI liuxin@genomics.org.cn 2013.11.21 Outline Summary of sequencing techniques Data quality
More information2nd (Next) Generation Sequencing 2/2/2018
2nd (Next) Generation Sequencing 2/2/2018 Why do we want to sequence a genome? - To see the sequence (assembly) To validate an experiment (insert or knockout) To compare to another genome and find variations
More informationStructural variation analysis using NGS sequencing
Structural variation analysis using NGS sequencing Victor Guryev NBIC NGS taskforce meeting April 15th, 2011 Scale of genomic variants Scale 1 bp 10 bp 100 bp 1 kb 10 kb 100 kb 1 Mb Variants SNPs Short
More informationNext-Generation Sequencing: Quality Control
Next-Generation Sequencing: Quality Control Bingbing Yuan BaRC Hot Topics January 2017 Bioinformatics and Research Computing Whitehead Institute http://barc.wi.mit.edu/hot_topics/ Why QC? Do you want to
More informationWhite Paper GENALICE MAP: Variant Calling in a Matter of Minutes. Bas Tolhuis, PhD - GENALICE B.V.
White Paper GENALICE MAP: Variant Calling in a Matter of Minutes Bas Tolhuis, PhD - GENALICE B.V. White Paper GENALICE MAP Variant Calling GENALICE BV May 2014 White Paper GENALICE MAP Variant Calling
More informationGalaxy Workshop
Galaxy Workshop 1-8-13 Intros: Tom Bair thomas-bair@uiowa.edu Ann Black-Ziegelbein annblack@eng.uiowa.edu Srinivas Maddhi srinivas-maddhi@uiowa.edu What is galaxy good for Access to resources Documentation
More informationVariant Detection in Next Generation Sequencing Data. John Osborne Sept 14, 2012
+ Variant Detection in Next Generation Sequencing Data John Osborne Sept 14, 2012 + Overview My Bias Talk slanted towards analyzing whole genomes using Illumina paired end reads with open source tools
More informationAnalysis Datasheet Exosome RNA-seq Analysis
Analysis Datasheet Exosome RNA-seq Analysis Overview RNA-seq is a high-throughput sequencing technology that provides a genome-wide assessment of the RNA content of an organism, tissue, or cell. Small
More informationUAB DNA-Seq Analysis Workshop. John Osborne Research Associate Centers for Clinical and Translational Science
+ UAB DNA-Seq Analysis Workshop John Osborne Research Associate Centers for Clinical and Translational Science ozborn@uab.,edu + Thanks in advance You are the Guinea pigs for this workshop! At this point
More information