Bioinformatics for High Throughput Sequencing

Eric Rivals, LIRMM & IBC, Montpellier

High Throughput Sequencing or Next Generation Sequencing

Overview of techniques
Name                          Read length  Time    Gb/run  Pros / cons
454 GS Flex                                hours   0.7     long reads
Illumina HiSeq                2×           hours   120     short reads / cost
SOLiD (LifeSc)                85           8 days  150     long run time
Ion Proton                                 hours   100     new
PacBio (Pacific Biosciences)                               high error rate

HTS output: an example
One human RNA library: 75 million reads of 100 bp each. Analysis reveals that it represents more than 140,000 splice events in 16,000 expressed genes.
Bottleneck: bioinformatic analysis of the reads.

What can life scientists do with NGS assays?

Domains of application
- bio-molecular research
- biotechnology (e.g. biofuels)
- biodiversity monitoring
- personalised medicine
- epidemiological surveillance
- pharmacogenomics
- personal genomics
- forensics
- agronomy (animal and plant research)

Biological questions
- sample genomic variations in a population of individuals
- detect genotype differences related to a disease
- measure variations in gene expression and identify RNA variants
- study replication, transcription or translation processes
- interrogate protein binding sites over the whole genome or on RNAs
- assess epigenetic modifications of the genome (3D structure)
- estimate the fitness contribution of each gene in bacteria
- identify genes involved in pathogenicity or adaptation
- study gene interactions and their role in regulatory or metabolic pathways
- survey the species or assess the biodiversity of an environment
- list the bio-molecular functions or processes active in an environmental sample

Remarks on sequencing-based assays
These types of questions and assays predate NGS, but NGS made them cheaper, high-throughput, and genome-wide. Genome-wide is the major qualitative change: no predefined target and no prior knowledge are required, so potentially all sites are scrutinized.

Two situations in genomics
1. A reference genome is available: map the reads on the genome.
2. No reference genome: assemble the reads to obtain the genome, or compare several read sets.

A pattern matching primer

Outline
1. The problem
2. Text indexing approach
3. Filtration approach

Pattern matching
1. a text T of length n
2. a pattern M of length m
3. generally m << n
Goal: find all positions of T at which M occurs.
Example: M = tgtg, T = ctgtgtgtacatgtg
Solution (1-based starting positions): {2, 4, 12}

Pattern matching (cont.)
For one read: slide a window of length m along T and compare it with M at each position. How can this be done for millions of reads?
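A minimal Python sketch of the window-by-window comparison, run on the example pattern M = tgtg and text T = ctgtgtgtacatgtg. It reports 1-based starting positions, as in the solution above; the function name is illustrative only.

```python
def naive_match(text: str, pattern: str) -> list[int]:
    """Return the 1-based start positions of exact occurrences of pattern in text.

    Each of the n - m + 1 windows is compared symbol by symbol, so the
    worst-case cost is about (n - m + 1) * m comparisons, i.e. proportional to n*m.
    """
    n, m = len(text), len(pattern)
    hits = []
    for i in range(n - m + 1):                        # about n windows
        j = 0
        while j < m and text[i + j] == pattern[j]:    # up to m comparisons per window
            j += 1
        if j == m:
            hits.append(i + 1)                        # 1-based, as on the slide
    return hits


print(naive_match("ctgtgtgtacatgtg", "tgtg"))  # [2, 4, 12]
```

Note that windows overlap: the occurrences at 2 and 4 share two symbols, which is exactly what the linear-time algorithms exploit.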

Naive and more involved algorithms
Naive algorithm: for each of about n windows, perform m pairwise symbol comparisons; total time proportional to n × m.
Linear-time solutions: exploit the comparisons made on one window to speed up the overlapping windows (Boyer-Moore or Knuth-Morris-Pratt algorithms, 1970s); total time proportional to n + m.
Limitations: a single query, and exact matching only.
Answers: text indexing and filtration approaches.
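The linear-time idea can be illustrated with Knuth-Morris-Pratt. This is a generic textbook-style sketch, not code from the talk: a failure table built from the pattern records how much of a partial match can be reused after a mismatch, so the text is scanned once and the total work stays proportional to n + m.

```python
def failure_function(pattern: str) -> list[int]:
    """fail[i] = length of the longest proper prefix of pattern[:i+1] that is also its suffix."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_match(text: str, pattern: str) -> list[int]:
    """Knuth-Morris-Pratt: 1-based start positions of pattern in text, in O(n + m) time."""
    m = len(pattern)
    if m == 0:
        return []
    fail = failure_function(pattern)
    hits = []
    k = 0  # number of pattern symbols currently matched
    for i, c in enumerate(text):
        while k > 0 and c != pattern[k]:
            k = fail[k - 1]          # reuse what the overlapping window already matched
        if c == pattern[k]:
            k += 1
        if k == m:
            hits.append(i - m + 2)   # end index i (0-based) -> 1-based start
            k = fail[k - 1]
    return hits


print(kmp_match("ctgtgtgtacatgtg", "tgtg"))  # [2, 4, 12]
```

Each text symbol is read once and the pointer `k` can only fall back as far as it previously advanced, which bounds the total work by a constant times n + m.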

Multiple pattern matching with a text index
Matching in two steps:
1. Preprocess the text T in O(n) time: build and store a data structure, the index, which supports exact search queries.
2. Search for each pattern in the index in O(m) time (optimal).

Text indexing data structures
For a text of length n, a good index:
1. occupies O(n) memory
2. is built in O(n) time
3. enables exact search of a motif of length m in O(m) time
Three historical structures:
1. compact suffix tree [Weiner 73, McCreight 76, Ukkonen 92]
2. suffix array: construction in O(n) [Kärkkäinen & Sanders 03]
3. DAWG (Directed Acyclic Word Graph) [Blumer et al. 85]
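The suffix array, the second structure listed, fits in a few lines of Python. Assumptions: this sketch builds the array by plainly sorting the suffixes (fine for a toy text, far from the linear-time Kärkkäinen & Sanders construction) and answers a query with two binary searches, i.e. O(m log n) time rather than the O(m) of an optimal index.

```python
def build_suffix_array(text: str) -> list[int]:
    """Start positions (0-based) of the suffixes of text, in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def sa_search(text: str, sa: list[int], pattern: str) -> list[int]:
    """Sorted 1-based positions of pattern, found by binary search on the suffix array."""
    m = len(pattern)
    # First rank whose suffix, truncated to m symbols, is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # First rank whose truncated suffix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    # All suffixes between the two ranks start with the pattern.
    return sorted(sa[r] + 1 for r in range(start, lo))


T = "ctgtgtgtacatgtg"
sa = build_suffix_array(T)      # preprocessing, done once per text
print(sa_search(T, sa, "tgtg"))  # [2, 4, 12]
```

The occurrences of a pattern always form one contiguous block of the suffix array, which is why two binary searches are enough.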

Breakthrough in text indexing
With historical index structures:
1. you need both the text and the index,
2. both in main memory, to keep searches fast.
Around 2000 came compressed self-indexing structures:
1. a self-index replaces both the text and the classical index,
2. its size can be adjusted to the available memory.
Examples:
1. Burrows-Wheeler Transform or FM-index [Ferragina & Manzini 00]
2. compressed k-mer indexes [Philippe et al. 11]
3. minimum-information de Bruijn graphs for assembly [Li 09, Chikhi & Rizk 13]
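A toy illustration of the self-index idea: the Burrows-Wheeler Transform alone, plus small count tables, is enough to count the occurrences of a pattern by backward search, without keeping the original text. This sketch builds the BWT by sorting rotations and recomputes occurrence counts by scanning, both of which a real FM-index avoids with compressed, sampled structures; `$` is assumed to be a sentinel absent from the text.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler Transform via sorted rotations (quadratic, toy inputs only)."""
    s = text + "$"  # sentinel, sorts before letters in ASCII
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def fm_count(bwt_str: str, pattern: str) -> int:
    """Count occurrences of pattern by FM-index backward search over the BWT only."""
    # C[c]: number of symbols in the text that are smaller than c
    C, total = {}, 0
    for c in sorted(set(bwt_str)):
        C[c] = total
        total += bwt_str.count(c)

    def occ(c: str, i: int) -> int:
        # occurrences of c in bwt_str[:i]; a real FM-index samples these counts
        return bwt_str.count(c, 0, i)

    lo, hi = 0, len(bwt_str)  # rows [lo, hi) of the sorted rotations matching so far
    for c in reversed(pattern):  # extend the match one symbol to the left
        if c not in C:
            return 0
        lo = C[c] + occ(c, lo)
        hi = C[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


print(bwt("banana"))                   # annb$aa
print(fm_count(bwt("banana"), "ana"))  # 2
```

Backward search touches each pattern symbol once, so with constant-time `occ` lookups the query costs O(m), independent of the text length.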

Personalized Medicine

Personalised medicine
Wikipedia: emphasizes "the systematic use of information about an individual patient to select or optimize that patient's preventative and therapeutic care."
US Congress definition: "the application of genomic and molecular data to better target the delivery of health care, facilitate the discovery and clinical testing of new products, and help determine a person's predisposition to a particular disease or condition."

Abnormal chromosome pool in cancer
[Figure: normal human karyotype vs. blood cancer (leukemia) karyotype]

Abnormal chromosome pool in cancer (cont.)
Karyotype analysis is used for:
- diagnosis of chronic myelogenous leukemia (CML)
- prognosis in myelodysplastic syndrome
[Figure: blood cancer (leukemia) karyotype]

Leukemia with gene fusion
[Figure: a chromosomal translocation creating a fusion gene]

From translocated gene to fusion RNA

Personalised medicine for chronic myelogenous leukemia
Test in the bone marrow: is the BCR-ABL t(9;22) fusion RNA present? Used for:
1. diagnosis
2. monitoring disease recurrence
3. treatment follow-up
What if the test fails because of human genetic variability?
- another form of the BCR-ABL fusion is produced?
- other, still unknown, aberrant RNAs are involved in this cancer?

What we need...
Monitoring all active genes, i.e. RNAs, in a cell: very fast, at low cost, and from limited cell material.
High-throughput transcriptomics (RNA-seq): determining which genomic regions are transcribed and activated in a cell, and at which expression level.

Detection needs sensitivity and specificity.

Locating reads on a reference sequence: mapping

A definition of mapping
Locating or mapping reads: for each read, find its location of origin on the reference genome.
How? Use the sequence similarity between the read and the reference: approximate pattern matching or alignment.
Differences in sequence come from:
1. sequencing errors
2. genetic variability, within and between individuals
3. splicing of the RNA compared with the DNA sequence

Mapping for genomics, transcriptomics, or epigenomics
For each read, find all genomic positions at which the read matches the genome, exactly or approximately, on both strands (+/-).
Results: is a read located? Once or more than once?
- unmapped: not found
- uniquely mapped: mapped at a single genomic location
- multiply mapped: mapped at several genomic locations
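A small sketch of these result categories, under simplifying assumptions: only exact matches are searched (real mappers also allow mismatches and indels), Python's `str.find` stands in for a genome index, and the minus strand is handled by searching the reverse complement of the read. The genome and reads are made up for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def exact_locations(genome: str, read: str) -> list[tuple[int, str]]:
    """All (1-based position, strand) pairs where the read occurs exactly.

    A palindromic read (equal to its reverse complement) is reported on both strands.
    """
    hits = []
    for strand, query in (("+", read), ("-", reverse_complement(read))):
        start = genome.find(query)
        while start != -1:
            hits.append((start + 1, strand))
            start = genome.find(query, start + 1)  # overlapping occurrences too
    return hits


def classify(genome: str, read: str) -> str:
    n_loc = len(exact_locations(genome, read))
    if n_loc == 0:
        return "unmapped"
    if n_loc == 1:
        return "uniquely mapped"
    return "multiply mapped"


genome = "GATTACAGGCCTTGATTACA"  # hypothetical toy genome
for read in ("CAGGCC", "GATTACA", "TGTAATC", "AAAAAA"):
    print(read, classify(genome, read), exact_locations(genome, read))
```

Here `GATTACA` occurs twice on the + strand, and `TGTAATC` (its reverse complement) is found at the same two places on the - strand, so both are multiply mapped.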

Bottleneck of mapping
Typical data volume:
- 3 Gbp for the Human genome sequence
- 50 million reads, each 100 bases long
Main issue: scalability in memory and time, given the data flow, especially in sequencing centers.
How?
- indexing the genome sequence to answer pattern matching queries
- filtration algorithms for fast alignment
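A back-of-the-envelope check of these volumes, assuming the naive cost of about n × m symbol comparisons per read from the pattern matching primer. It shows why scanning the genome once per read is out of the question.

```python
# Typical volumes from the slide
GENOME_BP = 3_000_000_000   # Human genome, about 3 Gbp
N_READS = 50_000_000        # 50 million reads
READ_LEN = 100              # 100 bases per read

total_bases = N_READS * READ_LEN    # 5 Gbp of read sequence
ratio = total_bases / GENOME_BP     # read volume relative to genome size
# Naive search: about n * m symbol comparisons per read, repeated for every read
naive_comparisons = GENOME_BP * READ_LEN * N_READS

print(f"read sequence: {total_bases:.1e} bases ({ratio:.2f}x the genome size)")
print(f"naive comparisons: {naive_comparisons:.1e}")
```

This prints 5.0e+09 bases (1.67x the genome size) and 1.5e+19 comparisons. With an index built once on the genome, each read instead costs time proportional to its own length.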

Mapping programs

Mapping comparison
Data: Human K562 cancer cell line RNA-seq library, 12 million reads, 75 bp long.
[Table: percentage of mapped reads, unique and multiple, for Bowtie, BWA, SOAP2 and exact matching]

High Throughput Sequencing & transcriptomics

HTS transcriptomics: RNA-seq
RNA-seq: monitoring gene activation in cells, cataloguing and discovering RNAs.
Bottlenecks: bioinformatic processing, big data and scalability issues.

Mapping of RNA-seq reads
Goal: find the alignments of the read with regions of the genome.
[Figure: a spliced read aligned across two exons]

Typical analysis information flow
Multi-step analysis pipeline:
1. mapping (genomic locations only)
2. coverage (distinguishing errors from biological events)
3. prediction of candidates (mutations, splicing, etc.)
Limitations:
- late distinction between sequencing errors and mutations
- no control over false negatives and false positives in mapping
- almost no backtracking, hence error propagation

Remedy / solution?
Integrate all information at once, in a single program.

CRAC

CRAC: a tool for analyzing genomic or transcriptomic reads
Case where a reference genome is available.
Inputs:
- the indexed genome sequence
- the set of reads (FASTQ qualities are not used)
- an integer parameter k: the length of k-mers
Questions:
- genomic localization: a single position, a few, or many
- sequencing errors: position and nature of the error
- mutations: position and type (substitutions and indels)
- exon-exon junctions
- rearrangements or chimeric RNAs
- borders of repeats

CRAC principle I: k-mer profiling
Read: CTAGTTTTATACTTTAGGGGTAAGCAGTGGAAAGTTAGAGTTCGGAGCTGTTTATTGAGGGCAGGGGAAGAATGT
Location profile of its k-mers: 16 located k-mers, then 22 k-mers not located, then 16 located k-mers.
Error or mutation?
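The counts on this slide (16 + 22 + 16 = 54 k-mers for a 75-nt read) imply k = 22, since a read of length m has m − k + 1 k-mers, and a 22-k-mer break fits a single substitution in the middle of the read. The sketch below reproduces that profile under loud assumptions: a hypothetical random toy genome, and a Python dictionary of k-mer positions standing in for CRAC's compressed genome index.

```python
import random
from collections import defaultdict
from itertools import groupby


def kmer_positions(genome: str, k: int) -> dict[str, list[int]]:
    """Toy genome index: each k-mer mapped to its 0-based start positions."""
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index


def location_profile(read: str, index: dict, k: int) -> list[int]:
    """Number of genome locations of each of the m - k + 1 k-mers of the read."""
    return [len(index.get(read[i:i + k], [])) for i in range(len(read) - k + 1)]


def summarize(profile: list[int]) -> list[tuple[bool, int]]:
    """Run-length summary of the profile: (located?, number of consecutive k-mers)."""
    return [(located, len(list(run))) for located, run in groupby(c > 0 for c in profile)]


rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(2000))  # hypothetical toy genome
k = 22
read = list(genome[500:575])                               # a 75-nt read
read[37] = next(b for b in "ACGT" if b != read[37])        # one substitution
read = "".join(read)

profile = location_profile(read, kmer_positions(genome, k), k)
print(summarize(profile))  # [(True, 16), (False, 22), (True, 16)]
```

The profile alone cannot tell whether the substituted base is a sequencing error or a mutation; that is the role of the support signal.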

CRAC principle I: genomic location
From the genomic locations of the k-mers, you get:
- the read location
- the differences with the genome
However, mapping alone does not distinguish genetic variations from errors.

CRAC principle II: genetic variation
A sequencing error occurs in a read and affects only that read.
A genetic variation affects all reads covering its position.
[Figure: error or mutation? With a polymorphism, all reads covering the position incorporate the mutation; with an error, only one read carries it.]

Support: a proxy for local coverage
Definition: the support of a k-mer is the number of reads containing that k-mer (at least once).
The support approximates the local coverage of the read set.
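A minimal sketch of the support computation, assuming a plain Python `Counter` in place of the compressed index on the reads used by CRAC. Each read contributes at most once to a k-mer's support, as in the definition. In the toy reads below (k = 4), the third read is the first one with a hypothetical sequencing error at its fourth base, so the k-mers covering that base have support 1.

```python
from collections import Counter


def kmer_support(reads: list[str], k: int) -> Counter:
    """Support of each k-mer: the number of reads that contain it at least once."""
    support = Counter()
    for read in reads:
        # A set, so a k-mer repeated inside one read counts once for that read.
        support.update({read[i:i + k] for i in range(len(read) - k + 1)})
    return support


def support_profile(read: str, support: Counter, k: int) -> list[int]:
    """Support of each k-mer of the read, from left to right."""
    return [support[read[i:i + k]] for i in range(len(read) - k + 1)]


reads = ["ACGTACGT", "CGTACGTA", "ACGAACGT"]  # hypothetical toy reads
support = kmer_support(reads, 4)
print(support_profile(reads[0], support, 4))  # [3, 2, 2, 2, 3]
print(support_profile(reads[2], support, 4))  # [1, 1, 1, 1, 3]
```

`ACGT` occurs twice in the first read but has support 3, one per read containing it; the drop to 1 in the third read's profile is the kind of fall that points to an error rather than a shared variant.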

CRAC: idea
For each read, CRAC jointly analyzes two signals for each k-mer:
- its location on the genome, i.e. its matching locations and their number
- its support: the number of reads sharing this k-mer
How? On the fly, using two indexes:
- a compressed Burrows-Wheeler Transform of the genome
- a generalized k-factor table built on all reads [Philippe et al., 2011]

CRAC profiles: sequence error vs mutation (m = 50, k = 20)
Read: CGGCTGTGTATTACTGTGCGAGAGTCGGGGGAGATTACTATGATAGTAGT
Legend: blue dots, support (left scale); red crosses, number of genome locations (right scale).

CRAC profiles: sequence error vs mutation (m = 50, k = 20), continued
Reads: CGGCTGTGTATTACTGTGCGAGAGTCGGGGGAGATTACTATGATAGTAGT and CTGGACCCCCTGGACATGCCCTGCACAACCATCCCCTCCGCGCCCCAGGC

Profile analysis: rules for a single cause
Length of the location break:
- substitution: k
- deletion or splice junction: k − 1
- insertion: k + p − 1, with p the length of the insertion
Issues:
- suppress isolated random locations
- compare the support levels to the left and right of a break with the support inside it
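The break lengths can be checked on a toy example. This sketch assumes a hypothetical random genome (seeded for reproducibility), k = 22, and exact k-mer lookups in a Python set standing in for the genome index. The bases around each event are chosen so that no k-mer spanning it happens to match the genome at the neighbouring position, which would shorten the break. Counting this way, an insertion of p nucleotides leaves k + p − 1 k-mers without a location.

```python
import random


def genome_kmers(genome: str, k: int) -> set[str]:
    """Set of all k-mers of the genome (stands in for a real genome index)."""
    return {genome[i:i + k] for i in range(len(genome) - k + 1)}


def break_length(read: str, kmers: set[str], k: int) -> int:
    """Number of k-mers of the read that have no exact location in the genome."""
    return sum(read[i:i + k] not in kmers for i in range(len(read) - k + 1))


def other_base(*avoid: str) -> str:
    """A nucleotide different from all the given ones."""
    return next(b for b in "ACGT" if b not in avoid)


rng = random.Random(1)
genome = "".join(rng.choice("ACGT") for _ in range(2000))  # hypothetical toy genome
k, j = 22, 540          # k-mer length; the event happens just before genome[j]
kmers = genome_kmers(genome, k)
left = genome[500:j]    # 40 nt of the read upstream of the event

# Substitution of genome[j]
substitution = left + other_base(genome[j]) + genome[j + 1:600]

# Deletion of d nt (same signature as a splice junction), with differing junction bases
d = next(d for d in range(5, 40)
         if genome[j] != genome[j + d] and genome[j - 1] != genome[j + d - 1])
deletion = left + genome[j + d:j + d + 60]

# Insertion of p = 3 nt whose end bases differ from the flanking genome bases
inserted = other_base(genome[j]) + "A" + other_base(genome[j - 1])
insertion = left + inserted + genome[j:600]

print(break_length(substitution, kmers, k))  # 22 = k
print(break_length(deletion, kmers, k))      # 21 = k - 1
print(break_length(insertion, kmers, k))     # 24 = k + p - 1
```

The k − 1 for a deletion comes from the k-mers that straddle the junction: they must contain both the base just before it and the base just after it, which leaves k − 1 possible starts.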

Support variation: SNV vs error
When the location profile of a read shows a k-mer break, analyze its support profile:
- SNV: the support is stable (e.g. 30 reads share the k-mer starting here)
- error: the support is variable (only one read contains this erroneous k-mer)

Random locations
[Figure: a read on the genome; beside the expected break, false (random) locations of some k-mers create mirage breaks]

Classification process of a read
CRAC analyzes each read according to:
- its location profile (P-loc), computed with the FM-index
- its support profile, computed with the Gk arrays
Mapping: unique or duplicated, multiple, or no location.
No location break: no mutation.
Location break(s), then support:
- support falls: sequencing error
- no fall: biological event (SNV, insertion, deletion, splice or chimera)
- ambiguous: undetermined

CRAC results

Mapping results on simulated data: Human, 42 M reads of 75 bp
[Figure: percentage of uniquely mapped reads for Bowtie, BWA, CRAC, GASSST, GSNAP and SOAP]

Mapping results on simulated data: Human, 48 M reads of 200 bp
[Figure: percentage of uniquely mapped reads for Bowtie, BWA-SW, CRAC, GASSST, GSNAP and SOAP]

Splice junction prediction on simulated data
[Table: sensitivity and precision on 75 bp and 200 bp reads for CRAC, GSNAP, MapSplice and TopHat]

Memory & time
Data: 42 M reads of 75 bp; the same number of processors for every program.
Computing time in minutes (m), hours (h) or days (d):
Program     Time
Bowtie      7h
BWA         6h
GASSST      5h
SOAP2       40m
CRAC        9h
GSNAP       2d
MapSplice   4h
TopHat      12h
Memory (GB): values not recovered.

Splice junction detection on real data (Human)
[Figure: agreement between tools on known RefSeq splice junctions]

Reads spanning several exons and junctions
Example: a read overlapping exons 2 to 5 of the TIMM50 gene (Human). CRAC can detect several successive splice junctions in a single read.

Candidate fusion RNAs in four breast cancer libraries
Data [Edgren et al. 2011]: 4 cancer cell lines, RNA-seq, 50 million reads of 50 nt.
CRAC and TopHat-fusion find 20 and 21, respectively, of the 28 validated fusion RNAs.
[Table: number of reported fusion RNA candidates for CRAC and TopHat-fusion in the BT, KPL, MCF and SK-BR libraries]
CRAC reports 36 fusion candidates that recur in 2 libraries; 35 of these 36 have the same junction point.
No recurrent fusion RNAs were found in the original study.
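The sensitivities implied by these counts can be computed directly, treating the 28 validated fusion RNAs as the set of true events. Precision would additionally require the number of false candidates, which these figures do not give.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of true events recovered by a tool: TP / (TP + FN)."""
    return true_positives / (true_positives + false_negatives)


VALIDATED = 28  # validated fusion RNAs [Edgren et al. 2011]
found = {"CRAC": 20, "TopHat-fusion": 21}

for tool, tp in found.items():
    # the validated fusions a tool misses are its false negatives
    print(f"{tool}: {sensitivity(tp, VALIDATED - tp):.1%}")
# CRAC: 71.4%
# TopHat-fusion: 75.0%
```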

Conclusion

Take home
- integration of location and support information
- prediction of multiple events
- k-mer profiling gives better mapping, especially for spliced reads
- CRAC sensitivity improves with read length
- detailed information on each read

Conclusions
- NGS assays pervade many domains of biology and are exploited for numerous and diverse studies.
- Bioinformatics analysis is the current bottleneck.
- The scalability challenge has been solved so far... thanks to text indexing algorithms.
- Data integration helps prioritize candidates.

CRAC publication & views: software available on the ATGC platform.

Funding and acknowledgments
MAB team, in particular B. Cazaux, M. Hébrard, V. Maillol, V. Lefort.
MASTODONS SePhHaDe project.
Thanks for your attention. Questions?

A few references
- CRAC: N. Philippe, M. Salson, T. Commes, E. Rivals. CRAC: an integrated approach to the analysis of RNA-seq reads. Genome Biology 14:R30, 2013.
- Filtration and indexing for similarity searches: S. Burkhardt, A. Crauser, P. Ferragina, H.-P. Lenhof, E. Rivals, M. Vingron. q-gram Based Database Searching Using a Suffix Array (QUASAR). Proc. of the 3rd International Conference on Computational Molecular Biology (RECOMB 99), ACM Press.
- Index data structures: D. Gusfield's book, Cambridge University Press. V. Mäkinen, G. Navarro. Compressed Text Indexing. Encyclopedia of Algorithms, Springer-Verlag. N. Välimäki, E. Rivals. Scalable and Versatile k-mer Indexing for High-Throughput Sequencing Data. ISBRA, LNBI 7875.

Supplements

Tools for RNA-seq
To detect splice junctions:
- TopHat (v1 & 2) [Trapnell et al., 2009]
- MapSplice [Wang et al., 2010]
- GSNAP [Wu & Nacu, 2010]
- CRAC [Philippe et al., 2013]
To detect fusion RNAs:
- MapSplice [Wang et al., 2010]: single reads
- TopHat-fusion [McPherson et al., 2011]: single reads
- FusionSeq [Sboner et al., 2010]: paired reads
- FusionHunter [Li et al., 2011]: paired reads
- CRAC [Philippe et al., 2013]: single & paired reads

Simulated data: CRAC predictions by category
[Figure, panels (B) and (C): percentage of causes found for SNVs, insertions, deletions, splices, chimeras and errors]


More information

Next Generation Sequencing. Tobias Österlund

Next Generation Sequencing. Tobias Österlund Next Generation Sequencing Tobias Österlund tobiaso@chalmers.se NGS part of the course Week 4 Friday 13/2 15.15-17.00 NGS lecture 1: Introduction to NGS, alignment, assembly Week 6 Thursday 26/2 08.00-09.45

More information

Introduction to human genomics and genome informatics

Introduction to human genomics and genome informatics Introduction to human genomics and genome informatics Session 1 Prince of Wales Clinical School Dr Jason Wong ARC Future Fellow Head, Bioinformatics & Integrative Genomics Adult Cancer Program, Lowy Cancer

More information

Assay Validation Services

Assay Validation Services Overview PierianDx s assay validation services bring clinical genomic tests to market more rapidly through experimental design, sample requirements, analytical pipeline optimization, and criteria tuning.

More information

BIOINFORMATICS ORIGINAL PAPER

BIOINFORMATICS ORIGINAL PAPER BIOINFORMATICS ORIGINAL PAPER Vol. 27 no. 14 2011, pages 1922 1928 doi:10.1093/bioinformatics/btr310 Sequence analysis Advance Access publication May 18, 2011 FusionMap: detecting fusion genes from next-generation

More information

Sample to Insight. Dr. Bhagyashree S. Birla NGS Field Application Scientist

Sample to Insight. Dr. Bhagyashree S. Birla NGS Field Application Scientist Dr. Bhagyashree S. Birla NGS Field Application Scientist bhagyashree.birla@qiagen.com NGS spans a broad range of applications DNA Applications Human ID Liquid biopsy Biomarker discovery Inherited and somatic

More information

SNP calling and VCF format

SNP calling and VCF format SNP calling and VCF format Laurent Falquet, Oct 12 SNP? What is this? A type of genetic variation, among others: Family of Single Nucleotide Aberrations Single Nucleotide Polymorphisms (SNPs) Single Nucleotide

More information

High-Throughput Bioinformatics: Re-sequencing and de novo assembly. Elena Czeizler

High-Throughput Bioinformatics: Re-sequencing and de novo assembly. Elena Czeizler High-Throughput Bioinformatics: Re-sequencing and de novo assembly Elena Czeizler 13.11.2015 Sequencing data Current sequencing technologies produce large amounts of data: short reads The outputted sequences

More information

Course Presentation. Ignacio Medina Presentation

Course Presentation. Ignacio Medina Presentation Course Index Introduction Agenda Analysis pipeline Some considerations Introduction Who we are Teachers: Marta Bleda: Computational Biologist and Data Analyst at Department of Medicine, Addenbrooke's Hospital

More information

Supplementary Information Supplementary Figures

Supplementary Information Supplementary Figures Supplementary Information Supplementary Figures Supplementary Figure 1. Frequency of the most highly recurrent gene fusions in 333 prostate cancer patients from the TCGA. The Y-axis shows numbers of patients.

More information

G E N OM I C S S E RV I C ES

G E N OM I C S S E RV I C ES GENOMICS SERVICES ABOUT T H E N E W YOR K G E NOM E C E N T E R NYGC is an independent non-profit implementing advanced genomic research to improve diagnosis and treatment of serious diseases. Through

More information

RNA-sequencing. Next Generation sequencing analysis Anne-Mette Bjerregaard. Center for biological sequence analysis (CBS)

RNA-sequencing. Next Generation sequencing analysis Anne-Mette Bjerregaard. Center for biological sequence analysis (CBS) RNA-sequencing Next Generation sequencing analysis 2016 Anne-Mette Bjerregaard Center for biological sequence analysis (CBS) Terms and definitions TRANSCRIPTOME The full set of RNA transcripts and their

More information

BST 226 Statistical Methods for Bioinformatics David M. Rocke. March 10, 2014 BST 226 Statistical Methods for Bioinformatics 1

BST 226 Statistical Methods for Bioinformatics David M. Rocke. March 10, 2014 BST 226 Statistical Methods for Bioinformatics 1 BST 226 Statistical Methods for Bioinformatics David M. Rocke March 10, 2014 BST 226 Statistical Methods for Bioinformatics 1 NGS Technologies Illumina Sequencing HiSeq 2500 & MiSeq PacBio Sequencing PacBio

More information

TECH NOTE Stranded NGS libraries from FFPE samples

TECH NOTE Stranded NGS libraries from FFPE samples TECH NOTE Stranded NGS libraries from FFPE samples Robust performance with extremely degraded FFPE RNA (DV 200 >25%) Consistent library quality across a range of input amounts (5 ng 50 ng) Compatibility

More information

RNA-Seq. Joshua Ainsley, PhD Postdoctoral Researcher Lab of Leon Reijmers Neuroscience Department Tufts University

RNA-Seq. Joshua Ainsley, PhD Postdoctoral Researcher Lab of Leon Reijmers Neuroscience Department Tufts University RNA-Seq Joshua Ainsley, PhD Postdoctoral Researcher Lab of Leon Reijmers Neuroscience Department Tufts University joshua.ainsley@tufts.edu Day five Alternative splicing Assembly RNA edits Alternative splicing

More information

Bioinformatics Advice on Experimental Design

Bioinformatics Advice on Experimental Design Bioinformatics Advice on Experimental Design Where do I start? Please refer to the following guide to better plan your experiments for good statistical analysis, best suited for your research needs. Statistics

More information

Introduction to RNAseq Analysis. Milena Kraus Apr 18, 2016

Introduction to RNAseq Analysis. Milena Kraus Apr 18, 2016 Introduction to RNAseq Analysis Milena Kraus Apr 18, 2016 Agenda What is RNA sequencing used for? 1. Biological background 2. From wet lab sample to transcriptome a. Experimental procedure b. Raw data

More information

Background Wikipedia Lee and Mahadavan, JCB, 2009 History (Platform Comparison) P Park, Nature Review Genetics, 2009 P Park, Nature Reviews Genetics, 2009 Rozowsky et al., Nature Biotechnology, 2009

More information

RNA-SEQUENCING ANALYSIS

RNA-SEQUENCING ANALYSIS RNA-SEQUENCING ANALYSIS Joseph Powell SISG- 2018 CONTENTS Introduction to RNA sequencing Data structure Analyses Transcript counting Alternative splicing Allele specific expression Discovery APPLICATIONS

More information

NUCLEOTIDE RESOLUTION STRUCTURAL VARIATION DETECTION USING NEXT- GENERATION WHOLE GENOME RESEQUENCING

NUCLEOTIDE RESOLUTION STRUCTURAL VARIATION DETECTION USING NEXT- GENERATION WHOLE GENOME RESEQUENCING NUCLEOTIDE RESOLUTION STRUCTURAL VARIATION DETECTION USING NEXT- GENERATION WHOLE GENOME RESEQUENCING Ken Chen, Ph.D. kchen@genome.wustl.edu The Genome Center, Washington University in St. Louis The path

More information

Systematic evaluation of spliced alignment programs for RNA- seq data

Systematic evaluation of spliced alignment programs for RNA- seq data Systematic evaluation of spliced alignment programs for RNA- seq data Pär G. Engström, Tamara Steijger, Botond Sipos, Gregory R. Grant, André Kahles, RGASP Consortium, Gunnar Rätsch, Nick Goldman, Tim

More information

Eucalyptus gene assembly

Eucalyptus gene assembly Eucalyptus gene assembly ACGT Plant Biotechnology meeting Charles Hefer Bioinformatics and Computational Biology Unit University of Pretoria October 2011 About Eucalyptus Most valuable and widely planted

More information

RNA-Seq with the Tuxedo Suite

RNA-Seq with the Tuxedo Suite RNA-Seq with the Tuxedo Suite Monica Britton, Ph.D. Sr. Bioinformatics Analyst September 2015 Workshop The Basic Tuxedo Suite References Trapnell C, et al. 2009 TopHat: discovering splice junctions with

More information

Transcriptomics analysis with RNA seq: an overview Frederik Coppens

Transcriptomics analysis with RNA seq: an overview Frederik Coppens Transcriptomics analysis with RNA seq: an overview Frederik Coppens Platforms Applications Analysis Quantification RNA content Platforms Platforms Short (few hundred bases) Long reads (multiple kilobases)

More information

COMPARISON OF GENE FUSION DETECTION TOOLS TO DETECT NOVEL GENE FUSIONS USING A CUSTOM ANNOTATION

COMPARISON OF GENE FUSION DETECTION TOOLS TO DETECT NOVEL GENE FUSIONS USING A CUSTOM ANNOTATION COMPARISON OF GENE FUSION DETECTION TOOLS TO DETECT NOVEL GENE FUSIONS USING A CUSTOM ANNOTATION - current state - 17.02.2017 Carolin Schimmelpfennig c.schimmelpfennig@izi.fraunhofer.de Fraunhofer What

More information

The Genome Analysis Centre. Building Excellence in Genomics and Computa5onal Bioscience

The Genome Analysis Centre. Building Excellence in Genomics and Computa5onal Bioscience Building Excellence in Genomics and Computa5onal Bioscience Resequencing approaches Sarah Ayling Crop Genomics and Diversity sarah.ayling@tgac.ac.uk Why re- sequence plants? To iden

More information

Higher Human Biology Unit 1: Human Cells Pupils Learning Outcomes

Higher Human Biology Unit 1: Human Cells Pupils Learning Outcomes Higher Human Biology Unit 1: Human Cells Pupils Learning Outcomes 1.1 Division and Differentiation in Human Cells I can state that cellular differentiation is the process by which a cell develops more

More information

Transcriptome analysis

Transcriptome analysis Statistical Bioinformatics: Transcriptome analysis Stefan Seemann seemann@rth.dk University of Copenhagen April 11th 2018 Outline: a) How to assess the quality of sequencing reads? b) How to normalize

More information

de novo paired-end short reads assembly

de novo paired-end short reads assembly 1/54 de novo paired-end short reads assembly Rayan Chikhi ENS Cachan Brittany Symbiose, Irisa, France 2/54 THESIS FOCUS Graph theory for assembly models Indexing large sequencing datasets Practical implementation

More information

Experimental Design. Sequencing. Data Quality Control. Read mapping. Differential Expression analysis

Experimental Design. Sequencing. Data Quality Control. Read mapping. Differential Expression analysis -Seq Analysis Quality Control checks Reproducibility Reliability -seq vs Microarray Higher sensitivity and dynamic range Lower technical variation Available for all species Novel transcript identification

More information

Mapping strategies for sequence reads

Mapping strategies for sequence reads Mapping strategies for sequence reads Ernest Turro University of Cambridge 21 Oct 2013 Quantification A basic aim in genomics is working out the contents of a biological sample. 1. What distinct elements

More information

Deep Sequencing technologies

Deep Sequencing technologies Deep Sequencing technologies Gabriela Salinas 30 October 2017 Transcriptome and Genome Analysis Laboratory http://www.uni-bc.gwdg.de/index.php?id=709 Microarray and Deep-Sequencing Core Facility University

More information

Bioinformatics Monthly Workshop Series. Speaker: Fan Gao, Ph.D Bioinformatics Resource Office The Picower Institute for Learning and Memory

Bioinformatics Monthly Workshop Series. Speaker: Fan Gao, Ph.D Bioinformatics Resource Office The Picower Institute for Learning and Memory Bioinformatics Monthly Workshop Series Speaker: Fan Gao, Ph.D Bioinformatics Resource Office The Picower Institute for Learning and Memory Schedule for Fall, 2015 PILM Bioinformatics Web Server (09/21/2015)

More information

Introduction to Bioinformatics and Gene Expression Technologies

Introduction to Bioinformatics and Gene Expression Technologies Introduction to Bioinformatics and Gene Expression Technologies Utah State University Fall 2017 Statistical Bioinformatics (Biomedical Big Data) Notes 1 1 Vocabulary Gene: hereditary DNA sequence at a

More information

Introduction to Bioinformatics and Gene Expression Technologies

Introduction to Bioinformatics and Gene Expression Technologies Vocabulary Introduction to Bioinformatics and Gene Expression Technologies Utah State University Fall 2017 Statistical Bioinformatics (Biomedical Big Data) Notes 1 Gene: Genetics: Genome: Genomics: hereditary

More information

NGS in Pathology Webinar

NGS in Pathology Webinar NGS in Pathology Webinar NGS Data Analysis March 10 2016 1 Topics for today s presentation 2 Introduction Next Generation Sequencing (NGS) is becoming a common and versatile tool for biological and medical

More information

Illumina Genome Analyzer. Progenika Experience. - Susana Catarino -

Illumina Genome Analyzer. Progenika Experience. - Susana Catarino - Illumina Genome Analyzer Progenika Experience - Susana Catarino - Who are we? 2000 PROGENIKA BIOPHARMA Development, production and commercialization of new genomic tools for diagnosis, prognosis and drug-response

More information

Introducing combined CGH and SNP arrays for cancer characterisation and a unique next-generation sequencing service. Dr. Ruth Burton Product Manager

Introducing combined CGH and SNP arrays for cancer characterisation and a unique next-generation sequencing service. Dr. Ruth Burton Product Manager Introducing combined CGH and SNP arrays for cancer characterisation and a unique next-generation sequencing service Dr. Ruth Burton Product Manager Today s agenda Introduction CytoSure arrays and analysis

More information

Variant calling workflow for the Oncomine Comprehensive Assay using Ion Reporter Software v4.4

Variant calling workflow for the Oncomine Comprehensive Assay using Ion Reporter Software v4.4 WHITE PAPER Oncomine Comprehensive Assay Variant calling workflow for the Oncomine Comprehensive Assay using Ion Reporter Software v4.4 Contents Scope and purpose of document...2 Content...2 How Torrent

More information

Haploid Assembly of Diploid Genomes

Haploid Assembly of Diploid Genomes Haploid Assembly of Diploid Genomes Challenges, Trials, Tribulations 13 October 2011 İnanç Birol Assembly By Short Sequencing IEEE InfoVis 2009 2 3 in Literature ~40 citations on tool comparisons ~20 citations

More information

Analysis of data from high-throughput molecular biology experiments Lecture 6 (F6, RNA-seq ),

Analysis of data from high-throughput molecular biology experiments Lecture 6 (F6, RNA-seq ), Analysis of data from high-throughput molecular biology experiments Lecture 6 (F6, RNA-seq ), 2012-01-26 What is a gene What is a transcriptome History of gene expression assessment RNA-seq RNA-seq analysis

More information

Agilent NGS Solutions : Addressing Today s Challenges

Agilent NGS Solutions : Addressing Today s Challenges Agilent NGS Solutions : Addressing Today s Challenges Charmian Cher, Ph.D Director, Global Marketing Programs 1 10 years of Next-Gen Sequencing 2003 Completion of the Human Genome Project 2004 Pyrosequencing

More information

Introduction to Bioinformatics

Introduction to Bioinformatics Introduction to Bioinformatics Alla L Lapidus, Ph.D. SPbSU St. Petersburg Term Bioinformatics Term Bioinformatics was invented by Paulien Hogeweg (Полина Хогевег) and Ben Hesper in 1970 as "the study of

More information

University of Athens - Medical School. pmedgr. The Greek Research Infrastructure for Personalized Medicine

University of Athens - Medical School. pmedgr. The Greek Research Infrastructure for Personalized Medicine University of Athens - Medical School pmedgr The Greek Research Infrastructure for Personalized Medicine - George Kollias - Professor of Experimental Physiology, Medical School, University of Athens President

More information

TruSPAdes: analysis of variations using TruSeq Synthetic Long Reads (TSLR)

TruSPAdes: analysis of variations using TruSeq Synthetic Long Reads (TSLR) tru TruSPAdes: analysis of variations using TruSeq Synthetic Long Reads (TSLR) Anton Bankevich Center for Algorithmic Biotechnology, SPbSU Sequencing costs 1. Sequencing costs do not follow Moore s law

More information

Bioinformatics: Sequence Analysis. COMP 571 Luay Nakhleh, Rice University

Bioinformatics: Sequence Analysis. COMP 571 Luay Nakhleh, Rice University Bioinformatics: Sequence Analysis COMP 571 Luay Nakhleh, Rice University Course Information Instructor: Luay Nakhleh (nakhleh@rice.edu); office hours by appointment (office: DH 3119) TA: Leo Elworth (DH

More information

Form for publishing your article on BiotechArticles.com this document to

Form for publishing your article on BiotechArticles.com  this document to Your Article: Article Title (3 to 12 words) Article Summary (In short - What is your article about Just 2 or 3 lines) Category Transcriptomics sequencing and lncrna Sequencing Analysis: Quality Evaluation

More information

FFPE in your NGS Study

FFPE in your NGS Study FFPE in your NGS Study Richard Corbett Canada s Michael Smith Genome Sciences Centre Vancouver, British Columbia Dec 6, 2017 Our mandate is to advance knowledge about cancer and other diseases and to use

More information

Performance comparison of five RNA-seq alignment tools

Performance comparison of five RNA-seq alignment tools New Jersey Institute of Technology Digital Commons @ NJIT Theses Theses and Dissertations Spring 2013 Performance comparison of five RNA-seq alignment tools Yuanpeng Lu New Jersey Institute of Technology

More information

Pioneering Clinical Omics

Pioneering Clinical Omics Pioneering Clinical Omics Clinical Genomics Strand NGS An analysis tool for data generated by cutting-edge Next Generation Sequencing(NGS) instruments. Strand NGS enables read alignment and analysis of

More information

DNA METHYLATION RESEARCH TOOLS

DNA METHYLATION RESEARCH TOOLS SeqCap Epi Enrichment System Revolutionize your epigenomic research DNA METHYLATION RESEARCH TOOLS Methylated DNA The SeqCap Epi System is a set of target enrichment tools for DNA methylation assessment

More information

Applications of short-read

Applications of short-read Applications of short-read sequencing: RNA-Seq and ChIP-Seq BaRC Hot Topics March 2013 George Bell, Ph.D. http://jura.wi.mit.edu/bio/education/hot_topics/ Sequencing applications RNA-Seq includes experiments

More information

The Expanded Illumina Sequencing Portfolio New Sample Prep Solutions and Workflow

The Expanded Illumina Sequencing Portfolio New Sample Prep Solutions and Workflow The Expanded Illumina Sequencing Portfolio New Sample Prep Solutions and Workflow Marcus Hausch, Ph.D. 2010 Illumina, Inc. All rights reserved. Illumina, illuminadx, Solexa, Making Sense Out of Life, Oligator,

More information

Top 5 Lessons Learned From MAQC III/SEQC

Top 5 Lessons Learned From MAQC III/SEQC Top 5 Lessons Learned From MAQC III/SEQC Weida Tong, Ph.D Division of Bioinformatics and Biostatistics, NCTR/FDA Weida.tong@fda.hhs.gov; 870 543 7142 1 MicroArray Quality Control (MAQC) An FDA led community

More information

Unit 1 Human cells. 1. Division and differentiation in human cells

Unit 1 Human cells. 1. Division and differentiation in human cells Unit 1 Human cells 1. Division and differentiation in human cells Stem cells Describe the process of differentiation. Explain how differentiation is brought about with reference to genes. Name the two

More information

Design a super panel for comprehensive genetic testing

Design a super panel for comprehensive genetic testing Design a super panel for comprehensive genetic testing Rong Chen, Ph.D. Assistant Professor Director of Clinical Genome Sequencing Dept. of Genetics and Genomic Sciences Institute for Genomics and Multiscale

More information

This place covers: Methods or systems for genetic or protein-related data processing in computational molecular biology.

This place covers: Methods or systems for genetic or protein-related data processing in computational molecular biology. G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY Methods or systems for genetic

More information

Genomic Technologies. Michael Schatz. Feb 1, 2018 Lecture 2: Applied Comparative Genomics

Genomic Technologies. Michael Schatz. Feb 1, 2018 Lecture 2: Applied Comparative Genomics Genomic Technologies Michael Schatz Feb 1, 2018 Lecture 2: Applied Comparative Genomics Welcome! The primary goal of the course is for students to be grounded in theory and leave the course empowered to

More information

Target Enrichment Strategies for Next Generation Sequencing

Target Enrichment Strategies for Next Generation Sequencing Target Enrichment Strategies for Next Generation Sequencing Anuj Gupta, PhD Agilent Technologies, New Delhi Genotypic Conference, Sept 2014 NGS Timeline Information burst Nearly 30,000 human genomes sequenced

More information

Mapping Next Generation Sequence Reads. Bingbing Yuan Dec. 2, 2010

Mapping Next Generation Sequence Reads. Bingbing Yuan Dec. 2, 2010 Mapping Next Generation Sequence Reads Bingbing Yuan Dec. 2, 2010 1 What happen if reads are not mapped properly? Some data won t be used, thus fewer reads would be aligned. Reads are mapped to the wrong

More information

Alignment. J Fass UCD Genome Center Bioinformatics Core Wednesday December 17, 2014

Alignment. J Fass UCD Genome Center Bioinformatics Core Wednesday December 17, 2014 Alignment J Fass UCD Genome Center Bioinformatics Core Wednesday December 17, 2014 From reads to molecules Why align? Individual A Individual B ATGATAGCATCGTCGGGTGTCTGCTCAATAATAGTGCCGTATCATGCTGGTGTTATAATCGCCGCATGACATGATCAATGG

More information

Introduction to BIOINFORMATICS

Introduction to BIOINFORMATICS COURSE OF BIOINFORMATICS a.a. 2016-2017 Introduction to BIOINFORMATICS What is Bioinformatics? (I) The sinergy between biology and informatics What is Bioinformatics? (II) From: http://www.bioteach.ubc.ca/bioinfo2010/

More information

Variation detection based on second generation sequencing data. Xin LIU Department of Science and Technology, BGI

Variation detection based on second generation sequencing data. Xin LIU Department of Science and Technology, BGI Variation detection based on second generation sequencing data Xin LIU Department of Science and Technology, BGI liuxin@genomics.org.cn 2013.11.21 Outline Summary of sequencing techniques Data quality

More information

Supplemental Methods. Exome Enrichment and Sequencing

Supplemental Methods. Exome Enrichment and Sequencing Supplemental Methods Exome Enrichment and Sequencing Genomic libraries were prepared using the Illumina Paired End Sample Prep Kit following the manufacturer s instructions. Enrichment was performed as

More information

Next Generation Sequencing: An Overview

Next Generation Sequencing: An Overview Next Generation Sequencing: An Overview Cavan Reilly November 13, 2017 Table of contents Next generation sequencing NGS and microarrays Study design Quality assessment Burrows Wheeler transform Next generation

More information

RNA Sequencing. Next gen insight into transcriptomes , Elio Schijlen

RNA Sequencing. Next gen insight into transcriptomes , Elio Schijlen RNA Sequencing Next gen insight into transcriptomes 05-06-2013, Elio Schijlen Transcriptome complete set of transcripts in a cell, and their quantity, for a specific developmental stage or physiological

More information

Sequencing applications. Today's outline. Hands-on exercises. Applications of short-read sequencing: RNA-Seq and ChIP-Seq

Sequencing applications. Today's outline. Hands-on exercises. Applications of short-read sequencing: RNA-Seq and ChIP-Seq Sequencing applications Applications of short-read sequencing: RNA-Seq and ChIP-Seq BaRC Hot Topics March 2013 George Bell, Ph.D. http://jura.wi.mit.edu/bio/education/hot_topics/ RNA-Seq includes experiments

More information