Quality Filtering of Illumina Sequences. Susan Huse Brown University August 6, 2015

1 Quality Filtering of Illumina Sequences Susan Huse Brown University August 6, 2015

2 Illumina FASTQ Files
File naming:
NA10831_ATCACG_L002_R1_001.fastq.gz
FA1_S1_L001_R1_001.fastq.gz
Sample_Barcode/Index_Lane_Read#_Set#.fastq.gz
Sequence identifier:
A8T0A:1:1101:14740:1627
Instrument : Run# : FlowcellID : Lane : Tile : X : Y
Read : Filtered : Control# : Barcode/Index
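The naming scheme above can be split into its fields with a regular expression. This is a minimal sketch, assuming the five-field layout shown on the slide and only R1/R2 read files; the function name `parse_fastq_name` and the returned keys are hypothetical choices for illustration.

```python
import re

# Assumed layout, taken from the slide:
# Sample_Barcode/Index_Lane_Read#_Set#.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_(?P<index>[^_]+)_L(?P<lane>\d{3})"
    r"_R(?P<read>[12])_(?P<set>\d{3})\.fastq\.gz$"
)


def parse_fastq_name(filename):
    """Split an Illumina-style FASTQ file name into its named fields."""
    match = FASTQ_NAME.match(filename)
    if match is None:
        raise ValueError(f"unrecognized FASTQ file name: {filename}")
    fields = match.groupdict()
    # Lane, read and set numbers are zero-padded integers in the file name.
    for key in ("lane", "read", "set"):
        fields[key] = int(fields[key])
    return fields


first = parse_fastq_name("NA10831_ATCACG_L002_R1_001.fastq.gz")
second = parse_fastq_name("FA1_S1_L001_R1_001.fastq.gz")
```

In the second file name the barcode/index field holds a sample-sheet number (S1) rather than the index sequence itself, which the pattern accepts as-is.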

3 FASTQ Format
4 lines per sequence:
@sequence_id
sequence
+
quality

Example:
1:N:0:1
CCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGGGAAACCCTGATGC
AGCGACGCCGCGTGAGTGAAGAAGTATCTCGGTATGTAAAGCTCTATCAGCA
GGAAAGATAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAG
CAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAA
AGGGAGCGTAGACGGCAGCGCAAGTCTGGAGTGAAATGCCGGGGCCCAACCC
CGGCCCTGCTTTGGAACCCGTCCCGCTCCAGTGCGGGCGGG
+
88CCCGDBAF)===CEFFGGGG>GGGGGGCCFGGGGGDFGGGGDCFGGGFED
CFG:@CFCGGGGGGG?FFG9FFFGG9ECEFGGGDFGGGFFEFAFAFFEFECE
F@4AFD85CFFAA?7+C@FFF<,A?,,,,,,AFFF77BFC,8>,>8D@FFFF
G,ACGGGCFG>*57;*6=C58:?<)9?:=:C*;;@C?3977@C7E*;29>/=
+2**)75):17)8@EE3>D59>)>).)61)4>(6*+/)@F ??D1 :0)((,((.(.+)(()(-(*-(-((-,,(.(.)),(-0)))
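The four-line record layout can be read with a small generator. This is a sketch only, assuming unwrapped records (exactly one sequence line and one quality line per read); `read_fastq` is a hypothetical helper, not part of any particular library, and the two toy records are invented for illustration.

```python
import io
from itertools import islice


def read_fastq(handle):
    """Yield (identifier, sequence, quality) from a 4-line-per-record FASTQ."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record:
            return  # end of file
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        header, sequence, separator, quality = record
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:], sequence, quality


# Toy input with two invented records.
example = io.StringIO("@r1 1:N:0:1\nACGT\n+\nGGG(\n@r2\nNN\n+\n##\n")
records = list(read_fastq(example))
```

For a compressed file, the same function should work on a handle opened with `gzip.open(path, "rt")`.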

4 Assembly vs. Amplicons
Genome Assembly
Drops reads that don't match
Calculates consensus base at each position
Amplicon and Metagenomics
Every read represents an independent copy of the source DNA
Poor-quality sequences become novel organisms or genes

5 Phred Scores
$Q = -10 \log_{10}(p)$
1. Take the log (base 10) of the probability of error
2. Convert to positive integer
p | Q
0.1 (10%) | 10
0.01 (1%) | 20
0.001 (0.1%) | 30
0.0001 (0.01%) | 40
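The Phred relationship can be checked numerically in both directions. A minimal sketch, assuming the base-10 logarithm; the function names are hypothetical.

```python
import math


def phred_from_error_probability(p):
    """Q = -10 * log10(p), for an error probability 0 < p <= 1."""
    return -10 * math.log10(p)


def error_probability_from_phred(q):
    """Inverse relationship: p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


# Tabulate Q for a few error probabilities.
table = {p: round(phred_from_error_probability(p)) for p in (0.1, 0.01, 0.001, 0.0001)}
```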

6 Theoretical Phred Scores vs Error Probability [Figure: Probability of Error plotted against Phred Scores]

7 Theoretical Phred Scores vs Error Probability [Figure: Probability of Error plotted against Phred Scores]

8 FASTQ Formatted Quality
! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I
Letters are good
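Each quality character maps to a Phred score through its ASCII code. The sketch below assumes the Phred+33 offset used by current Illumina FASTQ files (older pipelines used other offsets); `decode_quality` is a hypothetical helper name.

```python
def decode_quality(quality, offset=33):
    """Convert a FASTQ quality string to a list of Phred scores."""
    return [ord(character) - offset for character in quality]


# '!' is the lowest score (Q0) and 'I' is Q40 under the Phred+33 assumption.
endpoints = decode_quality("!5?I")
# Letters such as 'G' score high; punctuation such as '(' and ',' scores low.
mixed = decode_quality("GG(,")
```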

9 Reading FASTQ
1:N:0:1
CCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGGGAAACCCTGATGC
AGCGACGCCGCGTGAGTGAAGAAGTATCTCGGTATGTAAAGCTCTATCAGCA
GGAAAGATAATGACGGTACCTGACTAAGAAGCCCCGGCTAACTACGTGCCAG
CAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAA
AGGGAGCGTAGACGGCAGCGCAAGTCTGGAGTGAAATGCCGGGGCCCAACCC
CGGCCCTGCTTTGGAACCCGTCCCGCTCCAGTGCGGGCGGG
+
88CCCGDBAF)===CEFFGGGG>GGGGGGCCFGGGGGDFGGGGDCFGGGFED
CFG:@CFCGGGGGGG?FFG9FFFGG9ECEFGGGDFGGGFFEFAFAFFEFECE
F@4AFD85CFFAA?7+C@FFF<,A?,,,,,,AFFF77BFC,8>,>8D@FFFF
G,ACGGGCFG>*57;*6=C58:?<)9?:=:C*;;@C?3977@C7E*;29>/=
+2**)75):17)8@EE3>D59>)>).)61)4>(6*+/)@F ??D1
1:N:0:1

10 Why filter infrequent errors?
Table columns: Ns | Average 454 Error Rate | Errors / 400nt | Percent of Reads
0 or more | 0.40%
If we include all reads, with or without Ns, we have an overall error rate of 0.4%. If, however, we remove all sequences with Ns, we still have an overall error rate of 0.4%. Why bother??

11 It's all in your perspective

12 Low Percentage, but High Errors
Table columns: Ns | Average Error Rate | Errors / 400nt | Percent of Reads
Low-quality reads can be interpreted as unique organisms: 2nt = 0.13% * 1 million reads = 1,300 unique organisms

13 Impact of Error Rates
# errors = (error rate) * (# bases sequenced)
Predicted number of errors increases with sequencing depth
At Q30 (error rate = 0.001):
0.001 * 100,000 bases [Sanger] = 100 bases
0.001 * 300,000,000 bases [Illumina] = 300 thousand bases
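The arithmetic on this slide can be reproduced directly. A small sketch, assuming every base is called at the same Phred quality; `expected_errors` is a hypothetical helper.

```python
def expected_errors(phred_quality, bases_sequenced):
    """# errors = (error rate) * (# bases sequenced)."""
    error_rate = 10 ** (-phred_quality / 10)
    return error_rate * bases_sequenced


sanger_errors = expected_errors(30, 100_000)
illumina_errors = expected_errors(30, 300_000_000)
```

The quality is the same in both cases; only the sequencing depth changes the absolute number of erroneous bases.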

14 Low-Quality Reads and Errant OTUs

15 Errant OTUs
If a low-quality sequence is >3% from its source, it can create a new OTU.
If the rate of an errant read = 1 in 10 thousand, and we have 1 million reads:
0.0001 * 1,000,000 reads = 100 errant OTUs

16 Errant OTUs Errant OTUs as percent of OTUs decreases with diversity. If we have 100 errant OTUs: Mock community: 100 / 50 OTUs = +200% Diverse community: 100 / 2,000 OTUs = +5%
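The errant-OTU numbers follow from two one-line calculations. This sketch assumes, as the slides do, that every errant read founds its own OTU; the function names are hypothetical.

```python
def errant_otus(errant_read_rate, total_reads):
    """Expected errant OTUs if each errant read creates a new OTU."""
    return errant_read_rate * total_reads


def otu_inflation_percent(errant, true_otus):
    """Errant OTUs expressed as a percentage of the true OTU count."""
    return 100 * errant / true_otus


errant = errant_otus(1 / 10_000, 1_000_000)
mock_inflation = otu_inflation_percent(100, 50)
diverse_inflation = otu_inflation_percent(100, 2_000)
```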

17 Cleaning Data
Denoising: improve noisy base calls, remap reads
Filtering: remove low-quality reads and non-target sequences
Trimming: prune low-quality ends
Chimeras: remove chimeric reads
Aggregating: combine similar reads or taxa

18 Evaluating Error Patterns
1. Sequence known template
2. Align the actual read sequences against the expected sequences
3. Evaluate distribution of sequencing errors
4. Find correlations between measurable parameters and error rates
5. Assess the contribution of each error pattern to the overall error rate

19 Defining the Error Rate
Error Rate = the number of errors per base
$\text{Error Rate} = \frac{\text{substitutions} + \text{insertions} + \text{deletions} + \text{uncalled}}{\text{alignment length}}$
Read:     AGCNC-ATAACTCTG
Template: AGCTCGAC--CAGCT
6/15 = 0.40
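The error-rate definition can be applied column by column to a gapped pairwise alignment. The sketch below is an assumption-laden illustration: it treats a gap in the template as an insertion, a gap in the read as a deletion, and an N in the read as an uncalled base, and it uses a short toy alignment rather than the slide's example. `alignment_error_rate` is a hypothetical name.

```python
def alignment_error_rate(read, template, gap="-", uncalled="N"):
    """Count error types per alignment column and divide by alignment length."""
    if len(read) != len(template):
        raise ValueError("aligned read and template must have equal length")
    counts = {"substitutions": 0, "insertions": 0, "deletions": 0, "uncalled": 0}
    for read_base, template_base in zip(read, template):
        if template_base == gap:
            counts["insertions"] += 1      # extra base in the read
        elif read_base == gap:
            counts["deletions"] += 1       # base missing from the read
        elif read_base == uncalled:
            counts["uncalled"] += 1        # N in the read
        elif read_base != template_base:
            counts["substitutions"] += 1
    return counts, sum(counts.values()) / len(read)


# Toy alignment: one error of each type in nine columns.
counts, rate = alignment_error_rate("AGCNC-ATA", "AGCTCGAC-")
```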

20 Illumina quality scores reflect error rates Figure 6 Minoche et al Genome Biology

21 Figure 5 Low Q are low quality, High Q usually high quality Minoche et al Genome Biology

22 Distribution of Errors
[Figure: Cumulative Percent of Errors (0% to 100%) vs. Quality Score]
79% of error bases have a quality score <=16
12% of error bases have a quality score >=

23 Expected error rates after filtering (adapted from Minoche et al, Table 2)
Filter | PhiX-GAIIx Error Rate / (% of bases discarded)
No filter | (0.0%)
Chastity filter, Illumina (signal intensity ratios) | (17.8%)
Low quality tails | (25.8%)
Ns | (15.6%)
C33 (Q<30 for 1/3 of bases in 1st half) | (21.7%)
ChF + LQ-tail + N + C | (28.9%)

24 Sequences with Ns NTAGCACCAAACATAAATCACCTCACTTAAGTGGCTGGAGACAAATAATCTCTTTAATAACCTGATTCAGCGAAACCAATCCGCGGCATTTAGTAGCGGTA NTAATTACCCCAAAAAGAAAGGTATTAAGGATGAGTGTTCAAGATTGCTGGAGGCCTCCACTATGAAATCGCGTAGAGGCTTTGCTATTCAGCGTTTGATG NGCGCCAATATGAGAAGAGCCATACCGCTGATTCTGCGTTTGCTGATGAACTAAGTCAACCTCAGCACTAACCTTGCGAGTCATTTCTTTGATTTGGTCAT NGTAAAAATGTCTACAGTAGAGTCAATAGCAAGGCCACGACGCAATGGAGAAAGACGGAGAGCGCCAACGGCGTCCATCTCGAAGGAGTCGCCAGCGATAA NTCTATGTGGCTAAATACGTTAACAAAAAGTCAGATATGGACCTTGCTGCTAAAGGTCTAGGAGCTAAAGAATGGAACAACTCACTAAAAACCAAGCTGTC CAGTGGAATAGTCAGGTTAAATTTAATGTGACCGTNTNNNNNAATNNNNNNNNNNNNNNNNNNNNNNNCANNNNNTNGNNNNANNNNNTTGAGTGTGAGGT CGGATTGTTCAGTAACTTGACTCATGATTTCTTACCTATTAGTGGTTNAACANNNNNNNNNNNNNATAGTAATCCACGCTCTTNTAANATGTCAACAAGAG TATGCGCCAAATGCTTACTCAAGCTCAAACGGCTGGTCAGAATTTTACCAATGACCANNNCAAAGAAATGACTCGCAAGGTTAGTGCTGAGGTTGACTTAG TAGAAGTCGTCATTTGGCGAGAAAGCTCAGTCTCAGGAGGAAGCGGAGCAGTCCAAANNNTTTTGAGATGGCAGCAACGGAAACCATAACGAGCATCATCT TGCTGTTGAGTGGTCTCATGACAATAAAGTATGTCNCTGNNTTGAAGNNTNNNNNNNNNNNNNNNCTNATACAATCACGCNCANNNNNAAAAGTGTCGTGT CTACTGCGACTAAAGAGATTCAGTACCTTAACGCTAAAGGTGCTTTGNCTTANNNNNNNNNNNNTGGCGACCCTGTTTTGTATGGCANCTTGCCGCCGCGT CGGCAGAAGCCTGAATGAGCTTAATAGAGGCCAAAGCGGTCTGGAAACGTACGGATTNNNNAGTAACTTGACTCATGATTTCTTACCTATTAGTGGTTGAA GTGATTTATGTTTGGTGCTATTGCTGGCGGTATTGCTTCTGCTCTTGNTGGTNNCNNNNNNNNNAAATTGTTTGGAGGCGGTCAAAANGCCGCCTCCGGTG ATATCAACCACACCAGAAGCAGCATCAGTGACGACATTAGAAATATCCTTTGNAGTNNNNNNNNTATGAGAAGAGCCATACCGCTGATTCTGCGTTTGCTG In this dataset: 68 reads contained at least 1 N, of these: 24 (35%) contain more than 1 N 14 (21%) could not be mapped to PhiX, 7 of those 14 (50%) had only 1 N
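Counts like those on this slide can be gathered by partitioning reads on their number of Ns. A minimal sketch with invented toy reads; `split_by_ambiguity` and its `max_n` parameter are hypothetical.

```python
def split_by_ambiguity(reads, max_n=0):
    """Partition reads into (kept, dropped) by the number of N base calls."""
    kept, dropped = [], []
    for read in reads:
        if read.upper().count("N") <= max_n:
            kept.append(read)
        else:
            dropped.append(read)
    return kept, dropped


# Toy reads: one clean, one with a single leading N, one with two Ns.
kept, dropped = split_by_ambiguity(["ACGT", "NACG", "ACNNT"])
multi_n = [read for read in dropped if read.count("N") > 1]
```

The default of `max_n=0` drops any read containing an N, which matches the slide's observation that even reads with a single N can fail to map.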

25 Paired-End Amplicons
A smaller insert size provides sequence overlap
[Diagram: Read 1 (forward), Sequence overlap, Read 2 (reverse)]

26 Complete Overlap (V6) Ensures high-quality reads Does not ensure perfect data Read 1 (forward) TGGTCTTGACATCCACAGAACTTTCCAGAGATGGATTGGTGCCTTCGGGAACTGTGAGAC TGGTCTTGACATCCACAGAACTTTCCAGAGATGGATTGGTGCCTTCGGGAACTGTGAGAC Read 2 (reverse) Eren, AM et al (2013) PLoS ONE 8(6)

27 Comparing Filtering Methods: Low Quality
Perfect Overlap: 975,410 (26%)
Bokulich et al: 391,993 (11%)
Minoche et al: 435,925 (12%)
Figure 3, Eren et al (2013) PLoS ONE

28 Comparing Filtering Methods: High Quality
Perfect Overlap: 2,707,801 (74%)
Bokulich et al: 3,291,218 (89%)
Minoche et al: 3,247,286 (88%)
Figure 3, Eren et al (2013) PLoS ONE

29 Imperfect Overlap Correction
Requiring perfect overlap removes up to 20-30% of the reads.
Can we use quality scores to correct bases in the overlap region rather than dropping entire reads?
1. Align the overlap region
2. Compare bases and quality scores
3. Assign most probable base and correct quality score
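The three steps can be sketched for an overlap that is already aligned, with read 2 already reverse-complemented. The scoring rule here is a deliberately simplified assumption (keep the higher-quality base on a mismatch, with the quality difference as the new score); it is not the posterior-probability calculation used by any of the published mergers. `merge_overlap` is a hypothetical name.

```python
def merge_overlap(bases1, quals1, bases2, quals2):
    """Merge two aligned overlap segments using per-base Phred scores."""
    lengths = {len(bases1), len(quals1), len(bases2), len(quals2)}
    if len(lengths) != 1:
        raise ValueError("overlap segments must all have the same length")
    merged_bases, merged_quals = [], []
    for b1, q1, b2, q2 in zip(bases1, quals1, bases2, quals2):
        if b1 == b2:
            # Agreement: keep the base with the better of the two scores.
            merged_bases.append(b1)
            merged_quals.append(max(q1, q2))
        elif q1 == q2:
            # Disagreement with no evidence either way: leave uncalled.
            merged_bases.append("N")
            merged_quals.append(0)
        else:
            # Disagreement: trust the higher-quality call, with reduced score.
            merged_bases.append(b1 if q1 > q2 else b2)
            merged_quals.append(abs(q1 - q2))
    return "".join(merged_bases), merged_quals


merged = merge_overlap("ACGT", [30, 30, 10, 20], "ACTA", [35, 20, 30, 20])
```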

30 Edgar and Flyvbjerg (2015) Bioinformatics

31 Imperfect Overlap Correction
USEARCH fastq_mergepairs: Edgar and Flyvbjerg (2015) Bioinformatics
PANDASeq: Masella et al. (2012) BMC Bioinformatics
merge-illumina-pairs: Eren et al. (2013) PLoS ONE
PEAR: Zhang et al (2014) Bioinformatics

32 Paired Overlap Parameters Matter
Initial Reads: 1,536,548
Merge-illumina-pairs (max 3 mismatches): 996,139 → 467,792 uniques
PANDASeq (no Ns, 95% similarity): 996,139 → 804,546 uniques

33 Edgar and Flyvbjerg (2015) Bioinformatics

34 Denoising
Assume that sequencing errors lead to a statistical distribution of reads around the more abundant true error-free sequence. Use the error distribution to map probable error reads to their probable source sequence.
AmpliconNoise (454): Quince et al (2011) BMC Bioinformatics
DADA (DADA2): Rosen et al (2012) BMC Bioinformatics

35 What other sources of error should we check for?

36 Chimeras Not an error in sequencing but in amplification (see Chimeras lecture)

37 Non-SSU rRNA Amplification
Conserved inner membrane protein
cardiolipin synthase
Predicted major pilin subunit
16S rRNA
DNA binding transcriptional dual regulator, tyrosine-binding
Putative transport system permease protein
Predicted antibiotic transporter
16S rRNA
Courtesy of Hilary Morrison

38 16S From Other Domains
SSU region | Total Reads | Bacteria | Archaea | Organelle | Unknown
V6 | 529,359 | 96% | 0.02% | 4% | 0.1%
V6-V4 | 3,437,855 | 87% | 0.3% | 8% | 4%
Use taxonomic filtering to remove non-target DNA
Samples from Little Sippewissett Marsh. Organelles include mitochondria and chloroplasts

39 Bar Hopping
Barcoding errors can cause reads to hop from one sample to another
AGATC = Sample1
AGATT = Sample2
AGATA = ???
Always use codes >= 2nt different
Always require <=1 mismatches (0=best)
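The barcode example can be made concrete with a Hamming-distance check. This is a sketch assuming equal-length barcodes and substitution errors only; `hamming`, `min_barcode_distance`, and `assign_sample` are hypothetical helpers. A read is assigned only when exactly one barcode falls within the allowed number of mismatches.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_barcode_distance(barcodes):
    """Smallest pairwise distance in a barcode set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def assign_sample(read_barcode, barcode_to_sample, max_mismatches=0):
    """Return the sample for a read barcode, or None if absent or ambiguous."""
    hits = [
        sample
        for barcode, sample in barcode_to_sample.items()
        if hamming(read_barcode, barcode) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


barcodes = {"AGATC": "Sample1", "AGATT": "Sample2"}
closest = min_barcode_distance(list(barcodes))
exact = assign_sample("AGATC", barcodes)
ambiguous = assign_sample("AGATA", barcodes, max_mismatches=1)
```

With the slide's two codes only 1 nt apart, AGATA is one mismatch from both samples and cannot be assigned, which is why the codes themselves should differ by more.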

40 Aggregating Small Errors
Taxonomic assignments are generally consistent despite a few mismatches, more so at coarser taxonomic levels (class vs. genus).
OTU Clustering and Oligotyping round out small percentages of errors depending on the algorithm used.
Clustering at 3% can (but does not always!) aggregate sequences with 1-2% errors.

41 Singleton Errares
Singletons can be: valid = rare organisms, or invalid = sequencing errors
Singletons that pass quality control can only be validated ecologically

42 Singleton Errares
Valid singletons represent rare organisms
Invalid singletons are sequencing errors
Absolute number of errors increases with sampling depth.
Errors as percent of uniques decreases with diversity
If you choose to remove singletons, do so only after filtering, aggregation, and comparison across datasets.
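Comparing across datasets before removing singletons can be sketched by pooling counts first. This is a toy illustration that assumes reads have already been filtered and aggregated into exact unique sequences; `global_singletons` and the dataset names are hypothetical.

```python
from collections import Counter


def global_singletons(datasets):
    """Sequences seen exactly once across all datasets combined."""
    pooled = Counter()
    for reads in datasets.values():
        pooled.update(reads)
    return sorted(sequence for sequence, count in pooled.items() if count == 1)


# CCC is a singleton within each dataset but recurs across them, so it is kept.
datasets = {"site_a": ["AAA", "AAA", "CCC"], "site_b": ["CCC", "GGG"]}
candidates = global_singletons(datasets)
```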

43 Navy minesweeper runs aground, due to faulty charts

44 General Caveats
Always maintain a healthy skepticism about the quality of any sequencing data
Never underestimate the presence or impact of low-quality data or untargeted DNA
Not all infrequent sequences are bad sequences
Be vigilant for taxonomically-biased filtering
Don't skimp on quality filtering!!!

Robert Edgar. Independent scientist

Robert Edgar. Independent scientist Robert Edgar Independent scientist robert@drive5.com www.drive5.com Reads FASTQ format Millions of reads Many Gb USEARCH commands "UPARSE pipeline" OTU sequences FASTA format >Otu1 GATTAGCTCATTCGTA >Otu2

More information

Introduction to OTU Clustering. Susan Huse August 4, 2016

Introduction to OTU Clustering. Susan Huse August 4, 2016 Introduction to OTU Clustering Susan Huse August 4, 2016 What is an OTU? Operational Taxonomic Units a.k.a. phylotypes a.k.a. clusters aggregations of reads based only on sequence similarity, independent

More information

Read Quality Assessment & Improvement. UCD Genome Center Bioinformatics Core Tuesday 14 June 2016

Read Quality Assessment & Improvement. UCD Genome Center Bioinformatics Core Tuesday 14 June 2016 Read Quality Assessment & Improvement UCD Genome Center Bioinformatics Core Tuesday 14 June 2016 QA&I should be interactive Error modes Each technology has unique error modes, depending on the physico-chemical

More information

Infectious Disease Omics

Infectious Disease Omics Infectious Disease Omics Metagenomics Ernest Diez Benavente LSHTM ernest.diezbenavente@lshtm.ac.uk Course outline What is metagenomics? In situ, culture-free genomic characterization of the taxonomic and

More information

Carl Woese. Used 16S rrna to developed a method to Identify any bacterium, and discovered a novel domain of life

Carl Woese. Used 16S rrna to developed a method to Identify any bacterium, and discovered a novel domain of life METAGENOMICS Carl Woese Used 16S rrna to developed a method to Identify any bacterium, and discovered a novel domain of life His amazing discovery, coupled with his solitary behaviour, made many contemporary

More information

mothur Workshop for Amplicon Analysis Michigan State University, 2013

mothur Workshop for Amplicon Analysis Michigan State University, 2013 mothur Workshop for Amplicon Analysis Michigan State University, 2013 Tracy Teal MMG / ICER tkteal@msu.edu Kevin Theis Zoology / BEACON theiskev@msu.edu mothur Mission to develop a single piece of open-source,

More information

Quality assessment and control of sequence data. Naiara Rodríguez-Ezpeleta

Quality assessment and control of sequence data. Naiara Rodríguez-Ezpeleta Quality assessment and control of sequence data Naiara Rodríguez-Ezpeleta Workshop on Genomics 2014 Quality control is important Some of the artefacts/problems that can be detected with QC Sequencing Sequence

More information

USEARCH software and documentation Copyright Robert C. Edgar All rights reserved.

USEARCH software and documentation Copyright Robert C. Edgar All rights reserved. USEARCH software and documentation Copyright 2010-11 Robert C. Edgar All rights reserved http://drive5.com/usearch robert@drive5.com Version 5.0 August 22nd, 2011 Contents Introduction... 3 UCHIME implementations...

More information

Introduction to taxonomic analysis of metagenomic amplicon and shotgun data with QIIME. Peter Sterk EBI Metagenomics Course 2014

Introduction to taxonomic analysis of metagenomic amplicon and shotgun data with QIIME. Peter Sterk EBI Metagenomics Course 2014 Introduction to taxonomic analysis of metagenomic amplicon and shotgun data with QIIME Peter Sterk EBI Metagenomics Course 2014 1 Taxonomic analysis using next-generation sequencing Objective we want to

More information

Illumina Read QC. UCD Genome Center Bioinformatics Core Monday 29 August 2016

Illumina Read QC. UCD Genome Center Bioinformatics Core Monday 29 August 2016 Illumina Read QC UCD Genome Center Bioinformatics Core Monday 29 August 2016 QC should be interactive Error modes Each technology has unique error modes, depending on the physico-chemical processes involved

More information

Targeted Sequencing Using Droplet-Based Microfluidics. Keith Brown Director, Sales

Targeted Sequencing Using Droplet-Based Microfluidics. Keith Brown Director, Sales Targeted Sequencing Using Droplet-Based Microfluidics Keith Brown Director, Sales brownk@raindancetech.com Who we are: is a Provider of Microdroplet-based Solutions The Company s RainStorm TM Technology

More information

Bioinformatic Suggestions on MiSeq-Based Microbial Community S

Bioinformatic Suggestions on MiSeq-Based Microbial Community S J. Microbiol. Biotechnol. (2015), 25(6), 765 770 http://dx.doi.org/10.4014/jmb.1409.09057 Review Research Article jmb Bioinformatic Suggestions on MiSeq-Based Microbial Community S Analysis Tatsuya Unno*

More information

DATA FORMATS AND QUALITY CONTROL

DATA FORMATS AND QUALITY CONTROL HTS Summer School 12-16th September 2016 DATA FORMATS AND QUALITY CONTROL Romina Petersen, University of Cambridge (rp520@medschl.cam.ac.uk) Luigi Grassi, University of Cambridge (lg490@medschl.cam.ac.uk)

More information

An introduction into 16S rrna gene sequencing analysis. Stefan Boers

An introduction into 16S rrna gene sequencing analysis. Stefan Boers An introduction into 16S rrna gene sequencing analysis Stefan Boers Microbiome, microbiota or metagenomics? Microbiome The entire habitat, including the microorganisms, their genomes (i.e., genes) and

More information

Next Gen Sequencing. Expansion of sequencing technology. Contents

Next Gen Sequencing. Expansion of sequencing technology. Contents Next Gen Sequencing Contents 1 Expansion of sequencing technology 2 The Next Generation of Sequencing: High-Throughput Technologies 3 High Throughput Sequencing Applied to Genome Sequencing (TEDed CC BY-NC-ND

More information

De Novo Assembly of High-throughput Short Read Sequences

De Novo Assembly of High-throughput Short Read Sequences De Novo Assembly of High-throughput Short Read Sequences Chuming Chen Center for Bioinformatics and Computational Biology (CBCB) University of Delaware NECC Third Skate Genome Annotation Workshop May 23,

More information

Sanger vs Next-Gen Sequencing

Sanger vs Next-Gen Sequencing Tools and Algorithms in Bioinformatics GCBA815/MCGB815/BMI815, Fall 2017 Week-8: Next-Gen Sequencing RNA-seq Data Analysis Babu Guda, Ph.D. Professor, Genetics, Cell Biology & Anatomy Director, Bioinformatics

More information

RNA-Sequencing analysis

RNA-Sequencing analysis RNA-Sequencing analysis Markus Kreuz 25. 04. 2012 Institut für Medizinische Informatik, Statistik und Epidemiologie Content: Biological background Overview transcriptomics RNA-Seq RNA-Seq technology Challenges

More information

Applications of Next Generation Sequencing in Metagenomics Studies

Applications of Next Generation Sequencing in Metagenomics Studies Applications of Next Generation Sequencing in Metagenomics Studies Francesca Rizzo, PhD Genomix4life Laboratory of Molecular Medicine and Genomics Department of Medicine and Surgery University of Salerno

More information

Novel bacterial taxa in the human microbiome

Novel bacterial taxa in the human microbiome Washington University School of Medicine Digital Commons@Becker Open Access Publications 2012 Novel bacterial taxa in the human microbiome Kristine M. Wylie Washington University School of Medicine in

More information

SHAMAN : SHiny Application for Metagenomic ANalysis

SHAMAN : SHiny Application for Metagenomic ANalysis SHAMAN : SHiny Application for Metagenomic ANalysis Stevenn Volant, Amine Ghozlane Hub Bioinformatique et Biostatistique C3BI, USR 3756 IP CNRS Biomics CITECH Ribosome ITS (1) : located between 18S and

More information

Quality assessment and control of sequence data

Quality assessment and control of sequence data Quality assessment and control of sequence data Naiara Rodríguez-Ezpeleta Workshop on Genomics 2015 Cesky Krumlov fastq format fasta Most basic file format to represent nucleotide or amino-acid sequences

More information

David Jacob Meltzer m. Supervisor: Dr. Umer Zeeshan Ijaz

David Jacob Meltzer m. Supervisor: Dr. Umer Zeeshan Ijaz AMPLIpyth: A Python Pipeline for Amplicon Processing David Jacob Meltzer 0803837m MSc Bioinformatics, Polyomics and Systems Biology Supervisor: Dr. Umer Zeeshan Ijaz A report submitted in partial fulfillment

More information

Analysing genomes and transcriptomes using Illumina sequencing

Analysing genomes and transcriptomes using Illumina sequencing Analysing genomes and transcriptomes using Illumina uencing Dr. Heinz Himmelbauer Centre for Genomic Regulation (CRG) Ultrauencing Unit Barcelona The Sequencing Revolution High-Throughput Sequencing 2000

More information

How much sequencing do I need? Emily Crisovan Genomics Core

How much sequencing do I need? Emily Crisovan Genomics Core How much sequencing do I need? Emily Crisovan Genomics Core How much sequencing? Three questions: 1. How much sequence is required for good experimental design? 2. What type of sequencing run is best?

More information

Sequencing technologies. Jose Blanca COMAV institute bioinf.comav.upv.es

Sequencing technologies. Jose Blanca COMAV institute bioinf.comav.upv.es Sequencing technologies Jose Blanca COMAV institute bioinf.comav.upv.es Outline Sequencing technologies: Sanger 2nd generation sequencing: 3er generation sequencing: 454 Illumina SOLiD Ion Torrent PacBio

More information

Genomics AGRY Michael Gribskov Hock 331

Genomics AGRY Michael Gribskov Hock 331 Genomics AGRY 60000 Michael Gribskov gribskov@purdue.edu Hock 331 Computing Essentials Resources In this course we will assemble and annotate both genomic and transcriptomic sequence assemblies We will

More information

Measuring transcriptomes with RNA-Seq

Measuring transcriptomes with RNA-Seq Measuring transcriptomes with RNA-Seq BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2017 Anthony Gitter gitter@biostat.wisc.edu These slides, excluding third-party material, are licensed under CC BY-NC

More information

Tutorial. Whole Metagenome Functional Analysis (beta) Sample to Insight. November 21, 2017

Tutorial. Whole Metagenome Functional Analysis (beta) Sample to Insight. November 21, 2017 Whole Metagenome Functional Analysis (beta) November 21, 2017 Sample to Insight QIAGEN Aarhus Silkeborgvej 2 Prismet 8000 Aarhus C Denmark Telephone: +45 70 22 32 44 www.qiagenbioinformatics.com AdvancedGenomicsSupport@qiagen.com

More information

Sequencing technologies. Jose Blanca COMAV institute bioinf.comav.upv.es

Sequencing technologies. Jose Blanca COMAV institute bioinf.comav.upv.es Sequencing technologies Jose Blanca COMAV institute bioinf.comav.upv.es Outline Sequencing technologies: Sanger 2nd generation sequencing: 3er generation sequencing: 454 Illumina SOLiD Ion Torrent PacBio

More information

Contact us for more information and a quotation

Contact us for more information and a quotation GenePool Information Sheet #1 Installed Sequencing Technologies in the GenePool The GenePool offers sequencing service on three platforms: Sanger (dideoxy) sequencing on ABI 3730 instruments Illumina SOLEXA

More information

Sequence assembly. Jose Blanca COMAV institute bioinf.comav.upv.es

Sequence assembly. Jose Blanca COMAV institute bioinf.comav.upv.es Sequence assembly Jose Blanca COMAV institute bioinf.comav.upv.es Sequencing project Unknown sequence { experimental evidence result read 1 read 4 read 2 read 5 read 3 read 6 read 7 Computational requirements

More information

Introduction to RNA-Seq. David Wood Winter School in Mathematics and Computational Biology July 1, 2013

Introduction to RNA-Seq. David Wood Winter School in Mathematics and Computational Biology July 1, 2013 Introduction to RNA-Seq David Wood Winter School in Mathematics and Computational Biology July 1, 2013 Abundance RNA is... Diverse Dynamic Central DNA rrna Epigenetics trna RNA mrna Time Protein Abundance

More information

Gene Expression Technology

Gene Expression Technology Gene Expression Technology Bing Zhang Department of Biomedical Informatics Vanderbilt University bing.zhang@vanderbilt.edu Gene expression Gene expression is the process by which information from a gene

More information

Creation of a PAM matrix

Creation of a PAM matrix Rationale for substitution matrices Substitution matrices are a way of keeping track of the structural, physical and chemical properties of the amino acids in proteins, in such a fashion that less detrimental

More information

Bioinformatic tools for metagenomic data analysis

Bioinformatic tools for metagenomic data analysis Bioinformatic tools for metagenomic data analysis MEGAN - blast-based tool for exploring taxonomic content MG-RAST (SEED, FIG) - rapid annotation of metagenomic data, phylogenetic classification and metabolic

More information

Mapping strategies for sequence reads

Mapping strategies for sequence reads Mapping strategies for sequence reads Ernest Turro University of Cambridge 21 Oct 2013 Quantification A basic aim in genomics is working out the contents of a biological sample. 1. What distinct elements

More information

REGULATION OF PROTEIN SYNTHESIS. II. Eukaryotes

REGULATION OF PROTEIN SYNTHESIS. II. Eukaryotes REGULATION OF PROTEIN SYNTHESIS II. Eukaryotes Complexities of eukaryotic gene expression! Several steps needed for synthesis of mrna! Separation in space of transcription and translation! Compartmentation

More information

BIO 311C Spring Lecture 36 Wednesday 28 Apr.

BIO 311C Spring Lecture 36 Wednesday 28 Apr. BIO 311C Spring 2010 1 Lecture 36 Wednesday 28 Apr. Synthesis of a Polypeptide Chain 5 direction of ribosome movement along the mrna 3 ribosome mrna NH 2 polypeptide chain direction of mrna movement through

More information

A comparison of sequencing platforms and bioinformatics pipelines for compositional analysis of the gut microbiome

A comparison of sequencing platforms and bioinformatics pipelines for compositional analysis of the gut microbiome Allali et al. BMC Microbiology (2017) 17:194 DOI 10.1186/s12866-017-1101-8 RESEARCH ARTICLE Open Access A comparison of sequencing platforms and bioinformatics pipelines for compositional analysis of the

More information

2012 GENERAL [5 points]

2012 GENERAL [5 points] GENERAL [5 points] 2012 Mark all processes that are part of the 'standard dogma of molecular' [ ] DNA replication [ ] transcription [ ] translation [ ] reverse transposition [ ] DNA restriction [ ] DNA

More information

Data Analysis with CASAVA v1.8 and the MiSeq Reporter

Data Analysis with CASAVA v1.8 and the MiSeq Reporter Data Analysis with CASAVA v1.8 and the MiSeq Reporter Eric Smith, PhD Bioinformatics Scientist September 15 th, 2011 2010 Illumina, Inc. All rights reserved. Illumina, illuminadx, Solexa, Making Sense

More information

Development of NGS metabarcoding. characterization of aerobiological samples. Lucia Muggia

Development of NGS metabarcoding. characterization of aerobiological samples. Lucia Muggia Development of NGS metabarcoding for the characterization of aerobiological samples Lucia Muggia Alberto Pallavicini, Elisa Banchi, Claudio G. Ametrano, David Stankovic, Silvia Ongaro, Enrico Tordoni,

More information

Next Generation Sequencing Lecture Saarbrücken, 19. March Sequencing Platforms

Next Generation Sequencing Lecture Saarbrücken, 19. March Sequencing Platforms Next Generation Sequencing Lecture Saarbrücken, 19. March 2012 Sequencing Platforms Contents Introduction Sequencing Workflow Platforms Roche 454 ABI SOLiD Illumina Genome Anlayzer / HiSeq Problems Quality

More information

HLA and Next Generation Sequencing it s all about the Data

HLA and Next Generation Sequencing it s all about the Data HLA and Next Generation Sequencing it s all about the Data John Ord, NHSBT Colindale and University of Cambridge BSHI Annual Conference Manchester September 2014 Introduction In 2003 the first full public

More information

Next-Generation Sequencing Services à la carte

Next-Generation Sequencing Services à la carte Next-Generation Sequencing Services à la carte www.seqme.eu ngs@seqme.eu SEQme 2017 All rights reserved The trademarks and names of other companies and products mentioned in this brochure are the property

More information

PRESENTING SEQUENCES 5 GAATGCGGCTTAGACTGGTACGATGGAAC 3 3 CTTACGCCGAATCTGACCATGCTACCTTG 5

PRESENTING SEQUENCES 5 GAATGCGGCTTAGACTGGTACGATGGAAC 3 3 CTTACGCCGAATCTGACCATGCTACCTTG 5 Molecular Biology-2017 1 PRESENTING SEQUENCES As you know, sequences may either be double stranded or single stranded and have a polarity described as 5 and 3. The 5 end always contains a free phosphate

More information

Genomics and Transcriptomics of Spirodela polyrhiza

Genomics and Transcriptomics of Spirodela polyrhiza Genomics and Transcriptomics of Spirodela polyrhiza Doug Bryant Bioinformatics Core Facility & Todd Mockler Group, Donald Danforth Plant Science Center Desired Outcomes High-quality genomic reference sequence

More information

Comparative Analysis of Fungal Primers

Comparative Analysis of Fungal Primers Comparative Analysis of Fungal Primers Background Most eukaryotes encode ribosomal genes in an operon, with a relatively unconserved internal transcribed spacer (ITS) between conserved genes (order = 18S

More information

Chang Xu Mohammad R Nezami Ranjbar Zhong Wu John DiCarlo Yexun Wang

Chang Xu Mohammad R Nezami Ranjbar Zhong Wu John DiCarlo Yexun Wang Supplementary Materials for: Detecting very low allele fraction variants using targeted DNA sequencing and a novel molecular barcode-aware variant caller Chang Xu Mohammad R Nezami Ranjbar Zhong Wu John

More information

CSE182-L16. LW statistics/assembly

CSE182-L16. LW statistics/assembly CSE182-L16 LW statistics/assembly Silly Quiz Who are these people, and what is the occasion? Genome Sequencing and Assembly Sequencing A break at T is shown here. Measuring the lengths using electrophoresis

More information

ChIP-Seq Data Analysis. J Fass UCD Genome Center Bioinformatics Core Wednesday 15 June 2015

ChIP-Seq Data Analysis. J Fass UCD Genome Center Bioinformatics Core Wednesday 15 June 2015 ChIP-Seq Data Analysis J Fass UCD Genome Center Bioinformatics Core Wednesday 15 June 2015 What s the Question? Where do Transcription Factors (TFs) bind genomic DNA 1? (Where do other things bind DNA

More information

Whole Transcriptome Analysis of Illumina RNA- Seq Data. Ryan Peters Field Application Specialist

Whole Transcriptome Analysis of Illumina RNA- Seq Data. Ryan Peters Field Application Specialist Whole Transcriptome Analysis of Illumina RNA- Seq Data Ryan Peters Field Application Specialist Partek GS in your NGS Pipeline Your Start-to-Finish Solution for Analysis of Next Generation Sequencing Data

More information

Measuring transcriptomes with RNA-Seq. BMI/CS 776 Spring 2016 Anthony Gitter

Measuring transcriptomes with RNA-Seq. BMI/CS 776  Spring 2016 Anthony Gitter Measuring transcriptomes with RNA-Seq BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2016 Anthony Gitter gitter@biostat.wisc.edu Overview RNA-Seq technology The RNA-Seq quantification problem Generative

More information

Ecole de Bioinforma(que AVIESAN Roscoff 2014 GALAXY INITIATION. A. Lermine U900 Ins(tut Curie, INSERM, Mines ParisTech
