Finding Genes with Genomics Technologies

Similar documents
PLNT2530 (2018) Unit 6b Sequence Libraries

Reading Lecture 8: Lecture 9: Lecture 8. DNA Libraries. Definition Types Construction

Gene Expression Technology

Analysis of data from high-throughput molecular biology experiments Lecture 6 (F6, RNA-seq ),

Introduction to RNA-Seq. David Wood Winter School in Mathematics and Computational Biology July 1, 2013

SGN-6106 Computational Systems Biology I

Wheat CAP Gene Expression with RNA-Seq

Selected Techniques Part I

CAP BIOINFORMATICS Su-Shing Chen CISE. 10/5/2005 Su-Shing Chen, CISE 1

Recent technology allow production of microarrays composed of 70-mers (essentially a hybrid of the two techniques)

TECH NOTE Pushing the Limit: A Complete Solution for Generating Stranded RNA Seq Libraries from Picogram Inputs of Total Mammalian RNA

SMARTer Ultra Low RNA Kit for Illumina Sequencing Two powerful technologies combine to enable sequencing with ultra-low levels of RNA

Advanced RNA-Seq course. Introduction. Peter-Bram t Hoen

Biology. Biology. Slide 1 of 39. End Show. Copyright Pearson Prentice Hall

Biology. Biology. Slide 1 of 39. End Show. Copyright Pearson Prentice Hall

Basics of RNA-Seq. (With a Focus on Application to Single Cell RNA-Seq) Michael Kelly, PhD Team Lead, NCI Single Cell Analysis Facility

measuring gene expression December 5, 2017

Multiple choice questions (numbers in brackets indicate the number of correct answers)

Experimental Design. Dr. Matthew L. Settles. Genome Center University of California, Davis

Novel methods for RNA and DNA- Seq analysis using SMART Technology. Andrew Farmer, D. Phil. Vice President, R&D Clontech Laboratories, Inc.

Year III Pharm.D Dr. V. Chitra

Unit 1: DNA and the Genome. Sub-Topic (1.3) Gene Expression

measuring gene expression December 11, 2018

RNA-SEQUENCING ANALYSIS

Motivation From Protein to Gene

Non-Organic-Based Isolation of Mammalian microrna using Norgen s microrna Purification Kit

Genomic resources. for non-model systems

Computing with large data sets

Bi 8 Lecture 4. Ellen Rothenberg 14 January Reading: from Alberts Ch. 8

ChIP-seq and RNA-seq

SO YOU WANT TO DO A: RNA-SEQ EXPERIMENT MATT SETTLES, PHD UNIVERSITY OF CALIFORNIA, DAVIS

Mapping and quantifying mammalian transcriptomes by RNA-Seq. Ali Mortazavi, Brian A Williams, Kenneth McCue, Lorian Schaeffer & Barbara Wold

ChIP-seq and RNA-seq. Farhat Habib

Gene expression analysis. Biosciences 741: Genomics Fall, 2013 Week 5. Gene expression analysis

RNA-Seq data analysis course September 7-9, 2015

Transcriptome analysis

Molecular Cell Biology - Problem Drill 08: Transcription, Translation and the Genetic Code

Chapter 1. from genomics to proteomics Ⅱ

Deep Sequencing technologies

Lesson Overview. Fermentation 13.1 RNA

High Throughput Sequencing the Multi-Tool of Life Sciences. Lutz Froenicke DNA Technologies and Expression Analysis Cores UCD Genome Center

3.1.4 DNA Microarray Technology

Bi 8 Lecture 5. Ellen Rothenberg 19 January 2016

Chapter 6 - Molecular Genetic Techniques

Section 3: DNA Replication

less sensitive than RNA-seq but more robust analysis pipelines expensive but quantitiatve standard but typically not high throughput

CHAPTERS 16 & 17: DNA Technology

Proteomics. Manickam Sugumaran. Department of Biology University of Massachusetts Boston, MA 02125

Genome annotation & EST

RNA standards v May

DNA Function: Information Transmission

Introduction to Bioinformatics and Gene Expression Technologies

Introduction to Bioinformatics and Gene Expression Technologies

Experimental Design. Sequencing. Data Quality Control. Read mapping. Differential Expression analysis

Videos. Lesson Overview. Fermentation

Design. Construction. Characterization

Molecular Cell Biology - Problem Drill 11: Recombinant DNA

Authors: Vivek Sharma and Ram Kunwar

Key Area 1.3: Gene Expression

3. Translation. 2. Transcription. 1. Replication. and functioning through their expression in. Genes are units perpetuating themselves

Gene Regulation Solutions. Microarrays and Next-Generation Sequencing

Successful gene expression studies using validated qpcr assays. Jan Hellemans, CEO Biogazelle webinar October 28 th, 2015

Transcriptomics analysis with RNA seq: an overview Frederik Coppens

Wet-lab Considerations for Illumina data analysis

GENETICS - CLUTCH CH.15 GENOMES AND GENOMICS.

Videos. Bozeman Transcription and Translation: Drawing transcription and translation:

Rapid Transcriptome Characterization for a nonmodel organism using 454 pyrosequencing

CONSTRUCTION OF GENOMIC LIBRARY

Introduction to Cellular Biology and Bioinformatics. Farzaneh Salari

Please purchase PDFcamp Printer on to remove this watermark. DNA microarray

Fermentation. Lesson Overview. Lesson Overview 13.1 RNA

Chapter 10 Genetic Engineering: A Revolution in Molecular Biology

The Ensembl Database. Dott.ssa Inga Prokopenko. Corso di Genomica

Microarrays: since we use probes we obviously must know the sequences we are looking at!

Protein Synthesis: Transcription and Translation

Hershey and Chase. The accumulation of evidence: Key Experiments in the Discovery of DNA: Griffith s Transformation Experiment (1928)

Unit IX Problem 3 Genetics: Basic Concepts in Molecular Biology

Single-Cell Whole Transcriptome Profiling With the SOLiD. System

RNA Sequencing. Next gen insight into transcriptomes , Elio Schijlen

The Structure of Proteins The Structure of Proteins. How Proteins are Made: Genetic Transcription, Translation, and Regulation

Quick Review of Protein Synthesis

GENETICS EXAM 3 FALL a) is a technique that allows you to separate nucleic acids (DNA or RNA) by size.

RNA-Seq de novo assembly training

DNA Microarray Technology

Automated Electrophoresis Applications for Synthetic Biology

What are proteomics? And what can they tell us about seed maturation and germination?

Combining Techniques to Answer Molecular Questions

Bioinformatics (Lec 19) Picture Copyright: the National Museum of Health

Applied Bioinformatics - Lecture 16: Transcriptomics

Measuring and Understanding Gene Expression

Rapid Method for the Purification of Total RNA from Formalin- Fixed Paraffin-Embedded (FFPE) Tissue Samples

Computational Biology I LSM5191

CBC Data Therapy. Metatranscriptomics Discussion

Microarray. Key components Array Probes Detection system. Normalisation. Data-analysis - ratio generation

Functional Genomics Overview RORY STARK PRINCIPAL BIOINFORMATICS ANALYST CRUK CAMBRIDGE INSTITUTE 18 SEPTEMBER 2017

This place covers: Methods or systems for genetic or protein-related data processing in computational molecular biology.

Review of Protein (one or more polypeptide) A polypeptide is a long chain of..

Goals of pharmacogenomics

Quiz Submissions Quiz 4

UNIT 3 GENETICS LESSON #41: Transcription

Transcription:

PLNT2530 Plant Biotechnology (2018) Unit 7 Finding Genes with Genomics Technologies Unless otherwise cited or referenced, all content of this presenataion is licensed under the Creative Commons License Attribution Share-Alike 2.5 Canada 1

Omics technologies Technology Description What you learn Genomics Transcriptomics Proteomics Metabolomics High throughput DNA sequencing of genomic DNA High throughput sequencing of RNA populations Identification of proteins in a protein population using mass spectrometry of oligopeptides Identification of metabolites in cells which genes are present in an organism, and which alleles which genes are transcribed in a tissue, cell type, or in response to environmental stimuli which proteins are epressed which biochemical products are produced, allowing inference of biochemical pathways 2

Eample: What genes are involved in a disease-resistance reaction? Sham inoculated plants Disease inoculated Resistant plants Disease inoculated Susceptible plants mrna mrna mrna microarray or RNAseq microarray or RNAseq microarray or RNAseq Identify genes showing differential epression between treatments, either from microarray or RNAseq data, and isearch GenBank for the homologous genes of known function. Hope to see different transcripts are present in resistant plant, vs susceptible vs sham inoculated 3

Resistant Genes uniquely epressed in resistant plants Genes occurring under all conditions 1 45 23 Sham 585 771 98 675 Susceptible Genes epressed only in susceptible plants 4

Animation of of microarray analysis http://www.sumanasinc.com/webcontent/animations/content/dnachips.html In the video, they say that oligonucleotide length is about 20 nucleotides. Typical arrays use oligo probes of 60 or 70 nucelotides. From this type of eperiment you would hope to discover which genes are being epressed selectively or differentially under the conditions in question. Microarray technology is notoriously error prone! In most cases, if the eperiment didn t include a minimum of 4, and preferrably 6 biological replicates, don t trust it! Because of the high error associated with microarrays, this technology is largely being replaced by direct sequencing of RNA populations. The latter eliminates nearly all of the errors associated with microarrays. 5

Microarray The current state of the art for microarrays is to do a single dye eperiment on a high-density array with 10,000 or more spots. Each spot contains a 60-mer oligonucleotide probe, corresponding to a particular gene. For quality control, many oligos are located at several spots, to verify that the same level of signal is see, regardless of location on the slide. In this picture, colors, representing the amount of cdna hybridized to each spot, are generated by the computer 6

Results are in the form of a spreadsheet, indicating the level of signal seen for each oligonucleotide probe on the array. 7

Eample: Gene epression in Arabidopsis embryos Different parts of Arabidopsis embryos were microdissected, and RNA was etracted. The RNA was used to measure gene epression in a microarray eperiment. Each line in the heat map at right represents a different gene. Only a small number of the genes are shown. The level of epression for each gene in each part of the embryo, is represented using the color scale shown at right. Michael D. Nodine et al., (2011) A few standing for many: embryo receptor-like kinases Trends in Plant Science 16:211-217. http://d.doi.org/10.1016/j.tplants.2011.01.005 8

RNA-seq (RNA sequencing) Another approach to measuring gene epression: Just sequence the entire mrna population! NGS now lets us do millions of reads at a reasonable price. Why not just sequence a few million cdnas? In principle, the number of transcripts found for each gene should be a good measure of the relative levels of mrna for each gene. Shouldn't it? http://finchtalk.geospiza.com/2009/05/small-rnas-get-smaller.html 9

RNA-seq (RNA sequencing) While in general the quality of the RNA is important to the success of RNA-seq eperiments, the parameter that has the most effect is the degree to which the sample has been enriched for mrna, by eliminating other RNAs. Especially in Eukaryotic RNA populations, mrna usually makes up only a few percent of the total, which is predominantly rrna. If no enrichment procedure was done, the depth of coverage of protein coding genes would be greatly compromised, because the vast majority of reads would be rrna. Although most RNA-seq library preparation protocols have a step for enriching for mrnas, there will always be contamination from other RNAs. For this reason, it is important that there be a step in the data pipeline to eliminate reads that can be identified as other forms of RNA, such as rrna or trna. http://finchtalk.geospiza.com/2009/05/small-rnas-get-smaller.html 10

TTTTTTT TTTTTTT TTTTTTT Moderate salt wash Oligo-dT column TTTTTTT TTTTTTT TTTTTTT TTTTTTT Low salt, higher T TTTTTTT TTTTTTT TTTTTTT TTTTTTT Eluted PolyA RNA 11

RNA-seq - General strategy There are many protocols for RNA sequencing, including Illumina GA/HiSeq, SOLiD, and Roche 454. Although these differ, the RNA-seq can be described generally as shown at right. In some protocols, RNA is sheared, followed by random heamer priming. In other protocols, the entire mrna transcript is used as a template for cdna synthesis, and the cdna is fragmented. Adapters for PCR are ligated onto ds-cdna, followed by PCR amplification. Sequencing reactions are either done from a single end, or for both ends (paired-end). http://cmb.molgen.mpg.de/2ndgenerationsequencing/solas/rna-seq.html 12

RNA-seq - introns complicate the assembly process The illustration at right shows RNA-seq reads aligned to two eukaryotic genes A and B. Reads that span part of an eon are shown as single lines, whereas reads that include parts of two adjacent eons are indicated by V-shaped lines. The presence of introns being spliced out of pre-mrna transcripts means that alignment programs have to check to see whether a read contains part of the 3' end of one intron and part of a 5' end of another intron. We have to already have the genomic sequence to do this. It is also noteable that most transcriptomics eperiments contain reads that map in presumably non-coding intergenic regions. These can either indicate that there are previously-unannotated transcribed regions in the genome, or the presence of untranscribed pseudogenes. Transcriptomics is also revealing that alternative spllicing occurs more frequently in eukaryotic gene epression than was previously appreciated. 13

RNA-seq - Normalization As shown in the illustration above, more reads will be found for larger genes than for smaller genes. In other words, we want to find out the number of reads or fragments that were mapped to each gene in the genome or transcriptome. Consequently, it is necessary to correct gene epression levels for The size of each gene. The total number of reads in the dataset. This makes results comparable across eperiments Depending on whether you are doing single reads or paired-end reads, there are two almost identical formulae. 14

RNA-seq - Normalization Depending on whether you are doing single reads or paired-end reads, there are two almost identical formulae. RPKM - Reads Per Kilobase of transcript per Million mapped reads RPKM = C/LN where C :number of mappable reads on a feature = #reads for single-end reads L : Length of feature (in kb) N: Total number of mappable features (in millions) FPKM - Fragments Per Kilobase of transcript per Million mapped reads FPKM = F/LN where F :number of mappable reads on a feature* = #reads/2 for paried end reads L : Length of feature (in kb) N: Total number of mappable features (in millions) *feature - a small contig representing a particular mrna 15

RNA-seq (RNA sequencing) 16

Proteomics A family of methods for identifying and quantifying proteins which are epressed under specific conditions. Older methods involved isolation of proteins by 2D electrophoresis, but more recent approaches employ more high-throughput methods There are many different strategies. We will only talk about one method of the global strategy for studying the entire population of proteins in cells or tissues. 1 Isolate total protein population from cells (thousands of different proteins) 2 Use proteases to digest proteins into smaller peptides 3 Separate the peptides into size classes by liquid chromatography 4 As fractionated peptides come off the LC column, they are fed into a mass spectrometer to precisely measure their molecular weights 17

Proteomics General procedure: 1 Isolate total protein population from cells (thousands of different proteins) 2 Use proteases to digest proteins into smaller peptides 3 Separate the peptides into size classes by liquid chromatography 4 As fractionated peptides come off the LC column, they are fed into a mass spectrometer to precisely measure their molecular weights, and to quantify how much of each oligopeptide is present. 5 Oligopeptides are compared with a database of proteins to identify which proteins were present in the cell population. 18

Proteomics from http://www.leibniz-fli.de/research/core-facilities-services/cf-proteomics/ 19

Proteomics Whole proteins are too large for mass spectrometry so each protein band is first digested with an enzyme called trypsin which cleaves the polypeptide after every lysine (K) and arginine (R) to yield smaller peptides K K R R R K R R Digestion with trypsin (protease) K Protein K K K K R R R R R Because each protein has a unique sequence, there will be a unique series of peptides generated from trypsin digestion. The miture is mass analyzed and identified if the sequences are in the database 20

Proteomics for measuring gene epression 1) Since each possible oligopeptide has a uniuqe profile on MS, the eact sequence of the oligopeptide can be identified, based on its profile. 2) By comparing the oligopeptides identified with known proteins in a database, the identities of each protein in the population can be determined. 3) Because MS is highly accurate, we also get a measurement of the amount of each protein. from http://www.nature.com/nrm/journal/v11/n6/fig_tab/nrm2900_f2.html 21

Metabolomics Similarly many of the proteins in a cell are enzymes which catalyze reactions leading to a large set of products. The set of primary and secondary products produced from the enzymes of metabolism in the cell at any instance is referred to as the metabolome. primary metabolism: eg amino acids, sugars, nucleotides Secondary metabolism: eg. carotenoids, phenolic cmps 10,000+ metabolites in a cell at any instant Metabolomics: A family of techniques for separating metabolites from a cell based on charge, molecular weight, and other characteristics. 22

Metabolomics Each type of molecule has a distinct set of characteristics. Profiles from methods such as Mass spectrometry, gas chromatography, liquid chromotography are compared to a database of molecules, to identify each metabolite. http://fiehnlab.ucdavis.edu/staff/kind/metabolomics/lipidanalysis/lipidomics-small.png 23

Genome Transcriptome Proteome Metabolome Information slices of the same cells at the same time under the same conditions 24