Computational aspects of ncRNA research. Mihaela Zavolan, Biozentrum, Basel, Swiss Institute of Bioinformatics



Computational aspects of ncRNA research: bacterial ncRNAs. Gene discovery. Target discovery. Discovery of transcription regulatory elements for ncRNAs.

Computational aspects of ncRNA research: miRNAs. Gene discovery: automated annotation, gene prediction. Expression profiling: sample comparisons, visualization. Target discovery: modeling miRNA-mRNA target interaction. Characterization of regulatory networks involving RNAs: miRNA target prediction, prediction of transcription regulatory elements.

Computational aspects of ncRNA research: siRNAs (design). Optimization of silencing efficacy. Minimization of off-target effects.

ncRNA gene prediction. Main feature: RNA secondary structure is important. Look for evidence of selection on the secondary structure.
Two sequences whose substitutions preserve base pairing: GGACaag GUCC GUGCucauGUAC and GGACag GUUC GUAUuuu GUAC.
Identification of pairs of sites with high mutual information: mutations that are fixed in evolution preserve RNA structure (covariance models behind tRNAscan-SE (S. Eddy), RNAalifold (I. Hofacker)).
Structure stabilization: proportion of miRNA sequences with a P-value less than a specified threshold (Bonnet et al. (2004) Bioinformatics 20:2911).
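The idea of finding pairs of sites with high mutual information can be illustrated with a minimal sketch. It assumes a gapless multiple alignment given as equal-length strings; the toy alignment and the function names are hypothetical and are not taken from tRNAscan-SE or RNAalifold.

```python
import math
from collections import Counter


def column(alignment, k):
    """Return the k-th column of an alignment given as equal-length strings."""
    return [seq[k] for seq in alignment]


def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_i)
    assert n == len(col_j) and n > 0
    freq_i = Counter(col_i)
    freq_j = Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi


# Hypothetical toy alignment: the first and last columns covary as
# Watson-Crick pairs (G-C, C-G, A-U, U-A); the inner columns are invariant.
alignment = ["GAAAAC", "CAAAAG", "AAAAAU", "UAAAAA"]

mi_paired = mutual_information(column(alignment, 0), column(alignment, 5))
mi_unpaired = mutual_information(column(alignment, 0), column(alignment, 1))
```

In this toy case the covarying pair of columns carries the maximum possible information for a four-letter alphabet, while an invariant column carries none, which is why covariation is a useful signal for conserved base pairs.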

ncRNA gene prediction. Main feature: RNA secondary structure is important. Look for evidence of selection on the secondary structure. [Figure: predicted structures of mir-100, panels labeled 300-50, 50-50, 200-200.] mir-100 is expected to preserve its hairpin secondary structure through the various steps of miRNA biogenesis.

Prediction of bacterial ncRNAs

Promoter regions recognized by the σ70 subunit of E. coli. [Figure labels: σ factor binding site, TATA box.] Source: http://cwx.prenhall.com/horton/medialib/media_portfolio/

RNA hairpins regulate transcription termination. Source: http://cwx.prenhall.com/horton/medialib/media_portfolio/

Conserved secondary structures of Vibrio ncRNAs. Lenz et al. (2004) The small RNA chaperone Hfq and multiple small RNAs control quorum sensing in Vibrio harveyi and Vibrio cholerae. Cell 118:69-82.

miRNA gene discovery.
Studies driven by computation: 1. genome-wide computational prediction; 2. validation (Lai et al., 2003: fly; Lim et al., 2003: worm; Lim et al., 2003: vertebrates; Berezikov et al., 2005: vertebrates; Pfeffer et al., 2005: viruses). Fast, incomplete.
Studies driven by experiment: 1. large-scale cloning; 2. functional annotation; 3. miRNA gene prediction; 4. validation (Houbaviy et al., 2003: mouse; Dostie et al., 2003: rat; Aravin et al., 2003: fly; Suh et al., 2004: man; Pfeffer et al., 2004: man, viruses). Laborious, exhaustive.

Functional annotation of small RNAs. Small (16-30 nt) cloned RNAs are aligned against sequences with known function (mRNA, rRNA, tRNA, miRNA, etc.) and against the genome sequence.
- Cloned RNAs that match known sequences are annotated as rRNA, tRNA, miRNA, or mRNA.
- Cloned RNAs that match the genome, with multiple copies, a hairpin, and conservation, are novel miRNAs.
- Cloned RNAs with multiple genome matches are rasiRNAs.
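The annotation flow above can be sketched as a small decision function. This is an illustration under stated assumptions, not the pipeline used in the cited studies: the order in which the criteria are tested, the substring-based matching, the label strings, and all parameter names are hypothetical, and the alignment, hairpin, and conservation results are assumed to be computed elsewhere.

```python
def annotate_small_rna(read, known_sequences, genome_hits, forms_hairpin, is_conserved):
    """Assign a tentative class to one cloned small RNA.

    read            -- sequence of the cloned RNA
    known_sequences -- dict mapping a class name (e.g. "tRNA") to known sequences
    genome_hits     -- number of genome loci the read matches (computed elsewhere)
    forms_hairpin   -- True if a flanking genomic region folds into a hairpin
    is_conserved    -- True if the locus is conserved in related genomes
    """
    # Size filter taken from the slide: small RNAs of 16-30 nucleotides.
    if not 16 <= len(read) <= 30:
        return "out of size range"
    # Step 1: matches to sequences with known function.
    for rna_class, sequences in known_sequences.items():
        if any(read in known for known in sequences):
            return rna_class
    # Step 2: reads that only match the genome.
    if genome_hits == 0:
        return "unmatched"
    if forms_hairpin and is_conserved:
        return "novel miRNA candidate"
    if genome_hits > 1:
        return "rasiRNA candidate"
    return "unannotated"


# let-7a mature sequence as the only known miRNA in this toy database.
known = {"miRNA": ["UGAGGUAGUAGGUUGUAUAGUU"]}

label_known = annotate_small_rna("UGAGGUAGUAGGUUGUAUAGUU", known, 3, True, True)
label_novel = annotate_small_rna("ACGUACGUACGUACGUACGU", known, 1, True, True)
label_repeat = annotate_small_rna("ACGUACGUACGUACGUACGU", known, 40, False, False)
```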

miRNA gene prediction. Issues: find the locations in the genome that can give rise to miRNAs; predict the sequence of the mature miRNA. Main clue: miRNA precursors form stem-loop structures (He & Hannon, Nat. Rev. Genet. 2004).

... so do many other genomic regions. [Figure: stem loops formed by a fragment of a protein-coding gene, let-7a, and mir-147.]

miRNA gene prediction using SVM. 1. Build a model from positive and negative examples. 2. Detect candidate stem loops in (large) genomic sequences. 3. Classify candidate stem loops using the model.

miRNA gene prediction using SVM. Example features, shown for hsa-let-7c (L = 84, dG = -33.5 kcal/mol):
- Nucleotide composition: A 20%, C 19%, G 29%, U 32%
- Paired nucleotides: A-U 31%, G-U 14%, G-C 29%
- Proportion of nucleotides in symmetrical loops 17%, in asymmetrical loops 4%
- Average distance between loops
- Longest symmetrical region
- Longest slightly asymmetrical region
Negative stem, with its longest symmetrical regions and longest slightly asymmetrical region marked: L = 68, dG = -22.6 kcal/mol. (Pfeffer et al. 2005)
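Two of the feature groups listed above, nucleotide composition and paired nucleotides, can be computed directly from a sequence and its predicted structure. The sketch below assumes the structure is given in dot-bracket notation (as produced by common folding programs) and that "paired nucleotides" means the fraction of all nucleotides involved in each pair type; both are assumptions, and the function name is hypothetical.

```python
from collections import Counter


def stem_loop_features(seq, structure):
    """Composition and pair-type fractions for a folded stem loop.

    seq       -- RNA sequence (A, C, G, U)
    structure -- dot-bracket string of the same length
    """
    assert len(seq) == len(structure)
    n = len(seq)
    composition = {nt: seq.count(nt) / n for nt in "ACGU"}

    # Match brackets with a stack to recover the base pairs.
    stack = []
    pair_counts = Counter()
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            j = stack.pop()
            pair_counts["".join(sorted((seq[j], seq[i])))] += 1

    # Each pair involves two nucleotides, hence the factor 2.
    paired = {
        "A-U": 2 * pair_counts["AU"] / n,
        "G-C": 2 * pair_counts["CG"] / n,
        "G-U": 2 * pair_counts["GU"] / n,
    }
    return {"length": n, "composition": composition, "paired": paired}


# Small hairpin: stem GGAC/GUCC closed by a 3-nt loop.
features = stem_loop_features("GGACAAGGUCC", "((((...))))")
```

The loop-symmetry features on the slide (symmetrical and asymmetrical loops, longest symmetrical region) would need a further pass over the paired positions and are not covered by this sketch.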

miRNA gene prediction using SVM.
Negatives: mRNAs, rRNAs, tRNAs, viral stem loops. Positives: human genomic regions containing known miRNAs.
Features with largest negative weights: free energy; number of nucleotides in symmetrical loops in the longest slightly asymmetrical region (LSAR); number of nucleotides in asymmetrical loops in the LSAR; average size of asymmetrical loops.
Features with largest positive weights: stem length; length of the longest symmetrical region; number of A-U pairs in the LSAR; number of G-C pairs in the LSAR.
Performance: 29% false negatives, 3% false positives. Used SVMlight (http://svmlight.joachims.org/).
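For a linear SVM, feature weights of this kind combine into a score as a weighted sum plus a bias, and the sign of the score gives the predicted class. The sketch below shows only that arithmetic; the weight values, the bias, and the feature names are hypothetical and are not the trained model. Only the signs follow the slide (negative for free energy and asymmetrical-loop content, positive for stem length).

```python
def svm_score(features, weights, bias=0.0):
    """Linear decision value: sum of weight * feature value, plus bias."""
    return sum(weights[name] * features[name] for name in weights) + bias


# Hypothetical weights; only their signs are taken from the slide.
weights = {
    "free_energy": -0.05,       # more negative dG raises the score
    "stem_length": 0.02,
    "nt_in_asym_loops": -0.1,
}

# Hypothetical candidate stem loop.
candidate = {"free_energy": -33.5, "stem_length": 30, "nt_in_asym_loops": 3}

score = svm_score(candidate, weights, bias=-1.0)
is_predicted_precursor = score > 0
```

Note that a negative weight on free energy rewards stable stems, because their free energy is itself negative.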

Detecting candidate stem loops. Search for stems whose secondary structure remains the same irrespective of their flanking sequences. Example: hsa-mir-100 (figure panels labeled 300-50, 50-50, 200-200). 86% of the known human microRNAs belong to such robust stems. Density of robust stems in the human genome: approximately 1 every 10 kb.

Classification of candidate stem loops. Example candidate: L = 78, dG = 31.6 kcal/mol, with its longest symmetrical region (LSR) and longest slightly asymmetrical region (LSAR) marked. miRNA precursor? Yes: mir-UL1 of CMV (cloning frequency: 101). SVM score: 0.8.

Application: miRNA gene prediction in viruses. Identification of microRNAs of the herpesvirus family. Nature Methods (2005).
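The robustness test can be sketched as follows, under two loudly stated assumptions: that labels such as 300-50 denote the lengths of the upstream and downstream flanking sequence included when folding, and that a folding routine is available as a function `fold(sequence)` returning a dot-bracket string. `fold` is a placeholder for a real structure-prediction program; the toy folders below exist only to exercise the logic.

```python
def is_robust_stem(genome, start, end, fold,
                   flank_pairs=((300, 50), (50, 50), (200, 200))):
    """True if genome[start:end] folds identically in every flanking context.

    fold        -- callable mapping an RNA/DNA string to a dot-bracket string
                   (placeholder for a real folding program)
    flank_pairs -- (upstream, downstream) flank lengths; assumed reading of
                   the 300-50, 50-50, 200-200 labels
    """
    structures = set()
    for upstream, downstream in flank_pairs:
        lo = max(0, start - upstream)
        hi = min(len(genome), end + downstream)
        full_structure = fold(genome[lo:hi])
        # Keep only the part of the structure covering the candidate stem.
        structures.add(full_structure[start - lo:end - lo])
    return len(structures) == 1


def toy_fold_stable(seq):
    """Toy folder: always unpaired, so any region is trivially 'robust'."""
    return "." * len(seq)


def toy_fold_context_dependent(seq):
    """Toy folder whose output depends on the window length."""
    return ("(" if len(seq) > 400 else ".") * len(seq)


toy_genome = "A" * 1000
robust = is_robust_stem(toy_genome, 400, 480, toy_fold_stable)
not_robust = is_robust_stem(toy_genome, 400, 480, toy_fold_context_dependent)
```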

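Sensitivity-specificity plots for evaluating the performance of prediction programs: $\mathrm{Sn} = \frac{TP}{TP + FN}$, $\mathrm{Sp} = \frac{TP}{TP + FP}$, where TP, FN, and FP are the numbers of true positives, false negatives, and false positives.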
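A sensitivity-specificity plot is obtained by sweeping a score threshold and computing Sn and Sp, as defined on this slide, at each value. A minimal sketch with hypothetical scored candidates follows; note that Sp here is TP / (TP + FP), following the slide's definition.

```python
def sn_sp(scored, threshold):
    """Return (Sn, Sp) at a threshold.

    scored -- list of (score, is_true_positive_class) tuples
    Sn = TP / (TP + FN), Sp = TP / (TP + FP), as defined on the slide.
    """
    tp = sum(1 for score, positive in scored if score >= threshold and positive)
    fp = sum(1 for score, positive in scored if score >= threshold and not positive)
    fn = sum(1 for score, positive in scored if score < threshold and positive)
    sn = tp / (tp + fn) if tp + fn else float("nan")
    sp = tp / (tp + fp) if tp + fp else float("nan")
    return sn, sp


# Hypothetical predictions: (score, True if the candidate is a real miRNA).
scored = [(0.9, True), (0.8, True), (0.4, False), (0.3, True), (0.1, False)]

# One (Sn, Sp) point per threshold gives the curve.
curve = [(t, *sn_sp(scored, t)) for t in (0.2, 0.5)]
```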


Variations on miRNA gene prediction: a candidate score computed as a weighted sum over features $f$, $\sum_f w_f v_f$. Lim, L. P. et al. (2003) Genes & Dev. 17:991

Variations on miRNA gene prediction. Berezikov, E. et al. (2005) Cell 120:21. [Figure: proportion of miRNA sequences with a P-value less than a specified threshold (Bonnet et al. (2004) Bioinformatics 20:2911).]

Variations on miRNA gene prediction. Xie, X. et al. (2005) Nature 434:338

miRNA gene prediction servers: http://genes.mit.edu/mirscan/ and http://www.mirz.unibas.ch

Prediction of ncRNAs using comparative genomics: RNAz (www.tbi.univie.ac.at/~wash/rnaz).
Start with an alignment of homologous sequences. Compute the following features:
- mean free energy of the aligned sequences
- structure conservation index ($\mathrm{SCI} = E_A / \bar{E}$)
- mean pairwise identity
- number of sequences in the alignment
Use an SVM to classify candidates.
$E_A$ is the free energy of the alignment (takes into account mutations that preserve the structure), and $\bar{E}$ is the mean free energy of the aligned sequences.
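Two of the features above reduce to simple arithmetic once the folding energies are known. The sketch below assumes the consensus energy of the alignment and the per-sequence energies have already been computed by external folding programs. The energy values, the toy alignment, and the function names are hypothetical, and the identity measure (matching columns over columns that are not gaps in both sequences) is one common convention rather than necessarily the one RNAz uses.

```python
from itertools import combinations


def structure_conservation_index(alignment_energy, single_energies):
    """SCI = E_A / mean(E): consensus energy over mean single-sequence energy."""
    mean_energy = sum(single_energies) / len(single_energies)
    return alignment_energy / mean_energy


def pairwise_identity(a, b):
    """Fraction of identical positions, skipping columns that are gaps in both."""
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    return sum(1 for x, y in columns if x == y) / len(columns)


def mean_pairwise_identity(alignment):
    """Average identity over all pairs of aligned sequences."""
    pairs = list(combinations(alignment, 2))
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical energies in kcal/mol.
sci = structure_conservation_index(-18.0, [-20.0, -22.0, -18.0])

# Hypothetical toy alignment.
mpi = mean_pairwise_identity(["ACGU", "ACGA", "ACCU"])
```

An SCI near 1 means the consensus fold is about as stable as the individual folds, which is the signature of a conserved structure; an SCI near 0 means the sequences do not share a common fold.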

Modeling miRNA-mRNA interaction for target prediction. Known miRNA-mRNA interactions in C. elegans:

target: C.e_hbl-1, miRNA: cel-let-7
target 5' U GUU C A 3'
AUUAUACAACC C ACCUCA
UGAUAUGUUGG G UGGAGU
miRNA 3' U AU A 5'

target: C.e._COG-1A, miRNA: cel-lsy-6
target 5' C CA A 3'
GU CUUAUACAAAA
CG GAGUAUGUUUU
miRNA 3' GCUUUA CA 5'

target: C.e_LIN-41A, miRNA: cel-let-7
target 5' U AUU U 3'
UUAUACAACC CUGCCUC
GAUAUGUUGG GAUGGAG
miRNA 3' UU AU U 5'

Hybrids generated using RNAhybrid: http://bibiserv.techfak.uni-bielefeld.de/rnahybrid/


Modeling miRNA-mRNA interaction. Use evolutionary conservation to determine what defines an miRNA target site (Lewis et al. 2005):
1. Define an interaction model (e.g. the first 8 nucleotides of the miRNA have to be perfectly paired with their mRNA target site).
2. Determine the locations of all candidate sites in a reference species (e.g. human).
3. Determine the number of these candidate sites that are conserved in a set of species that have the miRNA.
4. Compare with the number of conserved candidate sites obtained for a random miRNA that has approximately the same number of predicted sites in the reference species.
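The procedure above can be sketched for the example interaction model, perfect pairing of the first 8 miRNA nucleotides. The sketch makes simplifying assumptions that are not from the source: the orthologous 3' UTRs are given as a gapless alignment of equal-length strings with the reference species first, a site counts as conserved only if the identical motif occurs at the same position in every species, and the toy UTR sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna, seed_length=8):
    """mRNA motif (5'->3') perfectly complementary to the miRNA 5' end."""
    return mirna[:seed_length].translate(COMPLEMENT)[::-1]


def find_sites(utr, site):
    """Start positions of all occurrences of the site in a UTR."""
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]


def count_conserved(aligned_utrs, site):
    """Return (candidate sites in the reference, sites conserved in all species)."""
    reference, others = aligned_utrs[0], aligned_utrs[1:]
    candidates = find_sites(reference, site)
    conserved = [i for i in candidates
                 if all(other[i:i + len(site)] == site for other in others)]
    return len(candidates), len(conserved)


def signal_to_noise(conserved_real, conserved_control):
    """Ratio of conserved sites for the real miRNA versus a control miRNA."""
    return conserved_real / conserved_control if conserved_control else float("inf")


let7 = "UGAGGUAGUAGGUUGUAUAGUU"   # let-7a mature sequence
site = seed_site(let7)

# Hypothetical aligned UTRs: the second site is mutated in the second species.
utrs = ["AAACUACCUCAGGGCUACCUCAUU",
        "AAACUACCUCAGGGCUACGUCAUU"]
n_candidates, n_conserved = count_conserved(utrs, site)
```

Repeating `count_conserved` with a randomized control miRNA of similar site frequency, and passing both conserved counts to `signal_to_noise`, gives the comparison described in step 4.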

Modeling miRNA-mRNA interaction. [Plot: S/N ratio for each interaction model.] Some miRNAs have hundreds of targets but many do not.

miRNA target prediction servers: http://pictar.bio.nyu.edu/

siRNA design. Empirical rules for siRNA design, derived from the work in the Tuschl Lab (siRNA user's guide: http://www.rockefeller.edu/labheads/tuschl/sirna.html).

siRNA design. Refining the rules by analyzing large datasets of siRNAs (Reynolds et al. 2004, many others): different siRNAs for the same gene can have markedly different silencing efficiencies.

siRNA design. [Figure: per-position scoring contributions (+1, +1/A, -1) for siRNAs grouped by silencing efficiency S: S<50%, S>50%, S>80%, S>95%.]

siRNA design. Accessibility of the target site influences siRNA efficacy: Far et al. (2003) Nucl. Acids Res. 31:4417. Target accessibility prediction server: http://sfold.wadsworth.org/index.pl