Computational aspects of ncRNA research
Mihaela Zavolan
Biozentrum, Basel
Swiss Institute of Bioinformatics

Computational aspects of ncRNA research: bacterial ncRNAs
- Gene discovery
- Target discovery
- Discovery of transcription regulatory elements for ncRNAs

Computational aspects of ncRNA research: miRNAs
- Gene discovery: automated annotation, gene prediction
- Expression profiling: sample comparisons, visualization
- Target discovery: modeling miRNA-mRNA target interaction
- Characterization of regulatory networks involving RNAs: miRNA target prediction, prediction of transcription regulatory elements

Computational aspects of ncRNA research: siRNAs (design)
- Optimization of silencing efficacy
- Minimization of off-target effects

ncRNA gene prediction
Main feature: RNA secondary structure is important. Look for evidence of selection on the secondary structure.
- Example sequences:
  GGACaag GUCC GUGCucauGUAC
  GGACag GUUC GUAUuuu GUAC
- Identification of pairs of sites with high mutual information
- Figure: proportion of miRNA sequences with a P-value less than a specified threshold (Bonnet et al. (2004) Bioinformatics 20:2911)
- Structure stabilization
- Mutations that are fixed in evolution preserve RNA structure (covariance models behind tRNAscan-SE (S. Eddy), RNAalifold (I. Hofacker))

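The mutual-information criterion above can be sketched as follows. This is a minimal illustration, assuming a gap-free alignment given as equal-length strings; the function name and the toy alignment are hypothetical and are not taken from the slides or from the cited tools.

```python
from collections import Counter
from math import log2


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(alignment)
    col_i = Counter(seq[i] for seq in alignment)
    col_j = Counter(seq[j] for seq in alignment)
    pairs = Counter((seq[i], seq[j]) for seq in alignment)
    mi = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        # Compare the joint frequency with the product of the marginals.
        mi += p_ab * log2(p_ab / ((col_i[a] / n) * (col_j[b] / n)))
    return mi


# Hypothetical toy alignment: columns 0 and 3 change together
# (G-C, A-U, C-G, U-A), columns 1 and 2 are invariant.
toy_alignment = ["GAAC", "AAAU", "CAAG", "UAAA"]
mi_covarying = mutual_information(toy_alignment, 0, 3)
mi_invariant = mutual_information(toy_alignment, 0, 1)
```

Columns that covary to keep a base pair intact give high mutual information, while a perfectly conserved column carries none, which is why this signal complements plain sequence conservation.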
ncRNA gene prediction
Main feature: RNA secondary structure is important. Look for evidence of selection on the secondary structure.
Figure panels: 300-50, 50-50, 200-200.
mir-100 is expected to preserve its hairpin secondary structure through the various steps of miRNA biogenesis.

Prediction of bacterial ncRNAs

Promoter regions recognized by the σ70 subunit of E. coli
Figure labels: σ factor binding site, TATA box
http://cwx.prenhall.com/horton/medialib/media_portfolio/

RNA hairpins regulate transcription termination http://cwx.prenhall.com/horton/medialib/media_portfolio/
Conserved secondary structures of Vibrio ncRNAs
Lenz et al. - The small RNA chaperone Hfq and multiple small RNAs control quorum sensing in Vibrio harveyi and Vibrio cholerae. Cell 118:69-82 (2004).

miRNA gene discovery
Studies driven by computation (fast, incomplete):
1. genome-wide computational prediction
2. validation
(Lai et al., 2003 - fly; Lim et al., 2003 - worm; Lim et al., 2003 - vertebrates; Berezikov et al., 2005 - vertebrates; Pfeffer et al., 2005 - viruses)
Studies driven by experiment (laborious, exhaustive):
1. large-scale cloning
2. functional annotation
3. miRNA gene prediction
4. validation
(Houbaviy et al., 2003 - mouse; Dostie et al., 2003 - rat; Aravin et al., 2003 - fly; Suh et al., 2004 - man; Pfeffer et al., 2004 - man, viruses)

Functional annotation of small RNAs
Small (16-30 nt) cloned RNAs are aligned (ALIGNMENT) against:
- sequences with known function (mRNA, rRNA, tRNA, miRNA, etc.)
- the genome sequence

Functional annotation of small RNAs
Small (16-30 nt) cloned RNAs
- match known sequences: rRNA, tRNA, miRNA, mRNA

Functional annotation of small RNAs
Small (16-30 nt) cloned RNAs
- annotated classes: rRNA, tRNA, miRNA, mRNA
- match genome (multiple copies, hairpin, conservation): novel miRNAs
- multiple genome matches: rasiRNA

miRNA gene prediction
Issues:
- find the locations in the genome that can give rise to miRNAs
- predict the sequence of the mature miRNA
Main clue: miRNA precursors form stem-loop structures (He & Hannon, Nat. Rev. Genet. 2004)

... so do many other genomic regions
Figure panels: fragment of protein-coding gene, let-7a, mir-147

miRNA gene prediction using SVM
- Build a model from positive and negative examples.
- Detect candidate stem loops in (large) genomic sequences.
- Classify candidate stem loops using the model.

miRNA gene prediction using SVM
Example: hsa-let-7c, L = 84, ΔG = -33.5 kcal/mol
- Nucleotide composition: A 20%, C 19%, G 29%, U 32%
- Paired nucleotides: A-U 31%, G-U 14%, G-C 29%
- Proportion of nucleotides in symmetrical loops 17%, in asymmetrical loops 4%
- Average distance between loops
- Longest symmetrical region (LSR)
- Longest slightly asymmetrical region (LSAR)
Negative stem: L = 68, ΔG = -22.6 kcal/mol, with its longest symmetrical region and longest slightly asymmetrical region marked
(Pfeffer et al. 2005)

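Two of the features listed above (nucleotide composition and the proportion of nucleotides in each type of base pair) can be computed directly from a sequence and its dot-bracket structure. The sketch below assumes the structure string comes from some folding program; the function names, the interpretation of "paired nucleotides" as a fraction of all nucleotides, and the toy hairpin are assumptions made for illustration.

```python
from collections import Counter


def pair_table(structure):
    """Return the (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], []
    for idx, ch in enumerate(structure):
        if ch == "(":
            stack.append(idx)
        elif ch == ")":
            pairs.append((stack.pop(), idx))
    return pairs


def stem_loop_features(seq, structure):
    """Length, nucleotide composition and per-pair-type nucleotide fractions."""
    n = len(seq)
    composition = {base: seq.count(base) / n for base in "ACGU"}
    kinds = Counter()
    for i, j in pair_table(structure):
        # Sort the two letters so that G-C and C-G count as the same pair type.
        kinds["".join(sorted(seq[i] + seq[j]))] += 1
    # Assumption: fractions are relative to all nucleotides (two per pair).
    paired = {kind: 2 * count / n for kind, count in kinds.items()}
    return {"length": n, "composition": composition, "paired": paired}


# Hypothetical 12-nt hairpin: 4 base pairs closing a 4-nt loop.
toy_features = stem_loop_features("GCAUAAAAGUGC", "((((....))))")
```

Loop-based features such as the LSR and LSAR would need a walk over the pair table to measure bulges and internal loops, which is omitted here.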
miRNA gene prediction using SVM
Training data:
- Positives: human genomic regions containing known miRNAs
- Negatives: mRNAs, rRNAs, tRNAs, viral stem loops
Features with largest negative weights:
- Free energy
- Number of nucleotides in symmetrical loops in LSAR
- Number of nucleotides in asymmetrical loops in LSAR
- Average size of asymmetrical loops
Features with largest positive weights:
- Stem length
- Length of longest symmetrical region
- Number of A-U pairs in LSAR
- Number of G-C pairs in LSAR
Performance: 29% false negatives, 3% false positives
Used SVMlight: http://svmlight.joachims.org/

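Per-feature weights suggest a linear decision function, so the role of positive and negative weights can be illustrated with a small sketch. Everything numeric here is hypothetical: the weights, the bias and the two candidates are invented for illustration and are not the trained model from the slides, and the code does not call SVMlight.

```python
def linear_svm_score(features, weights, bias):
    """Decision value w.x + b of a linear classifier; positive means 'miRNA-like'."""
    return sum(weights[name] * features[name] for name in weights) + bias


# Hypothetical weights: free energy and asymmetrical loops count against a
# candidate, stem length counts in its favor.
weights = {"free_energy": -0.02, "stem_length": 0.03, "nt_in_asym_loops": -0.10}
bias = -1.0

# Hypothetical candidates: a long, stable stem and a short, loop-rich one.
stable_stem = {"free_energy": -33.5, "stem_length": 30, "nt_in_asym_loops": 3}
weak_stem = {"free_energy": -10.0, "stem_length": 12, "nt_in_asym_loops": 8}

stable_score = linear_svm_score(stable_stem, weights, bias)
weak_score = linear_svm_score(weak_stem, weights, bias)
```

Because free energies are negative numbers, a negative weight on free energy raises the score of more stable stems.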
Detecting candidate stem loops
Search for stems whose secondary structure remains the same irrespective of their flanking sequences.
Example: hsa-mir-100 (figure panels: 300-50, 50-50, 200-200).
86% of the known human microRNAs belong to such robust stems.
Density of robust stems in the human genome: approximately 1 every 10 kb.

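The robustness test can be sketched as folding the same candidate with different amounts of flanking sequence and checking that its structure does not change. The sketch assumes that the labels 300-50, 50-50 and 200-200 are left and right flank lengths in nucleotides, and that `fold` is a caller-supplied function returning a dot-bracket string (for example a wrapper around a folding program); both are assumptions, as is the function name.

```python
def is_robust_stem(genome, start, end, fold,
                   flanks=((300, 50), (50, 50), (200, 200))):
    """True if genome[start:end] folds identically for every flank pair."""
    seen = set()
    for left, right in flanks:
        lo = max(0, start - left)
        hi = min(len(genome), end + right)
        structure = fold(genome[lo:hi])
        # Keep only the part of the structure that covers the candidate.
        seen.add(structure[start - lo:end - lo])
    return len(seen) == 1


# Toy stand-ins for a real folding routine, used only to exercise the logic.
def fold_unstructured(seq):
    return "." * len(seq)


def fold_length_dependent(seq):
    return ("(" if len(seq) > 300 else ".") * len(seq)


toy_genome = "A" * 1000
robust = is_robust_stem(toy_genome, 400, 480, fold_unstructured)
not_robust = is_robust_stem(toy_genome, 400, 480, fold_length_dependent)
```

A fuller version would also check that the extracted substring is a balanced hairpin, so that stems pairing with the flanks are rejected.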
Classification of candidate stem loops
Candidate: L = 78, ΔG = 31.6 kcal/mol (LSR and LSAR marked)
miRNA precursor? Yes: mir-ul1 of CMV (cloning frequency: 101), SVM score: 0.8

Application: miRNA gene prediction in viruses
Identification of microRNAs of the herpesvirus family. Nature Methods (2005).

Sensitivity-specificity plots for evaluating the performance of prediction programs
$Sn = \frac{TP}{TP + FN}, \quad Sp = \frac{TP}{TP + FP}$

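A sensitivity-specificity plot can be traced by sweeping a score threshold and recomputing Sn and Sp at each value. The sketch below uses the definitions given on the slide; the function names, the convention for empty denominators and the toy scores are assumptions.

```python
def sn_sp(scores, labels, threshold):
    """Return (Sn, Sp) when candidates scoring >= threshold are called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    # Sn = TP / (TP + FN); Sp = TP / (TP + FP), as defined on the slide
    # (this Sp is called precision or positive predictive value elsewhere).
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tp / (tp + fp) if tp + fp else 1.0  # assumed convention: no calls, no errors
    return sn, sp


def sn_sp_curve(scores, labels, thresholds):
    """One (threshold, Sn, Sp) point per threshold."""
    return [(t, *sn_sp(scores, labels, t)) for t in thresholds]


# Hypothetical classifier scores and true labels (True = real miRNA).
toy_scores = [0.9, 0.8, 0.4, 0.2]
toy_labels = [True, False, True, False]
curve = sn_sp_curve(toy_scores, toy_labels, [0.1, 0.5, 0.85])
```

Raising the threshold usually trades sensitivity for specificity, and the plot makes that trade-off comparable across prediction programs.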
Variations on miRNA gene prediction
$\text{score} = \sum_f w_f v_f$
Lim, L. P. et al. (2003) Genes & Dev. 17:991

Variations on miRNA gene prediction
Berezikov, E. et al. (2005) Cell 120:21
Figure: proportion of miRNA sequences with a P-value less than a specified threshold (Bonnet et al. (2004) Bioinformatics 20:2911)

Variations on miRNA gene prediction
Xie, X. et al. (2005) Nature 434:338

miRNA gene prediction servers
http://genes.mit.edu/mirscan/
http://www.mirz.unibas.ch

Prediction of ncRNAs using comparative genomics
RNAz (www.tbi.univie.ac.at/~wash/rnaz)
Start with an alignment of homologous sequences.
Compute the following features:
 - mean free energy of aligned sequences
 - structure conservation index ($SCI = E_A / \bar{E}$)
 - mean pairwise identity
 - number of sequences in the alignment
Use an SVM to classify candidates.
$E_A$ is the free energy of the alignment (takes into account mutations that preserve the structure), and $\bar{E}$ is the mean free energy of the aligned sequences.

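The structure conservation index is a simple ratio once the energies are available. The sketch below assumes the consensus (alignment) energy and the per-sequence energies have already been computed by a folding program; the function name and the toy energies are hypothetical.

```python
def structure_conservation_index(alignment_energy, sequence_energies):
    """SCI = E_A / mean(E), with all energies in the same units (e.g. kcal/mol)."""
    mean_energy = sum(sequence_energies) / len(sequence_energies)
    if mean_energy == 0:
        raise ValueError("mean free energy is zero; SCI is undefined")
    return alignment_energy / mean_energy


# Hypothetical energies: consensus fold at -20, individual folds at -25, -20, -30.
toy_sci = structure_conservation_index(-20.0, [-25.0, -20.0, -30.0])
```

An SCI near 1 means the consensus structure is about as stable as the individual folds, which points to a conserved structure; an SCI near 0 means the sequences share no common fold.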
Modeling miRNA-mRNA interaction for target prediction
Known miRNA-mRNA interactions in C. elegans. Hybrids generated using RNAhybrid (http://bibiserv.techfak.uni-bielefeld.de/rnahybrid/). Each hybrid is listed as four rows: unpaired target, paired target, paired miRNA, unpaired miRNA.
target: C.e_hbl-1, miRNA: cel-let-7
  target 5' U GUU C A 3'
            AUUAUACAACC C ACCUCA
            UGAUAUGUUGG G UGGAGU
  miRNA  3' U AU A 5'
target: C.e._COG-1A, miRNA: cel-lsy-6
  target 5' C CA A 3'
            GU CUUAUACAAAA
            CG GAGUAUGUUUU
  miRNA  3' GCUUUA CA 5'
target: C.e_LIN-41A, miRNA: cel-let-7
  target 5' U AUU U 3'
            UUAUACAACC CUGCCUC
            GAUAUGUUGG GAUGGAG
  miRNA  3' UU AU U 5'

Modeling miRNA-mRNA interaction
Use evolutionary conservation to determine what defines an miRNA target site.
1. Define an interaction model (e.g. the first 8 nucleotides of the miRNA have to be perfectly paired with their mRNA target site).
2. Determine the locations of all candidate sites in a reference species (e.g. human).
3. Determine the number of these candidate sites that are conserved in a set of species that have the miRNA.
4. Compare with the number of conserved candidate sites that we get for a random miRNA that has approximately the same number of predicted sites in the reference species.
(Lewis et al. 2005)

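The steps above can be sketched for the simplest interaction model, a perfect match to the first 8 nucleotides of the miRNA. The sketch assumes gap-free aligned 3' UTRs of equal length and control miRNAs that were already chosen to have a similar number of sites in the reference species; all function names, the toy UTRs and the two control sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna, seed_len=8):
    """Target sequence (5'->3') that pairs perfectly with the first seed_len nt."""
    return mirna[:seed_len].translate(COMPLEMENT)[::-1]


def find_sites(utr, site):
    """Start positions of all (possibly overlapping) occurrences of site."""
    positions, start = [], utr.find(site)
    while start != -1:
        positions.append(start)
        start = utr.find(site, start + 1)
    return positions


def conserved_sites(aligned_utrs, reference, site):
    """Count reference sites found at the same columns in every species."""
    return sum(
        all(seq[p:p + len(site)] == site for seq in aligned_utrs.values())
        for p in find_sites(aligned_utrs[reference], site)
    )


def signal_to_noise(aligned_utrs, reference, mirna, controls, seed_len=8):
    """Conserved sites of the miRNA divided by the mean over control miRNAs."""
    signal = conserved_sites(aligned_utrs, reference, seed_site(mirna, seed_len))
    noise = [conserved_sites(aligned_utrs, reference, seed_site(c, seed_len))
             for c in controls]
    mean_noise = sum(noise) / len(noise)
    return signal / mean_noise if mean_noise else float("inf")


let7 = "UGAGGUAGUAGGUUGUAUAGUU"
toy_utrs = {
    "human": "AAACUACCUCAGGGUUUUUUUUAC",
    "mouse": "AAACUACCUCAGGGUUUUUUUUAC",
}
toy_controls = ["AAAAAAAACC", "CCCCCCCCAA"]  # made-up control sequences
toy_ratio = signal_to_noise(toy_utrs, "human", let7, toy_controls)
```

Repeating the calculation with different interaction models (seed length, allowed mismatches) and keeping the model with the highest ratio is one way to let conservation decide what defines a target site.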
Modeling miRNA-mRNA interaction
Figure: S/N ratio for each interaction model.
Some miRNAs have hundreds of targets but many do not.

miRNA target prediction servers
http://pictar.bio.nyu.edu/

siRNA design
Empirical rules for siRNA design, derived from the work in the Tuschl Lab (siRNA user's guide: http://www.rockefeller.edu/labheads/tuschl/sirna.html).

siRNA design
Refining the rules by analyzing large datasets of siRNAs (Reynolds et al. 2004, many others): different siRNAs for the same gene can have markedly different silencing efficiencies.

siRNA design
Figure: silencing classes S<50%, S>50%, S>80%, S>95%; feature scores +1, +1/A, -1.

siRNA design
Accessibility of the target site influences siRNA efficacy: Far et al. (2003) Nucl. Acids Res. 31:4417
Target accessibility prediction server: http://sfold.wadsworth.org/index.pl