Transcriptomics. Marta Puig Institut de Biotecnologia i Biomedicina Universitat Autònoma de Barcelona


Central dogma of molecular biology

- Genome: complete DNA content of an organism, with all its genes and regulatory sequences
- Transcription (genome to transcriptome)
- Transcriptome: complete set of transcripts and their relative levels of expression in a particular cell or tissue under defined conditions at a given time
- Translation (transcriptome to proteome)
- Proteome: complete collection of proteins and their relative levels in each cell
- Phenotype

Why is the study of RNA so important?

RNA profiling provides information about:
- Expressed sequences and genes of a genome
- Gene regulation and regulatory sequences
- Function and interaction between genes
- Functional differences between tissues and cell types
- Identification of candidate genes for any given process or disease

Overview

- Methods
- Alternative splicing
- Types of transcripts
- Regulatory sequences
- ENCODE project

Transcriptome analysis methods

SINGLE GENES
- Northern blot
- RT-PCR
- 5′ and 3′ RACE
- Quantitative RT-PCR (Real-Time RT-PCR)

WHOLE TRANSCRIPTOME
- EST sequencing
- Microarrays
- RNA-Seq

Transcriptome analysis using microarrays

Gene expression arrays
- Quantification of transcript abundance
- Single/multiple 3′ probes

Genome tiling arrays
- Identification of transcribed sequences
- Multiple probes covering the genome

Alternative splicing arrays
- Quantification of different RNA isoforms (inclusion form and exclusion form)
- Probes in exons and exon-exon junctions

Expressed Sequence Tags (EST)

1. cDNA synthesis
2. cDNA library
3. Sanger sequencing of insert ends
4. ESTs
5. Alignment with genome

Brent (2008) Nature Reviews Genetics 9: 62-73

RNA-seq

Sequencing of all the transcripts in a sample using NGS technologies
Figure 1. Wang et al. (2009) Nature Reviews Genetics 10: 57-63

RNA-seq: mapping of short reads in exon-exon junctions

CCGAAAATCAAGTCATCCCTAAAGACTAAGTAAGTAACCATATTACATTAAGGAAGGCACTTTAAAAGTTTATAATCATTTGTAGACTCCCACCAAAGCCACTGACTCGCAAGG
(figure labels along the sequence: Exon, Intron, Exon)
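A read that spans an exon-exon junction does not match the genome contiguously, so it has to be split in two anchors separated by an intron. The following is a minimal sketch of that idea, not the method used by any particular aligner: it assumes exact matches, a canonical GT...AG intron, and a toy genome. The function name `find_spliced_alignment`, its parameters, and the toy sequences are all hypothetical.

```python
def find_spliced_alignment(genome, read, min_anchor=4, min_intron=10):
    """Return (read_start, intron_start, intron_end) for the first split
    of `read` that matches two genome segments separated by a GT...AG
    intron, or None if no such split exists. Exact matching only."""
    for split in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:split], read[split:]
        left_pos = genome.find(left)
        while left_pos != -1:
            intron_start = left_pos + split  # first base after the left anchor
            right_pos = genome.find(right, intron_start + min_intron)
            while right_pos != -1:
                intron = genome[intron_start:right_pos]
                # Assumption: only canonical GT...AG introns are accepted.
                if intron.startswith("GT") and intron.endswith("AG"):
                    return left_pos, intron_start, right_pos
                right_pos = genome.find(right, right_pos + 1)
            left_pos = genome.find(left, left_pos + 1)
    return None


# Toy example (hypothetical sequences, not taken from the slide).
exon1 = "ACGTTGCAAC"
intron = "GTAAGTCCCTTTTTCTAG"
exon2 = "GGATCCTTGA"
toy_genome = exon1 + intron + exon2
junction_read = exon1[-6:] + exon2[:6]  # spans the exon-exon junction

alignment = find_spliced_alignment(toy_genome, junction_read)
print(alignment)
```

Real spliced aligners index the genome and tolerate mismatches; the brute-force search here is only meant to make the split-read logic visible.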

RNA-seq examples

Discovery of new transcripts by RNA-seq in D. melanogaster
Figures 1 and 2. Graveley et al. (2011) Nature 471: 473-479

RNA-seq examples

Quantification and determination of expression profiles
Expression profile by RNA-seq of the D. melanogaster gene eve in different developmental stages
D. melanogaster RNA-seq data as shown in GBrowse (FlyBase)
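Building an expression profile from read counts needs some normalization for transcript length and sequencing depth. The slides do not say which measure is used; the sketch below assumes RPKM (reads per kilobase of transcript per million mapped reads), one commonly used option. The stage names and all counts are invented for illustration.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical data: reads mapped to one gene and library size per stage.
gene_length_bp = 2000
stages = {
    "embryo_0-2h": {"gene_reads": 500, "total_reads": 10_000_000},
    "embryo_2-4h": {"gene_reads": 4000, "total_reads": 20_000_000},
    "adult": {"gene_reads": 10, "total_reads": 10_000_000},
}

# Expression profile: one normalized value per developmental stage.
profile = {
    stage: rpkm(d["gene_reads"], gene_length_bp, d["total_reads"])
    for stage, d in stages.items()
}
for stage, value in profile.items():
    print(f"{stage}: {value:.1f} RPKM")
```

Dividing by library size is what makes the stages comparable: the second stage has eight times more reads for the gene but only four times the normalized expression, because its library is twice as deep.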

RNA-seq advantages

- Does not require an available genomic sequence
- Detection of new transcripts
- Single-nucleotide precision
- Detection of splicing variants and alternative transcription starts and ends
- Detection of SNPs in transcribed regions
- Detection of allele-specific transcription
- Accurate quantification of expression levels (wide range of measurements)
- High reproducibility
- Small amount of initial RNA needed

RNA-seq advantages

Table 1. Wang et al. (2009) Nature Reviews Genetics 10: 57-63

DNA: regulatory elements, promoters (TATA), TRANSCRIPTION START SITE, polyadenylation signal (CTGAATAAATCCA), TRANSCRIPTION TERMINATION SITE
Splicing
mRNA: 5′ UTR, ORF, 3′ UTR, poly(A) tail (AAAAAAAAA)
TRANSLATION INITIATION: methionine codon (ACTGATGTCCA)
TRANSLATION TERMINATION: STOP codon (CCGATAAATCC)
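The landmarks in this diagram (start codon, in-frame stop codon, polyadenylation signal) can be located in a transcript with simple string searches. The sketch below is a simplification under stated assumptions: it takes the first ATG as the translation start, ignores sequence context, and uses AATAAA as the only polyadenylation signal. The toy mRNA is assembled for illustration (written in the DNA alphabet) and the function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_first_orf(mrna):
    """Return (start, end) of the ORF that begins at the first ATG and
    ends after the first in-frame stop codon, or None if there is none."""
    start = mrna.find("ATG")
    if start == -1:
        return None
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return start, i + 3
    return None


# Toy transcript (hypothetical): 5' UTR + ORF + 3' UTR with a poly(A) signal.
five_utr = "GGCACTG"
orf = "ATGTCCAAGCCGTAA"
three_utr = "ATCCCTGAATAAATCCAAAAA"
toy_mrna = five_utr + orf + three_utr

orf_start, orf_end = find_first_orf(toy_mrna)
polya_signal = toy_mrna.find("AATAAA", orf_end)  # search only in the 3' UTR

print("5' UTR:", toy_mrna[:orf_start])
print("ORF:", toy_mrna[orf_start:orf_end])
print("Poly(A) signal at position:", polya_signal)
```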

Alternative splicing

Internal exons:
- Alternative 5′ splice site selection
- Alternative 3′ splice site selection
- Exon inclusion/skipping
- Intron retention

Initial/final exons

Figure 1. Nielsen and Graveley (2010) Nature 463: 457-463
Figure 1. Li et al. (2007) Nature Reviews Neuroscience 8: 819-831

Alternative splicing example: α-tropomyosin

- Alternative promoters
- Exon inclusion/skipping
- Alternative poly(A) sites
- Alternative 3′ splice site selection

Figure 8.22. Evolution. Barton et al. (2007) Cold Spring Harbor Laboratory Press

Extreme alternative splicing examples

Number of isoforms in the examples shown in the figure: >500, 38016 and 28
Figure 2. Nielsen and Graveley (2010) Nature 463: 457-463
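Isoform numbers this large arise combinatorially: when a gene has several clusters of mutually exclusive exons and one exon is chosen from each cluster, the number of possible isoforms is the product of the cluster sizes. The sketch below assumes that the 38016 on the slide refers to Drosophila Dscam (the slide text does not name the gene) and uses the cluster sizes commonly reported for it; treat both as assumptions.

```python
from math import prod


def count_isoforms(cluster_sizes):
    """Number of isoforms when exactly one exon is picked per cluster."""
    return prod(cluster_sizes)


# Assumed cluster sizes (alternative exons per cluster), not from the slide.
dscam_like_clusters = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}

total_isoforms = count_isoforms(dscam_like_clusters.values())
print(total_isoforms)
```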

Prevalence of alternative splicing in Drosophila

- 7473 genes are alternatively spliced
- 60.7% of the 12295 expressed genes with multiple exons

Table 1. Graveley et al. (2011) Nature 471: 473-479

Prevalence of alternative splicing in humans

- 92-94% of human genes show alternative splicing
- 86% of human genes generate at least two different transcripts in significant amounts (minor isoform frequency of at least 15%)
- Many alternative isoforms are produced in different tissues as a result of tissue-specific regulation

Figure 2. Wang et al. (2008) Nature 456: 470-476
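The 15% criterion can be made concrete with a small calculation. In this sketch the minor isoform frequency is taken to be the share of reads assigned to the second most abundant isoform of a gene; that definition, the function names, and the read counts are assumptions for illustration, not details taken from Wang et al. (2008).

```python
def minor_isoform_frequency(isoform_counts):
    """Fraction of reads assigned to the second most abundant isoform."""
    total = sum(isoform_counts.values())
    if total == 0:
        raise ValueError("no reads assigned to any isoform")
    frequencies = sorted((c / total for c in isoform_counts.values()), reverse=True)
    return frequencies[1] if len(frequencies) > 1 else 0.0


def has_significant_minor_isoform(isoform_counts, threshold=0.15):
    """True when the minor isoform reaches the threshold (15% by default)."""
    return minor_isoform_frequency(isoform_counts) >= threshold


# Hypothetical read counts for the inclusion and exclusion forms of one exon.
gene_a = {"inclusion": 80, "exclusion": 20}
gene_b = {"inclusion": 95, "exclusion": 5}

print(minor_isoform_frequency(gene_a), has_significant_minor_isoform(gene_a))
print(minor_isoform_frequency(gene_b), has_significant_minor_isoform(gene_b))
```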

Regulation of alternative splicing

- Not all possible isoforms exist
- Developmentally regulated splicing variants in D. melanogaster
- Tissue-regulated splicing variants in humans

Figure 2. Nielsen and Graveley (2010) Nature 463: 457-463
Figure 4. Graveley et al. (2011) Nature 471: 473-479
Figure 1. Wang et al. (2008) Nature 456: 470-476

Regulation of alternative splicing in humans

- Genes tend to express many isoforms simultaneously
- One isoform dominates in a given condition
- Three quarters of the protein-coding genes have at least two different major isoforms
- Variability of gene expression contributes more than variability in splicing ratios to the variability of transcript abundance across cell lines

Figure 4. Djebali et al. (2012) Nature 489: 101-108

Unanswered questions

- How many of the observed isoforms are functionally relevant?
- Can alternative splicing account for the higher complexity of some organisms?

Table 2. Nielsen and Graveley (2010) Nature 463: 457-463

Types of transcripts

Type      Name                   Size          Transcripts   Function

Small non-coding RNAs
rRNAs     ribosomal RNAs         114-5000 nt   531           Component of ribosome
tRNAs     transfer RNAs          73-93 nt      624*          Translation
snRNAs    small nuclear RNAs     100-300 nt    1923          Splicing
snoRNAs   small nucleolar RNAs   60-300 nt     1529          RNA modification
miRNAs    micro RNAs             21-23 nt      3116          Gene expression regulation

Long non-coding RNAs
lncRNAs   long non-coding RNAs   >200 nt       21271         Regulation, imprinting

Number of transcripts from GENCODE v14 data
* Number of transcripts from GENCODE v7 data

rRNAs and tRNAs

rRNAs
- Transcribed as a polycistronic transcript that is modified and processed to generate the mature 18S, 5.8S and 28S rRNAs
- rRNAs assemble with proteins to form the two subunits of the ribosome

tRNAs
- tRNAs carry an amino acid to the protein synthesis machinery of the cell (the ribosome) as directed by a three-nucleotide sequence (codon) in the mRNA

Both are essential components of the protein translation process

snRNAs and snoRNAs

- snRNAs: part of the splicing machinery
- snoRNAs: guide chemical modifications of other RNAs

Dredge et al. (2001) Nature Reviews Neuroscience 2: 43-50
Eddy (2001) Nature Reviews Genetics 2: 919-929

microRNAs

- Small non-coding RNAs (21-23 nt) involved in the post-transcriptional regulation of gene expression by binding to the 3′ UTR of target mRNAs
- Identified in the early 1990s, but recognized as a distinct class of regulators in the early 2000s
- Detected in multiple species ranging from humans to mice, Drosophila, C. elegans or even plants (Arabidopsis)
- Abundant in many cell types and may be involved in many different processes
- Target around 60% of mammalian genes

Figure 2. He and Hannon (2004) Nature Reviews Genetics 5: 522-531
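Binding to a 3′ UTR is often approximated, in target prediction, by perfect complementarity between the UTR and the miRNA "seed" (nucleotides 2-8). The slides do not describe this rule, so the sketch below is an illustration under that assumption only. The miRNA is a let-7-like sequence used as an example, and the UTR is invented.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_target_site(mirna, seed_start=1, seed_end=8):
    """Reverse complement of the seed (nucleotides 2-8 by default), i.e.
    the sequence expected in the mRNA, read 5' to 3'."""
    seed = mirna[seed_start:seed_end]
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def find_seed_matches(mirna, utr):
    """Start positions (0-based) of perfect seed matches in the UTR."""
    site = seed_target_site(mirna)
    positions = []
    pos = utr.find(site)
    while pos != -1:
        positions.append(pos)
        pos = utr.find(site, pos + 1)
    return positions


example_mirna = "UGAGGUAGUAGGUUGUAUAGUU"   # let-7-like, 22 nt
toy_utr = "AAGCUACCUCAUUGGCUACCUCAA"      # hypothetical 3' UTR fragment

print(seed_target_site(example_mirna))
print(find_seed_matches(example_mirna, toy_utr))
```

A seed match alone predicts many false positives, which is one reason real prediction tools add conservation and context features.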

Long non-coding RNAs (lncRNAs)

- Definition: non-coding RNAs longer than 200 nucleotides
- Genomic organization (shown in the figure)

Rinn and Chang (2012) Annual Review of Biochemistry 81: 145-166

Expression of long non-coding RNAs

- Lower expression levels in all tissues compared to protein-coding genes
- More tissue-specific expression patterns compared to mRNAs

Distribution of the number of Human Body Map tissues in which lncRNA and protein-coding transcripts are detected
Figure 5. Derrien et al. (2012) Genome Research 22: 1775-1789

Long non-coding RNAs

- Currently 21,271 annotated transcripts transcribed from 12,933 loci in the human genome
- Significantly more conserved than neutrally evolving sequences but at lower levels than protein-coding genes
- Are lncRNAs functional? (figure labels: byproduct, guide, scaffold)

Baker (2011) Nature Methods 8: 379-383

Examples of long non-coding RNAs

- lincRNA-p21 represses many genes, which leads to apoptosis
- GAS5 is induced under starvation and growth arrest. It binds the glucocorticoid receptor, competing with the receptor's DNA binding sites, which results in reduced metabolism
- A lncRNA transcribed from the promoter region of CCND1 (cyclin D1) is induced by DNA damage. It recruits the TLS protein to CCND1 and represses its expression, interrupting the cell cycle

Figure 2. Huarte and Rinn (2010) Hum. Mol. Genet. 19: R152-R161

Pseudogenes

- Definition: genes that have lost their coding ability
- Types: shown in the figure (counts given in the figure: 10,000; 29; 3,000; 175)

Figure 3. Harrow et al. (2012) Genome Research 22: 1760-1774

Pseudogenes

- 863 pseudogenes are transcribed and associated with active chromatin in the human genome
- The PTENP1 pseudogene protects PTEN from miRNA silencing, and therefore has a tumor-suppressive function
- Can pseudogenes have a function, or are they just the remains of inactivated genes?

Figure 1. Poliseno et al. (2010) Nature 465: 1033-1038

Transcript profiling across tissues

Human and mouse expression data: http://biogps.org
Data from Su et al. (2004) PNAS 101: 6062-6067

Transcript profiling across individuals

Different expression levels of a given gene are detected in different individuals
Figure 1. Cheung and Spielman (2009) Nature Reviews Genetics 10: 595-604

Coding vs. regulatory changes

Regulatory changes have unique properties that could make them especially important in phenotypic evolution:
- Reduced pleiotropic effects
- Fine-tuning of gene function
- Co-dominance and more efficient selection

Persistence of lactase expression

- In most mammals the ability to digest milk disappears with age, and it depends on the production of the lactase enzyme
- Lactase production in adults varies widely among human populations and seems to be related to pastoralism

Figure 1. Itan et al. (2010) BMC Evolutionary Biology 10:36

Persistence of lactase expression Figure 1. Tishkoff et al. (2007) Nature Genetics 39: 31-40

Regulatory elements

Regulatory elements:
- Core promoter
- Proximal elements
- Distal enhancers (upstream / downstream)

Regulatory elements are difficult to predict:
- Small (<50 bp)
- Variable sequence motifs
- Few nucleotide positions are really important
- Poorly conserved and without defined locations

Figure 1. Ong and Corces (2011) Nature Reviews Genetics 12: 283-293
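A short calculation shows why small, degenerate motifs are hard to predict from sequence alone: they match very often by chance. The sketch below assumes the TATA-box consensus TATAWAW (W = A or T) as an example motif and a uniform random sequence model; neither is taken from the slides, and the function names are hypothetical.

```python
import re

# Subset of IUPAC nucleotide codes, expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "S": "[CG]", "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}


def find_motif(sequence, motif):
    """Start positions of all (possibly overlapping) motif matches."""
    pattern = re.compile("(?=(" + "".join(IUPAC[b] for b in motif) + "))")
    return [m.start() for m in pattern.finditer(sequence)]


def chance_probability(motif):
    """Probability that a random position matches, assuming each of the
    four bases is equally likely and positions are independent."""
    p = 1.0
    for base in motif:
        allowed = len(IUPAC[base].strip("[]"))
        p *= allowed / 4
    return p


motif = "TATAWAW"
toy_promoter = "GGCTATAAAAGGC"  # hypothetical sequence

print(find_motif(toy_promoter, motif))
expected_per_mb = chance_probability(motif) * 1_000_000
print(f"Expected chance matches per Mb (one strand): {expected_per_mb:.0f}")
```

Under these assumptions the motif is expected about once every 4,096 bp on one strand, so most matches in a genome are not functional promoters.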

ChIP-seq

- Chromatin immunoprecipitation (ChIP) + sequencing
- Detection of transcription factor binding sites and other DNA-protein interactions

Figure 1. Massie and Mills (2008) EMBO reports 9: 337-343
Figure 2. Park (2009) Nature Reviews Genetics 10: 669-680

ENCODE project

ENCyclopedia Of DNA Elements
International project funded by the National Human Genome Research Institute (NHGRI) with the goal of identifying all functional elements in the human genome.

PHASES
- Pilot phase (2003-2007): 1% of the human genome (44 regions, a total of 30 Mb)
- Production phase (2007-): whole genome

ENCODE project

Functional elements
Figure 1. Ecker et al. (2012) Nature 489: 52-55

ENCODE project data

1,640 genome-wide data sets prepared from 147 cell types
Maher (2012) Nature 489: 46-48

ENCODE project main results

- A total of 62.1% and 74.7% of the human genome is covered by either processed or primary transcripts, respectively
- No cell line expresses more than 56.7% of the union of the expressed transcriptomes across all cell lines
- A large number of previously unknown transcription start sites and new transcript isoforms have been identified
- Thousands of new non-coding transcripts have been detected (22,531 long non-coding RNAs)
- An initial set of 399,124 regions with enhancer-like features and 70,292 regions with promoter-like features have been described
- 80% of the genome has been annotated with potentially functional elements
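The "union of the expressed transcriptomes" statistic can be illustrated with set operations: pool the transcripts detected in any cell line, then ask what share of that pool each line expresses. The cell line names and transcript identifiers below are invented, and the function is a hypothetical sketch rather than the ENCODE analysis itself.

```python
def fraction_of_union(expressed_by_line):
    """For each cell line, the fraction of the pooled (union) transcript
    set that the line expresses."""
    union = set().union(*expressed_by_line.values())
    if not union:
        return {line: 0.0 for line in expressed_by_line}
    return {
        line: len(transcripts) / len(union)
        for line, transcripts in expressed_by_line.items()
    }


# Hypothetical expressed-transcript sets for three cell lines.
expressed = {
    "line_A": {"t1", "t2", "t3"},
    "line_B": {"t2", "t4"},
    "line_C": {"t3", "t5"},
}

fractions = fraction_of_union(expressed)
print(fractions)
print("Maximum across lines:", max(fractions.values()))
```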

ENCODE project data

http://genome.ucsc.edu/