Transcriptomics. Marta Puig Institut de Biotecnologia i Biomedicina Universitat Autònoma de Barcelona


1 Transcriptomics. Marta Puig, Institut de Biotecnologia i Biomedicina, Universitat Autònoma de Barcelona

2 Central dogma of molecular biology

3 Central dogma of molecular biology
- Genome: complete DNA content of an organism, with all its genes and regulatory sequences
- Transcription: genome to transcriptome
- Transcriptome: complete set of transcripts and their relative levels of expression in a particular cell or tissue under defined conditions at a given time
- Translation: transcriptome to proteome
- Proteome: complete collection of proteins and their relative levels in each cell
- Phenotype

4 Why is the study of RNA so important?
RNA profiling provides information about:
- Expressed sequences and genes of a genome
- Gene regulation and regulatory sequences
- Function of genes and interactions between them
- Functional differences between tissues and cell types
- Candidate genes for any given process or disease

5 Overview
- Methods
- Alternative splicing
- Types of transcripts
- Regulatory sequences
- ENCODE project

6 Transcriptome analysis methods
- SINGLE GENES: Northern blot, RT-PCR, 5′ and 3′ RACE, quantitative RT-PCR (real-time RT-PCR)
- WHOLE TRANSCRIPTOME: EST sequencing, microarrays, RNA-seq

7 Transcriptome analysis using microarrays
- Gene expression arrays: quantification of transcript abundance; single or multiple 3′ probes per gene
- Genome tiling arrays: identification of transcribed sequences; multiple probes covering the genome
- Alternative splicing arrays: quantification of different RNA isoforms (inclusion form and exclusion form); probes in exons and exon-exon junctions

8 Expressed Sequence Tags (ESTs)
- Workflow: cDNA synthesis, cDNA library, Sanger sequencing of insert ends, ESTs, alignment with the genome
- Brent (2008) Nature Reviews Genetics 9: 62-73

9 RNA-seq
- Sequencing of all the transcripts in a sample using NGS technologies
- Figure 1. Wang et al. (2009) Nature Reviews Genetics 10: 57-63

10 RNA-seq: mapping of short reads in exon-exon junctions
- Example sequence (figure layout: exon, intron, exon): CCGAAAATCAAGTCATCCCTAAAGACTAAGTAAGTAACCATATTACATTAAGGAAGGCACTTTAAAAGTTTATAATCATTTGTAGACTCCCACCAAAGCCACTGACTCGCAAGG
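Mapping a read that spans an exon-exon junction can be sketched as a split alignment: the start of the read matches the end of one exon and the rest matches the start of the next exon, with the intron in between. The sketch below is a toy exact-match version, not how any real spliced aligner works internally; the function name, the `min_anchor` and `max_intron` parameters, and the short genome and read are all made up for illustration.

```python
def split_read_across_junction(read, genome, min_anchor=4, max_intron=10000):
    """Return (left_start, left_end, right_start) for a spliced exact match, or None.

    read[:k] is placed at genome[left_start:left_end] and read[k:] at
    genome[right_start:], with a gap (the putative intron) between them.
    """
    for k in range(min_anchor, len(read) - min_anchor + 1):
        prefix, suffix = read[:k], read[k:]
        left_start = genome.find(prefix)
        while left_start != -1:
            left_end = left_start + k
            # The suffix must match downstream of the prefix, leaving a gap of
            # at least one base and at most max_intron bases.
            right_start = genome.find(
                suffix, left_end + 1, left_end + max_intron + len(suffix)
            )
            if right_start != -1:
                return left_start, left_end, right_start
            left_start = genome.find(prefix, left_start + 1)
    return None


# Toy genome (hypothetical): exon, intron starting with GT and ending with AG, exon.
exon1 = "ATGGCCAAGT"
intron = "GTAAGTTTTTTTCAG"
exon2 = "CCTGGAACTG"
genome = exon1 + intron + exon2

# A read covering the last 6 bases of exon1 and the first 6 bases of exon2.
read = exon1[-6:] + exon2[:6]
hit = split_read_across_junction(read, genome)
print(hit)
```

In this toy case the gap between the two matched pieces is exactly the intron, which is how junction reads reveal splice sites.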

11 RNA-seq examples: discovery of new transcripts by RNA-seq in D. melanogaster. Figures 1 and 2. Graveley et al. (2011) Nature 471:

12 RNA-seq examples: quantification and determination of expression profiles
- Expression profile by RNA-seq of the D. melanogaster gene eve in different developmental stages
- D. melanogaster RNA-seq data as shown in GBrowse (FlyBase)
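Quantifying expression from RNA-seq usually means normalizing read counts by transcript length and sequencing depth. The slides do not name a specific unit; as an assumption, the sketch below uses transcripts per million (TPM), with hypothetical gene names, counts and lengths.

```python
def tpm(counts, lengths_bp):
    """Convert raw read counts to transcripts per million (TPM).

    counts and lengths_bp are dicts keyed by gene name.
    """
    # Reads per base: corrects for longer transcripts collecting more reads.
    rates = {gene: counts[gene] / lengths_bp[gene] for gene in counts}
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    # Scale so that the values of one sample sum to one million.
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


# Hypothetical counts: geneB has three times the reads but is three times longer.
counts = {"geneA": 100, "geneB": 300}
lengths_bp = {"geneA": 1000, "geneB": 3000}
values = tpm(counts, lengths_bp)
print(values)
```

Both toy genes end up with the same TPM because the extra reads of geneB are explained by its length alone.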

13 RNA-seq advantages
- No need for an available genome sequence
- Detection of new transcripts
- Single-nucleotide precision
- Detection of splicing variants and alternative transcription starts and ends
- Detection of SNPs in transcribed regions
- Detection of allele-specific transcription
- Accurate quantification of expression levels over a wide range of measurements
- High reproducibility
- Small amount of initial RNA needed

14 RNA-seq advantages. Table 1. Wang et al. (2009) Nature Reviews Genetics 10:

15 Gene and mRNA structure (figure labels)
- DNA: regulatory elements, promoters (TATA), transcription start site, polyadenylation signal (CTGAATAAATCCA), transcription termination site
- Splicing gives rise to the mRNAs
- mRNA: 5′ UTR, ORF, 3′ UTR, poly(A) tail (AAAAAAAAA)
- Translation initiation: methionine codon (ACTGATGTCCA)
- Translation termination: STOP codon (CCGATAAATCC)
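The landmarks on this slide (start codon, stop codon, polyadenylation signal) can be located in a sequence with a few string searches. The sketch below is a deliberate simplification: it assumes translation starts at the first ATG and that the polyadenylation signal is the canonical AATAAA hexamer. The function names and the toy mRNA are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orf(mrna):
    """Return (start, end) of the ORF beginning at the first ATG, or None."""
    start = mrna.find("ATG")
    if start == -1:
        return None
    # Read codon by codon in the frame set by the start codon.
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return start, i + 3  # end is exclusive and includes the stop codon
    return None


def find_polya_signal(mrna, search_from=0):
    """Return the index of the first AATAAA hexamer at or after search_from, or -1."""
    return mrna.find("AATAAA", search_from)


# Toy mRNA written with T instead of U: 5' UTR + ORF + 3' UTR.
five_utr = "GGCACTG"
orf = "ATGTCCGCCGATTAA"
three_utr = "ATCCCTGAATAAATCCA"
mrna = five_utr + orf + three_utr

orf_span = find_orf(mrna)
signal_pos = find_polya_signal(mrna, orf_span[1])
print(orf_span, signal_pos)
```

Searching for the polyadenylation signal only downstream of the ORF keeps the example from reporting a chance AATAAA inside the coding region.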

16 Alternative splicing
- Internal exons
- Alternative 5′ splice site selection
- Alternative 3′ splice site selection
- Exon inclusion/skipping
- Intron retention
- Initial/final exons
- Figure 1. Nielsen and Graveley (2010) Nature 463: Figure 1. Li et al. (2007) Nature Reviews Neuroscience 8:
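Exon inclusion/skipping alone already multiplies the number of possible transcripts: each exon that can be skipped independently doubles the count. The sketch below enumerates these combinations for a hypothetical four-exon gene. The result is only a theoretical upper bound, since the isoforms actually observed in a cell may be far fewer.

```python
from itertools import product


def enumerate_isoforms(exons, cassette):
    """List every exon combination obtained by including or skipping cassette exons.

    exons: exon names in transcript order.
    cassette: set of exon names that may be skipped; all others are constitutive.
    """
    # Each cassette exon contributes two choices (present or None), others one.
    choices = [(name, None) if name in cassette else (name,) for name in exons]
    return [
        tuple(exon for exon in combo if exon is not None)
        for combo in product(*choices)
    ]


# Hypothetical gene: E1 and E4 are constitutive, E2 and E3 are cassette exons.
isoforms = enumerate_isoforms(["E1", "E2", "E3", "E4"], {"E2", "E3"})
for isoform in isoforms:
    print("-".join(isoform))
```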

17 Alternative splicing example: α-tropomyosin
- Alternative promoters
- Exon inclusion/skipping
- Alternative poly(A) sites
- Alternative 3′ splice site selection
- Figure from Evolution. Barton et al. (2007) Cold Spring Harbor Laboratory Press

18 Extreme alternative splicing examples Number of isoforms > Figure 2. Nielsen and Graveley (2010) Nature 463:

19 Prevalence of alternative splicing in Drosophila
- 7473 genes are alternatively spliced (60.7% of expressed genes with multiple exons)
- Table 1. Graveley et al. (2011) Nature 471:

20 Prevalence of alternative splicing in humans
- 92-94% of human genes show alternative splicing
- 86% of human genes generate at least two different transcripts in significant amounts (minor isoform frequency of 15% or more)
- Many alternative isoforms are produced in different tissues as a result of tissue-specific regulation
- Figure 2. Wang et al. (2008) Nature 456:
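The 15% threshold can be made concrete with a small calculation. As an assumption, the sketch below defines the minor isoform frequency as the share of a gene's reads assigned to its second most abundant isoform. The isoform names and counts are invented.

```python
def minor_isoform_frequency(isoform_counts):
    """Fraction of a gene's reads assigned to its second most abundant isoform."""
    total = sum(isoform_counts.values())
    if total == 0 or len(isoform_counts) < 2:
        return 0.0
    ranked = sorted(isoform_counts.values(), reverse=True)
    return ranked[1] / total


def has_significant_minor_isoform(isoform_counts, threshold=0.15):
    """True if the minor isoform reaches the chosen frequency threshold."""
    return minor_isoform_frequency(isoform_counts) >= threshold


# Hypothetical read counts per isoform for two genes.
gene_x = {"isoform_1": 80, "isoform_2": 20}
gene_y = {"isoform_1": 95, "isoform_2": 5}
print(minor_isoform_frequency(gene_x), has_significant_minor_isoform(gene_x))
print(minor_isoform_frequency(gene_y), has_significant_minor_isoform(gene_y))
```

Under this definition gene_x counts as producing two transcripts in significant amounts, while gene_y does not.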

21 Regulation of alternative splicing
- Not all possible isoforms exist. Figure 2. Nielsen and Graveley (2010) Nature 463:
- Developmentally regulated splicing variants in D. melanogaster. Figure 4. Graveley et al. (2011) Nature 471:
- Tissue-regulated splicing variants in humans. Figure 1. Wang et al. (2008) Nature 456:

22 Regulation of alternative splicing in humans
- Genes tend to express many isoforms simultaneously
- One isoform dominates in a given condition
- Three-quarters of protein-coding genes have at least two different major isoforms
- Variability in gene expression contributes more than variability in splicing ratios to the variability of transcript abundance across cell lines
- Figure 4. Djebali et al. (2012) Nature 489:

23 Unanswered questions
- How many of the observed isoforms are functionally relevant?
- Can alternative splicing account for the higher complexity of some organisms?
- Table 2. Nielsen and Graveley (2010) Nature 463:

24 Types of transcripts
Type | Name | Size | Transcripts | Function
Small non-coding RNAs | rRNAs (ribosomal RNAs) | nt | 531 | Component of ribosome
Small non-coding RNAs | tRNAs (transfer RNAs) | nt | 624* | Translation
Small non-coding RNAs | snRNAs (small nuclear RNAs) | nt | 1923 | Splicing
Small non-coding RNAs | snoRNAs (small nucleolar RNAs) | nt | 1529 | RNA modification
Small non-coding RNAs | miRNAs (microRNAs) | nt | 3116 | Gene expression regulation
Long non-coding RNAs | lncRNAs (long non-coding RNAs) | >200 nt | | Regulation, imprinting
Number of transcripts from GENCODE v14 data
* Number of transcripts from GENCODE v7 data

25 rRNAs and tRNAs
- rRNAs: transcribed as a polycistronic transcript that is modified and processed to generate the mature 18S, 5.8S and 28S rRNAs, which assemble with proteins to form the two subunits of the ribosome
- tRNAs: carry an amino acid to the protein synthesis machinery of the cell (the ribosome) as directed by a three-nucleotide sequence (codon) in the mRNA
- Both are essential components of the translation process

26 snRNAs and snoRNAs
- snRNAs: part of the splicing machinery
- snoRNAs: guide chemical modifications of other RNAs
- Dredge et al. (2001) Nature Reviews Neuroscience 2: Eddy (2001) Nature Reviews Genetics 2:

27 microRNAs
- Small non-coding RNAs (21-23 nt) involved in post-transcriptional regulation of gene expression by binding to the 3′ UTR of target mRNAs
- Identified in the early 1990s, but recognized as a distinct class of regulators only in the early 2000s
- Detected in species ranging from humans and mice to Drosophila, C. elegans and even plants (Arabidopsis)
- Abundant in many cell types and possibly involved in many different processes
- Target around 60% of mammalian genes
- Figure 2. He and Hannon (2004) Nature Reviews Genetics 5:
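Binding of a microRNA to a 3′ UTR can be illustrated as a search for sites complementary to the miRNA. The slide does not describe the pairing rule; as an assumption, the sketch below uses the common simplification of perfect complementarity to the seed region (nucleotides 2 to 8 of the miRNA). The miRNA and UTR sequences are toy sequences invented for the example, and real target prediction uses additional criteria.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_sites(mirna, utr):
    """Return 0-based positions in the UTR that pair perfectly with the miRNA seed."""
    seed = mirna[1:8]  # nucleotides 2-8, counted from the 5' end of the miRNA
    # The target site is the reverse complement of the seed.
    site = seed.translate(COMPLEMENT)[::-1]
    positions = []
    index = utr.find(site)
    while index != -1:
        positions.append(index)
        index = utr.find(site, index + 1)
    return positions


# Toy RNA sequences (hypothetical): the UTR carries two copies of the seed match.
mirna = "UAGCAUUGGACCUUAGCAUGCA"
utr = "GGGACAAUGCUAAACCCCAAUGCUGG"
sites = seed_sites(mirna, utr)
print(sites)
```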

28 Long non-coding RNAs (lncRNAs)
- Definition: non-coding RNAs longer than 200 nucleotides
- Genomic organization
- Rinn and Chang (2012) Annual Review of Biochemistry 81:

29 Expression of long non-coding RNAs
- Lower expression levels in all tissues compared to protein-coding genes
- More tissue-specific expression patterns compared to mRNAs
- Figure: distribution of the number of Human Body Map tissues in which lncRNA and protein-coding transcripts are detected. Figure 5. Derrien et al. (2012) Genome Research 22:

30 Long non-coding RNAs
- Currently 21,271 annotated transcripts, transcribed from 12,933 loci in the human genome
- Significantly more conserved than neutrally evolving sequences, but less conserved than protein-coding genes
- Are lncRNAs functional? (figure labels: byproduct, guide, scaffold)
- Baker (2011) Nature Methods 8:

31 Examples of long non-coding RNAs
- lincRNA-p21 represses many genes and results in cellular apoptosis
- GAS5 is induced under starvation and growth arrest. It competes with the glucocorticoid receptor for DNA binding sites and results in reduced metabolism
- A lncRNA transcribed from the promoter region of CCND1 (cyclin D1) is induced by DNA damage; it recruits the TLS protein to CCND1 and represses its expression, interrupting the cell cycle
- Figure scale labels: 3.1 kb, 1 kb
- Figure 2. Huarte and Rinn (2010) Hum. Mol. Genet. 19: R152-R161

32 Pseudogenes
- Definition: genes that have lost their coding ability
- Types: 10, ,
- Figure 3. Harrow et al. (2012) Genome Research 22:

33 Pseudogenes
- 863 pseudogenes are transcribed and associated with active chromatin in the human genome
- The PTENP1 pseudogene protects PTEN from miRNA silencing, and therefore has a tumor-suppressive function
- Can pseudogenes have a function, or are they just what remains of inactivated genes?
- Figure 1. Poliseno et al. (2010) Nature 465:

34 Transcript profiling across tissues (human and mouse). Data from Su et al. (2004) PNAS 101:

35 Transcript profiling across individuals
- Different expression levels of a given gene are detected in different individuals
- Figure 1. Cheung and Spielman (2009) Nature Reviews Genetics 10:

36 Coding vs. regulatory changes
Regulatory changes have unique properties that could make them especially important in phenotypic evolution:
- Reduced pleiotropic effects
- Fine-tuning of gene function
- Co-dominance and more efficient selection

37 Persistence of lactase expression
- In most mammals the ability to digest milk disappears with age, which is related to the production of the lactase enzyme
- Lactase production in adults varies widely among human populations and seems related to pastoralism
- Figure 1. Itan et al. (2010) BMC Evolutionary Biology 10:36

38 Persistence of lactase expression Figure 1. Tishkoff et al. (2007) Nature Genetics 39: 31-40

39 Regulatory elements
- Types: core promoter, proximal elements, distal enhancers (upstream/downstream)
- Regulatory elements are difficult to predict: small (<50 bp); variable sequence motifs; few nucleotide positions are really important; poorly conserved and without fixed locations
- Figure 1. Ong and Corces (2011) Nature Reviews Genetics 12:

40 ChIP-seq
- Chromatin immunoprecipitation (ChIP) + sequencing
- Detection of transcription factor binding sites and other DNA-protein interactions
- Figure 1. Massie and Mills (2008) EMBO reports 9: Figure 2. Park (2009) Nature Reviews Genetics 10:
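The idea behind detecting binding sites from ChIP-seq data is that immunoprecipitated fragments pile up where the protein was bound. The sketch below is a toy version of that idea: it counts read depth per base and reports stretches at or above a fixed depth. Real peak callers also model background signal and control samples, which this sketch omits. The function name, the threshold and the read coordinates are hypothetical.

```python
def call_peaks(read_intervals, chrom_length, min_depth):
    """Return half-open (start, end) regions where read depth is >= min_depth."""
    coverage = [0] * chrom_length
    for start, end in read_intervals:  # half-open read coordinates
        for pos in range(max(start, 0), min(end, chrom_length)):
            coverage[pos] += 1

    peaks = []
    peak_start = None
    for pos, depth in enumerate(coverage):
        if depth >= min_depth and peak_start is None:
            peak_start = pos  # entering an enriched region
        elif depth < min_depth and peak_start is not None:
            peaks.append((peak_start, pos))  # leaving an enriched region
            peak_start = None
    if peak_start is not None:
        peaks.append((peak_start, chrom_length))
    return peaks


# Hypothetical mapped reads: three overlap around positions 15-19, one is isolated.
reads = [(10, 20), (12, 22), (15, 25), (40, 50)]
peaks = call_peaks(reads, chrom_length=60, min_depth=3)
print(peaks)
```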

41 ENCODE project
- ENCyclopedia Of DNA Elements
- International project funded by the National Human Genome Research Institute (NHGRI) with the goal of identifying all functional elements in the human genome
- PHASES
- Pilot phase ( ): 1% of the human genome (44 regions, a total of 30 Mb)
- Production phase (2007-): whole genome

42 ENCODE project: functional elements. Figure 1. Ecker et al. (2012) Nature 489:

43 ENCODE project data Maher (2012) Nature 489: ,640 genome-wide data sets prepared from 147 cell types

44 ENCODE project main results
- A total of 62.1% and 74.7% of the human genome is covered by either processed or primary transcripts, respectively
- No cell line expresses more than 56.7% of the union of the expressed transcriptomes across all cell lines
- A large number of previously unknown transcription start sites and new transcript isoforms have been identified
- Thousands of new non-coding transcripts have been detected (22,531 long non-coding RNAs)
- An initial set of 399,124 regions with enhancer-like features and 70,292 regions with promoter-like features has been described
- 80% of the genome has been annotated with potentially functional elements
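A figure such as "X% of the genome is covered by transcripts" comes down to merging overlapping transcript intervals and dividing the merged length by the genome length. The sketch below shows that calculation on made-up coordinates; it is not the ENCODE pipeline, and the function name and numbers are hypothetical.

```python
def covered_fraction(intervals, genome_length):
    """Fraction of the genome covered by the union of half-open intervals."""
    covered = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Close the previous merged block and open a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent interval: extend the current block.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / genome_length


# Hypothetical transcripts on a 1000 bp genome: two overlap, one stands alone.
transcripts = [(0, 100), (50, 150), (300, 400)]
fraction = covered_fraction(transcripts, genome_length=1000)
print(fraction)
```

Merging before summing matters: adding the raw interval lengths would count the overlapping 50 bp twice.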

