High-throughput Transcriptome analysis
- Richard Reeves
1 High-throughput Transcriptome analysis: CAGE and beyond. Dr. Rimantas Kodzius, Singapore, A*STAR, IMCB, for KAUST 2008
2 Agenda
1. Current research
- PhD work on discovery of new allergens
- Postdoctoral work on Transcriptional Start Sites
  a) Tag based technologies allow higher throughput
  b) CAGE technology to define promoters
  c) CAGE data analysis to understand Transcription
- Work in Singapore on Comparative Transcriptomics
2. Research outlook at KAUST
- Nanofluidic devices for Genomics
- Production of high volume molecular biology/Genomics data
- Collaboration with bioinformatics to analyze the data
3 PhD work on identification of allergens
4 Picking/ ReArraying/ Spotting robot
5 Examples of DNA filter hybridization: 5 patients, allergen A; 5 patients, allergen B
6 Work in Japan, Genomic Sciences Centre. Supported by: EU FP5 INCO2 program. Prof. Yoshihide Hayashizaki (RIKEN), Dr. Piero Carninci (RIKEN). >200 co-authors on publication in Science
7 Genomics goes hand in hand with Transcriptomics. To understand phenotypes and diseases, we need to know:
- transcriptional regulatory networks
- timing and quantity of controlled transcripts
- TF binding sites (at promoters)
8 Gene structure & EST cloning
[Diagram: gene with promoter, TSS, 5'-UTR, ATG, exons, TAA, and 3'-UTR. In vivo: transcription, then splicing to a polyadenylated mRNA. In vitro: reverse transcription, 2nd strand synthesis, cloning, sequencing, genomic alignment.]
9 Transcripts contain lots of information: IRES (internal ribosome entry sites), CPE (cytoplasmic polyadenylation element), miRNA
10 Full-length cDNA libraries
The transcriptome gives a snapshot of cell activity:
- Experimental evidence of transcribed region
- Alternative promoters, splicing, and polyadenylation sites
- Defined TSS and predicted promoter
- ORF (open reading frame)
- 5'- and 3'-UTRs, transcript stability
- Quantitative analysis of gene expression
- Transcriptionally interacting partners
- Gene networks
11 Tag based technologies
1. SAGE: Serial Analysis of Gene Expression
3. CAGE: Cap Analysis of Gene Expression
4. GIS-PET: Gene Identification Signature Paired End ditag
[Diagram: transcript drawn from promoter and TSS through 5'-UTR, ATG, TAA, and 3'-UTR to the poly(A) tail, with tag positions marked for each technology: SAGE tag at an RE site, 3'-tag, 5'-tag, and paired 5'-tag plus 3'-tag.]
12 CAGE tags represent cDNA
- Genome annotation: experimental evidence of TSS and transcribed region, UTR location, alternative promoter sites
- Promoter analysis: regulatory elements, TF binding sites, CpG islands, repetitive elements
- Quantitative analysis of gene expression
13 CAGE steps from RNA to 20 bp tags
[Workflow diagram; step labels in order:]
- Reverse transcription
- Full-length cDNA selection
- ssDNA release
- ssDNA capture by CAGE linker
- Second strand synthesis
- MmeI digestion of dsDNA
- Ligation of second linker
- PCR amplification
- CAGE tag release
- Concatenation
- Fractionation
- Cloning
- Sequencing
[Other diagram labels: Cap, AAAAA, N 20, Biotin, XmaJI, MmeI, 5 bp, 20mer tag, MmeI-PCR, Uni-PCR, tag 1 to tag 4.]
14 Current Statistics

Species        Assembly Ver.   Chromosomes
Mus musculus   UCSC-May        ,X,Y
Homo sapiens   UCSC-May        ,X,Y

Statistic                                           Mouse (Fri, 12 Nov 2004)   Human (Thu, 13 Jan 2005)
Number of CAGE Library                              145                        41
Number of CAGE Tissue                               23                         17
Number of CAGE Plate                                8,862                      3,327
Number of CAGE Clone                                2,721,800                  1,035,181
Number of CAGE Tag                                  11,567,973                 10,165,217
Average of CAGE Tags/Clone                          4.25                       9.82
Number of mapped CAGE Tag [ at least 1 site ]       8,825,172                  6,475,536
Number of mapped CAGE Tag [ specified 1 site ]      7,151,511                  5,312,921
Average of mapping rate                             0.62                       0.52
Number of CTSS                                      1,260,079                  1,057,486
Number of TC                                        594,136                    629,716
Number of TU                                        39,593                     33,903
Number of TU in whole genomes                       50,612                     39,903
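The two derived rows in these statistics can be re-computed from the raw counts. The sketch below is only an arithmetic check: it assumes that "Average of mapping rate" is the number of tags mapped to a single specified site divided by the total tag count, which is the reading that reproduces the slide's values. The dictionary keys and function names are illustrative.

```python
# Counts copied from the statistics slide; key names are illustrative.
stats = {
    "mouse": {"clones": 2_721_800, "tags": 11_567_973, "single_site": 7_151_511},
    "human": {"clones": 1_035_181, "tags": 10_165_217, "single_site": 5_312_921},
}


def tags_per_clone(s):
    """Average number of CAGE tags sequenced per concatemer clone."""
    return s["tags"] / s["clones"]


def mapping_rate(s):
    """Assumed definition: tags mapped to exactly one site / all tags."""
    return s["single_site"] / s["tags"]


for species, s in stats.items():
    print(species, round(tags_per_clone(s), 2), round(mapping_rate(s), 2))
```

Under this assumption the rounded results agree with the slide (4.25 and 0.62 for mouse, 9.82 and 0.52 for human).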
15 5'-RACE validation of Opioid receptor 1
16 Example of tissue-specific TSS: UDP-glucuronyl transferase gene, usage of seven alternative promoters
17 Definitions: CTSS and tag clusters
- CAGE-tag starting site (CTSS) = CAGE tags with identical 5' site
- Tag cluster (TC) = overlapping CTSS on the same strand
- A TC can be defined by its start and end positions, count of tags, and distribution of counts
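The two definitions above can be sketched in a few lines. This is a minimal illustration, not the FANTOM3 pipeline: it assumes each mapped tag is given as a hypothetical `(chrom, strand, five_prime_pos)` tuple, that every tag covers 20 bp from its 5' end, and that "overlapping" means the 20 bp spans share at least one base.

```python
from collections import Counter, defaultdict

TAG_LENGTH = 20  # the slides describe 20 bp tags; used here to test overlap


def tag_span(pos, strand, length=TAG_LENGTH):
    """Half-open genomic interval covered by a tag whose 5' end is at pos."""
    if strand == "+":
        return (pos, pos + length)
    return (pos - length + 1, pos + 1)


def ctss_counts(tags):
    """CTSS: tags sharing chromosome, strand, and 5' position, with counts."""
    return Counter(tags)


def tag_clusters(tags):
    """Merge CTSS whose tag spans overlap on the same chromosome and strand."""
    groups = defaultdict(list)
    for (chrom, strand, pos), n in ctss_counts(tags).items():
        groups[(chrom, strand)].append((tag_span(pos, strand), pos, n))

    clusters = []
    for (chrom, strand), items in groups.items():
        current = None
        for (start, end), pos, n in sorted(items):
            if current is not None and start < current["end"]:
                # Span overlaps the open cluster: extend it.
                current["end"] = max(current["end"], end)
                current["count"] += n
                current["distribution"][pos] = n
            else:
                if current is not None:
                    clusters.append(current)
                current = {"chrom": chrom, "strand": strand, "start": start,
                           "end": end, "count": n, "distribution": {pos: n}}
        if current is not None:
            clusters.append(current)
    return clusters


example_tags = [("chr1", "+", 100)] * 3 + [
    ("chr1", "+", 105), ("chr1", "+", 200), ("chr1", "-", 110)]
for tc in tag_clusters(example_tags):
    print(tc)
```

Each returned dictionary carries the four descriptors named on the slide: start, end, tag count, and the per-position distribution of counts.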
18 TC with >100 tags analyzed. Four main shape classes of tag clusters
19 Sharp (focused) vs. broad (dispersed)
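The slides do not state the numerical rule used to separate sharp from broad clusters, so the following is only a toy heuristic under stated assumptions: a cluster is called "sharp" when at least half of its tags start within 2 bp of the dominant position. Both thresholds (`window`, `min_fraction`) and the function name are hypothetical, and the sketch distinguishes two shapes rather than the four classes mentioned on the previous slide.

```python
def classify_shape(distribution, window=2, min_fraction=0.5):
    """Label a tag cluster 'sharp' or 'broad' from its per-position counts.

    distribution: dict mapping 5' position -> tag count.
    window, min_fraction: hypothetical thresholds, not taken from the slides.
    """
    total = sum(distribution.values())
    if total == 0:
        raise ValueError("empty tag cluster")
    dominant = max(distribution, key=distribution.get)
    near = sum(n for pos, n in distribution.items()
               if abs(pos - dominant) <= window)
    return "sharp" if near / total >= min_fraction else "broad"


print(classify_shape({100: 90, 101: 5, 130: 5}))
print(classify_shape({100: 10, 110: 10, 120: 10, 130: 10, 140: 10}))
```

The first example concentrates 95% of its tags around one position; the second spreads them evenly over 40 bp.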
20 TSS sequence representation
- TATA box in: sharp TSS, minority of promoters, tissue-specific genes, high conservation
- CpG islands in broad TSS
- TATA site ~ -30 nt from TSS
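A simple way to check the "TATA site ~ -30 nt from TSS" observation on a sequence is to look for the motif in a window around -30. The sketch below assumes a plus-strand sequence string, a 0-based index for the TSS base, an exact `TATA` match, and a search window of -35 to -25; all of these are illustrative choices, and real analyses typically use position weight matrices rather than exact matching.

```python
def has_upstream_tata(seq, tss_index, window=(-35, -25), motif="TATA"):
    """True if motif starts between window[0] and window[1] relative to the TSS.

    seq: plus-strand sequence; tss_index: 0-based index of the TSS base.
    Offsets are plain index differences (assumed convention).
    """
    start = max(tss_index + window[0], 0)
    end = max(tss_index + window[1] + len(motif), 0)
    return motif in seq[start:end].upper()


# TATA placed so that it starts 30 nt upstream of the TSS at index 50.
with_tata = "G" * 20 + "TATAAA" + "G" * 34
without_tata = "G" * 60
print(has_upstream_tata(with_tata, 50), has_upstream_tata(without_tata, 50))
```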
21 The consensus initiator sequence
- TATA-box
- Positions -1,+1: Py-Pu (C or T followed by A or G)
- Most preferred initiators are CG, CA and TG
- 3'-UTR TSSs: GGG motif
22 Dinucleotide frequency in dominant TSS
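As a rough sketch of how such a dinucleotide table could be produced, the code below reads the bases at -1 and +1 for each TSS and reports their relative frequencies, together with a test for the Py-Pu pattern. It assumes plus-strand TSSs given as 0-based indices of the +1 base into a sequence string; the function names are illustrative and minus-strand handling is omitted.

```python
from collections import Counter

PYRIMIDINES = set("CT")
PURINES = set("AG")


def initiator_dinucleotide(seq, tss_index):
    """Return the two bases at positions -1 and +1 of a plus-strand TSS."""
    if tss_index < 1 or tss_index >= len(seq):
        raise ValueError("TSS index must have one base upstream and lie in seq")
    return seq[tss_index - 1:tss_index + 1].upper()


def is_py_pu(dinucleotide):
    """True for a pyrimidine at -1 followed by a purine at +1."""
    return (len(dinucleotide) == 2
            and dinucleotide[0] in PYRIMIDINES
            and dinucleotide[1] in PURINES)


def dinucleotide_frequencies(seq, tss_indices):
    """Relative frequency of each -1,+1 dinucleotide over the given TSSs."""
    counts = Counter(initiator_dinucleotide(seq, i) for i in tss_indices)
    total = sum(counts.values())
    return {dinuc: n / total for dinuc, n in counts.items()}


toy_seq = "GGCATTGACG"
toy_tss = [1, 3, 6, 9]
for dinuc, freq in dinucleotide_frequencies(toy_seq, toy_tss).items():
    print(dinuc, freq, is_py_pu(dinuc))
```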
23 Over-represented k-mers
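One common way to look for over-represented k-mers is to compare k-mer frequencies in promoter-proximal sequences against a background set. The sketch below is a generic illustration under assumptions, not the method used for this slide: it scores each k-mer by a frequency ratio with an additive pseudocount, and the function names and toy sequences are hypothetical.

```python
from collections import Counter


def kmer_counts(seqs, k):
    """Count all overlapping k-mers across a list of sequences."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def kmer_enrichment(foreground, background, k, pseudocount=1.0):
    """Rank foreground k-mers by foreground/background frequency ratio."""
    fg = kmer_counts(foreground, k)
    bg = kmer_counts(background, k)
    fg_total = sum(fg.values())
    bg_total = sum(bg.values())
    n_kmers = 4 ** k  # number of possible DNA k-mers, for smoothing
    scores = {}
    for kmer, n in fg.items():
        fg_freq = (n + pseudocount) / (fg_total + pseudocount * n_kmers)
        bg_freq = (bg[kmer] + pseudocount) / (bg_total + pseudocount * n_kmers)
        scores[kmer] = fg_freq / bg_freq
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


toy_foreground = ["TATAAA", "GTATAA"]
toy_background = ["GCGCGC", "CCGGCC"]
for kmer, score in kmer_enrichment(toy_foreground, toy_background, k=4):
    print(kmer, round(score, 2))
```

A ratio above 1 marks a k-mer that is more frequent in the foreground than in the background; a real analysis would add a significance test and a much larger background.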
24 New concept of genes
25 Conclusions for FANTOM3 data
- After assessing 145 mouse and 41 human CAGE libraries, including GIC/GSC, 5' ESTs, and FANTOM3 clones:
- Potential TCs: 736,403 mouse; 665,278 human
- TCs supported by >1 tag: 159,075 mouse; 177,563 human
- 181,047 independent transcripts in the mouse genome; 62.5% of the genome is transcribed (not only the 2% that is protein coding)
- 65% of TUs contain alternative splicing variants
- ~ TUs, protein-coding and non-coding TUs, 51,135 proteins
- 78,393 splicing variants
- > (72% TU) sense-antisense transcript pairs
26 In summary
- There are more different transcripts than genes (~10x)
- More than half (58%) of TUs have two or more alternative promoters or polyadenylation sites; 65% have multiple splice variants
- Four categories of promoters can be defined
- TATA-box containing promoters are a minor subset; the majority of promoters lie within CpG islands
- There are transcription forests and deserts
28 Complementary information
- CAGE-TSSchip can be used for measuring promoter-based transcriptional activity
- Next generation sequencing technologies boost the data output of the tag approach (Roche Genome Sequencer 20 (454), ABI SOLiD Analyzer, Illumina Genome Analyzer, Helicos HeliScope)
- Improved promoter and TSS prediction algorithms
- ENCODE project genome annotation (TSSs with evidence of 5 or more CAGE tags used)
- CAGE tags can be found in the UCSC Genome Browser
29 CAGE data in UCSC browser HoxA cluster
30 Still in touch with RIKEN RIKEN president visits Alumni in Singapore
31 Work in Singapore: Comparative Genomics (Marine Genomics) laboratory at IMCB, the Institute of Molecular and Cell Biology (belongs to the A*STAR organization)
32 Marine Genomics group in Singapore
33 Work on Comparative Transcriptomics
- Elephant shark (Callorhinchus milii) as a model
- Phylogenetically the oldest group of living jawed vertebrates (separated 450 million years ago)
- Genome is smaller than H. sapiens (1.2 Gb)
- Genome is being sequenced at WUGSC
- Transcriptome (full-length cDNA libraries) at IMCB in Singapore
- UCE: ultraconserved elements
34 Future research plans
- Get to know and introduce nano instruments for molecular biology/Genomics
- Generate experimental data to support hypotheses
- Collaborate with computational scientists to analyze the high-volume data
35 Acknowledgements
- Teachers and scientists who introduced me to science
- Colleagues and collaborators for enriching my research
- Joint KAUST-HKUST laboratory for inviting me today
Thanks everyone for listening!