Gene expression analysis



Gene expression analysis
  Biology of the transcriptome
  Observing the transcriptome
  Computational biology of gene expression
  Recent examples
    Transcriptional response to an anti-cancer drug
    The new empirical maps of human transcription
    Co-expression networks to derive gene function

Transcriptional output of the genome
  Total RNA
  [Diagram labels: GENES, TRANSCRIPTS]
  Templates for protein synthesis (mRNA)
  Amino acid carriers (tRNA)
  Catalytic functions (rRNA)
  Regulate protein synthesis (miRNA)
  Regulate decay of other transcripts (miRNA)
  UNKNOWN FUNCTIONS (TUFs)

Expression levels
  Rare and abundant transcripts
  [Figure; axis labels: Number of genes, Transcripts per cell]
  [Figure: expression of Tnnt2 (troponin T, cardiac muscle isoform), Actc1 (actin, alpha cardiac) and Srf (serum response factor). X axis: organ (skin, heart, brain and so on). Y axis: mRNA expression level]

Expression patterns
  Co-expression

Gene regulation: control of expression levels
  d[mRNA]/dt = synthesis - decay
  1. Synthesis
     Changes in the chromatin
     Transcription
  2. Decay
     Degradation by RNases

Examples of gene regulation
  Physiology: activation of the genes Ins1 and Ins2 after eating
  Development: activation of the genes Actc1 and Myh7 during heart muscle cell differentiation
  Disease: disturbed upregulation of the gene Erbb2 in many human cancers

Accessible and non-accessible chromatin

Transcription initiation
  cis and trans regulators
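The balance d[mRNA]/dt = synthesis - decay can be made concrete with a small numerical sketch. The slide does not specify the form of either term, so the code below assumes a constant synthesis rate and first-order decay (decay proportional to the current mRNA level); the rate values and the function names are hypothetical and chosen only for illustration.

```python
def simulate_mrna(k_syn, k_dec, m0=0.0, t_end=10.0, dt=0.01):
    """Integrate dm/dt = k_syn - k_dec * m with the forward Euler method.

    Assumptions (not from the slides): synthesis is constant (k_syn,
    transcripts per cell per hour) and decay is first order (k_dec, per hour).
    """
    steps = int(round(t_end / dt))
    m = m0
    trajectory = [m]
    for _ in range(steps):
        m += dt * (k_syn - k_dec * m)  # synthesis minus decay
        trajectory.append(m)
    return trajectory


def steady_state(k_syn, k_dec):
    """Level at which synthesis and decay balance (dm/dt = 0)."""
    return k_syn / k_dec


# Hypothetical rates: 20 transcripts/hour made, 50% of the pool lost per hour.
traj = simulate_mrna(k_syn=20.0, k_dec=0.5, t_end=40.0)
print(round(traj[-1], 3), steady_state(20.0, 0.5))
```

Under these assumptions the level settles at k_syn / k_dec, so a cell can raise expression either by increasing synthesis or by slowing decay.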

RNA polymerase II
  The enzyme that makes protein-coding transcripts
  RNA pol II is non-specific in its pure form: ANY DNA -> RNA COPY
  ~40 basal transcription factors are needed to make RNA pol II promoter-specific: THE DNA OF A GENE -> RNA COPY
  ~2000 transcription factors are needed to regulate the action of RNA pol II: THE DNA OF THE RIGHT GENE -> RNA COPY

Example: deciding whether or not to express the gene Acta2
  [Diagram labels: Myocd, Nkx3.1 AND GATA4/6, CRP1/2 AND GATA4/6, OR SRF ... CC[A/T]6GG ... CC[A/T]6GG ... AND: Transcription of Acta2!]
  See Gary Owens et al, Molecular regulation of vascular smooth muscle cell differentiation in development and disease. Physiol Rev

Summary so far
  The transcriptome is rich in different RNA species of known and unknown function.
  The expression level of a transcript is controlled by synthesis and decay rates.
  About 10% of genes in the human genome are directly involved in gene regulation.
  Differences/changes in expression levels are interesting in computational biology:
    Do we believe in the observed differences/changes?
    In what way do the differences/changes reflect or predict function?

Observing the transcriptome
  We want to quantify the expression level of genes x, y, z, ... in samples a, b, c, ...
  Three main technological principles
    Hybridization on a surface
    Polymerase chain reaction
    Sequencing

Principle 1: hybridization on a surface
  In situ hybridization: tissue distribution can be observed
  Northern blot: transcript size and alternative splice products can be observed
  Micro-array: many transcripts can be observed at low cost

Principle 2: PCR
  More sensitive than micro-array, but higher cost and time per gene
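The Acta2 example above writes its regulatory element as CC[A/T]6GG, that is, CC, then six bases that are each A or T, then GG. As a minimal sketch, such a motif can be located in a DNA string with a regular expression. The promoter sequence below is invented for illustration and is not the real Acta2 promoter; the function name is also hypothetical.

```python
import re

# CC, six A-or-T bases, GG. The lookahead lets overlapping hits be reported.
CARG_BOX = re.compile(r"(?=(CC[AT]{6}GG))")


def find_carg_boxes(dna):
    """Return (0-based start, matched sequence) for every CC[A/T]6GG site."""
    dna = dna.upper()
    return [(m.start(), m.group(1)) for m in CARG_BOX.finditer(dna)]


# Invented toy sequence containing two such elements (assumption, not real data).
promoter = "GATTACCATATATGGCTAGCCTTTTAAGGTT"
hits = find_carg_boxes(promoter)
print(hits)  # [(5, 'CCATATATGG'), (19, 'CCTTTTAAGG')]
```

Finding the motif only shows where a factor could bind; whether the gene is expressed still depends on which of the regulators named in the example are present.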

Principle 3: Sequencing
  The Serial Analysis of Gene Expression (SAGE) technique is elegant, since its precision is theoretically unlimited, but it is costly.

Micro-array hybridization, step by step
  1. Each spot contains multiple copies of single-stranded DNA, with a sequence corresponding to an individual transcript.
     [Figure: example spot sequences ATTTCGTGCGCG, GAGCATTTCGCA, ATATTGGCGCGT, AAAAAATTTTT]
  2. A solution of labeled nucleic acids corresponding to the transcriptome of a biological sample is incubated over the surface.
  3. The spots act as receptors for a mixture of transcripts (RNA converted into cDNA) from a biological sample.
  4. The surface is washed so that unbound nucleic acids are removed.
  5. The chemical labeling of the bound nucleic acids is used to generate a fluorescence signal.
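The five hybridization steps above can be sketched as a toy model. It assumes, purely for illustration, that a labeled molecule binds a spot only when it is the exact reverse complement of the spot sequence; real hybridization also tolerates partial matches, which is the source of cross-reaction. The spot names and the composition of the sample are hypothetical; the three spot sequences are the 12-mers shown on the slide.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the strand that pairs perfectly with seq."""
    return seq.translate(COMPLEMENT)[::-1]


def hybridize(probes, labeled_cdna):
    """Toy signal per spot: count of labeled molecules that bind perfectly.

    Molecules that match no spot are simply not counted, which plays the
    role of the wash step.
    """
    pool = Counter(labeled_cdna)
    return {name: pool[reverse_complement(seq)] for name, seq in probes.items()}


probes = {
    "spot_1": "ATTTCGTGCGCG",
    "spot_2": "GAGCATTTCGCA",
    "spot_3": "ATATTGGCGCGT",
}
# Hypothetical sample: three molecules for spot_1, one for spot_3, one unrelated.
sample = (
    [reverse_complement("ATTTCGTGCGCG")] * 3
    + [reverse_complement("ATATTGGCGCGT")]
    + ["TTTTTTTTTTTT"]
)
signal = hybridize(probes, sample)
print(signal)  # {'spot_1': 3, 'spot_2': 0, 'spot_3': 1}
```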

The surface scan results in a 16-bit TIFF image (double red-green labeling shown).
A computer program is used to calculate the average intensity of each spot.
[Figure: example of a log-log scatterplot used to compare the signals from two arrays]
Normalization is a set of statistical procedures necessary to account for systematic errors in microarray measurements.

Two main array technologies
  Affymetrix GeneChips
    Photolithography-based array construction with many replicates per gene
    More reproducible results (depends on experience of lab)
    Always single-color experiments
    Easier to compare across experiments/labs
  cDNA microarrays / spotted oligo arrays
    Less sophisticated array construction, and fewer spot replicates
    Cost-efficient
    Often two-color experiments
    Harder to compare across experiments/labs

Limitations
  Arrays do not observe all transcripts
  Arrays do not typically observe splicing diversity
  Cross-reaction between similar nucleic acids
  Experimental noise
  At least 1000 cells are needed, so the signals will be cell population averages
  The amount of data can make it difficult to interpret
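The slides describe normalization only as a set of statistical procedures and do not name a specific method. One simple possibility for a two-color array, shown here as a sketch and not as the method used in the lecture, is global median centering of log2 ratios: it assumes most genes are unchanged between the two samples, so the median log ratio should be zero. The intensity values are invented.

```python
import math
from statistics import median


def log2_ratios(red, green):
    """Per-spot log2(red / green) from average spot intensities."""
    return [math.log2(r / g) for r, g in zip(red, green)]


def median_center(values):
    """Subtract the median so that a typical spot has a log ratio of zero."""
    m = median(values)
    return [v - m for v in values]


# Invented average spot intensities; the red channel is globally twice as bright.
red = [200.0, 800.0, 1600.0, 400.0, 3200.0]
green = [100.0, 400.0, 800.0, 200.0, 400.0]

raw = log2_ratios(red, green)
normalized = median_center(raw)
print(raw)         # [1.0, 1.0, 1.0, 1.0, 3.0]
print(normalized)  # [0.0, 0.0, 0.0, 0.0, 2.0]
```

After centering, the systematic two-fold dye bias is gone and only the last spot stands out, with a four-fold difference between the channels.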

Next lecture
  Biology of the transcriptome
  Observing the transcriptome
  Computational biology of gene expression
  Recent examples
    Transcriptional response to an anti-cancer drug
    The new empirical maps of human transcription
    Co-expression networks to derive gene function

Examples of gene regulation
  Physiology: activation of the genes Ins1 and Ins2 after eating
  Development: activation of the genes Actc1 and Myh7 during heart muscle cell differentiation
  Disease: disturbed upregulation of the gene Erbb2 in many human cancers