Gene Expression Technology
  Bing Zhang
  Department of Biomedical Informatics, Vanderbilt University
  bing.zhang@vanderbilt.edu
Gene expression
  Gene expression is the process by which information from a gene is used in the synthesis of a functional gene product.
  For a specific cell at a specific time, only a subset of the genes encoded in the genome is expressed.
Gene expression regulation
  Gene expression is highly regulated:
    In mutant cells vs. wild-type cells
    In response to various stimuli (e.g. drug, light, sleep)
    At different developmental stages
    In different cell types (e.g. muscle cells, fibroblasts)
    In disease states vs. healthy states
  Gene expression is primarily regulated at the transcriptional level.
  The number of mRNA copies in a cell for a particular gene is a good indicator of the corresponding protein expression level.
Expression of a typical eukaryotic protein-coding gene
  mRNA is the subject of measurement in the technologies that will be introduced today.
  Graph courtesy of Wikipedia
Candidate gene approaches
  Northern blot
    RNA is isolated, electrophoresed on an agarose gel, transferred to a nylon membrane through a capillary or vacuum blotting system, and probed with a radioactive cDNA derived from an individual gene. Radioactive signal is measured.
  RNase protection
    Extracted RNA is first mixed with radioactive cDNA probes from a gene of interest to form DNA-RNA hybrids. The mixture is then exposed to an RNase that specifically cleaves single-stranded RNA. Radioactive signal is measured.
  Real-time PCR
    RNA is reverse transcribed to cDNA and then quantified by real-time PCR using a fluorescent probe specific to a gene of interest. Fluorescent signal is measured.
High-throughput transcriptome profiling
  Transcriptome: the set of all messenger RNA (mRNA) molecules, or "transcripts", produced in one cell or a population of cells.
  Hybridization-based approaches: incubate fluorescently labeled cDNA with microarrays. Hybridization signal is measured.
    cDNA microarrays
    High-density oligo arrays
  Sequencing-based approaches: directly determine the cDNA sequence. Count is measured.
    Sanger sequencing of cDNA or EST libraries
    Serial Analysis of Gene Expression (SAGE)
    Massively Parallel Signature Sequencing (MPSS)
    RNA-Seq
Low-throughput vs. high-throughput
  [Figure: Northern blot (chalcone synthase, protein kinase, actin; time points 0, 10m, 30m, 1h, 3h, 6h, 24h) vs. microarray (time points 10m, 30m, 1h, 3h, 6h, 24h)]
  Advantages of high-throughput technologies
    High throughput
    Exploratory analysis
    Relationships between genes
  Challenges in high-throughput technologies
    Cost
    Data analysis
A simplified protocol for a DNA microarray experiment
  Prepare or purchase a DNA microarray
  Isolate mRNA from cell cultures or tissue samples
  Reverse transcribe mRNA into cDNA
  Label cDNA or cRNA by incorporating fluorescently labeled nucleotides
  Hybridize labeled cDNA or cRNA to the DNA microarray
  Wash the microarray and scan it in a scanner
  Analyze data
DNA microarrays
  DNA microarray: a solid support (glass slide, silicon chip, etc.) on which DNA of known sequence is deposited in a regular grid-like array.
  Spotted or printed arrays: DNA features are physically transferred from a plate or reservoir to a solid support, typically a chemically modified glass microscope slide. (Agilent, GE, ABI)
  Synthesized arrays: DNA features are chemically synthesized in situ on the substrate. (Affymetrix, NimbleGen, CombiMatrix)
Array preparation: cDNA spotted array
  Inserts from cDNA collections or libraries are amplified using either vector-specific or gene-specific primers.
  PCR products are printed at specified sites on glass slides using high-precision arraying robots.
  Through the use of chemical linkers, selective covalent attachment of the DNA strand to the glass surface can be achieved.
  A movie: http://www.youtube.com/watch?v=3zxq_adfsb8
  Schulze and Downward, Nature Cell Biol, 3:E190, 2001
Array preparation: high-density oligo arrays
  Based on the concept of photolithography
  Start with a chip carrying the initial strands from which the DNA will be built
  Shine light through a mask to deprotect specific strands
  Add free nucleotides
  Repeat until the desired length is reached, e.g. 25 bases
  Graph courtesy of Affymetrix
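
The deprotect-and-couple cycle above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function name `synthesize`, the fixed A, C, G, T cycling order, and the toy probe sequences are hypothetical and are not taken from Affymetrix documentation. In each step the "mask" is simply the set of features whose next required base matches the nucleotide being added.

```python
from typing import List, Tuple

BASES = "ACGT"  # assumed cycling order of nucleotide additions (illustrative only)


def synthesize(targets: List[str]) -> Tuple[List[str], int]:
    """Grow every probe one base at a time; return the probes and the number of steps."""
    grown = ["" for _ in targets]
    steps = 0
    while any(len(g) < len(t) for g, t in zip(grown, targets)):
        for base in BASES:
            # The mask exposes only the features whose next base is `base`.
            mask = [
                i
                for i, (g, t) in enumerate(zip(grown, targets))
                if len(g) < len(t) and t[len(g)] == base
            ]
            if not mask:
                continue  # nothing to deprotect for this nucleotide
            for i in mask:
                grown[i] += base  # deprotected strands couple the added nucleotide
            steps += 1
    return grown, steps


probes, n_steps = synthesize(["ACGT", "AAAA", "TGCA"])
print(probes, n_steps)
```

With this scheme the number of steps never exceeds four times the probe length, regardless of how many features are on the chip, which is why the approach scales to very dense arrays.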
Affymetrix oligonucleotide microarrays
  Probes are complementary to exon sequences located near the 3' end of the gene
  Each probe is usually 25 bases long
  Typically 11-20 pairs of oligonucleotide probes per probe set
  Each pair contains a perfect match oligonucleotide and a mismatch oligonucleotide (single-base mismatch)
  Mismatch probes are used as controls for nonspecific hybridization
  One gene is represented by one to a few probe sets
  The HG-U133 Plus 2.0 Array comprises more than 54,000 probe sets
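
To make the role of the mismatch probes concrete, the sketch below summarizes one probe set as the mean of the perfect match minus mismatch differences. It is a deliberately simplified illustration, not the algorithm used by any particular Affymetrix software release; the function name `average_difference` and the intensity values are made up for the example.

```python
from statistics import mean
from typing import Sequence


def average_difference(pm: Sequence[float], mm: Sequence[float]) -> float:
    """Mean of (perfect match - mismatch) across the probe pairs of one probe set."""
    if len(pm) != len(mm):
        raise ValueError("each perfect match probe needs a paired mismatch probe")
    if not pm:
        raise ValueError("a probe set needs at least one probe pair")
    # The mismatch intensity estimates nonspecific hybridization for its pair.
    return mean(p - m for p, m in zip(pm, mm))


# Hypothetical intensities for a probe set with four probe pairs.
pm_intensities = [1200, 980, 1500, 1100]
mm_intensities = [300, 250, 400, 350]
print(average_difference(pm_intensities, mm_intensities))  # 870
```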
Affymetrix GeneChip expression array design
  A movie on GeneChip array preparation: http://www.youtube.com/watch?v=ui4botwjexs
  Graph courtesy of Affymetrix
Microarray experiment: two-color arrays
  RNA from two different tissues or cell populations is used to synthesize single-stranded cDNA in the presence of nucleotides labelled with two different fluorescent dyes (for example, Cy3 and Cy5).
  Both samples are mixed in a small volume of hybridization buffer and hybridized to the array surface, usually by stationary hybridization under a coverslip, resulting in competitive binding of differentially labelled cDNAs to the corresponding array elements.
  High-resolution confocal fluorescence scanning of the array with two different wavelengths corresponding to the dyes used provides relative signal intensities and ratios of mRNA abundance for the genes represented on the array.
  Schulze and Downward, Nature Cell Biol, 3:E190, 2001
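
The ratio readout of a two-color array is commonly expressed on a log2 scale, so that a two-fold increase and a two-fold decrease are symmetric around zero. The sketch below assumes, purely for illustration, that the test sample is labelled with Cy5 and the reference with Cy3, and that a single background value is subtracted from both channels; the function name `log2_ratio` and the intensities are hypothetical.

```python
import math
from typing import Optional


def log2_ratio(cy5: float, cy3: float, background: float = 0.0) -> Optional[float]:
    """log2(Cy5 / Cy3) after background subtraction; None if a channel has no signal."""
    test = cy5 - background
    reference = cy3 - background
    if test <= 0 or reference <= 0:
        return None  # ratio is undefined when a channel is at or below background
    return math.log2(test / reference)


print(log2_ratio(4000, 1000))  # 2.0  (four-fold higher in the Cy5 sample)
print(log2_ratio(500, 2000))   # -2.0 (four-fold lower in the Cy5 sample)
print(log2_ratio(90, 800, background=100))  # None
```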
Microarray experiment: single-color arrays
  Poly(A)+ RNA from different tissues or cell populations is used to generate double-stranded cDNA carrying a transcriptional start site for T7 RNA polymerase.
  During in vitro transcription, biotin-labelled nucleotides are incorporated into the synthesized cRNA molecules.
  Each target sample is hybridized to a separate probe array and target binding is detected by staining with a fluorescent dye coupled to streptavidin.
  Signal intensities of probe array element sets on different arrays are used to calculate relative mRNA abundance for the genes represented on the array.
  Schulze and Downward, Nature Cell Biol, 3:E190, 2001
Exon arrays
  Classic Affymetrix arrays keep the probes near the 3' end of the gene.
  Affymetrix Gene 1.0 ST arrays spread the probes across all known and predicted exons in order to generate a more accurate transcript-level signal.
  Graph courtesy of Affymetrix
Tiling arrays
  Classic arrays use a few probes for each known or predicted gene.
  Tiling arrays design probes to cover the entire genome or selected genomic regions in order to produce an unbiased look at gene expression.
  GeneChip Human Tiling 1.0R Array Set
    14-array set
    25-mer oligos
    35-base-pair resolution
    Over 6.5 million probes per array
  Graph courtesy of Affymetrix
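
A minimal sketch of how tiling probes could be laid out along a region, assuming that "35-base-pair resolution" means consecutive 25-mers start 35 bp apart (which leaves a 10 bp gap between probes). The function name `tile_probes` and the toy sequence are hypothetical; a real design would also have to deal with repeats and probe quality, which this sketch ignores.

```python
from typing import List, Tuple


def tile_probes(sequence: str, probe_len: int = 25, step: int = 35) -> List[Tuple[int, str]]:
    """Return (start, probe) pairs for probes placed every `step` bases along `sequence`."""
    if probe_len <= 0 or step <= 0:
        raise ValueError("probe_len and step must be positive")
    last_start = len(sequence) - probe_len  # last position where a full probe still fits
    return [
        (start, sequence[start:start + probe_len])
        for start in range(0, last_start + 1, step)
    ]


region = "ACGT" * 25  # toy 100-base region
for start, probe in tile_probes(region):
    print(start, probe)
```

For the 100-base toy region this yields probes starting at positions 0, 35, and 70.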
Large-scale transcriptional activity in chromosomes 21 and 22
  Kampa et al. Genome Res, 14:331, 2004
Novel RNAs identified from tiling array analysis
  Kampa et al. Genome Res, 14:331, 2004
SAGE: overview
  Serial analysis of gene expression (SAGE) is a method for rapid and comprehensive analysis of gene expression.
  A short sequence tag (10-14 bp) contains sufficient information to uniquely identify a transcript, provided that the tag is obtained from a unique position within each transcript.
  Sequence tags can be linked together to form long serial molecules that can be cloned and sequenced.
  Counting the number of times a particular tag is observed gives the expression level of the corresponding transcript.
  Graph courtesy of SAGEnet
SAGE: tagging
  cDNA is synthesized on oligo(dT) beads.
  The cDNA is digested with an anchoring enzyme that recognizes and cuts specific 4-bp DNA sequences, revealing the 3'-most restriction site.
  The digested cDNA is ligated to two different linkers containing the recognition site for a tagging enzyme, which cuts 10 bp 3' of the anchoring enzyme recognition site to generate a SAGE tag.
  SAGE tags released from the oligo(dT) beads are then blunted and ligated to each other to give rise to ditags.
  The ditags are PCR amplified, released from the linkers by the anchoring enzyme, gel purified, serially ligated, cloned, and sequenced using an automated sequencer.
  Patino et al. Circulation Res, 91:565, 2002
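
The tagging logic above can be mimicked in silico. The sketch below assumes the anchoring enzyme is NlaIII (recognition site CATG), a common choice in SAGE protocols but not one named on the slide, and a fixed 10-bp tag; the function names and toy transcript sequences are hypothetical. Transcripts are given 5' to 3', so the 3'-most site is the last occurrence of the recognition sequence.

```python
from collections import Counter
from typing import Iterable, Optional

ANCHOR_SITE = "CATG"  # assumption: NlaIII as the anchoring enzyme
TAG_LEN = 10          # tag length in bp, as described for the tagging enzyme


def sage_tag(cdna: str) -> Optional[str]:
    """Return the tag immediately 3' of the 3'-most anchoring site, or None."""
    pos = cdna.rfind(ANCHOR_SITE)  # last occurrence = 3'-most site
    if pos == -1:
        return None  # transcript has no anchoring site and yields no tag
    start = pos + len(ANCHOR_SITE)
    tag = cdna[start:start + TAG_LEN]
    return tag if len(tag) == TAG_LEN else None  # too close to the 3' end


def count_tags(cdnas: Iterable[str]) -> Counter:
    """Tally tags over a pool of transcripts; counts stand in for expression levels."""
    return Counter(tag for tag in map(sage_tag, cdnas) if tag is not None)


pool = [
    "GGCATGAAACCCGGGTTTAAAA",
    "GGCATGAAACCCGGGTTTAAAA",
    "CATGTTTTTTTTTTCATGACGTACGTACGG",
    "AAAAAA",
]
print(count_tags(pool))
```

The last toy transcript illustrates one limitation of the method: a transcript without the anchoring site produces no tag and is therefore invisible to the assay.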
Technology comparison
  Hybridization-based methods
    Advantages
      Relatively inexpensive
      Customizable (for genes of interest)
    Limitations
      Analog signal
      Rely on existing knowledge about gene annotation or genome sequence
      Cross-hybridization
      Limited dynamic range of detection owing to both background and saturation of signals
  Sequencing-based methods
    Advantages
      Digital gene expression level
      Don't rely on existing knowledge
    Limitations
      Based on expensive Sanger sequencing technology
      A significant portion of the short tags cannot be mapped to the genome
      Transcript isoforms are generally indistinguishable
RNA-Seq: new kid on the block
  RNA-Seq: direct ultra-high-throughput sequencing of cDNA.
RNA-Seq: overview
  RNAs are fragmented and converted into cDNA by random priming.
  Sequencing adaptors are added to each cDNA fragment and a short sequence is obtained from each cDNA using high-throughput sequencing technology.
  The resulting sequence reads are aligned with the reference genome or transcriptome and classified into three types: exonic reads, junction reads, and poly(A) end-reads.
  Alignments are used to generate a single-base-resolution expression profile for each gene.
  Wang et al. Nature Rev Genet, 10:57, 2009
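
A single-base-resolution profile is essentially a per-position count of the reads covering each base. The sketch below assumes alignments are already available as half-open (start, end) intervals in gene coordinates; the function name `coverage_profile` and the toy reads are hypothetical, and real pipelines read such intervals from alignment files. A junction read can be represented by passing each of its aligned blocks as a separate interval.

```python
from typing import List, Sequence, Tuple


def coverage_profile(gene_length: int, alignments: Sequence[Tuple[int, int]]) -> List[int]:
    """Per-base read depth over a gene, given half-open (start, end) read intervals."""
    diff = [0] * (gene_length + 1)  # difference array: +1 where a read starts, -1 where it ends
    for start, end in alignments:
        start = max(start, 0)
        end = min(end, gene_length)  # clip reads that extend past the gene
        if start >= end:
            continue  # read lies entirely outside the gene
        diff[start] += 1
        diff[end] -= 1
    profile = []
    depth = 0
    for delta in diff[:gene_length]:
        depth += delta  # running sum turns start/end marks into depth
        profile.append(depth)
    return profile


reads = [(0, 4), (2, 6), (8, 12)]
print(coverage_profile(10, reads))  # [1, 1, 2, 2, 1, 1, 0, 0, 1, 1]
```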
RNA-Seq: technology comparison
  Wang et al. Nature Rev Genet, 10:57, 2009
RNA-Seq: challenges
  Library construction
  Mapping the short reads from RNA-Seq to the reference genome
  Appropriate assignment of multi-mapping reads
  Identification of new alternative splice junctions
  Classification of reads mapping outside annotated boundaries
  Comparison of samples to identify differentially expressed genes
Key references
  Schulze and Downward. Navigating gene expression using microarrays: a technology review. Nature Cell Biol, 3:E190, 2001
  Wang et al. RNA-Seq: a revolutionary tool for transcriptomics. Nature Rev Genet, 10:57, 2009