Genomic techniques. Marta Puig Institut de Biotecnologia i Biomedicina Universitat Autònoma de Barcelona.


1 Genomic techniques. Marta Puig, Institut de Biotecnologia i Biomedicina, Universitat Autònoma de Barcelona. marta.puig@uab.cat

2 The genomic revolution: genes, genomes
- Whole genome sequences of tens of species and individuals (and more to come!)
- Gene expression profiling in multiple conditions, tissues, individuals and species
- Mapping of functional regions in the genome

3 Specific-region techniques
- DNA: PCR, quantitative PCR, Southern blot
- RNA: RT-PCR, quantitative RT-PCR, Northern blot
- Sanger sequencing

4 Single-region techniques
- Techniques: PCR, quantitative PCR, Southern blot
- Uses: detection, quantification, structure
- Figure axis label: fluorescence intensity

5 Genome-wide techniques
- Microarrays
- Next-generation sequencing technologies

6 Genomic techniques usage: microarrays and NGS
- First description of cDNA microarrays (Schena et al. 1995)
- First description of oligonucleotide arrays (Lockhart et al. 1996)
- Release of the 454 sequencing system (2005)
- Source: PubMed searches, October 2013

7 Microarrays

8 Common features of microarrays
- Parallelism/high-throughput (thousands of genes analyzed simultaneously)
- Miniaturization (small feature and chip size)
- Automation (chip manufacture, processing and analysis)
- Based on nucleic-acid hybridization and base pairing
- Probes fixed on a glass slide
- Labeling of target samples
- Detection and quantification (of hybridized molecules)

9 Microarray principles
- Two-sample competitive hybridization: 2 colors / relative quantification
- Single-sample hybridization: 1 color / absolute quantification

10 Classes of microarrays: probes
- Custom/spotted/two-color microarrays (cDNAs, BACs)
- High-density oligonucleotide arrays (GeneChip, Affymetrix)
- Long oligonucleotide microarrays: Agilent (25-60 bases), Illumina (50 bases), Nimblegen (50-75 bases)

11 Custom microarrays technology
1. Isolation or PCR amplification of DNA fragments ( Kb) including the genes or regions of interest
2. Spotting of DNAs at high density onto a glass microscope slide ( spots per slide) and cross-linking to the glass surface

12 Custom microarrays technology
3. Two independent mRNA or DNA samples (the labeled samples) are fluorescently labeled with Cy3 (green) or Cy5 (red)
4. The two labeled populations are combined in equal amounts and hybridized to the array (the probes, on the chip)
5. A laser scans the slide, and the ratio of fluorescence intensities between the two samples is calculated
- Possible outcomes per spot: higher expression in sample 1, expression in both samples, higher expression in sample 2
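The ratio computed in step 5 is often expressed on a log2 scale, so that equal fold changes in either direction are symmetric around zero. The following is a minimal sketch of that idea, not the scanner software's actual procedure. The function names, the pseudocount, the two-fold threshold and the intensity values are all illustrative assumptions.

```python
import math

def log2_ratio(cy5, cy3, pseudocount=1.0):
    """Return log2(Cy5/Cy3) for one spot.

    The pseudocount is an assumption that avoids division by zero
    for spots with no signal in one channel.
    """
    return math.log2((cy5 + pseudocount) / (cy3 + pseudocount))

def classify_spot(cy5, cy3, threshold=1.0):
    """Classify a spot using an assumed two-fold (log2 = 1) cutoff."""
    m = log2_ratio(cy5, cy3)
    if m >= threshold:
        return "higher in Cy5 sample"
    if m <= -threshold:
        return "higher in Cy3 sample"
    return "similar in both samples"

# Hypothetical background-corrected intensities: (Cy5, Cy3) per spot.
spots = {"geneA": (8000, 1000), "geneB": (1200, 1100), "geneC": (500, 4500)}
for name, (cy5, cy3) in spots.items():
    print(name, round(log2_ratio(cy5, cy3), 2), classify_spot(cy5, cy3))
```

Real analyses also normalize the two channels against each other before comparing them; that step is left out here.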

13 Gene expression arrays
- The array elements are a series of 25-mer oligos designed from known sequence and synthesized directly on the surface
- The entire array is formed by >500,000 cells, each containing a different oligo

14 Gene expression arrays

15 Gene expression arrays
- Detection of transcription and quantification of expression levels
- Short oligonucleotides (25 bases)
- Multiple probes used for each transcript
- Probes hybridize to the exons located near the 3' end of the analyzed transcript: end of CDS or 3' UTR ( bp)
- Figure labels: high expression level, intermediate expression level, no expression

16 Illumina Bead arrays
- 3 µm beads coated with probes are located in microwells on the array surface
- Each probe has an address that identifies the location of each bead through a hybridization-based procedure
- High redundancy (30 beads/probe)
- This technology can be used to create different types of arrays (gene expression, SNP genotyping, etc.)
- Figure labels: 29 bases, 50 bases

17 Genome tiling arrays
- Long oligonucleotides (60 bases)
- Overlapping probes that completely cover the region of interest
- Identification of new transcribed sequences
- Figure: characterization of a novel testis transcript using tiling arrays. Figure 3, Shoemaker et al. (2001) Nature 409
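As a rough illustration of how overlapping probes can cover a region completely, the sketch below cuts a sequence into 60-base probes at a fixed step. The 30-base step, the function name and the toy sequence are assumptions for illustration. Real designs also filter probes for repeats and melting temperature.

```python
def tile_probes(region, probe_len=60, step=30):
    """Return (start, probe) pairs of overlapping probes covering `region`.

    probe_len=60 follows the slide; step=30 (50% overlap) is an assumption.
    """
    if len(region) < probe_len:
        raise ValueError("region is shorter than one probe")
    last_start = len(region) - probe_len
    starts = list(range(0, last_start + 1, step))
    # Add a final probe flush with the end so no bases are left uncovered.
    if starts[-1] != last_start:
        starts.append(last_start)
    return [(s, region[s:s + probe_len]) for s in starts]

region = "ACGT" * 50  # toy 200-base region
for start, probe in tile_probes(region):
    print(start, probe[:10] + "...")
```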

18 Alternative splicing arrays
- Probes located within exons, introns or overlapping exon-exon junctions
- Allow the quantification of previously known RNA isoforms
Box 2. Matlin et al. (2005) Nature Reviews Molecular Cell Biology 6

19 SNP arrays
- Genotyping of SNPs
- DNA is hybridized to the microarray
- Can genotype millions of SNPs in a single experiment
- Single-base extension
Figure 2. LaFramboise (2009) Nucleic Acids Research 37

20 Applications of microarrays
Uses of microarrays depend on:
- Probes in the microarray
- Molecules hybridized to the microarray

21 Applications of microarrays
RNA
- Measuring transcript abundance: gene probes (expression arrays)
- Identification of transcribed sequences: probes covering the whole region (tiling arrays)
- Analysis of alternative splicing: probes in exons or exon junctions (alternative splicing arrays)
DNA
- SNP genotyping (SNP oligonucleotide arrays)
- Estimating DNA copy number (CGH on BAC or oligonucleotide arrays)
Chromatin
- Identifying protein binding sites (ChIP-chip on tiling or oligonucleotide arrays)
- Detection of epigenetic modifications (ChIP-chip on tiling or oligonucleotide arrays)

22 Limitations of microarrays
- Reliance on available genomic sequence and gene annotations
- Cross-hybridization
- Limited dynamic range and saturation
- High amount of RNA/DNA required
- High cost

23 Sequencing methods

24 Classical Sanger sequencing
1. DNA amplification
- Cloning: plasmid vector + DNA fragments, transformation into bacteria, isolation of plasmid DNA
- PCR: DNA denaturation + primer annealing, copy of template DNA with a thermotolerant DNA polymerase (Taq), several cycles, amplified fragment

25 Classical Sanger sequencing
2. Sequencing reaction
- Primer and dideoxynucleotides (ddNTPs): after a ddNTP is incorporated, no other nucleotide can be added
- Each dideoxynucleotide is labeled with a different fluorophore
3. Capillary electrophoresis
- Separation by size in a polyacrylamide gel + fluorescence detection
- Output: chromatogram
RESULTS
- Long reads ( bp)
- Small scale (96 reactions/run)
Figure 1. Shendure and Ji (2008) Nature Biotechnology 26

26

27 2nd and 3rd generation sequencing technologies

28 2nd generation sequencing technologies
- 454/Roche: pyrosequencing
- Illumina: reversible termination
- SOLiD: sequencing by ligation

29 Common steps in next-generation (massively parallel) sequencing methods
1. Preparation of sequencing library with adapters: DNA fragmentation, DNA fragments of proper size, adapter ligation (sequencing library)
2. Solid-phase amplification
3. Sequencing reaction

30 454/Roche Pyrosequencing
1. DNA fragmentation and adapter ligation (library)
2. Emulsion PCR on beads within water-in-oil droplets
Figures 1 and 2. Shendure and Ji (2008) Nature Biotechnology 26

31 454/Roche Pyrosequencing
3. Bead distribution in individual wells
4. Pyrosequencing
- >1 million reads per run
- Read length = bp
Figure 3. Metzker (2010) Nature Reviews Genetics 11. Figure 1. England and Pettersson (2005) Nature Methods 2, Application Note

32 Illumina Reversible termination
1. DNA fragmentation and adapter ligation (library)
2. Solid-phase amplification and cluster generation by bridge PCR
- >100 million reads per run
Figures 1 and 2. Shendure and Ji (2008) Nature Biotechnology 26

33 Illumina Reversible termination
3. Flowing of fluorescent reversible terminator dNTPs and incorporation of a single base per cycle
4. Reading of the identity of each base of a cluster from sequential images taken after each nucleotide incorporation
- Read length = bp
Figure 2. Metzker (2010) Nature Reviews Genetics 11

34 SOLiD Sequencing by ligation
- Amplification by emulsion PCR
- Oligonucleotides labelled with different fluorescence according to their first two bases
- Hybridization and ligation to the initial sequencing primer
- Image capture, elimination of the fluorophore and repetition for several cycles
- Change of the initial primer for one slightly displaced, to restart the ligation cycles
- Bases sequenced in non-consecutive pairs
- million reads per run
Figure 3. Metzker (2010) Nature Reviews Genetics 11

35 SOLiD Sequencing by ligation
- Read length = bp
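Labelling oligonucleotides according to two bases means each fluorescent color reports a pair of adjacent bases rather than a single base. The sketch below uses the commonly described four-color scheme, in which the color is the XOR of 2-bit base codes. This mapping is an assumption that does not come from the slides, and the function names are illustrative.

```python
# Assumed 2-bit base codes; the color of a dinucleotide is the XOR of its codes.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"

def encode_colors(seq):
    """Encode each overlapping pair of bases as one of four colors (0-3)."""
    return [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(seq, seq[1:])]

def decode_colors(first_base, colors):
    """Recover the base sequence from a known first base and the color calls."""
    bases = [first_base]
    for color in colors:
        bases.append(BITS_TO_BASE[BASE_TO_BITS[bases[-1]] ^ color])
    return "".join(bases)

colors = encode_colors("ATGGC")
print(colors)                      # color calls for the toy sequence
print(decode_colors("A", colors))  # decoding needs the first base
```

Because each base contributes to two consecutive colors, a true variant changes two adjacent color calls, while an isolated color change is more likely a measurement error.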

36 Comparison of sequencing technologies
Table 1. Kircher and Kelso (2010) Bioessays 32

37 Challenges of existing NGS methods
- Increase read length
- Improve sequence accuracy
- Single-molecule sequencing (no amplification)
- De novo assembly of complex genomes

38 Third generation sequencing method: Ion Torrent semiconductor sequencing
- Amplification by emulsion PCR
- Each bead is located in a well of a semiconductor chip connected to an ion-sensitive layer able to detect changes in pH
Figure 1. Rothberg et al. (2011) Nature 475

39 Ion Torrent semiconductor sequencing
- Unlabelled nucleotides flow sequentially through the chip
- Incorporation of a nucleotide into the DNA strand being synthesized causes the release of a proton
- Detection of the pH change within the well allows the conversion of chemical information into digital information
- Non-optical DNA sequencing (cheaper, faster)
- Chips with 1 million, 6 million and 11 million sensors
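Because one nucleotide type flows over the chip at a time, each flow produces a signal only if that nucleotide is the next one to be incorporated, and a run of identical bases is incorporated within a single flow. The sketch below idealizes this. The flow order "TACG", the function name and the noise-free integer signals are assumptions, and `template` stands for the strand being synthesized.

```python
def flow_signals(template, flow_order="TACG", max_flows=100):
    """Return (nucleotide, bases_incorporated) for each flow.

    Idealized model: the signal equals the homopolymer length.
    flow_order and max_flows are illustrative assumptions.
    """
    signals = []
    pos = 0
    flow = 0
    while pos < len(template) and flow < max_flows:
        nuc = flow_order[flow % len(flow_order)]
        count = 0
        # Every consecutive matching base is added during this one flow.
        while pos < len(template) and template[pos] == nuc:
            count += 1
            pos += 1
        signals.append((nuc, count))
        flow += 1
    return signals

signals = flow_signals("TTAGGGC")
print(signals)
print("".join(nuc * count for nuc, count in signals))  # rebuild the read
```

In practice the signal for a long homopolymer is not exactly proportional to its length, which is one reason flow-based methods tend to make insertion/deletion errors at homopolymers.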

40 Pacific Biosciences SMRT
- Single-molecule real-time sequencing (no amplification step)
- Immobilized DNA polymerase
- Fluorescence pulse of the corresponding color when a nucleotide is incorporated
Figure 4. Metzker (2010) Nature Reviews Genetics 11

41 Comparison of sequencing technologies
Roche/454
- Library amplification method: emPCR
- Sequencing method: polymerase-mediated incorporation of unlabelled nucleotides
- Detection method: light emitted from secondary reactions initiated by release of pyrophosphate (PPi)
- Post-incorporation method: NA
- Error model: substitution errors rare, insertion/deletion errors at homopolymers
- Read length: 400 bp
Illumina
- Library amplification method: bridge PCR
- Sequencing method: polymerase-mediated incorporation of end-blocked fluorescent nucleotides
- Detection method: fluorescent emission from incorporated dye-labelled nucleotides
- Post-incorporation method: chemical cleavage of fluorescent dye and 3' blocking group
- Error model: end-of-read substitution errors
- Read length: 150 bp
SOLiD
- Library amplification method: emPCR
- Sequencing method: ligase-mediated addition of 2-base encoded fluorescent oligonucleotides
- Detection method: fluorescent emission from ligated dye-labelled oligonucleotides
- Post-incorporation method: chemical cleavage removes fluorescent dye and 3' end of oligonucleotide
- Error model: end-of-read substitution errors
- Read length: 75 bp
Pacific Biosciences
- Library amplification method: NA (single-molecule detection)
- Sequencing method: polymerase-mediated incorporation of terminal-phosphate-labelled fluorescent nucleotides
- Detection method: real-time detection of fluorescent dye in polymerase active site during incorporation
- Post-incorporation method: NA
- Error model: random insertion/deletion errors
- Read length: >1,000 bp
NA = not applicable. Based on Table 1. Mardis, ER (2011) Nature 470

42 NGS results
- Reads: >read1, >read2, >read3, >read4, >read5, ...
- Base quality measure: phred score (Q)
- $Q = -10 \log_{10} P$, where P = error probability of each base
- Q > 20 is considered a reliable base (Q = 20 corresponds to P = 0.01)
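The phred relationship can be checked numerically in both directions. The sketch below is illustrative only. The function names are hypothetical, and the Phred+33 ASCII encoding used in `decode_phred33` is the convention of standard FASTQ files, which the slides do not mention.

```python
import math

def phred_from_error(p):
    """Q = -10 * log10(P), for an error probability 0 < P <= 1."""
    return -10 * math.log10(p)

def error_from_phred(q):
    """Invert the formula: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)

def decode_phred33(quality_string):
    """Decode a quality string, assuming Phred+33 ASCII encoding."""
    return [ord(ch) - 33 for ch in quality_string]

print(phred_from_error(0.01))   # error rate of 1 in 100
print(error_from_phred(30))     # error probability at Q = 30
print(decode_phred33("I5!"))    # per-base scores from a toy quality string
```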

43 Processing of NGS results
Reference mapping
- Assembly of reads by similarity with a reference genome sequence
- Comparison of each read with the reference sequence
- New sequences and structural rearrangements can NOT be detected
- Not applicable the first time a genome is sequenced
De novo assembly
- Assembly of reads based on the overlap of their sequences
- Comparison of each read with all the other reads
- New sequences and structural rearrangements CAN be detected
- More complex, slower and requires more computational resources
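Assembly based on read overlap can be illustrated with a toy greedy merger that compares each read with all the others and repeatedly joins the pair with the longest suffix-prefix overlap. This is a teaching sketch under simplifying assumptions (error-free reads, a single strand, no repeats), not how production assemblers work. The function names and the minimum overlap of 3 are illustrative.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Merge the best-overlapping pair until no overlaps remain."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        # All-against-all comparison: the costly part of de novo assembly.
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # remaining reads stay as separate contigs
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads

print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
```

The nested loop makes the number of comparisons grow with the square of the number of reads, which gives a sense of why de novo assembly needs more computational resources than mapping each read to a reference.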

44 Applications of NGS technologies
- Whole-genome sequencing
- Exome sequencing
- RNA-Seq
Figure 2. Simon and Roychowdhury (2013) Nature Reviews Drug Discovery 12

45 Applications of NGS technologies (application: objective)
- Complete genome resequencing: discovery of variants among individuals
- Exome sequencing: resequencing of specific regions within a genome
- RNA-seq: sequencing of transcripts and quantification of gene expression levels
- ChIP-seq: genome-wide mapping of protein-DNA interactions
- Paired-end sequencing: discovery of structural variants
- Metagenomics: sequencing of all the genomes in a given ecological habitat

46 Targeted capture
- Targets: all exons (exome sequencing), microRNAs, candidate regions
- Capture with probes: array hybridization or solution hybridization
Figure 1. Gnirke et al. (2009) Nature Biotechnology 27

47 RNA-seq
- Sequencing of all the transcripts in a sample using NGS technologies
Figure 1. Wang et al. (2009) Nature Reviews Genetics 10: 57-63

48 RNA-seq: ADVANTAGES
- Independence of the existence of an available genomic sequence
- Detection of new transcripts
- Single-nucleotide precision
- Detection of splicing variants and alternative transcription starts and ends
- Detection of SNPs in transcribed regions
- Detection of allele-specific transcription
- Accurate quantification of expression levels (wide range of measurements)
- Great reproducibility
- Small amount of initial RNA needed

49 ChIP-seq
- Chromatin immunoprecipitation (ChIP) + sequencing
- Detection of transcription factor binding sites and other DNA-protein interactions
Figure 1. Massie and Mills (2008) EMBO reports 9. Figure 2. Park (2009) Nature Reviews Genetics 10

50