DNA sequencing
EECS 458, CWRU, Fall 2004

Readings:
  Pevzner Ch1-4
  Adams, Fields & Venter (ISBN:0127170103)
  Serafim Batzoglou's slides

Course Info
Instructor: Jing Li, 509 Olin Bldg
Phone: X0356
Email: jingli@eecs.cwru.edu
Office hours: MW 10:30-11:30am
Class Meeting: MWF 11:30am-12:20pm, SEARS 356
Prerequisites: EECS340, 454
Course Website: http://vorlon.cwru.edu/~jxl175/teaching.htm
eecs458@eecs.cwru.edu
Course number: eecs458 (15091)

DNA sequencing
How we obtain the sequence of nucleotides (A, T, G, C) of a species.

ACGTGACTGAGGACCGTG
CGACTGAGACTGACTGGGT
CTAGCTAGACTACGTTTTA
TATATATATACGTCGTCGT
ACTGATGACTAGATTACAG
ACTGATTTAGATACCTGAC
TGATTTTAAAAAAATATT

Polymorphism, SNPs (0.1%)

General strategies
Break the whole genome into fragments
Sequence the short fragments
Assemble them together

[Figure: DNA is cut many times at random (shotgun) into DNA fragments; one or two reads of ~500 bp are obtained from each segment.]
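
The shotgun step in the figure can be sketched as a toy simulation. This is a minimal illustration, not a model of a real protocol: the function name `shotgun_reads` and its parameters are made up for this example, reads are error-free, and both reads are taken from the same strand (real paired reads come from opposite strands).

```python
import random

def shotgun_reads(genome, n_fragments, frag_len, read_len, seed=0):
    """Cut `n_fragments` random fragments from `genome` and return
    one read from each end of every fragment (toy model)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    reads = []
    for _ in range(n_fragments):
        # Pick a random cut position so the fragment fits inside the genome.
        start = rng.randrange(0, len(genome) - frag_len + 1)
        fragment = genome[start:start + frag_len]
        reads.append(fragment[:read_len])    # read from the left end
        reads.append(fragment[-read_len:])   # read from the right end
    return reads

if __name__ == "__main__":
    genome = "ACGTGACTGAGGACCGTGCGACTGAGACTGACTGGGT"
    for read in shotgun_reads(genome, n_fragments=5, frag_len=12, read_len=5):
        print(read)
```

With real data the fragments are a few thousand bases long and the reads about 500 bp, as on the slide; the tiny sizes here only keep the output readable.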

Fragment Assembly
[Figure: reads]

Different methods
1. Hierarchical: clone-by-clone (yeast, worm, human)
   i. Break genome into many long fragments
   ii. Map each long fragment onto the genome
   iii. Sequence each fragment with shotgun
2. Whole Genome Shotgun (fly, human, mouse, rat, fugu)
   One large shotgun pass on the whole genome
3. Online version of (1): walking (rice genome)
   i. Break genome into many long fragments
   ii. Start sequencing each fragment with shotgun
   iii. Construct map as you go
4. $1000 genome (Church's paper)
5. SNP genotyping (Hapmap.org)

Commonly used techniques
Restriction Enzyme Digests
Gel Electrophoresis
Cloning
PCR

Restriction Enzyme Digests
Bacteria produce enzymes (called restriction enzymes) that cut double-stranded DNA; they were discovered by H. Smith et al. in the 1970s.
One example (the first one): EcoRI cleaves DNA between G and A when it encounters the sequence 5'-GAATTC-3'.
There are hundreds of such enzymes.
They can be used to digest long DNA sequences into short fragments.

Gel Electrophoresis
A method to separate DNA/RNA/protein fragments according to their length.
A complete digestion of a genome may yield hundreds of thousands of DNA fragments; gel electrophoresis can be used to separate them.
In running a gel, all the fragments are loaded into wells at one end of a gel-like matrix.
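
The EcoRI example from the Restriction Enzyme Digests slide can be sketched in a few lines. This is an illustrative sketch only: `digest` and its parameters are hypothetical names, it models a single strand, and it ignores the staggered cut on the complementary strand (the sticky ends).

```python
def digest(sequence, site="GAATTC", cut_offset=1):
    """Cut `sequence` at every occurrence of the recognition `site`.

    `cut_offset` is the number of bases into the site where the cut
    falls; 1 puts the cut between G and A, as for EcoRI on the slide.
    """
    fragments = []
    prev = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[prev:cut])
        prev = cut
        pos = sequence.find(site, pos + 1)  # look for the next site
    fragments.append(sequence[prev:])       # tail after the last cut
    return fragments

if __name__ == "__main__":
    print(digest("AAGAATTCGGGAATTCTT"))
```

Concatenating the returned fragments gives back the input sequence, which is a handy sanity check when experimenting with other recognition sites.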

Gel Electrophoresis
DNA is a negatively charged molecule; when an electric field is applied to the gel, the DNA migrates toward the other end of the gel.
The distance traveled decreases as the length/size of the molecules increases (shorter fragments travel farther).

Cloning
To investigate the molecules, biologists need many copies of them; this can be done by cloning.
Cloning inserts a fragment of DNA into an artificially constructed DNA molecule called a vector.
Cloning vectors with DNA inserts are introduced into a self-replicating host; hundreds of thousands of copies of the fragment can thus be created by the self-replication process of the host.
Different types: plasmid, YAC, BAC, etc.

Library
Digest a genome with a restriction enzyme and clone all the fragments to form a library.
Multiple copies of the genome are needed to ensure coverage.
cDNA library: first reverse-transcribe mRNA into cDNA, then clone the cDNA into a library.
  Much shorter; limited to genes only.

Polymerase Chain Reaction (PCR)
Polymerase chain reaction (PCR) is a common method of creating copies of specific fragments of DNA.
PCR rapidly amplifies a single DNA molecule into 2^n molecules in n rounds.
Requires knowing a few bases of sequence at the ends of the target region (used as primers).
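
The 2^n growth stated on the PCR slide can be checked with a short calculation. The sketch below assumes ideal doubling in every round (real reactions are less than 100% efficient), and the helper names `pcr_copies` and `rounds_needed` are invented for this example.

```python
def pcr_copies(initial_molecules, rounds):
    """Number of molecules after `rounds` of ideal PCR (perfect doubling)."""
    return initial_molecules * 2 ** rounds

def rounds_needed(target_copies, initial_molecules=1):
    """Smallest number of ideal rounds that yields at least `target_copies`."""
    rounds = 0
    copies = initial_molecules
    while copies < target_copies:
        copies *= 2   # each round doubles every molecule
        rounds += 1
    return rounds

if __name__ == "__main__":
    print(pcr_copies(1, 30))          # one molecule after 30 rounds
    print(rounds_needed(1_000_000))   # rounds to reach a million copies
```

Under this idealization a single molecule passes a million copies after 20 rounds and a billion after 30.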

Sequencing short fragments
Sanger invented the chain-termination method for sequencing short DNA fragments.
Sequencing a single-stranded DNA requires a primer, normal letters (dNTPs), and terminators (ddNTPs).
Steps:
  Generate all suffixes of a fragment, with the last letters (terminators) labeled by colors that can be read.
  Separate and order the suffixes using gel electrophoresis.
Read length ~500 bp.
Reads come with some measure of quality.

Fragment Assembly Problem
Repeats: as many as a million times, but with variation
No orientation (the strand of each read is unknown)
Errors
Shortest superstring problem: a simplified version
Three steps: overlap, layout/order, consensus
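
The simplified shortest superstring view can be illustrated with a greedy merge, which is a common heuristic and not necessarily the method used in the course. The sketch assumes error-free reads that all come from the same strand, so it ignores the repeats, orientation, and error issues listed above; `overlap` and `greedy_superstring` are names chosen for this example, and the result is a superstring but not guaranteed to be the shortest one.

```python
def overlap(a, b):
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_superstring(reads):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are fully contained in another read.
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(reads) > 1:
        best_k, best_i, best_j = -1, 0, 1
        # Overlap step: score every ordered pair of reads.
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        # Layout step: join the best pair, writing the shared part once.
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads[0] if reads else ""

if __name__ == "__main__":
    print(greedy_superstring(["ACGTGA", "GTGACT", "ACTGAG"]))
```

A real assembler would add a consensus step that votes over aligned reads to correct errors; with error-free input, as assumed here, the merged string already plays that role.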