NB536: Bioinformatics

1 NB536: Bioinformatics

2 Instructor Prof. Jong Kyoung Kim Department of New Biology Office: E Homepage:

3 Course website /nb536-bioinformatics/

4 Office hours: Monday 17:00-18:00

5 Evaluation: Midterm exam 30%, Final exam 30%, Homework 40%

6 Objectives
We will learn about:
1. a statistical and computational framework for representing, analyzing, and integrating high-throughput sequencing data
2. a suite of tools and resources that are widely used in analyzing sequencing data, together with their assumptions and limitations within the probabilistic and statistical framework

7 Analyzing and manipulating DNA

8 Nucleotide: subunit of the nucleic acids
A nucleotide consists of a nitrogen-containing base, a five-carbon sugar, and one or more phosphate groups.

9 Nucleotide and nucleoside
Nucleotide: Base + Sugar + Phosphate
Nucleoside: Base + Sugar

10 Nucleotide

11 Nucleotides are joined together to form nucleic acids

12 Complementary base pairs in the DNA double helix
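Base pairing (A with T, G with C) means either strand of the double helix determines the other. A minimal Python sketch of that rule follows; the function names are illustrative, not from the course material, and only the four unambiguous DNA letters are handled.

```python
# Watson-Crick pairing: A pairs with T, G pairs with C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Return the base-by-base complement, read in the same direction."""
    return seq.upper().translate(COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Return the opposite strand written 5' to 3'.

    The two strands are antiparallel, so the complement is reversed.
    """
    return complement(seq)[::-1]


if __name__ == "__main__":
    for s in ("ATGC", "GAATTC"):
        print(s, reverse_complement(s))
```

A sequence such as GAATTC that equals its own reverse complement is palindromic in the double-stranded sense, which is typical of restriction-nuclease target sites.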

13 Recombinant DNA technology
The ability to manipulate DNA with precision in a test tube or an organism:
1. Cleavage of DNA at specific sites by restriction nucleases
2. DNA ligation, which makes it possible to seamlessly join together DNA molecules from widely different sources
3. DNA cloning, in which a portion of a genome is purified away from the remainder of the genome by repeatedly copying it to generate many billions of identical molecules

14 Recombinant DNA technology (continued)
4. Nucleic acid hybridization, which makes it possible to identify any specific sequence of DNA or RNA with great accuracy and sensitivity based on its ability to selectively bind a complementary nucleic acid sequence
5. DNA synthesis, which makes it possible to chemically synthesize DNA molecules with any sequence of nucleotides, whether or not the sequence occurs in nature
6. Rapid determination of the sequence of nucleotides of any DNA or RNA molecule

15 Cleavage of DNA: how?

16 Restriction nucleases
HaeIII (blunt ends): 5'-GGCC-3' / 3'-CCGG-5' is cut into 5'-GG-3' / 3'-CC-5' + 5'-CC-3' / 3'-GG-5'
EcoRI (sticky ends): 5'-GAATTC-3' / 3'-CTTAAG-5' is cut into 5'-G-3' / 3'-CTTAA-5' + 5'-AATTC-3' / 3'-G-5'
Restriction nucleases restrict the transfer of foreign DNA into bacteria.
Different bacterial species produce different restriction nucleases, each cutting at a different, specific nucleotide sequence.
Because these target sequences are short (4-8 bp), many cleavage sites will occur by chance in any long DNA sequence.
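The cleavage rules on this slide can be sketched in Python as below. This is an illustration under stated assumptions, not a laboratory tool: the DNA is treated as linear, only the top strand is scanned (both sites are palindromic, so this is enough), and `ENZYMES`, `cut_positions`, `digest`, and `expected_site_spacing` are hypothetical names. The spacing estimate additionally assumes the four bases are equally frequent and independent.

```python
# Recognition site and the offset within the site where the top strand is cut.
ENZYMES = {
    "HaeIII": ("GGCC", 2),   # GG^CC, leaves blunt ends
    "EcoRI": ("GAATTC", 1),  # G^AATTC, leaves sticky ends
}


def cut_positions(seq: str, enzyme: str) -> list[int]:
    """Return the top-strand cut coordinates of `enzyme` in a linear `seq`."""
    site, offset = ENZYMES[enzyme]
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def digest(seq: str, enzyme: str) -> list[str]:
    """Split `seq` into the fragments produced by a complete digest."""
    bounds = [0] + cut_positions(seq, enzyme) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def expected_site_spacing(site_length: int) -> int:
    """Average distance in bp between chance occurrences of one site.

    Assumes equal, independent base frequencies, so a given site of
    length n occurs with probability 4**-n at each position.
    """
    return 4 ** site_length


if __name__ == "__main__":
    dna = "AAGAATTCTTGGCCAA"
    print(digest(dna, "EcoRI"))
    print(digest(dna, "HaeIII"))
    print(expected_site_spacing(4), expected_site_spacing(6))
```

Under the uniform-base assumption a 4 bp site is expected about once every 256 bp and a 6 bp site about once every 4096 bp, which is why any long DNA sequence contains many cleavage sites by chance.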

17 Gel electrophoresis

18 Restriction map

19 Mapping with restriction map
[Figure: cleavage sites for restriction nucleases A, B, C, and D; random fragments]

20 DNA ligation
[Figure: ligase + ATP joins a 3'-OH end to a 5'-phosphate end; shown for sticky ends and for blunt ends]

21 DNA cloning
DNA cloning refers to:
1. the act of making many identical copies of a DNA molecule
2. the isolation of a particular stretch of DNA from the rest of the cell's genome

22 DNA cloning

23 Genomic DNA library
DNA library: the collection of cloned plasmid molecules
Genomic DNA library: the DNA fragments derived directly from the chromosomal DNA of the organism of interest, representing the entire genome of that organism

24 Hybridization
DNA double helices -> (heat) denaturation to single strands -> (slowly cool) renaturation (hybridization) to DNA double helices

25 The chemistry of DNA synthesis

26 Polymerase chain reaction
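As a rough illustration of why PCR yields so many copies, the sketch below models idealized amplification in Python. It assumes every template is copied with the same per-cycle efficiency and ignores reagent depletion; `pcr_copies` and its `efficiency` parameter are hypothetical names introduced for this example.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Estimate the number of copies after a given number of PCR cycles.

    With efficiency 1.0 every molecule is duplicated in each cycle,
    so the copy number doubles per cycle (idealized model).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    for n in (10, 20, 30):
        print(n, pcr_copies(1, n))
```

With perfect doubling, 30 cycles turn one template molecule into 2^30, roughly a billion, copies; real reactions plateau earlier.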

27 First-generation sequencing technologies

28 Sanger sequencing

29 Sanger sequencing

30 Sanger sequencing
Developed by Dr. Frederick Sanger in 1977
Read length up to 1000 bp*
Per-base accuracies as high as %*
Low throughput
High cost ($0.50 per kb)
*Nature Biotechnology 26, (2008)

31 Shotgun sequencing
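Shotgun sequencing reads fragments sampled at random positions and relies on overlaps between them to reconstruct the original sequence. The Python sketch below only simulates the sampling step and the resulting average coverage; it assumes error-free, fixed-length reads drawn uniformly from a linear sequence, and `shotgun_reads` and `mean_coverage` are hypothetical names.

```python
import random


def shotgun_reads(genome: str, read_length: int, n_reads: int, seed: int = 0):
    """Sample `n_reads` fixed-length reads at uniformly random start positions.

    Returns (start, read) pairs; the start is kept only so the simulation
    can be checked, since a real experiment does not know it.
    """
    if not 0 < read_length <= len(genome):
        raise ValueError("read_length must be between 1 and len(genome)")
    rng = random.Random(seed)  # seeded for reproducibility
    max_start = len(genome) - read_length
    reads = []
    for _ in range(n_reads):
        start = rng.randint(0, max_start)  # inclusive on both ends
        reads.append((start, genome[start:start + read_length]))
    return reads


def mean_coverage(genome_length: int, read_length: int, n_reads: int) -> float:
    """Average number of reads covering each base: n_reads * read_length / genome_length."""
    return n_reads * read_length / genome_length


if __name__ == "__main__":
    rng = random.Random(1)
    genome = "".join(rng.choice("ACGT") for _ in range(1000))
    reads = shotgun_reads(genome, read_length=100, n_reads=50)
    print(len(reads), mean_coverage(len(genome), 100, 50))
```

Because start positions are random, some bases are covered many times and others not at all, so the mean coverage has to be well above 1 before most of the sequence is represented.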

32 Top-down sequencing

33 Next-generation sequencing technologies

34 Sequencing cost per Mbp

35 Sequencing cost per genome

36 Overview
Short-read NGS: sequencing by ligation; sequencing by synthesis (Illumina)
Long-read NGS: single-molecule approach; synthetic approach

37 General principles of short-read NGS
1. Construct a library of fragments
2. Generate clonal template populations
3. Massively parallel DNA sequencing reactions
4. Analyze data

38 Illumina: Library preparation

39 Illumina: Library preparation

40 Illumina: Cluster amplification

41 Illumina: Cluster amplification

42 Illumina: Sequencing by synthesis

43 Illumina: Summary