Advanced Technology in Phytoplasma Research

1 Advanced Technology in Phytoplasma Research Sequencing and Phylogenetics Wednesday July 8 Pauline Wang pauline.wang@utoronto.ca

2 Lethal Yellowing Disease Phytoplasma [Images: healthy palm; lethal yellowing of palm] Why are we here? To learn molecular techniques for studying the bacterial phytoplasma pathogen that is devastating coconut crops in Côte d'Ivoire.

3 Approaches: How? 1. Identify many different strains that can infect coconut and other plant hosts. i. The more strains the better. ii. Identification by phylogenetics. 2. Sequence the genomes of many of these strains to identify genetic elements that allow them to be pathogens on different hosts. i. Differences in gene content between strains may identify important virulence factors. Both approaches involve sequencing.
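
The gene content comparison in approach 2 can be sketched with plain sets. This is only an illustration: the strain names and gene identifiers below are invented, and real comparisons start from annotated genome sequences.

```python
# Hypothetical gene inventories; strain names and gene IDs are made up for illustration.
gene_content = {
    "strain_coconut": {"geneA", "geneB", "geneC", "geneD"},
    "strain_other_host": {"geneA", "geneB", "geneE"},
}


def compare_gene_content(first, second):
    """Return the genes shared by both strains and the genes unique to each."""
    return {
        "shared": first & second,
        "only_first": first - second,
        "only_second": second - first,
    }


result = compare_gene_content(
    gene_content["strain_coconut"], gene_content["strain_other_host"]
)
for label, genes in result.items():
    print(label, sorted(genes))
```

Genes found in only one strain are candidates for host-specific factors; they still need experimental follow-up before they can be called virulence factors.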

4 Why is phylogenetics important? Phylogenetics is the study of evolutionary relationships, often among species, individuals or genes (or taxa). Today almost all evolutionary relationships are inferred from molecular sequence data. This is because: 1. DNA is the inherited material; 2. We can now easily, quickly, inexpensively and reliably sequence genetic material; 3. Sequences are highly specific and are often information rich.

5 Applications of Phylogenetics Phylogenetics contributes to our knowledge of how genes, genomes and species evolve. We learn how sequences came to be the way they are today. We can predict how they will change in the future. This is important for many applications.

6 Phylogenetic Tree The evolutionary relationships among a set of species are represented by an evolutionary tree, or phylogeny. Topology is the branching structure of the tree and shows how the taxa are related.

7 Stages of Phylogenetic Analysis

8 Classic Bacterial Phylogenetics The small subunit ribosomal RNA (16S rRNA) has long been the marker of choice in bacterial phylogenetics. 1. Ubiquitously distributed. 2. Easy to amplify by polymerase chain reaction (PCR) and to sequence. 3. Shows little evidence of lateral gene transfer (LGT). (Hodgetts et al. 2008. Int J Syst Evol Microbiol 58)

9 16S rRNA subgroups (Hodgetts et al. 2008. Int J Syst Evol Microbiol 58)

11 PCR Screening Exponential amplification of template
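
Exponential amplification means the copy number ideally doubles every cycle. A minimal sketch of that arithmetic follows; the `efficiency` parameter is an added assumption to show what happens when doubling is not perfect.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Copies after `cycles` rounds of PCR.

    efficiency=1.0 is perfect doubling each cycle; lower values model
    reactions in which only a fraction of templates is copied per cycle.
    """
    return initial_copies * (1 + efficiency) ** cycles


# One starting template after 30 ideal cycles gives 2**30 copies.
print(pcr_copies(1, 30))
# The same reaction at an assumed 90% efficiency yields noticeably fewer copies.
print(pcr_copies(1, 30, efficiency=0.9))
```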

12 Sequence positive clones on AB3730

13 Sanger Sequencing In a Sanger sequencing reaction, a new DNA strand is synthesized by matching the original strand using the principle of complementarity. Synthesis occurs in the presence of fluorescently labelled ddNTPs that, when incorporated into the growing DNA strand, cause synthesis to stop. The length and color of the truncated DNA products can be used to infer which bases were present at each position in the original DNA sequence.
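
The logic of reading a sequence from truncated products can be sketched as a toy simulation. It assumes an idealized reaction in which exactly one terminated fragment exists for every position; the template sequence is invented.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def synthesized_strand(template):
    """New strand, written 5'->3', complementary to a template given 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(template))


def terminated_fragments(new_strand):
    """One (length, terminal label) pair per position where a ddNTP stopped synthesis."""
    return [(length, new_strand[length - 1]) for length in range(1, len(new_strand) + 1)]


def read_from_fragments(fragments):
    """Sort fragments by length, as electrophoresis does, and read the terminal labels."""
    return "".join(label for _, label in sorted(fragments))


template = "ATGGCA"  # invented example sequence
new_strand = synthesized_strand(template)
fragments = terminated_fragments(new_strand)
fragments.reverse()  # fragment order in the tube is arbitrary; sorting restores it
print(read_from_fragments(fragments))  # TGCCAT
```

The read is the newly synthesized strand; taking its reverse complement recovers the template.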

14 Sequencing

15 Review of Phylogenetics Labs Classify four unknown strains of phytoplasma. 1. PCR amplify the 16S rRNA gene from the genomic DNA. i. Use a high fidelity, highly robust DNA polymerase (Q5). 2. Pour agarose gels to check amplification products. 3. Column clean successful products. i. Strong single bands. 4. Set up Sanger sequencing of the full length products. i. Sequence on the AB3730. 5. Edit chromatograms. 6. Identify homologs by BLASTn. 7. Align and build a phylogenetic tree.
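
Many tree-building methods used in the final step start from pairwise distances between aligned sequences. The sketch below computes a simple p-distance (proportion of differing sites). The sequences and names are invented, and the sequences are assumed to be already aligned; it is not the specific software used in the labs.

```python
from itertools import combinations


def p_distance(seq1, seq2):
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq1, seq2):
        if a == "-" or b == "-":
            continue  # gap column: not informative for this simple measure
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Invented, pre-aligned toy sequences ("-" marks an alignment gap).
alignment = {
    "unknown_1": "ACGTACGTAC",
    "unknown_2": "ACGTACGTTC",
    "reference": "ACGAAC-TTC",
}

distances = {
    (a, b): p_distance(alignment[a], alignment[b])
    for a, b in combinations(alignment, 2)
}
for pair, distance in distances.items():
    print(pair, round(distance, 3))
```

The smallest distances point to the closest relatives, which is the information a distance-based tree summarizes as topology.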

16 Advanced Technology in Phytoplasma Research Genomics Wednesday July 8 Pauline Wang pauline.wang@utoronto.ca

17 Why Genome Sequencing? The bacterial tree of life based on a single gene is usually not well resolved, because a single gene does not contain sufficient phylogenetic signal to resolve either ancient or very recent relationships. In addition, because a gene usually represents no more than 0.1% of an average bacterial genome, it has been questioned whether one gene can adequately represent the evolutionary history of a genome (Dagan and Martin 2006). Source: Wang and Wu, Mol Biol Evol. doi:1093
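
The 0.1% figure is easy to check with rough numbers. In the sketch below, the gene length of about 1,500 bp (typical of a 16S rRNA gene) and the genome size of 3,000,000 bp are assumed round values for illustration, not figures from the slide.

```python
def percent_of_genome(gene_length_bp, genome_length_bp):
    """Share of the genome covered by one gene, as a percentage."""
    return 100.0 * gene_length_bp / genome_length_bp


# Assumed values: ~1,500 bp gene, 3,000,000 bp genome.
print(percent_of_genome(1_500, 3_000_000))
```

With these assumed values the gene covers 0.05% of the genome, consistent with "no more than 0.1%".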

18 What is Genomics? Genome science is the study of the structure, content, and evolution of genomes. As technology advances, genomics also includes the analysis of the expression and function of all the genes and proteins within genomes.

19 What is Genomics? [Diagram: DNA replication] Genomics: identify all genes by DNA sequencing. Functional genomics (transcriptomics): identify the functions of the genes by expression analysis (microarray). Proteomics: identify all expressed proteins by structure, by function, and by protein interactions.

20 Genome Sequencing First 5 years: Human Genome Project Goals: 1. Generate high resolution genetic and physical maps to help locate disease associated genes 2. Stimulate the creation of new sequencing technologies in order to complete the genome sequence 3. Identify every gene bioinformatically by annotation of ORFs, ESTs, and functional data 4. Sequence genomes of other model organisms: mouse, E. coli, S. cerevisiae, C. elegans, and D. melanogaster 5. Compile polymorphism databases for SNPs, human diversity, and evolution

21 Genome Sequencing Last 5 years: Human Genome Project Goals: 1. Identify all the structural and functional components encoded in the human genome 2. Determine the heritable variation in the human genome, and eventually across species. 3. Develop genomic approaches to predict disease susceptibility and drug response. 4. Develop policies for the use of genomics in research and clinical settings. 5. Address the legal, ethical and social issues that arise from the project (race and ethnicity)

22 Ethical Issues: Whose genome was sequenced? Gibson GG, Muse SV A primer of Genome Science, 3rd ed. Sinauer Assoc, Mass.

23 Mapping Determining the linear order of markers across the genome. Genetic mapping: distantly located genetic markers in linkage groups. Physical mapping: contiguous stretches of assembled chromosomal DNA. [Figure: genetic map (cM), physical map (kb), DNA sequence; D. Guttman (2007)] Synteny: conservation of gene order when chromosome segments are compared. Comparative genomics: relationship between different genomes.
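
Synteny, as defined above, can be illustrated by looking for runs of genes that are adjacent and in the same order in two genomes. This is a simplified sketch: gene names are invented, each gene is assumed to occur once per genome, and inverted segments are not detected.

```python
def syntenic_blocks(order_a, order_b, min_length=2):
    """Maximal runs of genes that are adjacent and in the same order in both genomes."""
    position_b = {gene: index for index, gene in enumerate(order_b)}
    blocks = []
    current = []

    def flush():
        # Keep the finished run only if it is long enough to count as a block.
        if len(current) >= min_length:
            blocks.append(list(current))
        current.clear()

    for gene in order_a:
        if gene not in position_b:
            flush()  # a gene missing from genome B interrupts the run
            continue
        if current and position_b[gene] == position_b[current[-1]] + 1:
            current.append(gene)  # still collinear with the previous gene
        else:
            flush()
            current.append(gene)
    flush()
    return blocks


# Invented gene orders: genome B carries the same genes with one rearrangement.
genome_a = ["g1", "g2", "g3", "g4", "g5", "g6"]
genome_b = ["g4", "g5", "g6", "g1", "g2", "g3"]
print(syntenic_blocks(genome_a, genome_b))
```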

24 Sanger (Dideoxy) Sequencing 1974 Fred Sanger [Figure labels: melting; base; chain terminator; detection by radioactivity or fluorescence]

25 Dideoxy Sequencing Radioactive Fluorescent

26 Genome Sequencing Two competing methods were used to sequence the human genome: physical mapping (scaffold) and mapless sequencing (all high resolution; used currently). Gibson GG, Muse SV A primer of Genome Science, 1st ed. Sinauer Assoc, Mass.

27 Genome Size and Gene Number Surprising finding: the human genome contains about the same number of genes as the genomes of other organisms. Increases in size can be due to whole genome duplications, expansion of gene families, etc. Gibson GG, Muse SV A primer of Genome Science, 3rd ed. Sinauer Assoc, Mass.

28 Metagenomics Once the technology existed, it prompted the sequencing of many new genomes from different organisms.

30 Microbiome Goals: to characterize microbial communities found at multiple human body sites and look for correlations between changes in the microbiome and human health; and to characterize the global microbial taxonomic and functional diversity for the benefit of the planet and mankind.

31 Arabidopsis thaliana Whole genome sequencing gives insights into the structure and evolution of genomes. It gave rise to comparative genomics. Benfey PN, Protopapas AD Genomics, 1st ed. Pearson Educ, New Jersey.

32 Future of Sequencing

33 In Oct. 2012, an integrated genetic variation map from 1092 individuals was released. In Dec. 2012, the complete genomics dataset from 57 individuals was released.

34 As of Apr. 2013, 1049 genomes have been released.

35 Growth of Genomics 2007: the birth of Next Generation sequencing, which employs massively parallel sequencing methods. The number of sequences from a single run is much greater than the 96 from modern capillary electrophoresis-based methods. Roche 454 (Pyrosequencing): Mb per run (400 base reads). Illumina (Sequencing by Synthesis): Genome Analyzer IIx, 90 GB per run ( base reads); HiSeq, MB per run (100 base reads). Applied Biosystems SOLiD (Sequencing by Ligation): 60 GB per run (up to 100 base reads). All require high performance computing and huge data storage capacity.
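
Run output is usually judged against genome size as expected coverage: total sequenced bases divided by genome length. The sketch below uses invented round numbers (read count, read length, and genome size are all assumptions) and does not use the per-run figures listed above.

```python
import math


def expected_coverage(num_reads, read_length, genome_size):
    """Average number of times each base of the genome is sequenced."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Smallest number of reads that reaches the target average coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


# Hypothetical run: 1,000,000 reads of 100 bases on a 1,000,000 bp genome.
print(expected_coverage(1_000_000, 100, 1_000_000))  # 100.0
# Reads needed for 50x coverage of the same hypothetical genome.
print(reads_needed(50, 100, 1_000_000))  # 500000
```

Average coverage says nothing about how evenly reads are distributed, so real projects usually aim well above the minimum.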

37 454 Pyrosequencing Fragment the DNA and immobilize one piece on a bead. Amplify millions of copies of that single fragment on the bead in a picoliter well. Add one nucleotide (A, T, C, or G). If it is incorporated, pyrophosphate is released, which drives the production of light. The amount of light is proportional to the number of nucleotides incorporated (as in a repeat stretch). Degrade unincorporated nucleotides and start again with a different nucleotide. Gibson GG, Muse SV A primer of Genome Science, 3rd ed. Sinauer Assoc, Mass.
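
Because the light signal is proportional to the number of nucleotides incorporated, a series of flows can be decoded into a sequence. The following sketch assumes a repeating flow order of T, A, C, G and idealized signals normalized so that 1.0 means one base; both are assumptions for illustration, and the signal values are invented.

```python
def decode_flowgram(flow_order, signals):
    """Convert per-flow light signals into bases.

    Each signal is rounded to the nearest whole number of incorporated bases;
    a signal near 0 means the flowed nucleotide was not incorporated.
    """
    bases = []
    for index, signal in enumerate(signals):
        nucleotide = flow_order[index % len(flow_order)]
        count = round(signal)
        bases.append(nucleotide * count)
    return "".join(bases)


# Invented signals for two rounds of the assumed flow order T, A, C, G.
signals = [1.02, 0.05, 2.10, 0.00, 0.00, 0.97, 0.03, 3.05]
print(decode_flowgram("TACG", signals))  # TCCAGGG
```

Long repeat stretches are the weak point of this scheme: telling a signal of 7 from a signal of 8 is much harder than telling 1 from 2.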

38 Illumina Genome Analyzers: HiSeq2000, NextSeq500, GAIIx, MiSeq

39 Sequencing by Synthesis Repeat cycles of solid-phase bridge amplification. [Figure: GAIIx output]

40 AB SOLiD (Supported Oligonucleotide + Ligation Detection) Sequencing by Ligation Prepare amplified clonal bead populations, as in 454 sequencing. A 3′ modification allows bonding to the glass slide. Add sequencing primer. Add four fluorescently labeled di-base probes. Ligation to the primer will occur if the first two bases match. Detect the color tag. Cleave off the tag and repeat cycles.
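
Di-base probes mean that each detected color reports on a pair of adjacent bases rather than a single base. The sketch below uses the commonly described two-base color code, in which the color equals the XOR of 2-bit base codes (A=0, C=1, G=2, T=3). That mapping is an assumption brought in for illustration and is not stated on the slide; the example sequence is invented.

```python
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def encode_colors(sequence):
    """One color (0-3) per overlapping pair of adjacent bases."""
    return [
        BASE_TO_BITS[first] ^ BASE_TO_BITS[second]
        for first, second in zip(sequence, sequence[1:])
    ]


def decode_colors(first_base, colors):
    """Recover the bases from the colors, given the known first base."""
    bases = [first_base]
    for color in colors:
        previous_bits = BASE_TO_BITS[bases[-1]]
        bases.append(BITS_TO_BASE[previous_bits ^ color])
    return "".join(bases)


colors = encode_colors("ATGGC")
print(colors)  # [3, 1, 0, 3]
print(decode_colors("A", colors))  # ATGGC
```

Decoding depends on a known first base, and a single wrong color changes every base after it, which is why color reads are usually compared with a reference in color space.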

41 Direct sequencing from a single molecule of DNA or RNA

42 Microfluidics Sequencing Ion Torrent - an electronic sequencer that reads DNA on a semiconductor chip by measuring the release of hydrogen ions as nucleotides get incorporated by DNA polymerase.

43 Nanopore Sequencing

44 Genomics Applications to Biology Cellular function: at the cellular level we use microarrays to study global RNA expression. The newest area is the study of global protein expression, called proteomics: determine the 3D structure of all proteins; identify all protein interactions within an organism. Biological networks. Evolutionary mechanisms. Comparative genomics: the analysis and comparison of sequences from different organisms.

45 Systems Biology The newest field, which uses the insights of systems engineering combined with sophisticated computer modeling to analyze biological networks. It integrates genomic sequencing, gene expression profiling, proteomics, functional genomics and theoretical data. Benfey PN, Protopapas AD Genomics, 1st ed. Pearson Educ, New Jersey.

46 Applications of Genomics Medicine: identify genes for disease susceptibility; identify genes responsible for multigenic disease traits (genotype individuals); improved diagnosis; gene expression from different types of cancers; pharmacogenomics, i.e., design therapies based on an individual's genome. Agriculture: sequencing crop genomes; gene discovery for useful traits; biofuels; sequencing of farm animals (pigs, cows, sheep, poultry); identify agricultural pathogens.

47 Sequenced phytoplasma genomes to date