Fungal Comparative Genomics. Jason Stajich Duke University UC Berkeley (Aug 2006)

1 Fungal Comparative Genomics Jason Stajich Duke University UC Berkeley (Aug 2006)

2 Evolutionary genomics
- Evolutionary & Organismal Biology: phylogeny; population genetics and structure; phenotype; ecological adaptations
- Comparative Genomics: molecular evolution; gene order; gene families; gene and genome structure; gene content; conserved elements; rates of molecular evolution; gene function inference
- Model Systems: genetic tools; gene function & expression; regulatory networks; pathways; molecular & cellular biology; disease models

3 Industrial uses of fungi
- Bread, beer, wine: Saccharomyces cerevisiae
- Sake and soy sauce: Aspergillus oryzae
- Dairy: Penicillium roqueforti, Kluyveromyces lactis
- Citric acid: Aspergillus niger
- Riboflavin: Ashbya gossypii
- Stonewashed jeans: Trichoderma reesei
- Penicillin antibiotic: Penicillium notatum
- Button mushrooms: Agaricus bisporus

4 Agricultural impact of fungi
- Two-thirds of plant disease is caused by fungi: wheat blight (Fusarium), strawberry grey mold (Botrytis), leaf rusts (Puccinia), wheat and maize smuts (Ustilago)
- Also deposit mycotoxins, e.g. ergot
- Mycorrhizal fungi provide nutrient exchange and nitrogen fixation

5 USDA A.G. Bölker

6 Fungal Comparative Genomics Problems
- Many fungal genomes
- No central place for annotations or for interlinking homolog information
- Want to visualize gene structures and genome context
- Need a good database system for scripting genome questions

7 Getting the Data in
- GFF3 as the data transfer format
- Write GenBank -> GFF3 scripts
- Read in data from genome centers (Broad, Sanger, WashU, JGI, SGD)
- Pipeline for genome annotation
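The GenBank -> GFF3 step on this slide comes down to writing each feature as one nine-column, tab-separated line. The following is a minimal Python sketch of that output step only, not the scripts used in the talk (which are not shown in the slides). The function name `gff3_line` and the example contig and gene identifiers are hypothetical.

```python
from urllib.parse import quote


def gff3_line(seqid, source, ftype, start, end, strand, attrs, score=".", phase="."):
    """Format one feature as a 9-column, tab-separated GFF3 line.

    Coordinates are 1-based and inclusive, as GFF3 requires.
    """
    if start > end:
        raise ValueError("GFF3 requires start <= end")
    # Column 9 is a ';'-separated list of key=value pairs. Characters with a
    # reserved meaning there (';', '=', '&', ',') are percent-encoded.
    attr_text = ";".join(
        f"{key}={quote(str(value), safe=' :/()+-._')}" for key, value in attrs.items()
    )
    columns = [seqid, source, ftype, str(start), str(end),
               str(score), strand, str(phase), attr_text]
    return "\t".join(columns)


# Hypothetical feature, for illustration only.
line = gff3_line("contig_1", "GenBank", "gene", 1200, 3400, "+",
                 {"ID": "gene0001", "Name": "hypothetical; protein"})
print(line)
```

Escaping the attribute values matters in practice because free-text product names from GenBank records often contain semicolons or commas, which would otherwise split the attribute column.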

8 37 fully sequenced fungal genomes (phylogeny figure; time axis in million years ago)
Clades: Zygomycota, Euascomycota, Hemiascomycota, Archiascomycota, Basidiomycota
Species: Rhizopus oryzae, Neurospora crassa, Podospora anserina, Chaetomium globosum, Magnaporthe grisea, Fusarium verticillioides, Fusarium graminearum, Trichoderma reesei, Sclerotinia sclerotiorum, Botrytis cinerea, Stagonospora nodorum, Uncinocarpus reesii, Coccidioides immitis, Histoplasma capsulatum, Aspergillus fumigatus, Aspergillus nidulans, Aspergillus terreus, Aspergillus oryzae, Ashbya gossypii, Kluyveromyces lactis, Saccharomyces cerevisiae, Candida glabrata, Candida lusitaniae, Debaryomyces hansenii, Candida guilliermondii, Candida tropicalis, Candida albicans, Candida dubliniensis, Yarrowia lipolytica, Schizosaccharomyces pombe, Cryptococcus neoformans, Cryptococcus neoformans H99, Cryptococcus gattii WM276, Cryptococcus gattii R265, Phanerochaete chrysosporium, Coprinus cinereus, Ustilago maydis
Lifestyle labels (in figure order): Bread mold, Opp Hum pathogen; Saprophyte; Saprophyte; Saprophyte; Opp Hum pathogen; Hemibiotroph - rice; Hemibiotroph - maize; Hemibiotroph - wheat; Saprophyte; Necrotroph; Necrotroph - fruits; Hemibiotroph - wheat; Primary Hum pathogen; Primary Hum pathogen; Opp Hum pathogen; Saprophyte; Opp Hum pathogen; Saprophyte/Industrial uses; Biotroph/Industrial uses; Industrial uses; Industrial uses; Opp Hum pathogen; Opp Hum pathogen; Opp Hum pathogen; Opp Hum pathogen; Opp Hum pathogen; Opp Hum pathogen; Industrial uses; Opp Hum pathogen; Opp Hum pathogen; Opp Hum pathogen; Opp Hum pathogen; Saprophyte; Saprophyte; Biotroph - maize

9 51+ More funded and in progress world-wide

10 Methodology / Analysis (flowchart labels): Rfam; tRNAscan; SNAP; Twinscan; ZFF to GFF3; GFF2 to GFF3; GFF to AA predicted proteins; FASTA all-vs-all; Proteins; Genome; Proteins; Glimmer; Genscan; BLASTZ; BLASTN; exonerate protein2genome; Tools::Glimmer; Tools::Genscan; SearchIO; GFF2 to GFF3; Bio::DB::GFF; protein to genome coordinates; SearchIO; Tools::GFF; Final Genes; HMMER; GLEAN (combiner); Find orthologs; SearchIO; Multiple sequence alignment; MCL; Gene families; ESTs; exonerate est2genome; Intron analysis; Bio::AlignIO; aa2cds alignment; Intron mapping into alignment
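The "aa2cds alignment" step in this pipeline is commonly understood as threading each coding sequence back onto its row of a protein multiple alignment, so that every alignment column becomes a codon-sized column. The slides name BioPerl modules for this; the Python sketch below only illustrates the idea under that assumption. The function name `aa2cds` and the toy sequences are hypothetical.

```python
def aa2cds(aligned_protein, cds):
    """Thread an ungapped CDS onto one gapped row of a protein alignment.

    Each residue consumes the next codon; each gap '-' becomes '---'.
    """
    if len(cds) % 3:
        raise ValueError("CDS length is not a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    # Tolerate one trailing stop codon that has no residue in the protein.
    if len(codons) not in (n_residues, n_residues + 1):
        raise ValueError("CDS length does not match the protein")
    out, next_codon = [], 0
    for aa in aligned_protein:
        if aa == "-":
            out.append("---")
        else:
            out.append(codons[next_codon])
            next_codon += 1
    return "".join(out)


# Toy example: protein row "M-KL" aligned against CDS ATG AAA CTG.
aligned = aa2cds("M-KL", "ATGAAACTG")
print(aligned)
```

Once the nucleotide alignment is in codon columns, an intron position given as a CDS offset can be mapped to an alignment column by walking the same row and counting non-gap positions.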

11 What I needed
- Database for storing and querying genome annotations: Bio::DB::GFF (BioPerl & Gbrowse)
- Visualization: Gbrowse
- Analyses
  - Ability to query for a gene's exon-intron structure and sequences
  - Are gene families clustered on chromosome?
  - Are functional classes of genes clustered on chromosome?
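Two of the analyses listed on this slide can be sketched in a few lines once annotations are available as plain coordinates. The Python below is an illustration under stated assumptions, not the Bio::DB::GFF queries used in the talk: exons are taken to be 1-based inclusive (start, end) pairs for one transcript, genes are (chromosome, start, family) tuples, and all names and numbers are hypothetical.

```python
def introns_from_exons(exons):
    """Return the intron (start, end) pairs between consecutive exons.

    Coordinates are 1-based and inclusive, as in GFF3.
    """
    ordered = sorted(exons)
    introns = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start <= prev_end:
            raise ValueError("exons overlap")
        if next_start - prev_end > 1:  # directly abutting exons have no intron
            introns.append((prev_end + 1, next_start - 1))
    return introns


def adjacent_same_family_pairs(genes):
    """Count neighbouring gene pairs on one chromosome that share a family."""
    ordered = sorted(genes, key=lambda g: (g[0], g[1]))
    return sum(
        1 for a, b in zip(ordered, ordered[1:])
        if a[0] == b[0] and a[2] == b[2]
    )


# Toy data, for illustration only.
introns = introns_from_exons([(100, 250), (400, 520), (600, 900)])
pairs = adjacent_same_family_pairs([
    ("chr1", 100, "famA"), ("chr1", 900, "famA"),
    ("chr1", 2000, "famB"), ("chr2", 50, "famB"),
])
print(introns, pairs)
```

The adjacent-pair count is only a raw statistic. To decide whether families are clustered, it would need to be compared against the same count on shuffled family labels.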

12 Gbrowse
- Visualization of annotation data
- Does not have to be for whole/finished genomes
- Most projects are unfinished, so there are many contigs (100s s)
- BLAST interface with link to Gbrowse view lets the user start with a query sequence and get to the genomic location

13 Gbrowse View Click Here

14 Gene Page (1)

15 Gene Page (2)

16 Gene Page (3)

17 Gene Page (4)

18 Similarity Database Organism Gene Transcript Translation Exon Seq Seq string Alignment Pair Similarity Pairs

19 Other tools
- BLAST interface: search your sequence and get marked-up results with links to Gbrowse
- Yeast protein to genomic visualization of locus in your organism of interest

20 BLAST Tool

21 Re-formatted BLAST

22 With Links

23 Additional Data to Integrate
- Curated life-history information about sequenced fungi (with Anne Pringle, Harvard)
- Expression data... Mart-enabled?

24 What's Missing
- Homolog/Ortholog/Paralog capturing
- Pairwise-focused summary statistics
- Multiway ortholog summaries
- Ensembl Compara --> GMOD Compara?
- Linking to gene trees

25 Queries to address
- All the genes in closely related pathogenic fungi not present in non-pathogenic outgroup
- Species-tree defined unique genes, etc.
- Rapidly evolving cell-surface associated genes
- Gene family size change (paralogous expansions)
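Two of these queries reduce to simple set and count operations once genes have been grouped into ortholog groups or families. The sketch below assumes a hypothetical table that maps each group ID to the genes found per species. The species labels, group IDs, and gene IDs are placeholders, not results from the talk.

```python
def groups_unique_to(groups, ingroup, outgroup):
    """Groups with a member in every ingroup species and none in the outgroup."""
    ingroup, outgroup = set(ingroup), set(outgroup)
    hits = []
    for group_id, members in groups.items():
        present = {species for species, genes in members.items() if genes}
        if ingroup <= present and not (outgroup & present):
            hits.append(group_id)
    return sorted(hits)


def family_size_change(groups, species_a, species_b):
    """Per-group difference in copy number: species_a minus species_b."""
    return {
        group_id: len(members.get(species_a, [])) - len(members.get(species_b, []))
        for group_id, members in groups.items()
    }


# Placeholder data, for illustration only.
groups = {
    "OG1": {"pathogen_1": ["g1"], "pathogen_2": ["g7"], "outgroup_1": ["g20"]},
    "OG2": {"pathogen_1": ["g2", "g3"], "pathogen_2": ["g8"]},
    "OG3": {"pathogen_1": ["g4"], "outgroup_1": ["g21"]},
}
unique = groups_unique_to(groups, ["pathogen_1", "pathogen_2"], ["outgroup_1"])
change = family_size_change(groups, "pathogen_1", "pathogen_2")
print(unique, change)
```

A positive value in `change` marks a group with more copies in the first species, which is the raw signal behind a paralogous expansion. Whether it is significant depends on the species tree and is outside this sketch.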