Fungal ITS Bioinformatics Efforts in Alaska

1 Fungal ITS Bioinformatics Efforts in Alaska
D. Lee Taylor, Institute of Arctic Biology, University of Alaska Fairbanks
Shawn Houston, Minnesota Supercomputing Institute, University of Minnesota
Photograph: Roger Ruess

2 How does fungal ITS sequence data relate to your project? What fungal ITS data does your project currently provide? What fungal ITS data is your project hoping to provide?
- We have generated ITS sequences from ~2,000 Alaskan sporocarps
- ~200,000 Sanger-sequenced ITS-LSU soil clones from boreal and arctic regions
- Developing an Illumina fungal ITS approach with the Broad Institute

3
- Coupling Diversity with Function: Metagenomics of Boreal Forest Fungi (USDA-NSF Microbial Genome Sequencing Program)
- IPY: A Community Genomics Investigation of Fungal Adaptation to Cold (NSF OPP International Polar Year)

4 Summary of General Protocol
1. Clip ends, remove vector, and assemble forward and reverse reads for each clone in CodonCode Aligner
2. Assign clones to samples using an in-house tag-finder script
3. Convert all clone consensus base calls below Phred 20 to N using an in-house script
4. Further clip ends using the EMBOSS trimseq utility
5. Orient all clones in the ribosomal 5' to 3' direction using an in-house script
6. Extract the ITS region for clustering
7. Single-linkage clustering with Cap3; now moving to multiple linkage with Uclust
8. Chimera check with Uchime (works well for fungi!)
9. Submit representative, non-chimeric OTU sequences to BLAST nucleotide searches of our in-house, curated fungal ITS database (see …)
10. Visualize BLAST results using MEGAN to find non-fungal sequences
11. Select LSU sequences to represent each clone, align them with ClustalW, and improve the alignment manually where feasible
12. Construct ML and Bayesian LSU trees
13. Carry out species-based diversity and similarity analyses from the abundances of OTUs across samples
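Steps 3 and 4 of the protocol can be sketched in Python as follows. This is a minimal illustration, not the in-house script. It assumes per-base Phred scores are available either as a list of integers (as in a Sanger .qual file) or as a Phred+33 string. The function names are hypothetical. The end trimming is a simplified stand-in that only strips terminal runs of N; it does not reproduce the EMBOSS trimseq algorithm.

```python
def decode_phred33(qual_string: str) -> list[int]:
    """Convert a Phred+33 encoded quality string (FASTQ style) to integer scores."""
    return [ord(ch) - 33 for ch in qual_string]


def mask_low_quality(seq: str, quals: list[int], min_q: int = 20) -> str:
    """Replace every base whose Phred score is below min_q with N (protocol step 3)."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    return "".join(base if q >= min_q else "N" for base, q in zip(seq, quals))


def trim_ambiguous_ends(seq: str) -> str:
    """Strip runs of N from both ends.

    Simplified, hypothetical stand-in for step 4; not EMBOSS trimseq itself.
    """
    return seq.strip("Nn")


if __name__ == "__main__":
    masked = mask_low_quality("ACGTACGT", [12, 35, 40, 19, 38, 36, 15, 8])
    print(masked)                       # NCGNACNN
    print(trim_ambiguous_ends(masked))  # CGNAC
```

A base scoring exactly 20 is kept, matching "below phred = 20" in step 3. Internal Ns are left in place, so positions stay aligned with the original read.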

5 Saturation of species rarefaction curve for black spruce soil fungi
- 1,008 observed species
- 1,044 predicted species (Chao1)
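The observed and predicted richness values come from a rarefaction curve and the Chao1 estimator. The sketch below shows two calculations:

- The classic Chao1 formula, S_obs + F1^2 / (2 F2). When F2 = 0 it falls back to the bias-corrected term F1 (F1 - 1) / 2.
- Hurlbert-style expected richness for a random subsample of n clones, which is the quantity a rarefaction curve plots.

It assumes OTU abundances are given as a list of clone counts per OTU. It is not necessarily the estimator variant or software used to produce the numbers on the slide.

```python
from collections import Counter
from math import comb


def chao1(abundances: list[int]) -> float:
    """Chao1 richness: S_obs + F1^2 / (2 * F2), with a fallback when F2 == 0."""
    counts = [a for a in abundances if a > 0]
    s_obs = len(counts)
    freq = Counter(counts)
    f1, f2 = freq[1], freq[2]          # singletons and doubletons
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2   # bias-corrected form when no doubletons


def rarefy(abundances: list[int], n: int) -> float:
    """Expected number of OTUs in a random subsample of n clones (Hurlbert 1971 form).

    Uses exact binomial coefficients; fine for a sketch, slow for very large totals.
    """
    counts = [a for a in abundances if a > 0]
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("subsample size must be between 0 and the total count")
    denom = comb(total, n)
    # Each OTU contributes the probability that it appears at least once.
    return sum(1 - comb(total - a, n) / denom for a in counts)


if __name__ == "__main__":
    otus = [1, 1, 1, 2, 2, 5]          # toy abundances, not project data
    print(chao1(otus))                 # 8.25
    for n in (1, 4, 8, 12):
        print(n, round(rarefy(otus, n), 2))
```

A curve that flattens as n approaches the full sample size, with Chao1 close to the observed count, is what "saturation" refers to.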

8 Is your project involved with curating fungal ITS sequences? If so, what curation strategies are being implemented for your project? What tools for working with fungal ITS sequences does your project currently provide?
- We developed our own sequence analysis approaches and tools because no fungal pipelines existed when we started in 2003
- Developed web tools for cleaning up and orienting raw sequences (Mask, Orient)
- Use rigorous sequence cleanup (Phred scores), elimination of non-fungal sequences, and chimera removal (Uchime)
- Public web BLAST and FASTA searches of databases extracted from GenBank; includes taxonomic summaries, linkouts, etc.
- Web interface to Uchime, with GenBank search added
- Tool for creating OTUs via an automated phylogenetic approach (PhyloTable)
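The slides do not describe how the Orient tool decides read direction. One simple approach is to count the k-mers a read shares with a reference sequence already in the 5' to 3' ribosomal direction. The read is reverse-complemented if its reverse complement matches better. The sketch below implements that approach, with this caveat: the reference sequence, k = 8 and the function names are illustrative assumptions.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T/N, either case)."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_set(seq: str, k: int) -> set[str]:
    """All distinct k-mers of seq, upper-cased."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def orient(read: str, reference: str, k: int = 8) -> tuple[str, str]:
    """Return (oriented_read, "forward" or "reversed").

    reference: a (hypothetical) sequence known to be in the 5' to 3' direction.
    Ties, including no shared k-mers at all, leave the read unchanged.
    """
    ref_kmers = kmer_set(reference, k)
    rc_read = reverse_complement(read)
    forward_hits = len(kmer_set(read, k) & ref_kmers)
    reverse_hits = len(kmer_set(rc_read, k) & ref_kmers)
    if reverse_hits > forward_hits:
        return rc_read, "reversed"
    return read, "forward"


if __name__ == "__main__":
    reference = "GATCGCTACGGATTCGCAGTTAGCCAATGCGT"  # made-up reference
    fragment = reference[5:25]
    print(orient(fragment, reference))
    print(orient(reverse_complement(fragment), reference))
```

In practice the reference would be a conserved stretch such as rRNA flanking regions; a real tool would also flag reads that match neither orientation rather than passing them through silently.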

16 Our Phylobinning Approach:
- cluster with Cap3 at low % identity (90%)
- extract sequences from clusters
- find related sequences in GenBank (both "everything" and "uncultured excluded" searches)
- generate alignments for each cluster using Muscle
- feed alignments to RAxML
- use the fast bootstrapping method and find the best tree using maximum likelihood
- parse the tree to determine phylobins:
  - If branch length > … AND bootstrap >= 98, then name a new phylobin
  - If branch length < 0.01 AND bootstrap < 98, move to the next cluster
  - If branch length >= 0.01 AND bootstrap < 70, move to the next cluster
  - If branch length >= 0.01 AND bootstrap >= 70, then name a new phylobin
  - If branch length >= 0.03, then name a new phylobin (even if it is an individual sequence)
- All sequences from a contig that are not assigned to a phylobin at this point go into a last, default phylobin
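The branch-length and bootstrap rules above can be expressed as a small decision function. The sketch below is a hedged reading of the rules, not the PhyloTable code, and depends on three assumptions, each also noted in the code:

- The >= 0.03 rule is checked first, so that it overrides the bootstrap thresholds.
- The first rule, whose branch-length threshold is missing from the transcript, is taken to cover branch length < 0.01. This is the complement of the second rule and is our guess.
- "Move to the next cluster" is treated as "skip this clade". Clades are assumed to be already parsed out of the RAxML tree as hypothetical (members, branch_length, bootstrap) tuples.

```python
def phylobin_decision(branch_length: float, bootstrap: float) -> str:
    """Return "name" to open a new phylobin or "next" to skip the clade.

    Rule order is an assumption: the >= 0.03 rule is checked first. The
    < 0.01 / bootstrap >= 98 case fills the rule whose threshold is missing
    from the slide (our guess).
    """
    if branch_length >= 0.03:
        return "name"
    if branch_length >= 0.01:
        return "name" if bootstrap >= 70 else "next"
    return "name" if bootstrap >= 98 else "next"


def assign_phylobins(members: list[str],
                     clades: list[tuple[list[str], float, float]],
                     start_id: int = 1) -> dict[str, str]:
    """Assign the sequences of one Cap3 contig to phylobins.

    members: all sequence IDs in the contig.
    clades: hypothetical pre-parsed (member_ids, branch_length, bootstrap)
            tuples from the RAxML tree, in evaluation order.
    """
    assigned: dict[str, str] = {}
    next_id = start_id
    for clade_members, branch_length, bootstrap in clades:
        unassigned = [m for m in clade_members if m not in assigned]
        if unassigned and phylobin_decision(branch_length, bootstrap) == "name":
            for m in unassigned:
                assigned[m] = f"phylobin{next_id}"
            next_id += 1
    # Anything still unassigned goes into one last, default phylobin.
    default_bin = f"phylobin{next_id}"
    for m in members:
        assigned.setdefault(m, default_bin)
    return assigned


if __name__ == "__main__":
    contig = ["a", "b", "c", "d"]
    clades = [(["a", "b"], 0.02, 80.0), (["c"], 0.005, 50.0)]
    print(assign_phylobins(contig, clades))
```

Here clade a+b passes the >= 0.01 / >= 70 rule. Clade c fails the < 0.01 / >= 98 rule, so c and d fall into the default bin.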

20 (Figure: example phylobinning output in which TKN clones and related GenBank matches (marked *gi) are grouped into phylobins 18 to 22; labels include Mycorrhizal fungal sp. pkc clones, Uncultured fungus clones, Hyalodendriella betulae, Meliniomyces bicolor and Cistella acuum)

21 What tools are you developing or planning to develop? What framework of fungal taxonomy does your project use?
- Would like to help develop an improved fungal ITS database suitable for QIIME, RDP, etc.
- Consensus from GenBank taxonomy (e.g., MEGAN LCA)
- Tree-building

22 Thoughts on the way forward:
1. The ultimate solution will involve massive support for taxonomy and systematics
2. BUT we need a usable database NOW
3. Carefully identified sequences, such as those from UNITE, are helpful but not sufficient
4. I believe we can quickly prune current INSD ITS sequences to remove those with severe taxonomic conflicts with their current NCBI taxonomies
5. In parallel, we need a system for automated placement and taxonomic assignment of novel OTUs

23 252,185 sequences, with automated exclusion of environmental sequences
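The slide does not say how environmental sequences were recognized. The sketch below shows one simple approach, offered as an assumption rather than as the authors' method: drop records whose GenBank description contains keywords such as "uncultured" or "environmental sample". The keyword list and the (accession, description) record layout are hypothetical.

```python
# Hypothetical keyword list; real curation rules may differ.
ENVIRONMENTAL_KEYWORDS = ("uncultured", "environmental sample", "unidentified")


def is_environmental(description: str) -> bool:
    """True if the description contains any environmental keyword (case-insensitive)."""
    text = description.lower()
    return any(keyword in text for keyword in ENVIRONMENTAL_KEYWORDS)


def exclude_environmental(records: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep (accession, description) pairs with no environmental keyword."""
    return [(acc, desc) for acc, desc in records if not is_environmental(desc)]


if __name__ == "__main__":
    records = [
        ("ACC1", "Uncultured fungus clone G20_OTU5 internal transcribed spacer"),
        ("ACC2", "Meliniomyces bicolor isolate X ITS region"),  # placeholder entry
    ]
    for acc, desc in exclude_environmental(records):
        print(acc, desc)
```

Keyword matching is fast but blunt. It relies on submitters labelling environmental sequences consistently, and it can also drop identified sequences whose descriptions happen to contain one of the keywords.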

24 Taxmap in QIIME format
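A QIIME 1 reference taxonomy map is a tab-separated text file that pairs each sequence ID with a semicolon-separated lineage. The sketch below formats such lines under three assumptions:

- Lineages use the k__/p__/c__/o__/f__/g__/s__ rank-prefix convention.
- The separator is ";" with no spaces, as in UNITE-style QIIME releases.
- Each lineage has already been resolved from NCBI taxonomy.

The function names are hypothetical.

```python
RANK_PREFIXES = ("k__", "p__", "c__", "o__", "f__", "g__", "s__")


def taxmap_line(seq_id: str, lineage: list[str]) -> str:
    """Format one taxonomy map line: ID<TAB>k__...;p__...;...;s__...

    Missing lower ranks are written as bare prefixes (e.g. "g__").
    """
    if len(lineage) > len(RANK_PREFIXES):
        raise ValueError("lineage has more ranks than the seven supported prefixes")
    padded = list(lineage) + [""] * (len(RANK_PREFIXES) - len(lineage))
    taxonomy = ";".join(prefix + name for prefix, name in zip(RANK_PREFIXES, padded))
    return f"{seq_id}\t{taxonomy}"


def write_taxmap(path: str, entries: dict[str, list[str]]) -> None:
    """Write one taxmap line per sequence ID to a plain-text file."""
    with open(path, "w") as handle:
        for seq_id, lineage in entries.items():
            handle.write(taxmap_line(seq_id, lineage) + "\n")


if __name__ == "__main__":
    print(taxmap_line("seq1", ["Fungi", "Ascomycota", "Leotiomycetes", "Helotiales"]))
```

Padding with bare prefixes keeps every line at the same depth, which simplifies rank-level summaries downstream.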

25 Funding Sources and Supporting Agencies

29 Least Common Ancestor (LCA) algorithm in MEGAN
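MEGAN's LCA approach assigns a read to the lowest common ancestor of the taxa among its significant BLAST hits. The sketch below is a simplified version of that idea, not MEGAN's code. It works in three steps:

- Drop hits below a minimum bit score.
- Drop hits outside a top-percent window of the best score.
- Return the longest shared prefix of the remaining lineages.

The parameter names echo MEGAN's min-score and top-percent filters. The default values and the lineage tuples used here are illustrative.

```python
Lineage = tuple[str, ...]


def lca(lineages: list[Lineage]) -> Lineage:
    """Longest common prefix of root-to-tip lineages."""
    if not lineages:
        return ()
    prefix: list[str] = []
    for ranks in zip(*lineages):   # stops at the shortest lineage
        if all(r == ranks[0] for r in ranks):
            prefix.append(ranks[0])
        else:
            break
    return tuple(prefix)


def assign_lca(hits: list[tuple[float, Lineage]],
               min_score: float = 50.0,
               top_percent: float = 10.0) -> Lineage:
    """Assign a query to the LCA of hits near the best bit score.

    hits: (bit_score, lineage) pairs; thresholds are illustrative values.
    """
    passing = [(s, lin) for s, lin in hits if s >= min_score]
    if not passing:
        return ()                  # unassigned
    best = max(s for s, _ in passing)
    cutoff = best * (1 - top_percent / 100)
    return lca([lin for s, lin in passing if s >= cutoff])


if __name__ == "__main__":
    hits = [
        (500.0, ("Fungi", "Ascomycota", "Leotiomycetes", "Helotiales", "GenusA")),
        (480.0, ("Fungi", "Ascomycota", "Leotiomycetes", "Helotiales", "GenusB")),
        (200.0, ("Fungi", "Basidiomycota")),
    ]
    print(assign_lca(hits))
```

A narrow top-percent window gives more specific but riskier assignments. Widening it pushes the query toward higher ranks whenever the hits disagree.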

33 Bioinformatic Processing of Fungal ITS Sequences from the Environment
I. Initial sequence cleanup
-> Quality scores
-> Masking, if Sanger sequences
-> Orienting to fix direction
II. Bar-coding/tagging
-> Long, bias-tested tags, checked for edit distance
-> Tag-finder script
III. Chimeras
-> Uchime works well!
IV. Defining OTUs
-> TGICL/Cap3 genome assemblers; now moving to Uclust
V. Identifying OTUs
-> Curated database from INSD/GenBank