NGS the subterranean realm: from RNA-seq to bait design for a groundwater isopod Danielle Stringer


1 NGS the subterranean realm: from RNA-seq to bait design for a groundwater isopod
Danielle Stringer
Michelle Guzik, Karen Meusemann, Simon Tierney, Rachael King, Steve Cooper and Andy Austin

2 Aridification and Refugia
- A continuous mesic environment once existed over central and Western Australia during the middle Miocene
- Large freshwater lakes and wetlands
- Late Miocene aridification: drier landscape, contraction of habitats and isolation of fauna in remnant habitats
- Groundwater refugia
- Diversification of taxa: now contain relictual, short-range endemic species

3 Groundwater Habitats

4 Fauna
- Yilgarn subterranean fauna
- GAB spring surface fauna

5 Knowledge Gaps
- Were these ecosystems and their populations once inter-connected across the arid zone?
- Are there evolutionary links between species from the three locations?
- Are Haloniscus species relicts from a wetter time in Australia's history?

6 Aims
- Investigate systematics and evolution of groundwater-dependent isopods (Haloniscus) from the Australian arid zone using NGS techniques
- Explore the biogeographic history of the fauna: assess evolutionary links between Haloniscus from Yilgarn calcretes (WA), Ngalia Basin calcretes (NT) and the Great Artesian Basin springs (SA)
- Use an integrative taxonomic approach to describe new species of Haloniscus from GAB springs based on combined morphological and molecular analyses

7 NGS Methods
- We are using a next-generation sequencing approach with a range of novel methods to identify new genes for phylogenetic analysis
- Sequenced Haloniscus sp. and other isopod transcriptomes (5):
  - Porcellionides pruinosus (Mohammad Javidkar)
  - Paraplatyarthrus sp. (Mohammad Javidkar)
  - Paraplatyarthrus subterraneus (Mohammad Javidkar)
  - Sphaeroma sp. (Andreas Zwick)
  - Porcellio scaber (Andreas Zwick)
- Images: Reiner Richter, Ken Walker

8 RNA-Seq Transcriptome Sequencing
- Pooled individuals
- RNA extracted using QIAGEN RNeasy Plus Micro Kit
- cDNA libraries generated using a Clontech SMARTer PCR cDNA Synthesis Kit
  - Produces high quality DNA from small quantities of RNA
  - Able to enrich for full-length cDNAs
- Illumina HiSeq 2000 platform
  - 100 bp paired-end reads
- Assembled using Trinity (Haas et al. 2013)

9 Orthology Prediction
- Orthology prediction using a graph-based, reciprocal approach with profile hidden Markov models
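The reciprocal idea on this slide can be illustrated with a small sketch. It does not call Orthograph or any HMM software. It assumes the forward search (ortholog-group profile against the transcriptome) and the reverse search (transcript back against the reference proteomes) have already been run and summarised into plain dictionaries. The function name `assign_orthologs`, the data layout and all identifiers in the example are hypothetical.

```python
def assign_orthologs(forward_hits, reverse_best, group_members):
    """Toy reciprocal assignment of transcripts to ortholog groups.

    forward_hits:  {group_id: [(transcript_id, score), ...]}
                   hits of each group's profile in the transcriptome
    reverse_best:  {transcript_id: reference_protein_id}
                   best hit of each transcript in the reference proteomes
    group_members: {group_id: set of reference_protein_ids}
    """
    # Keep only hits confirmed by the reverse search: the transcript's best
    # reference hit must belong to the group whose profile found it.
    candidates = []
    for group, hits in forward_hits.items():
        members = group_members.get(group, set())
        for transcript, score in hits:
            if reverse_best.get(transcript) in members:
                candidates.append((score, group, transcript))

    # Resolve conflicts globally: best-scoring pairs first, and each group
    # and each transcript may be used only once.
    candidates.sort(key=lambda c: c[0], reverse=True)
    assigned, used_transcripts = {}, set()
    for score, group, transcript in candidates:
        if group in assigned or transcript in used_transcripts:
            continue
        assigned[group] = transcript
        used_transcripts.add(transcript)
    return assigned


# Hypothetical example data
forward_hits = {
    "OG1": [("t1", 250.0), ("t2", 90.0)],
    "OG2": [("t1", 120.0), ("t3", 110.0)],
    "OG3": [("t4", 80.0)],
}
reverse_best = {"t1": "DPUL_001", "t2": "DPUL_001", "t3": "TCAS_042", "t4": "ISCA_999"}
group_members = {
    "OG1": {"DPUL_001", "TCAS_001"},
    "OG2": {"TCAS_042", "DPUL_002"},
    "OG3": {"ISCA_003"},
}
result = assign_orthologs(forward_hits, reverse_best, group_members)
```

In this toy data, `t4` is found by the OG3 profile but its reverse hit lies outside OG3, so it is not assigned.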

10 Orthograph
- Daphnia pulex, Ixodes scapularis, Tribolium castaneum, Zootermopsis nevadensis
- Clusters of orthologous genes
Petersen et al

11 Orthograph Petersen et al

12 Orthograph Petersen et al

13 Orthograph Petersen et al

14 Orthograph
- Result: 532 single-copy orthologous genes (all 6 transcriptomes)
- Advantages:
  - Convenient, fast and reliable identification of orthologous nucleotide or amino acid sequences
  - Circumvents redundant transcript assignments and maps transcripts to globally best matching orthologous group
  - Reports transcripts that are potential alternative transcripts or splice variants
  - Easy to install and can run on standard workstation computer (Linux/Mac)
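As a rough illustration of how a "single-copy in all transcriptomes" gene set can be derived, the sketch below keeps only the ortholog groups that have exactly one assigned transcript in every taxon. The input layout, the function name `shared_single_copy` and the toy taxa are assumptions for illustration, not Orthograph output formats.

```python
def shared_single_copy(assignments, taxa):
    """Return ortholog groups with exactly one transcript in every taxon.

    assignments: {taxon: {group_id: [transcript_id, ...]}}
    taxa:        taxa that must all be represented
    """
    all_groups = set().union(*(assignments[t].keys() for t in taxa))
    return [
        group
        for group in sorted(all_groups)
        if all(len(assignments[t].get(group, [])) == 1 for t in taxa)
    ]


# Hypothetical example with three taxa instead of six
assignments = {
    "Haloniscus_sp": {"OG1": ["h1"], "OG2": ["h2"], "OG3": ["h3", "h4"]},
    "Porcellio_scaber": {"OG1": ["p1"], "OG2": ["p2"], "OG3": ["p3"]},
    "Sphaeroma_sp": {"OG1": ["s1"], "OG3": ["s3"]},
}
taxa = ["Haloniscus_sp", "Porcellio_scaber", "Sphaeroma_sp"]
kept = shared_single_copy(assignments, taxa)
```

Here OG2 is dropped because one taxon lacks it, and OG3 is dropped because one taxon has two candidate transcripts.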

15 Optimise Gene Set
- Removed phylogenetically uninformative genes using MARE software (Meyer et al. 2011)
  - Selects optimised data subsets from supermatrices for phylogenetics using geometry-weighted quartet mapping
- Genes with information content > 0.5 chosen = 479 genes

16 Bait Design

17 BaitFisher
- BaitFisher package consists of two programs: BaitFisher and BaitFilter
- BaitFisher:
  - Constructs target enrichment baits from multiple sequence alignments
  - Provides all possible bait designs suitable for enriching a given target locus
  - Uses a tiling design to increase capture efficiency
  - Tiling design: seven successive 120 bp baits, spanning 240 bp. Starting coordinates separated by an offset of 30 bp.
  - Is able to split aligned sequences into exons (with user-provided genome reference)
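The tiling arithmetic can be checked with a short sketch. It is not BaitFisher code. It only assumes the usual tiling layout in which bait i starts at `region_start + i * offset`, so the region covered is `bait_length + (n_baits - 1) * offset`. Under that assumption, seven 120 bp baits cover 240 bp when the offset is 20 bp and 300 bp when the offset is 30 bp. All function names are hypothetical.

```python
def tiling_span(bait_length, n_baits, offset):
    """Total length covered by n_baits tiled baits."""
    return bait_length + (n_baits - 1) * offset


def tile_baits(sequence, region_start, bait_length=120, n_baits=7, offset=30):
    """Cut tiled baits out of a sequence.

    Returns a list of (start, end, bait_sequence) with 0-based, end-exclusive
    coordinates. Raises ValueError if the tiling runs past the sequence end.
    """
    baits = []
    for i in range(n_baits):
        start = region_start + i * offset
        end = start + bait_length
        if end > len(sequence):
            raise ValueError("bait region extends beyond the sequence")
        baits.append((start, end, sequence[start:end]))
    return baits


# Toy 400 bp sequence used only to demonstrate the coordinates
toy_sequence = "ACGT" * 100
baits = tile_baits(toy_sequence, region_start=10)
starts = [start for start, _, _ in baits]
```

Keeping the offset as a parameter makes it easy to compare alternative tiling designs before ordering baits.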

18 BaitFisher
- BaitFilter:
  - Selects a specific bait set by determining the optimal bait region in a multiple sequence alignment
  - Can be the most conserved region in the alignment or the region with the highest number of sequences without gaps or ambiguous nucleotides
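The two selection criteria named on this slide can be sketched as a sliding-window search over an alignment. This is a simplified illustration, not BaitFilter's implementation. "Conserved" is approximated here as the number of columns in which every sequence has the same unambiguous base, and the function names and toy alignment are hypothetical.

```python
VALID_BASES = set("ACGT")


def clean_sequences(alignment, start, width):
    """Number of sequences with no gaps or ambiguous bases in the window."""
    return sum(
        1 for seq in alignment
        if set(seq[start:start + width].upper()) <= VALID_BASES
    )


def conserved_columns(alignment, start, width):
    """Number of window columns where all sequences share one unambiguous base."""
    count = 0
    for col in range(start, start + width):
        bases = {seq[col].upper() for seq in alignment}
        if len(bases) == 1 and bases <= VALID_BASES:
            count += 1
    return count


def best_window(alignment, width, score):
    """Start of the highest-scoring window (leftmost window wins ties)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    if width > length:
        raise ValueError("window is longer than the alignment")
    return max(range(length - width + 1), key=lambda s: score(alignment, s, width))


# Toy alignment: one gap, one ambiguous base, one substitution
alignment = [
    "ACGTACGTAC",
    "ACGTAC-TAC",
    "NCGTACGTAC",
    "ACGAACGTAC",
]
start_clean = best_window(alignment, 3, clean_sequences)
start_conserved = best_window(alignment, 3, conserved_columns)
```

The two criteria can select different regions of the same alignment, as they do in this toy example.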

19 Laboratory Methods
- Library Preparation
  - Meyer and Kircher (2010) protocol
  - Issues with small, degraded specimens and sonication
  - Pooling prior to capture: 4 individuals
- Target Enrichment/Sequence Capture
  - MYbaits Target Enrichment Kit (v3 manual)
- Illumina MiSeq 300/150 bp paired-end sequencing
  - Pooled individuals in 1 lane

20 Sequence Data
- BWA: software for mapping sequences against reference genes
- Intron problems

21 Sequence Data BWA: software for mapping sequences against reference genes
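A mapping step of this kind might be driven from a small script such as the sketch below. The slides do not say which BWA algorithm or options were used, so the choice of `bwa mem`, the thread count and all file names are assumptions. The sketch only relies on the standard `bwa index <reference>` and `bwa mem -t <threads> <reference> <reads_1> <reads_2>` invocations, with alignments written to standard output in SAM format.

```python
import subprocess


def bwa_commands(reference, reads_1, reads_2, threads=4):
    """Build the index and paired-end mapping commands as argument lists."""
    index_cmd = ["bwa", "index", reference]
    mem_cmd = ["bwa", "mem", "-t", str(threads), reference, reads_1, reads_2]
    return index_cmd, mem_cmd


def run_mapping(reference, reads_1, reads_2, sam_path, threads=4):
    """Index the reference and map paired reads, writing SAM to sam_path.

    Requires the bwa executable on PATH; raises CalledProcessError on failure.
    """
    index_cmd, mem_cmd = bwa_commands(reference, reads_1, reads_2, threads)
    subprocess.run(index_cmd, check=True)
    with open(sam_path, "w") as sam:
        subprocess.run(mem_cmd, stdout=sam, check=True)


# Hypothetical file names; the commands are only built here, not executed
index_cmd, mem_cmd = bwa_commands(
    "reference_genes.fasta", "sample_R1.fastq", "sample_R2.fastq", threads=2
)
```

Because the reference genes come from transcriptomes, captured genomic reads that cross exon boundaries may map only partially, which is one way the intron problems noted on the previous slide can arise.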

22 Next Steps
- Data analysis, phylogenetics and dating:
  - SNP calling
  - Coverage assessment
  - Consensus sequences and alignment
  - RAxML phylogeny
  - Estimate divergence dates with BEAST
- Morphology:
  - Morphological species descriptions based on delineation of lineages from phylogenetic trees

23 Acknowledgments
- Funding for this project has come from ABRS, The Nature Conservancy, The Thomas Foundation, Bioplatforms Australia (sequencing) and an ARC Linkage grant
- Christoph Mayer, Malte Petersen, Oliver Niehuis, Andreas Zwick, Mohammad Javidkar, Terry Bertozzi, Tessa Bradford, Kathy Saint, Austin Lab group, Bill Humphreys, Newhaven Managers, NT Landowners, Anangu Luritjiku Rangers