Using New ThiNGS on Small Things
Shane Byrne
Next Generation Sequencing: New Things, Small Things
NGS: Next Generation Sequencing = the 2nd generation of sequencing technologies. Platforms include 454 GS FLX, SOLiD, GAIIx, HiSeq, MiSeq, Ion Torrent and Ion Proton.
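Inside the boxes: detection by fluorescence (e.g. Illumina) or by detecting the H+ ions released as nucleotides are incorporated (Ion Torrent, Ion Proton).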
3rd Generation Sequencing: Oxford Nanopore GridION, Helicos, PacBio RS
Nanopore Sequencing (3rd Generation Sequencing)
Economics of Sequencing
Sequencing Approaches
Shotgun: sequence the fragment library and then put it back together, either by assembling with the help of reference sequence(s) as a scaffold for aligning, or by de novo assembly.
Multiple templates: cDNA from total RNA, sequencing all transcripts present in a sample. Reads are matched to a database and anything not required is subtracted.
Amplicon/targeted sequencing: a PCR step first selects the target material for sequencing. Samples are multiplexed with barcode tags, and the starting samples may contain single or multiple organisms.
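The shotgun idea of putting fragments back together can be illustrated with a toy greedy overlap assembler. This is a minimal sketch assuming short, error-free reads and no repeats; the function names and the 3-base minimum overlap are hypothetical, and real de novo assemblers use overlap or de Bruijn graphs with error correction rather than this greedy merge.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` matching a prefix of `b` (>= min_len), else 0."""
    start = 0
    while True:
        # Look for the first min_len bases of b somewhere in a
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Check that the rest of a, from that point on, is a prefix of b
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of contigs with the largest suffix/prefix overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    # Drop reads wholly contained in another read; they add no new sequence
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs  # no overlaps left: remaining contigs cannot be joined
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


reads = ["ATGCGTAC", "GTACGTTA", "CGTTAGC"]
print(greedy_assemble(reads))  # ['ATGCGTACGTTAGC']
```

Because each merge is a local choice, repeats longer than the reads can be collapsed or mis-joined, which is one reason a reference scaffold or more coverage makes reconstruction easier.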
Sequencing Basics
Library preparation: fragments, amplicons or cDNA are prepared for sequencing by adding adaptors to each end of every fragment. Adaptors can also carry barcodes (indices), a specific barcode for each sample, so that libraries can be pooled together.
The pooled library(ies) are then applied to the flow cell. On Illumina instruments, adaptors bind to the flow cell and bridge amplification forms clusters; other NGS systems load libraries onto beads by emulsion PCR. Improvements in cluster density increase sequencing output.
Reads are generated within the clusters by fluorescence, which is captured by the optics.
Example of Applications
Coverage
NGS generates multiple sequencing reads over each genetic region; this is termed coverage. Coverage gives NGS its read depth, which is leveraged to find minor sequence variants. Deep sequencing means generating a lot of coverage across an area of interest.
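Average coverage is commonly estimated as C = N x L / G, for N reads of length L bp over a target of G bp. A minimal sketch, assuming reads fall uniformly across the target; real coverage is uneven, so this gives the mean depth, not a guarantee for every base. The function names are hypothetical.

```python
import math


def mean_coverage(num_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Average depth C = N * L / G: total sequenced bases divided by the target size."""
    return num_reads * read_length_bp / genome_size_bp


def reads_needed(target_coverage: float, read_length_bp: int, genome_size_bp: int) -> int:
    """Reads required to reach a target average depth, rounded up."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


# Hypothetical example: 1 million 250 bp reads over a 5 Mb bacterial genome
print(mean_coverage(1_000_000, 250, 5_000_000))  # 50.0
print(reads_needed(50, 250, 5_000_000))          # 1000000
```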
Coverage vs Multiplexing
As an example: sequencing 1 organism in a run might give 1000x coverage when only 50x is needed for acceptable quality of the sequencing data. Multiplexing 20 organisms into the run (1000x / 50x = 20) still gives each one 50x coverage, and more results for your $.
For deep sequencing, you need to determine the coverage required to detect minor variants; NGS can discern mutants at 1% prevalence.
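Both the multiplexing arithmetic and the depth needed to see a minor variant can be sketched in a few lines. The detection part assumes a simple binomial model in which each read independently carries the variant with probability equal to its prevalence; the 5-read calling threshold is a hypothetical choice, and in practice any threshold must sit above the sequencing error rate.

```python
import math


def samples_per_run(run_coverage: float, required_coverage: float) -> int:
    """How many samples can share a run that would give one sample `run_coverage`."""
    return math.floor(run_coverage / required_coverage)


def detection_probability(depth: int, variant_fraction: float, min_reads: int) -> float:
    """P(at least `min_reads` of `depth` reads carry a variant at `variant_fraction`).

    Binomial model: ignores sequencing errors and uneven sampling.
    """
    p_fewer = sum(
        math.comb(depth, k) * variant_fraction**k * (1 - variant_fraction) ** (depth - k)
        for k in range(min_reads)
    )
    return 1.0 - p_fewer


print(samples_per_run(1000, 50))  # 20
# Hypothetical call threshold: require 5 supporting reads for a 1% variant
for depth in (100, 500, 1000):
    print(depth, round(detection_probability(depth, 0.01, 5), 3))
```

At 100x a 1% variant is expected in only about one read, so it is rarely detectable; at 1000x it is expected in about ten, which is why deep sequencing is needed to reach the 1% level.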
Multiplexing (Indices/Barcodes)
Each sample's adaptors include a specific index sequence. After the run, computer processing sorts the reads by index, pulling all related sequences back together for each sample.
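Sorting reads back into samples (demultiplexing) can be sketched as matching each read's index sequence against the known sample indices. A minimal sketch: the sample names, the 6 bp indices and the one-mismatch tolerance are hypothetical choices, and instrument software applies its own rules.

```python
def mismatches(a: str, b: str) -> int:
    """Number of differing positions (a length difference counts as mismatches)."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def demultiplex(reads, sample_indices, max_mismatches=1):
    """Sort (index_read, sequence) pairs into per-sample bins by their index sequence."""
    bins = {name: [] for name in sample_indices}
    bins["undetermined"] = []
    for index_read, seq in reads:
        hits = [name for name, idx in sample_indices.items()
                if mismatches(index_read, idx) <= max_mismatches]
        # Assign only when exactly one sample matches; unmatched or ambiguous reads are set aside
        bins[hits[0] if len(hits) == 1 else "undetermined"].append(seq)
    return bins


# Hypothetical 6 bp indices for two samples
indices = {"S1": "ACGTAC", "S2": "TGCATG"}
reads = [("ACGTAC", "AAAA"), ("ACGTAA", "CCCC"), ("TGCATG", "GGGG"), ("NNNNNN", "TTTT")]
print(demultiplex(reads, indices))
# {'S1': ['AAAA', 'CCCC'], 'S2': ['GGGG'], 'undetermined': ['TTTT']}
```

Indices are usually designed to differ at several positions, so that a single sequencing error in the index cannot turn one sample's read into another's.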
Terms and Numbers
Sequence yield per run (Mb, Gb).
Reads per run (thousands, millions, billions).
Sequence yield / reads = average read length (bp).
Cost per run is usually fixed: e.g. $1,184 for MiSeq sequencing reagents for one run, plus sample prep costs for library generation. You need enough samples to fill a run, otherwise it is costly.
Assembling bacterial genomes (3-5 Mb) is much easier than assembling the human genome (3 Gb): 5 mins on a lab computer vs 24 hours on a high-end server.
Viral genomes: 2 kb to 2 Mb.
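The yield/read-length relationship and the effect of a fixed run cost can be shown directly. A minimal sketch: the 15 Gb / 50 million read run is a hypothetical example, the $1,184 is the MiSeq reagent figure quoted above, and library prep costs are deliberately left out.

```python
def mean_read_length(yield_bp: float, num_reads: float) -> float:
    """Sequence yield / reads = average read length (bp)."""
    return yield_bp / num_reads


def reagent_cost_per_sample(run_reagent_cost: float, samples_in_run: int) -> float:
    """Fixed run cost spread across the samples multiplexed into it (library prep not included)."""
    return run_reagent_cost / samples_in_run


# Hypothetical run: 15 Gb of sequence from 50 million reads
print(mean_read_length(15e9, 50e6))  # 300.0

# MiSeq reagent cost spread over 1 vs 48 samples
for n in (1, 48):
    print(n, round(reagent_cost_per_sample(1184, n), 2))
# 1 1184.0
# 48 24.67
```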
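Quality: Phred Q Score
Phred scores describe the quality of the sequencing output: Q10, Q20, Q30, etc., where Q = -10 log10(P) and P is the probability that a base call is wrong. Q30 means a 1 in 1,000 chance of an incorrect base call (99.9% accuracy) and is considered almost perfect. ~Q30 is the NGS benchmark; Q20 (99% accuracy) is the traditional Sanger benchmark.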
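The Phred relationship converts in both directions, and FASTQ files store each score as one ASCII character. A minimal sketch; the offset of 33 (Phred+33) is an assumption matching Sanger-format and current Illumina FASTQ files, and older Illumina pipelines used a different offset, so it is left as a parameter.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """P(base call is wrong) = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def decode_fastq_quality(qual: str, offset: int = 33) -> list[int]:
    """Convert a FASTQ quality string to Phred scores (Phred+33 encoding assumed)."""
    return [ord(c) - offset for c in qual]


for q in (10, 20, 30, 40):
    print(f"Q{q}: error rate {phred_to_error_prob(q):g}, accuracy {1 - phred_to_error_prob(q):.2%}")
print(decode_fastq_quality("I?5+"))  # [40, 30, 20, 10]
```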
Time/Quality/Yield/Reads
Quality
Visual genome size comparison: human genome assembly, 24 hrs on a 4,000-core server; bacterial genome assembly, 5 mins on a 6-core, 32 GB lab PC.
Assembly
More coverage improves contig sizes: the more coverage you generate, the easier it is to piece it all back together.
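The link between coverage and contig size can be sketched with the classic Lander-Waterman model, which assumes reads start uniformly at random and ignores repeats and sequencing errors (the main obstacles in real assemblies). Under this idealisation the expected uncovered fraction falls as e^-C and the expected number of contigs collapses as coverage rises. The 5 Mb genome and 250 bp reads below are hypothetical.

```python
import math


def uncovered_fraction(coverage: float) -> float:
    """Expected fraction of the genome hit by no read: e^(-C) under a Poisson model."""
    return math.exp(-coverage)


def expected_contigs(num_reads: int, coverage: float, min_overlap_fraction: float = 0.0) -> float:
    """Lander-Waterman expected number of islands (contigs): N * e^(-C * (1 - theta)).

    theta is the fraction of a read length that two reads must overlap to be joined.
    """
    return num_reads * math.exp(-coverage * (1 - min_overlap_fraction))


# Hypothetical 5 Mb genome sequenced with 250 bp reads
genome, read_len = 5_000_000, 250
for c in (2, 5, 10, 20):
    n = int(c * genome / read_len)
    print(f"{c}x: {n} reads, ~{expected_contigs(n, c):.1f} contigs, "
          f"{uncovered_fraction(c) * genome:.0f} bp uncovered")
```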
Computing Overheads Increase as Coverage Increases
As coverage increases, so does the memory demand on the program assembling the data.
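One way to see why memory grows with coverage: assemblers built on de Bruijn graphs store every distinct k-mer seen in the reads. Sequencing errors create new, mostly unique k-mers, so the table keeps growing even after all the true genome k-mers are present. This toy simulation assumes a random genome, uniform read placement and a hypothetical 1% substitution error rate; it counts distinct k-mers as a stand-in for memory rather than measuring any real assembler.

```python
import random


def simulate_reads(genome: str, coverage: float, read_len: int,
                   error_rate: float, rng: random.Random) -> list[str]:
    """Sample reads uniformly from the genome, adding random substitution errors."""
    n_reads = int(coverage * len(genome) / read_len)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        read = list(genome[start:start + read_len])
        for i in range(read_len):
            if rng.random() < error_rate:
                read[i] = rng.choice([b for b in "ACGT" if b != read[i]])
        reads.append("".join(read))
    return reads


def distinct_kmers(reads: list[str], k: int) -> set[str]:
    """All distinct k-mers a k-mer based assembler would need to hold in memory."""
    return {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}


rng = random.Random(42)
genome = "".join(rng.choice("ACGT") for _ in range(10_000))
true_kmers = len(distinct_kmers([genome], 21))
for cov in (5, 10, 25, 50):
    reads = simulate_reads(genome, cov, read_len=100, error_rate=0.01, rng=rng)
    print(f"{cov}x: {len(distinct_kmers(reads, 21))} distinct 21-mers "
          f"(genome itself has {true_kmers})")
```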
Virology and NGS
Detection of Unknown Viral Pathogens and Discovery of Novel Viruses
Detection of Tumour Viruses
Characterization of the Human Virome
Full-Length Viral Genome Sequencing
Investigation of Viral Genome Variability and Characterization of Viral Quasispecies
Monitoring Antiviral Drug Resistance (down to 1% population level)
Epidemiology of Viral Infections and Viral Evolution
Quality Control of Live-Attenuated Viral Vaccines
Bacteriology and NGS
Bacterial whole genome sequencing
Epidemiology
Typing
Microbiome analysis / dysfunction
Treatment applications
Ability to detect unknown bacteria in a sample with high sensitivity
Culture-independent methodology
Detection of antimicrobial resistance mutants at low levels
Resistome
Forensics
Microbiome signatures
Metagenomic Workflow: 16S
Amplicon Sequencing
Targeted sequencing of multiple amplicons, e.g. tumour somatic mutation genes, bacterial virulence genes and specific identification targets.
96 samples x 12 targets = $48/patient
Pathogen Transcript Detection
Subtraction Pathogen Transcript Detection
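Subtraction can be sketched as removing reads that look like the host and keeping the remainder as candidate pathogen reads. A minimal sketch assuming exact k-mer matching against a host reference; k = 15 and the 50% shared-k-mer cut-off are hypothetical parameters, and production pipelines usually align reads to the host genome with a read aligner instead.

```python
import random


def kmer_set(seq: str, k: int) -> set[str]:
    """All k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def subtract_host(reads: list[str], host_sequences: list[str],
                  k: int = 15, max_host_fraction: float = 0.5) -> list[str]:
    """Keep reads whose k-mers are mostly absent from the host reference."""
    host_kmers: set[str] = set()
    for seq in host_sequences:
        host_kmers |= kmer_set(seq, k)
    kept = []
    for read in reads:
        kmers = kmer_set(read, k)
        if not kmers:
            continue  # read shorter than k: cannot be classified, so it is dropped
        shared = len(kmers & host_kmers) / len(kmers)
        if shared <= max_host_fraction:
            kept.append(read)  # candidate non-host (e.g. pathogen) read
    return kept


# Hypothetical random "host" and "pathogen" sequences
rng = random.Random(7)
host = "".join(rng.choice("ACGT") for _ in range(2000))
pathogen = "".join(rng.choice("ACGT") for _ in range(500))
reads = [host[100:200], pathogen[50:150], host[900:1000]]
print(subtract_host(reads, [host]) == [pathogen[50:150]])  # True
```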
Costs for Multiple Bacterial Genomes
MiSeq V3: 48 whole 5 Mb bacterial genomes at 50x coverage per run
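The 48-genome figure can be checked by working backwards: 48 genomes x 5 Mb x 50x requires about 12 Gb of sequence from the run. A minimal sketch, assuming usable yield is spread evenly across samples; real runs lose some output to quality filtering and uneven pooling, so leaving headroom below the instrument's rated yield is prudent. The function names are hypothetical.

```python
def required_yield_bp(n_genomes: int, genome_size_bp: int, coverage: int) -> int:
    """Total sequence needed: genomes x genome size x coverage."""
    return n_genomes * genome_size_bp * coverage


def genomes_per_run(run_yield_bp: int, genome_size_bp: int, coverage: int) -> int:
    """How many genomes of a given size fit into one run at the target coverage."""
    return run_yield_bp // (genome_size_bp * coverage)


print(required_yield_bp(48, 5_000_000, 50) / 1e9)      # 12.0 (Gb)
print(genomes_per_run(12_000_000_000, 5_000_000, 50))  # 48
```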
Emerging from the surroundings
What exactly is in the surroundings?
Microbiome analysis using NGS is proving very interesting: we are moving from waiting to looking.
Viral metagenomic analyses of environmental samples suggest that the field of virology has explored less than 1% of the extant viral diversity. In the last decade, the culture-independent and sequence-independent metagenomic approach has permitted the discovery of many viruses in a wide range of samples. Phylogenetically, some of these viruses are only distantly related to previously discovered viruses. In addition, 60-99% of the sequences generated in different viral metagenomic studies are not homologous to known viruses.
Water
The water looks good.
How many viruses per litre of sea water? 100 billion
How many bacteria + protists per litre of sea water? 10 billion
How many viruses on Earth? 10^31
Soil
How many viruses per g of soil? 1.2 x 10^9
How many bacteria per g of soil? 100 million to 3 billion
Potential Pathogen Pyramid
Viral Discovery
Recent Papers
DecodeX Wound Care
Faecal Microbiome Swap - FMT (faecal microbiota transplantation)
Clinical Microbiology and NGS
Still costly ($200-$1000), but no longer ridiculous.
Costs are decreasing, data output is increasing and run times are decreasing.
Currently in the genome database creation phase, similar to 16S in GenBank in the 2000s.
Metagenomic/bulk sequence analysis tools are improving, but there is still some way to go before they are routine and easy to use.
Deep sequencing to interrogate the resistome and low-level mutants.
New appreciation for the biomes around us.
Polymicrobial/biome-based diagnostics, including non-cultured organisms.
Profiling of FMT inocula before use.
The resolution of NGS data is starting to replace proxy genome interrogation techniques such as PFGE/MLST/VNTR for epidemiology.