Using New ThiNGS on Small Things. Shane Byrne


1 Using New ThiNGS on Small Things Shane Byrne

2 Next Generation Sequencing New Things Small Things

3 NGS
Next Generation Sequencing = 2nd generation of sequencing
454 GS FLX, SOLiD, GAIIx, HiSeq, MiSeq, Ion Torrent, Ion Proton

4 Fluorescence Detecting H+ Inside the boxes

5 3rd Generation Sequencing
Oxford Nanopore GridION, Helicos, PacBio RS

6 Nanopore sequencing (3rd Generation Sequencing)

7 Economics of Sequencing

8 Sequencing Approaches
Shotgun: sequence the fragment library and then put it back together
  Assemble with the help of reference sequence(s) as a scaffold for aligning
  De novo assembly
Multiple templates: cDNA from total RNA, to sequence all transcripts present in a sample
  Match to a database and subtract things not required
Amplicon/targeted sequencing
  PCR process first
  Selection of target material for sequencing
  Multiplex with barcode tags
Single or multiple organisms in starting samples

9 Sequencing Basics
Library preparation: preparation of fragments/amplicons/cDNA for sequencing
  Addition of adaptors to each end of each fragment
  Adaptors can also carry barcodes (indices)
  Specific barcode for each specific sample
  Recombine libraries together
Application of library(ies) to the flow cell
  Illumina: adaptors bind to the flow cell, forming clusters
  Other NGS systems: libraries go onto beads/emulsion
  Cluster density improvements increase sequencing output
  Bridge amplification and fluorescence reads occur in the clusters and are captured by optics

10 Example of Applications

11 Coverage
NGS generates multiple sequencing reads over each genetic region; this is termed coverage.
Coverage gives NGS its read depth, which is leveraged to find minor sequence variation.
Deep sequencing means generating a lot of coverage across an area of interest.
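
The coverage idea above can be put into numbers with a minimal sketch. It assumes reads are spread evenly over the genome, and the function names and example figures are illustrative only, not taken from the slides.

```python
from collections import Counter


def mean_coverage(num_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Average depth of coverage: total sequenced bases divided by genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return num_reads * read_length_bp / genome_size_bp


def minor_variant_fraction(pileup: str) -> float:
    """Fraction of reads at one position that disagree with the most common base."""
    if not pileup:
        raise ValueError("pileup must contain at least one base")
    major_count = Counter(pileup).most_common(1)[0][1]
    return (len(pileup) - major_count) / len(pileup)


# Hypothetical run: 1 million reads of 250 bp over a 5 Mb bacterial genome.
print(mean_coverage(1_000_000, 250, 5_000_000))  # 50.0

# Hypothetical position covered by 100 reads, one of which carries a variant base.
print(minor_variant_fraction("A" * 99 + "G"))  # 0.01
```

The second function shows why depth matters: a variant carried by 1 read in 100 can only be observed at all if the position is covered by many reads.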

12 Coverage vs Multiplexing
As an example:
  Sequence 1 organism in a run and get 1000x coverage, but only 50x coverage is needed for acceptable quality of the sequencing data
  Could multiplex 20 organisms into the run: now get 50x coverage each and more results for your $
If deep sequencing, need to determine the coverage required to detect minor variants
NGS can discern mutants at 1% prevalence
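
The trade-off on this slide is simple arithmetic, sketched below. The binomial calculation is an added assumption (each read independently samples the variant at its true frequency, and sequencing errors are ignored); it is not a statement of how any particular variant caller works.

```python
import math


def samples_per_run(single_sample_coverage: float, required_coverage: float) -> int:
    """How many equally sized samples fit in a run at the required coverage."""
    return math.floor(single_sample_coverage / required_coverage)


def expected_variant_reads(depth: float, variant_fraction: float) -> float:
    """Expected number of reads carrying a variant present at the given fraction."""
    return depth * variant_fraction


def prob_at_least(k: int, depth: int, variant_fraction: float) -> float:
    """Binomial probability of seeing at least k variant reads at a given depth."""
    p = variant_fraction
    below = sum(
        math.comb(depth, i) * p**i * (1 - p) ** (depth - i) for i in range(k)
    )
    return 1.0 - below


# Slide example: 1000x for one organism, 50x needed per organism.
print(samples_per_run(1000, 50))  # 20

# A 1% variant yields about 10 supporting reads at 1000x but only about 0.5 at 50x.
print(expected_variant_reads(1000, 0.01), expected_variant_reads(50, 0.01))

# Chance of seeing the 1% variant in at least 5 reads at each depth.
print(prob_at_least(5, 1000, 0.01), prob_at_least(5, 50, 0.01))
```

Under these assumptions, multiplexing down to 50x is fine for consensus sequencing but leaves too few reads to detect a 1% variant, which is why deep sequencing needs its coverage worked out separately.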

13 Multiplexing (Indices/Barcodes)
Adaptors = adaptors with a specific index sequence included
Computer processing pulls all related sequences back together
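
The "pulls all related sequences back together" step (demultiplexing) can be sketched as follows. This is a toy version that assumes the index is the first few bases of each read and must match exactly; the barcodes and sample names are hypothetical, and real instruments may read the index separately and tolerate mismatches.

```python
from collections import defaultdict


def demultiplex(reads, index_to_sample, index_length):
    """Group reads by their leading index sequence and strip the index off."""
    bins = defaultdict(list)
    for read in reads:
        index, insert = read[:index_length], read[index_length:]
        # Reads whose index is not in the sample sheet are kept apart.
        sample = index_to_sample.get(index, "undetermined")
        bins[sample].append(insert)
    return dict(bins)


# Hypothetical 4-base indices for two pooled samples.
sample_sheet = {"ACGT": "sample_A", "TGCA": "sample_B"}
pooled_reads = ["ACGTTTTT", "TGCAGGGG", "ACGTCCCC", "NNNNAAAA"]

print(demultiplex(pooled_reads, sample_sheet, index_length=4))
# {'sample_A': ['TTTT', 'CCCC'], 'sample_B': ['GGGG'], 'undetermined': ['AAAA']}
```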

14 Terms and Numbers
Sequence yield/run (Mb, Gb)
Reads/run (thousand, million, billion)
Sequence yield / reads = read length (bp)
Cost/run usually fixed
  MiSeq: $1,184 for sequencing reagents for a run
  $ Sample prep costs for library generation
  Need to have volume to start a run, or it is costly
Assembling bacterial genomes (3-5 Mb) is much easier than the human genome (3 Gb): 5 mins on a lab computer vs 24 hours on a high-end server
Viral genomes: 2 kb-2 Mb
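
The two relationships on this slide (yield / reads = read length, and a fixed run cost shared across samples) can be sketched as below. The yield, read count, and number of samples are hypothetical; only the $1,184 reagent figure comes from the slide, and sample prep costs are left out.

```python
def mean_read_length(yield_bp: float, reads: int) -> float:
    """Average read length in bp: total sequence yield divided by number of reads."""
    return yield_bp / reads


def reagent_cost_per_sample(run_cost: float, samples: int) -> float:
    """Share of a fixed run cost carried by each multiplexed sample."""
    if samples < 1:
        raise ValueError("a run needs at least one sample")
    return run_cost / samples


# Hypothetical run: 7.5 Gb of yield from 25 million reads.
print(mean_read_length(7_500_000_000, 25_000_000))  # 300.0

# Slide figure of $1,184 per run, split over a hypothetical 16 samples.
print(reagent_cost_per_sample(1184, 16))  # 74.0

# The same run with a single sample carries the whole cost.
print(reagent_cost_per_sample(1184, 1))  # 1184.0
```

The last line illustrates the "need volume to start a run" point: the fixed cost only becomes reasonable per sample when the run is filled.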

15 Quality: Phred Q Score
Phred scores describe the quality of the sequencing output: Q10, Q20, Q30 etc.
Q30 (1 expected error per 1,000 bases) is considered almost perfect, with essentially no errors/ambiguities
NGS: ~Q30 is the benchmark
Sanger = Q20 max
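
A Phred score Q is defined from the base-call error probability P as Q = -10 log10(P), so Q10, Q20 and Q30 correspond to 1 error in 10, 100 and 1,000 bases. The sketch below converts between the two and decodes a FASTQ quality string, assuming the common Phred+33 ASCII encoding; the quality string itself is made up for illustration.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Error probability implied by a Phred score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Phred score for an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def decode_fastq_quality(quality_string: str, offset: int = 33) -> list[int]:
    """Per-base Phred scores from a FASTQ quality line (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


for q in (10, 20, 30):
    print(q, phred_to_error_probability(q))

# Hypothetical quality line for a 4-base read.
print(decode_fastq_quality("+5?I"))  # [10, 20, 30, 40]
```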

16 Time/Quality/Yield/Reads

17 Quality

18 Visual genome size comparison
24 hrs, 4000-core server
5 mins, 6-core 32 GB lab PC

19 Assembly

20 Assembly

21 More coverage improves contig sizes
The more coverage you generate, the easier it is to piece it all back together

22 Computing Overheads Increase as Coverage Increases
As coverage increases, the memory demanded by the assembly program increases
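
The effect of coverage on contig size can be illustrated with a toy simulation. It is not an assembler: it assumes error-free reads placed uniformly at random on a genome of known length, and it merges any reads that touch or overlap, whereas real assemblers need a minimum overlap and must cope with repeats. All parameter values are illustrative.

```python
import random


def simulate_contigs(genome_size: int, read_length: int, coverage: float, seed: int = 0):
    """Place random reads on a genome and merge touching reads into contigs.

    Returns (number_of_contigs, longest_contig_length).
    """
    num_reads = int(coverage * genome_size / read_length)
    if num_reads < 1:
        raise ValueError("coverage too low to place a single read")
    rng = random.Random(seed)
    starts = sorted(
        rng.randrange(genome_size - read_length + 1) for _ in range(num_reads)
    )
    contigs = []  # each contig is a [start, end) interval
    for start in starts:
        end = start + read_length
        if contigs and start <= contigs[-1][1]:
            # Read touches or overlaps the current contig: extend it.
            contigs[-1][1] = max(contigs[-1][1], end)
        else:
            contigs.append([start, end])
    longest = max(end - start for start, end in contigs)
    return len(contigs), longest


for cov in (1, 5, 10, 20):
    n_contigs, longest = simulate_contigs(100_000, 100, cov)
    print(f"{cov}x: {n_contigs} contigs, longest {longest} bp")
```

In this simplified model the number of contigs drops and the longest contig grows as coverage increases, because fewer gaps are left between reads.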

23 Virology and NGS
Detection of Unknown Viral Pathogens and Discovery of Novel Viruses
Detection of Tumour Viruses
Characterization of the Human Virome
Full-Length Viral Genome Sequencing
Investigation of Viral Genome Variability and Characterization of Viral Quasispecies
Monitoring Antiviral Drug Resistance (1% population level)
Epidemiology of Viral Infections and Viral Evolution
Quality Control of Live-Attenuated Viral Vaccines

24 Bacteriology and NGS
Bacterial whole genome sequencing
Epidemiology
Typing
Microbiome analysis / dysfunction
Treatment applications
Ability to detect all unknown bacteria in a sample with high sensitivity
Culture-independent methodology
Antimicrobial resistance mutants at low level
Resistome
Forensics
Microbiome signatures

25 Metagenomic Workflow 16S

26 Amplicon Sequencing
Multiple amplicons: targeted sequencing
Tumour somatic mutation genes
Bacterial virulence genes, specific identification targets
96 samples x 12 targets = $48/patient

27 Pathogen Transcript Detection

28 Subtraction Pathogen Transcript Detection

29 Costs for multiple bacterial genomes
MiSeq V3: 48 whole 5 Mb bacterial genomes at 50x coverage/run
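
The sequence yield this slide implies can be checked with a one-line calculation: genomes x genome size x coverage. The sketch below assumes every base of yield is usable and evenly distributed across samples, which real runs only approximate.

```python
def required_yield_bp(num_genomes: int, genome_size_bp: int, coverage: float) -> float:
    """Total bases a run must deliver to cover every genome at the given depth."""
    return num_genomes * genome_size_bp * coverage


# Slide figures: 48 genomes of 5 Mb each at 50x coverage.
total_bp = required_yield_bp(48, 5_000_000, 50)
print(total_bp / 1e9, "Gb")  # 12.0 Gb
```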

30 Emerging from the surroundings

31 What exactly is in the surroundings?
Microbiome analysis using NGS is proving very interesting
Moving from waiting to looking
Viral metagenomic analyses of environmental samples suggest that the field of virology has explored less than 1% of the extant viral diversity. In the last decade, the culture-independent and sequence-independent metagenomic approach has permitted the discovery of many viruses in a wide range of samples. Phylogenetically, some of these viruses are distantly related to previously discovered viruses. In addition, 60-99% of the sequences generated in different viral metagenomic studies are not homologous to known viruses.

32

33 The water looks good.
How many viruses/litre of sea water? = 100 billion
How many bacteria + protists/litre of sea water? = 10 billion
How many viruses on Earth = Water

34

35 Soil
How many viruses/g of soil? = 1.2 x 10^9
How many bacteria/g of soil? = 100 million - 3 billion

36 Potential Pathogen Pyramid

37 Viral Discovery

38 Viral Discovery

39 Viral Discovery

40 Recent Papers

41 Recent Papers

42 DecodeX Wound Care

43 Faecal Microbiome Swap - FMT

44 Faecal Microbiome Swap - FMT

45 Faecal Microbiome Swap - FMT

46 Clinical Microbiology and NGS
Still costly, $200-$1000, but no longer ridiculous
Costs decreasing, data output increasing, run times decreasing
Currently in the genome database creation phase
  Similar to 16S in GenBank in the 2000s
Metagenomic/bulk sequence analysis tools are improving, but there is some way to go yet for routine/easy use
Deep sequencing to interrogate the resistome and low-level mutants
New appreciation for the biomes around us
Polymicrobial/biome-based diagnostics, including non-cultured organisms
Profiling of FMT inocula (pre-use)
Resolution of NGS data is starting to replace proxy genome interrogation techniques such as PFGE/MLST/VNTR for epidemiology