Add 2016 GBS Poster As Slide One

1 Add 2016 GBS Poster As Slide One

2 GBS Adapters and Enzymes. [Figure: adapter structure. Labels: Barcode Adapter, P1, Sticky Ends, Common Adapter, P2, Illumina Sequencing Primer 1, Illumina Sequencing Primer 2, Barcode (4-8 bp).] Restriction enzymes and cut sites (^ marks the cut): ApeKI G^CWGC; PstI CTGCA^G; EcoT22I ATGCA^T.

3 Barcode Design Considerations. Barcode sets are enzyme specific: they must not recreate the enzyme recognition site and must have complementary overhangs. Sets must be of variable length. Bases must be well balanced at each position. Barcodes must be different enough from each other to avoid confusion if there is a sequencing error: at least 3 bp differences among barcodes. Barcodes must not nest within other barcodes. No mononucleotide runs of 3 or more bases.
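Several of the barcode rules above can be checked mechanically. The following is a minimal sketch, not the procedure used to design any actual barcode set. The function names, the use of edit distance as the measure of "3 bp differences" (chosen here because the barcodes vary in length), and the way the ligated junction is modeled (barcode followed by the cut-site remnant, e.g. CWGC for ApeKI) are all assumptions made for illustration. Base balance at each position is not checked.

```python
from itertools import combinations, product
import re

# IUPAC codes needed for the sites discussed here (W = A or T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "AT"}


def expand(seq):
    """List every concrete sequence matched by an IUPAC pattern."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in seq))]


def has_homopolymer(barcode, run=3):
    """True if the barcode has a run of `run` or more identical bases."""
    return re.search(r"(.)\1{%d,}" % (run - 1), barcode) is not None


def recreates_site(barcode, remnant, site):
    """True if barcode + cut-site remnant contains the recognition site.

    Assumption: the ligated junction reads barcode followed by remnant.
    """
    sites = expand(site)
    return any(s in barcode + r for r in expand(remnant) for s in sites)


def edit_distance(a, b):
    """Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def check_barcodes(barcodes, remnant, site, min_diff=3):
    """Return a list of human-readable rule violations (empty if none)."""
    problems = []
    for bc in barcodes:
        if has_homopolymer(bc):
            problems.append(f"{bc}: mononucleotide run of 3 or more")
        if recreates_site(bc, remnant, site):
            problems.append(f"{bc}: recreates recognition site {site}")
    for a, b in combinations(barcodes, 2):
        if a in b or b in a:
            problems.append(f"{a}/{b}: one barcode nests within the other")
        elif edit_distance(a, b) < min_diff:
            problems.append(f"{a}/{b}: fewer than {min_diff} differences")
    return problems


if __name__ == "__main__":
    # Hypothetical barcodes, checked against ApeKI (site GCWGC, remnant CWGC).
    for line in check_barcodes(["ACTA", "GTCTT", "AAAC", "ACTG"], "CWGC", "GCWGC"):
        print(line)
```

In this sketch a barcode ending in G is flagged for ApeKI, because G followed by the CWGC remnant reads GCWGC again and the ligated product could be re-cut.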

4 PCR. Pooled digestion/ligation reactions (n=94), then PCR, then the GBS library. [Figure labels: PCR primers, Insert, P1, P2, O1, O2.]

5 Perform Titration to Minimize Adapter Dimers Before Sequencing. NOTE: Done once with a small number of samples. Adapter dimers constitute only 0.05% of raw sequence reads. [Figure: fluorescence intensity vs. time traces with 15 bp and 1500 bp size standards; labels: Adapter Dimer, Library, Optimal adapter amount.]

6 Size-Selected GBS Library. Fragments are of optimal length. No titration of adapters necessary. Increased reduction of genome complexity. Allows for SNP calls and improves linkage mapping.

7 Overview of Genotyping by Sequencing (GBS). Focuses NextGen sequencing power to ends of restriction fragments. Scores both SNPs and presence/absence markers. [Figure: restriction sites and sequence tags in Sample1 and Sample2; labels: Loss of cut site, < 450 bp.]

8 Genome Sampling Strategies Vary By Species. Dependent on factors that affect diversity. Mating system: outcrosser, inbreeder, clonal? Ploidy: haploid, diploid, auto- or allopolyploid? Geographical distribution: island population, cosmopolitan? Genome relevance within population: how closely related are individuals?

9 Why Modify The GBS Protocol? Sequencing technology: patterned flow cells; size-selected GBS libraries optimize fragment lengths for sequencing on the Illumina HiSeq 3000. More markers: use a more frequent cutting enzyme (4-5 base cutter). Fewer markers: use a less frequent cutting enzyme (6-8 base cutter). Deeper sequence coverage per locus. Increase/decrease multiplexing. Balance of information and depth of coverage.

10 Other Considerations. Genome-appropriate enzyme choice relative to amount of methylation. Genome size: how much coverage will an enzyme yield? The genome size has some bearing on the size of the fragment pool. Amount of repetitive DNA is directly correlated with the size of the genome. Genome composition: the composition of the genome can affect the frequency and distribution of cut sites.
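The effect of cutter length and genome composition on marker number can be roughed out with a simple probability model. The sketch below assumes bases are independent, that G and C are equally frequent (as are A and T), and that every site is cut; it ignores methylation and repetitive DNA, so it gives only an order-of-magnitude expectation. The function names and the GC values are illustrative assumptions. The 302 Mb genome size is borrowed from the coverage example on slide 13.

```python
# IUPAC codes needed for the sites listed here (W = A or T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "AT"}


def site_probability(site, gc=0.5):
    """Probability that a random position starts a match to `site`.

    Assumes independent bases with the given GC fraction.
    """
    base_p = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    p = 1.0
    for code in site:
        p *= sum(base_p[b] for b in IUPAC[code])
    return p


def expected_sites(genome_size_bp, site, gc=0.5):
    """Expected number of recognition sites in a genome of the given size."""
    return genome_size_bp * site_probability(site, gc)


if __name__ == "__main__":
    genome = 302_000_000  # bp, from the slide 13 example
    sites = {"MspI": "CCGG", "ApeKI": "GCWGC", "PstI": "CTGCAG", "SbfI": "CCTGCAGG"}
    for gc in (0.35, 0.50):  # hypothetical GC fractions
        for name, site in sites.items():
            n = expected_sites(genome, site, gc)
            print(f"GC={gc:.2f} {name:6s} {site:9s} ~{n:,.0f} expected sites")
```

At 50% GC each fully specified base divides the expected site count by 4, which is why an 8 base cutter yields far fewer fragments than a 4 or 5 base cutter. GC-rich sites such as SbfI's become rarer still in an AT-rich genome.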

11 Sampling large genomes with methylation-sensitive restriction enzymes. [Figure labels: GBS Library, Sequenced Fragments, Methylated DNA, Unmethylated DNA, 5 base cutter, 6 base cutter.]

12 Enzymes Currently Used At The CGRB. ApeKI: 5 base cutter, partially methylation sensitive. PstI/MspI (double digest): 6 base cutter, not methylation sensitive. HindIII/MspI (double digest): 6 base cutter, not methylation sensitive. SbfI/MspI (double digest): 8 base cutter, not methylation sensitive.

13 Estimate Sequencing Coverage. Coverage depth calculator (must have reference):
1) (150 cycles) x (total number of HiSeq sequences; e.g., 300 million for single-end vs. 600 million for paired-end) = total number of base pairs
2) Total number of bp / total number of samples = average bp per sample that you can expect
3) Average bp per sample / size of the genome of interest in bp = AVERAGE COVERAGE PER SAMPLE
Example using 48 samples with a 302 Mb genome run on a paired-end flow cell:
150 cycles x 600 million sequences (paired-end) = 90,000,000,000 bp
90,000,000,000 bp / 48 samples = 1,875,000,000 bp per sample
1,875,000,000 bp per sample / 302,000,000 bp = approximately 6.2* (average coverage per sample)
*This is a reduced representation of the genome, therefore this is an underestimate.
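The three-step calculation on this slide can be written as a short function. This is a sketch of the arithmetic only; the function and parameter names are illustrative, and the result is the whole-genome average, which the slide notes is an underestimate for a reduced-representation library.

```python
def average_coverage(cycles, sequences, samples, genome_size_bp):
    """Average whole-genome coverage per sample, following the slide's steps."""
    total_bp = cycles * sequences            # step 1: total base pairs sequenced
    bp_per_sample = total_bp / samples       # step 2: average bp per sample
    return bp_per_sample / genome_size_bp    # step 3: average coverage per sample


if __name__ == "__main__":
    # The slide's example: 150 cycles, 600 million paired-end sequences,
    # 48 samples, 302 Mb genome.
    cov = average_coverage(150, 600_000_000, 48, 302_000_000)
    print(f"{cov:.2f}x average coverage per sample")  # prints 6.21x
```

Changing `samples` shows the multiplexing trade-off directly: halving the number of samples per flow cell doubles the average coverage per sample.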

14 The Take Home. Increasing evidence suggests that more coverage is needed to make high-confidence SNP calls across multiple analysis pipelines. Enzyme choice, level of multiplexing, and size selection are major factors affecting depth of coverage. CGRB Bioinformatics can help with experimental design and downstream analysis.