Isoform sequencing PacBio RSII. Anna Bratus PacBio User Meeting, Barcelona, November 10, 2015


1 Isoform sequencing PacBio RSII Anna Bratus PacBio User Meeting, Barcelona, November 10, 2015

2 SCHEDULE I. CASE STUDY II. LIBRARY PREPARATION III. SEQUENCING IV. DATA OUTCOME V. CONCLUSIONS

3 CASE STUDY E. coli diarrhea (ETEC, enterotoxigenic E. coli) in the pig. How does a pig get diarrhea?

4 CASE STUDY E. coli diarrhea of the pig: Phenotyping. Microscopic E. coli adhesion test: obtain enterocytes from slaughtered pigs and add F4 bacteria. Bacteria adhere to the brush border of enterocytes from pigs with a susceptible phenotype. Bacteria do not adhere to the brush border of enterocytes from resistant pigs. [Figure labels: brush borders, enterocytes, bacteria] Python,

5 New boundaries of the F4bcR locus [Figure: map of the region from 131 Mbp to 148 Mbp, Sscrofa 10.2 assembly; labels: SW, ZDHHC, MUC4GT, S, S, SW, TNK, MUC, KIAA, LRCH, LMLN, ZNF, SLC12A, HEG, MUC, ITGB, KARLN, MYLK, SW, SSC]

6 CASE STUDY Muc13 Zhang, kb Ren,

7 CASE STUDY Facts and questions:
1. DNA sequences of Muc13A and Muc13B cloned in BACs showed differences in tandem repeat (TR) regions: PacBio sequencing
2. Are those TRs transcribed?
3. Are there any other candidates in the critical region on pig chromosome 13?
4. Are there any other E. coli receptor candidates in the pig genome?
5. Are we able to improve the pig reference genome annotation?

8 LIBRARY PREPARATION Isoform Sequencing (Iso-Seq) Using the Clontech SMARTer PCR cDNA Synthesis Kit and BluePippin Size-Selection System

9 LIBRARY PREPARATION 1 INPUT MATERIAL
SAMPLE SOURCE: PIG EPITHELIAL CELLS ISOLATED FROM SMALL INTESTINE
NO. OF SAMPLES IN EXPERIMENT: 2
SAMPLES: FROM RESISTANT AND SUSCEPTIBLE ANIMALS
SAMPLE QUANTITY: 1 µg OF TOTAL RNA
SAMPLE QUALITY: RINs: 9.0 and 9.2

10 LIBRARY PREPARATION 2 GENERATION OF FULL-LENGTH cDNA: Clontech SMARTer PCR cDNA Synthesis Kit. Single step

11 LIBRARY PREPARATION 3 PCR CYCLE OPTIMIZATION [Figure: panels for SAMPLE 1 and SAMPLE 2; axis: CYCLE NUMBER]

12 LIBRARY PREPARATION 3 PCR CYCLE OPTIMIZATION [Figure: SAMPLE 1]

13 LIBRARY PREPARATION 4 LARGE SCALE PCR FOR SIZE SELECTION ON THE BluePippin SYSTEM (KAPA HiFi) [Figure: two panels each for SAMPLE 1 and SAMPLE 2]

14 LIBRARY PREPARATION 5 SIZE SELECTION ON THE BluePippin SYSTEM. cDNA FRACTIONS: I. 1-2 kb; II. 2-3 kb; III. 3-6 kb; IV kb [Figure: lanes I. to IV.]
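The fraction boundaries on this slide can be read as a simple binning rule for cDNA lengths. The sketch below is only an illustration: the name `assign_fraction` is hypothetical, the range of fraction IV is not legible in the transcript (anything of 6 kb or longer is assigned to IV here purely as an assumption), and treating the lower bound of each fraction as inclusive is also an assumption. The actual BluePippin selection is a physical gel elution, not a hard cutoff.

```python
from typing import Optional

# Fraction boundaries in base pairs for fractions I-III, as listed on the slide.
# Fraction IV's range is not legible in the transcript; treating it as
# "6 kb or longer" is an assumption made only for this illustration.
FRACTION_BOUNDS = [
    ("I", 1000, 2000),
    ("II", 2000, 3000),
    ("III", 3000, 6000),
]


def assign_fraction(length_bp: int) -> Optional[str]:
    """Return the size-fraction label for a cDNA length, or None below 1 kb."""
    for label, low, high in FRACTION_BOUNDS:
        if low <= length_bp < high:  # lower bound inclusive (assumption)
            return label
    if length_bp >= 6000:
        return "IV"  # assumed open-ended upper fraction
    return None


for length in (800, 1500, 2300, 4200, 7500):
    print(length, assign_fraction(length))
```

A hard threshold like this is useful for bookkeeping (for example, grouping expected transcript lengths by library), but real fractions overlap at their edges.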

15 LIBRARY PREPARATION 6 LARGE SCALE PCR FOR SMRTbell LIBRARY PREPARATION (KAPA HiFi). cDNA FRACTIONS: I. 1-2 kb; II. 2-3 kb; III. 3-6 kb; IV kb [Figure: lanes I. to IV.]

16 LIBRARY PREPARATION 7 cDNA SMRTbell TEMPLATE PREPARATION. cDNA FRACTIONS: I. 1-2 kb; II. 2-3 kb; III. 3-6 kb; IV kb [Figure: lanes I. to IV.]

17 LIBRARY PREPARATION 8 OPTIONAL 2nd BluePippin SIZE SELECTION FOR LONG INSERT LIBRARIES [Figure: cDNA fraction IV BEFORE size selection; cDNA fraction IV AFTER size selection]

18 SEQUENCING
CHEMISTRY: P6/C4, 1 x 240 min movie
Columns: cDNA fraction | Sample | No. of SMRT cells/fraction | No. of reads (×1000)/cell | Mean polymerase read length (×1000) | Throughput (Gbp)
Values: 1-2 kb ,7 16, ,9 17, kb ,6 16, ,9 16, kb ,1 16, ,4 15, kb ,3 12,3 2.0 ,1 11,

19 SEQUENCING SMRTbell libraries [Figure: two panels for Fraction III, NON SIZE SELECTED; two panels for Fraction IV, SIZE SELECTED]

20 DATA OUTPUT Iso-Seq™: Full-Length Transcript Analysis Using SMRT Analysis V2.3. Philip Lobb MSc, 18th March

21 DATA OUTPUT Read classification
Reads are grouped as full-length non-chimeric (flnc) and non-full-length (nfl).
cDNA fraction | Sample | No. of reads of insert (×1000) | No. of flnc reads (×1000) | flnc reads (%) | Average flnc length (×1000)
1-2 kb | 1 | 89,0 | 43,4 | 48 | 1,2
1-2 kb | 2 | 86,2 | 43,4 | 50 | 1,2
2-3 kb | 1 | 93,2 | 42,1 | 45 | 2,3
2-3 kb | 2 | 88,3 | 39,2 | 44 | 2,3
3-6 kb | 1 | 171,0 | 55,8 | 32 | 2,
3-6 kb | 2 | ,4 | 54,1 | 31 | 3,
kb | 1 | 158,9 | 45,1 | 28 | 5,
kb | 2 | ,5 | 40,8 | 30 | 5,9
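The flnc percentage on this slide is the number of flnc reads divided by the number of reads of insert. As a quick arithmetic check, the sketch below recomputes it for the rows whose values are fully legible in the transcript. The helper name `flnc_percent` is hypothetical, and the label "IV" for the last row is an assumption, since the fraction range is missing from the transcript. The slide appears to report whole percentages, so the recomputed values are only expected to agree to within about one percentage point.

```python
# (fraction, sample, reads of insert x1000, flnc reads x1000, reported flnc %)
# Only rows that are fully legible in the transcript are included.
# The "IV" label is an assumption; the transcript lost that fraction's range.
ROWS = [
    ("1-2 kb", 1, 89.0, 43.4, 48),
    ("1-2 kb", 2, 86.2, 43.4, 50),
    ("2-3 kb", 1, 93.2, 42.1, 45),
    ("2-3 kb", 2, 88.3, 39.2, 44),
    ("3-6 kb", 1, 171.0, 55.8, 32),
    ("IV", 1, 158.9, 45.1, 28),
]


def flnc_percent(reads_of_insert: float, flnc_reads: float) -> float:
    """Share of reads of insert classified as full-length non-chimeric, in %."""
    return 100.0 * flnc_reads / reads_of_insert


for fraction, sample, roi, flnc, reported in ROWS:
    computed = flnc_percent(roi, flnc)
    print(f"{fraction:7s} sample {sample}: computed {computed:.1f}%, slide {reported}%")
```

The recomputed shares fall from roughly half of the reads in the 1-2 kb fraction to under a third in the longest fraction, which matches the trend in the slide's percentage column.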

22 DATA OUTPUT Read clustering, consensus calling and quality filtering
Sample | No. of consensus isoforms (×1000) | Average consensus isoform read length | No. of polished high-quality isoforms (×1000) | No. of polished low-quality isoforms (×1000)
1 | 106,7 | 3,3 | 38,4 | 68,
2 | ,2 | 3,6 | 35,6 | 74,5
High-quality clusters: ≥99% accuracy after Quiver polishing
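The share of consensus isoforms that end up as polished high-quality isoforms follows directly from these counts. The sketch below is a small arithmetic illustration with a hypothetical helper `hq_share`. For sample 1 it uses the reported total. For sample 2 the total is truncated in the transcript, so the code assumes that the total equals the high-quality plus the low-quality count. That assumption is consistent with the legible digits of sample 1, but it is not stated on the slide.

```python
def hq_share(hq_isoforms: float, total_isoforms: float) -> float:
    """Percentage of consensus isoforms classified as high quality."""
    return 100.0 * hq_isoforms / total_isoforms


# Sample 1: total and high-quality counts (x1000) as reported on the slide.
sample1_share = hq_share(38.4, 106.7)

# Sample 2: the total is truncated in the transcript, so it is approximated
# as high-quality + low-quality counts (assumption, see lead-in).
sample2_total = 35.6 + 74.5
sample2_share = hq_share(35.6, sample2_total)

print(f"sample 1: {sample1_share:.1f}% high quality")
print(f"sample 2: {sample2_share:.1f}% high quality (total assumed = HQ + LQ)")
```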

23 DATA OUTPUT Coverage of the transcriptomes. All annotated junctions in the transcriptomes were covered by the reads. Coverage by the hq isoforms is close to reaching a plateau. [Figure: SAMPLE 1; series: reads, hq clusters]
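A plateau of this kind is usually judged from a saturation (rarefaction) curve: reads are added in random order and the number of distinct junctions seen so far is recorded. The sketch below shows that general idea on simulated data. It is not the method used in SMRT Analysis or by the author; the function name `saturation_curve`, the input format (one set of junction IDs per read), and the toy abundance model are all assumptions made for illustration.

```python
import random


def saturation_curve(read_junctions, steps=10, seed=0):
    """Count distinct junctions seen in growing random subsamples of reads.

    read_junctions: list in which each element is the set of junction IDs
    covered by one read (hypothetical input format).
    Returns a list of (number_of_reads, distinct_junctions) pairs.
    """
    rng = random.Random(seed)
    order = list(read_junctions)
    rng.shuffle(order)  # random order emulates subsampling at each depth
    checkpoints = {max(1, round(len(order) * i / steps)) for i in range(1, steps + 1)}
    seen = set()
    curve = []
    for n, junctions in enumerate(order, start=1):
        seen.update(junctions)
        if n in checkpoints:
            curve.append((n, len(seen)))
    return curve


# Toy data (assumption): 200 junctions with skewed abundance, 5000 reads,
# each read covering a single junction.
toy_rng = random.Random(42)
junction_ids = list(range(200))
weights = [1.0 / (rank + 1) for rank in range(200)]
toy_reads = [{j} for j in toy_rng.choices(junction_ids, weights=weights, k=5000)]

curve = saturation_curve(toy_reads)
for n_reads, n_junctions in curve:
    print(n_reads, n_junctions)
```

When the count of distinct junctions grows only slightly between the last few checkpoints, additional sequencing is expected to add little new information, which is the sense in which coverage "reaches a plateau".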

24 Conclusions
1. DNA sequences of Muc13A and Muc13B cloned in BACs showed differences in tandem repeat (TR) regions: PacBio sequencing
2. Are those TRs transcribed? Yes.
3. Are there any other candidates in the critical region on pig chromosome 13?
4. Are there any other E. coli receptor candidates in the pig genome? Phenotype-specific (resistant vs. susceptible) transcripts were detected; they will be confirmed with deep sequencing data.
5. Are we able to improve the pig reference genome annotation? Yes, novel junctions were detected.

25 ACKNOWLEDGEMENTS: Weihong Qi, FGCZ; Andrea Patrignani, FGCZ; Stefan Neuenschwander, ETHZ