Barcode Sequence Alignment and Statistical Analysis (Barcas) tool
1 Barcode Sequence Alignment and Statistical Analysis (Barcas) tool Mun, Jihyeob and Kim, Seon-Young Korea Research Institute of Bioscience and Biotechnology
2 Barcode-Sequencing
- Genome-wide screening method that sequences and counts tens of thousands of individual tags (barcodes), each representing a gene, under a given condition
- Originally developed for yeast deletion libraries of Saccharomyces cerevisiae and Schizosaccharomyces pombe
- Now applied to genome-wide siRNA or shRNA screening to measure the effects of gene knock-down
- With CRISPR-Cas9, also applied to genome-wide sgRNA screening for the effects of gene knock-out
3 Examples of genome-wide barcode-sequencing libraries
Library | Organism | # of genes | # of barcodes | References
Yeast deletion consortium | S. cerevisiae | 6,343 | 2 (UP and DN) | www-sequence.stanford.edu/group/
Bioneer pombe collection | S. pombe | 4,836 | 2 (UP and DN) |
MISSION shRNA (human) | H. sapiens | 20, | ,696 shRNA |
MISSION shRNA (human) | M. musculus | 21, | ,072 shRNA |
TRC1 (human) shRNA | H. sapiens | 16,019 | 80,717 shRNA |
TRC1 (mouse) shRNA | M. musculus | 15,960 | 77,819 shRNA |
Human DECIPHER (shRNA) | H. sapiens | 15, | shRNAs |
Mouse DECIPHER (shRNA) | M. musculus | 9, | shRNAs |
Cellecta Genome-wide shRNA | H. sapiens | 19,276 | 8 shRNAs |
Cellecta Genome-wide CRISPR | H. sapiens | 19,001 | 8 sgRNAs |
Human GeCKO v2 | H. sapiens | 19, | ,411 sgRNA |
Mouse GeCKO v2 | M. musculus | 20, | ,209 sgRNA |
Mouse genome-wide v1 (Yusa) | M. musculus | 19,150 | 87,897 sgRNA |
Oxford fly | Drosophila | 13,501 | 40,279 sgRNA |
CRISPRa | H. sapiens | 15, | ,810 sgRNA |
CRISPRi | H. sapiens | 11, | ,421 sgRNA |
4 Workflow: barcoded yeast deletion strains
5 Workflow: genome-wide shRNA screening
6 Basic format of barcode-seq data
Read layout: MID (Multiplexing Index, 4-6 bp) + Universal Primer (20-25 bp) + Barcode (20-30 bp)
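The read layout on this slide (MID, then universal primer, then barcode) can be illustrated with a small parser. This is a minimal sketch, not Barcas code: the function name `split_read`, the fixed-offset strategy, and the example MID and primer sequences are all hypothetical, and it accepts exact primer matches only.

```python
from typing import NamedTuple, Optional


class ParsedRead(NamedTuple):
    mid: str      # multiplexing index (sample tag)
    primer: str   # universal primer
    barcode: str  # gene-specific barcode


def split_read(read: str, mid_len: int, primer: str,
               barcode_len: int) -> Optional[ParsedRead]:
    """Split a read laid out as MID + universal primer + barcode.

    Returns None when the primer is not found exactly at the expected
    offset or when the read is too short to hold a full barcode.
    """
    primer_start = mid_len
    primer_end = primer_start + len(primer)
    if read[primer_start:primer_end] != primer:
        return None
    barcode = read[primer_end:primer_end + barcode_len]
    if len(barcode) < barcode_len:
        return None
    return ParsedRead(read[:mid_len], primer, barcode)


# Hypothetical example: 4-bp MID, 20-bp primer, 20-bp barcode.
PRIMER = "GATAGTCACGCGACCTCATC"
example = "ACGT" + PRIMER + "ACTGACTGACTGACTGACTG"
print(split_read(example, mid_len=4, primer=PRIMER, barcode_len=20))
```

A fixed-offset split like this loses any read with an indel before the barcode, which is the limitation that flexible mapping is meant to address.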
7 Steps of barcode-seq data analysis
Read layout: Multiplex Index (4-6 bp), Universal Primer (20 bp), Barcode (20-30 bp)
- Pre-processing and QC: trim index, trim primer
- Map and count each tag in each sample (tag-by-sample count table)
- Normalization
- Statistical analyses
- Visualization
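The counting and normalization steps listed here can be sketched in a few lines. The slides do not say which normalization Barcas applies, so counts-per-million scaling is used below purely as an assumption; `count_tags` and `counts_per_million` are hypothetical helper names.

```python
from collections import Counter
from typing import Dict, Iterable


def count_tags(mapped_barcodes: Iterable[str]) -> Counter:
    """Count how many reads were mapped to each tag in one sample."""
    return Counter(mapped_barcodes)


def counts_per_million(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw counts so that samples of different depth are comparable.

    Counts-per-million is an assumed normalization chosen for
    illustration, not necessarily the method Barcas uses.
    """
    total = sum(counts.values())
    if total == 0:
        return {tag: 0.0 for tag in counts}
    return {tag: count * 1_000_000 / total for tag, count in counts.items()}


# Toy sample with three tags.
sample1 = count_tags(["tagA", "tagA", "tagB", "tagC"])
print(counts_per_million(sample1))
```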
8 Current tools and methods for barcode-seq data analysis
Tool (or method) | Preprocessing | QC | Normalization | Statistical analysis | Visualization | Software format | Ref.
Barcas | O | O | O | O | O | Java GUI | Mun 2016 BMC Bioinfo
Barcode Deconvoluter | O | X | X | X | X | Windows or Mac GUI |
BiNGS!LSseq & edgeR | O | O | O | O | X | R package | Kim 2012 Methods Mol Biol
edgeR | O | X | O | O | X | R package | Dai 2014 F1000Res
HiTSelect | X | X | X | Multi-objective ranking | O | Matlab runtime | Diaz 2015 Nucleic Acids Res
MAGeCK | O | O | O | O | X | Python, C source code | Li 2014 Genome Biol
MAGeCK-VISPR | O | O | O | Robust rank aggregation | O | Python script | Li 2015 Genome Biol
RIGER | X | X | X | RNAi Gene Enrichment Ranking | O | GENE-E (=> Morpheus) Java GUI | Luo 2008 PNAS
RSA | X | X | X | Iterative hypergeometric P-value | X | Windows GUI (C#), R, Perl | Konig 2007 Nat Methods
ScreenBEAM | X | X | X | Pooled scoring | X | R package | Yu 2015 Bioinformatics
shALIGN & shRNAseq | O | O | O | O | X | Perl and R script | Sims 2011 Genome Biol
9 Barcas (Barcode sequence Alignment and Statistical Analysis)
- Barcas is an all-in-one program for the analysis of multiplexed barcode sequencing (barcode-seq) data
- Available at
- Input: barcode-seq data
  - Genome-wide shRNAs (Cellecta, TRC, Sigma-Aldrich, etc.)
  - Genome-wide sgRNAs (Addgene, Cellecta, etc.)
  - Barcoded yeast deletion strains: S. cerevisiae or S. pombe
- Preprocessing & Mapping: filtering, trimming, and mapping with mismatches and indels
- Quality Control (of barcodes and samples)
- Normalization
- Statistical Analysis: two-condition comparison, multiple time points
- Visualization: various graphs and heatmaps
10 All-in-one package with user-friendly GUI
- Step 1: Pre-processing & Mapping
- Step 2: QC of data quality
- Step 3: Design experiment
- Step 4: Statistical analysis
11 Step 1: Data preprocessing and mapping
- De-multiplexing and trimming (universal primers)
- Mapping with imperfect matches (mismatches and indels)
- Searching for individual tag sequences
12 Step 2: Data quality evaluation
- Sequence level: overall sequence quality
- Sample level: mapping counts and percentage, etc.
- Barcode (or tag) level: mapping counts and percentage, etc.
13 Step 3: Experimental design
- Comparison of two conditions
- Comparison across several time points
14 Step 4: Statistical analysis and Visualization
- Calculates a z-score and p-value for each barcode
- Ranks each barcode by z-score
- Plots a z-score graph
- Plots a time-dependent intensity heat-map
- Allows searching for an individual target gene
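The slide lists a z-score and p-value per barcode without giving the formula. The sketch below shows one common way to obtain such scores for a two-condition comparison, under assumptions that are not taken from Barcas: the score is the log2 ratio of treated to control counts (with a pseudocount), standardized across all barcodes, and the two-sided p-value comes from a standard normal distribution. The name `barcode_z_scores` is hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def barcode_z_scores(control: Dict[str, float], treated: Dict[str, float],
                     pseudocount: float = 1.0) -> List[Tuple[str, float, float]]:
    """Return (barcode, z-score, two-sided p-value), ranked by z-score.

    Assumed scoring: log2((treated + pc) / (control + pc)) per barcode,
    standardized over all barcodes present in both conditions.
    """
    tags = sorted(control.keys() & treated.keys())
    lfc = [math.log2((treated[t] + pseudocount) / (control[t] + pseudocount))
           for t in tags]
    mu = mean(lfc)
    sd = pstdev(lfc)
    results = []
    for tag, value in zip(tags, lfc):
        z = 0.0 if sd == 0 else (value - mu) / sd
        # Two-sided tail probability of a standard normal variable.
        p = math.erfc(abs(z) / math.sqrt(2))
        results.append((tag, z, p))
    results.sort(key=lambda row: row[1])  # most depleted barcodes first
    return results


control = {"geneA": 100, "geneB": 100, "geneC": 100, "geneD": 100}
treated = {"geneA": 100, "geneB": 110, "geneC": 12, "geneD": 95}
for tag, z, p in barcode_z_scores(control, treated):
    print(f"{tag}\tz={z:.2f}\tp={p:.3f}")
```

Ranking by z-score puts strongly depleted barcodes at the top and enriched ones at the bottom, matching the ranking step on the slide.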
15 Novel functions of Barcas for data pre-processing and QC
- Flexible mapping with support for both substitutions and indels
- Detection of erroneous barcodes in the library
- Checking similarity among barcodes in the library collection
16 Existing tools for data preprocessing
Name | Mismatches | Shifts of the position | Indel | Backend tool | Ref.
BiNGS!LSseq | O | X | X | bowtie | Kim (2012) Methods Mol Biol
shALIGN | O | X | X | Perl script (or bowtie) | Sims (2011) Genome Biol
edgeR | O | O | X | edgeR | Dai (2014) F1000Res
Barcas | O | O | O | Trie data structure | Mun (2016) BMC Bioinfo
Example reads (layout: MID, Universal Primer, Barcode (shRNA)):
Original barcode | TCAAAGATAGTCACGCGACCTCATCGACGAGCTACC
Perfect match | TCAAAGATAGTCACGCGACCTCATCGACGAGCTACC
Mismatches | TCAAAGATAGTCACGCGACCTCATCGACGAGCTACC
Position shift | TCAAAGATAGTCACGCGACC-ATCGACGAGCTACC
Indel | TCAAAGATAGTCACGCGACCTCATCGA--AGCTACC
17 Trie data structure
- Data structure based on a prefix tree
- Useful for storing a dynamic set or associative array whose keys are usually strings
- More efficient than a hash table (or dictionary) or list in terms of look-up speed and memory
Algorithm: list based (1:M sequence matching)
- Maximum time: N * M (N: read count, M: reference count)
- Read AGCT is compared with each library reference in turn: CGCT, GCCAA, TTAG, TCAGT, GCAG, TTAT, AGCT
Algorithm: tree based (1:1 sequence matching)
- Maximum time: N (N: read count)
- Read AGCT is matched by walking the library reference trie from the root, one base per level
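The tree-based lookup described on this slide can be sketched as a plain prefix tree. This is an illustrative implementation, not the Barcas source; the class names are hypothetical, and the library is the set of reference sequences shown on the slide.

```python
class TrieNode:
    __slots__ = ("children", "barcode_id")

    def __init__(self):
        self.children = {}      # base -> TrieNode
        self.barcode_id = None  # set only where a reference barcode ends


class BarcodeTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, sequence: str, barcode_id: str) -> None:
        """Add one reference barcode, sharing prefixes with earlier ones."""
        node = self.root
        for base in sequence:
            node = node.children.setdefault(base, TrieNode())
        node.barcode_id = barcode_id

    def lookup(self, read: str):
        """Exact match: one step per base, independent of library size."""
        node = self.root
        for base in read:
            node = node.children.get(base)
            if node is None:
                return None
        return node.barcode_id


trie = BarcodeTrie()
for reference in ["CGCT", "GCCAA", "TTAG", "TCAGT", "GCAG", "TTAT", "AGCT"]:
    trie.insert(reference, reference)

print(trie.lookup("AGCT"))  # found after walking four nodes
print(trie.lookup("AGCA"))  # no reference ends here
```

Each read is resolved in a number of steps bounded by its length, so total time grows with the read count N rather than with N * M as in the list-based scan.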
18 1. Data structure of Barcas for mapping
- Based on a trie data structure, Barcas supports imperfect matching that allows mismatches, base shifting and indels
- Dynamic sequence lengths
- Dynamic start positions
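Imperfect matching on a trie can be sketched as a depth-first walk that carries an edit budget. The code below is an assumption-laden illustration rather than the Barcas algorithm: `build_trie` and `fuzzy_lookup` are hypothetical names, the trie is a nested dict with `"$"` marking the end of a barcode, and the barcodes are toy sequences.

```python
from typing import Dict


def build_trie(barcodes: Dict[str, str]) -> dict:
    """Build a nested-dict trie; the key "$" stores the barcode id."""
    root: dict = {}
    for barcode_id, sequence in barcodes.items():
        node = root
        for base in sequence:
            node = node.setdefault(base, {})
        node["$"] = barcode_id
    return root


def fuzzy_lookup(root: dict, read: str, max_edits: int = 1) -> Dict[str, int]:
    """Return {barcode_id: fewest edits} for barcodes within max_edits.

    Edits are substitutions, bases missing from the read, and extra
    bases in the read, so matched reads may differ in length.
    """
    best: Dict[str, int] = {}

    def walk(node: dict, pos: int, edits: int) -> None:
        if edits > max_edits:
            return
        if pos == len(read) and "$" in node:
            barcode_id = node["$"]
            best[barcode_id] = min(edits, best.get(barcode_id, edits))
        for base, child in node.items():
            if base == "$":
                continue
            if pos < len(read):
                # Match (free) or substitution (one edit).
                walk(child, pos + 1, edits + (read[pos] != base))
            # Reference base missing from the read.
            walk(child, pos, edits + 1)
        if pos < len(read):
            # Extra base in the read.
            walk(node, pos + 1, edits + 1)

    walk(root, 0, 0)
    return best


library = build_trie({"bc1": "ACGTACGT", "bc2": "TTGGCCAA"})
print(fuzzy_lookup(library, "ACGAACGT"))   # one substitution
print(fuzzy_lookup(library, "ACGACGT"))    # one base missing
print(fuzzy_lookup(library, "ACGTTACGT"))  # one extra base
```

The search cost grows quickly with the edit budget, so a small `max_edits` keeps the walk close to the exact-match path.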
19 Comparison of speed and mapping rate of Barcas with bowtie and the edgeR package of R
Data: 215 million reads were mapped to 4,832 heterozygous diploid deletion strains of S. pombe. 45-bp sequences were used as the barcode library.
Result: Barcas was 1.7 times faster than bowtie and 13 times faster than edgeR. Owing to indel mapping, Barcas mapped 8-12% more reads than the other two programs.
20 2. Detection of erroneous barcodes from the genome-wide barcode library
- It is tempting to assume that the barcode sequences in a library match the original design exactly
- However, errors can creep into the barcodes at many steps, including barcode synthesis, random mutation during library maintenance, and, in the case of yeast strains, erroneous incorporation of barcodes into the genome
21 Erroneous barcodes in the yeast library
References:
- Eason et al (2004) Characterization of synthetic DNA bar codes in Saccharomyces cerevisiae gene-deletion strains. PNAS 101(30):
- Smith et al (2009) Quantitative phenotyping via deep barcode sequencing. Genome Res 19:
Barcode regions (columns): U1, UpTag, U2, D2, DnTag, D1
# correct by Smith: 4,242 4,369 4,045 4,207 4,320 3,
% correct by Smith: % 82.5% 82.9% 80.9% 83.1% 83.7%
# correct by Eason: ,764 4,057 4,343 3,807 4,
% correct by Eason: % 71.1% 83.2% 83.5% 73.2% 88.7%
% agreed: 86% 84.4% 89.2% 92.6% 85.1% 92%
22 A simple method to detect erroneous barcodes
Measure the gain in count between perfect matches only (PM) and perfect matches plus mismatches (PM + MM).
Case 1: dominant perfect match with minor mismatches
Original design: ACTGACTGACTGACTGACTG
Match | Sequence | Counts
Perfect | ACTGACTGACTGACTGACTG | 50,000
Mismatch 1 | ACTGACTGACTGACTGCCTG | 10
Mismatch 2 | ACTCACTGACTGACTGACTG | 9
Mismatch 3 | ACTGACAGACTGACTGACTG | 20
Mismatch 4 | ACTGACTGACTTACTGACTG | 3
Mismatch 5 | AGTGACTGACTGACTGACTG | 7
Mismatch 6 | ACTGACTGACTGACTGTCTG | 12
Mismatch 7 | ACTGACTGACTAACTGACTG | 5
PM only: 50,000. PM + MM: 50,066. Gain: 50,066 / 50,000 = 1.0013, a 0.13% gain.
Case 2: one dominant mismatch with minor perfect match and other mismatches
Original design: ACTGACTGACTGACTGACTG
Match | Sequence | Counts
Perfect | ACTGACTGACTGACTGACTG | 200
Mismatch 1 | ACTGACTGACTGACTGCCTG | 40,000
Mismatch 2 | ACTCACTGACTGACTGACTG | 11
Mismatch 3 | ACTGACAGACTGACTGACTG | 12
Mismatch 4 | ACTGACTGACTTACTGACTG | 3
Mismatch 5 | AGTGACTGACTGACTGACTG | 12
Mismatch 6 | ACTGACTGACTGACTGTCTG | 9
Mismatch 7 | ACTGACTGACTAACTGACTG | 5
PM only: 200. PM + MM: 40,252. Gain: 40,252 / 200 = 201.26, about a 200-fold gain.
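The gain measure on this slide is a simple ratio, sketched below with the per-sequence counts listed in the two cases. The function names and the `max_gain` cut-off are hypothetical; the slides do not state which threshold Barcas uses to call a barcode erroneous.

```python
from typing import Sequence


def mismatch_gain(perfect_count: int, mismatch_counts: Sequence[int]) -> float:
    """Ratio of (PM + MM) counts to PM-only counts for one barcode."""
    total = perfect_count + sum(mismatch_counts)
    if perfect_count == 0:
        return float("inf") if total > 0 else 1.0
    return total / perfect_count


def looks_erroneous(perfect_count: int, mismatch_counts: Sequence[int],
                    max_gain: float = 2.0) -> bool:
    """Flag a barcode whose counts come mostly from mismatched reads.

    max_gain is an illustrative threshold, not a documented default.
    """
    return mismatch_gain(perfect_count, mismatch_counts) > max_gain


# Case 1: the designed sequence dominates.
case1 = mismatch_gain(50_000, [10, 9, 20, 3, 7, 12, 5])
# Case 2: a single mismatched sequence dominates.
case2 = mismatch_gain(200, [40_000, 11, 12, 3, 12, 9, 5])
print(f"case 1 gain: {case1:.4f}")
print(f"case 2 gain: {case2:.2f}")
```

A gain near 1 means mismatched reads are background noise, while a gain far above 1 suggests the barcode actually present in the library differs from the design.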
23 Detection of erroneous barcodes
- Library: 1,230 shRNA sequences of the TRC library
- Data: control samples in neuroepithelial (NE), early radial glial (ERG) and mid radial glial (MRG) cells
- We found 25 erroneous barcodes (2.03%)
Ziller, MJ. et al., Nature 2015, 518,
24 Detection of erroneous barcodes (TRC)
Gene | ID | Original sequence | Major mapped sequence (two mismatches/indels) | PM count | MM count
PBX2 | TRCN | ATACTCCCACTTGCAACTATT | ATACTCCCACTTGTAACTATT | 10,785 | 34,084
SKI | TRCN | GAATCTGCCACTCTCAGAATA | -AATCTGCCACTCTCAGAATA | 14 | 5,935
TERF2IP | TRCN | GAGAGTTCTTGCATTGGAACT | -AGAGTTCTTGCATTGGAACT | 4 | 1,244
SKI | TRCN | GATCGAAGACCTGCAGGTGAA | -ATCGAAGACCTGCAGGTGAA | |
MYC | TRCN | GAATGTCAAGAGGCGAACACA | -AATGTCAAGAGGCGAACACA | |
JDP2 | TRCN | CGGGAGAAGAACAAAGTCGCA | CGGGAGAAGAACAAAAACGCA | |
TFAP2B | TRCN | CGGTTCTTTCGAGTTTAGTAA | CGGTTCTTTTGAGTTTTGTAA | |
NFFKB | TRCN | CAGGGAGGTTGCATCATTGTT | CAGGGAGGGTGCATCATTGTT | |
KLF13 | TRCN | CGGGCGAGAAGAAGTTCAGCT | CGGGCGAGAAGAAGTTCATGGT | |
25 3. Check for sequence similarity among barcodes in a reference
- Erroneous barcodes can be generated during the production of many barcodes
- If two barcodes were designed to be similar (e.g., only a 1-bp difference) and mutations or sequencing errors occur, it is hard to distinguish errors from true differences
- Thus, barcodes originally designed to be similar should be identified (and flagged) in advance
- For this purpose, Barcas allows checking of sequence similarity among barcode sequences
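A similarity check of this kind can be sketched as a pairwise scan for barcodes that lie within one edit of each other. The helper names are hypothetical, the toy library is invented for illustration, and the quadratic all-pairs loop is only suitable for small libraries; it is not the method Barcas uses.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def within_one_edit(a: str, b: str) -> bool:
    """True if a and b differ by at most one substitution or one indel."""
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) sequence
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        # Skip the first differing position in both sequences.
        return a[i + 1:] == b[i + 1:]
    # Skip one extra base in the longer sequence.
    return a[i:] == b[i + 1:]


def similar_pairs(barcodes: Dict[str, str]) -> List[Tuple[str, str]]:
    """List every pair of barcode ids whose sequences are within one edit."""
    items = sorted(barcodes.items())
    return [(id_a, id_b)
            for (id_a, seq_a), (id_b, seq_b) in combinations(items, 2)
            if within_one_edit(seq_a, seq_b)]


toy_library = {"a": "ACGT", "b": "ACGA", "c": "ACG", "d": "TTTT"}
print(similar_pairs(toy_library))
```

Pairs reported here are candidates to flag before mapping, since a single sequencing error could turn one barcode into the other.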
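26 Library reference QC
Tested public library sets (11)
Screen | Library | Date | Species | Module | Barcode length | Barcode count | Gene count
shRNA | TRC | 05/Apr/11 | | | 21-bp | 61,621 | 15,435
shRNA | Cellecta | 15/Feb/12 | Human | Module1 | 18-bp | 27,500 | 5,046
shRNA | Cellecta | 15/Feb/12 | Human | Module2 | 18-bp | 27,500 | 5,421
shRNA | Cellecta | 15/Feb/12 | Human | Module3 | 18-bp | 27,500 | 4,923
sgRNA | Yusa | | Mouse | | 19-bp | 87,437 | 19,149
sgRNA | GeCKOv2 | 09/Mar/15 | Human | Library A | 20-bp | 63,950 | 21,669
sgRNA | GeCKOv2 | 09/Mar/15 | Human | Library B | 20-bp | 56,869 | 19,834
sgRNA | GeCKOv2 | 09/Mar/15 | Mouse | Library A | 20-bp | 65,959 | 22,486
sgRNA | GeCKOv2 | 09/Mar/15 | Mouse | Library B | 20-bp | 61,139 | 21,263
Deletion mutant strains | Heterozygous diploid | | Saccharomyces cerevisiae | | 20-bp | 6,318/UP, 6,126/DN | 6,131
Deletion mutant strains | Heterozygous diploid | | Schizosaccharomyces pombe | | 20-bp | 4,832/UP, 4,832/DN | 4,832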
27 Library reference QC
Barcode counts having similar pairs within one base
Library | Static sequence length comparison | Dynamic sequence length comparison (indels)
GeCKOv2.Human.A | 517 (0.81%) | 538 (0.84%)
GeCKOv2.Human.B | 437 (0.77%) | 441 (0.78%)
GeCKOv2.Mouse.A | 736 (1.12%) | 755 (1.14%)
GeCKOv2.Mouse.B | 850 (1.39%) | 860 (1.41%)
Yusa | 517 (0.59%) | 3,944 (4.51%)
Cellecta.Human.M1 | 0 (0%) | 412 (1.5%)
Cellecta.Human.M2 | 0 (0%) | 398 (1.45%)
Cellecta.Human.M3 | 0 (0%) | 410 (1.49%)
TRC | 790 (1.28%) | 1,909 (3.10%)
S. cerevisiae | 0 (0%) | 0 (0%)
S. pombe | 0 (0%) | 0 (0%)
28 Conclusions
- Barcas is all-in-one software for barcode-seq data analysis, with a user-friendly interface and several new functions for data pre-processing and quality control of barcode libraries
- Future improvements
  - Support for diverse statistical analyses
  - Sophisticated gene-level summary statistics for shRNA and sgRNA: RSA, RIGER, MAGeCK, HiTSelect, ScreenBEAM, etc.
  - Multiple-condition comparison (MAGeCK-VISPR)
  - Utilization of metadata and gene-set level analysis (HiTSelect)
- We hope Barcas will be useful for barcode-seq data analysis by the many researchers who have minimal bioinformatics skills
29 Thank you for your attention
30 Limits of the mapping of the edgeR package
1. Indels in the barcode reads are not supported
2. Only shifts of the barcode positions are allowed
3. Mismatches in the MID and universal primers are not allowed
4. Indels in the MID and universal primers are not allowed
Consequences:
- Loss of sequences with indels in any of the MID, primers and barcodes
- Loss of sequences with mismatches in the MID and primers
Read format: MID, Universal Primer, Barcode (shRNA)
Example 1: TRC library. Different lengths of universal primers: forward 37 bp, reverse 42 bp. Read layouts: Universal Primer (sense) + Barcode (shRNA); Universal Primer (anti-sense) + Barcode (shRNA)
Example 2: Cellecta library. Different MID lengths: from 9 to 17 bp. Read layout: MID + Universal Primer + Barcode (shRNA)
Product Applications for the Sequence Analysis Collection Pipeline Pilot Contents Introduction... 1 Pipeline Pilot and Bioinformatics... 2 Sequence Searching with Profile HMM...2 Integrating Data in a
More informationRNASEQ WITHOUT A REFERENCE
RNASEQ WITHOUT A REFERENCE Experimental Design Assembly in Non-Model Organisms And other (hopefully useful) Stuff Meg Staton mstaton1@utk.edu University of Tennessee Knoxville, TN I. Project Design Things
More informationVCGDB: A Virtual and Dynamic Genome Database of the Chinese Population
VCGDB: A Virtual and Dynamic Genome Database of the Chinese Population Jiayan Wu Associate Professor Director of Science and Technology Department Director of Core Facility Beijing Institute of Genomics,
More informationDe novo metatranscriptome assembly and coral gene expression profile of Montipora capitata with growth anomaly
Additional File 1 De novo metatranscriptome assembly and coral gene expression profile of Montipora capitata with growth anomaly Monika Frazier, Martin Helmkampf, M. Renee Bellinger, Scott Geib, Misaki
More informationInnovative Trait Development Tools in Plant Breeding will be Crucial for Doubling Global Agricultural Productivity by 2050
Innovative Trait Development Tools in Plant Breeding will be Crucial for Doubling Global Agricultural Productivity by 2050 Greg Gocal, Ph.D., Senior Vice President, Research and Development CRISPR Precision
More informationAnalysing genomes and transcriptomes using Illumina sequencing
Analysing genomes and transcriptomes using Illumina uencing Dr. Heinz Himmelbauer Centre for Genomic Regulation (CRG) Ultrauencing Unit Barcelona The Sequencing Revolution High-Throughput Sequencing 2000
More informationIntroduction. Lance Martin, BIOFAB Operations
Introduction Lance Martin, BIOFAB Operations Several capacities needed to support EOU engineering Design Libraries Feature 1 Variants Feature 2 Variants Assemble Clone Assay Analyze Feature 1 Feature 2
More informationarxiv: v1 [q-bio.gn] 25 Nov 2015
MetaScope - Fast and accurate identification of microbes in metagenomic sequencing data Benjamin Buchfink 1, Daniel H. Huson 1,2 & Chao Xie 2,3 arxiv:1511.08753v1 [q-bio.gn] 25 Nov 2015 1 Department of
More informationNext Generation Sequencing: Data analysis for genetic profiling
Next Generation Sequencing: Data analysis for genetic profiling Raed Samara, Ph.D. Global Product Manager Raed.Samara@QIAGEN.com Welcome to the NGS webinar series - 2015 NGS Technology Webinar 1 NGS: Introduction
More informationComplementary Technologies for Precision Genetic Analysis
Complementary NGS, CGH and Workflow Featured Publication Zhu, J. et al. Duplication of C7orf58, WNT16 and FAM3C in an obese female with a t(7;22)(q32.1;q11.2) chromosomal translocation and clinical features
More informationRNA Secondary Structure Prediction Computational Genomics Seyoung Kim
RNA Secondary Structure Prediction 02-710 Computational Genomics Seyoung Kim Outline RNA folding Dynamic programming for RNA secondary structure prediction Covariance model for RNA structure prediction
More informationCancer Genetics Solutions
Cancer Genetics Solutions Cancer Genetics Solutions Pushing the Boundaries in Cancer Genetics Cancer is a formidable foe that presents significant challenges. The complexity of this disease can be daunting
More informationNOW GENERATION SEQUENCING. Monday, December 5, 11
NOW GENERATION SEQUENCING 1 SEQUENCING TIMELINE 1953: Structure of DNA 1975: Sanger method for sequencing 1985: Human Genome Sequencing Project begins 1990s: Clinical sequencing begins 1998: NHGRI $1000
More informationMutations during meiosis and germ line division lead to genetic variation between individuals
Mutations during meiosis and germ line division lead to genetic variation between individuals Types of mutations: point mutations indels (insertion/deletion) copy number variation structural rearrangements
More informationQuick reference guide
Quick reference guide Our Invitrogen GeneArt CRISPR Search and Design Tool allows you to search our database of >600,000 predesigned CRISPR guide RNA (grna) sequences or analyze your sequence of interest
More informationSequence Assembly and Alignment. Jim Noonan Department of Genetics
Sequence Assembly and Alignment Jim Noonan Department of Genetics james.noonan@yale.edu www.yale.edu/noonanlab The assembly problem >>10 9 sequencing reads 36 bp - 1 kb 3 Gb Outline Basic concepts in genome
More informationNext Generation Sequencing. Target Enrichment
Next Generation Sequencing Target Enrichment Next Generation Sequencing Your Partner in Every Step from Sample to Data NGS: Revolutionizing Genetic Analysis with Single-Molecule Resolution Next generation
More informationRADSeq Data Analysis. Through STACKS on Galaxy. Yvan Le Bras Anthony Bretaudeau Cyril Monjeaud Gildas Le Corguillé
RADSeq Data Analysis Through STACKS on Galaxy Yvan Le Bras Anthony Bretaudeau Cyril Monjeaud Gildas Le Corguillé RAD sequencing: next-generation tools for an old problem INTRODUCTION source: Karim Gharbi
More informationBio 311 Learning Objectives
Bio 311 Learning Objectives This document outlines the learning objectives for Biol 311 (Principles of Genetics). Biol 311 is part of the BioCore within the Department of Biological Sciences; therefore,
More informationAlignment. J Fass UCD Genome Center Bioinformatics Core Wednesday December 17, 2014
Alignment J Fass UCD Genome Center Bioinformatics Core Wednesday December 17, 2014 From reads to molecules Why align? Individual A Individual B ATGATAGCATCGTCGGGTGTCTGCTCAATAATAGTGCCGTATCATGCTGGTGTTATAATCGCCGCATGACATGATCAATGG
More informationNext Generation Sequencing Lecture Saarbrücken, 19. March Sequencing Platforms
Next Generation Sequencing Lecture Saarbrücken, 19. March 2012 Sequencing Platforms Contents Introduction Sequencing Workflow Platforms Roche 454 ABI SOLiD Illumina Genome Anlayzer / HiSeq Problems Quality
More informationRIPTIDE HIGH THROUGHPUT RAPID LIBRARY PREP (HT-RLP)
Application Note: RIPTIDE HIGH THROUGHPUT RAPID LIBRARY PREP (HT-RLP) Introduction: Innovations in DNA sequencing during the 21st century have revolutionized our ability to obtain nucleotide information
More informationRNA-Seq Workshop AChemS Sunil K Sukumaran Monell Chemical Senses Center Philadelphia
RNA-Seq Workshop AChemS 2017 Sunil K Sukumaran Monell Chemical Senses Center Philadelphia Benefits & downsides of RNA-Seq Benefits: High resolution, sensitivity and large dynamic range Independent of prior
More informationBioinformatics small variants Data Analysis. Guidelines. genomescan.nl
Next Generation Sequencing Bioinformatics small variants Data Analysis Guidelines genomescan.nl GenomeScan s Guidelines for Small Variant Analysis on NGS Data Using our own proprietary data analysis pipelines
More informationGenome Sequence Assembly
Genome Sequence Assembly Learning Goals: Introduce the field of bioinformatics Familiarize the student with performing sequence alignments Understand the assembly process in genome sequencing Introduction:
More information