Construction of a De Bruijn Graph for Assembly from a Truncated Suffix Tree

Construction of a De Bruijn Graph for Assembly from a Truncated Suffix Tree. Bastien Cazaux, Thierry Lecroq, Eric Rivals. LIRMM & IBC, Montpellier - LITIS, Rouen. March 3, 2015

Introduction. De Bruijn Graph for assembly. R = {c, cc, c, cc, cc}. Cazaux, Lecroq, Rivals, Truncated Suffix Tree & DBG, 1 / 30

Generalized Suffix Tree (GST). R = {c, cc, c, cc, cc}. [Figure: the generalized suffix tree of R.]

Generalized Suffix Tree with a cut. R = {c, cc, c, cc, cc}. [Figure: the GST of R with a cut.]

Truncated Suffix Tree (TST). R = {c, cc, c, cc, cc}. [Figure: the truncated suffix tree of R.]

Motivation. The De Bruijn Graph is widely used in de novo genome assembly [Pevzner et al., 2001]. For some applications, for instance error correction, a suffix tree is built before the assembly [Salmela, 2010]. There exist algorithms that build the De Bruijn Graph directly [Onodera et al., 2013] [Rodland, 2013], as well as the Contracted De Bruijn Graph [Cazaux et al., 2014] [Chikhi et al., 2014].

Indexing data structures. Numerous data structures (suffix tree, affix tree, suffix table, factor automata, etc.) index one or several texts (generalized index) and are functionally equivalent. Result: we can directly build the assembly De Bruijn graph, in classical or contracted form, from an indexing data structure [Cazaux et al., 2014]. Question: how can we do it without using more space than necessary?

Outline:
1. Chain of suffix-dependent strings and Tree
2. Truncated Suffix Tree (TST)
3. De Bruijn Graph via the TST
4. Conclusion

Chain of suffix-dependent strings and Tree

String. Definition [Gusfield 1997]: Let w be a string. A substring of w is a string included in w, a prefix of w is a substring that begins w, and a suffix of w is a substring that ends w. An overlap between w and v is a suffix of w that is also a prefix of v.
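To make these definitions concrete, here is a minimal Python sketch. The helper names `is_prefix`, `is_suffix` and `overlaps` are hypothetical, chosen for illustration only. It lists every non-empty overlap between w and v, i.e. every suffix of w that is also a prefix of v, by direct string comparison; it is quadratic in the worst case and is not meant as an efficient overlap algorithm.

```python
def is_prefix(x: str, w: str) -> bool:
    """x is a prefix of w: a substring that begins w."""
    return w.startswith(x)


def is_suffix(x: str, w: str) -> bool:
    """x is a suffix of w: a substring that ends w."""
    return w.endswith(x)


def overlaps(w: str, v: str) -> list[str]:
    """Non-empty overlaps between w and v (suffixes of w that are also
    prefixes of v), listed from shortest to longest."""
    result = []
    for length in range(1, min(len(w), len(v)) + 1):
        candidate = w[len(w) - length:]  # suffix of w of this length
        if is_prefix(candidate, v):
            result.append(candidate)
    return result


if __name__ == "__main__":
    print(overlaps("acgt", "gtac"))  # ['gt']
    print(overlaps("abab", "abab"))  # ['ab', 'abab']
```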

Norm of a set of words. R = {c, cc, c, cc, cc}, $\|R\| = \sum_{w_i \in R} |w_i| = 7 + 5 + 6 + 7 + 6 = 31$.
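The complexity bounds that follow are all stated in terms of $\|R\|$, the total length of the reads. As a small sketch (the function name `norm` and the reads are hypothetical; the reads were chosen only to have lengths 7, 5, 6, 7 and 6 as in the example):

```python
def norm(reads: list[str]) -> int:
    """||R||: the sum of the lengths |w_i| of the words w_i in R."""
    return sum(len(w) for w in reads)


# Hypothetical reads chosen only to have lengths 7, 5, 6, 7 and 6.
example = ["acgtacg", "cgtac", "gtacgt", "tacgtac", "acgtac"]

if __name__ == "__main__":
    print(norm(example))  # 31
```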

Suffix Tree. [Figure: an example suffix tree with end marker $.] Theorem: The GST of a set of words R takes linear space in $\|R\|$.
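The linear bound holds for the contracted tree, not for the naive suffix trie, whose size can be quadratic. The sketch below illustrates only this size bound. It builds the uncontracted generalized suffix trie in quadratic time and gives each read its own terminator. It then counts the nodes that survive contraction: the root, the branching nodes, and one leaf per suffix, for at most $2\|R\|$ in total. The function `gst_node_counts` is hypothetical. A linear-space construction would build the contracted tree directly rather than going through the trie.

```python
def gst_node_counts(reads: list[str]) -> tuple[int, int]:
    """Return (nodes of the uncontracted generalized suffix trie,
    nodes left after contracting unary nodes)."""
    root: dict = {}
    trie_nodes = 1  # the root
    for i, w in enumerate(reads):
        # Each read gets its own terminator ("$", i) so every suffix ends at its own leaf.
        text = list(w) + [("$", i)]
        for start in range(len(w)):  # every non-empty suffix of w, followed by its terminator
            node = root
            for symbol in text[start:]:
                if symbol not in node:
                    node[symbol] = {}
                    trie_nodes += 1
                node = node[symbol]

    def contracted(node: dict, is_root: bool) -> int:
        # Keep the root, every leaf (0 children) and every branching node (>= 2 children);
        # unary nodes are merged into the edge below them.
        keep = is_root or len(node) != 1
        return int(keep) + sum(contracted(child, False) for child in node.values())

    return trie_nodes, contracted(root, True)


if __name__ == "__main__":
    print(gst_node_counts(["abcdefghij"]))  # (66, 11)
```

With ten distinct letters, no suffix shares a branch with another, so the trie keeps every letter as a node while the contracted tree keeps only the root and ten leaves.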

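Chain of suffix-dependent strings. Definition: A string $x$ is said to be suffix-dependent of another string $y$ if $x[2..|x|]$ is a prefix of $y$. Let $w$ be a string and $m$ a positive integer smaller than $|w| + 1$. An $m$-tuple of strings $(x_1, \dots, x_m)$ is a chain of suffix-dependent strings of $w$ if $x_1$ is a prefix of $w$ and, for each $i \in [2, m]$, $x_i$ is a prefix of $w[i, |w|]$ such that $|x_i| \ge |x_{i-1}| - 1$. The length condition makes each $x_{i-1}$ suffix-dependent of $x_i$: $x_{i-1}[2..|x_{i-1}|]$ and $x_i$ are both prefixes of $w[i, |w|]$, and the first is not longer than the second. [Figure: a word $w$ with a chain $x_1, x_2, x_3, x_4$.]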
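A direct check of the definition, as a minimal sketch. The names `is_suffix_dependent` and `is_chain` are hypothetical. Positions follow the slide's 1-based convention. The bound $1 \le m \le |w|$ is an assumption: it is the largest range for which every $w[i, |w|]$ is non-empty.

```python
def is_suffix_dependent(x: str, y: str) -> bool:
    """x is suffix-dependent of y when x[2..|x|] (x minus its first letter) is a prefix of y."""
    return y.startswith(x[1:])


def is_chain(w: str, chain: tuple[str, ...]) -> bool:
    """Is chain = (x_1, ..., x_m) a chain of suffix-dependent strings of w?"""
    m = len(chain)
    if not 1 <= m <= len(w):  # assumed bound on m
        return False
    for i in range(1, m + 1):  # i is the 1-based index of x_i, as on the slide
        x = chain[i - 1]
        if not w[i - 1:].startswith(x):  # x_i must be a prefix of w[i, |w|]
            return False
        if i >= 2 and len(x) < len(chain[i - 2]) - 1:  # |x_i| >= |x_{i-1}| - 1
            return False
    return True


if __name__ == "__main__":
    print(is_chain("abaab", ("aba", "baa", "aab", "ab")))  # True
    print(is_chain("abaab", ("aba", "b")))  # False: |"b"| < |"aba"| - 1
```

The tuple of all suffixes of w is always a chain, which is why suffix trees appear as a special case below.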

$T(S)$ tree. Definition: Let $R = \{w_1, \dots, w_n\}$ be a set of strings and $S = \{C_1, \dots, C_n\}$ a set of tuples such that, for $i \in [1, n]$, $C_i$ is a chain of suffix-dependent strings of $w_i$. $T(S)$ is the contracted Aho-Corasick tree of $S$. [Figure: a word $w$, a chain $x_1, \dots, x_4$, and the corresponding tree $T(S)$.]
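A naive sketch of $T(S)$, assuming that the "contracted Aho-Corasick tree" is the trie of all strings occurring in the chains (no failure links) in which unary nodes that end no string are merged away. The names `TrieNode`, `build_trie` and `contracted_nodes` are hypothetical. This construction takes time proportional to the total length of the strings in S, which can exceed $\|R\|$. It is not the $O(\|R\|)$ construction of the theorem.

```python
class TrieNode:
    """A node of the uncontracted Aho-Corasick tree, i.e. a trie without failure links."""

    def __init__(self) -> None:
        self.children: dict[str, "TrieNode"] = {}
        # (i, j) pairs: this node spells x_j of chain C_i (1-based, as on the slides)
        self.labels: list[tuple[int, int]] = []


def build_trie(chains: list[tuple[str, ...]]) -> TrieNode:
    """Insert every string x_j of every chain C_i into a trie."""
    root = TrieNode()
    for i, chain in enumerate(chains, start=1):
        for j, x in enumerate(chain, start=1):
            node = root
            for letter in x:
                child = node.children.get(letter)
                if child is None:
                    child = node.children[letter] = TrieNode()
                node = child
            node.labels.append((i, j))
    return root


def contracted_nodes(root: TrieNode) -> dict[str, list[tuple[int, int]]]:
    """Strings spelled by the nodes kept after contraction: the root, every node that
    ends some x_j, and every branching node. Unary unlabelled nodes are merged away."""
    kept: dict[str, list[tuple[int, int]]] = {}
    stack = [(root, "")]
    while stack:
        node, spelled = stack.pop()
        if spelled == "" or node.labels or len(node.children) >= 2:
            kept[spelled] = node.labels
        for letter, child in node.children.items():
            stack.append((child, spelled + letter))
    return kept


if __name__ == "__main__":
    tree = contracted_nodes(build_trie([("aba", "baa", "aab", "ab")]))
    print(sorted(tree))  # ['', 'a', 'aab', 'ab', 'aba', 'baa']
```

Feeding the tuple of all suffixes of a single word gives its contracted suffix tree, with every suffix marked as a node.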

Linear construction of $T(S)$. Theorem: For a set $S$ of chains of suffix-dependent strings of a set of strings $R$, we can construct $T(S)$ in $O(\|R\|)$ time and space.

Application to well-known structures. Example: Let $R = \{w_1, \dots, w_n\}$ be a set of strings and $S = \{C_1, \dots, C_n\}$ a set of tuples such that, for $i \in [1, n]$, $C_i$ is a chain of suffix-dependent strings of $w_i$. For $n = 1$ and $S = \{C_1\}$ with $C_1$ the tuple of suffixes of $w_1$, $T(S)$ is the Contracted Suffix Tree of $R$. For $C_i$ the tuple of suffixes of $w_i$ for all $i \in [1, n]$, $T(S)$ is the Generalised Contracted Suffix Tree of $R$. We can also construct the Truncated Suffix Tree of [Peng et al., 2003] and the Generalised Truncated Suffix Tree of [Schulz et al., 2008].

Truncated Suffix Tree (TST)

Our Truncated Suffix Tree. Definitions: For a set of words $R = \{w_1, w_2, \dots, w_n\}$ and an integer $k > 0$, we define the following notation.
1. $F_k(R)$ is the set of substrings of length $k$ of words of $R$.
2. $\mathrm{Suf}_k(R)$ is the set of suffixes of length $k$ of words of $R$.
3. For all $i \in [1, |R|]$ and $j \in [1, |w_i| - k + 1]$, $A_{k,i}$ denotes the tuple whose $j$-th element is defined by
$$A_{k,i}[j] := \begin{cases} w_i[j, j + k] & \text{if } j \le |w_i| - k, \\ w_i[j, |w_i|] & \text{otherwise.} \end{cases}$$
4. Finally, $A_k$ is the set of these tuples: $A_k := \bigcup_{i=1}^{n} \{A_{k,i}\}$.

Example of TST. Proposition: 1. $A_{k,i}$ is a chain of suffix-dependent strings of $w_i$. 2. Moreover, $\{w \mid w \in A_{k,i},\ A_{k,i} \in A_k\} = F_{k+1}(R) \cup \mathrm{Suf}_k(R)$. For $R = \{w\}$, we have $A_4 = \{(x_1, x_2, \dots, x_7)\}$. [Figure: the word $w$ and the seven strings $x_1, \dots, x_7$ of $A_4$.]
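The definitions of $A_{k,i}$, $F_k(R)$ and $\mathrm{Suf}_k(R)$, and both parts of the proposition, can be checked mechanically on small inputs. The following is a minimal sketch with hypothetical names (`A_ki`, `F`, `Suf`, `is_chain`, `check_proposition`). It translates the slide's 1-based inclusive notation $w[j, l]$ into Python slices and assumes $k \ge 1$. A test on small examples is evidence, not a proof.

```python
def A_ki(w: str, k: int) -> tuple[str, ...]:
    """A_{k,i} for the word w = w_i (k >= 1): for j in [1, |w| - k + 1],
    A[j] = w[j, j + k] if j <= |w| - k, else w[j, |w|] (1-based, inclusive)."""
    n = len(w)
    out = []
    for j in range(1, n - k + 2):
        if j <= n - k:
            out.append(w[j - 1:j + k])  # w[j, j + k]: a substring of length k + 1
        else:
            out.append(w[j - 1:])  # w[j, |w|]: the suffix of length k
    return tuple(out)


def F(reads: list[str], k: int) -> set[str]:
    """F_k(R): substrings of length k of the words of R."""
    return {w[p:p + k] for w in reads for p in range(len(w) - k + 1)}


def Suf(reads: list[str], k: int) -> set[str]:
    """Suf_k(R): suffixes of length k of the words of R."""
    return {w[len(w) - k:] for w in reads if len(w) >= k}


def is_chain(w: str, chain: tuple[str, ...]) -> bool:
    """x_1 prefix of w; each later x_i prefix of w[i, |w|] with |x_i| >= |x_{i-1}| - 1."""
    return all(
        w[i:].startswith(x) and (i == 0 or len(x) >= len(chain[i - 1]) - 1)
        for i, x in enumerate(chain)  # i is 0-based here
    )


def check_proposition(reads: list[str], k: int) -> bool:
    tuples = [A_ki(w, k) for w in reads]
    part1 = all(is_chain(w, t) for w, t in zip(reads, tuples))
    union = {x for t in tuples for x in t}
    part2 = union == F(reads, k + 1) | Suf(reads, k)
    return part1 and part2


if __name__ == "__main__":
    print(A_ki("abaab", 2))  # ('aba', 'baa', 'aab', 'ab')
    print(check_proposition(["abaab"], 2))  # True
```

Each tuple holds $|w_i| - k$ strings of length $k + 1$ followed by one suffix of length $k$, which is why the TST indexes exactly $F_{k+1}(R) \cup \mathrm{Suf}_k(R)$.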

Linear construction of $T(A_k)$. Corollary: We can construct $T(A_k)$ in $O(\|R\|)$ time and space. For $R = \{w\}$, we have $A_4 = \{(x_1, x_2, \dots, x_7)\}$. [Figure: the tree $T(A_4)$, with nodes numbered 1 to 7.]

Example of Truncated Suffix Tree. R = {c, cc, c, cc, cc}. [Figure: the TST of R.]

De Bruijn Graph via the TST

Example of construction: the De Bruijn Graph $DBG_2$.

Truncated Suffix Tree (TST). R = {c, cc, c, cc, cc} and k = 2. [Figure: the TST of R.]

Truncated Suffix Tree (TST). R = {c, cc, c, cc, cc} and k = 2. [Figure: part of the TST of R, marking an initial exact node, a subinitial node and an initial node.]

Nodes of the de Bruijn Graph. Notation: let Init(R) denote the set of initial nodes of the TST of R. Property (node correspondence): the set of k-mers of $DBG_k$ of R is in one-to-one correspondence with Init(R).
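The TST indexes $F_{k+1}(R) \cup \mathrm{Suf}_k(R)$, so every k-mer of R, i.e. every node of $DBG_k$, is spelled by some path of the TST. It is either the length-k prefix of an indexed (k+1)-mer or an indexed length-k suffix. The sketch below checks only this string identity on sets. It does not model the TST itself or the definition of initial nodes. The function names are hypothetical.

```python
def kmers(reads: list[str], k: int) -> set[str]:
    """F_k(R): the distinct k-mers of the reads, i.e. the nodes of DBG_k."""
    return {w[p:p + k] for w in reads for p in range(len(w) - k + 1)}


def kmers_from_tst_strings(reads: list[str], k: int) -> set[str]:
    """Recover the k-mers from the strings the TST indexes, F_{k+1}(R) and Suf_k(R):
    a k-mer is the length-k prefix of a (k+1)-mer, unless it only occurs as a suffix."""
    longer = kmers(reads, k + 1)
    suffixes = {w[len(w) - k:] for w in reads if len(w) >= k}
    return {x[:k] for x in longer} | suffixes


if __name__ == "__main__":
    print(sorted(kmers(["acgtc"], 3)))  # ['acg', 'cgt', 'gtc']
```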

Arcs of the de Bruijn Graph. Idea:
1. Take an initial node v.
2. Follow its suffix link to node z (this drops the first letter of its k-mer).
3. If needed, go to the children of z to find its extensions.
4. Check whether the extensions are valid.
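The four steps can be mimicked without a tree, as a minimal sketch. It assumes the node-centric definition of $DBG_k$: there is an arc from k-mer v to k-mer y whenever v minus its first letter equals y minus its last letter. In this sketch a hash set of k-mers replaces the TST. "Following the suffix link" becomes dropping the first letter, and "visiting the children of z" becomes trying each letter of the reads' alphabet. The function `dbg_arcs` is hypothetical. Unlike the TST method, its running time depends on hashing and on the alphabet size.

```python
def dbg_arcs(reads: list[str], k: int) -> tuple[set[str], set[tuple[str, str]]]:
    """Node-centric DBG_k (assumed definition): nodes are the k-mers of the reads and
    there is an arc v -> y whenever v[1:] == y[:-1]."""
    nodes = {w[p:p + k] for w in reads for p in range(len(w) - k + 1)}
    alphabet = sorted(set("".join(reads)))
    arcs = set()
    for v in nodes:                # 1. take a node v (a k-mer)
        z = v[1:]                  # 2. drop its first letter (the role of the suffix link)
        for letter in alphabet:    # 3. try every one-letter extension of z
            y = z + letter
            if y in nodes:         # 4. keep only extensions that are k-mers of R
                arcs.add((v, y))
    return nodes, arcs


if __name__ == "__main__":
    _, arcs = dbg_arcs(["acgtc"], 3)
    print(sorted(arcs))  # [('acg', 'cgt'), ('cgt', 'gtc')]
```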

Let v be an initial node, u its parent, and z the node pointed at by the suffix link of v. [Figure: nodes u, SL(u), v and z = SL(v).] Kinship property of suffix links in suffix trees: let v be a node of a suffix tree. If it exists, the suffix link of v belongs to the subtree of the suffix link of p(v), where p(v) denotes the parent of v.
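On string labels, the kinship property reduces to a prefix relation. If u = p(v) spells a prefix of v, then SL(u) spells a prefix of SL(v), so SL(v) lies below SL(u). The sketch below checks this on a naive representation of the generalized suffix tree, along with the existence of suffix links for internal nodes. Each internal node is identified by the string it spells: a substring with at least two distinct right extensions, where the end of read i counts as an extension. Leaves and the truncation of the TST are not modelled, and all names are hypothetical.

```python
from collections import defaultdict


def internal_nodes(reads: list[str]) -> set[str]:
    """Internal nodes of the generalized suffix tree, as the strings they spell: the root ""
    plus every substring with >= 2 distinct right extensions (next letter, or ("$", i))."""
    ext = defaultdict(set)
    for i, w in enumerate(reads):
        for p in range(len(w)):
            for q in range(p + 1, len(w) + 1):
                ext[w[p:q]].add(w[q] if q < len(w) else ("$", i))
    return {""} | {s for s, e in ext.items() if len(e) >= 2}


def parent(v: str, nodes: set[str]) -> str:
    """p(v): the longest proper prefix of v that is a node (the root "" always is)."""
    for length in range(len(v) - 1, -1, -1):
        if v[:length] in nodes:
            return v[:length]
    raise ValueError("the root has no parent")


def kinship_holds(reads: list[str]) -> bool:
    nodes = internal_nodes(reads)
    for v in nodes - {""}:
        z = v[1:]  # SL(v): drop the first letter
        if z not in nodes:  # every internal node has a suffix link
            return False
        u = parent(v, nodes)
        if u != "":
            sl_u = u[1:]  # SL(u)
            # SL(v) lies in the subtree of SL(u): SL(u) is a node spelling a prefix of z
            if sl_u not in nodes or not z.startswith(sl_u):
                return False
    return True


if __name__ == "__main__":
    print(sorted(internal_nodes(["abab"])))  # ['', 'ab', 'b']
    print(kinship_holds(["acgtacg", "cgtac", "gtacgt"]))  # True
```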

Example of construction of the arcs of $DBG_k$: v is an initial exact node with several children.

DBG construction. Theorem: Given the TST of a set of words $R$, the construction of the De Bruijn Graph takes linear time in $\|R\|$. Proof: All the cases of the node typology are processed in constant time, and the TST has $O(\|R\|)$ nodes since it is built in $O(\|R\|)$ time and space.

$DBG_2$ of R embedded in the TST of R. R = {c, cc, c, cc, cc} and k = 2. [Figure: the TST of R with $DBG_2$ drawn on it.]

Linear space construction. Theorem: Given the TST of a set of words $R$, the construction of the De Bruijn Graph takes linear space in the size of the De Bruijn Graph. Proof: The size of the TST is linear in the size of the De Bruijn Graph of the same order.

Conclusion

An algorithm that builds the De Bruijn Graph from a Truncated Suffix Tree in linear time in the size of the input and in linear space in the size of the output.

Funding and acknowledgments. Thanks for your attention. Questions?