Finishing Drosophila ananassae Fosmid 2410F24

Nick Spies
Research Explorations in Genomics
Finishing Report
Elgin, Shaffer and Leung
23 February 2013

Abstract:
Finishing Drosophila ananassae fosmid clone 2410F24 proved to be quite an undertaking. The initial assembly contained 6 large and 7 smaller contigs, along with a large number of inconsistent mate pairs. The method that proved most helpful was a sequence anchoring procedure that allowed correct placement of repeat copies in a project where repeats are a major feature. Through sequence anchoring and force joins of individual and grouped contigs, a final assembly with only one gap was obtained.

Introduction:
The primary aim of this year's Bio4342 class, Research Explorations in Genomics, is to finish the remaining regions of the fourth chromosome (F element) and the Muller D element of Drosophila ananassae. While Drosophila chromosome 4 is usually referred to as the dot chromosome, the D. ananassae fourth chromosome is notable for its large expansion, to the point where it no longer appears as a dot under a microscope. A sequence analysis of this chromosome, followed by a comparative genomics approach, will shed light on the mechanisms by which such a large expansion occurred. Using Drosophila melanogaster as a reference, the data will be compared on the basis of gene and repeat structure as well as regulatory motifs. This paper discusses the work done towards finishing fosmid clone 2410F24.

Workflow:
Initial Assembly:
Figure 1 shows the initial assembly of fosmid 2410F24. There are 6 major contigs, along with 7 minor contigs not shown. Contig 7 proved to be the major problem region, as it contains large tandem and inverted repeat regions. Contigs 6 and 4 both contain regions of the major repeat structure. The left end read is in contig 5, while the right end read is in contig 2.
Drosophila ananassae fosmid clone 2410F24 1

Figure 1: Initial Assembly view for clone 2410F24

Solution Attempt #1:
The first method attempted was to join the contigs in the assembly wherever joins were possible. This did not yield any progress, so the next effort was to tear out many of the inconsistent reads and put them back at their appropriate sites in the assembly. To do so, I scanned the left ends of the minor contigs, in their high quality regions, ran a search for string, and attempted to make any joins that resulted from the matches to major contigs (and, in some cases, to other minor contigs).

Figure 2: Assembly View of post-torn assembly (left), without replacing torn reads.

In making a few of the joins between contigs, I created a few high quality discrepancies, which required individual navigation. Most of these discrepancies show groups of reads disagreeing with each other (Fig. 3). These observations led to the hypothesis of a multiple repeat structure, with sequence divergence over evolutionary time. A simple tandem repeat is unlikely, as there is no pattern in the discrepant reads: it was not possible to group them in such a way that any one tandem repeat copy shows no discrepancies. As the project progressed, it became clear that the repeat structure is very complex, with a combination of tandem and inverted repeats.

Figure 3: High Quality Discrepancy Example

Solution Attempt #2:
After telling phrap not to overlap reads at many of the high quality discrepancies, miniassembly was run to see what progress had been made with the newly built scaffold. Notable features of this scaffold are the inverted sequence matches in contigs 2, 3, and 5, the gap in mate pair density at 11,000 bp in contig 7, and the concentration of inconsistent mate pairs in the same region as the repeat contained in three of the contigs. The next step was to remove these inconsistent reads, in an attempt to place them in the correct position and orientation.
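The notion of an inconsistent mate pair used above can be sketched in a few lines. This is only an illustration under the common convention that the two end reads of one subclone should land in the same contig, point toward each other, and sit within the expected insert size range. The `Read` tuple, the 1,000 to 6,000 bp insert range, and the function name are assumptions made for this example, not values from this project or the rules Consed applies.

```python
from collections import namedtuple

# Hypothetical placement of one read: contig name, start and end of the
# read on the contig, and strand ('+' forward, '-' reverse complemented).
Read = namedtuple("Read", "contig start end strand")

def mate_pair_status(fwd, rev, min_insert=1000, max_insert=6000):
    """Classify a mate pair as 'consistent', or say why it is not."""
    if fwd.contig != rev.contig:
        return "different contigs"
    # Order by position so 'left' is the read that starts first.
    left, right = sorted((fwd, rev), key=lambda r: r.start)
    # Mates should point toward each other: left on '+', right on '-'.
    if not (left.strand == "+" and right.strand == "-"):
        return "wrong orientation"
    # Distance spanned by the pair, outer edge to outer edge.
    span = right.end - left.start
    if not (min_insert <= span <= max_insert):
        return "wrong distance"
    return "consistent"
```

In a collapsed repeat, many pairs would be expected to fall into the "wrong distance" or "wrong orientation" classes, which matches the concentration of inconsistent pairs seen in the repeat region.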

Figure 4: Miniassembled structure

After pulling out the inconsistent reads, it was possible to close the gaps between contigs 3, 4 and 6. Miniassembly was run again, resulting in many of the remaining individual reads coming together into the major contigs. There is clearly a problem area at the left end of contig 12, as well as a large build-up of reads between 17 and 18 kb on contig 18.

Figure 5: Miniassembly with consolidated individual reads.

Solution Attempt #3:
Inconsistent reads were removed from the assembly again, yielding the assembly in Figure 6.

Figure 6: Assembly before anchoring began

This is where the anchoring method began. Anchoring reads can be described as follows: swipe about 25 bp of sequence and search for that string. If it matches elsewhere in the assembly, tag it with the comment "match found elsewhere in high quality". Any sequence that does not have a match receives a "unique" comment tag. If a unique region is found, highlight the reads that have high quality data in that region, as shown in Figure 7, scroll until those reads are no longer high quality, then search again downstream and repeat the procedure.

Figure 7: Anchored reads example

When I reached the right end of my leftmost contig, I used the Consed main window to search for mate pairs that I could add to this contig to extend it. I used only pairs that were anchored in unique sequence, in order to be sure I was not adding incorrect sequences to the contig. I ended up adding 8 reads, which I first miniassembled into their own contig.
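The anchoring procedure was carried out by hand in Consed, but its core test (does a 25 bp window occur anywhere else in the assembly?) can be sketched in code. The version below is a simplified illustration: it works on consensus strings only, ignores base quality, and requires exact matches on either strand. The function names and the dictionary of contigs are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq.upper()))

def count_occurrences(probe, text):
    """Count possibly overlapping occurrences of probe in text."""
    count, pos = 0, text.find(probe)
    while pos != -1:
        count += 1
        pos = text.find(probe, pos + 1)
    return count

def anchor_windows(contigs, name, window=25):
    """Tag each non-overlapping window of one contig as unique or repeated.

    contigs: dict mapping contig name -> consensus sequence.
    A window is 'unique' if it occurs exactly once across all contigs,
    counting both strands.
    """
    seq = contigs[name].upper()
    tags = []
    for start in range(0, len(seq) - window + 1, window):
        probe = seq[start:start + window]
        rc = reverse_complement(probe)
        hits = 0
        for other in contigs.values():
            other = other.upper()
            hits += count_occurrences(probe, other)
            if rc != probe:  # avoid double counting palindromic windows
                hits += count_occurrences(rc, other)
        tags.append((start, "unique" if hits == 1 else "match found elsewhere"))
    return tags
```

Reads with high quality bases inside a window tagged `unique` would be the ones treated as anchored; reads lying entirely in repeated windows cannot be placed with confidence from sequence alone.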

This extended my project to the assembly below. I could join contig 21 to contig 24, extending the contig to the image in Figure 8a.

Figure 8a: Extended unique-anchored left end contig.

Figure 9: Roadblock in anchoring reads method.

At this point another miniassembly was run to see what phrap could put together, given the new input structure. The result of that miniassembly is shown in Figure 10. There were many possible joins, as suggested by consistent mate pairs. Many of the joins looked like the image in Figure 11, with discrepancies concentrated in low quality regions, so the joins were made.

Figure 10: New miniassembly with join candidates (upper). Typical join interface with low quality discrepancies (lower).

After making all of the joins of sufficient quality, the assembly in Figure 11 was left. Because of the large build-up of reads at the right end of my major contig, and because the sizes of all the current contigs sum to 34 kb while the project as a whole should be 52 kb (as concluded from the digest sums, including vector), we hypothesized that most of what is currently contig 56 is actually a compression of a very large repeat structure. This may explain the inability to get a good phrap assembly, the many inconsistent mate pairs that have been aligning to this region, and the 18 kb shortage of data (52 kb expected minus 34 kb assembled).

Figure 11: Assembly with possible collapsing of two repeats into one large contig.

Solution Attempt #4:
Upon formation of this collapsed repeat hypothesis, I removed arbitrary reads from contig 56 so that I could build another completely contiguous sequence without having the current contig fall apart. To do this, I opened a new text document in the terminal, started at the left edge of my contig, and copied in the names of reads that together would let me extract an entire full-length contig from the data in my current sequence, as shown in Figure 12.
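The read selection described above was done by eye, but it resembles a classic interval covering problem: walk from the left edge and keep picking a read that overlaps what is already covered and reaches furthest right. The sketch below illustrates that idea only. The `(name, start, end)` read format, the 50 bp minimum overlap, and the function name are assumptions for the example.

```python
def pick_tiling_reads(reads, contig_start, contig_end, min_overlap=50):
    """Greedily choose reads that cover [contig_start, contig_end].

    reads: list of (name, start, end) tuples in consensus coordinates.
    Each chosen read must overlap the covered region by at least
    min_overlap bases so the extracted reads can still assemble together.
    Returns the chosen read names, or None if coverage breaks.
    """
    chosen = []
    covered_to = contig_start  # everything left of this is covered
    while covered_to < contig_end:
        # The first read only has to start at the contig edge.
        need = covered_to if not chosen else covered_to - min_overlap
        candidates = [r for r in reads if r[1] <= need and r[2] > covered_to]
        if not candidates:
            return None  # no read extends the path
        best = max(candidates, key=lambda r: r[2])  # reaches furthest right
        chosen.append(best[0])
        covered_to = best[2]
    return chosen
```

The reads that are not chosen stay behind in the original contig, so this only works where coverage is deep enough to leave a second complete path, which is what the build-up of reads in contig 56 suggested.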

Figure 12: Extracting an arbitrary contig from the hypothesized compression region.

After finishing this procedure I obtained the assembly below. The most notable feature of this assembly is the breadth of the sequence matches, which was to be expected, as the two largest contigs are simply copies of each other. This new assembly had many high quality discrepancies, which I proceeded to deal with individually.

Figure 13: Post-extracted repeat assembly (above), and library of high quality discrepancies (below).

Solution Attempt #5:
Next, I navigated through the high quality discrepancies found in the project. I cleaned the assembly up as much as possible, then pulled in reads whose mate pairs are not currently in my project. Searching the NCBI database for such reads returned 52 hits, which I added to the assembly. My next task was to navigate through the new discrepancies that resulted from adding the new reads. I first miniassembled only the new reads, to make putting them into place more efficient. This resulted in the assembly pictured below. I navigated many discrepancies that looked like the one below, with one or two reads (mostly new) discrepant from a group of the old ones. I used the do not overlap command and removed the new reads.

Figure 14: Assembly with new reads (above), HQD navigation (below).

Solution Attempt #6:
Over the weekend, professional finishers worked on the project. I reviewed the .ace files in order to get a clearer understanding of how they proceeded. Starting from the assembly in Figure 12, they made a join between three of the smaller contigs, then joined contig 49 to the group of three just made to form contig 83, giving the assembly below. There are clearly more sequence matches in the post-addition assemblies than in the previous ones, suggesting that either we have added good data that helps elucidate the compressed structure, or we have torn apart a good sequence into more copies than we should have.

Figure 15: Assembly view of professional finisher's first major joins.

The next assembly is the result of a series of large contig joins made by starting at the left end read and searching for sequence matches throughout the contig. This resulted in the creation of a 20 kb left end contig with a few high quality discrepancies and a few inconsistent mate pairs. Clearly quite a bit of progress has been made. There are still quite a few single-read contigs that are not shown in this assembly.

Figure 16: Professional finisher's assembly after building on the left end contig.

The next assembly shows some very clear progress, with joins having been made between many of the major contigs present in the previous assembly. A notable feature of this assembly is that the largest, main contig is separated from both of the end contigs by gaps. This is because the fosmid end reads had quite a few high quality discrepancies with the danafos reads in the main contig. The danafos reads themselves also showed quite a few discrepancies with the other reads. This is likely because they come from other copies of the repeat elsewhere in the genome and merely match in my project.

Figure 17: Nearly contiguous assembly

Finally, the ends were rejoined to the major contig. In the process, another large contig was created. The figure below shows the finisher's final assembly, which I took to work on over the weekend. I attempted to close the gaps between contigs 202 and 203, and between contigs 203 and 181. I could not find a way to do so without adding a large number of high quality discrepancies. I will treat this assembly as my final assembly and base my conclusions on the sequence it provides.

Figure 18: Final Assembly

Conclusions:
The digests clearly show that the sequence is not assembled correctly. The EcoRV digest shows bands both larger and smaller than those in the in silico digest. There are bands of the correct size, all of which result from contig 181.

Figure 19: EcoRV digests
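The comparison between real and in silico digests can be illustrated with a small sketch. The recognition sites used here are the standard ones (EcoRV cuts GAT^ATC, HindIII cuts A^AGCTT). Everything else is an assumption for the example: the function names, the treatment of the sequence as a single linear molecule, and the 5% size tolerance used to decide whether a gel band has a predicted counterpart.

```python
import re

# Recognition site and cut offset within the site:
# EcoRV cuts GAT^ATC (blunt), HindIII cuts A^AGCTT.
ENZYMES = {"EcoRV": ("GATATC", 3), "HindIII": ("AAGCTT", 1)}

def in_silico_digest(seq, enzyme):
    """Return fragment lengths from cutting a linear sequence."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    # The lookahead finds every site, including overlapping ones.
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", seq)]
    edges = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(edges, edges[1:])]

def unmatched_bands(observed, predicted, tolerance=0.05):
    """Observed band sizes with no predicted fragment within tolerance."""
    return [o for o in observed
            if not any(abs(o - p) <= tolerance * o for p in predicted)]
```

Note that the first and last fragments of a contig end at the contig boundary rather than at a restriction site, so they are not expected to match real bands when the assembly still contains gaps.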

The HindIII digests look somewhat better, and they also allow me to estimate the size of the gap. In the real digests, there is a 1336 bp fragment. The in silico digests show a 1020 bp fragment in contig 181. This suggests that the gap is around 300 bp (1336 - 1020 = 316 bp).

Figure 20: HindIII digests

Concluding Remarks:
Several obstacles kept me from finishing the fosmid. In retrospect, the repeat structure in the project made my initial attempts futile. I could not prime a reaction to close the gap because each side of the gap is flanked by multiple repeat series that match in multiple other places in the fosmid. Earlier in the project, I ran Autofinish to see if it would offer any insights. The results were not at all helpful, as the suggested primers not only matched elsewhere in the genome, but were composed mostly of AT-rich sequence. I decided not to call any of these reactions, as they were unlikely to yield good results. The figure below shows the final repeat structure of the fosmid. There is a series of tandem and inverted repeats, spread throughout regions of unique sequence.

Figure 21: Full assembly with sequence matches

In conclusion, I was unable to properly finish the 2410F24 fosmid due to the repeat structure throughout it. I was unable to call any reactions, nor was I able to run a BLAST search to check for contamination.
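The two reasons given for rejecting the Autofinish primers (multiple matches and AT-rich composition) can be expressed as a simple screen. This is a rough sketch, not how Autofinish evaluates primers: it counts only exact matches on either strand, and the 65% AT cutoff and all function names are assumptions chosen for illustration.

```python
def at_fraction(primer):
    """Fraction of A/T bases in the primer."""
    primer = primer.upper()
    return sum(primer.count(b) for b in "AT") / len(primer)

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.upper()[::-1].translate(str.maketrans("ACGT", "TGCA"))

def binding_sites(primer, template):
    """Count exact matches of the primer on either strand of the template."""
    primer, template = primer.upper(), template.upper()
    total = 0
    # A set avoids double counting primers that equal their own reverse complement.
    for probe in {primer, revcomp(primer)}:
        pos = template.find(probe)
        while pos != -1:
            total += 1
            pos = template.find(probe, pos + 1)
    return total

def primer_ok(primer, template, max_at=0.65):
    """Accept a primer only if it binds once and is not too AT-rich."""
    return binding_sites(primer, template) == 1 and at_fraction(primer) <= max_at
```

A primer sitting inside one of the repeat series flanking the gap would fail the single-site test even if its base composition were acceptable.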

APPENDIX:
After this report was due and submitted, I continued to work on finishing my assembly. I made one join between my two largest contigs, giving me the assembly in Figure 22.

Figure 22: Final Assembly (above) with highlighted gap flanked by repeats (below).

The digests are modified just a little by this join, and now appear as follows. These two digests were the most consistent of the digests: EcoRV on the left, and HindIII on the right. The sum of my in silico bands is still short of the stated size (44,000 bp versus 48,000 bp). Clearly quite a bit of progress still needs to be made, but this assembly seems significantly closer to a finished product.

In walking through the final checklist, there are some notable features. There are quite a few single-strand, single-chemistry regions, but they are concentrated in the single-read contigs, as shown in Figure 23. Many regions of low quality that dictate the consensus are present, including long stretches on either side of the gap. There is one homopolymer run of 15 T's; however, it is in a very low quality region. There are also many high quality discrepancies still remaining in the assembly.

Acknowledgments:
I would very much like to thank Lee Trani and Sara Kohlberg for all the help they offered in assembling this project. I would also like to thank Wilson Leung, Dr. Shaffer and Dr. Elgin for making this class and this research possible.

Figure 23: Navigation window of low depth coverage regions (< 3 reads)

Figure 24: Low quality consensus region

Figure 25: Homopolymer run of T's in low quality data
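The homopolymer item from the final checklist lends itself to a quick scan of the consensus. The sketch below is an illustration only: the default minimum length of 15 is chosen to match the run of T's reported here, not taken from the checklist itself, and the function name is hypothetical.

```python
import re

def homopolymer_runs(seq, min_len=15):
    """Return (start, base, length) for each run of one base >= min_len."""
    # One base followed by at least min_len - 1 copies of itself.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1), re.IGNORECASE)
    return [(m.start(), m.group(1).upper(), len(m.group(0)))
            for m in pattern.finditer(seq)]
```

Each reported run would still need to be checked against read quality, since a long run in low quality data may reflect a sequencing artifact rather than the true sequence.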

Figure 26: High quality discrepancy navigation window

Finishing Drosophila Ananassae Fosmid 2728G16

Finishing Drosophila Ananassae Fosmid 2728G16 Finishing Drosophila Ananassae Fosmid 2728G16 Kyle Jung March 8, 2013 Bio434W Professor Elgin Page 1 Abstract For my finishing project, I chose to finish fosmid 2728G16. This fosmid carries a segment of

More information

Finishing Fosmid DMAC-27a of the Drosophila mojavensis third chromosome

Finishing Fosmid DMAC-27a of the Drosophila mojavensis third chromosome Finishing Fosmid DMAC-27a of the Drosophila mojavensis third chromosome Ruth Howe Bio 434W 27 February 2010 Abstract The fourth or dot chromosome of Drosophila species is composed primarily of highly condensed,

More information

The goal of this project was to prepare the DEUG contig which covers the

The goal of this project was to prepare the DEUG contig which covers the Prakash 1 Jaya Prakash Dr. Elgin, Dr. Shaffer Biology 434W 10 February 2017 Finishing of DEUG4927010 Abstract The goal of this project was to prepare the DEUG4927010 contig which covers the terminal 99,279

More information

Finishing of DELE Drosophila elegans has been sequenced using Roche 454 pyrosequencing and Illumina

Finishing of DELE Drosophila elegans has been sequenced using Roche 454 pyrosequencing and Illumina Sarah Swiezy Dr. Elgin, Dr. Shaffer Bio 434W 27 February 2015 Finishing of DELE8596009 Abstract Drosophila elegans has been sequenced using Roche 454 pyrosequencing and Illumina technology. DELE8596009,

More information

Finishing of DFIC This project sought to finish DFIC , the terminal 45 kb of the Drosophila

Finishing of DFIC This project sought to finish DFIC , the terminal 45 kb of the Drosophila Lin 1 Kevin Lin Bio 434W Dr. Elgin 26 February 2016 Finishing of DFIC6622001 Abstract This project sought to finish DFIC6622001, the terminal 45 kb of the Drosophila ficusphila dot chromosome. The initial

More information

Finished (Almost) Sequence of Drosophila littoralis Chromosome 4 Fosmid Clone XAAA73. Seth Bloom Biology 4342 March 7, 2004

Finished (Almost) Sequence of Drosophila littoralis Chromosome 4 Fosmid Clone XAAA73. Seth Bloom Biology 4342 March 7, 2004 Finished (Almost) Sequence of Drosophila littoralis Chromosome 4 Fosmid Clone XAAA73 Seth Bloom Biology 4342 March 7, 2004 Summary: I successfully sequenced Drosophila littoralis fosmid clone XAAA73. The

More information

Annotating Fosmid 14p24 of D. Virilis chromosome 4

Annotating Fosmid 14p24 of D. Virilis chromosome 4 Lo 1 Annotating Fosmid 14p24 of D. Virilis chromosome 4 Lo, Louis April 20, 2006 Annotation Report Introduction In the first half of Research Explorations in Genomics I finished a 38kb fragment of chromosome

More information

Chimp Chunk 3-14 Annotation by Matthew Kwong, Ruth Howe, and Hao Yang

Chimp Chunk 3-14 Annotation by Matthew Kwong, Ruth Howe, and Hao Yang Chimp Chunk 3-14 Annotation by Matthew Kwong, Ruth Howe, and Hao Yang Ruth Howe Bio 434W April 1, 2010 INTRODUCTION De novo annotation is the process by which a finished genomic sequence is searched for

More information

Connect-A-Contig Paper version

Connect-A-Contig Paper version Teacher Guide Connect-A-Contig Paper version Abstract Students align pieces of paper DNA strips based on the distance between markers to generate a DNA consensus sequence. The activity helps students see

More information

CSE182-L16. LW statistics/assembly

CSE182-L16. LW statistics/assembly CSE182-L16 LW statistics/assembly Silly Quiz Who are these people, and what is the occasion? Genome Sequencing and Assembly Sequencing A break at T is shown here. Measuring the lengths using electrophoresis

More information

Mate-pair library data improves genome assembly

Mate-pair library data improves genome assembly De Novo Sequencing on the Ion Torrent PGM APPLICATION NOTE Mate-pair library data improves genome assembly Highly accurate PGM data allows for de Novo Sequencing and Assembly For a draft assembly, generate

More information

Molecular Biology: DNA sequencing

Molecular Biology: DNA sequencing Molecular Biology: DNA sequencing Author: Prof Marinda Oosthuizen Licensed under a Creative Commons Attribution license. SEQUENCING OF LARGE TEMPLATES As we have seen, we can obtain up to 800 nucleotides

More information

Genome Sequencing-- Strategies

Genome Sequencing-- Strategies Genome Sequencing-- Strategies Bio 4342 Spring 04 What is a genome? A genome can be defined as the entire DNA content of each nucleated cell in an organism Each organism has one or more chromosomes that

More information

Sequence Assembly and Alignment. Jim Noonan Department of Genetics

Sequence Assembly and Alignment. Jim Noonan Department of Genetics Sequence Assembly and Alignment Jim Noonan Department of Genetics The assembly problem >>10 9 sequencing reads 36 bp - 1 kb 3 Gb Outline Basic concepts in genome

More information

Sequence assembly. Jose Blanca COMAV institute

Sequence assembly. Jose Blanca COMAV institute Sequence assembly Jose Blanca COMAV institute Sequencing project Unknown sequence { experimental evidence result read 1 read 4 read 2 read 5 read 3 read 6 read 7 Computational requirements

More information

1. A brief overview of sequencing biochemistry

1. A brief overview of sequencing biochemistry Supplementary reading materials on Genome sequencing (optional) The materials are from Mark Blaxter s lecture notes on Sequencing strategies and Primary Analysis 1. A brief overview of sequencing biochemistry

More information

Outline. Annotation of Drosophila Primer. Gene structure nomenclature. Muller element nomenclature. GEP Drosophila annotation projects 01/04/2018

Outline. Annotation of Drosophila Primer. Gene structure nomenclature. Muller element nomenclature. GEP Drosophila annotation projects 01/04/2018 Outline Overview of the GEP annotation projects Annotation of Drosophila Primer January 2018 GEP annotation workflow Practice applying the GEP annotation strategy Wilson Leung and Chris Shaffer AAACAACAATCATAAATAGAGGAAGTTTTCGGAATATACGATAAGTGAAATATCGTTCT

More information

Annotating 7G24-63 Justin Richner May 4, Figure 1: Map of my sequence

Annotating 7G24-63 Justin Richner May 4, Figure 1: Map of my sequence Annotating 7G24-63 Justin Richner May 4, 2005 Zfh2 exons Thd1 exons Pur-alpha exons 0 40 kb 8 = 1 kb = LINE, Penelope = DNA/Transib, Transib1 = DINE = Novel Repeat = LTR/PAO, Diver2 I = LTR/Gypsy, Invader

More information

High throughput omics and BIOINFORMATICS

High throughput omics and BIOINFORMATICS High throughput omics and BIOINFORMATICS Giuseppe D'Auria Seville, February 2009 Genomes from isolated bacteria $ $ $ $ $ $ $ $ $$ $ $ $ $ $ $ $ se q se uen q c se uen ing q c se uen ing qu c en ing c

More information

BENG 183 Trey Ideker. Genome Assembly and Physical Mapping

BENG 183 Trey Ideker. Genome Assembly and Physical Mapping BENG 183 Trey Ideker Genome Assembly and Physical Mapping Reasons for sequencing Complete genome sequencing!!! Resequencing (Confirmatory) E.g., short regions containing single nucleotide polymorphisms

More information

7 Gene Isolation and Analysis of Multiple

7 Gene Isolation and Analysis of Multiple Genetic Techniques for Biological Research Corinne A. Michels Copyright q 2002 John Wiley & Sons, Ltd ISBNs: 0-471-89921-6 (Hardback); 0-470-84662-3 (Electronic) 7 Gene Isolation and Analysis of Multiple

More information

Genome Sequence Assembly

Genome Sequence Assembly Genome Sequence Assembly Learning Goals: Introduce the field of bioinformatics Familiarize the student with performing sequence alignments Understand the assembly process in genome sequencing Introduction:

More information

De novo Genome Assembly

De novo Genome Assembly De novo Genome Assembly A/Prof Torsten Seemann Winter School in Mathematical & Computational Biology - Brisbane, AU - 3 July 2017 Introduction The human genome has 47 pieces MT (or XY) The shortest piece

More information

Tutorial. In Silico Cloning. Sample to Insight. March 31, 2016

Tutorial. In Silico Cloning. Sample to Insight. March 31, 2016 In Silico Cloning March 31, 2016 Sample to Insight CLC bio, a QIAGEN Company Silkeborgvej 2 Prismet 8000 Aarhus C Denmark Telephone: +45 70 22 32 44 In Silico Cloning

More information

Each cell of a living organism contains chromosomes

Each cell of a living organism contains chromosomes COVER FEATURE Genome Sequence Assembly: Algorithms and Issues Algorithms that can assemble millions of small DNA fragments into gene sequences underlie the current revolution in biotechnology, helping

More information

Files for this Tutorial: All files needed for this tutorial are compressed into a single archive: [BLAST_Intro.tar.gz]

Files for this Tutorial: All files needed for this tutorial are compressed into a single archive: [BLAST_Intro.tar.gz] BLAST Exercise: Detecting and Interpreting Genetic Homology Adapted by W. Leung and SCR Elgin from Detecting and Interpreting Genetic Homology by Dr. J. Buhler Prequisites: None Resources: The BLAST web

More information

Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Supplementary Material

Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Supplementary Material Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions Joshua N. Burton 1, Andrew Adey 1, Rupali P. Patwardhan 1, Ruolan Qiu 1, Jacob O. Kitzman 1, Jay Shendure 1 1 Department

More information

ORTHOMINE - A dataset of Drosophila core promoters and its analysis. Sumit Middha Advisor: Dr. Peter Cherbas

ORTHOMINE - A dataset of Drosophila core promoters and its analysis. Sumit Middha Advisor: Dr. Peter Cherbas ORTHOMINE - A dataset of Drosophila core promoters and its analysis Sumit Middha Advisor: Dr. Peter Cherbas Introduction Challenges and Motivation D melanogaster Promoter Dataset Expanding promoter sequences

More information


MODULE 1: INTRODUCTION TO THE GENOME BROWSER: WHAT IS A GENE? MODULE 1: INTRODUCTION TO THE GENOME BROWSER: WHAT IS A GENE? Lesson Plan: Title Introduction to the Genome Browser: what is a gene? JOYCE STAMM Objectives Demonstrate basic skills in using the UCSC Genome

More information


BIOINFORMATICS ORIGINAL PAPER BIOINFORMATICS ORIGINAL PAPER Vol. 27 no. 21 2011, pages 2957 2963 doi:10.1093/bioinformatics/btr507 Genome analysis Advance Access publication September 7, 2011 : fast length adjustment of short reads

More information

user s guide Question 3

user s guide Question 3 Question 3 During a positional cloning project aimed at finding a human disease gene, linkage data have been obtained suggesting that the gene of interest lies between two sequence-tagged site markers.

More information

Reading Lecture 8: Lecture 9: Lecture 8. DNA Libraries. Definition Types Construction

Reading Lecture 8: Lecture 9: Lecture 8. DNA Libraries. Definition Types Construction Lecture 8 Reading Lecture 8: 96-110 Lecture 9: 111-120 DNA Libraries Definition Types Construction 142 DNA Libraries A DNA library is a collection of clones of genomic fragments or cdnas from a certain

More information


601 CTGTCCACACAATCTGCCCTTTCGAAAGATCCCAACGAAAAGAGAGACCACATGGTCCTT GACAGGTGTGTTAGACGGGAAAGCTTTCTAGGGTTGCTTTTCTCTCTGGTGTACCAGGAA >>>>>>>>>>>>>>>>>> BIO450 Primer Design Tutorial The most critical step in your PCR experiment will be designing your oligonucleotide primers. Poor primers could result in little or even no PCR product. Alternatively, they

More information

user s guide Question 3

user s guide Question 3 Question 3 During a positional cloning project aimed at finding a human disease gene, linkage data have been obtained suggesting that the gene of interest lies between two sequence-tagged site markers.

More information

De Novo Assembly of High-throughput Short Read Sequences

De Novo Assembly of High-throughput Short Read Sequences De Novo Assembly of High-throughput Short Read Sequences Chuming Chen Center for Bioinformatics and Computational Biology (CBCB) University of Delaware NECC Third Skate Genome Annotation Workshop May 23,

More information

The first generation DNA Sequencing

The first generation DNA Sequencing The first generation DNA Sequencing Slides 3 17 are modified from slides 18 43 are from Chengxiang Zhai at UIUC. The strand direction

More information

Restriction Site Mapping:

Restriction Site Mapping: Restriction Site Mapping: In making genomic library the DNA is cut with rare cutting enzymes and large fragments of the size of 100,000 to 1000, 000bp. They are ligated to vectors such as Pacmid or YAC

More information

Contigs Built with Fingerprints, Markers, and FPC V4.7

Contigs Built with Fingerprints, Markers, and FPC V4.7 Methods Contigs Built with Fingerprints, Markers, and FPC V4.7 Carol Soderlund, 1,3 Sean Humphray, 2 Andrew Dunham, 2 and Lisa French 2 1 Clemson University Genomic Institute, Clemson, South Carolina 29634-5808,

More information

DNA vs. RNA DNA: deoxyribonucleic acid (double stranded) RNA: ribonucleic acid (single stranded) Both found in most bacterial and eukaryotic cells RNA

DNA vs. RNA DNA: deoxyribonucleic acid (double stranded) RNA: ribonucleic acid (single stranded) Both found in most bacterial and eukaryotic cells RNA DNA Replication DNA vs. RNA DNA: deoxyribonucleic acid (double stranded) RNA: ribonucleic acid (single stranded) Both found in most bacterial and eukaryotic cells RNA molecule can assume different structures

More information

N50 must die!? Genome assembly workshop, Santa Cruz, 3/15/11

N50 must die!? Genome assembly workshop, Santa Cruz, 3/15/11 N50 must die!? Genome assembly workshop, Santa Cruz, 3/15/11 twitter: @assemblathon web: Should N50 die in its role as a frequently used measure of genome assembly quality? Are there other

More information

GEP Project Management System: Order Finishing Reactions

GEP Project Management System: Order Finishing Reactions GEP Project Management System: Order Finishing Reactions Author Wilson Leung Document History Initial Draft 06/04/2007 First Revision 01/10/2009 Second Revision 01/07/2012 Third Revision

More information

The Genome Analysis Centre. Building Excellence in Genomics and Computational Bioscience

The Genome Analysis Centre. Building Excellence in Genomics and Computational Bioscience Building Excellence in Genomics and Computational Bioscience Wheat genome sequencing: an update from TGAC Sequencing Technology Development now Plant & Microbial Genomics Group Leader Matthew Clark

More information

Gene Identification in silico

Gene Identification in silico Gene Identification in silico Nita Parekh, IIIT Hyderabad Presented at National Seminar on Bioinformatics and Functional Genomics, at Bioinformatics centre, Pondicherry University, Feb 15 17, 2006. Introduction

More information

Nature Biotechnology: doi: /nbt Supplementary Figure 1. Read Complexity

Nature Biotechnology: doi: /nbt Supplementary Figure 1. Read Complexity Supplementary Figure 1 Read Complexity A) Density plot showing the percentage of read length masked by the dust program, which identifies low-complexity sequence (simple repeats). Scrappie outputs a significantly

More information

Recombinants and Transformation

Recombinants and Transformation Jesse Ruben Partner Roman Verner BMB 442 Recombinants and Transformation Introduction The goal of this experiment was to take two antibiotic resistance genes for ampicillin and kanamycin from plasmids

More information

Lecture Four. Molecular Approaches I: Nucleic Acids

Lecture Four. Molecular Approaches I: Nucleic Acids Lecture Four. Molecular Approaches I: Nucleic Acids I. Recombinant DNA and Gene Cloning Recombinant DNA is DNA that has been created artificially. DNA from two or more sources is incorporated into a single

More information


COMPUTER RESOURCES II: COMPUTER RESOURCES II: Using the computer to analyze data, using the internet, and accessing online databases Bio 210, Fall 2006 Linda S. Huang, Ph.D. University of Massachusetts Boston In the first computer

More information

Bionano Access 1.1 Software User Guide

Bionano Access 1.1 Software User Guide Bionano Access 1.1 Software User Guide Document Number: 30142 Document Revision: B For Research Use Only. Not for use in diagnostic procedures. Copyright 2017 Bionano Genomics, Inc. All Rights Reserved.

More information

Application for Automating Database Storage of EST to Blast Results. Vikas Sharma Shrividya Shivkumar Nathan Helmick

Application for Automating Database Storage of EST to Blast Results. Vikas Sharma Shrividya Shivkumar Nathan Helmick Application for Automating Database Storage of EST to Blast Results Vikas Sharma Shrividya Shivkumar Nathan Helmick Outline Biology Primer Vikas Sharma System Overview Nathan Helmick Creating ESTs Nathan

More information

