BIOINFORMATICS AN OVERVIEW

T.R. Sharma
Genoinformatics Lab, National Research Centre on Plant Biotechnology, I.A.R.I., New Delhi

Introduction

Bioinformatics is the computational analysis of biological data, chiefly the information stored as DNA and protein sequences in various biological databases. The National Center for Biotechnology Information (NCBI 2001) defines bioinformatics as: "Bioinformatics is the field of science in which biology, computer science, and information technology merge into a single discipline. There are three important sub-disciplines within bioinformatics: the development of new algorithms and statistics which assess relationships among members of large data sets, the analysis and interpretation of various types of data including nucleotide and amino acid sequences, protein domains, and protein structures; and the development and implementation of tools that enable efficient access and management of different types of information."

Analyses in bioinformatics focus on three types of datasets: genome sequences, macromolecular structures, and functional genomics experiments (e.g. microarray data). However, bioinformatics tools are also applied to various other data, e.g. phylogenetic and metabolic pathway analysis, the text of scientific papers, and plant varietal information and statistics.

Analysis of biological data requires a large number of techniques, such as primary sequence alignment, protein 3D structure alignment, phylogenetic tree construction, prediction and classification of protein structure, prediction of RNA structure, prediction of protein function, and expression data clustering. Development of suitable algorithms is therefore an important part of bioinformatics. Many techniques and algorithms have been developed specifically for the analysis of biological data; for instance, the dynamic programming algorithm for sequence alignment is among the methods most widely used by biologists.
The sequence information generated worldwide is stored systematically in different types of databases. Hence, it is necessary to understand what databases are and how they differ.

What is a database?

A database is a collection of information stored in a computer in a systematic way, such that a computer program can consult it to answer questions. A biological database is a large, organized body of persistent data, usually associated with computerized software designed to update, query, and retrieve components of the data stored within the system. A simple database might be a single file containing many records, each of which includes the same set of information. For example, a record in a nucleotide sequence database typically contains a contact name; the input sequence with a description of the type of molecule; the scientific name of the source organism from which it was isolated; and, often, literature citations associated with the sequence.
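The record structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `SequenceRecord`, its field names, the helper `find_by_organism` and the sample entries are all hypothetical and do not correspond to the schema of any real sequence database.

```python
from dataclasses import dataclass, field


@dataclass
class SequenceRecord:
    """One entry of a toy nucleotide sequence database (hypothetical schema)."""
    accession: str       # hypothetical unique identifier for the record
    contact_name: str    # person who submitted the sequence
    molecule_type: str   # e.g. "DNA" or "mRNA"
    organism: str        # scientific name of the source organism
    sequence: str        # the sequence itself
    citations: list = field(default_factory=list)  # associated literature


def find_by_organism(records, organism):
    """Return every record whose source organism matches, ignoring case."""
    wanted = organism.lower()
    return [r for r in records if r.organism.lower() == wanted]


# Made-up sample entries, used only to show a query against the records.
records = [
    SequenceRecord("X0001", "A. Researcher", "DNA", "Oryza sativa", "ATGGCC"),
    SequenceRecord("X0002", "B. Researcher", "mRNA", "Zea mays", "ATGAAA"),
    SequenceRecord("X0003", "A. Researcher", "DNA", "Oryza sativa", "ATGTTT"),
]

rice = find_by_organism(records, "oryza sativa")
print([r.accession for r in rice])
```

Because every record carries the same set of fields, a program can answer a question such as "which entries come from rice?" by looking at one field only, which is the point made in the definition above.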

Divisions of DNA databases

Since databases are growing rapidly, they have been broken into divisions, largely on the basis of the taxonomy of the organisms. GenBank divisions fall into two general categories: organismal and functional. Sequences derived from specific organisms are stored in the organismal category, whereas the functional category includes divisions that are independent of taxonomic classification, e.g. EST, STS and HTG. Each GenBank division is identified by a three-letter code given at the beginning of each sequence entry. For instance, the HTG (high-throughput genome) division contains sequences generated from different organisms. These sequences are generally unfinished and are further classified as Phase 1 (unfinished, unordered and containing gaps) and Phase 2 (unfinished, ordered and containing a few gaps). Once a sequence is finished and all gaps are resolved (Phase 3), it is moved to the appropriate organismal division, e.g. PLN in the case of plants.

The huge wealth of information in the form of DNA and protein sequences and publications on molecular biology is stored in the data banks (Fig. 1). The major public data banks for DNA and protein sequences are GenBank in the USA, EMBL (European Molecular Biology Laboratory) in Europe, and DDBJ (DNA Data Bank of Japan) in Japan. The growth of DNA sequence data in GenBank is depicted in Fig. 2. This rapid growth is the result of the collaborative international programmes started during the past few years to sequence the complete genomes of various organisms.
The whole genomes of various microorganisms have already been sequenced by The Institute for Genomic Research (TIGR) and can be seen on its website. Large genomes such as human (3 billion bp), rice (450 Mb), Arabidopsis (130 Mb) and mouse (2.5 billion bp) have also been sequenced, and the data are in the public domain in GenBank. These DNA sequences now have to be used in meaningful ways for the welfare of mankind. The different types of sequences of important crops available in the public domain are listed in Table 1.

Fig. 1. Status of sequences submitted to GenBank (Source: NCBI)
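The division codes discussed in the section on divisions of DNA databases can be read programmatically from the header line of a sequence entry. The sketch below is a rough illustration, not a parser for the real file format: it assumes a header line that starts with the word LOCUS and carries the three-letter division code as the second-to-last field, just before the date. The entry name `DEMO0001`, the sample line and the function names are made up for the example, and only the division codes named in the text are classified.

```python
# Division codes named in the text, grouped into the two categories it describes.
ORGANISMAL = {"PLN"}
FUNCTIONAL = {"EST", "STS", "HTG"}


def division_of(locus_line):
    """Return the division code from a LOCUS-style header line.

    Assumption: the division is the second-to-last whitespace-separated
    field, directly before the date.
    """
    fields = locus_line.split()
    if len(fields) < 3 or fields[0] != "LOCUS":
        raise ValueError("not a LOCUS line")
    return fields[-2]


def category_of(division):
    """Classify a division code as organismal, functional or unknown."""
    if division in ORGANISMAL:
        return "organismal"
    if division in FUNCTIONAL:
        return "functional"
    return "unknown"


# Made-up header line used only for illustration.
example = "LOCUS       DEMO0001     5028 bp    DNA     PLN     21-JUN-1999"
code = division_of(example)
print(code, category_of(code))
```

A finished plant sequence would thus be reported as belonging to an organismal division, while an unfinished high-throughput sequence carrying the HTG code would fall into the functional category.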

Table 1. Different types of sequences of important crops available in public domain*

Type of database in public domain: Plant species

Whole genome: Oryza sativa, Arabidopsis thaliana

Partial genome: T. aestivum, Z. mays, S. bicolor, B. oleracea, B. rapa, G. max, S. tuberosum, L. esculentum, V. vinifera, Poncirus trifoliata, Medicago truncatula, Lotus corniculatus

EST: Aegilops tauschii, Allium cepa, Arabidopsis thaliana, Avena sativa, Beta vulgaris subsp. vulgaris, Brassica napus, Brassica oleracea, Brassica rapa, Capsicum annuum, Coffea arabica, Glycine max, Gossypium arboreum, Gossypium hirsutum, Helianthus annuus, Hordeum vulgare, Lactuca sativa, Lolium perenne, Lotus corniculatus, Lycopersicon esculentum, Malus domestica, Medicago sativa, Medicago truncatula, Nicotiana benthamiana, Nicotiana tabacum, Oryza sativa, Phaseolus coccineus, Phaseolus vulgaris, Saccharum officinarum, Secale cereale, Solanum melongena, Solanum tuberosum, Sorghum bicolor, Triticum monococcum, Vitis vinifera, Zea mays

mRNA: T. aestivum, Z. mays, S. bicolor, B. oleracea, B. rapa, G. max, S. tuberosum, L. esculentum, V. vinifera, Medicago truncatula, L. corniculatus, O. sativa, A. thaliana

Protein: Z. mays, S. bicolor, B. oleracea, B. rapa, G. max, S. tuberosum, V. vinifera, C. sinensis, M. truncatula, E. globulus, O. sativa, A. thaliana

BAC end: Oryza australiensis, O. brachyantha, O. glaberrima, O. granulata, O. latifolia, O. minuta, O. officinalis, O. punctata, O. ridleyi, O. rufipogon, O. schlechteri, G. hirsutum

*Source: NCBI

Divisions of Protein databases

Protein sequences are mainly stored in two databases, EMBL and GenBank. Swiss-Prot, a very well maintained and curated database, was established at the Swiss Institute of Bioinformatics. Though it is a small database, it carries important annotations, which are freely available to academic users. GenBank created PIR, a protein database, as a translation of GenBank.
The PIR database is further subdivided into four sections, PIR1, PIR2, PIR3 and PIR4, on the basis of the degree of annotation.

DNA Sequence Analysis

With the advent of the internet and web browsers, bioinformatics tools are now easily available to biologists. These tools are indispensable for any genome sequencing centre. The analysis of DNA sequences starts as soon as they come out of the sequencing machine. The first and foremost task of a biologist is to check the accuracy of the sequence obtained from the machine. One way is to find the cloning sites of the insert in the sequencing vector. If the insert is a PCR product, one should look for the primer sequences used in the amplification of that product. One can then perform a Basic Local Alignment Search Tool (BLAST) search against the DNA sequence database in GenBank and examine the probable matches. If the unknown sequence shows hits with any sequence from the same or a related organism, it is considered a true sequence. These are the basic steps, which can be performed manually if the dataset is very small or if one has to deal with a single sequence or a few sequences. However, in large genome sequencing projects one has to handle thousands of sequences at a given time.

Searching for Sequence Alignment

Once a high-quality sequence is obtained, one has to ask an important question: is this a new sequence, or is it similar to other DNA sequences available in the databases? To answer this question, one has to perform a database search for sequence comparison. All sequence searching methods rely on the basic concepts of alignment and distance between sequences, and pairwise sequence alignment is performed. There are different algorithms to perform global and local alignments (Fig. 2). In global alignment, the input sequence is aligned over its complete length with sequences available in the databases, whereas in local alignment only the most similar segments of the input sequence are aligned with the database sequences.

Sequence comparison (DNA or protein) against a database is one of the most important and powerful tools of bioinformatics. This type of comparison is generally performed with two programs, BLAST and FASTA, which compare an unknown sequence against a sequence database. BLAST finds the best local alignments between the unknown sequence and the database by using an approach based on matching short sequence fragments, combined with a powerful statistical model. FASTA uses a method of approximation that tries to concentrate only on significant alignments. In BLAST search output, Expect (E) values and bit scores are reported so that the significance of a match between the unknown sequence and the database sequences can be judged (Fig. 3). The significance of a BLAST hit is very important for the interpretation of results. Because the genetic code is degenerate, two coding sequences that are only about 67% identical at the DNA level can, in the extreme case, still be 100% identical at the protein level.
It is also suggested that at least 75% sequence identity between two sequences should be observed before a match is considered a significant hit.

Fig. 2. Global and local alignments between two DNA sequences
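The difference between global and local alignment described above can be made concrete with the dynamic programming approach mentioned in the Introduction. The following is a minimal sketch that computes only the optimal alignment score, in the style of the Needleman-Wunsch (global) and Smith-Waterman (local) algorithms. The scoring values (match +1, mismatch -1, gap -2) and the function name are arbitrary assumptions chosen for illustration, and the heuristic searches performed by BLAST and FASTA work differently.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Optimal pairwise alignment score by dynamic programming.

    local=False: global alignment, both sequences aligned end to end.
    local=True:  local alignment, best-scoring pair of segments only.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    if not local:
        # In a global alignment, leading gaps must be paid for.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                # A local alignment may restart anywhere, so scores never drop below 0.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell

    return best if local else score[-1][-1]


print(alignment_score("ACGT", "AGT"))                          # global score
print(alignment_score("TTTACGTTTT", "GGACGGG", local=True))    # local score
```

In the second call the two sequences share only the short segment ACG, so a local alignment reports that segment alone, whereas a global alignment of the same pair would be penalised for all the unmatched flanking bases.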

Fig. 3. BLAST output showing bit scores and E values after similarity search

Gene Prediction and Annotation

Simply determining the four-letter (A, T, G, C) sequence of an organism's DNA has little value until some meaning is derived from it by gene prediction. Gene prediction is complex work, and no algorithm can predict the true exons in a DNA sequence with complete accuracy. Two major considerations are taken into account while predicting a gene: 1) identification of structural elements such as start/stop codons and splice sites in the unknown sequence, and 2) homology searches against protein, EST and cDNA databases to identify potential coding regions. A very commonly used gene prediction program is GENSCAN, developed at MIT, USA, which is freely available on the Web for online analysis of DNA sequences. The output obtained from GENSCAN is then used for gene annotation: BLAST searches of public or private DNA sequence databases find matches between the unknown query sequence and the millions of sequences available in GenBank. NCBI's home page hosts a very popular BLAST website, which performs searches using various criteria and options (Fig. 4).
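The first consideration above, locating start and stop codons, can be illustrated with a very small open reading frame (ORF) scan. This is a deliberately naive sketch and not how GENSCAN works: it assumes the forward strand only, ignores introns and splice sites, and treats every ATG followed by an in-frame stop codon as a candidate. The function name and the `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, frame) tuples for ATG...stop ORFs on the forward strand.

    start is the index of the A in ATG, end is one past the stop codon,
    and min_codons is the minimum number of codons before the stop
    (counting the ATG itself).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first start codon in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # look for the next ORF
    return orfs


sequence = "CCATGAAATTTTAGGG"
for start, end, frame in find_orfs(sequence):
    print(frame, sequence[start:end])
```

A real gene finder combines signals like these with statistical models of coding regions and with the homology evidence described in the second consideration, which is why a simple scan of this kind reports many false candidates on genomic DNA.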

Fig. 4. Performing BLAST search at NCBI Home page

Primer Design

Another important use of genome sequence data after gene prediction is the design of primers, either for PCR or for sequencing. Such primers are used to amplify genes or their alleles from known sources so that the best use can be made of them. Though the PRIME program within the GCG package is mainly used for this purpose, PRIMER3, a web-based program (www-genome.wi.mit.edu/genome_software/other/primer3.html), is also commonly used for designing primers. PCR primer pairs are designed to amplify a well-defined target sequence from the template. Important considerations while designing primers are the GC content, melting temperature, primer size, and the size of the PCR product to be amplified. These parameters can be left at their default settings or changed as required.

Phylogenetic Analysis

Once a similarity search between an unknown sequence and the database sequences has established the per cent similarity between them, the natural next question is how these sequences are related to each other. Sequences derived from two closely related organisms show more similarity at the DNA level, and sequences from distantly related organisms show more dissimilarity. To find the evolutionary relationship among sequences derived from different organisms, a phylogenetic tree is constructed (Fig. 5). Such an evolutionary tree can be constructed on the basis of phenotypic markers, molecular markers or sequence information. A typical phylogenetic tree comprises nodes, branches and the termini of the branches. When all the branches emerge from a common node, that node is termed the root of the tree. Some trees, however, are constructed as un-rooted trees, where the common evolutionary point is not known. For constructing a phylogenetic tree, the PILEUP option of the GCG package is commonly used. Besides, DNASTAR software also has options to construct trees from different DNA or protein sequences. Web-based tools like MacClade (www.phylogeny.arizona.edu/macclade/) can also be used for evolutionary studies of different organisms based on their DNA sequences. Similarly, bioinformatics tools can be used for protein function analysis by database search. Finding SSR and SNP markers in EST or genome sequences can be performed in silico by using different algorithms, which will also be discussed in the presentation.

Fig. 5. Phylogenetic analysis of resistance gene analogue sequences (sk21, sk95, sk10, sk3, sk76, sk101 and sk65) obtained from rice and known resistance gene sequences (L6, M, N, RPS2 and Xa1) isolated from different crops. Analysis was performed with DNASTAR software.

Conclusions

In functional genomics, gene expression at the whole-genome level under different stresses can be studied by using microarrays. Nowadays, gene expression databases of this type are being prepared for different organisms and even for different tissues. Bioinformatics tools are helpful in locating DNA sequences in GenBank simply by entering accession numbers, aligning two or more sequences, performing similarity searches for unknown sequences in GenBank, assembling short sequence reads and developing consensus sequences, finding genes and markers in silico, and performing comparative analysis of different genomes.
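Two of the primer-design considerations named in the Primer Design section, GC content and melting temperature, are simple enough to compute by hand. The sketch below uses the Wallace rule, Tm = 2(A+T) + 4(G+C) in degrees Celsius, which is only a rough estimate intended for short oligonucleotides; it is not the calculation performed by PRIME or PRIMER3. The function names and the acceptance ranges in `check_primer` (length 18 to 25 bases, GC content 40 to 60%) are illustrative assumptions, not recommendations taken from the text.

```python
def gc_content(primer):
    """Percentage of G and C bases in the primer."""
    primer = primer.upper()
    if not primer:
        raise ValueError("empty primer")
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Rough melting temperature in degrees Celsius by the Wallace rule."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def check_primer(primer, min_len=18, max_len=25, min_gc=40.0, max_gc=60.0):
    """Return a list of problems; an empty list means the primer passes.

    The default limits are illustrative assumptions only.
    """
    problems = []
    if not min_len <= len(primer) <= max_len:
        problems.append("length outside allowed range")
    if not min_gc <= gc_content(primer) <= max_gc:
        problems.append("GC content outside allowed range")
    return problems


candidate = "ATGCATGCATGCATGCATGC"   # made-up 20-base primer
print(gc_content(candidate), wallace_tm(candidate), check_primer(candidate))
```

Changing the keyword arguments of `check_primer` mirrors the choice described in the text between keeping a tool's default settings and adjusting them to the requirements of a particular experiment.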


Eukaryotic Gene Prediction. Wei Zhu May 2007 Eukaryotic Gene Prediction Wei Zhu May 2007 In nature, nothing is perfect... - Alice Walker Gene Structure What is Gene Prediction? Gene prediction is the problem of parsing a sequence into nonoverlapping

More information

From Variants to Pathways: Agilent GeneSpring GX s Variant Analysis Workflow

From Variants to Pathways: Agilent GeneSpring GX s Variant Analysis Workflow From Variants to Pathways: Agilent GeneSpring GX s Variant Analysis Workflow Technical Overview Import VCF Introduction Next-generation sequencing (NGS) studies have created unanticipated challenges with

More information

FACULTY OF BIOCHEMISTRY AND MOLECULAR MEDICINE

FACULTY OF BIOCHEMISTRY AND MOLECULAR MEDICINE FACULTY OF BIOCHEMISTRY AND MOLECULAR MEDICINE BIOMOLECULES COURSE: COMPUTER PRACTICAL 1 Author of the exercise: Prof. Lloyd Ruddock Edited by Dr. Leila Tajedin 2017-2018 Assistant: Leila Tajedin (leila.tajedin@oulu.fi)

More information

Bio 101 Sample questions: Chapter 10

Bio 101 Sample questions: Chapter 10 Bio 101 Sample questions: Chapter 10 1. Which of the following is NOT needed for DNA replication? A. nucleotides B. ribosomes C. Enzymes (like polymerases) D. DNA E. all of the above are needed 2 The information

More information

Expressed Sequence Tags: Clustering and Applications

Expressed Sequence Tags: Clustering and Applications 12 Expressed Sequence Tags: Clustering and Applications Anantharaman Kalyanaraman Iowa State University Srinivas Aluru Iowa State University 12.1 Introduction... 12-1 12.2 Sequencing ESTs... 12-2 12.3

More information

DNA sequencing. Course Info

DNA sequencing. Course Info DNA sequencing EECS 458 CWRU Fall 2004 Readings: Pevzner Ch1-4 Adams, Fields & Venter (ISBN:0127170103) Serafim Batzoglou s slides Course Info Instructor: Jing Li 509 Olin Bldg Phone: X0356 Email: jingli@eecs.cwru.edu

More information

Theory and Application of Multiple Sequence Alignments

Theory and Application of Multiple Sequence Alignments Theory and Application of Multiple Sequence Alignments a.k.a What is a Multiple Sequence Alignment, How to Make One, and What to Do With It Brett Pickett, PhD History Structure of DNA discovered (1953)

More information

CONSERVATION TILLAGE TRENDS IN VIRGINIA AGRICULTURAL PRODUCTION. Research and Extension Center, Painter, VA

CONSERVATION TILLAGE TRENDS IN VIRGINIA AGRICULTURAL PRODUCTION. Research and Extension Center, Painter, VA 2 CONSERVATION TILLAGE TRENDS IN VIRGINIA AGRICULTURAL PRODUCTION Mark S. Reiter 1 * 1 Department of Crop and Soil Environmental Sciences, Virginia Tech Eastern Shore Agricultural Research and Extension

More information

Access to Information from Molecular Biology and Genome Research

Access to Information from Molecular Biology and Genome Research Future Needs for Research Infrastructures in Biomedical Sciences Access to Information from Molecular Biology and Genome Research DG Research: Brussels March 2005 User Community for this information is

More information

RNA Sequencing Analyses & Mapping Uncertainty

RNA Sequencing Analyses & Mapping Uncertainty RNA Sequencing Analyses & Mapping Uncertainty Adam McDermaid 1/26 RNA-seq Pipelines Collection of tools for analyzing raw RNA-seq data Tier 1 Quality Check Data Trimming Tier 2 Read Alignment Assembly

More information

Chapter 20: Biotechnology

Chapter 20: Biotechnology Name Period The AP Biology exam has reached into this chapter for essay questions on a regular basis over the past 15 years. Student responses show that biotechnology is a difficult topic. This chapter

More information

A legume genomics resource: The Chickpea Root Expressed Sequence Tag Database

A legume genomics resource: The Chickpea Root Expressed Sequence Tag Database Electronic Journal of Biotechnology ISSN: 0717-3458 Vol. 8 No. 2, Issue of August 15, 2005 2005 by Pontificia Universidad Católica de Valparaíso -- Chile Received December 9, 2004 / Accepted April 27,

More information

Agenda. Web Databases for Drosophila. Gene annotation workflow. GEP Drosophila annotation projects 01/01/2018. Annotation adding labels to a sequence

Agenda. Web Databases for Drosophila. Gene annotation workflow. GEP Drosophila annotation projects 01/01/2018. Annotation adding labels to a sequence Agenda GEP annotation project overview Web Databases for Drosophila An introduction to web tools, databases and NCBI BLAST Web databases for Drosophila annotation UCSC Genome Browser NCBI / BLAST FlyBase

More information

Introduction to Bioinformatics and Gene Expression Technologies

Introduction to Bioinformatics and Gene Expression Technologies Introduction to Bioinformatics and Gene Expression Technologies Utah State University Fall 2017 Statistical Bioinformatics (Biomedical Big Data) Notes 1 1 Vocabulary Gene: hereditary DNA sequence at a

More information

MicroSEQ Rapid Microbial Identification System

MicroSEQ Rapid Microbial Identification System MicroSEQ Rapid Microbial Identification System Giving you complete control over microbial identification using the gold-standard genotypic method The MicroSEQ ID microbial identification system, based

More information

From AP investigative Laboratory Manual 1

From AP investigative Laboratory Manual 1 Comparing DNA Sequences to Understand Evolutionary Relationships. How can bioinformatics be used as a tool to determine evolutionary relationships and to better understand genetic diseases? BACKGROUND

More information

Molecular Biology Primer. CptS 580, Computational Genomics, Spring 09

Molecular Biology Primer. CptS 580, Computational Genomics, Spring 09 Molecular Biology Primer pts 580, omputational enomics, Spring 09 Starting 19 th century What do we know of cellular biology? ell as a fundamental building block 1850s+: ``DNA was discovered by Friedrich

More information

MATH 5610, Computational Biology

MATH 5610, Computational Biology MATH 5610, Computational Biology Lecture 2 Intro to Molecular Biology (cont) Stephen Billups University of Colorado at Denver MATH 5610, Computational Biology p.1/24 Announcements Error on syllabus Class

More information

Chapter 15 Gene Technologies and Human Applications

Chapter 15 Gene Technologies and Human Applications Chapter Outline Chapter 15 Gene Technologies and Human Applications Section 1: The Human Genome KEY IDEAS > Why is the Human Genome Project so important? > How do genomics and gene technologies affect

More information

Hands-On Four Investigating Inherited Diseases

Hands-On Four Investigating Inherited Diseases Hands-On Four Investigating Inherited Diseases The purpose of these exercises is to introduce bioinformatics databases and tools. We investigate an important human gene and see how mutations give rise

More information

The String Alignment Problem. Comparative Sequence Sizes. The String Alignment Problem. The String Alignment Problem.

The String Alignment Problem. Comparative Sequence Sizes. The String Alignment Problem. The String Alignment Problem. Dec-82 Oct-84 Aug-86 Jun-88 Apr-90 Feb-92 Nov-93 Sep-95 Jul-97 May-99 Mar-01 Jan-03 Nov-04 Sep-06 Jul-08 May-10 Mar-12 Growth of GenBank 160,000,000,000 180,000,000 Introduction to Bioinformatics Iosif

More information

Examination Assignments

Examination Assignments Bioinformatics Institute of India H-109, Ground Floor, Sector-63, Noida-201307, UP. INDIA Tel.: 0120-4320801 / 02, M. 09818473366, 09810535368 Email: info@bii.in, Website: www.bii.in INDUSTRY PROGRAM IN

More information

Existing potato markers and marker conversions. Walter De Jong PAA Workshop August 2009

Existing potato markers and marker conversions. Walter De Jong PAA Workshop August 2009 Existing potato markers and marker conversions Walter De Jong PAA Workshop August 2009 1 What makes for a good marker? diagnostic for trait of interest robust works even with DNA of poor quality or low

More information

CHAPTER 21 LECTURE SLIDES

CHAPTER 21 LECTURE SLIDES CHAPTER 21 LECTURE SLIDES Prepared by Brenda Leady University of Toledo To run the animations you must be in Slideshow View. Use the buttons on the animation to play, pause, and turn audio/text on or off.

More information

Protein Structure Prediction. christian studer , EPFL

Protein Structure Prediction. christian studer , EPFL Protein Structure Prediction christian studer 17.11.2004, EPFL Content Definition of the problem Possible approaches DSSP / PSI-BLAST Generalization Results Definition of the problem Massive amounts of

More information

BIOINFORMATICS Introduction

BIOINFORMATICS Introduction BIOINFORMATICS Introduction Mark Gerstein, Yale University bioinfo.mbb.yale.edu/mbb452a 1 (c) Mark Gerstein, 1999, Yale, bioinfo.mbb.yale.edu What is Bioinformatics? (Molecular) Bio -informatics One idea

More information

Exploring Similarities of Conserved Domains/Motifs

Exploring Similarities of Conserved Domains/Motifs Exploring Similarities of Conserved Domains/Motifs Sotiria Palioura Abstract Traditionally, proteins are represented as amino acid sequences. There are, though, other (potentially more exciting) representations;

More information

Using the Potato Genome Sequence! Robin Buell! Michigan State University! Department of Plant Biology! August 15, 2010!

Using the Potato Genome Sequence! Robin Buell! Michigan State University! Department of Plant Biology! August 15, 2010! Using the Potato Genome Sequence! Robin Buell! Michigan State University! Department of Plant Biology! August 15, 2010! buell@msu.edu! 1 Whole Genome Shotgun Sequencing 2 New Technologies Revolutionize

More information

Advances in analytical biochemistry and systems biology: Proteomics

Advances in analytical biochemistry and systems biology: Proteomics Advances in analytical biochemistry and systems biology: Proteomics Brett Boghigian Department of Chemical & Biological Engineering Tufts University July 29, 2005 Proteomics The basics History Current

More information

The use of bioinformatic analysis in support of HGT from plants to microorganisms. Meeting with applicants Parma, 26 November 2015

The use of bioinformatic analysis in support of HGT from plants to microorganisms. Meeting with applicants Parma, 26 November 2015 The use of bioinformatic analysis in support of HGT from plants to microorganisms Meeting with applicants Parma, 26 November 2015 WHY WE NEED TO CONSIDER HGT IN GM PLANT RA Directive 2001/18/EC As general

More information

NOTES - CH 15 (and 14.3): DNA Technology ( Biotech )

NOTES - CH 15 (and 14.3): DNA Technology ( Biotech ) NOTES - CH 15 (and 14.3): DNA Technology ( Biotech ) Vocabulary Genetic Engineering Gene Recombinant DNA Transgenic Restriction Enzymes Vectors Plasmids Cloning Key Concepts What is genetic engineering?

More information

PCR PRIMER DESIGN SARIKA GARG SCHOOL OF BIOTECHNOLGY DEVI AHILYA UNIVERSITY INDORE INDIA

PCR PRIMER DESIGN SARIKA GARG SCHOOL OF BIOTECHNOLGY DEVI AHILYA UNIVERSITY INDORE INDIA PCR PRIMER DESIGN SARIKA GARG SCHOOL OF BIOTECHNOLGY DEVI AHILYA UNIVERSITY INDORE-452017 INDIA BIOINFORMATICS Bioinformatics is considered as amalgam of biological sciences especially Biotechnology with

More information

Function Prediction of Proteins from their Sequences with BAR 3.0

Function Prediction of Proteins from their Sequences with BAR 3.0 Open Access Annals of Proteomics and Bioinformatics Short Communication Function Prediction of Proteins from their Sequences with BAR 3.0 Giuseppe Profiti 1,2, Pier Luigi Martelli 2 and Rita Casadio 2

More information

AP BIOLOGY. Investigation #3 Comparing DNA Sequences to Understand Evolutionary Relationships with BLAST. Slide 1 / 32. Slide 2 / 32.

AP BIOLOGY. Investigation #3 Comparing DNA Sequences to Understand Evolutionary Relationships with BLAST. Slide 1 / 32. Slide 2 / 32. New Jersey Center for Teaching and Learning Slide 1 / 32 Progressive Science Initiative This material is made freely available at www.njctl.org and is intended for the non-commercial use of students and

More information

APPENDIX. Appendix. Table of Contents. Ethics Background. Creating Discussion Ground Rules. Amino Acid Abbreviations and Chemistry Resources

APPENDIX. Appendix. Table of Contents. Ethics Background. Creating Discussion Ground Rules. Amino Acid Abbreviations and Chemistry Resources Appendix Table of Contents A2 A3 A4 A5 A6 A7 A9 Ethics Background Creating Discussion Ground Rules Amino Acid Abbreviations and Chemistry Resources Codons and Amino Acid Chemistry Behind the Scenes with

More information

Sequence Analysis Lab Protocol

Sequence Analysis Lab Protocol Sequence Analysis Lab Protocol You will need this handout of instructions The sequence of your plasmid from the ABI The Accession number for Lambda DNA J02459 The Accession number for puc 18 is L09136

More information

Hands on session: Advanced promoter analysis

Hands on session: Advanced promoter analysis Interactive workshop Hands on session: Advanced promoter analysis Thomas Werner CEO&CSO Genomatix Software GmbH Landsberger Strasse 6, D-80339 München http://www.genomatix.de Outline of tasks What you

More information

Outline. Annotation of Drosophila Primer. Gene structure nomenclature. Muller element nomenclature. GEP Drosophila annotation projects 01/04/2018

Outline. Annotation of Drosophila Primer. Gene structure nomenclature. Muller element nomenclature. GEP Drosophila annotation projects 01/04/2018 Outline Overview of the GEP annotation projects Annotation of Drosophila Primer January 2018 GEP annotation workflow Practice applying the GEP annotation strategy Wilson Leung and Chris Shaffer AAACAACAATCATAAATAGAGGAAGTTTTCGGAATATACGATAAGTGAAATATCGTTCT

More information

Outline. Gene Finding Questions. Recap: Prokaryotic gene finding Eukaryotic gene finding The human gene complement Regulation

Outline. Gene Finding Questions. Recap: Prokaryotic gene finding Eukaryotic gene finding The human gene complement Regulation Tues, Nov 29: Gene Finding 1 Online FCE s: Thru Dec 12 Thurs, Dec 1: Gene Finding 2 Tues, Dec 6: PS5 due Project presentations 1 (see course web site for schedule) Thurs, Dec 8 Final papers due Project

More information

Sequence Variations. Baxevanis and Ouellette, Chapter 7 - Sequence Polymorphisms. NCBI SNP Primer:

Sequence Variations. Baxevanis and Ouellette, Chapter 7 - Sequence Polymorphisms. NCBI SNP Primer: Sequence Variations Baxevanis and Ouellette, Chapter 7 - Sequence Polymorphisms NCBI SNP Primer: http://www.ncbi.nlm.nih.gov/about/primer/snps.html Overview Mutation and Alleles Linkage Genetic variation

More information

Files for this Tutorial: All files needed for this tutorial are compressed into a single archive: [BLAST_Intro.tar.gz]

Files for this Tutorial: All files needed for this tutorial are compressed into a single archive: [BLAST_Intro.tar.gz] BLAST Exercise: Detecting and Interpreting Genetic Homology Adapted by W. Leung and SCR Elgin from Detecting and Interpreting Genetic Homology by Dr. J. Buhler Prequisites: None Resources: The BLAST web

More information

Entrez Gene: gene-centered information at NCBI

Entrez Gene: gene-centered information at NCBI D54 D58 Nucleic Acids Research, 2005, Vol. 33, Database issue doi:10.1093/nar/gki031 Entrez Gene: gene-centered information at NCBI Donna Maglott*, Jim Ostell, Kim D. Pruitt and Tatiana Tatusova National

More information

Carl Woese. Used 16S rrna to developed a method to Identify any bacterium, and discovered a novel domain of life

Carl Woese. Used 16S rrna to developed a method to Identify any bacterium, and discovered a novel domain of life METAGENOMICS Carl Woese Used 16S rrna to developed a method to Identify any bacterium, and discovered a novel domain of life His amazing discovery, coupled with his solitary behaviour, made many contemporary

More information

Organisation de Coopération et de Développement Economiques Organisation for Economic Co-operation and Development

Organisation de Coopération et de Développement Economiques Organisation for Economic Co-operation and Development Unclassified ENV/JM/MONO(2002)7 ENV/JM/MONO(2002)7 Unclassified Organisation de Coopération et de Développement Economiques Organisation for Economic Co-operation and Development 20-Oct-2004 English -

More information