BIOINFORMATICS AN OVERVIEW
T.R. Sharma
Genoinformatics Lab, National Research Centre on Plant Biotechnology, I.A.R.I., New Delhi

Introduction

Bioinformatics is the computational analysis of biological data, chiefly the information stored in the form of DNA and protein sequences in various biological databases. The National Center for Biotechnology Information (NCBI 2001) defines bioinformatics as: "Bioinformatics is the field of science in which biology, computer science, and information technology merge into a single discipline. There are three important sub-disciplines within bioinformatics: the development of new algorithms and statistics which assess relationships among members of large data sets, the analysis and interpretation of various types of data including nucleotide and amino acid sequences, protein domains, and protein structures; and the development and implementation of tools that enable efficient access and management of different types of information."

Analyses in bioinformatics focus on three types of datasets: genome sequences, macromolecular structures, and functional genomics experiments (e.g. microarray data). However, bioinformatics tools are also applied to various other data, e.g. phylogenetic and metabolic pathway analysis, the text of scientific papers, and plant varietal information and statistics. Analysis of biological data requires a large number of techniques, such as primary sequence alignment, protein 3D structure alignment, phylogenetic tree construction, prediction and classification of protein structure, prediction of RNA structure, prediction of protein function, and expression data clustering. Development of suitable algorithms is an important part of bioinformatics. Many techniques and algorithms were developed specifically for the analysis of biological data; for instance, the dynamic programming algorithm for sequence alignment is among the methods most widely used by biologists.
The sequence information generated worldwide is stored systematically in different types of databases. Hence, it is necessary to understand what databases are and the types in which they come.

What is a database?

A database is a collection of information stored in a computer in a systematic way, such that a computer program can consult it to answer questions. A biological database is a large, organized body of persistent data, usually associated with computerized software designed to update, query, and retrieve components of the data stored within the system. A simple database might be a single file containing many records, each of which includes the same set of information. For example, a record in a nucleotide sequence database typically contains information such as a contact name; the input sequence with a description of the type of molecule; the scientific name of the source organism from which it was isolated; and, often, literature citations associated with the sequence.
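The idea of a simple database as many records sharing the same set of fields can be sketched in a few lines of Python. This is only an illustration: the class name `SequenceRecord`, the helper `find_by_organism`, and the two example entries are made up for this sketch and do not reflect the actual GenBank record format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SequenceRecord:
    """One entry in a toy nucleotide sequence database (hypothetical layout)."""
    contact_name: str
    sequence: str
    molecule_type: str
    organism: str
    citations: List[str] = field(default_factory=list)


def find_by_organism(records, organism):
    """Return every record whose source organism matches, ignoring case."""
    wanted = organism.lower()
    return [r for r in records if r.organism.lower() == wanted]


# Made-up entries, used only to show how a query consults the records.
records = [
    SequenceRecord("A. Researcher", "ATGGCGTACGTTAG", "genomic DNA", "Oryza sativa"),
    SequenceRecord("B. Researcher", "ATGCCCGGGTAA", "mRNA", "Arabidopsis thaliana",
                   ["placeholder citation"]),
]

rice_hits = find_by_organism(records, "oryza sativa")
print(len(rice_hits), rice_hits[0].molecule_type)
```

Real biological databases add indexing, accession numbers and update tracking on top of this basic record-and-query pattern.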
Divisions of DNA databases

Since databases are growing rapidly, they have been further broken into divisions, largely on the basis of the taxonomy of the organisms. GenBank divisions fall into two general categories, organismal and functional. Sequences derived from specific organisms are stored in the organismal divisions, whereas the functional divisions, e.g. EST, STS and HTG, hold sequences irrespective of their taxonomic classification. The division to which a sequence record belongs is identified by a three-letter code at the beginning of each sequence entry. For instance, the HTG (high throughput genome) division contains sequences generated from different organisms. These sequences are generally unfinished and are further classified as Phase 1 (unfinished, unordered sequences containing gaps) and Phase 2 (unfinished, ordered sequences containing a few gaps). Once a sequence is finished and all gaps are resolved (Phase 3), it is moved to a specific organismal division, e.g. PLN in the case of plants.

The huge wealth of information in the form of DNA and protein sequences and publications on molecular biology is stored in data banks (Fig. 1). The major public data banks for DNA and protein sequences are GenBank in the USA, EMBL (European Molecular Biology Laboratory) in Europe and DDBJ (DNA Data Bank of Japan) in Japan. The growth of DNA sequence data in GenBank is depicted in Fig. 2. This rapid growth has come about because various collaborative international programmes were started during the past few years to sequence the complete genomes of various organisms.
The whole genomes of various microorganisms have already been sequenced by The Institute for Genomic Research (TIGR) and can be seen on their website. Large genomes such as human (3 billion bp), rice (450 Mb), Arabidopsis (130 Mb) and mouse (2.5 billion bp) have also been sequenced, and the data are in the public domain in GenBank. These DNA sequences now have to be used in meaningful ways for the welfare of mankind. The different types of sequences of important crops available in the public domain are listed in Table 1.

Fig. 1. Status of sequences submitted to GenBank (Source: NCBI)
Table 1. Different types of sequences of important crops available in public domain*

Type of database in public domain: Plant species
Whole genome: Oryza sativa, Arabidopsis thaliana
Partial genome: T. aestivum, Z. mays, S. bicolor, B. oleracea, B. rapa, G. max, S. tuberosum, L. esculentum, V. vinifera, Poncirus trifoliata, Medicago truncatula, Lotus corniculatus
EST: Aegilops tauschii, Allium cepa, Arabidopsis thaliana, Avena sativa, Beta vulgaris subsp. vulgaris, Brassica napus, Brassica oleracea, Brassica rapa, Capsicum annuum, Coffea arabica, Glycine max, Gossypium arboreum, Gossypium hirsutum, Helianthus annuus, Hordeum vulgare, Lactuca sativa, Lolium perenne, Lotus corniculatus, Lycopersicon esculentum, Malus domestica, Medicago sativa, Medicago truncatula, Nicotiana benthamiana, Nicotiana tabacum, Oryza sativa, Phaseolus coccineus, Phaseolus vulgaris, Saccharum officinarum, Secale cereale, Solanum melongena, Solanum tuberosum, Sorghum bicolor, Triticum monococcum, Vitis vinifera, Zea mays
mRNA: T. aestivum, Z. mays, S. bicolor, B. oleracea, B. rapa, G. max, S. tuberosum, L. esculentum, V. vinifera, Medicago truncatula, L. corniculatus, O. sativa, A. thaliana
Protein: Z. mays, S. bicolor, B. oleracea, B. rapa, G. max, S. tuberosum, V. vinifera, C. sinensis, M. truncatula, E. globulus, O. sativa, A. thaliana
BAC end: Oryza australiensis, O. brachyantha, O. glaberrima, O. granulata, O. latifolia, O. minuta, O. officinalis, O. punctata, O. ridleyi, O. rufipogon, O. schlechteri, G. hirsutum
*Source: NCBI

Divisions of Protein databases

Protein sequences are mainly stored in two databases, EMBL and GenBank. Swiss-Prot, a very well maintained and curated database, was established at the Swiss Institute of Bioinformatics. Though it is a small database, it has important annotations which are freely available to academic users. Another protein database is PIR (the Protein Information Resource); protein sequences are also derived by translating the coding regions annotated in GenBank.
The PIR database is further subdivided into four sections, PIR1, PIR2, PIR3 and PIR4, on the basis of degree of annotation.

DNA Sequence Analysis

With the advent of the internet and web browsers on the World Wide Web, bioinformatics tools are now easily available to biologists. These tools are indispensable for any genome sequencing centre. The analysis of DNA sequences starts as soon as they come off the sequencing machines. The first and foremost task of a biologist is to check the accuracy of the sequence obtained from the machine. One way is to look for the cloning sites of the insert in the sequencing vector. If the insert is a PCR product, then one should look for the primer sequences used in the amplification of that product. One can then perform a Basic Local Alignment Search Tool (BLAST) search against the DNA sequence database in GenBank and examine the probable matches. If the unknown sequence shows hits with any sequence of the same or related organisms, it is considered a true sequence. These are the basic steps,
which can be performed manually if the dataset is very small or if one has to deal with a single or a few sequences. However, in large genome sequencing projects one has to handle thousands of sequences at a given time.

Searching for Sequence Alignment

Once a high-quality sequence is obtained, one has to ask an important question: is this a new sequence, or is it similar to other DNA sequences available in the databases? To answer this question, one has to perform a database search for sequence comparison. All sequence searching methods rely on the basic concepts of alignment and distance between sequences, and pairwise sequence alignment is performed. There are different algorithms to perform global and local alignments (Fig. 2). In global alignment, the input sequence is aligned along its complete length with sequences available in the databases, whereas in local alignment only the most similar segments of the input sequence are aligned with the database sequences. Sequence comparison (DNA/protein) against a database is one of the most important and powerful tools of bioinformatics. This type of comparison is generally performed with two programmes, BLAST and FASTA, which compare an unknown sequence against a sequence database. In BLAST, the best local alignments between the unknown sequence and the database are found by an approach based on matching short sequence fragments and a powerful statistical model, whereas FASTA uses a method of approximation that tries to concentrate only on significant alignments. In the BLAST search output, Expect (E) values and bit scores are reported to judge how significant the match between the unknown sequence and a database sequence is (Fig. 3): the lower the E value and the higher the bit score, the more significant the match. The significance of a BLAST hit is very important for the interpretation of results. Because the genetic code is degenerate, DNA sequences that are only about 67% identical can still encode identical proteins.
It is also suggested that at least 75% sequence identity between two sequences should be observed before a hit is considered significant.

Fig. 2. Global and local alignments between two DNA sequences
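The distinction between global and local alignment can be made concrete with the dynamic programming recurrences usually attributed to Needleman and Wunsch (global) and Smith and Waterman (local). The sketch below is illustrative only: the function name `alignment_score` and the scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for this example, and BLAST and FASTA themselves use faster heuristic methods rather than this full table.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best pairwise alignment score of a and b by dynamic programming.

    local=False scores a global alignment (whole length of both sequences).
    local=True scores the best local alignment (most similar segments only).
    The scoring values are illustrative assumptions, not BLAST defaults.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # In a global alignment, leading gaps are penalised.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(score[i - 1][j - 1] + pair,   # align a[i-1] with b[j-1]
                       score[i - 1][j] + gap,        # gap in b
                       score[i][j - 1] + gap)        # gap in a
            if local:
                cell = max(cell, 0)                  # a local alignment may restart
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]


query, subject = "TTTACGTTTT", "GGACGTGG"
print("global:", alignment_score(query, subject))
print("local: ", alignment_score(query, subject, local=True))
```

The only differences between the two modes are the initial row and column, the floor at zero, and where the final score is read from, which is why the local score reflects just the best-matching segment.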
Fig. 3. BLAST output showing bit scores and E values after a similarity search

Gene Prediction and Annotation

Simply determining the four-letter (A, T, G, C) sequence of the DNA of any organism has no value until some meaning is derived from it by gene prediction. Gene prediction is complex work, and no algorithm can predict the true exons in a DNA sequence with complete accuracy. Two major considerations are taken into account while predicting a gene: 1) identification of structural elements such as start/stop codons and splice sites in the unknown sequence, and 2) homology searches against protein, EST and cDNA databases to identify potential coding regions. For gene prediction, the most commonly used software is GENSCAN, developed at MIT, USA, which is freely available on the Web for online analysis of DNA sequences. The output obtained from GENSCAN is then used for gene annotation, using BLAST to search public or private DNA sequence databases for matches between the unknown query sequence and the millions of sequences available in GenBank. A very popular website for BLAST is available from NCBI's home page, which performs searches using various criteria and options (Fig. 4).
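The first consideration above, locating structural elements such as start and stop codons, can be illustrated with a very small open reading frame (ORF) scan. This is a simplified sketch and not how GENSCAN works: it looks only at the forward strand, ignores splice sites and introns, and the function name `find_orfs`, the `min_codons` threshold and the example sequence are assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end, frame) for ATG...stop stretches on the forward strand.

    start is 0-based and end is exclusive (the stop codon is included).
    min_codons counts the codons from ATG up to, but excluding, the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos                          # first start codon in this frame
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None                         # look for the next ORF
    return orfs


example = "CCATGGCGTACGTTTAGCC"   # made-up sequence for the sketch
for start, end, frame in find_orfs(example):
    print(frame, start, end, example[start:end])
```

A candidate region found this way would still need the second step, a homology search against protein, EST or cDNA databases, before it could be treated as a potential coding region.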
Fig. 4. Performing a BLAST search at the NCBI home page

Primer Design

Another important use of genome sequence data after gene prediction is the design of primers, either for PCR or for sequencing. Such primers are used to amplify genes or their alleles from known sources so that the best use can be made of them. Though the PRIME software within the GCG package is mainly used for this purpose, PRIMER3, a web-based software (www-genome.wi.mit.edu/genome_software/other/primer3.html), is also commonly used for designing primers. PCR primer pairs are designed to amplify a well-defined target sequence from the template. Some of the important considerations while designing primers are the GC content, melting temperature, primer size, and size of the PCR product to be amplified. These parameters can be left at their default settings or changed as required.

Phylogenetic Analysis

Once a similarity search has been performed between the unknown sequence and the database sequences to find the per cent similarity between them, the natural next question is how these sequences are related to each other. Sequences derived from two closely related organisms show more similarity at the DNA level, and those from distantly related organisms show more dissimilarity at the sequence level. To find the evolutionary relationship among sequences derived from different organisms, a phylogenetic tree is constructed (Fig. 5). Such an evolutionary tree can also be constructed on the basis of phenotypic markers, molecular markers or sequence information. A typical phylogenetic tree comprises nodes, branches and the termini of the branches. When
all the branches emerge from a common node, that node is termed the root of the tree. Some trees, however, are constructed as unrooted trees, where the common evolutionary point is not known. For constructing a phylogenetic tree, the PILEUP option of the GCG package is commonly used. Besides, the DNASTAR software also has options to construct trees from different DNA or protein sequences. Tools available on the Web, like MacClade (www.phylogeny.arizona.edu/macclade/), can also be used for evolutionary studies of different organisms based on their DNA sequences. Similarly, bioinformatics tools can be used for protein function analysis by database search. Finding SSR markers and SNP markers from EST or genome sequences can be performed in silico using different algorithms, which will also be discussed in the presentation.

Fig. 5. Phylogenetic analysis of resistance gene analogue sequences (sk21, sk95, sk10, sk3, sk76, sk101 and sk65) obtained from rice and known resistance gene sequences (L6, M, N, RPS2 and Xa1) isolated from different crops. Analysis was performed with DNASTAR software.

Conclusions

In functional genomics, gene expression at the whole-genome level under different stresses can be studied using microarrays. Nowadays such gene expression databases are being prepared for different organisms and even for different tissues. Bioinformatics tools are helpful in locating DNA sequences in GenBank simply by entering accession numbers, aligning two or more sequences, performing similarity searches for unknown sequences in GenBank, assembling short sequence reads and developing consensus sequences, finding genes and markers in silico, and performing comparative analysis of different genomes.
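As a closing illustration of the primer design considerations named in the Primer Design section (GC content, melting temperature and primer size), the following sketch checks a candidate primer against simple thresholds. It is not the method used by PRIME or PRIMER3: the melting temperature uses the rule-of-thumb formula Tm = 2(A+T) + 4(G+C), often called the Wallace rule and meant only for short oligonucleotides, and the function names, threshold values and example primer are assumptions made for this example.

```python
def gc_content(primer):
    """Fraction of G and C bases in the primer, between 0 and 1."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Rough melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def primer_report(primer, min_len=18, max_len=25, min_gc=0.40, max_gc=0.60):
    """Summarise a primer against illustrative (assumed) size and GC limits."""
    gc = gc_content(primer)
    return {
        "length": len(primer),
        "gc": gc,
        "tm": wallace_tm(primer),
        "length_ok": min_len <= len(primer) <= max_len,
        "gc_ok": min_gc <= gc <= max_gc,
    }


report = primer_report("ATGGCGTACGTTAGCCTGAA")   # made-up 20-mer
print(report)
```

In practice the two primers of a pair would be checked together, since similar melting temperatures and the size of the amplified product matter as much as the properties of each primer alone.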
Electronic Journal of Biotechnology ISSN: 0717-3458 Vol. 8 No. 2, Issue of August 15, 2005 2005 by Pontificia Universidad Católica de Valparaíso -- Chile Received December 9, 2004 / Accepted April 27,
More informationAgenda. Web Databases for Drosophila. Gene annotation workflow. GEP Drosophila annotation projects 01/01/2018. Annotation adding labels to a sequence
Agenda GEP annotation project overview Web Databases for Drosophila An introduction to web tools, databases and NCBI BLAST Web databases for Drosophila annotation UCSC Genome Browser NCBI / BLAST FlyBase
More informationIntroduction to Bioinformatics and Gene Expression Technologies
Introduction to Bioinformatics and Gene Expression Technologies Utah State University Fall 2017 Statistical Bioinformatics (Biomedical Big Data) Notes 1 1 Vocabulary Gene: hereditary DNA sequence at a
More informationMicroSEQ Rapid Microbial Identification System
MicroSEQ Rapid Microbial Identification System Giving you complete control over microbial identification using the gold-standard genotypic method The MicroSEQ ID microbial identification system, based
More informationFrom AP investigative Laboratory Manual 1
Comparing DNA Sequences to Understand Evolutionary Relationships. How can bioinformatics be used as a tool to determine evolutionary relationships and to better understand genetic diseases? BACKGROUND
More informationMolecular Biology Primer. CptS 580, Computational Genomics, Spring 09
Molecular Biology Primer pts 580, omputational enomics, Spring 09 Starting 19 th century What do we know of cellular biology? ell as a fundamental building block 1850s+: ``DNA was discovered by Friedrich
More informationMATH 5610, Computational Biology
MATH 5610, Computational Biology Lecture 2 Intro to Molecular Biology (cont) Stephen Billups University of Colorado at Denver MATH 5610, Computational Biology p.1/24 Announcements Error on syllabus Class
More informationChapter 15 Gene Technologies and Human Applications
Chapter Outline Chapter 15 Gene Technologies and Human Applications Section 1: The Human Genome KEY IDEAS > Why is the Human Genome Project so important? > How do genomics and gene technologies affect
More informationHands-On Four Investigating Inherited Diseases
Hands-On Four Investigating Inherited Diseases The purpose of these exercises is to introduce bioinformatics databases and tools. We investigate an important human gene and see how mutations give rise
More informationThe String Alignment Problem. Comparative Sequence Sizes. The String Alignment Problem. The String Alignment Problem.
Dec-82 Oct-84 Aug-86 Jun-88 Apr-90 Feb-92 Nov-93 Sep-95 Jul-97 May-99 Mar-01 Jan-03 Nov-04 Sep-06 Jul-08 May-10 Mar-12 Growth of GenBank 160,000,000,000 180,000,000 Introduction to Bioinformatics Iosif
More informationExamination Assignments
Bioinformatics Institute of India H-109, Ground Floor, Sector-63, Noida-201307, UP. INDIA Tel.: 0120-4320801 / 02, M. 09818473366, 09810535368 Email: info@bii.in, Website: www.bii.in INDUSTRY PROGRAM IN
More informationExisting potato markers and marker conversions. Walter De Jong PAA Workshop August 2009
Existing potato markers and marker conversions Walter De Jong PAA Workshop August 2009 1 What makes for a good marker? diagnostic for trait of interest robust works even with DNA of poor quality or low
More informationCHAPTER 21 LECTURE SLIDES
CHAPTER 21 LECTURE SLIDES Prepared by Brenda Leady University of Toledo To run the animations you must be in Slideshow View. Use the buttons on the animation to play, pause, and turn audio/text on or off.
More informationProtein Structure Prediction. christian studer , EPFL
Protein Structure Prediction christian studer 17.11.2004, EPFL Content Definition of the problem Possible approaches DSSP / PSI-BLAST Generalization Results Definition of the problem Massive amounts of
More informationBIOINFORMATICS Introduction
BIOINFORMATICS Introduction Mark Gerstein, Yale University bioinfo.mbb.yale.edu/mbb452a 1 (c) Mark Gerstein, 1999, Yale, bioinfo.mbb.yale.edu What is Bioinformatics? (Molecular) Bio -informatics One idea
More informationExploring Similarities of Conserved Domains/Motifs
Exploring Similarities of Conserved Domains/Motifs Sotiria Palioura Abstract Traditionally, proteins are represented as amino acid sequences. There are, though, other (potentially more exciting) representations;
More informationUsing the Potato Genome Sequence! Robin Buell! Michigan State University! Department of Plant Biology! August 15, 2010!
Using the Potato Genome Sequence! Robin Buell! Michigan State University! Department of Plant Biology! August 15, 2010! buell@msu.edu! 1 Whole Genome Shotgun Sequencing 2 New Technologies Revolutionize
More informationAdvances in analytical biochemistry and systems biology: Proteomics
Advances in analytical biochemistry and systems biology: Proteomics Brett Boghigian Department of Chemical & Biological Engineering Tufts University July 29, 2005 Proteomics The basics History Current
More informationThe use of bioinformatic analysis in support of HGT from plants to microorganisms. Meeting with applicants Parma, 26 November 2015
The use of bioinformatic analysis in support of HGT from plants to microorganisms Meeting with applicants Parma, 26 November 2015 WHY WE NEED TO CONSIDER HGT IN GM PLANT RA Directive 2001/18/EC As general
More informationNOTES - CH 15 (and 14.3): DNA Technology ( Biotech )
NOTES - CH 15 (and 14.3): DNA Technology ( Biotech ) Vocabulary Genetic Engineering Gene Recombinant DNA Transgenic Restriction Enzymes Vectors Plasmids Cloning Key Concepts What is genetic engineering?
More informationPCR PRIMER DESIGN SARIKA GARG SCHOOL OF BIOTECHNOLGY DEVI AHILYA UNIVERSITY INDORE INDIA
PCR PRIMER DESIGN SARIKA GARG SCHOOL OF BIOTECHNOLGY DEVI AHILYA UNIVERSITY INDORE-452017 INDIA BIOINFORMATICS Bioinformatics is considered as amalgam of biological sciences especially Biotechnology with
More informationFunction Prediction of Proteins from their Sequences with BAR 3.0
Open Access Annals of Proteomics and Bioinformatics Short Communication Function Prediction of Proteins from their Sequences with BAR 3.0 Giuseppe Profiti 1,2, Pier Luigi Martelli 2 and Rita Casadio 2
More informationAP BIOLOGY. Investigation #3 Comparing DNA Sequences to Understand Evolutionary Relationships with BLAST. Slide 1 / 32. Slide 2 / 32.
New Jersey Center for Teaching and Learning Slide 1 / 32 Progressive Science Initiative This material is made freely available at www.njctl.org and is intended for the non-commercial use of students and
More informationAPPENDIX. Appendix. Table of Contents. Ethics Background. Creating Discussion Ground Rules. Amino Acid Abbreviations and Chemistry Resources
Appendix Table of Contents A2 A3 A4 A5 A6 A7 A9 Ethics Background Creating Discussion Ground Rules Amino Acid Abbreviations and Chemistry Resources Codons and Amino Acid Chemistry Behind the Scenes with
More informationSequence Analysis Lab Protocol
Sequence Analysis Lab Protocol You will need this handout of instructions The sequence of your plasmid from the ABI The Accession number for Lambda DNA J02459 The Accession number for puc 18 is L09136
More informationHands on session: Advanced promoter analysis
Interactive workshop Hands on session: Advanced promoter analysis Thomas Werner CEO&CSO Genomatix Software GmbH Landsberger Strasse 6, D-80339 München http://www.genomatix.de Outline of tasks What you
More informationOutline. Annotation of Drosophila Primer. Gene structure nomenclature. Muller element nomenclature. GEP Drosophila annotation projects 01/04/2018
Outline Overview of the GEP annotation projects Annotation of Drosophila Primer January 2018 GEP annotation workflow Practice applying the GEP annotation strategy Wilson Leung and Chris Shaffer AAACAACAATCATAAATAGAGGAAGTTTTCGGAATATACGATAAGTGAAATATCGTTCT
More informationOutline. Gene Finding Questions. Recap: Prokaryotic gene finding Eukaryotic gene finding The human gene complement Regulation
Tues, Nov 29: Gene Finding 1 Online FCE s: Thru Dec 12 Thurs, Dec 1: Gene Finding 2 Tues, Dec 6: PS5 due Project presentations 1 (see course web site for schedule) Thurs, Dec 8 Final papers due Project
More informationSequence Variations. Baxevanis and Ouellette, Chapter 7 - Sequence Polymorphisms. NCBI SNP Primer:
Sequence Variations Baxevanis and Ouellette, Chapter 7 - Sequence Polymorphisms NCBI SNP Primer: http://www.ncbi.nlm.nih.gov/about/primer/snps.html Overview Mutation and Alleles Linkage Genetic variation
More informationFiles for this Tutorial: All files needed for this tutorial are compressed into a single archive: [BLAST_Intro.tar.gz]
BLAST Exercise: Detecting and Interpreting Genetic Homology Adapted by W. Leung and SCR Elgin from Detecting and Interpreting Genetic Homology by Dr. J. Buhler Prequisites: None Resources: The BLAST web
More informationEntrez Gene: gene-centered information at NCBI
D54 D58 Nucleic Acids Research, 2005, Vol. 33, Database issue doi:10.1093/nar/gki031 Entrez Gene: gene-centered information at NCBI Donna Maglott*, Jim Ostell, Kim D. Pruitt and Tatiana Tatusova National
More informationCarl Woese. Used 16S rrna to developed a method to Identify any bacterium, and discovered a novel domain of life
METAGENOMICS Carl Woese Used 16S rrna to developed a method to Identify any bacterium, and discovered a novel domain of life His amazing discovery, coupled with his solitary behaviour, made many contemporary
More informationOrganisation de Coopération et de Développement Economiques Organisation for Economic Co-operation and Development
Unclassified ENV/JM/MONO(2002)7 ENV/JM/MONO(2002)7 Unclassified Organisation de Coopération et de Développement Economiques Organisation for Economic Co-operation and Development 20-Oct-2004 English -
More information