EMBO COURSE. Practical Course on Genetic and Molecular Analysis of Arabidopsis. Module 3. Genome analysis and in silico functional predictions

Barbet J.C., Chiapello H., Cooke R., Lecharny A., Ollivier E. and Rouzé P.

3.1 Introduction
3.2 Sequence databases
3.3 Arabidopsis resources
3.4 How to access and search the biological data
3.5 Sequence similarity searches
3.6 Rapidity, selectivity, specificity and exhaustivity
3.7 How to find genes in a genomic sequence and what are the annotations?
3.8 What may be inferred about an uncharacterized protein?
3.9 A new database, Indigo
3.10 Readings
Bibliography

3.1 Introduction

The size of the nuclear genome of Arabidopsis thaliana is estimated to be Mb organized in 5 chromosomes. Started in the early 1990s as a worldwide coordinated effort (1), the sequencing of this model genome is progressing very rapidly. At the time of this course, May 1999, more than 50% of the genome will be available in public databases. Progress of the sequencing project can be monitored through the Arabidopsis thaliana database: or the MIPS (Munich Information Center for Protein Sequences, Max-Planck-Institut für Biochemie, Martinsried, Germany) WWW server: The complete sequence is announced for the year 2001 by the steering committee of the Arabidopsis Genome Initiative (AGI). The complete mitochondrial DNA sequence (57 identified genes in 366,924 nucleotides) is also available (2):

This places Arabidopsis workers in a privileged but uncomfortable situation. It is a great advantage to know so much of the sequence of the organism you are working on, but the daily release of raw sequences by a number of sequencing facilities makes it extremely difficult for unskilled or unassisted biologists to keep up with the available data. Nowadays bioinformatics, in the instrumental sense of the analysis and interpretation of sequence data, should play a key role in the design of most if not all biological experiments. Thus, for a biologist, bioinformatics is no longer an optional subject but needs to be part of fundamental training.

The aim of this part of the course is to give insight into the biocomputing tools that are necessary in the many studies where a sequence is a step in the protocol. This is evident in genomics (ordered DNA libraries, SAGE) or proteomics approaches, but it is also essential in a number of methods such as screening of T-DNA mutants, in silico cDNA cloning, chromosome walking, specific PCR amplifications, etc. The use of such tools is of great help in function identification prior to experimental verification.

There are a number of Web sites in France and neighbouring countries which offer biocomputing tools and links to servers all around the world:

GIS-INFOBIOGEN (INFOrmatics for BIOmolecules and GENomes), a national academic centre, Villejuif, France:
The Pasteur Institute, Paris, France:
The Pôle Bio-Informatique Lyonnais, Lyon, France. The Laboratory of Biometry and Evolutionary Biology and the Institute of Biology and Chemistry of Proteins:
The Atelier Bioinformatique de Marseille, France, provides a comprehensive and up-to-date list of sites offering a wide variety of biocomputing tools:
The Institut de Génétique et de Biologie Moléculaire et Cellulaire de Strasbourg, France, gives access to SRS, a Sequence Retrieval System on the World Wide Web. Also, to get:

SeqCleaner, a programme that removes sequence ends containing too many Ns or vector sequences.
DBWatcher, a programme handling periodic BLAST searches and reporting novel similarities.
ClustalX, a windows interface for the ClustalW multiple sequence alignment programme (3).
A map and a list to click to access the EMBnet members' WWW servers. EMBnet, the European Molecular Biology network, is a science-based group of 26 collaborating nodes throughout Europe that provide data and software accessibility to the European molecular biology community.
The Department of Plant Genetics, Gent, Belgium:
The ExPASy server of the Swiss Institute of Bioinformatics, Genève, Switzerland. Protein sequences and structures.
The Pedro server provides a collection of WWW links to information and services useful to molecular biologists.

As in wet-lab procedures, to be efficient we have to make the best choice at each step of an analysis, and this requires both an overview of the possibilities and a basic understanding of the method being applied. Most sites offering on-line analysis have built-in, hypertext-linked help pages which provide information on the programme(s), the databases available and the parameters which can be used in the analysis. Owing to the shortage of time, only the analysis of the primary structure, i.e. the sequence, will be considered, even though it is obvious that, when possible, sequence analysis should proceed further by gaining information on the structures at the 3D level.

Below we present short introductions to what will be discussed and tried out in practice. The aim is to introduce various terms and notions frequently used in biocomputing. Deliberately, only the tools and resources freely available through the Internet are considered.

3.2 Sequence databases

Every year, the first issue of Nucleic Acids Research (for 1999: January, 27-1) is dedicated to biological databases.

DNA: All publicly available DNA sequences (>3,000,000) are collected in three sequence databases, DDBJ, EMBL and GenBank, which exchange data on a daily basis. DNA sequences come primarily from direct submission of sequence data from individual laboratories and

large-scale sequencing projects. From France these three databases may be efficiently searched at INFOBIOGEN.
EMBL: European Bioinformatics Institute, Hinxton, UK.
GenBank: National Center for Biotechnology Information, Bethesda, MD, USA.
DDBJ: DNA DataBank of Japan, Mishima, Japan.

Proteins:
Swiss-Prot (4): The SWISS-PROT Protein Sequence Database is a curated protein sequence database which provides a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.).
TrEMBL (4) (TRanslated from EMBL) and GenPept: CDS features from the EMBL and GenBank nucleotide sequence databases, translated into peptide sequences. Useful when these sequences are not yet integrated in SWISS-PROT.
The Protein Information Resource (PIR): PIR International is a collaboration established in 1988 between the National Biomedical Research Foundation (NBRF), the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID). It is a curated database divided into three sections depending on the degree of verification of the sequences (from unverified to fully classified). Sequences are organized according to structural, functional and evolutionary relationships.

3.3 Arabidopsis resources

The principal Arabidopsis thaliana resources are available from or linked to the AtDB at Stanford University School of Medicine, Stanford, CA, USA. AtDB collects all the Arabidopsis sequences and also provides a number of tools for sequence analysis. It also includes interactive physical and genetic maps, the latest AGI sequencing information, colleague details, literature, clone and locus data and other important information relevant to the Arabidopsis community. AtDB has recently released a new version of the Unified Display of Physical Maps. The display contains BAC and YAC tiling paths and a number of physical maps from many research groups. It is very useful for map-based cloning.

In general databases, about 40,000 Arabidopsis ESTs (5) (EST = Expressed Sequence Tag, bases) are available in dbEST. An EST is a single-pass sequence of a cDNA (5). It is a rapid means of obtaining information on coding sequences. TIGR has assembled Arabidopsis ESTs with Arabidopsis transcripts into Tentative Consensus (TC) sequences and provides the results as a service to the community. These partial cDNA sequences, which may contain errors (up to 5%), allow the detection of similarities with known sequences in databases and the design of gene-specific probes to be used in expression studies and mapping experiments. They are also an easy route to a (full-length or partial) cDNA. Once you have found the reference (either clone name or accession number) of a cDNA tagged by an EST, you may order the cDNA clone from the Arabidopsis Biological Resource Centre (ABRC) at Ohio State University, USA.

Another Biological Resource Centre for Arabidopsis is the Nottingham Arabidopsis Stock Centre (NASC). Resources available at NASC include seeds of mutants, ecotypes and T-DNA lines. NASC is also the curator of the Lister and Dean Recombinant Inbred map.

BAC-end sequences are sequences of the extremities (< 1 kb) of Bacterial Artificial Chromosomes. They can be consulted at several sites, including:
TIGR, Rockville, MD, USA
CNS, Paris, France

An increasing number of databases are dedicated to a given gene family, with a classification by species. For instance:
Secretory peroxidase genes in A. thaliana
Cytochrome P450 family in A. thaliana (>350 genes)

3.4 How to access and search the biological data

In databases, sequence entries are made of the sequence itself and several pieces of information either describing the sequence (features) or associated with the sequence (reference, comment, etc.). A short comparison of the organization in fields of typical entries from EMBL and GenBank is given below. Note that framed parts may be repeated a number of times.

There are two kinds of searches. In a text-based search we use words that we hope have been used in the descriptions of the sequence. Three retrieval systems allow sequences to be selected by many criteria and the selected sequences to be extracted. The three logical operators OR, AND and BUTNOT can be used to combine search words in an index search.
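The way the three operators combine index hits can be pictured with sets. The following is only a sketch: the index, its keywords and the entry identifiers are invented for illustration, and it does not claim to reflect how SRS or any other retrieval system is implemented.

```python
# Hypothetical inverted index: keyword -> set of entry identifiers.
# The identifiers are invented placeholders, not real accession numbers.
index = {
    "peroxidase": {"ENTRY1", "ENTRY2", "ENTRY3"},
    "arabidopsis": {"ENTRY2", "ENTRY3", "ENTRY4"},
    "est": {"ENTRY3", "ENTRY5"},
}


def lookup(word):
    """Return the set of entries indexed under a word (case-insensitive)."""
    return index.get(word.lower(), set())


def op_or(a, b):
    """Entries found by either search word."""
    return a | b


def op_and(a, b):
    """Entries found by both search words."""
    return a & b


def op_butnot(a, b):
    """Entries found by the first search word but not by the second."""
    return a - b


# (peroxidase AND Arabidopsis) BUTNOT EST
hits = op_butnot(op_and(lookup("peroxidase"), lookup("Arabidopsis")), lookup("EST"))
print(sorted(hits))
```

Each operator is a plain set operation, so queries can be nested freely by passing the result of one operator to another.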

As an increasing percentage of Arabidopsis entries in databases consists of large sequences registered by the sequencing facilities involved in the AGI, very often your query nucleotide sequence will be similar to only a small part of the subject sequence. For example, Arabidopsis BAC or P1 sequences are around 100 kb long, enough to contain about 20 genes. The features, which contain the positions of each of the (predicted) genes, help you to locate the region of interest in large annotated sequences.

Example:
start new SRS session
select embl and GenBank
continue
select AccNumber Y11187
Do Query
>EMBL:AT23KB
>GenBank:AT23KB

The features table contains information about potential gene products, regions of biological significance and cross-references to other data collections. Data in the feature tables are of widely differing quality since they are produced by different methods (from computational prediction and annotation to experimental evidence) and provided by different authors (6). Expert-curated databases exist (Swiss-Prot) but they are not the rule.

GenBank        EMBL        Example
LOCUS          ID+DT       ATXXXXDNA nnnn bp DNA PLN
DEFINITION     DE          A.thaliana XXXX gene
ACCESSION      AC          Annnnn
KEYWORDS       KW          XXXX gene
SOURCE                     thale cress
  ORGANISM     OS+OC       Arabidopsis thaliana
                           Eukaryotae; mitochondrial eukaryotes..Arabidopsis
REFERENCE      RN+RP       1 (bases 1 to nnnn)
  AUTHORS                  Dupont et al.
  TITLE                    Regulation of..
  JOURNAL      RL          Unpublished
COMMENT                    NCBI gi: nnnnnn
FEATURES       FH          Key  Location/Qualifiers
  SOURCE       FT source   1..xxxx
                           /organism= Arabidopsis thaliana
                           /strain= Columbia
                           /map= 4-xx
  CDS          FT CDS      join(xxxx..xxxx,.)
                           /translation= MAPTETTGS.
  EXON         FT exon     xxxx..xxxx
                           /gene= XXXX
                           /number=1
  INTRON       FT intron   xxxx..xxxx
                           /number=1
BASE COUNT                 1521 a 939 c 1049 g 1946 t
ORIGIN         SQ          1 gtcaagtggtaaccggtcaacgtagccat
//             //
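As an illustration of how a features table locates a gene inside a large entry, the sketch below reads a CDS location of the join(...) form shown above and splices the corresponding exons out of the sequence. The toy sequence, the coordinates and the function names are assumptions made for this example; only plain forward-strand locations are handled (no complement(...), no partial-end markers).

```python
import re


def parse_location(location):
    """Parse 'a..b' or 'join(a..b,c..d)' into (start, end) pairs, 1-based inclusive."""
    pairs = re.findall(r"(\d+)\.\.(\d+)", location)
    if not pairs:
        raise ValueError("no start..end range found in " + repr(location))
    return [(int(start), int(end)) for start, end in pairs]


def extract_cds(sequence, location):
    """Concatenate the segments named by a location string."""
    segments = parse_location(location)
    # Feature coordinates count from 1; Python slices count from 0.
    return "".join(sequence[start - 1:end] for start, end in segments)


def introns(segments):
    """Return the intervals lying between consecutive exon segments."""
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(segments, segments[1:])
    ]


# Toy genomic fragment: exons in upper case, flanks and intron in lower case.
sequence = "cc" + "ATGGCT" + "gtaagaacag" + "CCTACTTAA" + "gg"
location = "join(3..8,19..27)"

cds = extract_cds(sequence, location)
print(cds)
print(introns(parse_location(location)))
```

The same parsing step can be used in reverse, for example to check whether the region matched by a query falls inside an annotated exon or an intron.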

In a sequence-based search, we use a nucleotide or a protein sequence (the query) to search a sequence database. Because the descriptions of sequences are not standardized, and because efficient links have been established between a number of databases, a sequence-based search is, when possible, often the best way to collect the desired information. The query sequence may be represented either by the accession number it has received in the EMBL/GenBank/DDBJ databanks or by the sequence itself in a defined format. Two formats are widely used:

the FASTA format,
>sequence_name or anything you want
ATGCATGCATGCATGCATGCATGCATGCATGCA...

and the plain (raw) text format,
ATGCATGCATGCATGCATGCATGCATGCATGCA...

As a first step in sequence analysis it is best to use a programme performing local alignment, which will only look for similar blocks of sequence between your query and the database sequences (subject sequences), rather than attempting to align both sequences from end to end (global alignment).

3.5 Sequence similarity searches

For local alignment, the BLAST (Basic Local Alignment Search Tool, ref. 7) suite of programmes has become a worldwide reference and is almost always used to begin sequence comparison. BLAST searches for locally maximal alignments using a scoring method described in the BLAST help. The five BLAST programs perform the following tasks:
blastp compares an amino acid query sequence against a protein sequence database;
blastn compares a nucleotide query sequence against a nucleotide sequence database;
blastx compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein sequence database;
tblastn compares a protein query sequence against a nucleotide sequence database dynamically translated in all six reading frames (both strands).

tblastx compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.

The result of a sequence-based search is (by default) ordered by the level of similarity between your query sequence and the sequences of the searched databases (subject sequences). Depending on the BLAST you use, the statistical significance of the alignment between a query sequence and a subject sequence is indicated by:
- the HS or High Score: the score of the best alignment. By itself the HS is not useful for evaluating an alignment unless the specific scoring matrix employed is also provided.
- the P(N) value (old, ungapped BLAST 1.4.11): the statistical significance based on all the N alignments, with an HS above a given value, between the query and a subject. P(N) is given in scientific notation, thus 3.5 e-25 = 3.5 x 10^-25. P(N) varies from 0 to 1. A lower P(N) value corresponds to a higher statistical significance of the similarity between the two aligned sequences.
- the BLAST E value (new, gapped BLAST 2.0): an E value of 1 assigned to a hit between a query sequence and a subject sequence can be interpreted as meaning that, in the searched database, one might expect to see 1 match with a similar score simply by chance.

Both P(N) and the E value depend on the scoring matrix, the length of the query sequence and the total length of the database. The size of the database searched is indicated in the introductory BLAST output. Always remember that P(N) and E values are nothing but statistics. The biological significance of an alignment cannot simply be derived from these values; it needs validation.

More specialized forms of BLAST are:
PSI-BLAST (Position Specific Iterated) (8): Proteins only. Query sequence -> gapped BLAST search -> position-specific score matrix from previous alignments -> iterated database search.
PHI-BLAST (Pattern Hit Initiated) (9): Proteins only. Searches for protein motifs and for similarity in the vicinity of the motif in subject sequences. It is an efficient method to find homologous proteins since it is linked to PSI-BLAST. The motif should be in PROSITE format: The motifs may come from the PROSITE database or from the BLOCKS database. A motif may also be built from your own family of sequences with BLOCKS MAKER.
BLAST 2 Sequences: to compare 2 sequences only.

3.6 Rapidity, selectivity, specificity and exhaustivity

The rapidity, sensitivity and specificity of a sequence-based search are all markedly increased if coding DNA is conceptually translated to protein before performing the search and if regions of low complexity are not considered (see the Filter item in the BLAST help). A similar gain in the quality of the result of a search may also be obtained by restricting the database size. The rapid increase in the number of entries in sequence databases, as well as the submission of large amounts of lower-quality, single-pass or unfinished sequence, has led to the creation of independent subdivisions of the data. ESTs are in the separate dbEST, whereas high-throughput genome sequences are initially presented in the HTG(S) or GSS (Genome Survey Sequences) sections. In addition, HTGS sequences are divided into three subsections:
HTGS_PHASE1: Sequence consists of an unordered set of sequence pieces (typically 7-20), unoriented, unannotated and containing gaps.
HTGS_PHASE2: Sequence consists of sequence pieces (typically 2 or 3) for which order and orientation have been established, while gaps remain.
HTGS_PHASE3: Sequence is considered to be completed and might contain (some) annotation.

GenBank divisions: These change relatively frequently and currently include non-redundant sequences, ESTs, HTGS, GSS and, more recently, taxonomy-specific searches (by name, group, etc.).
NCBI What's New, 10/19/98: Organism-specific BLAST is now available at the NCBI. Users may limit their BLAST search to a specific organism selected from a pull-down menu of common organisms or by entering an organism name (Genus species) or a taxonomic group (e.g., "Eukaryota").
Arabidopsis thaliana lineage (short): Eukaryota; Magnoliophyta (flowering plants); Eudicotyledons; Rosidae; Capparales; Brassicaceae; Arabidopsis.
Another subdivision which it may be wise to consider is the month section (all new or revised GenBank+EMBL+DDBJ+PDB sequences released in the last 30 days).
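To make the effect of database size on the statistics concrete, here is a small sketch based on the standard relation E = m x n x 2^(-S'), where S' is the normalized (bit) score, m the query length and n the total database length, together with P = 1 - exp(-E). Real BLAST implementations use corrected (effective) lengths, which this sketch ignores; the score and the lengths below are invented for illustration.

```python
import math


def expect_value(bit_score, query_len, db_len):
    """E value for a bit score, ignoring effective-length corrections."""
    return query_len * db_len * 2.0 ** (-bit_score)


def p_value(e_value):
    """Probability of at least one chance hit, P = 1 - exp(-E)."""
    # expm1 keeps precision when E is very small.
    return -math.expm1(-e_value)


bit_score = 50.0   # invented score
query_len = 300    # invented query length (residues)

# The same alignment scored against a large and a restricted database.
for db_len in (1e9, 1e7):
    e = expect_value(bit_score, query_len, db_len)
    print(f"database length {db_len:.0e}: E = {e:.2e}, P = {p_value(e):.2e}")
```

With everything else fixed, E scales linearly with the database length, which is one reason why searching only the relevant subdivision can make a borderline hit stand out.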
EMBL divisions: The EMBL database has always been divided into subsections (e.g. bacteria BAC, mammals MAM, plants PLN, etc.) and new ones have been added, such as the EST or HTG subsections.
nrdb: The non-redundant (nr) databases built by some sequence data centres contain all classically sequenced or high-quality systematic sequencing data from GenBank+EMBL+DDBJ+PDB after removal of redundant entries, but generally without EST, STS (Sequence Tagged Site), GSS or HTGS sequences. These latter sections contain a large number of sequences (37,000 A. thaliana sequences in dbEST and 29,000 in GSS in February 1999). This means that, for an exhaustive analysis, separate searches against all relevant sections must be carried out since, depending on the server you use, the EST database, the BAC-end sequence database and the HTG(S) database may need to be searched separately.

3.7 How to find genes in a genomic sequence and what are the annotations?

Finding genes in a genomic sequence from a higher eukaryote is not trivial. It is a process related to a number of biological issues such as the mechanism of splicing, biased codon usage and homology. Annotations are rapid and automatic functional assignments. They are possible only if similar sequences with known function exist, which is the case for around 50% of the predicted genes in A. thaliana. Annotations based on similarity searches should be considered cautiously since a significant fraction of them in databases are wrong. Annotations and functional data that relate to a sequence are stored in the features table of the sequence.

3.8 What may be inferred about an uncharacterized protein?

From similarity to homology and to function. Much can be inferred about an uncharacterized protein when significant sequence similarity is detected with a well-studied protein. Protein families are a powerful tool in functional prediction. Clustering of proteins by (super)family is the object of the PIR-PROTFAM database: Each family groups homeomorphic proteins with 50% sequence identity. More than 10,000 families are recognized. Superfamilies (~2,000) cluster sequences with 30% identity.

When a sequence similarity is observed in a search against the uncurated nr databases, always verify that the function of the subject sequence has been obtained experimentally and not only by similarity (error propagation). Do not rely directly on the highest score in a sequence comparison: biological significance may differ from statistical significance. It should also be borne in mind that functions may diverge as sequences diverge, and thus the biological role may be altered even when the biochemical function is retained.

Homologs: genes descending with modifications from a common ancestral gene.
Orthologs: homologs with function conserved in different species. Generated by speciation events.
Paralogs: homologs with different functions in the same species. Generated by duplication events and divergence.
Convergence: sequence similarities that have arisen without a common evolutionary history. Convergence is generally suspected only in small regions or domains of genes.

Proteins are very often made of more than one recognizable domain. When the function of a protein cannot be found by an overall similarity with a well-known protein, we may proceed by a protein domain dissection to look for conserved, diagnostic signatures which are found in particular proteins or protein families.

PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles that help to reliably identify to which known protein family a new sequence belongs.

BLOCK Searcher compares a protein or a DNA sequence to a database of protein blocks. Blocks are multiply aligned ungapped segments corresponding to the most highly conserved regions of proteins.
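A PROSITE pattern of the kind mentioned above can be read as a regular expression. The sketch below converts a subset of the PROSITE pattern syntax (single residues, x for any residue, [..] alternatives, {..} exclusions, (n) and (n,m) repeats, < and > anchors) into a Python regular expression. It is an illustration, not a substitute for the scanning tools provided with PROSITE. The pattern N-{P}-[ST]-{P} is the PROSITE N-glycosylation site pattern; the protein fragment and the function name are invented.

```python
import re

# One pattern element: a residue, x, [alternatives] or {exclusions},
# optionally followed by a repeat count such as (3) or (2,4).
ELEMENT = re.compile(
    r"(?P<core>[A-Zx]|\[[A-Z]+\]|\{[A-Z]+\})"
    r"(?:\((?P<lo>\d+)(?:,(?P<hi>\d+))?\))?"
)


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a regular expression string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        at_start = element.startswith("<")
        at_end = element.endswith(">")
        match = ELEMENT.fullmatch(element.strip("<>"))
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core = match.group("core")
        if core == "x":
            core = "."                          # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"      # any residue except these
        lo, hi = match.group("lo"), match.group("hi")
        if lo and hi:
            core += "{" + lo + "," + hi + "}"
        elif lo:
            core += "{" + lo + "}"
        parts.append(("^" if at_start else "") + core + ("$" if at_end else ""))
    return "".join(parts)


regex = prosite_to_regex("N-{P}-[ST]-{P}.")
protein = "MKNGSAANPTL"  # invented fragment
# Report 1-based start positions, as sequence databases do.
sites = [(m.start() + 1, m.group()) for m in re.finditer(regex, protein)]
print(regex, sites)
```

Note that re.finditer reports non-overlapping matches only, so overlapping occurrences of a short motif would need a lookahead-based scan.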

In general, homologues share more than one Block, and the distances between two Blocks are frequently conserved. The BLOCK Searcher output takes all these data into account and delivers fingerprints, i.e. groups of conserved motifs used to characterize a protein family. The Blocks may be retrieved with Get Blocks. You may create your own blocks from a set of homologous sequences with Block Maker.

PRINTS is another compendium of protein fingerprints.

PRODOM: The ProDom protein domain database consists of an automatic compilation of homologous domains. ProDom was built using recursive PSI-BLAST searches.

Some WWW sites offer tools that help in the prediction of either signal peptides or transmembrane segments:
ChloroP, for chloroplast transit peptides and their cleavage sites in plant proteins.
SignalP, for signal peptides and cleavage sites.
TMHMM, for transmembrane helices.
TMpred, for prediction of membrane-spanning regions and their orientation.
PSORT, for the prediction of protein localization sites in cells.

3.9 A new database, Indigo

The database, Indigo (10), is open through The concept used for organising the data is the concept of neighbourhood. The main idea underlying this work is that the biological objects that make a cell alive cannot be isolated from each other: biology must be described more as a science of relationships between objects than as a science describing objects. Knowledge of whole genome sequences is a unique opportunity to study the relationships between genes and gene products. In most cases we do not know what relationships are involved, but we know that they exist. To study them we investigated the concept of neighbourhood in order to organise the disparate knowledge we have on a particular genome. This concept is very wide. Because we study the genomic text, we chose genes as the core items. For a given gene, we constructed lists of neighbours based on links of several possible categories.

The first and intuitive relationship between two genes is their proximity on a chromosome. A second possibility, often used in classical studies, is to relate genes or gene products because they have evolved from a common ancestor. We can also consider that genes coding for proteins involved in the same metabolic pathway or using the same substrate are related. This constitutes the metabolic neighbourhoods. More complex relationships have been described, such as relationships based on the use of the genetic code or on common presence in bibliographical references: two genes can be related because they use synonymous codons with the same frequency; they can also be linked because they are cited in the same bibliographical source. Indigo creates an interactive environment in which the knowledge about gene neighbours can be retrieved and exploited for model organisms (at present: E. coli and B. subtilis, and a preliminary compendium of A. thaliana genes).
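The idea of relating genes through their codon usage can be pictured with a very small sketch. The measure used here (relative codon frequencies compared by a sum of absolute differences) is an assumption chosen for simplicity; the text does not say which measure Indigo uses, and the three coding sequences are invented.

```python
from collections import Counter


def codon_frequencies(cds):
    """Relative frequency of each codon in an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def usage_distance(freq_a, freq_b):
    """Sum of absolute frequency differences (0 = identical usage, 2 = disjoint)."""
    codons = set(freq_a) | set(freq_b)
    return sum(abs(freq_a.get(c, 0.0) - freq_b.get(c, 0.0)) for c in codons)


# Invented sequences: gene_a and gene_b encode Met-Ala-Ala-Lys with the same
# codons, gene_c encodes the same peptide with different synonymous codons.
gene_a = "ATGGCTGCTAAA"
gene_b = "ATGGCTGCTAAA"
gene_c = "ATGGCCGCCAAG"

freqs = {name: codon_frequencies(seq)
         for name, seq in [("a", gene_a), ("b", gene_b), ("c", gene_c)]}
print(usage_distance(freqs["a"], freqs["b"]))
print(usage_distance(freqs["a"], freqs["c"]))
```

Genes whose distance falls below some chosen threshold would then be listed as neighbours; on real data, much longer sequences are needed for the frequencies to be meaningful.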

3.10 Readings

Genome analysis and predictions
Science, 1998 Oct 23, 282 (5389):
Trends guide to bioinformatics: Trends Supplement 1998
Genetwork, in various issues of Trends in Genetics.
Genome analysis: A laboratory manual. Vol 1, Chapter 7: Computational analysis and annotation of sequence data. Baxevanis A.D. et al.
BIOSCI is a set of free electronic communication forums used by biological scientists worldwide. It contains a specialized section for Arabidopsis.

Bibliography

To load the following references and the corresponding summaries directly into your own bibliography database, connect to the MEDLINE servers and use the MUI or PMID codes. References may also be searched at INIST by words or by shelf number.

1) Bevan M. et al., Objective: the complete sequence of a plant genome. Plant Cell 9,
   Bevan M. et al., Analysis of 1.9 Mb of contiguous sequence from chromosome 4 of Arabidopsis thaliana. Nature 391, (MUI: ).
   Bevan M. et al., Clearing a path through the jungle: progress in Arabidopsis genomics. Bioessays, 21:
   Sato S. et al., Structural analysis of Arabidopsis thaliana chromosome 5. I (MUI: ); II (MUI: ); III (MUI: ); IV (MUI: ). DNA Res.
2) Unseld M. et al., The mitochondrial genome of Arabidopsis thaliana contains 57 genes in 366,924 nucleotides. Nat Genet 15: (MUI: ; PMID: ) (shelf number: 22883).
3) Thompson J.D., Gibson T.J., Plewniak F., Jeanmougin F. and Higgins D.G. (1997) The ClustalX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Research, 24:
4) Bairoch A. and Apweiler R., The SWISS-PROT protein sequence data bank and its supplement TREMBL. Nucleic Acids Res. 25, (MUI ).
5) Cooke R. et al., Further progress towards a catalogue of all Arabidopsis genes: analysis of a set of 5,000 non-redundant ESTs. The Plant Journal, 9, (MUI: ).
6) Rouzé P. et al., Genome annotation: which tools do we have for it? Curr. Opin. Plant Biol., 2,
7) Altschul S.F. et al., Basic local alignment search tool. J. Mol. Biol. 215: (MUI ).
   Altschul S.F. et al., Issues in searching molecular sequence databases. Nature Genetics 6, (MUI ).
8) Altschul S.F. et al., Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 25, (UI: ).
9) Zhang et al., 1998. Protein sequence similarity searches using patterns as seeds. Nucleic Acids Res. 26: (UI: ).
10) Nitschké P. et al., Indigo: a World-Wide-Web review of genomes and gene functions. FEMS Microbiology Reviews 22, (UI: ).

ELE4120 Bioinformatics. Tutorial 5

ELE4120 Bioinformatics. Tutorial 5 ELE4120 Bioinformatics Tutorial 5 1 1. Database Content GenBank RefSeq TPA UniProt 2. Database Searches 2 Databases A common situation for alignment is to search through a database to retrieve the similar

More information

Why learn sequence database searching? Searching Molecular Databases with BLAST

Why learn sequence database searching? Searching Molecular Databases with BLAST Why learn sequence database searching? Searching Molecular Databases with BLAST What have I cloned? Is this really!my gene"? Basic Local Alignment Search Tool How BLAST works Interpreting search results

More information

Sequence Databases and database scanning

Sequence Databases and database scanning Sequence Databases and database scanning Marjolein Thunnissen Lund, 2012 Types of databases: Primary sequence databases (proteins and nucleic acids). Composite protein sequence databases. Secondary databases.

More information

BLAST. compared with database sequences Sequences with many matches to high- scoring words are used for final alignments

BLAST. compared with database sequences Sequences with many matches to high- scoring words are used for final alignments BLAST 100 times faster than dynamic programming. Good for database searches. Derive a list of words of length w from query (e.g., 3 for protein, 11 for DNA) High-scoring words are compared with database

More information

FACULTY OF BIOCHEMISTRY AND MOLECULAR MEDICINE

FACULTY OF BIOCHEMISTRY AND MOLECULAR MEDICINE FACULTY OF BIOCHEMISTRY AND MOLECULAR MEDICINE BIOMOLECULES COURSE: COMPUTER PRACTICAL 1 Author of the exercise: Prof. Lloyd Ruddock Edited by Dr. Leila Tajedin 2017-2018 Assistant: Leila Tajedin (leila.tajedin@oulu.fi)

More information

Protein Bioinformatics Part I: Access to information

Protein Bioinformatics Part I: Access to information Protein Bioinformatics Part I: Access to information 260.655 April 6, 2006 Jonathan Pevsner, Ph.D. pevsner@kennedykrieger.org Outline [1] Proteins at NCBI RefSeq accession numbers Cn3D to visualize structures

More information

Basic Bioinformatics: Homology, Sequence Alignment,

Basic Bioinformatics: Homology, Sequence Alignment, Basic Bioinformatics: Homology, Sequence Alignment, and BLAST William S. Sanders Institute for Genomics, Biocomputing, and Biotechnology (IGBB) High Performance Computing Collaboratory (HPC 2 ) Mississippi

More information

I nternet Resources for Bioinformatics Data and Tools

I nternet Resources for Bioinformatics Data and Tools ~i;;;;;;;'s :.. ~,;;%.: ;!,;s163 ~. s :s163:: ~s ;'.:'. 3;3 ~,: S;I:;~.3;3'/////, IS~I'//. i: ~s '/, Z I;~;I; :;;; :;I~Z;I~,;'//.;;;;;I'/,;:, :;:;/,;'L;;;~;'~;~,::,:, Z'LZ:..;;',;';4...;,;',~/,~:...;/,;:'.::.

More information
