Gene Annotation Project. Group 1. Tyler Tiede Yanzhu Ji Jenae Skelton
1 Gene Annotation Project Group 1 Tyler Tiede Yanzhu Ji Jenae Skelton
2 Outline
- Tools
- Overview of 150 kb region
- Overview of annotation process
- Characterization of 5 putative gene regions
- Analysis of masked regions
3 Annotation Tools
- Sequence analyses: EMBOSS tools; dot plot, word frequency, RepeatMasker, nucleotide density, and CpG island
- Gene predictions: FGeneSH, AUGUSTUS, GeneMark. In the end only FGeneSH and GeneMark were used because AUGUSTUS did not add information
- Alignments: NCBI, TIGR, GRAMENE; blastn, blastx, blastp
- Genome viewer: MaizeGDB
4 Gramene Blast result 150kb region from Chr8: in the maize reference genome
5 Intrinsic Sequence Analysis
- GC content: 49.77%
- 99,668 bp (66.45%) of bases masked
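The two figures on this slide can be recomputed from the sequence files. The sketch below is a minimal illustration, not the group's actual procedure: the function names are hypothetical, and it assumes the masked sequence marks repeats either as N/X (hard masking) or as lowercase bases (soft masking). As a consistency check, 99,668 masked bases out of a nominal 150,000 bp rounds to the 66.45% reported, assuming the region is exactly 150,000 bp long.

```python
def gc_content(seq):
    """Percent G+C among unambiguous bases (A, C, G, T); case-insensitive."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def masked_percent(masked_seq):
    """Percent of positions that are masked.

    Assumption: a position is masked if it is N/X (hard-masked)
    or a lowercase letter (soft-masked).
    """
    if not masked_seq:
        return 0.0
    masked = sum(1 for base in masked_seq if base in "NX" or base.islower())
    return 100.0 * masked / len(masked_seq)


if __name__ == "__main__":
    # Slide figures: 99,668 masked bases in a nominal 150,000 bp region.
    print(round(100.0 * 99668 / 150000, 2))
```

Counting only A, C, G, and T in the denominator keeps hard-masked N positions from diluting the GC estimate.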
6 Gene 1
- 4 exons
- Reverse strand
- 844 bp coding sequence
Gene Model (columns: Exon, Start, End, Exon Length, Evidence for Start, Evidence for End)
Start: (1022) (1424) (1666) (1855)
End: (1342) (1586) (1789) (2093)
Evidence for Start / Evidence for End:
Gene1:EST1.5; "Exon 4" Prediction; Gene1:cDNA1.4; Gene1:cDNA3.4; Gene1:cDNA4.4
Gene1:cDNA1.0; Gene1:cDNA2.0; Gene1:cDNA3.0; Gene1:cDNA4.0
"Exon 2" Prediction; Gene1:cDNA2.2; Gene1:cDNA4.2
Gene1:cDNA1.5; Gene1:cDNA3.5; "Exon 5" Predicted End
"Exon 4" Prediction; Gene1:cDNA1.4; Gene1:cDNA3.4; Gene1:cDNA4.4
Gene1:cDNA2.0; Gene1:cDNA3.0; Gene1:cDNA4.0
Gene1:cDNA2.2; ; and weak support by Gene1:EST1.2
FGeneSH Prediction / GeneMark Prediction (columns: Exon, Strand, Start, End, Start, End)
Coding sequence of gene model
7 Gene 1 cont.
- Predicted exons 1 and 3 supported by EST and cDNA
- Exon 2 not predicted by either software
- Predicted exon 4 partially supported by EST and cDNA
- Overall, expression supported by ESTs in MaizeGDB and NCBI
cDNA/EST summary from NCBI blastn (columns: Accession ID, Query Range, Relation to Predicted Exons, % Match, E-value)
gb FL Gene1:cDNA e^-156 Gene1:cDNA e^-72 Gene1:cDNA e^-41
gb FL Gene1:cDNA e^-102 Gene1:cDNA e^-52 Gene1:cDNA e^-42
gb FK Gene1:cDNA e^-72 Gene1:cDNA e^-61 Gene1:cDNA e^-52 Gene1:cDNA
gb CO e^-65 Gene1:cDNA e^-51 Gene1:cDNA e^-48
TA Gene1:EST e^-122 Gene1:EST e^-112 Gene1:EST e^-122 Gene1:EST e^-114
8 Gene 1 cont.
NCBI blastx with model sequence
NCBI blastp with FGeneSH predicted protein as query
- The 4-exon gene model is better supported by cDNA and EST evidence than the FGeneSH prediction
- Expression supported by ESTs and some cDNA
- blastx: highest hit is a 64% match (e^-30) to a hypothetical protein; weaker hits are also hypothetical proteins
- blastp of the FGeneSH predicted amino acid sequence yielded worse results (E-values > 2)
- tblastx of the model coding sequence returned no results
Conclusion: region codes for
- ncRNA
- novel protein not yet characterized
9 Gene 2
- 3 exon model
- Forward strand
- 1248 bp coding sequence
- Possible homolog to candidate gene: 1-aminocyclopropane-1-carboxylate oxidase
Gene Model
Exon  Start   Stop    Length
1     (35)    (235)   201
2     (355)   (599)   244
3     (721)   (1524)  803
Evidence for Start / Evidence for Stop:
Gene2:mRNAcds1.1; Gene Predictions; from MaizeGDB
Gene2:mRNAcds1.1
Gene2:mRNAcds1.2; Gene Predictions
Gene2:mRNAcds1.3
Gene2:mRNAcds1.2; Gene Predictions; from MaizeGDB
Gene2:mRNAcds1.3; Gene Predictions; from MaizeGDB
FGeneSH Prediction / GeneMark Prediction (columns: Exon, Strand, Start, End, Start, End)
FGeneSH predicted coding sequence, 942 bp:
ATGGAGATTCCGGTGATCGATCTCGGCGGCCTCAACGGCGGCGGCGAGGAGAG
GTCGCGGACCTTGGCGGAGCTCCACGACGCCTGCAAGGACTGGGGCTTCTTCTG
GGTGGAGAACCACGGCGTGGACGCGCCGCTGATGGACGAGGTCAAGCGCTTCG
TCTACGGCCACTACGAGGAGCACCTGGAGGCCAAGTTCTACGCCTCCGCCCTCG
CCATGGACCTCGAGGCCGCCACCAGAGGTGACACTGATGAGAAGCCCTCCGAC
GAGGTGGACTGGGAGTCCACCTACTTCATCCAGCACCACCCCAAGACCAACGTC
GCCGACTTCCCAGAGATCACGCCGCCGACACGAGAGACGCTGGACGCGTACGT
CGCGCAGATGGTGTCCCTCGCGGAGCGTCTGGCCGAGTGCATGAGCCTCAACCT
GGGCCTCCCCGGGGCCCACGTCGCCGCCACCTTCGCGCCGCCGTTCGTGGGCAC
CAAGTTCGCCATGTACCCGTCCTGCCCGCGCCCGGAGCTGGTGTGGGGCCTGCG
CGCGCACACCGACGCCGGCGGCATCATCCTGCTCCTCCAGGACGACGTCGTGGG
CGGCCTCGAGTTCCTCAGGGCCGGCGCCCACTGGGTCCCCGTCGGCCCCACCAA
GGGGGGCAGGCTCTTCGTCAACATCGGGGACCAGATCGAGGTCCTCAGCGCCG
GCGCCTACCGGAGCGTCCTGCACCGCGTCGCGGCCGGGGACCAGGGCCGCCGC
CTGTCCGTGGCCACGTTCTACAACCCTGGCACCGACGCCGTGGTCGCGCCGGCG
CCCCGCAGGGATCAGGACGCCGGCGCCGCGGCGTACCCCGGTCCCTACAGGTTC
GGGGACTACCTCGACTACTACCAGGGCACCAAGTTCGGCGACAAGGACGCCAG
GTTCCAGGCCGTCAAGAAGCTGCTCGGCTAA
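A predicted coding sequence like the one above can be sanity-checked before it is used as a blastx or blastp query. The following is a minimal sketch, not part of the original analysis: the function names are hypothetical and it assumes the standard genetic code. It strips the whitespace left by slide line wrapping, then reports length, reading-frame consistency, start and stop codons, and any internal stops.

```python
# Standard genetic code, built in TCAG order (assumption: nuclear, table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


def clean(seq):
    """Remove whitespace (e.g. from wrapped slide text) and upper-case."""
    return "".join(seq.split()).upper()


def translate(seq):
    """Translate in frame 1; unknown codons become 'X', trailing bases are ignored."""
    seq = clean(seq)
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


def check_cds(seq):
    """Return basic sanity checks for a putative coding sequence."""
    seq = clean(seq)
    protein = translate(seq)
    return {
        "length": len(seq),
        "multiple_of_three": len(seq) % 3 == 0,
        "starts_with_atg": seq.startswith("ATG"),
        "ends_with_stop": protein.endswith("*"),
        "internal_stops": protein[:-1].count("*"),
    }


if __name__ == "__main__":
    # First four codons of the FGeneSH prediction plus its stop codon.
    print(translate("ATGGAGATT CCGTAA"))
    print(check_cds("ATGGAGATT CCGTAA"))
```

Pasting the full sequence from the slide into `check_cds` would show whether the transcribed text still matches the stated 942 bp; any mismatch would point to characters lost in transcription rather than to the prediction itself.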
10 Gene 2 cont.
- High match (almost 100%, E-value effectively 0) to maize 1-aminocyclopropane-1-carboxylate oxidase 1
- Many (>>10) ESTs align to the region, which suggests that gene 2 is expressed
- Many blastx and blastp alignments to the candidate gene in many other species; top 8 in table below
- Gene 2 may be a homolog of the candidate gene
Gene Model Match to Candidate Gene (columns: Accession ID, Query Range, Relation to Predicted Exons, % Match, E-value)
Accession: Gene ID: ; 1-aminocyclopropane-1-carboxylate oxidase 1
Gene2:mRNAcds
Gene2:mRNAcds1.2 e^-124
Gene2:mRNAcds1.1 e^-100
cDNA from MaizeGDB (below): 3 exons
-Coordinates:
blastx results
Top hits from blastp and blastx to 1-aminocyclopropane-1-carboxylate oxidase 1
Organism                         % Match  E-value
Arabidopsis thaliana             50%      2e^-105
Clove pink                       43%      5e^-86
Indian rice                      45%      5e^-85
Japanese rice                    45%      3e^-85
Kiwifruit                        43%      7e^-85
Arabidopsis thaliana (L.) Heynh  43%      4e^-83
Apple                            42%      e^-82
Tomato                           42%      4e^-82
11 Gene 3
- 8 or 9 exons; possible alternative splicing
- 4150 bp of coding sequence for model 1; 3589 bp for model 2
- Forward strand
- Candidate gene: lycopene epsilon cyclase 1 (lyce1)
FGeneSH Prediction / GeneMark Prediction (columns: Exon, Strand, Start, End, Start, End)
Exons 1-7 of models match MaizeGDB model, whose CDS is below:
12 Gene 3 cont.
- cDNA and EST support for expression of exons
- Potential alternative splicing
mRNA evidence (columns: Accession ID, Start, End, Associated Predicted Exon(s) (FGSH/GM), % Match, E-value)
Accessions:
gb BT ; GENE ID: LOC
gb BT
gb BT ; GENE ID: lyce1
gb EU / lcye- W22 allele
**B73 allele supports model
Alignments:
Gene3:cDNA1.14/ and
Gene3:cDNA1.13/ and
Gene3:cDNA2.14/ and
Gene3:cDNA2.13/ and e^-164
Gene3:cDNA2.0/ none and none 87 6e^-110
Gene3:cDNA2.0/ none and e^-42
Gene3:cDNA2.0/ none and e^-35
Gene3:cDNA3.11/ and none
Gene3:cDNA3.1/ and
Gene3:cDNA3.7/ and e^-107
Gene3:cDNA3.5/ and e^-83
Gene3:cDNA3.8/ and none 100 3e^-68
Gene3:cDNA3.6/ and e^-63
Gene3:cDNA3.9/ and e^-48
Gene3:cDNA3.10/ and e^-42
Gene3:cDNA4.1/ and
Gene3:cDNA4.7/ and e^-107
Gene3:cDNA4.0/ none and none 100 4e^-92
Gene3:cDNA4.5/ and e^-83
Gene3:cDNA4.8/ and none 100 3e^-68
Gene3:cDNA4.6/ and e^-63
Gene3:cDNA4.0/ none and e^-61
Gene3:cDNA4.11/ and none 100 2e^-59
Gene3:cDNA4.9/ and e^-48
Gene3:cDNA4.10/ and e^-42
GENE ID: LOC lycopene epsilon cyclase1 [Zea mays]
13 Gene 3 cont.
blastx using model 2 CDS as query
-When expanded, the "NADB_Rossmann superfamily" domains (blue bars) in all three reading frames line up exactly with the domains of lyce1.
-Model 1 is similar to model 2 except that the NADB_Rossmann domain is truncated at the 3' end
blastx of MaizeGDB gene model
Organism              % Match  E-value
Arabidopsis thaliana  67%      0
Tomato                72%      0
Tobacco               38%      4e^-89
blastp using MaizeGDB lyce1 protein sequence as query
Conclusion:
- blastp of the MaizeGDB lyce1 protein sequence resulted in a perfect match to Zea mays lcye1 (E-value = 0)
- The PKc_like superfamily domain at the 3' end of the model sequences suggests that exons 9 and 10 of model 1 (10 and 11 of model 2) may form their own gene model for a PKc_like superfamily protein
- cDNA and EST evidence and a blastx match support the gene model and suggest that GENE ID: LOC may have been mistakenly named
14 Gene 4
- 4 exons
- Forward strand
- 1820 bp coding sequence
- Expression and exon positions supported by cDNA and ESTs
FGeneSH Prediction / GeneMark Prediction (columns: Exon, Strand, Start, End, Size, Start, End, Size)
EST evidence below
FGeneSH Model
Exon  Start   End     Size
1     (537)   (738)
2     (1310)  (1570)
3     (1657)  (2547)
4     (3024)  (3493)
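The stated 1820 bp can be checked against the exon coordinates by simple arithmetic. The sketch below is hypothetical and assumes the parenthesized values read as four start values followed by four end values, which is an interpretation of the flattened table. Summing end minus start over the four exons gives 1820 bp, matching the slide. If the coordinates are 1-based and inclusive, which the slides do not state, each exon is one base longer and the total is 1824 bp.

```python
# Gene 4 exon (start, end) pairs as read from the FGeneSH model table.
# Assumption: the first four parenthesized values are starts, the last four ends.
GENE4_EXONS = [(537, 738), (1310, 1570), (1657, 2547), (3024, 3493)]


def exon_lengths(exons, inclusive=True):
    """Length of each exon; inclusive=True counts both end coordinates."""
    offset = 1 if inclusive else 0
    return [end - start + offset for start, end in exons]


def coding_length(exons, inclusive=True):
    """Total length of all exons under the chosen coordinate convention."""
    return sum(exon_lengths(exons, inclusive))


if __name__ == "__main__":
    print(exon_lengths(GENE4_EXONS, inclusive=False))
    print(coding_length(GENE4_EXONS, inclusive=False))
    print(coding_length(GENE4_EXONS, inclusive=True))
```

Which convention applies matters when exon sequences are extracted for translation, since an off-by-one at any exon boundary shifts the reading frame.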
15 Gene 4 cont.
- blastx of the model coding sequence results in a hit to a PKc_like superfamily domain
- 10+ blastx hits with E-values ranging from 6e^-39 to 5e^-43, ~35% identity
- cDNA hits, while strong matches, do not provide any additional information
cDNA hits
16 Gene 5
- 2 exon model
- Reverse strand
- 604 bp coding sequence
FGeneSH / GeneMark (columns: Exon, Start, Stop, Start, Stop)
(753) (1309) (797) (1869) (1309) (1869)
- cDNA support for exon 2
- However, upstream, around 141, ,000, the query matches cDNA of transposons and cDNA and ESTs of random gene fragments
- cDNA and blastp (of the FGeneSH predicted protein) match the HLH superfamily (cDNA: 81%, 3e^-123)
- According to NCBI, HLH is common in DNA-binding proteins such as transcription factors
17 Repetitive Region Validation
- Skip the regions with predicted genes
- Database search
  - DNA level: Maize TE database
  - Protein level: Swiss-Prot
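The first step above, selecting masked regions while skipping those with predicted genes, could be scripted roughly as follows. This is a hypothetical sketch, not the group's workflow: it assumes masked bases appear as N/X or lowercase in the RepeatMasker output, that gene intervals are supplied as 1-based inclusive (start, end) pairs, and that a minimum length cutoff (`min_len`, an invented parameter) is wanted. The surviving intervals would then be the queries for the TE database and Swiss-Prot searches.

```python
import re

# Runs of masked bases: lowercase (soft-masked) or N/X (hard-masked).
MASKED_RUN = re.compile(r"[a-zNX]+")


def masked_runs(seq):
    """Return 1-based inclusive (start, end) for each run of masked bases."""
    return [(m.start() + 1, m.end()) for m in MASKED_RUN.finditer(seq)]


def overlaps(a, b):
    """True if two inclusive intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def regions_to_validate(seq, gene_intervals, min_len=50):
    """Masked runs of at least min_len bases that overlap no predicted gene."""
    return [
        run
        for run in masked_runs(seq)
        if run[1] - run[0] + 1 >= min_len
        and not any(overlaps(run, gene) for gene in gene_intervals)
    ]


if __name__ == "__main__":
    toy = "ACGT" + "N" * 6 + "ACGT" + "acgtacgt" + "AC"
    genes = [(14, 16)]  # hypothetical predicted-gene interval
    print(masked_runs(toy))
    print(regions_to_validate(toy, genes, min_len=5))
```

Dropping any masked run that touches a gene interval is the strictest reading of "skip"; trimming the overlap instead would keep more sequence for validation.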
18 BLASTn against maize TE database
19 Region 3: 62729,
BLASTx against Swiss-Prot
- 1-aminocyclopropane-1-carboxylate oxidase 1
- Transposon-related protein
20 Region 5: ,
- BLASTx: 30S ribosomal protein S4, chloroplast!
- BLASTn (Maize TE database)
21 Region 5 cont: , BLASTx, nr
22 Summary
- 5 regions harboring genes predicted
  - 1 possible ncRNA coding region
  - 2 candidate gene hits
    - 1-aminocyclopropane-1-carboxylate oxidase 1 homolog
    - Lycopene epsilon cyclase 1, with PKc_like superfamily slightly downstream
  - 1 PKc_like superfamily hit
  - 1 region likely resulting from helitron insertion
    - Potential expression of a transcription factor
- Further analyses on repetitive regions support RepeatMasker results
More informationBIOINFORMATICS AN OVERVIEW
BIOINFORMATICS AN OVERVIEW T.R. Sharma Genoinformatics Lab, National Research Centre on Plant Biotechnology I.A.R.I, New Delhi 110012 trsharma@nrcpb.org Introduction Bioinformatics is the computational
More informationSequence Based Function Annotation. Qi Sun Bioinformatics Facility Biotechnology Resource Center Cornell University
Sequence Based Function Annotation Qi Sun Bioinformatics Facility Biotechnology Resource Center Cornell University Usage scenarios for sequence based function annotation Function prediction of newly cloned
More informationHomework 4. Due in class, Wednesday, November 10, 2004
1 GCB 535 / CIS 535 Fall 2004 Homework 4 Due in class, Wednesday, November 10, 2004 Comparative genomics 1. (6 pts) In Loots s paper (http://www.seas.upenn.edu/~cis535/lab/sciences-loots.pdf), the authors
More informationGENOME ANNOTATION INTRODUCTION TO CONCEPTS AND METHODS. Olivier GARSMEUR. Training course in Bioinformatics applied to Musa genome November 2013
GENOME ANNOTATION INTRODUCTION TO CONCEPTS AND METHODS Olivier GARSMEUR Training course in Bioinformatics applied to Musa genome 18-22 November 2013 Introduction two main concepts: Identify the different
More informationThe TriAnnot Automated Annotation Pipeline: Making Sense the Output Files and Information - a Case Study W422 V3.5 3:45 4:05 P.
The TriAnnot Automated Annotation Pipeline: Making Sense of the Output Files and Information - a Case Study W422 V3.5 3:45 4:05 Introduction P. Leroy 4:05 4:25 New Interface N. Guilhot Download New GBrowse
More informationTo investigate the heredity of the WFP gene, we selected plants that were homozygous
Supplementary information Supplementary Note ST-12 WFP allele is semi-dominant To investigate the heredity of the WFP gene, we selected plants that were homozygous for chromosome 1 of Nipponbare and heterozygous
More informationBiology 4100 Minor Assignment 1 January 19, 2007
Biology 4100 Minor Assignment 1 January 19, 2007 This assignment is due in class on February 6, 2007. It is worth 7.5% of your final mark for this course. Your assignment must be typed double-spaced on
More informationComputational gene finding
Computational gene finding Devika Subramanian Comp 470 Outline (3 lectures) Lec 1 Lec 2 Lec 3 The biological context Markov models and Hidden Markov models Ab-initio methods for gene finding Comparative
More informationBioinformatics Tools. Stuart M. Brown, Ph.D Dept of Cell Biology NYU School of Medicine
Bioinformatics Tools Stuart M. Brown, Ph.D Dept of Cell Biology NYU School of Medicine Bioinformatics Tools Stuart M. Brown, Ph.D Dept of Cell Biology NYU School of Medicine Overview This lecture will
More informationThe Ensembl Database. Dott.ssa Inga Prokopenko. Corso di Genomica
The Ensembl Database Dott.ssa Inga Prokopenko Corso di Genomica 1 www.ensembl.org Lecture 7.1 2 What is Ensembl? Public annotation of mammalian and other genomes Open source software Relational database
More information9/19/13. cdna libraries, EST clusters, gene prediction and functional annotation. Biosciences 741: Genomics Fall, 2013 Week 3
cdna libraries, EST clusters, gene prediction and functional annotation Biosciences 741: Genomics Fall, 2013 Week 3 1 2 3 4 5 6 Figure 2.14 Relationship between gene structure, cdna, and EST sequences
More informationWhy Use BLAST? David Form - August 15,
Wolbachia Workshop 2017 Bioinformatics BLAST Basic Local Alignment Search Tool Finding Model Organisms for Study of Disease Can yeast be used as a model organism to study cystic fibrosis? BLAST Why Use
More informationSupplementary Table 1. Summary of whole genome shotgun sequence used for genome assembly
Supplementary Tables Supplementary Table 1. Summary of whole genome shotgun sequence used for genome assembly Library Read length Raw data Filtered data insert size (bp) * Total Sequence depth Total Sequence
More informationGenomic region (ENCODE) Gene definitions
DNA From genes to proteins Bioinformatics Methods RNA PROMOTER ELEMENTS TRANSCRIPTION Iosif Vaisman mrna SPLICE SITES SPLICING Email: ivaisman@gmu.edu START CODON STOP CODON TRANSLATION PROTEIN From genes
More informationMicroarray Ordering Guide
Catalog Microarray Ordering Guide Gene Expression Microarrays G4112F 014850 4 5 Whole Human Genome Microarray Kit, 4x44K 41,000+ unique genes and transcripts represented, all with public domain annotations.
More informationAssessing De-Novo Transcriptome Assemblies
Assessing De-Novo Transcriptome Assemblies Shawn T. O Neil Center for Genome Research and Biocomputing Oregon State University Scott J. Emrich University of Notre Dame 100K Contigs, Perfect 1M Contigs,
More information