Bioinformatics overview
- Oswin McCoy
- 6 years ago
Transcription
1 Bioinformatics overview. Aplicações biomédicas em plataformas computacionais de alto desempenho / Aplicaciones biomédicas sobre plataformas gráficas de altas prestaciones / Biomedical applications in High performance computing platforms. Oswaldo Trelles, PhD, University of Malaga. In this section we survey the bioinformatics application domain and the typical sources of data in the field.
2 Definition. Bioinformatics: the application of computational techniques (computer science, statistics, physics, chemistry, ...; information technologies) to the management and analysis (acquisition, storage, retrieval, transmission, processing, ...) of biological data (molecular, clinical, population, environmental, ...).
3 The domain of the data
4 Data production. Huge data production at different levels: atoms, proteins, interactions, metabolic pathways, cells, organs, organisms, populations.
5 Diversity of types of data > E bp DNA linear gaattctaac ggtcccgaaa ctctgtgcgg tgctgaactg gttgacgctc tgcagtttgt ttgcggtgac cgtggttttt attttaacaa acccactggt tatggttctt cttctcgtcg tgctccccag actggtattg ttgacgaatg ctgctttcgt tcttgcgacc tgcgtcgtct ggaaatgtat tgcgctcccc tgaaacccgc taaatctgct tagaagctt
6 Format heterogeneity
The DNA encoding human insulin-like growth factor I(IGF-I) available at GenBank: E (search for insulin in All databases). In both text boxes some lines have been removed.
LOCUS       E bp DNA linear PAT 04-NOV-2005
DEFINITION  DNA encoding human insulin-like growth factor I(IGF-I).
ACCESSION   E01306
VERSION     E GI:
KEYWORDS    JP A/1.
SOURCE      synthetic construct
  ORGANISM  synthetic construct
            other sequences; artificial sequences.
REFERENCE   1 (bases 1 to 229)
  AUTHORS   Raasu,A., Toomasu,M., Berun,N. and Majiasu,U.
  TITLE     METHOD FOR TRANSPORTING GENE PRODUCT TO MEDIUM PROPAGATING GRAM
            NEGATIVE BACTERIA
  JOURNAL   Patent: JP A 1 20-AUG-1987;
            KABIGEN AB
COMMENT     OS Artificial gene
            OC Artificial sequence; Genes.
            OS Homo sapiens
            PN JP A/1
            PD 20-AUG-1987
            CC strandedness: Single;
            CC topology: Linear;
            CC hypothetical: No;
            CC anti-sense: No;
            FH Key Location/Qualifiers
            FT /product='human insuline-like growth factor I
            FT CDS >
FEATURES    Location/Qualifiers
     source
            /organism="synthetic construct"
            /mol_type="unassigned DNA"
            /db_xref="taxon:32630"
ORIGIN
        1 gaattctaac ggtcccgaaa ctctgtgcgg tgctgaactg gttgacgctc tgcagtttgt
          ttgcggtgac cgtggttttt attttaacaa acccactggt tatggttctt cttctcgtcg
          tgctccccag actggtattg ttgacgaatg ctgctttcgt tcttgcgacc tgcgtcgtct
          ggaaatgtat tgcgctcccc tgaaacccgc taaatctgct tagaagctt
//
The same insulin (E01306) sequence at (second text box, which uses the two-letter line codes of the EMBL flat-file format):
ID   E01306; SV 1; linear; unassigned DNA; PAT; SYN; 229 BP.
AC   E01306;
DT   07-OCT-1997 (Rel. 52, Created)
DT   09-NOV-2005 (Rel. 85, Last updated, Version 3)
DE   DNA encoding human insulin-like growth factor I(IGF-I).
KW   JP A/1.
OS   synthetic construct
OC   other sequences; artificial sequences.
RA   Raasu A., Toomasu M., Berun N., Majiasu U.;
RT   "METHOD FOR TRANSPORTING GENE PRODUCT TO MEDIUM PROPAGATING GRAM
RT   NEGATIVE BACTERIA";
RL   Patent number JP A/1, 20-AUG
RL   KABIGEN AB.
CC   OS Artificial gene
CC   OC Artificial sequence; Genes.
CC   OS Homo sapiens
CC   CC strandedness: Single;
CC   CC topology: Linear;
CC   CC hypothetical: No;
CC   CC anti-sense: No;
CC   FH Key Location/Qualifiers
CC   FT mat_peptide
CC   FT CDS >
CC   FT /product="human insulin-like growth factor I"
FH   Key Location/Qualifiers
FT   source
FT   /organism="synthetic construct"
FT   /mol_type="unassigned DNA"
FT   /db_xref="taxon:32630"
SQ   Sequence 229 BP; 40 A; 57 C; 55 G; 77 T; 0 other;
     gaattctaac ggtcccgaaa ctctgtgcgg tgctgaactg gttgacgctc tgcagtttgt
     ttgcggtgac cgtggttttt attttaacaa acccactggt tatggttctt cttctcgtcg
     tgctccccag actggtattg ttgacgaatg ctgctttcgt tcttgcgacc tgcgtcgtct
     ggaaatgtat tgcgctcccc tgaaacccgc taaatctgct tagaagctt 229
//
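To make the heterogeneity concrete, the following is a minimal sketch (not any database's official parser) of how a program might map the two layouts onto one common record. The function name `parse_header`, the two key tables and the choice of fields are illustrative assumptions; only the keywords `ACCESSION`/`DEFINITION` and the line codes `AC`/`DE` come from the records on the slide.

```python
# Hypothetical mapping from each layout's field label to a common field name.
GENBANK_KEYS = {"ACCESSION": "accession", "DEFINITION": "description"}
EMBL_KEYS = {"AC": "accession", "DE": "description"}


def parse_header(lines):
    """Return a dict with the common fields found in flat-file header lines."""
    record = {}
    for line in lines:
        parts = line.split(None, 1)  # field label, then the rest of the line
        if len(parts) != 2:
            continue
        key, value = parts
        field = GENBANK_KEYS.get(key) or EMBL_KEYS.get(key)
        if field and field not in record:
            # The second layout ends the accession with ';', the first does not.
            record[field] = value.strip().rstrip(";")
    return record


# Header lines copied from the two text boxes of the slide.
GENBANK_LINES = [
    "DEFINITION  DNA encoding human insulin-like growth factor I(IGF-I).",
    "ACCESSION   E01306",
]
EMBL_LINES = [
    "AC   E01306;",
    "DE   DNA encoding human insulin-like growth factor I(IGF-I).",
]

print(parse_header(GENBANK_LINES))
print(parse_header(EMBL_LINES))
```

Both calls yield the same dictionary, which is the point: the information is identical and only the packaging differs.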
7 Dispersion of data sources More than 1000 biological data collections Bioinformatics workflows: the usual way to work See: [1] Infobiogen: Catalog of Databases: Bioinformatics: a web-based domain
8 Types of data and applications (overview)
9 Sequencing data
The long DNA chain is split into small fragments that are read using sequencing technology.
Read: a short sequence obtained during the sequencing process.
Software is used to obtain contigs, scaffolds and consensus sequences.
Reading in annotation data from a GFF file. Assigning aligned reads to exons and genes. Biologically intelligent interpretation of genomic data.
FASTA and FASTQ formats
>000014_1863_0292 length=76 uaccno=fgsmdpn08etuie
AATACTCAGGAATCGAACGGACTCGGGTATAGTATATGATCGGCAGCCAGCCG
AACATAACAGCGGCATGAAAACC
>000016_1821_0619 length=120 uaccno=fgsmdpn08ep50t
GGCAAGTTTTCGGTGTCGCTAAGCCCGAGATATCGCAGCTCACCCGTGTCGGC
GATTGCTGCTGTGACCGTCCCCAGTCGGTCACCCTCCGGCTGATTCTATCCTTACATCGG
TCGTTTC
>000021_1845_1786 length=69 uaccno=fgsmdpn08esarw
ATCCGCGCGGCCGCATTGTCGACACTGCCTGCCGGCAGTGAAGGCGAGGCGCA
GGTGGCCGATGCGCTG
>000030_1849_0863 length=69 uaccno=fgsmdpn08esmpd
ATCCGCGCGGCCGCATTGTCGACACTGCCTGCCGGCAGTGAAGGCGAGGCGCA
GGTGGCCGATGCGCTG
>000035_1856_0283 length=148 uaccno=fgsmdpn08es8dp
GACGCCCTTTATGCACGTTTCGCTCACAGTATCCCTTAATAGCAAGATTAATA
CCCTCAGTGGCCCCACTAGTAAAAACGATCTCTCGAGAACGACAGTTCAGTTC
ATTGGCAATCAATTTTCGGGCCGTTTCTTACCGCCTCCTCAG
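As an illustration of how little structure the FASTA layout carries, here is a small sketch of a reader for records like the ones on this slide. The function name `parse_fasta` and the dictionary layout are assumptions made for the example; the `length=` and `uaccno=` header attributes are taken from the reads shown, and other FASTA files need not carry them.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of {'id', 'attrs', 'seq'} dictionaries."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            read_id = parts[0] if parts else ""
            # Header attributes such as length=76 are kept as strings.
            attrs = dict(p.split("=", 1) for p in parts[1:] if "=" in p)
            records.append({"id": read_id, "attrs": attrs, "seq": ""})
        elif records:
            # Sequence lines may wrap; join them into one string.
            records[-1]["seq"] += line
    return records


# Two of the reads from the slide.
SAMPLE = """\
>000014_1863_0292 length=76 uaccno=fgsmdpn08etuie
AATACTCAGGAATCGAACGGACTCGGGTATAGTATATGATCGGCAGCCAGCCG
AACATAACAGCGGCATGAAAACC
>000021_1845_1786 length=69 uaccno=fgsmdpn08esarw
ATCCGCGCGGCCGCATTGTCGACACTGCCTGCCGGCAGTGAAGGCGAGGCGCA
GGTGGCCGATGCGCTG
"""

for rec in parse_fasta(SAMPLE):
    declared = int(rec["attrs"]["length"])
    print(rec["id"], "declared:", declared, "observed:", len(rec["seq"]))
```

Comparing the declared `length` attribute with the observed sequence length is a cheap sanity check that a record was not truncated.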
10 Assembling the puzzle. From spectrograms to a sequence of letters. An exhaustive and resource-consuming procedure is needed to assemble the fragments into longer contigs... the sequence is coming up.
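The idea of joining overlapping fragments into a contig can be sketched with a toy greedy merger. This is an illustrative simplification, not the procedure used by any particular assembler: it assumes error-free reads from a single strand, and the names `overlap`, `greedy_assemble` and the `min_len` threshold are made up for the example. The quadratic pairwise comparison also hints at why the real procedure is resource-consuming.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


# Three overlapping fragments cut from the start of the E01306 sequence.
reads = ["gaattctaacgg", "taacggtcccg", "gtcccgaaa"]
print(greedy_assemble(reads))
```

With these three reads the merger rebuilds the first 20 bases of the sequence as a single contig; repeats and sequencing errors are what make the real problem much harder.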
11 Biological sequence data
>ref NT_ : Drosophila melanogaster chromosome 2L
CGACAATGCACGACAGAGGAAGCAGAACAGATATTTAGATTGCCTCTCATTTTCTCTCCCATATTATAGG
GAGAAATATGATCGCGTATGCGAGAGTAGTGCCAACATATTGTGCTCTTTGATTTTTTGGCAACCCAAAA
TGGTGGCGGATGAACGAGATGATAATATATTCAAGTTGCCGCTAATCAGAAATAAATTCATTGCAACGTT
AAATACAGCACAATATATGATCGCGTATGCGAGAGTAGTGCCAACATATTGTGCTAATGAGTGCCTCTCG
TTCTCTGTCTTATATTACCGCAAACCCAAAAAGACAATACACGACAGAGAGAGAGAGCAGCGGAGATATT
TAGATTGCCTATTAAATATGATCGCGTATGCGAGAGTAGTGCCAACATATTGTGCTCTCTATATAATGAC
TGCCTCTCATTCTGTCTTATTTTACCGCAAACCCAAATCGACAATGCACGACAGAGGAAGCAGAACAGAT
ATTTAGATTGCCTCTCATTTTCTCTCCCATATTATAGGGAGAAATATGATCGCGTATGCGAGAGTAGTGC
CAACATATTGTGCTCTTTGATTTTTTGGCAACCCAAAATGGTGGCGGATGAACGAGATGATAATATATTC
AAGTTGCCGCTAATCAGAAATAAATTCATTGCAACGTTAAATACAGCACAATATATGATCGCGTATGCGA
GAGTAGTGCCAACATATTGTGCTAATGAGTGCCTCTCGTTCTCTGTCTTATATTACCGCAAACCCAAAAA
GACAATACACGACAGAGAGAGAGAGCAGCGGAGATATTTAGATTGCCTATTAAATATGATCGCGTATGCG
AGAGTAGTGCCAACATATTGTGCTCTCTATATAATGACTGCCTCTCATTCTGTCTTATTTTACCGCAAAC
CCAAATCGACAATGCACGACAGAGGAAGCAGAACAGATATTTAGATTGCCTCTCATTTTCTCTCCCATAT
TATAGGGAGAAATATGATCGCGTATGCGAGAGTAGTGCCAACATATTGTGCTCTTTGATTTTTTGGCAAC
CCAAAATGGTGGCGGATGAACGAGATGATAATATATTCAAGTTGCCGCTAATCAGAAATAAATTCATTGC
AACGTTAAATACAGCACAATATATGATCGCGTATGCGAGAGTAGTGCCAACATATTGTGCTAATGAGTGC
CTCTCGTTCTCTGTCTTATATTACCGCAAACCCAAAAAGACAATACACGACAGAGAGAGAGAGCAGCGGA
GATATTTAGATTGCCTATTAAATATGATCGCGTATGCGAGAGTAGTGCCAACATATTGTGCTCTCTATAT
AATGACTGCCTCTCATTCTGTCTTATTTTACCGCAAACCCAAATCGACAATGCACGACAGAGGAAGCAGA
ACAGATATTTAGATTGCCTCTCATTTTCTCTCCCATATTATAGGGAGAAATATGATCGCGTATGCGAGAG
TAGTGCCAACATATTGTGCTCTTTGATTTTTTGGCAACCCAAAATGGTGGCGGATGAACGAGATGATAAT
ATATTCAAGTTGCCGCTAATCAGAAATAAATTCATTGCAACGTTAAATACAGCACAATATATGATCGCGT
ATGCGAGAGTAGTGCCAACATATTGTGCTAATGAGTGCCTCTCGTTCTCTGTCTTATATTACCGCAAACC
CAAAAAGACAATACACGACAGAGAGAGAGAGCAGCGGAGATATTTAGATTGCCTATTAAATATGATCGCG
TATGCGAGAGTAGTGCCAACATATTGTGCTCTCTATATAATGACTGCCTCTCATTCTGTCTTATTTTACC
GCAAACCCAAATCGACAATGCACGACAGAGGAAGCAGAACAGATATTTAGATTGCCTCTCATTTTCTCTC
CCATATTATAGGGAGAAATATGATCGCGTATGCGAGAGTAGTGCCAACATATTGTGCTCTTTGATTTTTT
GGCAACCCAAAATGGTGGCGGATGAACGAGATGATAATATATTCAAGTTGCCGCTAATCAGAAATAAATT
CATTGCAACGTTAAATACAGCACAATATATGATCGCGTATGCGAGAGTAGTGCCAACATATTGTGCTAAT
GAGTGCCTCTCGTTCTCTGTCTTATATTACCGCAAACCCAAAAAGACAATACACGACAGAGAGAGAGAGC
AGCGGAGATATTTAGATTGCCTATTAAATATGATCGCGTATGCGAGAGTAGTGCCAACATATTGTGCTCT
CTATATAATGACTGCCTCTCATTCTGTCTTATTTTACCGCAAACCCAAATCGACAATGCACGACAGAGGA
AGCAGAACAGATATTTAGATTGCCTCTCATTTTCTCTCCCATATTATAGGGAGAAATATGATCGCGTATG
CGAGAGTAGTGCCAACATATTGTGCTCTTTGATTTTTTGGCAACCCAAAATGGTGGCGGATGAACGAGAT
GATAATATATTCAAGTTGCCGCTAATCAGAAATAAATTCATTGCAACGTTAAATACAGCACAATATATGA
TCGCGTATGCGAGAGTAGTGCCAACATATTGTGCTAATGAGTGCCTCTCGTTCTCTGTCTTATATTACCG
CAAACCCAAAAAGACAATACACGACAGAGAGAGAGAGCAGCGGAGATATTTAGATTGCCTATTAAATATG
ATCGCGTATGCGAGAGTAGTGCCAACATATTGTGCTCTCTATATAATGACTGCCTCTCATTCTGTCTTAT
TTTACCGCAAACCCAAATCGACAATGCACGACAGAGGAAGCAGAACAGATATTTAGATTGCCTCTCATTT
TCTCTCCCATATTATAGGGAGAAATATGATCGCGTATGCGAGAGTAGTGCCAACATATTGTGCTCTTTGA
TTTTTTGGCAACCCAAAATGGTGGCGGATGAACGAGATGATAATATATTCAAGTTGCCGCTAATCAGAAA
TAAATTCATTGCAACGTTAAATACAGCACAATATATGATCGCGTATGCGAGAGTAGTGCCAACATATTGT
GCTAATGAGTGCCTCTCGTTCTCTGTCTTATATTACCGCAAACCCAAAAAGACAATACACGACAGAGAGA
GAGAGCAGCGGAGATATTTAGATTGCCTATTAAATATGATCGCGTATGCGAGAGTAGTGCCAACATATTG
TGCTCTCTATATAATGACTGCCTCTCATTCTGTCTTATTTTACCGCAAACCCAAATCGACAATGCACGAC
AGAGGAAGCAGAACAGATATTTAGATTGCCTCTCATTTTCTCTCCCATATTATAGGGAGAAATATGATCG
From assembly to database entries. FASTA is the favorite format for this type of data (without annotations). We know the text, but the meaning needs more processing.
12 Annotated Sequence databases
ID   100K_RAT STANDARD; PRT; 889 AA.
AC   Q62671;
DT   01-NOV-1997 (Rel. 35, Created)
DT   01-NOV-1997 (Rel. 35, Last sequence update)
DT   15-JUL-1999 (Rel. 38, Last annotation update)
DE   100 KD PROTEIN (EC ).
OS   Rattus norvegicus (Rat).
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Mammalia;
OC   Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Rattus.
RN   [1]
RP   SEQUENCE FROM N.A.
RC   STRAIN=WISTAR; TISSUE=TESTIS;
RX   MEDLINE;
RA   MUELLER D., REHBEIN M., BAUMEISTER H., RICHTER D.;
RT   "Molecular characterization of a novel rat protein structurally
RT   related to poly(a) binding proteins and the 70K protein of the U1
RT   small nuclear ribonucleoprotein particle (snrnp).";
RL   Nucleic Acids Res. 20: (1992).
RN   [2]
RP   ERRATUM.
RA   MUELLER D., REHBEIN M., BAUMEISTER H., RICHTER D.;
RL   Nucleic Acids Res. 20: (1992).
CC   -!- FUNCTION: E3 UBIQUITIN-PROTEIN LIGASE WHICH ACCEPTS UBIQUITIN FROM
CC       AN E2 UBIQUITIN-CONJUGATING ENZYME IN THE FORM OF A THIOESTER AND
CC       THEN DIRECTLY TRANSFERS THE UBIQUITIN TO TARGETED SUBSTRATES (BY
CC       SIMILARITY). THIS PROTEIN MAY BE INVOLVED IN MATURATION AND/OR
CC       POST-TRANSCRIPTIONAL REGULATION OF MRNA.
CC
CC   This SWISS-PROT entry is copyright. It is produced through...
CC
DR   EMBL; X64411; CAA ; -.
DR   PFAM; PF00632; HECT; 1.
DR   PFAM; PF00658; PABP; 1.
KW   Ubiquitin conjugation; Ligase.
FT   DOMAIN ASP/GLU-RICH (ACIDIC).
FT   DOMAIN PRO-RICH.
FT   DOMAIN ASP/GLU-RICH (ACIDIC).
FT   BINDING UBIQUITIN (BY SIMILARITY).
SQ   SEQUENCE 889 AA; MW; DD7E6C7A CRC32;
     MMSARGDFLN YALSLMRSHN DEHSDVLPVL DVCSLKHVAY VFQALIYWIK AMNQQTTLDT
     PQLERKRTRE LLELGIDNED SEHENDDDTS QSATLNDKDD ESLPAETGQN HPFFRRSDSM
     VYEYVRKYAE HRMLVVAEQP LHAMRKGLLD VLPKNSLEDL TAEDFRLLVN GCGEVNVQML
     ISFTSFNDES GENAEKLLQF KRWFWSIVER MSMTERQDLV YFWTSSPSLP ASEEGFQPMP
     SITIRPPDDQ HLPTANTCIS RLYVPLYSSK QILKQKLLLA IKTKNFGFV
//
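Because every annotation line in an entry like this starts with a two-letter line code, a first pass over the record can be as simple as grouping lines by that code. The sketch below assumes exactly that layout; the helper names `group_by_line_code` and `pfam_ids` are hypothetical, and the sample lines are an excerpt of the 100K_RAT entry above. A production parser would also handle continuation lines and the sequence block.

```python
from collections import defaultdict


def group_by_line_code(entry_text):
    """Group the lines of a flat-file entry by their two-letter line code."""
    fields = defaultdict(list)
    for line in entry_text.splitlines():
        if line.startswith("//"):
            break  # end-of-entry marker
        code, _, rest = line.partition(" ")
        if len(code) == 2 and rest.strip():
            fields[code].append(rest.strip())
    return dict(fields)


def pfam_ids(fields):
    """Extract Pfam accessions from the DR (cross-reference) lines."""
    ids = []
    for xref in fields.get("DR", []):
        parts = [p.strip() for p in xref.split(";")]
        if parts[0].upper() == "PFAM" and len(parts) > 1:
            ids.append(parts[1])
    return ids


# Excerpt of the entry shown on the slide.
ENTRY = """\
ID   100K_RAT STANDARD; PRT; 889 AA.
AC   Q62671;
OS   Rattus norvegicus (Rat).
DR   PFAM; PF00632; HECT; 1.
DR   PFAM; PF00658; PABP; 1.
KW   Ubiquitin conjugation; Ligase.
//
"""

fields = group_by_line_code(ENTRY)
print(fields["AC"], fields["OS"], pfam_ids(fields))
```

The cross-references in the `DR` lines are what link this annotated entry back to other collections, which is how the dispersed data sources get stitched together in practice.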
Bioinformatics overview
Bioinformatics overview Aplicações biomédicas em plataformas computacionais de alto desempenho Aplicaciones biomédicas sobre plataformas gráficas de altas prestaciones Biomedical applications in High performance
More informationAAGTGCCACTGCATAAATGACCATGAGTGGGCACCGGTAAGGGAGGGTGATGCTATCTGGTCTGAAG. Protein 3D structure. sequence. primary. Interactions Mutations
Introduction to Databases Lecture Outline Shifra Ben-Dor Irit Orr Introduction Data and Database types Database components Data Formats Sample databases How to text search databases What units of information
More informationEECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science
EECS 730 Introduction to Bioinformatics Sequence Alignment Luke Huan Electrical Engineering and Computer Science http://people.eecs.ku.edu/~jhuan/ Database What is database An organized set of data Can
More informationNiceProt View of Swiss-Prot: P18907
Hosted by NCSC US ExPASy Home page Site Map Search ExPASy Contact us Swiss-Prot Mirror sites: Australia Bolivia Canada China Korea Switzerland Taiwan Search Swiss-Prot/TrEMBL for horse alpha Go Clear NiceProt
More informationRedundancy at GenBank => RefSeq. RefSeq vs GenBank. Databases, cont. Genome sequencing using a shotgun approach. Sequenced eukaryotic genomes
Databases, cont. Redundancy at GenBank => RefSeq http://www.ncbi.nlm.nih.gov/books/bv.fcg i?rid=handbook RefSeq vs GenBank Many sequences are represented more than once in GenBank 2003 RefSeq collection
More informationRegulation of eukaryotic transcription:
Promoter definition by mass genome annotation data: in silico primer extension EMBNET course Bioinformatics of transcriptional regulation Jan 28 2008 Christoph Schmid Regulation of eukaryotic transcription:
More informationDNAFSMiner: A Web-Based Software Toolbox to Recognize Two Types of Functional Sites in DNA Sequences
DNAFSMiner: A Web-Based Software Toolbox to Recognize Two Types of Functional Sites in DNA Sequences Huiqing Liu Hao Han Jinyan Li Limsoon Wong Institute for Infocomm Research, 21 Heng Mui Keng Terrace,
More informationBiological databases an introduction
Biological databases an introduction By Dr. Erik Bongcam-Rudloff SLU 2017 Biological Databases Sequence Databases Genome Databases Structure Databases Sequence Databases The sequence databases are the
More informationBioinformatics Course AA 2017/2018 Tutorial 2
UNIVERSITÀ DEGLI STUDI DI PAVIA - FACOLTÀ DI SCIENZE MM.FF.NN. - LM MOLECULAR BIOLOGY AND GENETICS Bioinformatics Course AA 2017/2018 Tutorial 2 Anna Maria Floriano annamaria.floriano01@universitadipavia.it
More informationLinking the EMBL Australia Bioinformatics Resource with the Australian National Data Service
Linking the EMBL Australia Bioinformatics Resource with the Australian National Data Service JEFF CHRISTIANSEN ANDS PIERRE CHAUMEIL - QFAB DOMINIQUE GORSE QFAB MARK RAGAN IMB/UQ EMBL Australia Australia
More informationIntroduction to Bioinformatics CPSC 265. What is bioinformatics? Textbooks
Introduction to Bioinformatics CPSC 265 Thanks to Jonathan Pevsner, Ph.D. Textbooks Johnathan Pevsner, who I stole most of these slides from (thanks!) has written a textbook, Bioinformatics and Functional
More informationNCBI web resources I: databases and Entrez
NCBI web resources I: databases and Entrez Yanbin Yin Most materials are downloaded from ftp://ftp.ncbi.nih.gov/pub/education/ 1 Homework assignment 1 Two parts: Extract the gene IDs reported in table
More informationComputational Biology and Bioinformatics
Computational Biology and Bioinformatics Computational biology Development of algorithms to solve problems in biology Bioinformatics Application of computational biology to the analysis and management
More informationTIGR THE INSTITUTE FOR GENOMIC RESEARCH
Introduction to Genome Annotation: Overview of What You Will Learn This Week C. Robin Buell May 21, 2007 Types of Annotation Structural Annotation: Defining genes, boundaries, sequence motifs e.g. ORF,
More informationNUCLEIC ACIDS. DNA (Deoxyribonucleic Acid) and RNA (Ribonucleic Acid): information storage molecules made up of nucleotides.
NUCLEIC ACIDS DNA (Deoxyribonucleic Acid) and RNA (Ribonucleic Acid): information storage molecules made up of nucleotides. Base Adenine Guanine Cytosine Uracil Thymine Abbreviation A G C U T DNA RNA 2
More informationTranscriptome Assembly, Functional Annotation (and a few other related thoughts)
Transcriptome Assembly, Functional Annotation (and a few other related thoughts) Monica Britton, Ph.D. Sr. Bioinformatics Analyst June 23, 2017 Differential Gene Expression Generalized Workflow File Types
More informationDatabases for Life Science Research. Ulf Leser
Databases for Life Science Research Ulf Leser This Lecture What this lecture is not RDBMS in ten slides Classification & Properties Some Examples Ulf Leser: Bioinformatics, Winter Semester 2010/2011 2
More informationBiotechnology Explorer
Biotechnology Explorer C. elegans Behavior Kit Bioinformatics Supplement explorer.bio-rad.com Catalog #166-5120EDU This kit contains temperature-sensitive reagents. Open immediately and see individual
More informationELE4120 Bioinformatics. Tutorial 5
ELE4120 Bioinformatics Tutorial 5 1 1. Database Content GenBank RefSeq TPA UniProt 2. Database Searches 2 Databases A common situation for alignment is to search through a database to retrieve the similar
More informationChimp Sequence Annotation: Region 2_3
Chimp Sequence Annotation: Region 2_3 Jeff Howenstein March 30, 2007 BIO434W Genomics 1 Introduction We received region 2_3 of the ChimpChunk sequence, and the first step we performed was to run RepeatMasker
More informationComputational Molecular Biology Intro. Alexander (Sacha) Gultyaev
Computational Molecular Biology Intro Alexander (Sacha) Gultyaev a.p.goultiaev@liacs.leidenuniv.nl Biopolymer sequences DNA: double-helical nucleic acid. Monomers: nucleotides C, A, T, G. RNA: (single-stranded)
More informationFundamentals of Bioinformatics: computation, biology, computational biology
Fundamentals of Bioinformatics: computation, biology, computational biology Vasilis J. Promponas Bioinformatics Research Laboratory Department of Biological Sciences University of Cyprus A short self-introduction
More informationBioinformatics Prof. M. Michael Gromiha Department of Biotechnology Indian Institute of Technology, Madras. Lecture - 5a Protein sequence databases
Bioinformatics Prof. M. Michael Gromiha Department of Biotechnology Indian Institute of Technology, Madras Lecture - 5a Protein sequence databases In this lecture, we will mainly discuss on Protein Sequence
More informationuser s guide Question 3
Question 3 During a positional cloning project aimed at finding a human disease gene, linkage data have been obtained suggesting that the gene of interest lies between two sequence-tagged site markers.
More informationTypes of Databases - By Scope
Biological Databases Bioinformatics Workshop 2009 Chi-Cheng Lin, Ph.D. Department of Computer Science Winona State University clin@winona.edu Biological Databases Data Domains - By Scope - By Level of
More informationFollowing text taken from Suresh Kumar. Bioinformatics Web - Comprehensive educational resource on Bioinformatics. 6th May.2005
Bioinformatics is the recording, annotation, storage, analysis, and searching/retrieval of nucleic acid sequence (genes and RNAs), protein sequence and structural information. This includes databases of
More informationData Retrieval from GenBank
Data Retrieval from GenBank Peter J. Myler Bioinformatics of Intracellular Pathogens JNU, Feb 7-0, 2009 http://www.ncbi.nlm.nih.gov (January, 2007) http://ncbi.nlm.nih.gov/sitemap/resourceguide.html Accessing
More informationORGANISATION AND STANDARDISATION OF INFORMATION IN SWISS-PROT AND TREMBL
13 ORGANISATION AND STANDARDISATION OF INFORMATION IN SWISS-PROT AND TREMBL Michele Magrane* and Rolf Apweiler. EMBL Outstation European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton,
More informationTwo Mark question and Answers
1. Define Bioinformatics Two Mark question and Answers Bioinformatics is the field of science in which biology, computer science, and information technology merge into a single discipline. There are three
More informationGenome Sequence Assembly
Genome Sequence Assembly Learning Goals: Introduce the field of bioinformatics Familiarize the student with performing sequence alignments Understand the assembly process in genome sequencing Introduction:
More informationCHAPTER 21 LECTURE SLIDES
CHAPTER 21 LECTURE SLIDES Prepared by Brenda Leady University of Toledo To run the animations you must be in Slideshow View. Use the buttons on the animation to play, pause, and turn audio/text on or off.
More informationIntroduction to Bioinformatics
Introduction to Bioinformatics If the 19 th century was the century of chemistry and 20 th century was the century of physic, the 21 st century promises to be the century of biology...professor Dr. Satoru
More informationProduct Applications for the Sequence Analysis Collection
Product Applications for the Sequence Analysis Collection Pipeline Pilot Contents Introduction... 1 Pipeline Pilot and Bioinformatics... 2 Sequence Searching with Profile HMM...2 Integrating Data in a
More informationThemes: RNA and RNA Processing. Messenger RNA (mrna) What is a gene? RNA is very versatile! RNA-RNA interactions are very important!
Themes: RNA is very versatile! RNA and RNA Processing Chapter 14 RNA-RNA interactions are very important! Prokaryotes and Eukaryotes have many important differences. Messenger RNA (mrna) Carries genetic
More informationChapter 2: Access to Information
Chapter 2: Access to Information Outline Introduction to biological databases Centralized databases store DNA sequences Contents of DNA, RNA, and protein databases Central bioinformatics resources: NCBI
More informationDiscovering gene regulatory control using ChIP-chip and ChIP-seq. Part 1. An introduction to gene regulatory control, concepts and methodologies
Discovering gene regulatory control using ChIP-chip and ChIP-seq Part 1 An introduction to gene regulatory control, concepts and methodologies Ian Simpson ian.simpson@.ed.ac.uk http://bit.ly/bio2links
More informationCHAPTERS , 17: Eukaryotic Genetics
CHAPTERS 14.1 14.6, 17: Eukaryotic Genetics 1. Review the levels of DNA packing within the eukaryote nucleus. Label each level. (A similar diagram is on pg 188 of your textbook.) 2. How do the coding regions
More informationBiological databases an introduction
Biological databases an introduction By Dr. Erik Bongcam-Rudloff SGBC-SLU 2016 VALIDATION Experimental Literature Manual or semi-automatic computational analysis EXPERIMENTAL Costs Needs skilled manpower
More informationIntroduction to CGE tools
Introduction to CGE tools Pimlapas Leekitcharoenphon (Shinny) Research Group of Genomic Epidemiology, DTU-Food. WHO Collaborating Centre for Antimicrobial Resistance in Foodborne Pathogens and Genomics.
More informationLecture 7 Motif Databases and Gene Finding
Introduction to Bioinformatics for Medical Research Gideon Greenspan gdg@cs.technion.ac.il Lecture 7 Motif Databases and Gene Finding Motif Databases & Gene Finding Motifs Recap Motif Databases TRANSFAC
More informationGene-centered resources at NCBI
COURSE OF BIOINFORMATICS a.a. 2014-2015 Gene-centered resources at NCBI We searched Accession Number: M60495 AT NCBI Nucleotide Gene has been implemented at NCBI to organize information about genes, serving
More informationMS bioinformatics analysis for proteomics. Protein anotations
MS bioinformatics analysis for proteomics Protein anotations UCO - Córdoba Organized by: ProteoRed, EUPA and Seprot Alberto Medina January, 23rd 2009 Summary Introduction Some issues Software: Fatigo -
More informationThe University of California, Santa Cruz (UCSC) Genome Browser
The University of California, Santa Cruz (UCSC) Genome Browser There are hundreds of available userselected tracks in categories such as mapping and sequencing, phenotype and disease associations, genes,
More informationThis place covers: Methods or systems for genetic or protein-related data processing in computational molecular biology.
G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY Methods or systems for genetic
More informationFrietze_Figure S1. Validation of the ZNF263 antibody.
A MW(kDa) 150 GM12878 HeLaS3 HepG2 K562 30 µg Nuc. Ext. 100 75 ZNF263 B MW(kDa) 150 100 75 SN IgG IP IN 1 2 4 1 2 4 1 2 4 : µg angbody ZNF263 Frietze_Figure S1. Validation of the ZNF263 antibody. (A) Nuclear
More informationSequence Databases and database scanning
Sequence Databases and database scanning Marjolein Thunnissen Lund, 2012 Types of databases: Primary sequence databases (proteins and nucleic acids). Composite protein sequence databases. Secondary databases.
More informationRNA-Sequencing analysis
RNA-Sequencing analysis Markus Kreuz 25. 04. 2012 Institut für Medizinische Informatik, Statistik und Epidemiologie Content: Biological background Overview transcriptomics RNA-Seq RNA-Seq technology Challenges
More informationProtein Bioinformatics Part I: Access to information
Protein Bioinformatics Part I: Access to information 260.655 April 6, 2006 Jonathan Pevsner, Ph.D. pevsner@kennedykrieger.org Outline [1] Proteins at NCBI RefSeq accession numbers Cn3D to visualize structures
More informationUnit 1: DNA and the Genome. Sub-Topic (1.3) Gene Expression
Unit 1: DNA and the Genome Sub-Topic (1.3) Gene Expression Unit 1: DNA and the Genome Sub-Topic (1.3) Gene Expression On completion of this subtopic I will be able to State the meanings of the terms genotype,
More informationArray-Ready Oligo Set for the Rat Genome Version 3.0
Array-Ready Oligo Set for the Rat Genome Version 3.0 We are pleased to announce Version 3.0 of the Rat Genome Oligo Set containing 26,962 longmer probes representing 22,012 genes and 27,044 gene transcripts.
More informationIntegration of data management and analysis for genome research
Integration of data management and analysis for genome research Volker Brendel Deparment of Zoology & Genetics and Department of Statistics Iowa State University 2112 Molecular Biology Building Ames, Iowa
More informationSequence Based Function Annotation. Qi Sun Bioinformatics Facility Biotechnology Resource Center Cornell University
Sequence Based Function Annotation Qi Sun Bioinformatics Facility Biotechnology Resource Center Cornell University Usage scenarios for sequence based function annotation Function prediction of newly cloned
More informationLecture 2 Introduction to Data Formats
Introduction to Bioinformatics for Medical Research Gideon Greenspan gdg@cs.technion.ac.il Lecture 2 Introduction to Data Formats Introduction to Data Formats Real world, data and formats Sequences and
More informationDiscovering gene regulatory control using ChIP-chip and ChIP-seq. An introduction to gene regulatory control, concepts and methodologies
Discovering gene regulatory control using ChIP-chip and ChIP-seq An introduction to gene regulatory control, concepts and methodologies Ian Simpson ian.simpson@.ed.ac.uk bit.ly/bio2_2012 The Central Dogma
More informationIntroduction to Bioinformatics
Introduction to Bioinformatics Alla L Lapidus, Ph.D. SPbSU St. Petersburg Term Bioinformatics Term Bioinformatics was invented by Paulien Hogeweg (Полина Хогевег) and Ben Hesper in 1970 as "the study of
More informationBioinformatics Tools. Stuart M. Brown, Ph.D Dept of Cell Biology NYU School of Medicine
Bioinformatics Tools Stuart M. Brown, Ph.D Dept of Cell Biology NYU School of Medicine Bioinformatics Tools Stuart M. Brown, Ph.D Dept of Cell Biology NYU School of Medicine Overview This lecture will
More informationKlinisk kemisk diagnostik BIOINFORMATICS
Klinisk kemisk diagnostik - 2017 BIOINFORMATICS What is bioinformatics? Bioinformatics: Research, development, or application of computational tools and approaches for expanding the use of biological,
More informationThe Ensembl Database. Dott.ssa Inga Prokopenko. Corso di Genomica
The Ensembl Database Dott.ssa Inga Prokopenko Corso di Genomica 1 www.ensembl.org Lecture 7.1 2 What is Ensembl? Public annotation of mammalian and other genomes Open source software Relational database
More informationDatabases in Bioinformatics. Molecular Databases. Molecular Databases. NCBI Databases. BINF 630: Bioinformatics Methods
Databases in Bioinformatics BINF 630: Bioinformatics Methods Iosif Vaisman Email: ivaisman@gmu.edu Molecular Databases Molecular Databases Nucleic acid sequences: GenBank, DNA Data Bank of Japan, EMBL
More informationearray 5.0 Create your own Custom Microarray Design
earray 5.0 Create your own Custom Microarray Design http://earray.chem.agilent.com earray 5.x Overview Session Summary Session Summary Agilent Genomics Microarray Solution earray Functional Overview Gene
More informationGenome Annotation - 2. Qi Sun Bioinformatics Facility Cornell University
Genome Annotation - 2 Qi Sun Bioinformatics Facility Cornell University Output from Maker GFF file: Annotated gene, transcripts, and CDS FASTA file: Predicted transcript sequences Predicted protein sequences
More informationBiology From gene to protein
Biology 205 5.3.06 From gene to protein Shorthand abbreviation of part of the DNA sequence of the SRY gene >gi 17488858 ref XM_010627.4 Homo sapiens SRY (sex determining region Y chromosome) GGCATGTGAGCGGGAAGCCTAGGCTGCCAGCCGCGAGGACCGCACGGAGGAGGAGCAGG
More informationCSE/Beng/BIMM 182: Biological Data Analysis. Instructor: Vineet Bafna TA: Nitin Udpa
CSE/Beng/BIMM 182: Biological Data Analysis Instructor: Vineet Bafna TA: Nitin Udpa Today We will explore the syllabus through a series of questions? Please ASK All logistical information will be given
More informationData Basics. Josef K Vogt Slides by: Simon Rasmussen Next Generation Sequencing Analysis
Data Basics Josef K Vogt Slides by: Simon Rasmussen 2017 Generalized NGS analysis Sample prep & Sequencing Data size Main data reductive steps SNPs, genes, regions Application Assembly: Compare Raw Pre-
More informationIntroduction to 'Omics and Bioinformatics
Introduction to 'Omics and Bioinformatics Chris Overall Department of Bioinformatics and Genomics University of North Carolina Charlotte Acquire Store Analyze Visualize Bioinformatics makes many current
More informationAnnotation Practice Activity [Based on materials from the GEP Summer 2010 Workshop] Special thanks to Chris Shaffer for document review Parts A-G
Annotation Practice Activity [Based on materials from the GEP Summer 2010 Workshop] Special thanks to Chris Shaffer for document review Parts A-G Introduction: A genome is the total genetic content of
More informationLecture 11. Initiation of RNA Pol II transcription. Transcription Initiation Complex
Lecture 11 *Eukaryotic Transcription Gene Organization RNA Processing 5 cap 3 polyadenylation splicing Translation Initiation of RNA Pol II transcription Consensus sequence of promoter TATA Transcription