Introduction to 'Omics and Bioinformatics

Chris Overall, Department of Bioinformatics and Genomics, University of North Carolina Charlotte
Acquire, Store, Analyze, Visualize

Bioinformatics makes many current biological and biomedical studies possible

Cells are the fundamental units of life, with many common abilities
Abilities: reproduce; carry out chemical reactions; grow; respond to stimuli (adaptation); process information
All cells share certain structural features and carry out complicated processes in basically the same way

Despite the similarity of all cells, they vary greatly in size, shape and function What makes them different?

There are two basic types of cells: prokaryotes and eukaryotes
Eukaryote: 10-100 μm; nucleus; membrane-bound organelles; multicellular or unicellular
Prokaryote: 1-2 μm; nucleoid; no membrane-bound organelles; mostly unicellular

Humans start as a single eukaryotic cell and end up with about 100 trillion cells
100 trillion = 100 × 10^12 = 100,000,000,000,000
If you lined up all of your cells in a row, at roughly 10 μm per cell they would stretch about 1 million kilometers, enough to cross the continental U.S. hundreds of times

All cells use the same four types of macromolecules
Proteins; lipids; nucleic acids; carbohydrates
We will focus on proteins and nucleic acids

Proteins give a cell structure and perform most cellular tasks
Common jobs: motor; regulator; sensor; enzyme

A protein is a linear sequence of amino acids, which uniquely defines it
Amino acid sequence:
LGLCLAAPRKSVRWCTISPAEAAKCAKFQRNMKKVRGPSVSCIRKTSSFECIQAIAANKA
DAVTLDGGLVYEAGLHPYKLRPVAAEVYQTRGKPQTRYYAVAVVKKGSGFQLNQLQGVKS
CHTGLGRSAGWNIPIGTLRPYLNWTGPPEPLQKAVANFFSASCVPCADGKQYPNLCRLCA
GTEADKCACSSQEPYFGYSGAFKCLENGAGDVAFVKDSTVFENLPDEADRDKYELLCPDN
TRKPVDAFKECHLARVPSHAVVARSVDGREDLIWRLLHRAQEEFGRNKSSAFQLFKSTPE
NKDLLFKDSALGFVRIPSQIDSGLYLGANYLTATQNLRETAAEVAARRERVVWCAVGPEE
ERKCKQWSDVSNRKVACASASTTEECIALVLKGEADALNLDGGFIYVAGKCGLVPVLAEN
QKSQNSNAPDCVHRPPEGYLAVAVVRKSDADLTWNSLSGKKSCHTGVGRTAAWNIPMGLL
FNQTGSCKFDKFFSQSCAPGADPQSSLCALCVGNNENENKCMPNSEERYYGYTGAFRCLA
EKAGDVAFVKDVTVLQNTDGKNSEPWAKDLKQEDFELLCLDGTRKPVAEAESCHLARAPN
HAVVSQSDRAQHLKKVLFLQQDQFGGNGPDCPGKFCLFKSETKNLLFNDNTECLAELQGK
RCSSSPLLEACAFLRA
TTYEQYLGSEYVTSITNLR
Amino acid sequence → folding → folded protein
3D structure = function
All organisms use basically the same 20-letter alphabet
Typical length: 100-1,000 amino acids
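To make the "20-letter alphabet" idea concrete, here is a minimal Python sketch that checks whether a string uses only the 20 standard one-letter amino acid codes and counts how often each occurs. The function names are hypothetical, and the fragment is simply the first 60 residues of the sequence shown on this slide.

```python
from collections import Counter

# The 20 standard amino acids, in one-letter code.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

# First 60 residues of the slide's example sequence.
FRAGMENT = "LGLCLAAPRKSVRWCTISPAEAAKCAKFQRNMKKVRGPSVSCIRKTSSFECIQAIAANKA"


def is_protein(sequence):
    """Return True if every letter is one of the 20 standard amino acids."""
    return len(sequence) > 0 and set(sequence) <= AMINO_ACIDS


def composition(sequence):
    """Count how many times each amino acid appears in the sequence."""
    return Counter(sequence)


if __name__ == "__main__":
    print(is_protein(FRAGMENT))
    for amino_acid, count in composition(FRAGMENT).most_common(5):
        print(amino_acid, count)
```

Note that the letters A, C, G and T are also valid amino acid codes, so a check like this cannot by itself tell a short protein from a DNA string.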

An organism's proteome is the entire set of proteins that it can produce
proteome = protein 1 + protein 2 + ... + protein N
What is the estimated size of the human proteome? Phrased differently, how many unique proteins can the human body create?
Answer: approximately 2 million

The genes encoded in DNA determine which proteins can be created by a cell
DNA alphabet = {A, C, T, G}
Gene 1, gene sequence (DNA):
AGGGTCCCCGACCTCGCTGTGGGGGC
TCCTGTTTCTCTCCGCCGCGCTCTCG
CTCTGGCCGACGAGTGGAGAAAGTGA
GTATGTGCCCGCCGCCCGCGGCCACT
GCGGGAACTTTTCCTCCGAGGGGCTG
CGCCCTGTTTGCGAAACCCGAGTTGC
CACCGTCGCAGCTGTCGGGCCCCCGG
GCTCGGGAGCGGCGGGGTGGGGCGCA
Gene 1, protein sequence:
GSPTSLWGLLFLSAALSLWPTS
GESEYVPAARGHCGNFSSEGLR
PVCETRVATVAAVGPPGSGAAG
WGAGR
Gene 2, gene sequence (DNA):
GCAGCGACAGCACCCTGGGCCGCACC
GCGTCTCCGCTGCTGGGGCTGACCGG
GCAGTGGGGAGCGGGCAGCCGCCCGG
CCCGCGGGGTGCGAGTTGCCCTCGAG
GGGGGAGGTGCCCTGCGGAGAGGCCC
CGAAGCCCCAGGACACGTCTCCAAGC
GCCCGTCGCTGCCTCCCGTCGCCCAA
Gene 2, protein sequence:
QRQHPGPHRVSAAGADRAV
GSGQPPGPRGASCPRGGRC
PAERPRSPRTRLQAPVAAS
RRPTL
DNA → RNA → Protein
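The path from gene sequence to protein sequence can be sketched in a few lines of Python. This is a minimal illustration, not a gene-prediction method. It uses the standard genetic code, treats the DNA as the coding strand, and assumes the reading frame starts one base into each fragment. That offset is an assumption chosen because it reproduces the protein letters shown on this slide. The function names are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(dna):
    """DNA coding strand -> mRNA: same letters, with T replaced by U."""
    return dna.replace("T", "U")


def translate(dna, offset=0):
    """Read codons (3 bases each) from `offset` and map them to amino acids.

    Stops at the first stop codon ('*') or when fewer than 3 bases remain.
    """
    protein = []
    for i in range(offset, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# First line of each gene's DNA as printed on the slide.
GENE1_START = "AGGGTCCCCGACCTCGCTGTGGGGGC"
GENE2_START = "GCAGCGACAGCACCCTGGGCCGCACC"

if __name__ == "__main__":
    print(transcribe(GENE1_START))
    print(translate(GENE1_START, offset=1))  # GSPTSLWG
    print(translate(GENE2_START, offset=1))  # QRQHPGPH
```

With that offset, the output matches the first eight letters of the Gene 1 and Gene 2 protein sequences on the slide.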

An organism's genome is the entirety of its hereditary (genetic) information
Genome = genes + non-coding regions
How many genes do humans have? Answer: approximately 20,000
The genome of every cell in the human body is basically identical: they have the same genes

By a commonly cited estimate, you have 10 times more bacterial cells than human cells in your body
You have about 1 quadrillion bacterial cells
Approximately 3 pounds of your weight is bacteria
Most of them won't make you sick
The beneficial bacteria are mostly in your gut
You need them to digest food
E. coli is a well-known example
Your metagenome is the entirety of the hereditary (genetic) information of the bacteria in your body

Questions (1) What is a genome? (2) What is a transcriptome? (3) What is a proteome? (4) What is a metagenome?

How do we get different types of human cells if their genomes are the same?
Examples: typical cell; neuron; cancer cell; macrophage

The genome also specifies how, when and where to produce each kind of protein
Specialization: different human cell types, such as those found in different organs, have different proteomes (liver vs. neuron)
Adaptation: a cell responds to changes in the environment by slightly changing its proteome (liver 1 vs. liver 2)
This is known as gene regulation.

End of Act 1: Questions (1) What is a cell? (2) What is a protein? (3) What is a gene? (4) How is a protein created from a gene? (5) What is a genome? (6) What is a proteome? (7) What are the letters (nucleotides) in the DNA alphabet? (8) How many letters (amino acids) are in the protein alphabet?

'Omics is large-scale, high-throughput molecular biology
*omics: studies the entire set of a certain type of molecule in an organism; studies when, where and how the molecule interacts, is created and is destroyed
Genomics: (1) study of all the genes and non-coding regions in an organism; (2) study of an organism's genome
Functional genomics: study of what, when, where and how the genome produces proteins or other biological products

There are four core 'omics disciplines, and many other subdisciplines
Genome: genomics
Metagenome: metagenomics
Proteome: proteomics
Transcriptome: transcriptomics

Since the year 2000, there has been an explosion of biological data
Sequenced genomes = 5,062
Base pairs = 99,116,431,942 = 396 GB
Sequences = 98,868,465
Known genes = 7,095,197
Known proteins = 523,151
This is only a small portion of the 'omics data that has been generated in the past 10 years

For even a single organism, there are thousands of protein interactions

We need bioinformatics to make sense of the bewildering amount of 'omics data
Bioinformatics = 'Omics + Computers
Acquire: experimental techniques for generating 'omics data
Store: computational techniques for storing, organizing and sharing 'omics data
Analyze: computational techniques for finding patterns in and making sense of 'omics data
Visualize: computational techniques for visualizing 'omics data; pictures are easier to understand than numbers

End of Act 2: Questions

The human genome is fairly large, but it is not the largest known genome
The human genome is stored in 23 pairs of chromosomes
3 billion base pairs in our DNA; about 3 gigabytes of storage space at one byte per base
There are an estimated 20,000-25,000 protein-coding genes in our genome
There are approximately 2 million proteins in our proteome
It would take a person about 100 years to recite the human genome, saying one nucleotide per second, 24 hours a day
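The storage and recitation figures can be checked with simple arithmetic. The sketch below assumes a round 3 billion base pairs, one byte per base for plain text, and (as an extra illustration not on the slide) 2 bits per base for a packed encoding of the four-letter alphabet. The variable names are hypothetical.

```python
BASE_PAIRS = 3_000_000_000  # round figure from the slide

# Storage: one character (1 byte) per base, as in a plain text file.
gigabytes_as_text = BASE_PAIRS * 1 / 1e9

# Assumption for illustration: 4 letters need only 2 bits each when packed.
gigabytes_packed = BASE_PAIRS * 2 / 8 / 1e9

# Recitation: one nucleotide per second, 24 hours a day.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60
years_to_recite = BASE_PAIRS / SECONDS_PER_YEAR

if __name__ == "__main__":
    print(f"Plain text: {gigabytes_as_text:.2f} GB")
    print(f"Packed (2 bits per base): {gigabytes_packed:.2f} GB")
    print(f"Years to recite: {years_to_recite:.1f}")
```

Under these assumptions the recitation takes roughly 95 years, which is where the "about 100 years" figure comes from.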

Human Genome Project: sequence all of our DNA and identify all of our genes
Aside from answering basic biological questions, why is this important?
Answer: every disease has a genetic component (inherited; response to environment)
The draft sequence of the human genome was announced in 2000 and published in 2001; the completed sequence was announced in 2003
Bioinformatics was, and continues to be, an essential part of the Human Genome Project.
How did we do it?

Acquire: determine the sequence of the human genome
It wasn't possible to sequence the whole genome at one time
We had to break it up, sequence smaller pieces and then put it back together
Whole genome; BAC phase = 150,000 bp pieces; shotgun phase = 800 bp pieces

Bioinformatics was necessary to assemble the genome sequence
Bioinformatics challenge: paste together the small shotgun sequences to get the entire human genome sequence
ACCTTGGCCTAGGCT
TAGGCTACTGGCTGGA
GCTGGAATCCAGTGCC
TAACTAGCTTAATCCG
GTGCCCGGGTTAACTA
Required the use of sophisticated assembly algorithms that ran on supercomputers
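The "paste together" step can be illustrated with the five example fragments from this slide. The sketch below is a toy greedy overlap merger: it repeatedly joins the two fragments whose end and start overlap the most. It is not the algorithm used for the Human Genome Project; real assemblers also handle sequencing errors, repeats, reverse strands and vastly more data. The function names and the minimum overlap of 3 bases are assumptions made for this example.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge the best-overlapping pair until no overlaps remain."""
    reads = list(reads)
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to merge
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


# The five shotgun fragments shown on the slide.
READS = [
    "ACCTTGGCCTAGGCT",
    "TAGGCTACTGGCTGGA",
    "GCTGGAATCCAGTGCC",
    "TAACTAGCTTAATCCG",
    "GTGCCCGGGTTAACTA",
]

if __name__ == "__main__":
    contigs = greedy_assemble(READS)
    for contig in contigs:
        print(len(contig), contig)
```

On these fragments the merger ends with a single 56-base sequence that contains all five reads. Greedy merging is easy to follow but can be misled by repeated sequence, which is one reason real assembly needed more sophisticated algorithms.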

Sequence assembly game

Store: organize and share the sequence of the human genome
Problem: we have too much data to store in a notebook and to search manually
Bioinformatics challenge: develop efficient methods to store, organize and access the 'omics data; develop easy ways for experimental biologists to access the data
Solution: created databases (the major data repositories are GenBank, EMBL and DDBJ); created web-based applications to access the data
A substantial portion of biological data is freely available to anyone with a web browser

Analyze: Search the genome sequence for biological features
Challenge: identify genes. Solution: developed gene-finding algorithms (i.e., spot the gene)
Challenge: identify genes that are similar to an unknown one. Solution: developed sequence alignment algorithms, such as BLAST
Challenge: find regulatory sequences. Solution: developed pattern identification algorithms to find out how those genes are regulated
Challenge: store, organize and share these feature annotations. Solution: developed databases to store information about genes
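To give a feel for what "sequence alignment" computes, here is a minimal sketch of local alignment scoring by dynamic programming (the Smith-Waterman approach). This is not BLAST: BLAST is a faster heuristic that looks for short matching seeds and extends them, whereas this sketch fills the full scoring table. The scoring values (match +2, mismatch -1, gap -2) and the function name are assumptions chosen for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b.

    score[i][j] holds the best score of any alignment ending at
    a[i-1] and b[j-1]; a score never drops below 0, which lets an
    alignment start anywhere in either sequence.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + pair,  # align the two letters
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
            best = max(best, score[i][j])
    return best


if __name__ == "__main__":
    print(local_alignment_score("GGACGTCC", "TTACGTAA"))  # shared ACGT
    print(local_alignment_score("AAAA", "TTTT"))          # nothing shared
```

A higher score means the two sequences share a more similar region, which is the basic signal used to decide that an unknown gene resembles a known one.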

Analyze: OK, we know the genes now. What do they make?

Analyze: When, how and where are the genes expressed?

Visualize: We need tools to visualize the information in our genome.

Network screening can uncover novel, context-dependent interactions
Developed a method to score (rate) a subnetwork based upon gene differential expression
Developed a method to find high-scoring subnetworks
Identified active subnetworks in a subset of the yeast interaction network when GAL80 was deleted
Identified active subnetworks in the full yeast interaction network when perturbed twenty different ways
Questions?