Genomics I. Organization of the Genome
- Russell Hensley
- 6 years ago
1 Genomics I Organization of the Genome
2 Outline. Organization of the genome: genomes, chromosomes, genes, exons, introns, promoters, enhancers, etc. Databases: Why do we need them? How do we access them? What can they do for us? Basic principles of bioinformatics.
3 What is a genome? Definition: the complete set of genetic material present in the cells of an organism. The genetic material is composed of DNA. Base pairing + base stacking give the double helix.
4 Genome Sizes and Phylogeny 0.5 to 7 Mbp 10 Mbp to 1000 Gbp
5 The Human Genome. February 2001. Considered a crowning achievement: the "blueprint of life". Yet many questions remain regarding fidelity and organization (e.g., how many genes?).
6 The Human Genome Project
7 What is the Human Genome Project? Completed in 2003, the Human Genome Project (HGP) was a 13-year project coordinated by the U.S. Department of Energy and the National Institutes of Health. During the early years of the HGP, the Wellcome Trust (U.K.) became a major partner; additional contributions came from Japan, France, Germany, China, and others. Goals: identify the approximately 20,000-25,000 genes in human DNA; determine the sequences of the 3 billion bases that make up human DNA; store this information in databases; develop tools for data analysis; transfer related technologies to the private sector; and address the ethical, legal, and social issues that arise from genome research.
8 Why is the Department of Energy involved? After atomic bombs were dropped during World War II, Congress told DOE to conduct studies to understand the biological and health effects of radiation and chemical by-products of all energy production. The best way to study these effects is at the DNA level.
9 Whose genome is being sequenced? The first reference genome is a composite genome from several different people, generated from primary samples taken from numerous anonymous donors across racial and ethnic groups.
10 Benefits of HGP Research: improvements in medicine; microbial genome research for fuel and environmental cleanup; DNA forensics; improved agriculture and livestock; better understanding of evolution and human migration; more accurate risk assessment.
11 Ethical, Legal, and Social Implications of HGP Research: fairness in the use of genetic information; privacy and confidentiality; psychological impact and stigmatization; genetic testing; reproductive issues; education, standards, and quality control; commercialization; conceptual and philosophical implications.
12 For More Information about HGP Human Genome Project Information Website
13 Basic numbers in Human Genome: 3 x 10^9 bp; ~30,000 genes; 23 x 2 = 46 chromosomes; all from 4 bases (A, C, G, T).
14 Chromosomes. A chromosome is a single large macromolecule of DNA and is the basic 'unit' of DNA in a cell. It is a very long, continuous piece of DNA (a single DNA molecule), which contains many genes, regulatory elements and other intervening nucleotide sequences. [Table: supercontig for rat chromosome 13, listing successive NW_ contigs with their lengths and start/end coordinates, separated by gaps of 50,000 bp each; for example, contig 2 spans 19,284,044 to 30,377,265 and is followed by a gap from 30,377,266 to 30,427,265. All contigs are on the plus strand (only an issue if you are building a database).]
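The start and end columns of the supercontig table follow from simple arithmetic: contigs are laid end to end with a fixed 50,000-bp gap between them. The sketch below is only an illustration of that bookkeeping, assuming 1-based inclusive coordinates. The function name `layout_supercontig` is hypothetical, and the three contig lengths are inferred from the intact start/end coordinates in the table rather than read from it directly.

```python
GAP_SIZE = 50_000  # fixed gap length used between contigs in the table


def layout_supercontig(contig_lengths, gap_size=GAP_SIZE):
    """Return (kind, start, end) rows using 1-based inclusive coordinates."""
    rows = []
    position = 1
    for index, length in enumerate(contig_lengths):
        if index > 0:
            # Insert a fixed-size gap before every contig except the first.
            rows.append(("gap", position, position + gap_size - 1))
            position += gap_size
        rows.append(("contig", position, position + length - 1))
        position += length
    return rows


# Lengths inferred from the coordinates of the first three contigs (assumption).
rows = layout_supercontig([19_234_043, 11_093_222, 2_305_237])
for kind, start, end in rows:
    print(f"{kind:6s} {start:>12,} {end:>12,}")
```

The computed rows reproduce the coordinates that survive in the table, for example the second contig at 19,284,044 to 30,377,265.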
15 Contig assembly: physical map. Software (Image or Bandleader) is used to identify overlapping clones with common restriction fragments and assemble them into a contig (FPC). [Figure: overlapping clones A to G, with shared restriction fragments marked by asterisks.]
16 Sequence data assembly: Supercontig creation and gap filling (A) A supercontig is constructed by successively linking pairs of contigs that share at least two forward-reverse links. Here, three contigs are joined into one supercontig. (B) ARACHNE attempts to fill gaps by using paths of contigs. The first gap in the supercontig shown here is filled with one contig, and the second gap is filled by a path consisting of two contigs. Genome Research 12: (2002)
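The linking rule in (A) can be sketched as a grouping problem: count the forward-reverse links between each pair of contigs and merge pairs that share at least two. This is a simplified illustration, not ARACHNE's actual procedure; it ignores contig order, orientation, and gap filling, and the function name `build_supercontigs` and the contig labels are hypothetical.

```python
from collections import Counter


def build_supercontigs(links, min_links=2):
    """Group contigs joined by at least `min_links` forward-reverse links.

    `links` holds one (contig_a, contig_b) pair per read pair whose two
    ends fall in different contigs.
    """
    counts = Counter(tuple(sorted(pair)) for pair in links)
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), n in counts.items():
        root_a = find(a)
        root_b = find(b)
        if n >= min_links:
            parent[root_a] = root_b

    groups = {}
    for contig in parent:
        groups.setdefault(find(contig), set()).add(contig)
    return sorted(sorted(group) for group in groups.values())


links = [("c1", "c2"), ("c2", "c1"), ("c2", "c3"), ("c2", "c3"), ("c3", "c4")]
supercontigs = build_supercontigs(links)
print(supercontigs)  # [['c1', 'c2', 'c3'], ['c4']]
```

Here c1, c2, and c3 are joined into one supercontig, while c4 stays separate because it is supported by a single link only.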
17 Whole genome map assembly Genome map Edit contigs and align to map. Gaps between clones can be filled with other clones, such as fosmids, or by generating PCR products from BAC clones or genomic DNA.
18 Genes. The Central Dogma: DNA, RNA, Protein. A more realistic picture also includes expression, interactions, metabolites, and growth rate.
19 The Genetic Code In reality, there is more information in the genome than just amino acid sequences.
20 The classic molecular human disease: sickle cell, HbS. Normal RBC: 6-8 µm; 4e12 per L. Sickle cells: HbS. 1949: Castle & Pauling. Single nucleotide polymorphism (SNP): GAG to GUG, E6V. Treatments: antibiotics, hydroxyurea, or bone-marrow transplant. (From an old version of George Church's Biophysics 101 class; see further reading.)
21 Routine screening for intelligence alleles. Phenylketonuria (PKU) is one of the commonest inherited disorders, occurring in approximately 1 in 10,000 babies born in the U.S. The PKU gene is required for F (phenylalanine) to Y (tyrosine) conversion. Phenylalanine build-up prevents the brain from developing properly. Progressive intellectual disability results if PKU is not treated from early infancy. Discovered by Folling. Nature/Nurture: ~100% genetic with a normal diet, leading to intellectual disability; ~100% environmental, varying with knowledge of prevention by reduced F in the diet. All states and U.S. territories screen newborns for PKU (some since the 1960s).
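The E6V notation on the sickle cell slide can be read off the genetic code: the codon GAG encodes glutamate (E), GUG encodes valine (V), and the change falls at position 6. A minimal sketch, with a deliberately partial codon table and a hypothetical helper name `describe_substitution`:

```python
# Only the two codons needed for this example; a full table has 64 entries.
CODON_TABLE = {"GAG": "E", "GUG": "V"}  # E = glutamate, V = valine


def describe_substitution(ref_codon, alt_codon, position):
    """Return the amino acid change in the form <ref><position><alt>."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    return f"{ref_aa}{position}{alt_aa}"


label = describe_substitution("GAG", "GUG", 6)
print(label)  # E6V
```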
22 So where do I find the genome? NCBI: UCSC genome browser: Ensembl:
31 Organization of the Gene. Exons: regions of DNA that code for protein. Introns: intervening regions that are spliced out. Transcriptional start site (TSS): where transcription begins. Promoter: sequences upstream of the TSS that are bound by transcription factor proteins to regulate gene expression. [Figure: gene diagram showing the regulatory region (promoter), the TSS, and the coding region with its introns and exons.]
32 BLAST (Basic Local Alignment Search Tool). Compares DNA sequences for similarity. Can compare two sequences of yours, or one of yours against a known genome (human, rat, ...). Returns similarities, not just identical matches. Can give you disjoint or continuous hits.
33 What BLAST tells you. BLAST reports surprising alignments: those different from chance. Assumptions: random sequences; constant composition. Conclusions: surprising similarities imply evolutionary homology. Evolutionary homology: descent from a common ancestor; it does not always imply similar function.
34 Basic Local Alignment Search Tool. Widely used similarity search tool. Heuristic approach based on the Smith-Waterman algorithm. Finds best local alignments. Provides statistical significance. All combinations (DNA/protein) of query and database: DNA vs. DNA; DNA translation vs. protein; protein vs. protein; protein vs. DNA translation; DNA translation vs. DNA translation. Available as WWW, standalone, and network clients.
35 BLAST and BLAST-like programs. Traditional BLAST (blastall), for nucleotide, protein, and translations: blastn, nucleotide query vs. nucleotide database; blastp, protein query vs. protein database; blastx, nucleotide query vs. protein database; tblastn, protein query vs. translated nucleotide database; tblastx, translated query vs. translated database. Megablast (nucleotide only): contiguous megablast, for nearly identical sequences; discontiguous megablast, for cross-species comparison. Position Specific BLAST programs (protein only): Position Specific Iterative BLAST (PSI-BLAST), which automatically generates a position specific score matrix (PSSM); Reverse PSI-BLAST (RPS-BLAST), which searches a database of PSI-BLAST PSSMs.
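BLAST approximates what the Smith-Waterman algorithm computes exactly: the best-scoring local alignment between two sequences. The sketch below computes only the optimal local score with dynamic programming. The match, mismatch, and linear gap values are illustrative assumptions, not BLAST's defaults, and `smith_waterman_score` is a hypothetical name.

```python
def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (linear gap penalty)."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best


print(smith_waterman_score("GGACGTCC", "TTACGTAA"))  # 4, from the shared ACGT
```

The full table costs time proportional to the product of the two sequence lengths, which is why a word-based heuristic is used for database searches.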
36 Nucleotide Words. Query: GTACTGGACATGGACCCTACAGGAACGTATACGTAAG. Make a lookup table of words. blastn 11-mers: GTACTGGACAT, TACTGGACATG, ACTGGACATGG, CTGGACATGGA, TGGACATGGAC, GGACATGGACC, GACATGGACCC, ACATGGACCCT, ... megablast 28-mers: GTACTGGACATGGACCCTACAGGAACGT, TGGACATGGACCCTACAGGAACGTATAC, CATGGACCCTACAGGAACGTATACGTAA, ... Minimum word size: 7 for blastn, 12 for megablast.
37 Protein Words. Query: GTQITVEDLFYNIATRRKALKN. Make a lookup table of words: GTQ, TQI, QIT, ITV, TVE, VED, EDL, DLF, ... Word size = 3 (default); word size can only be 2 or 3. Neighborhood words (for ITV): LTV, MTV, ISV, LSV, etc.
38 Minimum Requirements for a Hit. Nucleotide BLAST requires one exact match, e.g., the exact word CATGCTTAATT within ATCGCCATGCTTAATTGGGCTT (one match). Protein BLAST requires two neighboring matches within 40 aa, e.g., the neighborhood words SEI and YYN for GTQITVEDLFYNI (two matches).
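The word lookup table for the query on this slide can be sketched as a dictionary from each word to the positions where it starts. This is an illustration of the idea only; BLAST's real data structure is more compact, and the name `build_word_table` is hypothetical. Positions are 0-based.

```python
from collections import defaultdict


def build_word_table(query, word_size=11):
    """Map every overlapping word of length word_size to its start positions."""
    table = defaultdict(list)
    for start in range(len(query) - word_size + 1):
        table[query[start:start + word_size]].append(start)
    return dict(table)


query = "GTACTGGACATGGACCCTACAGGAACGTATACGTAAG"  # query from the slide
table = build_word_table(query, word_size=11)
print(table["GTACTGGACAT"])  # [0]
print(table["TACTGGACATG"])  # [1]
```

Calling `build_word_table(query, word_size=28)` gives the longer megablast-style words from the same query.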
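The nucleotide rule, one exact word match, can be sketched by looking up every word of one sequence in a table built from the other. The example reuses the two strings from this slide; which one plays the role of query is an assumption, and `find_seed_hits` is a hypothetical name. Positions are 0-based.

```python
def find_seed_hits(query, subject, word_size=11):
    """Return (query_pos, subject_pos) for every exact word match."""
    table = {}
    for q in range(len(query) - word_size + 1):
        table.setdefault(query[q:q + word_size], []).append(q)
    hits = []
    for s in range(len(subject) - word_size + 1):
        for q in table.get(subject[s:s + word_size], []):
            hits.append((q, s))
    return hits


subject = "ATCGCCATGCTTAATTGGGCTT"  # sequence from the slide
query = "CATGCTTAATT"               # exact 11-mer word from the slide
hits = find_seed_hits(query, subject)
print(hits)  # [(0, 5)]
```

Each hit would then be extended in both directions; the extension step is not shown here.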
39 An alignment that BLAST can't find
  1 GAATATATGAAGACCAAGATTGCAGTCCTGCTGGCCTGAACCACGCTATTCTTGCTGTTG
  1 GAGTGTACGATGAGCCCGAGTGTAGCAGTGAAGATCTGGACCACGGTGTACTCGTTGTCG
 61 GTTACGGAACCGAGAATGGTAAAGACTACTGGATCATTAAGAACTCCTGGGGAGCCAGTT
 61 GCTATGGTGTTAAGGGTGGGAAGAAGTACTGGCTCGTCAAGAACAGCTGGGCTGAATCCT
121 GGGGTGAACAAGGTTATTTCAGGCTTGCTCGTGGTAAAAAC
121 GGGGAGACCAAGGCTACATCCTTATGTCCCGTGACAACAAC
40 Megablast: NCBI's Genome Annotator. Long alignments for similar DNA sequences. Concatenation of query sequences. Faster than blastn. Contiguous megablast: exact word match; word size 28. Discontiguous megablast: initial word hit with mismatches; cross-species comparison.
41 Templates for Discontiguous Words. Templates are defined for each combination of W = 11 or 12 and t = 16, 18, or 21, in coding and non-coding variants. W = word size (number of matches in the template); t = template length (window size within which the word match is evaluated). Reference: Ma, B, Tromp, J, Li, M. PatternHunter: faster and more sensitive homology search. Bioinformatics, March 2002; 18(3):440-5.
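A discontiguous word match can be sketched as a mask: only positions marked 1 in the template must agree, and the remaining positions are free to differ. The template string below (W = 11 matches in a window of t = 18) is an illustrative assumption, not one of the templates listed on this slide, and `template_hit` is a hypothetical name.

```python
TEMPLATE = "111010010100110111"  # illustrative: 11 required matches, window of 18


def template_hit(a, b, template=TEMPLATE):
    """True if windows a and b agree at every '1' position of the template."""
    if len(a) != len(template) or len(b) != len(template):
        raise ValueError("windows must have the same length as the template")
    return all(x == y for x, y, t in zip(a, b, template) if t == "1")


window_a = "ACGTACGTACGTACGTAC"
window_b = "ACGAACTTACTTACGTAC"  # differs only at positions 3, 6, 10 (all '0')
print(template_hit(window_a, window_b))  # True
```

The two windows share no contiguous exact 11-mer, so a contiguous word search would miss them, while the template still registers a hit. This is why discontiguous words suit cross-species comparison.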
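42 Local Alignment Statistics. High scores of local alignments between two random sequences follow the Extreme Value Distribution. Expect value E = the number of database hits scoring at least as high as yours that you expect to find by chance, given the size of the database. $E = K m n e^{-\lambda s}$ or $E = m n 2^{-S}$, where $s$ is the raw alignment score, $m$ and $n$ are the lengths of the query and the database, $K$ is the scale for the search space, $\lambda$ is the scale for the scoring system, and $S = (\lambda s - \ln K)/\ln 2$ is the bit score (applies to ungapped alignments). [Figure: distribution of alignment scores; the area beyond your score gives the expected number of random hits.]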
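The two expressions for E on this slide are the same quantity written with the raw score and with the bit score. The sketch below checks that numerically. The values of lambda, K, m, n, and s are illustrative assumptions, not figures from the slides, and the function names are hypothetical.

```python
import math


def bit_score(raw_score, lam, k):
    """S = (lambda * s - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(raw_score, m, n, lam, k):
    """E = K * m * n * exp(-lambda * s)."""
    return k * m * n * math.exp(-lam * raw_score)


# Illustrative values only (assumptions).
lam, k = 0.318, 0.134      # scoring-system parameters
m, n = 250, 50_000_000     # query length and database length
s = 100                    # raw alignment score

e_raw = expect_value(s, m, n, lam, k)
e_bits = m * n * 2 ** (-bit_score(s, lam, k))
print(e_raw, e_bits)  # the two values agree up to floating-point rounding
```

The bit score folds K and lambda into the score itself, which makes scores from different scoring systems comparable.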
43 Scoring Systems. Position-independent matrices. Nucleic acids: identity matrix. Proteins: PAM matrices (Percent Accepted Mutation), an implicit model of evolution; higher PAM numbers are all calculated from PAM1; PAM250 is widely used. BLOSUM matrices (BLOck SUbstitution Matrices) are empirically determined from alignments of conserved blocks; each includes information up to a certain level of identity; BLOSUM62 is widely used. Position Specific Score Matrices (PSSMs): PSI-BLAST and RPS-BLAST.
44 BLOSUM62 [matrix figure; rows and columns A R N D C Q E G H I L K M F P S T W Y V X]. Common amino acids have low weights; rare amino acids have high weights. Scores are positive for more likely substitutions and negative for less likely substitutions.
45 Position Specific Substitution Rates [figure contrasting a typical serine with an active site serine].
46 Position Specific Score Matrix (PSSM) [matrix figure; columns A R N D C Q E G H I L K M F P S T W Y V, one row per sequence position]. Serine is scored differently in these two positions; one of them is the active site nucleophile.
47 Gapped Alignments. Gapping provides more biologically realistic alignments. Gapped BLAST parameters must be simulated. Affine gap costs = -(a + bk), where a = gap open penalty, b = gap extend penalty, and k = gap length. A gap of length 1 receives the score -(a + b).
48 Scores [figure: residues V D S C Y V E T L C F scored with BLOSUM and PAM matrices].
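The affine gap cost on this slide is a one-line function of the gap length k. In the sketch below the penalties a = 11 and b = 1 are assumed illustrative values, and `affine_gap_score` is a hypothetical name.

```python
def affine_gap_score(length, gap_open, gap_extend):
    """Score of a gap of `length` residues: -(a + b*k)."""
    if length < 1:
        raise ValueError("gap length must be at least 1")
    return -(gap_open + gap_extend * length)


print(affine_gap_score(1, gap_open=11, gap_extend=1))  # -12
print(affine_gap_score(5, gap_open=11, gap_extend=1))  # -16
```

One gap of length 5 costs 16, far less than five separate gaps of length 1 (60 in total), so the model favors a few long gaps over many short ones.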
55 Becker et al., Nature, 1998 Position Weight Matrices
56 PWMs (continued)
57 Formulae used in searching DNA sequences
60 Inter-species Comparison. Albumin gene promoters were obtained from the rat, human, and mouse genomes using Promoser and aligned using BLAST: conserved regions (human vs. mouse/rat) span from -250 to +50 relative to the TSS. Regulatory elements were obtained using Possum, retaining 200 bp upstream from the TSS. [Figure: alignment of the rat, mouse, and human sequences.]
64 Phylogenetic footprinting
65 Further Reading: George Church's Computational Biology (Biophysics 101) course. Your Molecular Cell Biol. text!
BLAST. compared with database sequences Sequences with many matches to high- scoring words are used for final alignments
BLAST 100 times faster than dynamic programming. Good for database searches. Derive a list of words of length w from query (e.g., 3 for protein, 11 for DNA) High-scoring words are compared with database
More informationOutline. Evolution. Adaptive convergence. Common similarity problems. Chapter 7: Similarity searches on sequence databases
Chapter 7: Similarity searches on sequence databases All science is either physics or stamp collection. Ernest Rutherford Outline Why is similarity important BLAST Protein and DNA Interpreting BLAST Individualizing
More informationDatabase Searching and BLAST Dannie Durand
Computational Genomics and Molecular Biology, Fall 2013 1 Database Searching and BLAST Dannie Durand Tuesday, October 8th Review: Karlin-Altschul Statistics Recall that a Maximal Segment Pair (MSP) is
More informationLecture 2: Biology Basics Continued
Lecture 2: Biology Basics Continued Central Dogma DNA: The Code of Life The structure and the four genomic letters code for all living organisms Adenine, Guanine, Thymine, and Cytosine which pair A-T and
More informationBLAST. Basic Local Alignment Search Tool. Optimized for finding local alignments between two sequences.
BLAST Basic Local Alignment Search Tool. Optimized for finding local alignments between two sequences. An example could be aligning an mrna sequence to genomic DNA. Proteins are frequently composed of
More informationThe String Alignment Problem. Comparative Sequence Sizes. The String Alignment Problem. The String Alignment Problem.
Dec-82 Oct-84 Aug-86 Jun-88 Apr-90 Feb-92 Nov-93 Sep-95 Jul-97 May-99 Mar-01 Jan-03 Nov-04 Sep-06 Jul-08 May-10 Mar-12 Growth of GenBank 160,000,000,000 180,000,000 Introduction to Bioinformatics Iosif
More informationCreation of a PAM matrix
Rationale for substitution matrices Substitution matrices are a way of keeping track of the structural, physical and chemical properties of the amino acids in proteins, in such a fashion that less detrimental
More informationComparative Bioinformatics. BSCI348S Fall 2003 Midterm 1
BSCI348S Fall 2003 Midterm 1 Multiple Choice: select the single best answer to the question or completion of the phrase. (5 points each) 1. The field of bioinformatics a. uses biomimetic algorithms to
More informationProtein Sequence Analysis. BME 110: CompBio Tools Todd Lowe April 19, 2007 (Slide Presentation: Carol Rohl)
Protein Sequence Analysis BME 110: CompBio Tools Todd Lowe April 19, 2007 (Slide Presentation: Carol Rohl) Linear Sequence Analysis What can you learn from a (single) protein sequence? Calculate it s physical
More informationSequence Based Function Annotation. Qi Sun Bioinformatics Facility Biotechnology Resource Center Cornell University
Sequence Based Function Annotation Qi Sun Bioinformatics Facility Biotechnology Resource Center Cornell University Usage scenarios for sequence based function annotation Function prediction of newly cloned
More informationBasic Bioinformatics: Homology, Sequence Alignment,
Basic Bioinformatics: Homology, Sequence Alignment, and BLAST William S. Sanders Institute for Genomics, Biocomputing, and Biotechnology (IGBB) High Performance Computing Collaboratory (HPC 2 ) Mississippi
More informationEvolutionary Genetics. LV Lecture with exercises 6KP
Evolutionary Genetics LV 25600-01 Lecture with exercises 6KP HS2017 >What_is_it? AATGATACGGCGACCACCGAGATCTACACNNNTC GTCGGCAGCGTC 2 NCBI MegaBlast search (09/14) 3 NCBI MegaBlast search (09/14) 4 Submitted
More informationGene Identification in silico
Gene Identification in silico Nita Parekh, IIIT Hyderabad Presented at National Seminar on Bioinformatics and Functional Genomics, at Bioinformatics centre, Pondicherry University, Feb 15 17, 2006. Introduction
More informationMultiple choice questions (numbers in brackets indicate the number of correct answers)
1 Multiple choice questions (numbers in brackets indicate the number of correct answers) February 1, 2013 1. Ribose is found in Nucleic acids Proteins Lipids RNA DNA (2) 2. Most RNA in cells is transfer
More informationMATH 5610, Computational Biology
MATH 5610, Computational Biology Lecture 2 Intro to Molecular Biology (cont) Stephen Billups University of Colorado at Denver MATH 5610, Computational Biology p.1/24 Announcements Error on syllabus Class
More informationELE4120 Bioinformatics. Tutorial 5
ELE4120 Bioinformatics Tutorial 5 1 1. Database Content GenBank RefSeq TPA UniProt 2. Database Searches 2 Databases A common situation for alignment is to search through a database to retrieve the similar
More informationAlgorithms in Bioinformatics
Algorithms in Bioinformatics Sami Khuri Department of Computer Science San José State University San José, California, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri Outline Central Dogma of Molecular
More informationWhy learn sequence database searching? Searching Molecular Databases with BLAST
Why learn sequence database searching? Searching Molecular Databases with BLAST What have I cloned? Is this really!my gene"? Basic Local Alignment Search Tool How BLAST works Interpreting search results
More informationHands-On Four Investigating Inherited Diseases
Hands-On Four Investigating Inherited Diseases The purpose of these exercises is to introduce bioinformatics databases and tools. We investigate an important human gene and see how mutations give rise
More informationSingle Nucleotide Variant Analysis. H3ABioNet May 14, 2014
Single Nucleotide Variant Analysis H3ABioNet May 14, 2014 Outline What are SNPs and SNVs? How do we identify them? How do we call them? SAMTools GATK VCF File Format Let s call variants! Single Nucleotide
More informationALGORITHMS IN BIO INFORMATICS. Chapman & Hall/CRC Mathematical and Computational Biology Series A PRACTICAL INTRODUCTION. CRC Press WING-KIN SUNG
Chapman & Hall/CRC Mathematical and Computational Biology Series ALGORITHMS IN BIO INFORMATICS A PRACTICAL INTRODUCTION WING-KIN SUNG CRC Press Taylor & Francis Group Boca Raton London New York CRC Press
More informationDNA is normally found in pairs, held together by hydrogen bonds between the bases
Bioinformatics Biology Review The genetic code is stored in DNA Deoxyribonucleic acid. DNA molecules are chains of four nucleotide bases Guanine, Thymine, Cytosine, Adenine DNA is normally found in pairs,
More informationCOMPUTER RESOURCES II:
COMPUTER RESOURCES II: Using the computer to analyze data, using the internet, and accessing online databases Bio 210, Fall 2006 Linda S. Huang, Ph.D. University of Massachusetts Boston In the first computer
More informationBiotechnology Explorer
Biotechnology Explorer C. elegans Behavior Kit Bioinformatics Supplement explorer.bio-rad.com Catalog #166-5120EDU This kit contains temperature-sensitive reagents. Open immediately and see individual
More informationThe Genetic Code and Transcription. Chapter 12 Honors Genetics Ms. Susan Chabot
The Genetic Code and Transcription Chapter 12 Honors Genetics Ms. Susan Chabot TRANSCRIPTION Copy SAME language DNA to RNA Nucleic Acid to Nucleic Acid TRANSLATION Copy DIFFERENT language RNA to Amino
More informationO C. 5 th C. 3 rd C. the national health museum
Elements of Molecular Biology Cells Cells is a basic unit of all living organisms. It stores all information to replicate itself Nucleus, chromosomes, genes, All living things are made of cells Prokaryote,
More informationTypically, to be biologically related means to share a common ancestor. In biology, we call this homologous
Typically, to be biologically related means to share a common ancestor. In biology, we call this homologous. Two proteins sharing a common ancestor are said to be homologs. Homologyoften implies structural
More informationAnnotating Fosmid 14p24 of D. Virilis chromosome 4
Lo 1 Annotating Fosmid 14p24 of D. Virilis chromosome 4 Lo, Louis April 20, 2006 Annotation Report Introduction In the first half of Research Explorations in Genomics I finished a 38kb fragment of chromosome
More informationDynamic Programming Algorithms
Dynamic Programming Algorithms Sequence alignments, scores, and significance Lucy Skrabanek ICB, WMC February 7, 212 Sequence alignment Compare two (or more) sequences to: Find regions of conservation
More informationIntroduction to Bioinformatics CPSC 265. What is bioinformatics? Textbooks
Introduction to Bioinformatics CPSC 265 Thanks to Jonathan Pevsner, Ph.D. Textbooks Johnathan Pevsner, who I stole most of these slides from (thanks!) has written a textbook, Bioinformatics and Functional
More informationGenome Sequencing-- Strategies
Genome Sequencing-- Strategies Bio 4342 Spring 04 What is a genome? A genome can be defined as the entire DNA content of each nucleated cell in an organism Each organism has one or more chromosomes that
More informationCHAPTER 21 LECTURE SLIDES
CHAPTER 21 LECTURE SLIDES Prepared by Brenda Leady University of Toledo To run the animations you must be in Slideshow View. Use the buttons on the animation to play, pause, and turn audio/text on or off.
More informationProtein Structure Prediction. christian studer , EPFL
Protein Structure Prediction christian studer 17.11.2004, EPFL Content Definition of the problem Possible approaches DSSP / PSI-BLAST Generalization Results Definition of the problem Massive amounts of
More informationAGRO/ANSC/BIO/GENE/HORT 305 Fall, 2016 Overview of Genetics Lecture outline (Chpt 1, Genetics by Brooker) #1
AGRO/ANSC/BIO/GENE/HORT 305 Fall, 2016 Overview of Genetics Lecture outline (Chpt 1, Genetics by Brooker) #1 - Genetics: Progress from Mendel to DNA: Gregor Mendel, in the mid 19 th century provided the
More informationSequence Databases and database scanning
Sequence Databases and database scanning Marjolein Thunnissen Lund, 2012 Types of databases: Primary sequence databases (proteins and nucleic acids). Composite protein sequence databases. Secondary databases.
More informationuser s guide Question 3
Question 3 During a positional cloning project aimed at finding a human disease gene, linkage data have been obtained suggesting that the gene of interest lies between two sequence-tagged site markers.
More informationBio11 Announcements. Ch 21: DNA Biology and Technology. DNA Functions. DNA and RNA Structure. How do DNA and RNA differ? What are genes?
Bio11 Announcements TODAY Genetics (review) and quiz (CP #4) Structure and function of DNA Extra credit due today Next week in lab: Case study presentations Following week: Lab Quiz 2 Ch 21: DNA Biology
More informationLeonardo Mariño-Ramírez, PhD NCBI / NLM / NIH. BIOL 7210 A Computational Genomics 2/18/2015
Leonardo Mariño-Ramírez, PhD NCBI / NLM / NIH BIOL 7210 A Computational Genomics 2/18/2015 The $1,000 genome is here! http://www.illumina.com/systems/hiseq-x-sequencing-system.ilmn Bioinformatics bottleneck
More informationTheory and Application of Multiple Sequence Alignments
Theory and Application of Multiple Sequence Alignments a.k.a What is a Multiple Sequence Alignment, How to Make One, and What to Do With It Brett Pickett, PhD History Structure of DNA discovered (1953)
More informationLecture 2: Central Dogma of Molecular Biology & Intro to Programming
Lecture 2: Central Dogma of Molecular Biology & Intro to Programming Central Dogma of Molecular Biology Proteins: workhorse molecules of biological systems Proteins are synthesized from the genetic blueprints
More informationSequence Assembly and Alignment. Jim Noonan Department of Genetics
Sequence Assembly and Alignment Jim Noonan Department of Genetics james.noonan@yale.edu www.yale.edu/noonanlab The assembly problem >>10 9 sequencing reads 36 bp - 1 kb 3 Gb Outline Basic concepts in genome
More informationFINDING GENES AND EXPLORING THE GENE PAGE AND RUNNING A BLAST (Exercise 1)
FINDING GENES AND EXPLORING THE GENE PAGE AND RUNNING A BLAST (Exercise 1) 1.1 Finding a gene using text search. Note: For this exercise use http://www.plasmodb.org a. Find all possible kinases in Plasmodium.
More informationGenome Sequence Assembly
Genome Sequence Assembly Learning Goals: Introduce the field of bioinformatics Familiarize the student with performing sequence alignments Understand the assembly process in genome sequencing Introduction:
More informationIntroduction to Bioinformatics
Introduction to Bioinformatics Contents Cell biology Organisms and cells Building blocks of cells How genes encode proteins? Bioinformatics What is bioinformatics? Practical applications Tools and databases
More informationFrom Variants to Pathways: Agilent GeneSpring GX s Variant Analysis Workflow
From Variants to Pathways: Agilent GeneSpring GX s Variant Analysis Workflow Technical Overview Import VCF Introduction Next-generation sequencing (NGS) studies have created unanticipated challenges with
More informationSequence Analysis Lab Protocol
Sequence Analysis Lab Protocol You will need this handout of instructions The sequence of your plasmid from the ABI The Accession number for Lambda DNA J02459 The Accession number for puc 18 is L09136
More informationMODULE 1: INTRODUCTION TO THE GENOME BROWSER: WHAT IS A GENE?
MODULE 1: INTRODUCTION TO THE GENOME BROWSER: WHAT IS A GENE? Lesson Plan: Title Introduction to the Genome Browser: what is a gene? JOYCE STAMM Objectives Demonstrate basic skills in using the UCSC Genome
More informationFrom DNA to Protein: Genotype to Phenotype
12 From DNA to Protein: Genotype to Phenotype 12.1 What Is the Evidence that Genes Code for Proteins? The gene-enzyme relationship is one-gene, one-polypeptide relationship. Example: In hemoglobin, each
More informationDNA Structure and Analysis. Chapter 4: Background
DNA Structure and Analysis Chapter 4: Background Molecular Biology Three main disciplines of biotechnology Biochemistry Genetics Molecular Biology # Biotechnology: A Laboratory Skills Course explorer.bio-rad.com
More informationNucleic acids. How DNA works. DNA RNA Protein. DNA (deoxyribonucleic acid) RNA (ribonucleic acid) Central Dogma of Molecular Biology
Nucleic acid chemistry and basic molecular theory Nucleic acids DNA (deoxyribonucleic acid) RNA (ribonucleic acid) Central Dogma of Molecular Biology Cell cycle DNA RNA Protein Transcription Translation
More informationuser s guide Question 3
Question 3 During a positional cloning project aimed at finding a human disease gene, linkage data have been obtained suggesting that the gene of interest lies between two sequence-tagged site markers.
More informationIdentification of Single Nucleotide Polymorphisms and associated Disease Genes using NCBI resources
Identification of Single Nucleotide Polymorphisms and associated Disease Genes using NCBI resources Navreet Kaur M.Tech Student Department of Computer Engineering. University College of Engineering, Punjabi
More informationUCSC Genome Browser. Introduction to ab initio and evidence-based gene finding
UCSC Genome Browser Introduction to ab initio and evidence-based gene finding Wilson Leung 06/2006 Outline Introduction to annotation ab initio gene finding Basics of the UCSC Browser Evidence-based gene
More informationChapter 15 Gene Technologies and Human Applications
Chapter Outline Chapter 15 Gene Technologies and Human Applications Section 1: The Human Genome KEY IDEAS > Why is the Human Genome Project so important? > How do genomics and gene technologies affect
More informationGenome Annotation. What Does Annotation Describe??? Genome duplications Genes Mobile genetic elements Small repeats Genetic diversity
Genome Annotation Genome Sequencing Costliest aspect of sequencing the genome o But Devoid of content Genome must be annotated o Annotation definition Analyzing the raw sequence of a genome and describing
More informationAn introduction to genetics and molecular biology
An introduction to genetics and molecular biology Cavan Reilly September 5, 2017 Table of contents Introduction to biology Some molecular biology Gene expression Mendelian genetics Some more molecular
More informationHigher Human Biology Unit 1: Human Cells Pupils Learning Outcomes
Higher Human Biology Unit 1: Human Cells Pupils Learning Outcomes 1.1 Division and Differentiation in Human Cells I can state that cellular differentiation is the process by which a cell develops more
More informationWhy Use BLAST? David Form - August 15,
Wolbachia Workshop 2017 Bioinformatics BLAST Basic Local Alignment Search Tool Finding Model Organisms for Study of Disease Can yeast be used as a model organism to study cystic fibrosis? BLAST Why Use
More informationOutline. Annotation of Drosophila Primer. Gene structure nomenclature. Muller element nomenclature. GEP Drosophila annotation projects 01/04/2018
Outline Overview of the GEP annotation projects Annotation of Drosophila Primer January 2018 GEP annotation workflow Practice applying the GEP annotation strategy Wilson Leung and Chris Shaffer AAACAACAATCATAAATAGAGGAAGTTTTCGGAATATACGATAAGTGAAATATCGTTCT
More informationAnswer: Sequence overlap is required to align the sequenced segments relative to each other.
14 Genomes and Genomics WORKING WITH THE FIGURES 1. Based on Figure 14-2, why must the DNA fragments sequenced overlap in order to obtain a genome sequence? Answer: Sequence overlap is required to align
More informationIntroduction to Bioinformatics