INTRODUCTION TO BIOINFORMATICS. SAINTS GENETICS Ian Bosdet

INTRODUCTION TO BIOINFORMATICS SAINTS GENETICS 12-120522 - Ian Bosdet (ibosdet@bccancer.bc.ca)

Bioinformatics

Bioinformatics is:
- the application of computational techniques to the fields of biology and medicine
- generally associated with the analysis of DNA/RNA/protein sequences and other data related to biomolecules and cell biology
- rooted in Linux, Perl and C

Bioinformaticians

Background
1) life scientists with computational skills: biology, genetics, microbiology, molecular biology, medicine
2) computer scientists with knowledge of biology: computer science, mathematics, physics, engineering, statistics
3) graduates of bioinformatics training programs (http://bioinformatics.bcgsc.ca/)

Common software
- scripting languages - Python, Perl
- statistical software - R
- programming languages - C, C++, Java
- Microsoft Excel

Vancouver Bioinformatics Users Group: http://www.vanbug.org

Types of data and example databases

- DNA/RNA/Protein sequence - NCBI, Ensembl
- Protein - Domains (Pfam)
- Gene Expression - NCBI GEO
- Epigenetics - ENCODE
- Variation
  - Sequence (dbSNP)
  - Copy-number (DGV)
- Mutations
  - Cancer (COSMIC)
  - Health (ClinVar)
- Published Literature - PubMed
- Expert analysis and interpretation - PubMed, GeneReviews

Common Online Sites

- NCBI - http://www.ncbi.nlm.nih.gov/
- Ensembl - http://ensembl.org/
- UCSC - http://genome.ucsc.edu/

Each site offers:
- search tools
- links to outside databases
- custom genome browsers
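The search tools on these sites can also be reached from a script. As a minimal sketch, the snippet below only builds a search URL for NCBI's E-utilities ESearch service and does not send it. The E-utilities endpoint, the helper name build_esearch_url and the example query are assumptions for illustration; none of them appear on the slides.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Address of the NCBI E-utilities search endpoint (an assumption: not named on the slide).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(database, term, max_results=5):
    """Return an ESearch URL for one Entrez database and one query term.

    Hypothetical helper: it only assembles the URL and makes no network request.
    """
    params = {"db": database, "term": term, "retmax": max_results}
    return ESEARCH_URL + "?" + urlencode(params)


# Example query: records in the Gene database that mention TP53 in human.
url = build_esearch_url("gene", "TP53 AND human[Organism]")
print(url)

# Decoding the query string again shows what the server would receive.
print(parse_qs(urlsplit(url).query))
```

Pasting the printed URL into a browser should return an XML list of matching record identifiers, which is the same search the NCBI home page runs from its search box.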

Genome Browsers

http://genome.ucsc.edu/

BLAST/BLAT

NCBI BLAST (Basic Local Alignment Search Tool)
- Finds similar sequences within a large database of known sequences
- Provides a statistical estimate of the likelihood that a match is due to chance alone
- Search DNA, Protein, DNA → Protein, Protein → DNA
- http://blast.ncbi.nlm.nih.gov/blast.cgi

UCSC BLAT (BLAST-Like Alignment Tool)
- Similar to BLAST but quicker
- Less versatile
- http://genome.ucsc.edu/cgi-bin/hgblat

Try searching this sequence with each tool:
caggcccaactgtgagcaaggagcacaagccacaagtcttccagaggatg
cttgattccagtggttctgcttcaaggcttccactgcaaaacactaaaga
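A DNA-versus-protein search depends on translating the nucleotide sequence in all six reading frames before comparing it with proteins. The sketch below illustrates only that translation step, using the standard genetic code, on the first line of the practice sequence above. It is not how BLAST itself is implemented, and the function names are hypothetical.

```python
# Standard genetic code, with codons ordered by base T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A<->T, C<->G)."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Return translations for frames +1..+3 (forward) and -1..-3 (reverse strand)."""
    frames = {}
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


# First line of the practice sequence from the slide.
query = "caggcccaactgtgagcaaggagcacaagccacaagtcttccagaggatg"
for frame, peptide in six_frame_translations(query).items():
    print(frame, peptide)
```

An asterisk in the output marks a stop codon. A frame with few or no stops is a more plausible coding frame, which is one reason translated searches can find protein matches that a DNA-only search misses.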

Sequence file formats

GenBank
- Sequence
- Features
- Literature links
- http://www.ncbi.nlm.nih.gov/nucleotide/41327737?report=genbank

FASTA
- Sequence only
- Header line with name
- http://www.ncbi.nlm.nih.gov/nuccore/41327737?report=fasta
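Because a FASTA record is just a header line starting with ">" followed by sequence lines, it is easy to read in a few lines of code. The following is a minimal sketch of such a reader; the function name parse_fasta is hypothetical, and the example record uses the first 100 residues of the hs_tp53 entry from the multiple-alignment slide.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {header: sequence}.

    The header is everything after '>' on the header line; sequence lines
    that follow are joined together with whitespace removed.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {header: "".join(parts) for header, parts in records.items()}


example = """>hs_tp53
MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDI
EQWFTEDPGPDEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPLSSSVPSQ
"""

records = parse_fasta(example)
for header, sequence in records.items():
    print(header, len(sequence))
```

A GenBank record carries the same sequence plus features and literature links, so it needs a more elaborate parser. For real work an established library is usually a safer choice than a hand-written reader.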

Multiple alignments

- Find similar regions in different proteins
- These regions may highlight evolutionary conservation and give clues to protein function
- http://www.ebi.ac.uk/tools/msa/clustalw2/

>hs_tp53
MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDI
EQWFTEDPGPDEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPLSSSVPSQ
KTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKMFCQLAKTCPVQLWVDST
PPPGTRVRAMAIYKQSQHMTEVVRRCPHHERCSDSDGLAPPQHLIRVEGN
LRVEYLDDRNTFRHSVVVPYEPPEVGSDCTTIHYNYMCNSSCMGGMNRRP
ILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKGEPHHELP
PGSTKRALPNNTSSSPQPKKKPLDGEYFTLQDQTSFQKENC
>mm_trp53
MTAMEESQSDISLELPLSQETFSGLWKLLPPEDILPSPHCMDDLLLPQDV
EEFFEGPSEALRVSGAPAAQDPVTETPGPVAPAPATPWPLSSFVPSQKTY
QGNYGFHLGFLQSGTAKSVMCTYSPPLNKLFCQLAKTCPVQLWVSATPPA
GSRVRAMAIYKKSQHMTEVVRRCPHHERCSDGDGLAPPQHLIRVEGNLYP
EYLEDRQTFRHSVVVPYEPPEAGSEYTTIHYKYMCNSSCMGGMNRRPILT
IITLEDSSGNLLGRDSFEVRVCACPGRDRRTEEENFRKKEVLCPELPPGS
AKRALPTCTSASPPQKKKPLDGEYFTLKIRGRKRFEMFRELNEALELKDA
HATEESGDSRAHSSLQPRAFQALIKEESPNC
>Xenopus_trp53
MEPSSETGMEPPLSQETFEDLWSLLPDPLQTGTGQMENFAEFSEYPLAPDMTVLQEGLMGNTVPTVTSSA
VPSTEDYAGSYGLKLEFQQNGTAKSVTCTYSTDLNKLFCQLAKTCPLLVRVERPPPLGSILRATAVYKKS
EHVAEVVKRCPHHERSVEPGDDPAPPSHLMRVEGNSKAYYMEDVGTGRHSVCVPYEGPQVGTECTTVLYN
YMCNSSCMGGMNRRPILTIITLESPEGLLLGRRCFEVRVCACPGRDRRTEEDNCTKKRGLKPNGKRELSH
PPSSDPPLPKKRLVEEDDEETFTLLIKGRSRYEMIKKLNDALELQESLDQQKLSIKCRKCRDEIKPKKGK
KLLVKDELQDSE
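To get a feel for what a conserved region looks like before running a real alignment, one can search for the longest stretch of residues that two proteins share exactly. The sketch below does this for the third sequence line of the human and mouse entries above. It is a deliberately crude stand-in: a multiple-alignment tool such as ClustalW also tolerates substitutions and gaps, which an exact-match search does not. The function name is hypothetical.

```python
def longest_common_substring(a, b):
    """Return the longest run of characters that appears unchanged in both strings.

    Uses dynamic programming: cell (i, j) holds the length of the common run
    ending at a[i-1] and b[j-1]. Only the previous row is kept in memory.
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


# Third sequence line of hs_tp53 and of mm_trp53, copied from the slide.
human = "KTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKMFCQLAKTCPVQLWVDST"
mouse = "QGNYGFHLGFLQSGTAKSVMCTYSPPLNKLFCQLAKTCPVQLWVSATPPA"

shared = longest_common_substring(human, mouse)
print(shared, len(shared))  # → FCQLAKTCPVQLWV 14
```

Pasting all three full sequences into the alignment tool shows many more conserved blocks than this, because near-identical stretches separated by a single substitution are aligned rather than split.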

Exercise: Is there a mouse model for Li-Fraumeni Syndrome?

1. Go to OMIM in your browser: http://omim.org/
2. Enter the search term lfs1 - click the top search result (#151623)
3. Looking at the Phenotype-Gene Relationships table, what two genes are associated with this disease?
   1.
   2.
4. In this table, click on the link to the gene on chromosome 17 (MIM number: 191170)
5. What is another of the diseases associated with mutations in this gene?
   1.
6. Click the Genomic coordinates link to see this gene in the UCSC Genome Browser
7. Below the genome display, click the gray Default tracks button
8. Find the RefSeq Genes track and right-click - select Pack to display all splice forms
9. Find the bottom (longest) splice form and left-click to see the gene details.
10. Click the PubMed link. Approximately how many publications are related to this gene?
   1.
11. Click Back in your browser to return to the RefSeq gene details. Click the RefSeq link to go to NCBI.

Exercise - Li-Fraumeni Syndrome

13. Run BLAST on this sequence. Select Run BLAST from the toolbar links on the right.
14. Under Choose Search Set select Mouse genomic + transcript and click BLAST
15. What is the Accession number of the top transcript hit? What is the E value of the alignment?
   1.
   2.
16. Is there a mouse with this gene knocked out? Search the gene common name (Trp53) at http://www.findmice.org.
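The E value asked for in step 15 is the number of alignments with at least that score expected by chance in a search of that size, so smaller means more significant. As a rough sketch of how it relates to the bit score that BLAST reports alongside it, the snippet below uses the textbook relation E = m × n × 2^(−S′), where m and n are the query and database lengths and S′ is the bit score. BLAST applies corrected "effective" lengths, so its reported values will differ somewhat, and the lengths in the example are made-up numbers, not results from the exercise.

```python
def expect_value(bit_score, query_length, database_length):
    """Approximate E value from a bit score: E = m * n * 2 ** (-S').

    Simplified illustration; real BLAST uses effective (edge-corrected)
    lengths rather than the raw ones passed in here.
    """
    return query_length * database_length * 2.0 ** (-bit_score)


# Hypothetical search: a 2,500-letter query against 3 billion letters.
m, n = 2_500, 3_000_000_000
for score in (30, 50, 100, 200):
    print(f"bit score {score:>3}: E = {expect_value(score, m, n):.3g}")
```

Each additional bit of score halves the E value, which is why a strong transcript hit between human and mouse orthologs is typically reported with an E value of 0.0 or something vanishingly small.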

Exercises

1. Is the exact peptide sequence Serine-Alanine-Isoleucine-Asparagine-Threonine-Serine found in the human genome? If so, what protein(s)? If not, what is the closest match?
2. You are sequencing DNA isolated from a sample of Vancouver drinking water. One DNA fragment contains a small open-reading frame that codes the following peptide:
   mgydwlgrmpykgsvengaykaqgvqltak
   What organism does this come from (and should it be in the water)? Are there any conserved domains in this peptide?
3. [UCSC Browser] Find the name of a SNP found in an exon or intron of the human gene KRAS. Click on the name to see a summary report. Click on the dbSNP link to see a detailed report. What is the frequency of this variant in the human population?
4. [UCSC Browser] Find the gene Notch1. Click the DNA link at the top of the page and then click the extended case/color options button. Select underline for ESTs, blue color for SNPs(135) and bold for RepeatMasker
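Exercise 1 gives the peptide as full residue names, whereas a protein BLAST query box expects one-letter codes. A small sketch of that conversion follows; the lookup table holds the 20 standard amino acids, and the function name is hypothetical.

```python
# One-letter codes for the 20 standard amino acids, keyed by lowercase name.
ONE_LETTER = {
    "alanine": "A", "arginine": "R", "asparagine": "N", "aspartate": "D",
    "cysteine": "C", "glutamine": "Q", "glutamate": "E", "glycine": "G",
    "histidine": "H", "isoleucine": "I", "leucine": "L", "lysine": "K",
    "methionine": "M", "phenylalanine": "F", "proline": "P", "serine": "S",
    "threonine": "T", "tryptophan": "W", "tyrosine": "Y", "valine": "V",
}


def names_to_one_letter(peptide):
    """Convert a hyphen-separated list of residue names to one-letter codes.

    Raises KeyError if a name is not one of the 20 standard amino acids.
    """
    return "".join(ONE_LETTER[name.strip().lower()] for name in peptide.split("-"))


query = names_to_one_letter("Serine-Alanine-Isoleucine-Asparagine-Threonine-Serine")
print(query)  # → SAINTS
```

The resulting six-letter string is what gets pasted into the protein search. For a query this short, the search settings usually need adjusting before an exact hit is reported.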