BIOINFORMATICS FOR DUMMIES MB&C2017 WORKSHOP


1 Jasper Decuyper BIOINFORMATICS FOR DUMMIES MB&C2017 WORKSHOP

2 INTRODUCTION
Imagine your workspace without computers, both in research laboratories and in hospitals.

3 INTRODUCTION
Molecular Biology + Information Technologies = Bioinformatics
Combine:
  o New insights and technologies in molecular biology
  o Advances in information technologies

4 INTRODUCTION
To store, organize and share molecular biological data in database systems
To process and analyse biological data by using bioinformatics tools in a dry lab
To integrate the different tools by means of scripting into a bioinformatics pipeline

5 MOLECULAR BIOLOGY AND BIOINFORMATICS
Important (high-throughput) technologies:
Next Generation Sequencing
  o Sequencing and expression analysis
Microarray
  o Expression and genetic variation analysis
Mass spectrometry
  o Protein (sequence) identification

6 NEXT GENERATION SEQUENCING
Short-read NGS
2 approaches:
  o Sequencing by synthesis
  o Sequencing by ligation
bp read length
High accuracy (~99.99%)
Complex assembly
Source: Johnsen, J. M., Nickerson, D. A. & Reiner, A. P. (2013). Massively parallel sequencing: the new frontier of hematologic genomics. Blood, 122(19).

7 NEXT NEXT GENERATION SEQUENCING

8 MICROARRAYS

9 MASS SPECTROMETRY

10 MOLECULAR BIOLOGY AND BIOINFORMATICS
Biological databases:
DNA
  o Sequence and loci
  o (Natural) genetic variation
RNA
  o Transcripts (and variants)
  o Gene expression
Protein
  o Sequence and function
Phenotype (and diseases)

11 (SEQUENCE) REPOSITORIES
Exploratory example: TGF beta 1, an important protein involved in cell proliferation, differentiation and growth
NCBI Gene
  o General and integrated sequence and locus information
NCBI Nucleotide
  o All available (partial) TGF beta 1 nucleotide sequences: ± 117 records (!)
NCBI UniGene
  o Transcripts and gene expression (EST Profile) information
UniProt or NCBI Protein
  o High-quality resource of protein sequence and functional information

12 (SEQUENCE) REPOSITORIES
Example 1: looking for the nucleotide sequence of PSA
NCBI Nucleotide query: (prostate specific antigen), restricted to humans
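The same kind of query can be expressed programmatically. As a minimal sketch, assuming the public NCBI E-utilities esearch endpoint and the nuccore database name, the snippet below only builds the search URL and does not send a request. The exact search term, including the [Organism] field tag, and the helper name build_esearch_url are illustrative assumptions, not the workshop's own query.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities esearch service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(keywords, organism, db="nuccore", retmax=20):
    """Return an esearch URL for a keyword query restricted to one organism."""
    # Assumed query syntax: free-text keywords AND an organism field restriction.
    term = f'({keywords}) AND "{organism}"[Organism]'
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


url = build_esearch_url("prostate specific antigen", "Homo sapiens")
print(url)
```

Restricting by organism in the query itself, rather than filtering afterwards, keeps the result list small, which matters because a keyword search on a well-studied gene can return many partial records.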

13 (SEQUENCE) REPOSITORIES
Example 2: the 1000 Genomes project
Goal = to find most genetic variants with frequencies of at least 1% in the populations studied
ACGTACGTACGTACGTACGTACGT
ACGTACCTACGTACGTACGTACGT
ACGTACCTACGTATGTTCGTACGT
ACGTACGTACGTATGTTCGTACGT
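To make the idea of a variant frequency concrete, the four example sequences can be compared column by column. This is a minimal sketch under the assumption that the sequences are already aligned and of equal length; the function name variant_sites and the min_freq parameter are hypothetical, and four sequences are of course far too few to estimate a real population frequency.

```python
from collections import Counter

# The four example sequences from the slide (assumed to be pre-aligned).
sequences = [
    "ACGTACGTACGTACGTACGTACGT",
    "ACGTACCTACGTACGTACGTACGT",
    "ACGTACCTACGTATGTTCGTACGT",
    "ACGTACGTACGTATGTTCGTACGT",
]


def variant_sites(seqs, min_freq=0.01):
    """Map 1-based position -> {allele: frequency} for sites with >= 2 alleles."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must all have the same length")
    sites = {}
    for pos, column in enumerate(zip(*seqs), start=1):
        counts = Counter(column)
        # Keep only alleles that reach the minimum frequency threshold.
        freqs = {a: c / len(seqs) for a, c in counts.items() if c / len(seqs) >= min_freq}
        if len(freqs) > 1:
            sites[pos] = freqs
    return sites


sites = variant_sites(sequences)
for pos, freqs in sites.items():
    print(pos, freqs)
```

For these four sequences the variable positions are 7, 14 and 17, each with two alleles at a frequency of 0.5.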

14 (SEQUENCE) REPOSITORIES
Solution for genetic and sequence diversity:
Genome Reference Consortium (GRC): to create the best possible reference assembly for human
  o Latest major release: GRCh38
NCBI Reference Sequence Database (RefSeq): a non-redundant, well-annotated set of reference sequences incl. genomic, transcript, and protein
  o One gene, one sequence
[Figure: list of database records for one gene from several sources (ref NM_, emb X, gb BC, gb M HUMAPS, gb M HUMPAA, emb AJ)]
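RefSeq records can be recognised by their accession prefix, which is useful when a result list mixes RefSeq and archival GenBank/EMBL entries. The sketch below covers only the most common prefixes; the helper names are hypothetical, and the accessions XM_000001 and X12345 in the usage lines are made-up placeholders for illustration.

```python
# Common RefSeq accession prefixes: (molecule type, curated record or model).
REFSEQ_PREFIXES = {
    "NC_": ("genomic", "curated"),
    "NG_": ("genomic", "curated"),
    "NM_": ("mRNA", "curated"),
    "NR_": ("non-coding RNA", "curated"),
    "NP_": ("protein", "curated"),
    "XM_": ("mRNA", "model"),
    "XR_": ("non-coding RNA", "model"),
    "XP_": ("protein", "model"),
}


def describe_accession(accession):
    """Return (molecule type, status) for a RefSeq accession, else None."""
    return REFSEQ_PREFIXES.get(accession[:3].upper())


def is_model(accession):
    """True for computationally predicted RefSeq records (XM_, XR_, XP_)."""
    info = describe_accession(accession)
    return info is not None and info[1] == "model"


print(describe_accession("NP_005359"))  # human myoglobin protein record
print(is_model("XM_000001"))            # hypothetical accession
print(describe_accession("X12345"))     # hypothetical non-RefSeq accession
```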

15 BEST PRACTICE - on how to find a reference sequence

16 HOMOLOGY SEARCHING
Next Generation Sequencing
  o Result = unknown nucleotide sequences
Determination of sequence identity is not a simple keyword search strategy
Instead: usage of an evolutionary model to determine homology between nucleotide (or protein) sequences
  o Based on sequence alignment
  o BLAST: Basic Local Alignment Search Tool

17 HOMOLOGY SEARCHING
Homology
  o Derived from the same ancestor (figure label: 80 MYA)
2 types:
  o Orthologs = due to speciation event
  o Paralogs = due to duplication event
Typically based on morphological characteristics
Making use of molecular phylogeny to determine homology

18 HOMOLOGY SEARCHING
Given = an unknown human nucleotide sequence
> unknown human nucleotide sequence.fasta
CAAGGCTGTCCCCCCAAGACGTGCTCCCAGGACGAGTTTCGCTGCCACGATGGGAAGTGCATCTCTCG
GCAGTTCGTCTGTGACTCAGACCGGGACTGCTTGGACGGCTCAGACGAGGCCTCCTGCCCGGTGCTCA
CCTGTGGTCCCGCCAGCTTCCAGTGCAACAGCTCCACCTGCATCCCCCAGCTGTGGGCCTGCGACAAC
To determine the identity, use BLAST
  o Against the Homo sapiens RefSeq RNA database, exclude models
  o Identity? Bit score? Expect value?

19 HOMOLOGY SEARCHING
BLAST is not a simple keyword search strategy
3 steps:
  o LIST
  o SCAN
  o EXTEND
Based on a model of evolution and a scoring system
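The three steps can be illustrated with a toy seed-and-extend search. This is a heavily simplified sketch and not the actual BLAST implementation: it assumes exact word matches as seeds, ungapped extension only, and an arbitrary scoring scheme (match +1, mismatch -2, X-drop 3). All function names and parameter values are hypothetical choices for illustration.

```python
def list_words(query, w):
    """LIST: index every word of length w in the query by its start offsets."""
    words = {}
    for i in range(len(query) - w + 1):
        words.setdefault(query[i:i + w], []).append(i)
    return words


def scan(words, subject, w):
    """SCAN: yield (query_offset, subject_offset) for every exact word hit."""
    for j in range(len(subject) - w + 1):
        for i in words.get(subject[j:j + w], []):
            yield i, j


def extend(query, subject, i, j, w, match=1, mismatch=-2, x_drop=3):
    """EXTEND: grow a seed in both directions without gaps (X-drop rule)."""
    def walk(qi, sj, step):
        score = best = best_len = length = 0
        while 0 <= qi < len(query) and 0 <= sj < len(subject):
            score += match if query[qi] == subject[sj] else mismatch
            length += 1
            if score > best:
                best, best_len = score, length
            elif best - score > x_drop:
                break  # score fell too far below the best seen so far
            qi += step
            sj += step
        return best, best_len

    right_score, right_len = walk(i + w, j + w, +1)
    left_score, left_len = walk(i - 1, j - 1, -1)
    total = w * match + left_score + right_score
    return total, i - left_len, j - left_len, left_len + w + right_len


def toy_blast(query, subject, w=4):
    """Return the best (score, query_start, subject_start, length), or None."""
    words = list_words(query, w)
    hits = {extend(query, subject, i, j, w) for i, j in scan(words, subject, w)}
    return max(hits, default=None)


best_hit = toy_blast("ACGTTGCA", "TTTACGTTGCATTT")
print(best_hit)
```

Indexing the query words once (LIST) and then sliding over the subject (SCAN) is what makes this approach faster than aligning every position against every other position; only the few seed hits are passed on to the more expensive EXTEND step.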

20 HOMOLOGY SEARCHING
Are two sequences homologous?
  o Percent identity (quantitative) + Expect value
  o While homology = YES or NO question!!
Example: is it possible to predict that human myoglobin (NP_005359) and beta hemoglobin (NP_000509) are paralogs?
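Percent identity and the Expect value are the quantitative evidence from which the yes/no homology call is inferred. The sketch below shows how both are computed, assuming the standard Karlin-Altschul relations: bit score S' = (lambda * S - ln K) / ln 2 and E = m * n * 2^(-S'), with m the query length and n the database length. The function names, the toy alignment, and all numeric inputs are made-up illustrations, not output from a real BLAST run.

```python
import math


def percent_identity(aln_a, aln_b):
    """Percent of alignment columns with identical residues ('-' marks a gap)."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(a == b and a != "-" for a, b in zip(aln_a, aln_b))
    return 100.0 * matches / len(aln_a)


def bit_score(raw_score, lam, k):
    """Normalise a raw alignment score using the parameters lambda and K."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(bits, query_len, db_len):
    """Expected number of chance hits with at least this bit score."""
    return query_len * db_len * 2.0 ** (-bits)


print(percent_identity("ACGT-A", "ACCTTA"))
print(expect_value(bits=10, query_len=100, db_len=1024))
print(expect_value(bits=11, query_len=100, db_len=1024))
```

Each additional bit halves the Expect value, while searching a larger database raises it. This is why the same alignment can be significant against a small database and not against a large one.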

21 DNA VARIANT ANALYSIS
Compare nucleotide sequence with a reference sequence
  o Nucleotide diversity
  o DNA variant identification
Example: nucleotide diversity in multiple hemoglobin beta variants
  o > HBB multiple sequence alignment.fasta
  o Align sequences using MUSCLE software; output = HTML
Multiple sequence alignment (MSA)
Phylogenetic analysis
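Once sequences are aligned, nucleotide diversity can be summarised as the average number of differences per site over all sequence pairs. The following is a minimal sketch under stated assumptions: the input is a gap-free toy alignment (not the HBB data from the workshop file), and the function name is hypothetical. In this simple version a gap character would be counted like any other mismatch.

```python
from itertools import combinations


def nucleotide_diversity(aligned):
    """Average pairwise differences per site across an alignment."""
    n_sites = len(aligned[0])
    if any(len(s) != n_sites for s in aligned):
        raise ValueError("aligned sequences must have equal length")
    pairs = list(combinations(aligned, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    # Count mismatching columns for every pair of sequences.
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * n_sites)


toy_alignment = ["ACGT", "ACGA", "ACCA"]  # made-up sequences
pi = nucleotide_diversity(toy_alignment)
print(round(pi, 4))
```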

22 DNA VARIANT ANALYSIS
Browsing genetic variations:
  o Natural genetic variation: the 1000 Genomes Browser (BRCA1?)
  o The database of short genetic variation: NCBI dbSNP (BRCA1?)

23 DNA VARIANT ANALYSIS
Genetic variation effect on protein structure/function?
  o Depends on the location of the mutation/variation:
  o Make use of IMPACT and SIFT (sorts intolerant from tolerant) score for amino acid substitutions:
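For a substitution inside a coding region, the first question is whether the amino acid changes at all. The sketch below classifies a single-base change within one codon using the standard genetic code. It is only an illustration of why location matters: it does not compute a SIFT score or an IMPACT rating, which come from the dedicated tools. The function name and return format are hypothetical.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon, position, new_base):
    """Classify a base change at a 0-based position within a DNA codon."""
    codon = codon.upper()
    mutated = codon[:position] + new_base.upper() + codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutated]
    if ref_aa == alt_aa:
        effect = "synonymous"
    elif alt_aa == "*":
        effect = "nonsense"
    elif ref_aa == "*":
        effect = "stop lost"
    else:
        effect = "missense"
    return effect, ref_aa, alt_aa


# GAG -> GTG is the codon change behind sickle-cell hemoglobin beta (Glu -> Val).
print(classify_substitution("GAG", 1, "T"))
print(classify_substitution("GAA", 2, "G"))
print(classify_substitution("TAC", 2, "A"))
```

A synonymous change leaves the protein sequence untouched, a missense change is where scores such as SIFT become informative, and a nonsense change truncates the protein.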

24 DNA VARIANT ANALYSIS
Genetic variation effect on protein structure/function?
Variant Effect Predictor
Example: search for rs on the 1000 Genomes Browser
  o Look up the SNP in the dbSNP database
  o Examine the SNP with the Variant Effect Predictor

25 CONCLUDING REMARKS
Bioinformatics is more than sequence alignment, BLAST and variant calling
Example:

26 Jasper Decuyper BIOINFORMATICS FOR DUMMIES MB&C2017 WORKSHOP