Characterization of transcription factor binding sites by high-throughput SELEX. Overview of the HTPSELEX Database


Transcription

1 Characterization of transcription factor binding sites by high-throughput SELEX. Overview of the HTPSELEX Database
Transcription Factor Binding Sites: Features and Facts
Degenerate sequence motifs.
Typical length: 6-20 bp.
Low information content: 8-12 bits (1 site per ... bp).
Quantitative recognition mechanism: the measurable affinity of different sites may vary over three orders of magnitude.
Regulatory function often depends on cooperative interactions with neighboring sites.
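The site-density remark can be related to the information content by a common rule of thumb: a motif carrying I bits is expected to match by chance roughly once every 2^I positions of random sequence with uniform base composition. The sketch below only illustrates that arithmetic; the function name is hypothetical and the uniform-composition assumption is ours, not the slide's.

```python
def expected_spacing_bp(information_bits):
    """Approximate number of random-sequence positions per chance match.

    Assumes uniform base composition and independent positions.
    """
    return 2 ** information_bits


for bits in (8, 12):
    print(bits, "bits -> about 1 chance match per", expected_spacing_bp(bits), "bp")
```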

2 Representation of the binding specificity by a scoring matrix (also referred to as a weight matrix)
[Figure: scoring matrix with one row per base (A, C, G, T). A strong binding site scores 43; a random sequence scores -83.]
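Scoring a candidate site with a weight matrix amounts to adding up one weight per position. A minimal sketch follows; the 4-position matrix and its values are invented for illustration and are not the matrix shown on the slide.

```python
# Hypothetical 4-position weight matrix (illustrative values only).
WEIGHTS = [
    {"A": 10, "C": -5, "G": -5, "T": -10},
    {"A": -8, "C": 12, "G": -6, "T": -4},
    {"A": -7, "C": -9, "G": 11, "T": -3},
    {"A": -10, "C": -2, "G": -6, "T": 9},
]


def score(site, weights=WEIGHTS):
    """Sum the position-specific weight of each base in `site`."""
    if len(site) != len(weights):
        raise ValueError("site length must equal the matrix length")
    return sum(column[base] for column, base in zip(weights, site))


print(score("ACGT"))  # best-matching site for this toy matrix
print(score("TAAA"))  # worst-matching site for this toy matrix
```

A site that matches the preferred base at every position receives the highest score, and each mismatch lowers the total independently of the other positions.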

3 Physical interpretation of a weight matrix
Weight matrix elements represent relative binding energies between DNA base pairs and protein surface areas (base-pair acceptor sites). A weight matrix column describes the base preferences of a base-pair acceptor site.
Berg-von Hippel model of protein-DNA interactions
The weight matrix score expresses the binding free energy of the protein-DNA complex in arbitrary units:
$$\Delta G(x) = -S(x) + \text{const.}, \qquad S(x) = \sum_{i=1}^{N} w_i(x_i)$$
It is convenient to express the binding free energy in dimension-free RT units:
$$E(x) = \sum_{i=1}^{N} \varepsilon_i(x_i), \qquad \varepsilon_i(b) \propto -\frac{w_i(b)}{RT}$$
On a relative scale, the binding constant for sequence $x$ is given by:
$$K_{rel}(x) = e^{-E(x)}$$
For sequences longer than the weight matrix:
$$K_{rel}(x) = \sum_i e^{-E(x_i \ldots x_{i+N-1})} \quad \text{or} \quad K_{rel}(x) = \max_i e^{-E(x_i \ldots x_{i+N-1})}$$
(index $i$ runs over all subsequence starting positions on both strands)
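As a rough illustration of the long-sequence case, the sketch below slides an energy matrix over both strands and combines the window terms either by summing or by taking the maximum. It assumes the sign convention K_rel = exp(-E), with E the sum of per-position energies in RT units; the 2-position energy matrix and all function names are hypothetical.

```python
import math

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Hypothetical 2-position energy matrix in RT units (illustrative values only).
EPS = [
    {"A": 0.0, "C": 1.0, "G": 2.0, "T": 3.0},
    {"A": 0.0, "C": 1.0, "G": 2.0, "T": 3.0},
]


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def energy(site, eps):
    """E(x): sum of per-position energy terms for one window."""
    return sum(column[base] for column, base in zip(eps, site))


def window_energies(seq, eps):
    """Energies of every window of matrix length on both strands."""
    n = len(eps)
    return [
        energy(strand[i:i + n], eps)
        for strand in (seq, reverse_complement(seq))
        for i in range(len(strand) - n + 1)
    ]


def k_rel_sum(seq, eps):
    """Relative binding constant as the sum over all windows."""
    return sum(math.exp(-e) for e in window_energies(seq, eps))


def k_rel_max(seq, eps):
    """Relative binding constant taken from the best window only."""
    return max(math.exp(-e) for e in window_energies(seq, eps))


print(k_rel_sum("AACG", EPS), k_rel_max("AACG", EPS))
```

The sum form lets several weak windows contribute together, while the max form treats the sequence as having a single dominant site; the two agree closely when one window is much stronger than the rest.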

4 Berg-von Hippel Theory, Information Content
The energy terms of a weight matrix can be computed from the base frequencies $p_i(b)$ found in in vitro or in vivo selected binding sites:
$$\varepsilon_i(b) = -\frac{1}{\lambda} \ln \frac{p_i(b)}{q(b)}$$
$q(b)$ is the background frequency of base $b$. $\lambda$ is an unknown parameter related to the stringency of the binding conditions.
The information content of a binding site has been defined as the relative entropy of the base frequency matrix with respect to the background base frequencies:
$$IC = \sum_{i=1}^{N} \sum_{b=A}^{T} p_i(b) \log_2 \frac{p_i(b)}{q(b)}$$
Paradox: $\lambda$ depends on the selection conditions (e.g. the protein concentration). Therefore the base frequencies observed in selected binding sites do not reflect a protein-intrinsic property.
Weight matrices/profiles from a biochemical viewpoint
A weight matrix expresses the sequence specificity of a DNA-binding protein.
A column describes the base preferences of a surface area of the DNA-binding protein.
Weights of a weight matrix can be interpreted as additive binding energy contributions. No interactions between binding site positions!
According to the Berg-von Hippel theory, negated binding energies are proportional to the logarithms of the base frequencies observed in an in vivo or in vitro selected set of binding sites.
Weight matrices can thus be used to compute relative binding energies or dissociation constants for oligonucleotides of any sequence, which in turn can be determined experimentally by gel shift experiments.
An accurate weight matrix for the binding specificity of a transcription factor is one that accurately predicts binding constants.
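The two quantities on this slide can be computed directly from a base frequency matrix. The sketch below assumes the sign convention stated on the slide (negated energies proportional to the log frequency ratio) and a uniform background; the function names, the default value of lambda, and the example column are illustrative assumptions. Zero frequencies would need pseudocounts before the energies are computed.

```python
import math

UNIFORM = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def energies(freqs, background=UNIFORM, lam=1.0):
    """Per-position energy terms: eps_i(b) = -(1/lam) * ln(p_i(b) / q(b)).

    Every frequency must be > 0 (apply pseudocounts beforehand).
    """
    return [
        {b: -math.log(column[b] / background[b]) / lam for b in column}
        for column in freqs
    ]


def information_content(freqs, background=UNIFORM):
    """Sum over positions and bases of p * log2(p / q), in bits."""
    return sum(
        p * math.log2(p / background[b])
        for column in freqs
        for b, p in column.items()
        if p > 0
    )


# Hypothetical single-column frequency matrix.
column = [{"A": 0.5, "C": 0.25, "G": 0.125, "T": 0.125}]
print(energies(column)[0])
print(information_content(column))
```

Changing `lam` rescales every energy term by the same factor, which is one way to see the paradox above: the same protein can yield different frequency matrices under different selection stringencies.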

5 Experimental techniques for estimating the parameters of a TF specificity matrix
Competitive bandshifts (EMSA): relative binding constants of oligonucleotides
Alignment of in vivo sites: base frequency matrix (from ... sequences)
In vitro selection (SELEX): base frequency matrix (up to 200 sequences)
SAGE/SELEX: base frequency matrix (up to ... binding sequences)
Exhaustive mutagenesis + K_rel assay: intrinsic specificity matrix
Protein binding arrays + magic algorithm: intrinsic specificity matrix
Some problems and limitations:
A base probability matrix is generated by an alignment or probabilistic modeling algorithm; it is not a direct observation.
K_rel is usually not very precise (within a factor of 2).
Point mutations may create a binding site in another frame.

Modeling of a Transcription Factor Binding Site from High Throughput SELEX Data Using a Hidden Markov Modeling Approach
Emmanuelle Roulet, Nicolas Mermod (Center for Biotechnology UNIL-EPFL, Lausanne, Switzerland)
Anamaria A Camargo, Andrew JG Simpson (Ludwig Institute of Cancer Research, Sao Paulo, Brazil)
Philipp Bucher (Swiss Institute for Experimental Cancer Research and Swiss Institute of Bioinformatics, Epalinges s/Lausanne, Switzerland)
Nat. Biotechnol. 20, (2002)

6 Motivation and Goals of the Project
Motivation: Accurate and reliable computational tools to predict transcription factor binding sites are still not available.
Potential reasons:
1. Lack of adequate experimental data
2. Lack of adequate computational models
3. Lack of an adequate method to estimate the parameters of a computational model from the experimental data
Goal: To develop a combined computational-experimental protocol to derive an accurate predictive model of the sequence specificity of a DNA-binding protein.
Potential benefits:
1. Being able to predict transcription factor binding in genome sequences
2. Insights into molecular mechanisms of sequence-specific protein-DNA interactions
3. Ability to rationally design gene control regions of desired properties for biotechnological applications

7 Our Approach to the Problem of Characterizing the Sequence-Specificity of a DNA-Binding Transcription Factor
1. Choice of a quantitative predictive model for representing the binding specificity. Our choice: a profile HMM
2. Choice of an experimental method to generate data for estimating the model parameters. Our choice: a SELEX experiment
3. Choice of a machine learning algorithm to estimate the model parameters from the data. Our choice: the Baum-Welch HMM training algorithm
4. Validation of the approach and optimization of the experimental parameters by a computer simulation of steps 2 and 3
5. Adjustment of the experimental protocol to produce the necessary data as suggested by the computer simulation
6. Generation of the experimental data
7. Building a binding site model from the data
8. A posteriori validation of the model by cross-validation and comparison with independent experimental results
Study Object: Transcription Factor CTF/NFI
Dimeric DNA-binding protein recognizing a palindromic sequence motif with consensus sequence TTGGC(N5)GCCAA
First isolated as a replication factor of Adenovirus type 2
Later independently isolated as a CCAAT-box binding transcription factor
Can activate transcription of a reporter gene in transfected cells
Recently shown to be implicated in regulatory pathways related to tumor progression and immune response
Biochemical mechanism of gene regulation still elusive

8 Old CTF/NFI Binding Site Profile
Example: GGGCAAAGCCAC
Score = 88

9 [Figure: HTP SELEX protocol. Random sequence library (a random region flanked by constant primer sites containing Bgl II sites); second strand synthesis by PCR (Primer 1); selection of binding sequences (gel shift); amplification (Primer 2); repeated selection cycles; digestion with Bgl II; concatemerization and cloning (site 1, site 2, site 3, ...); sequencing.]
Principle of the Baum-Welch hidden Markov model training algorithm
[Figure: initial model, example training sequences, trained model.]
How does it work?
1. The initial model serves as the current model.
2. Training sequences are aligned to the current model.
3. New base and transition frequencies are estimated from the multiple alignment generated by step 2. The new model becomes the current model.
4. Steps 2 and 3 are repeated until convergence is reached.
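The align-then-re-estimate loop can be sketched in a few lines. Note the simplification: this is a hard-assignment (Viterbi-style) variant in which each training sequence contributes only its single best window, whereas full Baum-Welch weights all possible alignments by their posterior probabilities and also re-estimates transition frequencies. The sketch assumes a gap-free motif, forward strand only, a uniform background, and one site per sequence; all names and the toy data are hypothetical.

```python
import math

BASES = "ACGT"


def log_odds(freqs, background=0.25):
    """Convert per-position base frequencies into log-odds weights."""
    return [{b: math.log(col[b] / background) for b in BASES} for col in freqs]


def best_window(seq, weights):
    """Step 2: align a sequence to the model by picking its best window."""
    n = len(weights)
    if len(seq) < n:
        raise ValueError("sequence shorter than the model")
    windows = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    return max(windows, key=lambda w: sum(col[b] for col, b in zip(weights, w)))


def reestimate(sites, pseudocount=1.0):
    """Step 3: new base frequencies from the aligned sites."""
    freqs = []
    for i in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        freqs.append({b: c / total for b, c in counts.items()})
    return freqs


def train(seqs, initial_freqs, max_iter=50):
    """Repeat steps 2 and 3 until the alignment stops changing."""
    freqs, sites = initial_freqs, None
    for _ in range(max_iter):
        new_sites = [best_window(s, log_odds(freqs)) for s in seqs]
        if new_sites == sites:  # convergence: same alignment as last round
            break
        sites = new_sites
        freqs = reestimate(sites)
    return freqs, sites


# Toy data: a weak initial model favouring GGCC and three short sequences.
initial = [{b: (0.4 if b == c else 0.2) for b in BASES} for c in "GGCC"]
freqs, sites = train(["AAGGCCAA", "TGGCCTTT", "CAAGGCCT"], initial)
print(sites)
```

The pseudocount keeps every frequency above zero so that the log-odds weights stay finite when a base is never observed at a position.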


12 Doing the Experiment

13 Results: CTF/NFI
[Table: sequencing statistics per SELEX cycle and in total (sequence reads, sites). Clone statistics: clones, different sites. Site statistics: 1481; colonies, different sites, sites with err < 0.01/bp, clones with detectable inserts.]
New CTF/NFI model
[Figure: hidden Markov model (frequencies given in %).]
[Figure: scoring profile (relative energy units).]

14 Predicted and observed evolution of SELEX populations
[Figure: theoretically predicted affinity profiles of successive SELEX cycles, from low to high affinity (Djordjevic & Sengupta 2006).]
[Figure: weight matrix scores for successive CTF/NFI HTP SELEX populations (Roulet et al. 2002).]
Major Differences between New and Old CTF/NFI Binding Site Models
The new model contains a sixth half-site position, reducing the major spacer length class to 3. This extends the consensus half-site motif to TTGGCA.
Alternative spacer length classes N4 and N5 (N6 and N7 according to the old numbering system) receive much more severe penalties in the new profile. Based on the estimated frequencies, it is not certain whether these binding modes occurred at all during SELEX amplification.
The G mismatch at the first position of the half-site weight matrix has a much lower weight in the new model.


16 Quality Assessment of the New Model: Comparison of Predicted Binding Scores with in vitro measured Binding Constants Data from Meisterernst et al. (1988). Nucl. Acids Res. 16,

17 Beyond simple weight matrices: correlated dinucleotide analysis
HTP SELEX sequencing totals for members of the TCF family
[Table: one column per SELEX library (LEF1_2, LEF1_3, LEF1_, LEF1_6, LEF1_7, SUM; LBC_, LBC_6, SUM; TCF4_3). Rows: total number of sites and total number of unique sites for LEF1/TCF-1α, LEF1/TCF-1α with β-catenin, and TCF4; percentage of sites with error rate < 0.01% per bp and < % per bp.]
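The slide names a correlated dinucleotide analysis without describing the method. One common way to test whether two binding site positions vary independently, offered here only as an assumption about what such an analysis could look like, is the mutual information between two columns of aligned sites. The function name and toy alignments are hypothetical.

```python
import math
from collections import Counter


def mutual_information(sites, i, j):
    """Mutual information (bits) between columns i and j of aligned sites.

    A value near 0 suggests the positions are independent, as a simple
    weight matrix assumes; larger values indicate correlated positions.
    """
    n = len(sites)
    count_i = Counter(site[i] for site in sites)
    count_j = Counter(site[j] for site in sites)
    count_ij = Counter((site[i], site[j]) for site in sites)
    return sum(
        (c / n) * math.log2((c / n) / ((count_i[a] / n) * (count_j[b] / n)))
        for (a, b), c in count_ij.items()
    )


print(mutual_information(["AA", "CC", "GG", "TT"], 0, 1))  # fully correlated
print(mutual_information(["AA", "AC", "CA", "CC"], 0, 1))  # independent
```

With small site collections the estimate is biased upward, so in practice it would be compared against shuffled columns before drawing conclusions.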

18 Base frequency tables for DNA binding sites of TCF family members derived by HTP SELEX
[Table: PSSM of LEF1/TCF-1α, SELEX cycle 3 (rows A, C, G, T; one column per position).]
[Table: PSSM of LEF1/TCF-1α, SELEX cycle 6 (rows A, C, G, T; one column per position).]

19 Sequence Logos for binding sites of TCF family proteins
[Figure: sequence logos for Lef-1, Lef-1/beta-catenin, and Tcf-4.]
Comparison of our TCF4 binding site with the motif obtained by affinity measurements
Sequence logo pasted from Hallikas et al. (2006). Cell 124:21. Motif obtained by competition assays with a complete single base-substitution series.
Note: at least one significant position is missing because of an a priori restriction of motif extension.

20 Overview of HTPSELEX Database
Contents, from raw data to HMMs:
Single-read sequencing chromatograms
Clone sequences (assembled by Phred/Phrap)
Site sequences with estimated sequencing errors
HMMs for binding sites in two formats (decodeanhmm, MAMOT)
Additional features:
Quality-controlled sequence download
Access to selected low-throughput SELEX data
Experimental and computational protocols

21 Example of a HTPSELEX clone entry
ID   LBC standard; DNA; UNC; 1023 BP.
XX
AC   LBC
XX
DT   -Jun-200
XX
DE   ' Sequence of SELEX/SAGE Clone : LBC of cycle
XX
KW   HTP SELEX/SAGE, invitro transcription factor binding sites
XX
OS   unidentified
OC   unidentified
XX
RN   [1]
RA   Emmanuelle Roulet, Stephane Busso, Anamaria A.Camargo, Andrew J.G Simpson,
RA   Nicolas Mermod, and Philipp Bucher.
RT   High-throughput SELEX-SAGE method for quantitative modelling of
RT   transcription-factor binding sites.
RL   Nature Biotechnology 20:831-83(2000)
XX
DR   RACES;LBC 003F.scf
XX
FH   Key             Location/Qualifiers
FH
FT   source
FT                   /mol_type="unassigned DNA"
FT                   /organism="unidentified"
FT                   /tissue_type="selex"
FT   misc_binding
FT                   /bound_moiety="LEF1/TCF with beta catenin"
FT                   /label="lbc 00003_1"
FT                   /note="base quality score is e-03"
FT   misc_binding
FT                   /bound_moiety="LEF1/TCF with beta catenin"
FT                   /label="lbc 00003_2"
FT                   /note="base quality score is e-03"
XX
SQ   Sequence 1023 BP; 230 A; 291 C; 260 G; 242 T; 0 other;
     AAAACCAA AAAGGGGCA GAAGGGCC CCCGAGC GCCGAGCG GCCGCCAGG
     GAGGAA CGCAGAA CCAGCACAC GGCGGCCG ACAGGGA CAGGCGG


Reverse Transcription & RT-PCR Creating Gene Expression Solutions Reverse Transcription & RT-PCR Reverse transcription, a process that involves a reverse transcriptase (RTase) which uses RNA as the template to make complementary DNA

More information

Factors affecting PCR

Factors affecting PCR Lec. 11 Dr. Ahmed K. Ali Factors affecting PCR The sequences of the primers are critical to the success of the experiment, as are the precise temperatures used in the heating and cooling stages of the

More information

Genome Sequence Assembly

Genome Sequence Assembly Genome Sequence Assembly Learning Goals: Introduce the field of bioinformatics Familiarize the student with performing sequence alignments Understand the assembly process in genome sequencing Introduction:

More information

Name with Last Name, First: BIOE111: Functional Biomaterial Development and Characterization MIDTERM EXAM (October 7, 2010) 93 TOTAL POINTS

Name with Last Name, First: BIOE111: Functional Biomaterial Development and Characterization MIDTERM EXAM (October 7, 2010) 93 TOTAL POINTS BIOE111: Functional Biomaterial Development and Characterization MIDTERM EXAM (October 7, 2010) 93 TOTAL POINTS Question 0: Fill in your name and student ID on each page. (1) Question 1: What is the role

More information

In-Fusion HD Cloning Plus System

In-Fusion HD Cloning Plus System In-Fusion HD Cloning Plus System One trustworthy solution for all your cloning and mutagenesis projects Seamless 15-30 Directional Any vector GOI + Any insert Anywhere Large & small inserts or vectors

More information

CollecTF Documentation

CollecTF Documentation CollecTF Documentation Release 1.0.0 Sefa Kilic August 15, 2016 Contents 1 Curation submission guide 3 1.1 Data.................................................... 3 1.2 Before you start.............................................

More information

Biochemistry 674 Your Name: Nucleic Acids Prof. Jason Kahn Exam II (100 points total) November 17, 2005

Biochemistry 674 Your Name: Nucleic Acids Prof. Jason Kahn Exam II (100 points total) November 17, 2005 Biochemistry 674 ucleic Acids Your ame: Prof. Jason Kahn Exam II (100 points total) ovember 17, 2005 You have 80 minutes for this exam. Exams written in pencil or erasable ink will not be re-graded under

More information

Bi 8 Lecture 5. Ellen Rothenberg 19 January 2016

Bi 8 Lecture 5. Ellen Rothenberg 19 January 2016 Bi 8 Lecture 5 MORE ON HOW WE KNOW WHAT WE KNOW and intro to the protein code Ellen Rothenberg 19 January 2016 SIZE AND PURIFICATION BY SYNTHESIS: BASIS OF EARLY SEQUENCING complex mixture of aborted DNA

More information

DNBseq TM SERVICE OVERVIEW Plant and Animal Whole Genome Re-Sequencing

DNBseq TM SERVICE OVERVIEW Plant and Animal Whole Genome Re-Sequencing TM SERVICE OVERVIEW Plant and Animal Whole Genome Re-Sequencing Plant and animal whole genome re-sequencing (WGRS) involves sequencing the entire genome of a plant or animal and comparing the sequence

More information

This place covers: Methods or systems for genetic or protein-related data processing in computational molecular biology.

This place covers: Methods or systems for genetic or protein-related data processing in computational molecular biology. G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY Methods or systems for genetic

More information

Learning Methods for DNA Binding in Computational Biology

Learning Methods for DNA Binding in Computational Biology Learning Methods for DNA Binding in Computational Biology Mark Kon Dustin Holloway Yue Fan Chaitanya Sai Charles DeLisi Boston University IJCNN Orlando August 16, 2007 Outline Background on Transcription

More information

International Journal of Engineering & Technology IJET-IJENS Vol:14 No:01 9

International Journal of Engineering & Technology IJET-IJENS Vol:14 No:01 9 International Journal of Engineering & Technology IJET-IJENS Vol:14 No:01 9 Analysis on Clustering Method for HMM-Based Exon Controller of DNA Plasmodium falciparum for Performance Improvement Alfred Pakpahan

More information

Bioinformatics Tools. Stuart M. Brown, Ph.D Dept of Cell Biology NYU School of Medicine

Bioinformatics Tools. Stuart M. Brown, Ph.D Dept of Cell Biology NYU School of Medicine Bioinformatics Tools Stuart M. Brown, Ph.D Dept of Cell Biology NYU School of Medicine Bioinformatics Tools Stuart M. Brown, Ph.D Dept of Cell Biology NYU School of Medicine Overview This lecture will

More information

Bioinformatics overview

Bioinformatics overview Bioinformatics overview Aplicações biomédicas em plataformas computacionais de alto desempenho Aplicaciones biomédicas sobre plataformas gráficas de altas prestaciones Biomedical applications in High performance

More information

RNA Expression of the information in a gene generally involves production of an RNA molecule transcribed from a DNA template. RNA differs from DNA

RNA Expression of the information in a gene generally involves production of an RNA molecule transcribed from a DNA template. RNA differs from DNA RNA Expression of the information in a gene generally involves production of an RNA molecule transcribed from a DNA template. RNA differs from DNA that it has a hydroxyl group at the 2 position of the

More information

Supplement to: The Genomic Sequence of the Chinese Hamster Ovary (CHO)-K1 cell line

Supplement to: The Genomic Sequence of the Chinese Hamster Ovary (CHO)-K1 cell line Supplement to: The Genomic Sequence of the Chinese Hamster Ovary (CHO)-K1 cell line Table of Contents SUPPLEMENTARY TEXT:... 2 FILTERING OF RAW READS PRIOR TO ASSEMBLY:... 2 COMPARATIVE ANALYSIS... 2 IMMUNOGENIC

More information

Supplementary Information for:

Supplementary Information for: Supplementary Information for: A streamlined and high-throughput targeting approach for human germline and cancer genomes using Oligonucleotide-Selective Sequencing Samuel Myllykangas 1, Jason D. Buenrostro

More information

Transcription Gene regulation

Transcription Gene regulation Transcription Gene regulation The machine that transcribes a gene is composed of perhaps 50 proteins, including RNA polymerase, the enzyme that converts DNA code into RNA code. A crew of transcription

More information

VALLIAMMAI ENGINEERING COLLEGE

VALLIAMMAI ENGINEERING COLLEGE VALLIAMMAI ENGINEERING COLLEGE SRM Nagar, Kattankulathur 603 203 DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING QUESTION BANK VII SEMESTER BM6005 BIO INFORMATICS Regulation 2013 Academic Year 2018-19 Prepared

More information

Sequence Based Function Annotation. Qi Sun Bioinformatics Facility Biotechnology Resource Center Cornell University

Sequence Based Function Annotation. Qi Sun Bioinformatics Facility Biotechnology Resource Center Cornell University Sequence Based Function Annotation Qi Sun Bioinformatics Facility Biotechnology Resource Center Cornell University Usage scenarios for sequence based function annotation Function prediction of newly cloned

More information

Site directed mutagenesis, Insertional and Deletion Mutagenesis. Mitesh Shrestha

Site directed mutagenesis, Insertional and Deletion Mutagenesis. Mitesh Shrestha Site directed mutagenesis, Insertional and Deletion Mutagenesis Mitesh Shrestha Mutagenesis Mutagenesis (the creation or formation of a mutation) can be used as a powerful genetic tool. By inducing mutations

More information

ProductInformation INTRODUCTION TO THE VECTORETTE SYSTEM

ProductInformation INTRODUCTION TO THE VECTORETTE SYSTEM INTRODUCTION TO THE VECTORETTE SYSTEM ProductInformation The following is background information on the Vectorette System, included to familiarize the researcher with the Vectorette Unit and its function

More information

BurrH: a new modular DNA binding protein for genome engineering

BurrH: a new modular DNA binding protein for genome engineering Supplementary information for: BurrH: a new modular protein for genome engineering Alexandre Juillerat, Claudia Bertonati, Gwendoline Dubois, Valérie Guyot, Séverine Thomas, Julien Valton, Marine Beurdeley,

More information

Sequence Analysis. II: Sequence Patterns and Matrices. George Bell, Ph.D. WIBR Bioinformatics and Research Computing

Sequence Analysis. II: Sequence Patterns and Matrices. George Bell, Ph.D. WIBR Bioinformatics and Research Computing Sequence Analysis II: Sequence Patterns and Matrices George Bell, Ph.D. WIBR Bioinformatics and Research Computing Sequence Patterns and Matrices Multiple sequence alignments Sequence patterns Sequence

More information

Admission Exam for the Graduate Course in Bioinformatics. November 17 th, 2017 NAME:

Admission Exam for the Graduate Course in Bioinformatics. November 17 th, 2017 NAME: 1 Admission Exam for the Graduate Course in Bioinformatics November 17 th, 2017 NAME: This exam contains 30 (thirty) questions divided in 3 (three) areas (maths/statistics, computer science, biological

More information

Gene Expression Technology

Gene Expression Technology Gene Expression Technology Bing Zhang Department of Biomedical Informatics Vanderbilt University bing.zhang@vanderbilt.edu Gene expression Gene expression is the process by which information from a gene

More information

The Biotechnology Toolbox

The Biotechnology Toolbox Chapter 15 The Biotechnology Toolbox Cutting and Pasting DNA Cutting DNA Restriction endonuclease or restriction enzymes Cellular protection mechanism for infected foreign DNA Recognition and cutting specific

More information

Non-coding Function & Variation, MPRAs II. Mike White Bio /5/18

Non-coding Function & Variation, MPRAs II. Mike White Bio /5/18 Non-coding Function & Variation, MPRAs II Mike White Bio 5488 3/5/18 MPRA Review Problem 1: Where does your CRE DNA come from? DNA synthesis Genomic fragments Targeted regulome capture Problem 2: How do

More information

Incorporating Molecular ID Technology. Accel-NGS 2S MID Indexing Kits

Incorporating Molecular ID Technology. Accel-NGS 2S MID Indexing Kits Incorporating Molecular ID Technology Accel-NGS 2S MID Indexing Kits Molecular Identifiers (MIDs) MIDs are indices used to label unique library molecules MIDs can assess duplicate molecules in sequencing

More information

Some types of Mutagenesis

Some types of Mutagenesis Mutagenesis What Is a Mutation? Genetic information is encoded by the sequence of the nucleotide bases in DNA of the gene. The four nucleotides are: adenine (A), thymine (T), guanine (G), and cytosine

More information

3 Designing Primers for Site-Directed Mutagenesis

3 Designing Primers for Site-Directed Mutagenesis 3 Designing Primers for Site-Directed Mutagenesis 3.1 Learning Objectives During the next two labs you will learn the basics of site-directed mutagenesis: you will design primers for the mutants you designed

More information

SELECTED TECHNIQUES AND APPLICATIONS IN MOLECULAR GENETICS

SELECTED TECHNIQUES AND APPLICATIONS IN MOLECULAR GENETICS SELECTED TECHNIQUES APPLICATIONS IN MOLECULAR GENETICS Restriction Enzymes 15.1.1 The Discovery of Restriction Endonucleases p. 420 2 2, 3, 4, 6, 7, 8 Assigned Reading in Snustad 6th ed. 14.1.1 The Discovery

More information

Authors: Vivek Sharma and Ram Kunwar

Authors: Vivek Sharma and Ram Kunwar Molecular markers types and applications A genetic marker is a gene or known DNA sequence on a chromosome that can be used to identify individuals or species. Why we need Molecular Markers There will be

More information

Data Retrieval from GenBank

Data Retrieval from GenBank Data Retrieval from GenBank Peter J. Myler Bioinformatics of Intracellular Pathogens JNU, Feb 7-0, 2009 http://www.ncbi.nlm.nih.gov (January, 2007) http://ncbi.nlm.nih.gov/sitemap/resourceguide.html Accessing

More information

Lecture Four. Molecular Approaches I: Nucleic Acids

Lecture Four. Molecular Approaches I: Nucleic Acids Lecture Four. Molecular Approaches I: Nucleic Acids I. Recombinant DNA and Gene Cloning Recombinant DNA is DNA that has been created artificially. DNA from two or more sources is incorporated into a single

More information

Genetic Engineering & Recombinant DNA

Genetic Engineering & Recombinant DNA Genetic Engineering & Recombinant DNA Chapter 10 Copyright The McGraw-Hill Companies, Inc) Permission required for reproduction or display. Applications of Genetic Engineering Basic science vs. Applied

More information

Bioinformatics : Gene Expression Data Analysis

Bioinformatics : Gene Expression Data Analysis 05.12.03 Bioinformatics : Gene Expression Data Analysis Aidong Zhang Professor Computer Science and Engineering What is Bioinformatics Broad Definition The study of how information technologies are used

More information

resequencing storage SNP ncrna metagenomics private trio de novo exome ncrna RNA DNA bioinformatics RNA-seq comparative genomics

resequencing storage SNP ncrna metagenomics private trio de novo exome ncrna RNA DNA bioinformatics RNA-seq comparative genomics RNA Sequencing T TM variation genetics validation SNP ncrna metagenomics private trio de novo exome mendelian ChIP-seq RNA DNA bioinformatics custom target high-throughput resequencing storage ncrna comparative

More information