Advanced Bioinformatics Biostatistics & Medical Informatics 776 Computer Sciences 776 Spring 2018

Similar documents
Advanced Bioinformatics Biostatistics & Medical Informatics 776 Computer Sciences 776 Spring 2009

BIMM 143: Introduction to Bioinformatics (Winter 2018)

Introduction. CS482/682 Computational Techniques in Biological Sequence Analysis

BGGN 213: Foundations of Bioinformatics (Fall 2017)

Epigenetics and DNase-Seq

Introduction to Bioinformatics

Genomics and Bioinformatics GMS6231 (3 credits)

Our website:

Identifying Signaling Pathways. BMI/CS 776 Spring 2016 Anthony Gitter

BTRY 7210: Topics in Quantitative Genomics and Genetics

AC Algorithms for Mining Biological Sequences (COMP 680)

Genome 373: Genomic Informatics. Elhanan Borenstein

Biology 644: Bioinformatics

Genetics and Bioinformatics

ST 591: Introduction to Quantitative Genomics Syllabus

Bioinformatics: Sequence Analysis. COMP 571 Luay Nakhleh, Rice University

MB 668 Microbial Bioinformatics and Genome Evolution. 4 credits Spring, 2017

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

Measuring transcriptomes with RNA-Seq

BIOINFORMATICS AND SYSTEM BIOLOGY (INTERNATIONAL PROGRAM)

Instructor: Jianying Zhang, M.D., Ph.D.; Office: B301, Lab: B323; Phone: (O); (L)

Lecture 10. Ab initio gene finding

BIOL 205. Fall Term (2017)

Computational Genomics ( )

Introduction to Genome Science - BISC 434 Syllabus Spring Semester

MOLECULAR BIOLOGY OF EUKARYOTES 2016 SYLLABUS

MATH 5610, Computational Biology

Recommendations from the BCB Graduate Curriculum Committee 1

CISC 436/636 Computational Biology &Bioinformatics (Fall 2016) Lecture 1

ZOO 4926 Special Topics: Genomics and Biotechnology

Linking Genetic Variation to Important Phenotypes

BTRY 7210: Topics in Quantitative Genomics and Genetics

This place covers: Methods or systems for genetic or protein-related data processing in computational molecular biology.

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

Bioinformatics (Globex, Summer 2015) Lecture 1

PBIO4_5280_Laboratory in Genomics Techniques (Spring 2017)

Biochemistry 9/1/17. Course No: CHM4006; Credits: Day / Time: Wednesday & Friday: 13:30-15:00. Ro # H & 208 (IT/BT)

CSE/Beng/BIMM 182: Biological Data Analysis. Instructor: Vineet Bafna TA: Nitin Udpa

Examination Assignments

Office hours: Wednesday from 11:30 to 12:30 and Friday from 11:30 to 12:30. Lectures: Wednesday and Friday 10:15 AM -11:30 AM, Room HB-130

MCB 102 Spring 2016 General Syllabus M-W-F 11am to 12pm Wheeler Auditorium

Bioinformatics - Lecture 1

Technician: Dionne Lutz, BS: Biology & MsED Office: Kanbar Center, Room 704, 41 Cooper Sq. (212) (office)

ChIP-Seq Tools. J Fass UCD Genome Center Bioinformatics Core Wednesday September 16, 2015

ChIP-Seq Data Analysis. J Fass UCD Genome Center Bioinformatics Core Wednesday 15 June 2015

GNET 744 Biological Sequence Analysis, Protein Structure and Genome-Wide Data

ChIP-Seq Data Analysis. J Fass UCD Genome Center Bioinformatics Core Wednesday December 17, 2014

Bioinformatics for Cell Biologists

CENTER FOR BIOTECHNOLOGY

ELE4120 Bioinformatics. Tutorial 5

Applications of HMMs in Computational Biology. BMI/CS Colin Dewey

Cory Brouwer, Ph.D. Xiuxia Du, Ph.D. Anthony Fodor, Ph.D.

Capabilities & Services

Prof. Clare Bates Congdon, PhD

NCBI web resources I: databases and Entrez

BSC 4445C Genomics Lab: Methods in Data Collection and Analysis

BIOL 274 Introduction to Bioinformatics Fall 2016

Practical Bioinformatics for Biologists (BIOS493/700)

Introduction to Bioinformatics and Gene Expression Technology

VALLIAMMAI ENGINEERING COLLEGE

Analysis of Microarray Data

INTRODUCTION TO PLANT MOLECULAR BIOLOGY

11/22/13. Proteomics, functional genomics, and systems biology. Biosciences 741: Genomics Fall, 2013 Week 11

Data Science 101: an introduction

Prerequisites: IST 241 (Introduction to DNA Cloning) or permission of instructor.

Analysis of Microarray Data

BIOLOGY 200 Molecular Biology Students registered for the 9:30AM lecture should NOT attend the 4:30PM lecture.

M252 M262 Course Syllabus M252A-B & M262A-B. Molecular Mechanisms of Human Diseases. Course Director: Professor Yibin Wang

COURSE APPROVAL DOCUMENT Southeast Missouri State University. Department: Biology Course No. BI 283. Title of Course: Genetics Date: Fall 2017

COMP 555 Bioalgorithms. Fall Lecture 1: Course Introduction

COURSE CATALOGUE UH SEMESTER 3

Measuring transcriptomes with RNA-Seq. BMI/CS 776 Spring 2016 Anthony Gitter

Practical Bioinformatics for Biologists (BIOS 441/641)

Course Syllabus for FISH/CMBL 7660 Fall 2008

2017 Qualifying Examination

Biology 142 Advanced Topics in Genetics and Molecular Biology Course Syllabus Spring 2006

GREG GIBSON SPENCER V. MUSE

Introduction to Bioinformatics Finish. Johannes Starlinger

From Variants to Pathways: Agilent GeneSpring GX s Variant Analysis Workflow

Leonardo Mariño-Ramírez, PhD NCBI / NLM / NIH. BIOL 7210 A Computational Genomics 2/18/2015

AGR 5307: Molecular Genetics for Crop Improvement Course Objectives: Learning Outcomes: 65 % lectures 15 % laboratory demonstrations 15 % papers 5 %

BIOLOGY 429 PROTEOMICS LABORATORY Fall 2004

Bioinformatics for Proteomics. Ann Loraine

Applied Bioinformatics

Chemistry 171: Biological Synthesis

Linking Genetic Variation to Important Phenotypes: SNPs, CNVs, GWAS, and eqtls

Bioinformatics 2. Lecture 1

Bioinformatics. Microarrays: designing chips, clustering methods. Fran Lewitter, Ph.D. Head, Biocomputing Whitehead Institute

Syllabus for Biology 421L and 422L General information Biology 422 has an optional lab. There are two lab sections: a two credit section on Tuesday

China National Grid --- BioNode. Jun Wang Beijing Genomics Institute

Week 1 BCHM 6280 Tutorial: Gene specific information using NCBI, Ensembl and genome viewers

CHEM 4411 / 4421 Biological Chemistry Lab I and II

Grundlagen der Bioinformatik Summer Lecturer: Prof. Daniel Huson

What is Bioinformatics? Bioinformatics is the application of computational techniques to the discovery of knowledge from biological databases.

Computational Biology

BioG 1110 Spring 2010 Syllabus

Computing with large data sets

Master's Programme in OMICS. Academic regulations

Workshop on Genomics. Český Krumlov Monday, January 7, 13

Survey of Chemistry I Lecture Office hours Overall course objectives

Transcription:

Advanced Bioinformatics Biostatistics & Medical Informatics 776 Computer Sciences 776 Spring 2018 Anthony Gitter gitter@biostat.wisc.edu www.biostat.wisc.edu/bmi776/ These slides, excluding third-party material, are licensed under CC BY-NC 4.0 by Mark Craven, Colin Dewey, and Anthony Gitter

Agenda Today Introductions Course information Overview of topics

Course Web Site www.biostat.wisc.edu/bmi776/ Syllabus and policies Readings Tentative schedule Lecture slides (draft posted before lecture) Announcements Homework Project information Link to Piazza discussion board

Your Instructor: Anthony Gitter Email: gitter@biostat.wisc.edu Office: room 3268, Discovery Building Assistant professor, Biostatistics & Medical Informatics Affiliate faculty, Computer Sciences Investigator, Morgridge Institute for Research Research interests: biological networks, time series analysis, applied machine learning

Your TA: Fangzhou Mu Email: fmu2@wisc.edu Office: WID 3365C Graduate student, Pharmacy

Office Hours Instructor: Tuesday and Thursday, 2:30-3:30 PM Immediately after class TA: Monday, 1:00-2:00 PM

Finding Our Offices: Discovery Building our offices Discovery Building Engineering Hall Union South Computer Sciences 3 rd floor has restricted access See Piazza before 1/28 if you need building access Stop at visitor desk to call my office if card does not work

Finding Our Offices: Discovery Building my office TA office

You So that we can all get to know each other better, please tell us your name major or graduate program research interests and/or topics you re especially interested in learning about favorite programming language

Course Requirements 4 or 5 homework assignments: ~40% Written exercises Programming (Python) Computational experiments (e.g. measure the effect of varying parameter x in algorithm y) Five late days permitted Project: ~25% Midterm: ~15% Final exam: ~15% Class participation: ~5%

Exams Midterm: March 13, in class Final: Wednesday May 9, 12:25-2:25 PM Let me know immediately if you have a conflict with either of these exam times

Computing Resources for the Class Linux servers in Dept. of Biostatistics & Medical Informatics No lab, must log in remotely (use WiscVPN) Will create accounts for everyone on course roster Two machines mi1.biostat.wisc.edu mi2.biostat.wisc.edu HW0 tests your access to these machines Homework must be able to run on these machines CS department usually offers Unix orientation sessions at beginning of semester

Programming Assignments All programming assignments require Python Project can be in any language Have a Python 3 environment on biostat servers Permitted packages on course website Can request others HW0 will be Python introduction Use Piazza for Python discussion If you know Python, please help answer questions

Project Design and implement a new computational method for a task in molecular biology Improve an existing method Perform an evaluation of several existing methods Run on real biological data Suggestions will be provided Not directly related to your existing research Can email me now to discuss ideas

Participation Do the assigned readings before class Show up to class No one will have the perfect background Ask questions about computational or biological concepts Correct me when I am wrong Seriously, it will happen Piazza discussion board Questions and answers

Piazza Discussion Board Instead of a mailing list http://piazza.com/wisc/spring2018/bmics776/home Post your questions to Piazza instead of emailing the instructor or TA Unless it is a private issue or project-related Answer your classmates questions Announcements will also be posted to Piazza Supplementary material for lecture topics

Course Readings Mostly articles from the primary literature Must be using a campus IP address to download some of the articles (can use WiscVPN from off campus) Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Cambridge University Press, 1998.

Prerequisites BMI/CS 576 or equivalent Knowledge of basic biology and methods from that course will be assumed May want to go over the material on the 576 website to refresh http://www.biostat.wisc.edu/bmi576/

What you should get out of this course An understanding of some of the major problems in computational molecular biology Familiarity with the algorithms and statistical techniques for addressing these problems How to think about different data types At the end you should be able to Read the bioinformatics literature Apply the methods you have learned to other problems both within and outside of bioinformatics Write a short bioinformatics research paper

Major Topics to be Covered (the algorithms perspective) Expectation Maximization Gibbs sampling Mutual information Multiple hypothesis testing correction Convolutional neural networks Linear programming Interpolated Markov models Duration modeling and semi-markov models Tries and suffix trees Markov random fields Stochastic context free grammars

Major Topics to be Covered (the task perspective) Modeling of motifs and cis-regulatory modules Identification of transcription factor binding sites Genotype analysis and association studies Regulatory information in epigenomic data Transcriptome quantification Mass spectrometry peptide and protein identification Pathways in cellular networks Gene finding Large-scale sequence alignment RNA sequence and structure modeling

Motif Modeling What sequence motif do these promoter regions have in common?

cis-regulatory Modules What configuration of sequence motifs do these promoter regions have in common? RNAP accgcgctgaaaaaattttccgatgagtttagaagagtcaccaaaaaattttcatacagcctactggtgttctctgtgtgtgctaccactggctgtcatcatggttgta RNAP caaaattattcaagaaaaaaagaaatgttacaatgaatgcaaaagatgggcgatgagataaaagcgagagataaaaatttttgagcttaaatgatctggcatgagcagt RNAP gagctggaaaaaaaaaaaatttcaaaagaaaacgcgatgagcatactaatgctaaaaatttttgaggtataaagtaacgaattggggaaaggccatcaatatgaagtcg

Genome-wide Association Studies Which genes are involved in diabetes?

Noncoding Genetic Variants How do genetic variants outside protein coding regions impact phenotypes? Patton and Harrington, Nature, 2013

Transcriptome Analysis with RNA-Seq What genes are expressed and at what levels? Scale chr1: 10 _ 100 bases 133546350 133546400 133546450 133546500 133546550 133546600 133546650 133546700 SRR065546 SRR065546 0 _ SRR065546_bam env Ctse UCSC Genes Based on RefSeq, UniProt, GenBank, CCDS and Comparative Genomics

Proteomic Analysis with Mass Spectrometry What proteins are expressed and at what levels? Steen and Mann, Nature Reviews Molecular Cell Biology, 2004

Identifying Signaling Pathways How do proteins coordinate to transmit information? Yeger-Lotem et al., Nature Genetics, 2009

Gene Finding Where are the genes and functional elements in a genome?

Large Scale Sequence Alignment What is the best alignment of these 6 genomes?

RNA Sequence and Structure Modeling How can we identify sequences that encode this RNA structure?

Other Topics Many topics we aren t covering Protein structure prediction Protein function annotation Modeling long reads Metagenomics Metabolomics Sequence compression Graph genomes Single-cell sequencing Pseudo- and quasi-alignment Text mining Others?

Reading Groups Computational Systems Biology Reading Group http://lists.discovery.wisc.edu/mailman/listinfo /compsysbiojc AI Reading Group http://lists.cs.wisc.edu/mailman/listinfo/airg ComBEE Python Study Group https://combee-uwmadison.github.io/studygroup/ Many relevant seminars on campus