Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

Quantitative Genomics and Genetics
BTRY 4830/6830; PBSB.5201.01
Jason Mezey
Biological Statistics and Computational Biology (BSCB)
Department of Genetic Medicine
Institute for Computational Biomedicine
jgm45@cornell.edu
Cornell TA: Mahya Mehrmohamadi, mm2489@cornell.edu
WCMC TA: Jin Hyun Ju, jj328@cornell.edu
Spring 2014: Jan. 28 - May 10
T/Th: 8:40-9:55

Why you're here

Today
Logistics (time/locations, registering, syllabus, schedule, requirements, computer labs, video-conferencing, etc.)
Intuitive overview of the goals and the field of quantitative genomics
The foundational connection between biology and probabilistic modeling
Begin our introduction to modeling and probability

Times and Locations I
This is a distance learning class taught in two locations: Cornell, Ithaca and Weill, NYC
I will teach all lectures from EITHER Ithaca or NYC (all lectures will be video-conferenced)
I expect questions from both locations
Lectures will be recorded:
  These will be posted along with slides / notes
  These will also function as backup (if needed)
I encourage you to come to class...

Times and Locations II
Lectures are (almost) every Tues. / Thurs. 8:40-9:55AM - see class schedule
Ithaca lecture will always be in 224 Weill Hall
DEPENDING ON THE DATE, the Weill lecture location will be:
  Belfer 204B or 204C
  Weill-Greenberg, 2nd floor A or B
A spreadsheet will be made available with these locations (please read it carefully!!)

Times and Locations III
There is a REQUIRED computer lab for this course (if you take the course for credit)
Note that for both Cornell and WCMC, the computer lab will meet 5-6PM on Thurs. (!!) - if you have an unavoidable conflict at this time, please send me an email (we will do our best to accommodate but...)
In Ithaca, the lab will be taught by Mahya in room MNLB30A (!!) Mann Library
In NYC, the lab will be taught by Jin - same issue as class, the location depends on the week (see same spreadsheet...)
Please bring your own laptop the first week (please email me if this is an issue)
THE FIRST COMPUTER LAB IS NEXT WEEK (!!)

Times and Locations IV
Office hours:
Jason will hold office hours on both campuses by video-conference each Thurs. 3-5PM - locations will be 101 Biotech in Ithaca and, in NYC, the main conference room of the Dept. of Genetic Med., 13th floor, Weill-Greenberg (subject to change!)
Mahya will hold office hours for Ithaca students only on Tues. 3-5PM in 101 Biotechnology Building
Jin will not have official office hours
NOTE: unofficial help sessions can be scheduled with Jason, Mahya, or Jin by appointment
NO office hours this week (!!)

Email list
There is an official class email list that you must be on (officially registered or not): mezey-groupm-l@cornell.edu
All information (short notice change in classrooms, homework announcements, etc.) will be distributed using this list (!!) so please make sure you are on it!
To get on this list (or to be removed):
  In Ithaca email Mahya: mm2489@cornell.edu
  In NYC email Jin: jj328@cornell.edu

Website
The class website will be under the Classes link on my site: http://mezeylab.cb.bscb.cornell.edu/

Website resources
We will post information about the course and a schedule updated during the semester (check back often!!)
There is no textbook for the class but I will post slides for all lectures
I will post detailed notes for most lectures - there may be a significant delay for these posts (!!)
There will also be supplementary readings (and other useful documents) that will be posted
We will post videos of lectures and lecture slides (1-2 day delay in most cases)
We will post all homeworks, exams, keys, etc.
We will post slides for the computer labs and code

Registering for the class I
You may take this class for a letter grade, S/U, or Audit
If you can register for this class, please do so (even if you plan to audit!!)
If you cannot register (you are a student at MSKCC, have a conflict, you are a postdoc, lab tech, etc.) or do not wish to register, you are still welcome to sit in the class
If you audit or do not register officially, I strongly recommend that you do the work for the class, i.e. homework/exams/project/lab (we will grade your work!)
My observation is that you are likely to be wasting your time if you do not do the work but I leave this up to you...

Registering for the class II
In Ithaca:
  You must register for both the lecture (3 credits) and computer lab (1 credit) if you take the course for a letter grade
  If you are an undergraduate, register for BTRY 4830 (lecture and lab); if you are a graduate student, register for BTRY 6830 (same)
In NYC:
  Weill: the course (PBSB.5021.01) should be available in the Graduate School drop-down at learn.weill.cornell.edu (2015-2016 Spring, Graduate-Quarter 3-4)
  Rockefeller: email Kristen Cullen cullenk@mail.rockefeller.edu
Please contact me if there are any issues with registering (!!)

Grading
We will grade undergraduates and graduates separately (!!)
Grading: problem sets (20%), computer lab attendance (5%), project (25%), mid-term (20%), final (30%)
A short problem set (almost) every week
Exams will be take-home (open book)
A single project (~1 month)

Should I be in this class?
No probability or statistics: not recommended
Limited probability or statistics (high school, a long time ago, etc.): if you take the class be ready to work (!!)
Prob / stats (e.g. BTRY 4080+4090 or BTRY 6010+6020 in Ithaca, Quantitative Understanding in Biology at Weill, etc.): you'll be fine
No or limited exposure to genetics: you'll be fine
No or limited exposure to programming: you'll be fine (we will teach you programming in R from the ground up)
Strong quantitative background (e.g. stats or CS graduate student): you may find the intuitive discussion of quantitative subjects and the applications interesting

What you will learn in this class I
A rigorous introduction to the basics of probability and statistics that is intuition based (not proof based)
Foundational concepts of how probability and statistics are at the core of genetics, which are complete enough to build additional / more advanced understanding (i.e., enough to "get your hooks into the subject")
Exposure to many advanced probability / statistics / genetics / algorithmic concepts that will allow you to build additional understanding beyond this class (from a brief mention to entire lectures, depending on the subject)
Clear explanations for convincing yourself that the basics of mathematics and programming are not hard (i.e. anyone can do it if they devote the time)

What you will learn in this class II
An intuitive and practical understanding of linear models and related concepts that are the foundation of many subjects in statistics, machine learning, and computational biology
The computational approaches necessary to perform inference with these models (EM, MCMC, etc.)
The statistical models and frameworks that allow us to identify specific genetic differences responsible for differences in organisms that we can measure
You will be able to analyze a large data set for this particular problem, e.g. a Genome-Wide Association Study (GWAS)
You will have a deep understanding of quantitative genomics, a field that from the outside seems diffuse and confusing

Questions about logistics?

Subject overview
We know that aspects of an organism (measurable attributes and states such as disease) are influenced by the genome (the entire DNA sequence) of an individual
This means differences in genomes (genotype) can produce differences in a phenotype:
  Genotype - any quantifiable genomic difference among individuals, e.g. Single Nucleotide Polymorphisms (SNPs). Other examples?
  Phenotype - any measurable aspect of an organism (that is not the genotype!). Examples?
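
One common way to make a SNP genotype "quantifiable" is to count the copies of one chosen allele that each individual carries. The following is a minimal sketch under stated assumptions: the A/G SNP, the individuals, the phenotype values, and the function name allele_count are all made up for illustration and do not come from the course materials.

```python
# Hypothetical example: code each individual's genotype at one A/G SNP
# as the number of copies of a chosen allele (0, 1, or 2).

def allele_count(genotype: str, counted_allele: str) -> int:
    """Return how many copies of counted_allele appear in a genotype like 'A/G'."""
    alleles = genotype.split("/")
    if len(alleles) != 2:
        raise ValueError(f"expected a diploid genotype such as 'A/G', got {genotype!r}")
    return sum(1 for allele in alleles if allele == counted_allele)

# Made-up genotypes at a single SNP and made-up phenotype measurements.
genotypes = ["A/A", "A/G", "G/G", "A/G", "A/A"]
phenotypes = [170.2, 173.5, 177.1, 172.8, 169.4]

# Count copies of the G allele for each individual.
coded = [allele_count(g, "G") for g in genotypes]
print(coded)  # [0, 1, 2, 1, 0]
```

With genotypes coded as numbers, each individual becomes a (genotype, phenotype) pair that can be handed to a statistical model.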

An illustration
Example: People are different... Physical, metabolism, disease, countless ways.
We know that environment plays a role in these differences... and for many, differences in the genome play a role
For any two people, there are millions of differences in their DNA, a subset of which are responsible for producing differences in a given measurable aspect.

An illustration continued...
The problem: for any two people, there can be millions of differences in their genomes...
How do we figure out which differences are involved in producing differences in a phenotype and which ones are not?
This course is concerned with how we do this.
Note that the problem (and methodology) applies to any measurable difference, for any type of organism!!

Why do we want to know this?
If you know which genome differences are responsible:
  From a child's genome we could predict adult features
  We can target genomic differences responsible for genetic diseases for gene therapy
  We can manipulate genomes of agricultural crops to produce disease-resistant strains
  We can explain why a disease has a particular frequency in a population, why we see a particular set of differences
  These differences provide a foundation for understanding how pathways, developmental processes, physiological processes work
The list goes on...

Quantitative genetics and connection to other disciplines
Quantitative genomics is a field concerned with the modeling of the relationship between genomes and phenotypes and using these models to discover and predict
Broad Classification of Fields of Genetics:
  Modeling Genetic Fields: quantitative genetics; system genetics; population genetics; etc.
    Note (!!) Take Prof. Alon Keinan's Population Genetics Class (!!) T/Th 10:10-11:25 Comstock B108 http://keinanlab.cb.bscb.cornell.edu/content/btry-6820-4820-2016
  Mechanism Genetic Fields: Molecular Genetics; Cellular Genetics; etc.
  Model System Genetic Fields: Human Genetics; Yeast Genetics; etc.
  Extension Genetic Fields: Medical Genetics; Developmental Genetics; Evolutionary Genetics; Agricultural Genetics; etc.

History of genetics (relevant to Quantitative Genetics)
In sum: during the last decade, the greater availability of DNA sequence data has completely changed our ability to make connections between genome differences and phenotypes

Connection of genomics-genetics
Traditionally, studying the impact / relationship of the genome to phenotypes was the province of fields of Genetics
Given this dependence on genomes, it is no surprise that modern genetic fields now incorporate genomics: the study of an organism's entire genome (Wikipedia definition)
However, one can study genetics without genomics (i.e. without direct information concerning DNA) and the merging of genetics-genomics is quite recent

Present / future: advances in next-generation sequencing driving the field

Why this is a good time to be learning about this subject
Mapping (identifying) genotypes (genetic loci) with effects on important phenotypes is fast becoming the major use of genomic data and a major focus of genomics
However, the data collection, experimental, and statistical analysis techniques for doing this are still being developed
The current statistical approaches are the focus of this course (i.e., you will have a solid foundation by the end)
The importance is just now starting to permeate broadly (i.e., we are entering the internet generation for genomics and the impact of genomics on biology)

Foundational biology concepts
In this class, we will use statistical modeling to say something about biology, specifically the relationships between genotype (DNA) and phenotype
Let's start with the biology by asking the following question: why DNA?
The structure of DNA has properties that make it worthwhile to focus on...

It's the same in all cells
...with a few exceptions (e.g. cancer, immune system...)
Figure 1: A simplified schematic showing genome organization in human cells. The DNA of a genome is located within the nucleus of a cell. The genome is organized in long strings that are tightly coiled around protein structures to form chromosomes. Each string is a double helix where the building blocks are A-T and G-C nucleotide pairs. © kintalk.org

It's passed on to the next generation

It has a convenient structure for quantifying differences
Figure credits: Watson et al., Molecular Biology of the Gene, CSHL Press, 2004; Jones and Pevzner, An Introduct...

It's responsible for the construction and maintenance of organisms
Note: other regions of genomes can impact phenotypes...

Statistics and probability I
Quantitative genomics is a field concerned with the modeling of the relationship between genomes and phenotypes and using these models to discover and predict
We will use frameworks from the fields of probability and statistics for this purpose
Note that this is not the only useful framework (!!) - and even more generally - mathematically based frameworks are not the only useful (or even necessarily the "best") frameworks for this purpose
So, why use a probability and statistics framework?
Let's start by considering a definition of probability

Statistics and probability II
A non-technical definition of probability: a mathematical framework for modeling under uncertainty
Such a system is particularly useful for modeling systems where we don't know and / or cannot measure critical information for explaining the patterns we observe
This is exactly the case we have in quantitative genomics when connecting differences in a genome to differences in phenotypes
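
To see what "modeling under uncertainty" can look like in practice, here is a small simulation sketch. It assumes a toy model in which a phenotype is a baseline plus a genotype effect plus random noise standing in for everything we did not measure. The function name simulate_phenotype and every numeric value are hypothetical choices for illustration, not the model used later in the course.

```python
import random

def simulate_phenotype(allele_count, effect=2.0, baseline=170.0, noise_sd=3.0, rng=random):
    """Toy model: phenotype = baseline + effect * allele count + unmeasured noise."""
    return baseline + effect * allele_count + rng.gauss(0.0, noise_sd)

rng = random.Random(1)  # fixed seed so the run is repeatable

# Five individuals with the SAME genotype (one copy of the allele)...
same_genotype = [simulate_phenotype(1, rng=rng) for _ in range(5)]

# ...still differ in phenotype because of factors we did not measure.
print([round(y, 1) for y in same_genotype])
```

Because the noise term is random, the genotype alone does not determine the phenotype, which is why a probability model rather than a deterministic rule is a natural fit here.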

Statistics and probability III
We will therefore use a probability framework to model, but we are also interested in using this framework to discover and predict
More specifically, we are interested in using a probability model to identify relationships between genomes and phenotypes using DNA sequences and phenotype measurements
For this purpose, we will use the framework of statistics, which we can (non-technically) define as a system for interpreting data for the purposes of prediction and decision making given uncertainty
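
As a rough illustration of using data to identify a genotype-phenotype relationship, the sketch below simulates a handful of SNPs, lets only one of them affect the phenotype, and then fits a simple linear regression of phenotype on each SNP in turn. This is an assumed, simplified stand-in and not the course's method, which is developed in later lectures. The sample size, effect size, allele frequency, and the helper name slope_and_t are all hypothetical.

```python
import random

def slope_and_t(x, y):
    """Least-squares slope of y on x and its t statistic (simple linear regression)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    # Standard error of the slope, using n - 2 residual degrees of freedom.
    se = (rss / (n - 2) / sxx) ** 0.5
    return slope, slope / se

rng = random.Random(0)
n_individuals, n_snps, causal_snp = 200, 5, 2  # hypothetical settings

# Simulated allele counts (0/1/2) per SNP; an allele frequency of 0.5 is an assumption.
genotypes = [[rng.choice([0, 1]) + rng.choice([0, 1]) for _ in range(n_individuals)]
             for _ in range(n_snps)]

# Only the causal SNP affects the phenotype; the rest is unmeasured noise.
phenotype = [1.0 * genotypes[causal_snp][i] + rng.gauss(0.0, 1.0)
             for i in range(n_individuals)]

# Test each SNP separately and report the one with the strongest signal.
t_stats = [slope_and_t(snp, phenotype)[1] for snp in genotypes]
best = max(range(n_snps), key=lambda j: abs(t_stats[j]))
print("SNP with the largest |t|:", best)
```

A real genome-wide analysis differs in scale (hundreds of thousands of SNPs or more) and must account for multiple testing and confounding, which this toy example ignores.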

Examples of successful applications of the framework
[Figure: Published Genome-Wide Associations through 12/2012, at p ≤ 5×10^-8, for 17 trait categories. NHGRI GWA Catalog, www.genome.gov/gwastudies, www.ebi.ac.uk/fgpt/gwas/]
[Figure: ERAP2 expression plotted against rs27290 genotype (A/A, A/G, G/G) and rs1908530 genotype (T/T, T/C, C/C), with panels labeled "cis eQTL" and "No eQTL"; -log10(p-value) plotted by chromosome]

That's it for today
Next lecture, we will begin our formal and technical introduction to probability
We will start by defining the concepts of a system, experiments and experimental trials, and sample outcomes and sample spaces