Genetics and Psychiatric Disorders Lecture 1: Introduction

Genetics and Psychiatric Disorders Lecture 1: Introduction Amanda J. Myers LABORATORY OF FUNCTIONAL NEUROGENOMICS

All slides available @: http://labs.med.miami.edu/myers Click on courses First two links Pswd: Lecture 1=furbies1 Lecture 2=furbies2

Epidemiology, Genetics, and Computational Biology: these are not the same thing. Take care when identifying an analyst for your study.

Epidemiology (not what we will discuss)
Study of psychiatry in relation to populations and environments
Not genetics: some tests overlap, but many tests in epidemiology are ONLY appropriate for samples collected as populations
Some tests: risk ratios, survival curves
Some studies: Religious Orders Study, Honolulu-Asia Aging Study, Baltimore Longitudinal Study on Aging, Genetics of Healthy Aging, Kungsholmen Project, SardiNIA
Programs employed: SAS/SPSS

Genetics
Study of psychiatry in relation to families or groups
Some tests: odds ratios, chi-square tests, LOD scores, transmission disequilibrium test, t-tests, ANOVA, generalized linear models
Some studies: the next lecture gives a few examples
Programs employed: SAS/SPSS, GENEHUNTER/GENEFINDER, MERLIN, ALLEGRO, ARLEQUIN, SOLAR, CLUMP, CYRILLIC, EH, EH-PLUS, FASTLINK, FBAT, and on and on
Rockefeller keeps a good list: http://linkage.rockefeller.edu/soft/list1.html#d

Computational/Bioinformatics
Genetics on steroids: these studies usually deal with case/control designs on a grandiose scale (i.e. full genome * 10K samples)
Some tests: similar to conventional genetics, but dealing with this level of data requires a new set of programming skills
Non-independence of tests, plus corrections
Some studies: GWAS, Next Generation Sequencing
Programs usually employed: PLINK, Birdseed, Affymetrix (Affymetrix Power Tools) and Illumina (GenomeStudio)

CONCEPTS

The Basics: Hypothesis Testing
A lot of human genetics relies on this
Always think about what the null hypothesis is for the study/test statistic of interest. The null is not always what you think.
P-values
Conclusiveness index: NOT a measure that your data are right, more a measure of how "not wrong" they are
Formally: the probability of obtaining a result equal to or more extreme than your test statistic, given that the null hypothesis is true
P-VALUES HAVE NOTHING TO DO WITH EFFECT SIZE/BIOLOGICAL IMPORTANCE

The Basics: Alpha + Beta
For every study, these two statistics are the framework around which the study is designed
- Alpha (α): TYPE I error. The acceptable false positive error rate, i.e. rejecting the null hypothesis when it is true. Not the same as the p-value: alpha is set prior to the study and is inherent to the study design; typically set to 0.05.
- Beta (β): TYPE II error. The false negative error rate, i.e. retaining the null hypothesis when it is false.

The Basics: Alpha + Beta
                             REALITY
OUR DECISION IN THE STUDY    null hypothesis is true       test hypothesis is true
Accept null hypothesis       correct decision (1-alpha)    TYPE II error (beta)
Reject null hypothesis       TYPE I error (alpha)          correct decision (1-beta, equivalent to power)

The Basics: Power
For every study, power is used to predict the appropriate sample size for the study of interest
However, this calculation relies on input that is inherently unknown
Power = 1 - beta
The input needed to determine power is inherent to the study design. For a case-control screen, you need: alpha; your predicted model for how the gene is inherited (recessive or dominant); disease prevalence; relative risk of the disease allele; the measure of linkage disequilibrium between the real locus and the marker you typed; and sample size
One program: GPC: http://pngu.mgh.harvard.edu/~purcell/gpc/ There are many others.
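A rough illustration of how power depends on sample size, as a minimal sketch only. It is NOT what GPC computes: instead of deriving case and control allele frequencies from inheritance model, prevalence, relative risk and LD, it assumes those two frequencies are already known and uses a two-sided two-proportion z-test (normal approximation). The function name and arguments are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def allelic_power(p_case, p_control, n_cases, n_controls, alpha=0.05):
    """Approximate power (1 - beta) of a two-sided z-test comparing
    allele frequencies in cases and controls.

    Assumption: the true allele frequencies are supplied directly.
    """
    z = NormalDist()
    # Each individual contributes two chromosomes to the allele counts.
    chr_cases, chr_controls = 2 * n_cases, 2 * n_controls
    se = sqrt(p_case * (1 - p_case) / chr_cases
              + p_control * (1 - p_control) / chr_controls)
    z_crit = z.inv_cdf(1 - alpha / 2)        # critical value set by alpha
    shift = abs(p_case - p_control) / se     # standardized true difference
    # Probability that the test statistic lands in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


if __name__ == "__main__":
    for n in (100, 500, 2000, 10000):
        print(n, round(allelic_power(0.30, 0.27, n, n), 3))
```

With no true difference between the groups the function returns alpha, which matches the table above: the chance of rejecting a true null is the Type I error rate.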

The Basics: Power, Sample Sizes (IMHO)
FAMILY BASED STUDIES:
1. Linkage screens: the bigger the better; restricted by finding large families with multiple generations affected
- first early-onset Alzheimer's gene -> pedigree of ~40 people
- second early-onset Alzheimer's gene -> pedigrees of ~100 people
2. Sibling-pair screens: ~200-600 sibling pairs
3. Transmission disequilibrium screens: ~200-600 sibling pairs
QUANTITATIVE TRAIT STUDIES: N ~ 100-1000, depending on your effect size
CASE CONTROL STUDIES: N ~ 100-40,000. Case in point: APOE versus CLU

The Basics
Allele: variant of a particular gene/DNA position
Haplotype: collection of alleles on a single chromosome that are transmitted together
Phase: allele transmission relative to parental inheritance. If allele 1 from Marker 1 is inherited from the same parent as allele 1 from Marker 2, they are in phase. Homozygotes are phase known.

The Basics: Penetrance
Important for both linkage and association screens: does the allele I just mapped cause disease?
Fully penetrant alleles = 1:1 relationship between that allele and the phenotype
Everything else is a risk factor
Typically a balance between: common allele, low penetrance; rare allele, high penetrance
Determination of penetrance can be tricky because of sampling issues and ascertainment bias

The Basics
Two main flavors of screens:
1. Linkage
2. Association
Two main flavors of samples:
1. Family based: can get linkage and/or association
2. Unrelated/case control: can only get association

FAMILY BASED SCREENS

Linkage
[Diagram: Marker A, disease locus (?), and marker b along a chromosome]
Linkage is determining the position of a disease locus in relation to marker loci along a chromosome
Is my disease locus in the same block of inherited chromosome as Marker A or marker b (or both)?
Null: there is free recombination between trait and marker, i.e. they are unlinked
Typically big distances (depends upon number of generations)
Not allele specific

The Basics: Distance
Physical: number of base-pairs between 2 points on a chromosome. Thanks to the glory of the sequencing of the human genome, this can now be mapped exactly.
Genetic: either recombination fraction (θ) or map length
θ: the probability that 2 alleles at 2 different loci are derived from different parental chromosomes
θ = 0: no recombination
θ = 0.5: loci are unlinked
Recombination is only observed if there is an odd number of cross-overs between the loci
Centimorgan (map length): based on the average number of cross-overs in a particular interval of a single chromatid
On average 1 megabase = 1% recombination = 1 cM, but genetic distances are location specific
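The link between map length and recombination fraction can be sketched with a map function. The slide does not name one; the example below assumes Haldane's map function, which treats cross-overs as independent (no interference) and counts a recombinant whenever the number of cross-overs is odd. Function names are illustrative.

```python
from math import exp, log


def haldane_theta(centimorgans):
    """Recombination fraction for a given map length, assuming
    Haldane's map function (cross-overs without interference)."""
    morgans = centimorgans / 100.0
    # Probability of an odd number of cross-overs in the interval.
    return 0.5 * (1.0 - exp(-2.0 * morgans))


def haldane_cm(theta):
    """Inverse: map length in cM for a recombination fraction < 0.5."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must be in [0, 0.5)")
    return -50.0 * log(1.0 - 2.0 * theta)


if __name__ == "__main__":
    for cm in (1, 10, 50, 200):
        print(f"{cm} cM -> theta = {haldane_theta(cm):.4f}")
```

For short intervals the two scales nearly coincide (1 cM is close to 1% recombination), while for long intervals θ levels off at 0.5, which is why distant loci on the same chromosome behave as unlinked.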

Markers: Microsatellites
Also called: SSR (simple sequence repeat), STR (short tandem repeat), STRP (short tandem repeat polymorphism), VNTR (variable number tandem repeat)
Simple runs of repetitive DNA sequence:
- Dinucleotide, e.g. (CA)n
- Trinucleotide, e.g. (ATA)n
- Tetranucleotide, e.g. (GATA)n
Preference: tetras > tris > dis. Because they are run out on PAGE gels, separation is better with tetras, but tetras are less common.
Scoring is by size range, which is equivalent to number of repeats except where there is slippage; sizes are usually binned

Markers: Microsatellites
Naming by HUGO nomenclature: a D/S number is used to indicate anonymous DNA sequences
D = DNA, followed by the chromosome number; S = segment, followed by the marker number; a trailing E = expressed
D10S1211 is not expressed, D3S2250E is expressed
Advantages:
- Highly informative
- Dispersed throughout the genome
- Easy to type (used to be true)
Disadvantages:
- "Easy to type" is relative: no chip technology
- Distance is based on recombination, which differs between males and females; maps are typically averaged

Markers: Microsatellites
Where to find the maps?
Originally done by Marshfield: 8,325 markers; CEPH families, predominantly from Utah; 8 families, ~130 individuals
Now deCODE maps: HUGE Icelandic pedigrees; 146 families, 869 individuals, 1257 meiotic events

Method
PAGE: polyacrylamide gel electrophoresis
[Figure: gel lanes for Marker 1, Marker 2, Marker 3 and a size standard; sizing of repeats]

Test Statistic: Likelihood Ratio
LR: P(data | test hypothesis) vs. P(data | null hypothesis)
H_a: P(observed marker data | 0 < θ < 0.5), i.e. the recombination fraction between marker and trait is small
H_0: P(observed marker data | θ = 0.5), i.e. the marker is unlinked to the trait
LOD scores
Logarithm of odds = log10 of the likelihood ratio
LRs are log transformed so that they can be summed across families or studies
LOD = 3 means the odds in favor of linkage are roughly 100 times greater than at LOD = 1 (1000:1 vs. 10:1)
Multipoint LOD score = MLS
Looks at the information at the marker loci as well as between markers: the disease locus could be anywhere along the chromosome
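A two-point LOD score for the simplest case can be sketched as follows. This is an illustration under a strong assumption: every meiosis is phase known and can be scored directly as recombinant or non-recombinant. Real linkage programs handle unknown phase, penetrance and missing data. The function name is hypothetical.

```python
from math import log10


def lod_score(recombinants, meioses, theta):
    """Two-point LOD score at a given theta for phase-known meioses.

    LOD = log10( L(theta) / L(theta = 0.5) ), with
    L(theta) = theta^R * (1 - theta)^(N - R).
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    non_recombinants = meioses - recombinants
    log_l_test = (recombinants * log10(theta)
                  + non_recombinants * log10(1.0 - theta))
    log_l_null = meioses * log10(0.5)
    return log_l_test - log_l_null


if __name__ == "__main__":
    # Hypothetical family: 1 recombinant in 10 informative meioses.
    family_a = lod_score(1, 10, theta=0.1)
    # LOD scores are logs, so independent families simply add.
    family_b = lod_score(2, 12, theta=0.1)
    print(round(family_a, 2), round(family_a + family_b, 2))
```

At θ = 0.5 the test and null likelihoods are identical, so the LOD is 0 regardless of the data.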

FAMILY BASED SCREENS STANDARD LINKAGE

Linkage: Standard Analysis
Need: multiple-generation pedigrees; a model of disease inheritance
Procedure: follow segregation of markers throughout a family, and trace which regions of the genome cosegregate with the disease phenotype
Issues: late-onset diseases (no parents); diseases with genetic heterogeneity (can't properly specify the model of inheritance)

FAMILY BASED SCREENS LINKAGE, SIBLING PAIRS

Linkage: Allele Sharing Methods
Compare genotypes of affected sibling pairs (ASPs); parental genotypes are not needed
Test whether the inheritance pattern of a particular region is inconsistent with random segregation
The chance probability of siblings sharing DNA is known:
- 50% of the time sibs will share 1 allele
- 25% of the time they will share no alleles
- 25% of the time they will share both alleles
Non-parametric: no model of inheritance is specified, because the goal is not to follow a particular allele through a family but to look at the pattern of allele sharing within a population
Issues: power, distance
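The comparison against the 25% / 50% / 25% sharing expectation can be sketched as a chi-square goodness-of-fit test with 2 degrees of freedom. This is a simplified illustration: it assumes the number of alleles shared by each sib pair is known without ambiguity, whereas real programs estimate sharing probabilistically. Names are hypothetical.

```python
from math import exp


def asp_sharing_test(share0, share1, share2):
    """Chi-square goodness-of-fit of affected-sib-pair allele sharing
    against the 1/4, 1/2, 1/4 expected under random segregation."""
    observed = (share0, share1, share2)
    n_pairs = sum(observed)
    expected = (0.25 * n_pairs, 0.50 * n_pairs, 0.25 * n_pairs)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 2 degrees of freedom the chi-square tail probability
    # has the closed form exp(-chi2 / 2).
    p_value = exp(-chi2 / 2.0)
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts: 200 pairs with excess sharing at a marker.
    print(asp_sharing_test(30, 100, 70))
```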

FAMILY BASED SCREENS TRIOS

Transmission Disequilibrium
[Diagram: trio with parents M1M2 and M2M2 and affected child M1M2; transmitted = M1M2, non-transmitted = M2M2]
Need: parents and one affected child
Procedure: compare the proportion of transmitted alleles and untransmitted alleles
Null: there is no preferential transmission, i.e. any given allele is transmitted to children 50% of the time
Issues: power; need heterozygotes (only heterozygous parents are informative)
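The comparison of transmitted and untransmitted alleles is commonly computed as a McNemar-type chi-square with 1 degree of freedom. A minimal sketch, assuming the counts have already been tallied from heterozygous parents only; the function name is illustrative.

```python
from math import erfc, sqrt


def tdt(transmitted, untransmitted):
    """Transmission disequilibrium test for one allele.

    transmitted / untransmitted: how often heterozygous parents did or
    did not pass the allele to an affected child.
    """
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (b - c) ** 2 / (b + c)
    # Tail probability of a chi-square with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical tallies from 100 heterozygous parents.
    print(tdt(60, 40))
```

Under the null of 50% transmission the two counts are expected to be equal and the statistic is 0.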

Linkage Screen QC
Mendelian inheritance errors: genotypes have to be physically possible based on the allelic properties mapped by a monk spending a lot of time with pea plants
Gender checks
Pedigree errors, i.e. input issues, data reporting
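A Mendelian inheritance check for a single autosomal marker in a trio can be sketched as below. This is an illustration only: genotypes are represented as pairs of allele labels (a hypothetical encoding), and sex-linked markers and missing genotypes are not handled.

```python
def mendel_consistent(father, mother, child):
    """Return True if the child's genotype can be formed by taking
    one allele from each parent (autosomal marker, no missing data)."""
    first, second = child
    return ((first in father and second in mother)
            or (second in father and first in mother))


if __name__ == "__main__":
    # Parents M1M2 and M2M2 can have an M1M2 child.
    print(mendel_consistent(("M1", "M2"), ("M2", "M2"), ("M1", "M2")))
    # Two M2M2 parents cannot: flag as a genotyping or pedigree error.
    print(mendel_consistent(("M2", "M2"), ("M2", "M2"), ("M1", "M2")))
```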

UNRELATEDS SCREENS

Association
[Diagram: counts of Allele A and allele b in CASES vs. CONTROLS: distribution difference?]
Mapping the relationship between an individual's genotypes and their phenotypes
No distance in the statistics: not positional like linkage
Null: there is no difference in the allele distribution between affecteds and unaffecteds
Allele specific (this is not the case for linkage)
Issues: stratification can give a result even if there is no true relationship, e.g. ETHNICITY

The Basics: Distance
No distance in the statistics: not positional like linkage
BUT: need to be aware of the concept of linkage disequilibrium (LD)
LD: UNEQUAL SORTING OF ALLELES SUCH THAT SOME ALLELES APPEAR TOGETHER MORE OFTEN THAN WOULD BE EXPECTED BY CHANCE
Don't measure this by transmission; measure it by looking at the probabilities of a given allele combination
Test statistic: D'
Test haplotype frequencies against the population frequencies of each allele
Low frequencies of recombinant haplotypes = high LD
High LD: typically D' > 0.8
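Testing a haplotype frequency against the allele frequencies can be sketched with the standard D and normalized D' definitions. This is a minimal illustration for two bi-allelic markers, assuming the haplotype frequency is already known (in practice it is usually estimated from unphased genotypes). Names are hypothetical.

```python
def ld_stats(hap_freq_ab, freq_a, freq_b):
    """Return (D, |D'|) for alleles A and B at two bi-allelic markers.

    hap_freq_ab: frequency of the haplotype carrying both A and B.
    freq_a, freq_b: population frequencies of allele A and allele B.
    """
    # D: observed haplotype frequency minus the frequency expected
    # if the two alleles sorted independently.
    d = hap_freq_ab - freq_a * freq_b
    if d >= 0:
        d_max = min(freq_a * (1 - freq_b), (1 - freq_a) * freq_b)
    else:
        d_max = min(freq_a * freq_b, (1 - freq_a) * (1 - freq_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return d, d_prime


if __name__ == "__main__":
    # A and B always travel together: complete LD.
    print(ld_stats(0.5, 0.5, 0.5))
    # Haplotype at the frequency expected by chance: no LD.
    print(ld_stats(0.25, 0.5, 0.5))
```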

The Basics
LD take-home point 1: in an association study, the allele found may merely be in LD with the causative allele
LD take-home point 2: LD can arise because the alleles are very close on the chromosome, but also because the mutation is recent, i.e. a rare recent mutation will be in LD with a lot of things

Markers: Single Nucleotide Polymorphisms (SNPs)
Change of a single nucleotide at a particular location in the genome
Naming by dbSNP (www.ncbi.gov): many different submitters to this DB all submitted their variations; these were then mapped against each other and each variation was assigned a unique number = rs number
Advantages:
- Common
- Could be the actual risk effect (i.e. coding change)
- Easy to type (insanely easy)
Disadvantages:
- Bi-allelic, less variability = not as informative as a microsatellite
- Too much of a good thing: how to tell what is really risk?

Markers: SNPs
Where to find the maps?
Physical: human genome sequenced!! Yeah. www.ncbi.nlm.gov, www.ensembl.org, genome.ucsc.edu
Frequencies: HapMap (www.hapmap.org)
- African, Asian, European (CEPH) families
- Measures transmission of mapped SNPs
- Gets at LD in these families
1,000 Genomes Project (www.1000genomes.org)
- NextGen sequencing on many more people: better info
- Pilot data: low coverage, n=180; targeted 1,000 genes in 1,000 individuals

Method: old school RFLP: restriction fragment length polymorphism

Method: new school
Microarray: allele-specific hybridization
Probes are designed to hybridize to one allele or the other

Method: newest school
NextGen: Next Generation Sequencing
- Sequencing-by-synthesis
- Sequencing-by-ligation
- Single-molecule sequencing
Unlike current microarrays, no predesign of probes or hybridization is necessary: you get the actual SNPs!

Test Statistic: Chi-square test
Compare observed allele counts to expected counts: is there a difference?
Null: the distribution of alleles is the same in affecteds and unaffecteds
OBSERVED:
              ALLELE 1    allele 2    TOTAL
AFFECTEDS     A1          a2          A1 + a2
UNAFFECTEDS   U1          u2          U1 + u2
TOTAL         A1 + U1     a2 + u2     total number of chromosomes
EXPECTED:
              ALLELE 1                              allele 2                              TOTAL
AFFECTEDS     (A1 + a2) * (A1 + U1) / total #chr    (A1 + a2) * (a2 + u2) / total #chr    A1 + a2
UNAFFECTEDS   (U1 + u2) * (A1 + U1) / total #chr    (U1 + u2) * (a2 + u2) / total #chr    U1 + u2
TOTAL         A1 + U1                               a2 + u2                               total number of chromosomes
χ² statistic: calculated by subtracting expected table counts from observed table counts, squaring the differences, dividing by the expected numbers, and summing, to yield a measure of the difference between observed data and expected data
What it isn't: a direct comparison of cases and controls
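The observed/expected calculation above can be sketched directly from the four allele counts. A minimal illustration, assuming a plain 2x2 allelic test with 1 degree of freedom and no continuity correction; the function name is hypothetical and the arguments follow the table labels A1, a2, U1, u2.

```python
from math import erfc, sqrt


def allelic_chi_square(A1, a2, U1, u2):
    """Chi-square test on a 2x2 table of allele counts
    (rows: affecteds / unaffecteds, columns: ALLELE 1 / allele 2)."""
    observed = ((A1, a2), (U1, u2))
    row_totals = (A1 + a2, U1 + u2)
    col_totals = (A1 + U1, a2 + u2)
    total_chr = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count = row total * column total / total #chr.
            expected = row_totals[i] * col_totals[j] / total_chr
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Tail probability of a chi-square with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts: 100 case and 100 control chromosomes.
    print(allelic_chi_square(60, 40, 40, 60))
```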

Test Statistic: Odds Ratio
Compare observed allele frequencies to each other: are there more cases for a particular allele?
Null: frequencies are the same
              ALLELE 1    allele 2
AFFECTEDS     A1          a2
UNAFFECTEDS   U1          u2
OR statistic: calculated by the cross product (A1*u2)/(U1*a2)
Why?
A1/U1 = odds of being a case given that an individual has ALLELE 1
a2/u2 = odds of being a case given that an individual has allele 2
ODDS RATIO = (A1/U1) divided by (a2/u2), which through simple algebra = (A1*u2)/(U1*a2)
What it isn't: a measure of the risk of a particular allele, i.e. big ORs don't mean alleles are common and everyone is going to get the disease
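The cross-product and the ratio-of-odds forms can be checked against each other in a few lines. A small sketch with hypothetical counts; the function name is illustrative and zero counts are not handled.

```python
def odds_ratio(A1, a2, U1, u2):
    """Allelic odds ratio via the cross product (A1*u2)/(U1*a2)."""
    return (A1 * u2) / (U1 * a2)


if __name__ == "__main__":
    A1, a2, U1, u2 = 60, 40, 40, 60   # hypothetical allele counts
    odds_allele_1 = A1 / U1           # odds of being a case with ALLELE 1
    odds_allele_2 = a2 / u2           # odds of being a case with allele 2
    # Both routes give the same number.
    print(odds_ratio(A1, a2, U1, u2), odds_allele_1 / odds_allele_2)
```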

Test Statistic: Confidence Interval
The equivalent of a p-value for an odds ratio: a measure of the robustness of the OR result
Remember that OR = (A1*u2)/(U1*a2)
So if the CI spans 1, the OR is not significant, i.e. there is no difference in the probability of being a case or a control given a certain allele
CIs entirely below 1 (negative on the log scale) = PROTECTIVE EFFECT
CIs entirely above 1 (positive on the log scale) = RISK EFFECT
SINCE IT IS A RATIO OF ALLELES, THE DIRECTION DEPENDS UPON THE ALLELE YOU ARE USING AS YOUR TEST ALLELE. TYPICALLY ONE TESTS THE MINOR ALLELE AS THE DISEASE ALLELE.
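One common way to put an interval around an odds ratio is the log (Woolf) method, sketched below. The slide does not say which method is intended, so this choice is an assumption; z = 1.96 gives an approximate 95% interval, and all four counts are assumed to be non-zero. The function name is hypothetical.

```python
from math import exp, log, sqrt


def odds_ratio_ci(A1, a2, U1, u2, z=1.96):
    """Odds ratio with an approximate confidence interval computed
    on the log scale (Woolf method). Returns (OR, lower, upper)."""
    or_value = (A1 * u2) / (U1 * a2)
    # Standard error of ln(OR) from the four cell counts.
    se_log_or = sqrt(1 / A1 + 1 / a2 + 1 / U1 + 1 / u2)
    lower = exp(log(or_value) - z * se_log_or)
    upper = exp(log(or_value) + z * se_log_or)
    return or_value, lower, upper


if __name__ == "__main__":
    or_value, lower, upper = odds_ratio_ci(60, 40, 40, 60)
    verdict = "spans 1" if lower <= 1.0 <= upper else "excludes 1"
    print(round(or_value, 2), round(lower, 2), round(upper, 2), verdict)
```

Swapping which allele is treated as the test allele inverts the OR and both interval limits, which is why the direction of effect depends on the choice of test allele.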

Test Statistic Confidence Interval: Forest Plots

Association Screen QC
Hardy-Weinberg Equilibrium (HWE)
When there is:
1. Random mating within a large population
2. No selection or recent mutations
Then: allelic/genotypic frequencies should remain stable from generation to generation
Mostly test controls and cases separately; some argue that cases should be out of HWE for markers of interest
BEST TEST: MARKERS KNOWN TO NOT BE LINKED TO DISEASE
Gender checks
Marker frequencies in controls against public DB
Population checks: make sure no one in your population is related; check also for outliers
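A Hardy-Weinberg check for one bi-allelic marker can be sketched as a chi-square test of observed genotype counts against the p², 2pq, q² expectation. This is a simple illustration with 1 degree of freedom; exact tests are often preferred when counts are small, and the function name is hypothetical.

```python
from math import erfc, sqrt


def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square test of genotype counts against Hardy-Weinberg
    proportions for a bi-allelic marker. Returns (chi2, p_value)."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele a
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_AA, n_Aa, n_aa)
    # Cells with zero expectation (monomorphic marker) are skipped.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    p_value = erfc(sqrt(chi2 / 2.0))  # chi-square tail, 1 degree of freedom
    return chi2, p_value


if __name__ == "__main__":
    print(hwe_chi_square(25, 50, 25))   # exactly in HWE
    print(hwe_chi_square(50, 0, 50))    # no heterozygotes at all
```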
