Estoril Education Day - Experimental Design in Proteomics. October 23rd, 2010. Peter James
Note Taking: All the PowerPoint slides from the talks are available for download from: http://www.immun.lth.se/education/protein_technology/hupo%2C_eupa_and_nordic_qp_courses/estoril_education_day/
Is this Course Necessary? Journal Guidelines: Journal of Proteome Research The methods for how the biological reliability of measurements was validated using biological replicates, statistical methods, independent experiments, etc. The methods for how the analytical reliability of measurements was validated using technical replicates and statistical methods. The treatment of relevant systematic error effects such as peptides shared by multiple proteins, interference from overlapping precursor ions, incomplete isotope labeling, bias correction for pipetting error, etc. The treatment of random error issues such as outlier rejection and the categorical exclusion of data by thresholds, for example, based on signal to noise or minimum ion counts. All quantitative results upon which conclusions are based must bear proper estimates of uncertainty and the methods for the error analysis should be clearly described.
Is this Course Necessary? Journal Guidelines: Journal of Proteomics The experimental design must be provided and must include details of the number of biological and analytical replicates. Only one biological/analytical replicate will not be acceptable. In clinical studies, it is highly desirable that a power analysis predicting the appropriate sample size for subsequent statistical analysis of the data is carried out. For expression analysis studies, summary statistics (mean, standard deviation) must be provided and results of statistical analysis must be shown. Reporting fold differences alone is not acceptable. Authors must report the following: methods of data normalization, transformation, missing value handling, the statistical tests used, the degrees of freedom and the statistical package or program used. Where biologically important differences in protein (gene) expression are reported, confirmatory data (e.g. from Western blot, RT-PCR analysis, etc.) are desirable. For biomarker discovery/validation studies, the sensitivity and specificity of the biomarker(s) should be provided wherever possible. It is desirable that receiver operator characteristic (ROC) curves and areas under the curves are given.
Is this Course Necessary? Journal Guidelines: Molecular and Cellular Proteomics A thorough description of the experimental design, including the biological sample size and number of technical replicates of such samples or preparations derived thereof so that (bio)statistical methods may be used to assess independently the significance of the results presented. Studies in which the number of biological and/or technical replicates equals one, can generally not be accepted particularly if only few or a single peptide is used for quantification. In exceptional circumstances, other lines of evidence such as time or dose dependent experiments may be acceptable instead of technical replicates.
Is this Course Necessary? Journal Guidelines: Molecular and Cellular Proteomics The experimental design must be provided and must include details of the number of biological and analytical replicates. Only one biological/analytical replicate will not be acceptable. In clinical studies, it is highly desirable that a power analysis predicting the appropriate sample size for subsequent statistical analysis of the data is carried out. For expression analysis studies, summary statistics (mean, standard deviation) must be provided and results of statistical analysis must be shown. Reporting fold differences alone is not acceptable. Authors must report the following: methods of data normalization, transformation, missing value handling, the statistical tests used, the degrees of freedom and the statistical package or program used. Where biologically important differences in protein (gene) expression are reported, confirmatory data (e.g. from validated immunoassays) are desirable. For biomarker discovery/validation studies, the sensitivity and specificity of the biomarker(s) should be provided wherever possible. It is desirable that receiver operator characteristic curves and areas under the curves are given.
Talk Overview: Introduction to experimental design. Sources of error. How many replicates, controls? Experimental design flow. Pilot experiments. Normal data? Parametric, non-parametric. Idea of power to calculate needs. Journal guidelines.
Experimental Design. Definition: the statistics that happens before an experiment. Why think about it? Proper planning can save having to repeat an entire experiment. It reduces analysis time and lowers the error rate and costs. It reduces experimental time to a minimum. Design the experiment to answer a biological question.
Experimental Design Flow: Pilot Study; Variation, Cluster and Power Analysis; Full Scale Experiment; Publication; Data Validation; Bioinformatics; Complete Analysis
Goals of Experimental Design. Avoid experimental artifacts. Eliminate bias: use a simultaneous control group, randomization, blinding. Reduce sampling error: replication, balance, blocking.
Experimental Artifacts. An experimental artifact is a bias in a measurement produced by unintended consequences of experimental procedures, e.g. using doxycycline to activate a cloned gene in a viral vector with a tetO promoter switches on the gene, but also many other pathways; a scrambled insert must be used as a control. Conduct your experiments under conditions that are as close to reality as possible to avoid artifacts. Inadequate CO2 in cell culture experiments leads to large variations in pH and hence protein expression.
Can I Compare my Data Sets? (Figure: non-normalised vs. normalised data.) Correct for dye or isotope label incorporation efficiency. Swap labels between replicates, e.g. Cy3/Cy5 or TMT126 for 131.
Scaling Data to a Target Intensity. (Figure: Exp. 1 to Exp. 7 scaled to a target intensity of 100.) TGT = Average intensity × Scaling Factor. If the scaling factor is < 3-fold, a comparison can be made between all experiments in the set.
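The scaling rule on this slide can be sketched in a few lines. This is a minimal illustration, not the method of any particular software package: the intensity values, the helper names scaling_factor and comparable, and the reading of "< 3 fold" as applying in both directions (up-scaling and down-scaling) are assumptions made for the example.

```python
from statistics import mean

TARGET_INTENSITY = 100.0  # target value taken from the slide
MAX_FOLD = 3.0            # comparability limit quoted on the slide


def scaling_factor(intensities, target=TARGET_INTENSITY):
    """Return the factor that brings the average intensity to the target."""
    return target / mean(intensities)


def comparable(factors, max_fold=MAX_FOLD):
    """True if every factor (or its inverse) stays below the fold limit.

    Treating the limit symmetrically is an assumption of this sketch.
    """
    return all(max(f, 1.0 / f) < max_fold for f in factors)


# Hypothetical raw intensities for three experiments (illustrative values only).
experiments = {
    "Exp. 1": [80.0, 120.0, 100.0],
    "Exp. 2": [40.0, 60.0, 50.0],
    "Exp. 3": [150.0, 250.0, 200.0],
}

factors = {name: scaling_factor(vals) for name, vals in experiments.items()}
scaled = {name: [v * factors[name] for v in vals] for name, vals in experiments.items()}
print(factors, comparable(factors.values()))
```

After scaling, every experiment has the same average intensity, so the check on the factors is what indicates whether the set was similar enough to compare in the first place.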
Eliminating Bias. Use a control group: a control group is a group of subjects left untreated for the treatment of interest but otherwise experiencing the same conditions as the treated subjects. Randomization: the random assignment of treatments to units in an experimental study, which breaks the association between potential confounding variables and the explanatory variables. Blinding: some of the persons involved are prevented from knowing certain information that might lead to conscious or unconscious bias on their part, invalidating the results. Single blind: the experimenter knows all the facts, the subjects do not. Double blind: neither experimenter nor subjects know the facts until the finish.
Randomization Without randomization, the confounding variable differs among treatments
Randomization With randomization, the confounding variable does not differ among treatments
Balance: In a balanced experiment, all treatments have equal sample size. This maximizes power. This makes tests more robust to violations of their assumptions.
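Random and balanced assignment, as described on the slides above, can be sketched with the Python standard library. The sample labels, the randomize helper and the fixed seed are hypothetical and only serve the example; in practice the seed would be recorded so that the allocation can be reproduced.

```python
import random


def randomize(units, treatments, seed=None):
    """Randomly assign treatments to units in (near-)equal numbers."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    # Deal the shuffled units to the treatments in turn so group sizes stay balanced.
    return {unit: treatments[i % len(treatments)] for i, unit in enumerate(shuffled)}


samples = [f"sample_{i}" for i in range(1, 9)]  # hypothetical sample labels
assignment = randomize(samples, ["control", "treated"], seed=42)
for unit in samples:
    print(unit, assignment[unit])
```

Shuffling first and then dealing the units out in turn gives both properties at once: which unit gets which treatment is random, and the groups end up the same size.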
Blocking: Blocking is the grouping of experimental units that have similar properties. Within each block, treatments are randomly assigned to experimental units (randomized block design).
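A randomized block design can be sketched as randomization carried out separately inside each block. The blocking variable used here (preparation day), the sample names and the randomized_blocks helper are assumptions for illustration only.

```python
import random
from collections import defaultdict


def randomized_blocks(block_of, treatments, seed=None):
    """Randomly assign treatments to units separately within each block."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for unit, block in block_of.items():
        blocks[block].append(unit)
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)  # randomize only among units of the same block
        for i, unit in enumerate(members):
            assignment[unit] = treatments[i % len(treatments)]
    return assignment


# Hypothetical example: samples blocked by the day on which they were prepared.
block_of = {"s1": "day1", "s2": "day1", "s3": "day2", "s4": "day2"}
plan = randomized_blocks(block_of, ["control", "treated"], seed=1)
print(plan)
```

Because every block receives every treatment, a difference between blocks (here, between days) cannot be mistaken for a treatment effect.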
Practical Questions to Consider. How much variability does your system have? Understand and minimize variation. How many treatments? How many controls? Comparative analysis (one experimental condition) or serial analysis design (multiple conditions)? What level of significance is needed? More replicates are needed for subtle changes.
Three Sources of Variability. Biological: differences between samples (the ultimate goal of the research). Technical: sample preparation (protocols and operator). Systematic: MS analysis (instruments, reagents, settings).
Experimental Replicates. Technical replicates (from the same sample): allow an evaluation of the contribution of bench effects to the overall variability. Biological replicates (from different samples): replicates that reproduce the biological variables explored in the experiment; they permit the use of formal statistical tests and also allow the interrogation of technical variability. Gold standard: use of a standard protein digest to evaluate sensitivity, mass accuracy and search parameter settings; allows an estimation of systematic variation.
Effective studies may need many replicates. (Figure: treatments vs. controls, average differential expression.)
How many Samples do I need? You should estimate the size of the three error sources. The best way is to do a pilot experiment. Use a minimum of three biological replicates. Use a minimum of two technical replicates. Check systematic errors with a gold standard. Do a power analysis.
Systematic Error Estimation: reproducibility of retention time over 5 days (figure).
Technical Error Estimation: Coefficient of Variation. CV% is a measure of the relative variability amongst replicates, defined as the standard deviation (σ) divided by the mean, multiplied by 100. Example: 5 values representing 5 replicates: 230.4, 241.7, 252.9, 338.8, 178.9. Mean = 248.54; σ = 57.9 (sample standard deviation); CV% = 23.3%.
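The CV% calculation can be reproduced with the five replicate values from the slide. This sketch assumes the sample standard deviation (n - 1 in the denominator); the function name cv_percent is illustrative.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation in percent: sample SD divided by the mean, times 100."""
    return stdev(values) / mean(values) * 100.0


replicates = [230.4, 241.7, 252.9, 338.8, 178.9]  # the five values from the slide
print(f"mean = {mean(replicates):.2f}")
print(f"SD   = {stdev(replicates):.2f}")
print(f"CV%  = {cv_percent(replicates):.2f}")
```

Using the population standard deviation (statistics.pstdev) instead would give a smaller CV%, so it is worth stating which definition was used when reporting the value.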
Which Statistical Test to Use? Assess the normality for each protein species, then select a test from one of the 2 families of tests: parametric or non-parametric. Student's t-test assumes normality, independent sampling, and homogeneity of variance. Mann-Whitney assumes independent sampling but not a normal distribution. (Figure: frequency histograms illustrating the parametric and non-parametric cases.)
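To make the two families of tests concrete, the sketch below computes the test statistic of each from its definition, using only the standard library. The data are hypothetical, and the functions return statistics only: turning them into p-values requires the t and U reference distributions, which a statistics package provides.

```python
from math import sqrt
from statistics import mean, variance


def student_t(a, b):
    """Return Student's two-sample t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


def mann_whitney_u(a, b):
    """Return the Mann-Whitney U statistic for sample a (ties count one half)."""
    return sum((x > y) + 0.5 * (x == y) for x in a for y in b)


# Hypothetical log-intensities for one protein in two groups.
control = [1.0, 2.0, 3.0]
treated = [4.0, 5.0, 6.0]
t_stat, dof = student_t(control, treated)
u_stat = mann_whitney_u(control, treated)
print(t_stat, dof, u_stat)
```

The t statistic uses the actual values (means and variances), whereas U depends only on how the values of the two groups are ordered, which is why it makes no assumption of normality.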
Is my Data Normally Distributed? A q-q plot is a plot of the quantiles of data set 1 against the quantiles of data set 2. A quantile is the value below which a given proportion of the observations falls; the 50% quantile is equivalent to the median. If the two sets come from a population with the same distribution, the points should fall approximately along a 45° reference line. Alternatively, plot the data as a histogram: if it shows a symmetrical peak about the mean and about 68% of the data lies within 1 standard deviation of the mean, the data is approximately normally distributed.
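Both checks on this slide can be sketched with the standard library. In this example the second data set of the q-q plot is replaced by the theoretical quantiles of a normal distribution fitted to the data, which is an assumption of the sketch; the simulated data and the function names are illustrative only.

```python
from statistics import NormalDist, mean, stdev


def fraction_within_one_sd(values):
    """Fraction of observations lying within 1 standard deviation of the mean."""
    m, s = mean(values), stdev(values)
    return sum(abs(v - m) <= s for v in values) / len(values)


def normal_qq_points(values):
    """Pair each sorted observation with the matching quantile of a fitted normal."""
    fitted = NormalDist(mean(values), stdev(values))
    n = len(values)
    # Plotting positions (i + 0.5) / n avoid the undefined 0% and 100% quantiles.
    theoretical = [fitted.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, sorted(values)))


# Simulated, normally distributed data standing in for real measurements.
simulated = NormalDist(0.0, 1.0).samples(1000, seed=1)
points = normal_qq_points(simulated)
print(f"within 1 SD: {fraction_within_one_sd(simulated):.2f}")
```

Plotting the pairs in points should give a cloud close to the 45° line for normal data; systematic curvature at the ends is the usual sign of skewed or heavy-tailed data.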
Biological Error Estimation: Does the Experiment make sense? Hierarchical clustering is an unsupervised process: it finds structures in unlabelled data. A cluster is a set of objects (replicates) that are similar to each other and dissimilar to the objects in other clusters. It is a basic way of checking results: Do similar biological replicates cluster? Do technical replicates cluster within biological clusters?
Estimation of Replicates Needed: How many replicates must I have to prove my hypothesis? You must define a null hypothesis: the hypothesis that there is no statistical difference between control and experiment at a defined confidence level. Power analysis can provide an estimate of the samples needed. One must define a confidence level. One must balance sample size against error rate and size of effect.
Visualising Data - Clustering
Hierarchical Clustering. The Nearest Neighbor Algorithm is a bottom-up approach: start with n nodes (n is the size of the sample), merge the 2 most similar nodes at each step, and stop when the desired number of clusters is reached.
Nearest Neighbour Algorithm (figure series): Level 1, k = 8 clusters; Level 2, k = 7; Level 3, k = 6; Level 4, k = 5; Level 5, k = 4; Level 6, k = 3; Level 7, k = 2; Level 8, k = 1 cluster. Technical replicates should cluster together within biological replicates.
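The bottom-up nearest neighbour procedure described on the clustering slides can be sketched as follows. This is a minimal, unoptimized illustration assuming Euclidean distance between intensity profiles; the replicate names, the two-value profiles and the single_linkage helper are hypothetical.

```python
from math import dist


def single_linkage(profiles, n_clusters):
    """Merge the two nearest clusters repeatedly until n_clusters remain."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[name] for name in profiles]  # start with n single-sample nodes
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Nearest neighbour rule: distance between the closest pair of members.
                d = min(dist(profiles[a], profiles[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # merge the two most similar nodes
    return clusters


# Hypothetical intensity profiles: two technical replicates of two biological samples.
profiles = {
    "bio1_tech1": (1.0, 1.0),
    "bio1_tech2": (1.1, 0.9),
    "bio2_tech1": (5.0, 5.0),
    "bio2_tech2": (5.2, 5.1),
}
clusters = single_linkage(profiles, 2)
print(clusters)
```

In this toy data set the technical replicates of each biological sample end up in the same cluster, which is the pattern to look for in real data; a technical replicate landing in the wrong cluster suggests a sample handling or labelling problem.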
Verification. Orthogonal validation (Physiol Genomics 28: 24-32, 2006): Western blots, enzyme activity assays, etc. But if you don't see the change a second time, is it a false positive in the first experiment or a false negative in the second? New samples are needed. Why? Measurement error does not lead to false positives; rather, there is a need to validate against sampling variability. Carry out a power analysis.
Power Analysis. You must define a null hypothesis H0: there is no difference between the experiments and controls. Finding no difference does not prove the null hypothesis; we simply do not have evidence to reject it. Lack of a significant effect does not have to signify that the means are equal: perhaps an effect exists, but the data is too noisy to demonstrate it. We need to define the power of the experiment: the probability of detecting a real effect, i.e. of not making a type II error.
Possible Experimental Outcomes
                                    Statistically significant           Not statistically significant
                                    (p < threshold, H0 rejected)        (p > threshold, H0 retained)
Biologically no change (H0 true)    False positive: type I error (α)    Correct rejection (true negative)
Biological change (H0 false)        Correct acceptance (true positive)  False negative: type II error (β)
What is Power? Power is your ability to find a difference when a real difference exists. The power of a study is determined by three factors: alpha level (the p-value threshold, i.e. how many false positives are allowed); sample size (the number of experiments needed to get a result); effect size (how large the biological effect is, i.e. the separation of the means relative to the error variance). How do you Calculate Power? The best freeware solution, G*Power, is available at http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3 and works on Mac OS X and Windows XP/Vista.
Power and Sample Size. Power analysis can be used to estimate the sample size required for a particular study. Too small a sample size and an effect may be missed; too large a sample size and the study becomes too expensive. Different formulae/tables for calculating sample size are required according to the experimental design.
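As a rough illustration of how sample size follows from alpha, power and effect size, the sketch below uses the normal approximation for a two-sided comparison of two equally sized groups. This is one textbook formula among several and not the method used by G*Power, which works with the t distribution and therefore tends to return slightly larger numbers; the function name and default values are assumptions of the example.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate replicates per group for a two-sided, two-group comparison.

    effect_size is the difference between the means divided by the standard deviation.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the chosen alpha
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


for d in (0.5, 1.0, 2.0):
    print(f"effect size {d}: {samples_per_group(d)} replicates per group")
```

Halving the effect size roughly quadruples the number of replicates needed, which is why subtle changes are so much more expensive to demonstrate than large ones.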
Power and Effect Size: As the separation between two means increases, the power also increases.
Power and Effect Size: As the variability about a mean decreases, the power also increases.
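The two statements above can be checked numerically. The sketch below approximates the power of a two-sided, two-group comparison with the normal distribution, ignoring the negligible contribution of the opposite tail; the function name, the group size and the example values are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample(mean_diff, sd, n_per_group, alpha=0.05):
    """Approximate power to detect mean_diff given a common SD and equal group sizes."""
    z = NormalDist()
    effect_size = abs(mean_diff) / sd
    return z.cdf(effect_size * sqrt(n_per_group / 2) - z.inv_cdf(1 - alpha / 2))


n = 6  # hypothetical number of biological replicates per group
for diff in (0.5, 1.0, 2.0):
    print(f"difference {diff}, SD 1.0: power = {power_two_sample(diff, 1.0, n):.2f}")
for sd in (2.0, 1.0, 0.5):
    print(f"difference 1.0, SD {sd}: power = {power_two_sample(1.0, sd, n):.2f}")
```

Both loops show the same mechanism: what matters is the ratio of the difference between the means to the variability, so doubling the separation and halving the standard deviation have the same effect on power.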
Should I Pool my Samples? Pooling: taking the same amount of protein from different samples to create a pool. Assumption: the signal from the pool represents the mathematical average. Advantage: can increase the number of samples measured. Disadvantage: intra-group biological variation is lost. Option: sub-pooling, which makes it possible to estimate biological variation. Pooling can result in irreversible loss of information. A pool of all samples can be used as an internal reference in DIGE, iTRAQ, etc. Pool a minimum of three and a maximum of five samples. Equal pooling of samples is essential.
Mixing Replicate Types. Taking 3 readings on each of 3 biological replicates in each of 2 groups gives a total of 18 readings. This is an example of pseudoreplication: there are only really 3 different subjects per group. Student's t-test requires independent samples and cannot be used on the individual readings. A test which allows for hierarchy in the data is needed, such as a nested ANOVA.
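One simple way to avoid pseudoreplication, shown here as an alternative to (not an implementation of) the nested ANOVA named on the slide, is to average the technical readings so that each biological replicate contributes a single value. The readings, sample names and the collapse_technical helper are hypothetical.

```python
from statistics import mean


def collapse_technical(readings):
    """Average the technical readings so each biological replicate counts once."""
    return [mean(values) for values in readings.values()]


# Hypothetical data: 3 technical readings on each of 3 biological replicates per group.
control = {"c1": [10.1, 10.3, 9.9], "c2": [11.0, 11.2, 10.8], "c3": [9.5, 9.7, 9.3]}
treated = {"t1": [12.0, 12.2, 11.8], "t2": [13.1, 12.9, 13.0], "t3": [12.5, 12.4, 12.6]}

control_means = collapse_technical(control)
treated_means = collapse_technical(treated)
n_readings = sum(len(v) for group in (control, treated) for v in group.values())
n_independent = len(control_means) + len(treated_means)
print(f"{n_readings} readings, but only {n_independent} independent values")
```

Averaging discards the information about technical variability, which is the price of this shortcut; a nested or mixed model keeps both levels and is the better choice when that information matters.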
Getting Help. Learn the basics of statistics; look up Wikipedia for a starting point. Collaborate with statisticians, informatics groups, etc. BEFORE you start. Use a reliable statistics program such as SPSS (now called PASW), which has extensive on-line tutorials.
Thanks to the following for providing many slides: Morten Krogh, Michaela Scigelova, Natasha Karp, Marianne Sandin, Fredrik Levander and many others.