Descriptive Statistics Tutorial


Measures of Central Tendency: Mean, Median, and Mode

Statistics is an important aspect of most fields of science, and toxicology is certainly no exception. Statistics matters in a field such as this because no two individuals are the same: there is a lot of variation in any population. To make sense of this variation, it is sometimes important to generalize, and this is often done by providing an average. Given the range of salaries in the safety and health profession, for instance, one might argue that the average salary in the US is around $70,000 per year.

There are some questions that might be asked about this figure, however. For instance, how was this average determined? There are actually a number of ways to determine an average, and each is used for a different reason.

Usually when somebody discusses salaries, they use an average known as the median. The median is determined by lining up all the salaries in numerical order and picking the middle number. If there is an even number of values in the sample, the median equals the sum of the middle two values divided by two. The median is used for salaries because salary data often contain extreme outliers. There may be safety professionals, for instance, who have made millions of dollars as a result of owning a very successful consulting firm. As a measure of central tendency, the median is barely affected by these extreme values, because one is simply lining up the numbers and picking the one in the middle.

Here is an example. Calculate the median for the following data, which represent the number of hours rodents in a sample went before positively responding to a specific experimental treatment:

2, 11, 14, 15, 15, 17, 17, 17, 17, 19, 19, 20, 20, 21, 22, 378

There are 16 numbers here. The two middle numbers (the 8th and 9th) are 17 and 17, so the median is:

17 + 17 = 34
34 / 2 = 17

If we instead calculate the average known as the mean, by adding all the numbers together and dividing by the total number of values in the sample, we get 624 / 16 = 39. Notice how that one large number (378) moved the average from the middle value of 17 to something much larger. This is why, when we know there might be unusual outliers, we often report the median as the measure of central tendency.

Sometimes outliers are also removed in order to conduct standard statistical analysis. Perhaps something unusual was going on with the rodent that failed to respond for 378 hours, something that did not really reflect the response of the general population.

One other measure of central tendency that is sometimes used is the mode. This is used when a person wants to know which value is repeated the most in a sample. Looking at our example above, we can see that the number 17 is repeated 4 times. No other number is repeated that many times, so the mode is 17. A sample can have more than one mode if two or more values tie for the highest count.

Measures of Dispersion

The Range

When we talk about measures of dispersion, we are referring to how widely the data are spread. There are a number of ways to describe this. One is to report the range, which is the difference between the highest value and the lowest value. Let's look at our sample above again with the outlier removed:

2, 11, 14, 15, 15, 17, 17, 17, 17, 19, 19, 20, 20, 21, 22

Here the highest number is 22 and the lowest number is 2, so the range is 20. Note that if we kept the outlier in the sample, the range would be much larger (378 - 2 = 376).

The range is the most basic measure of dispersion and does not convey a lot of information. It is like asking someone to describe his or her daily driving habits and getting a response like "Sometimes I do not drive at all, and the most I drive is 20 miles on a given day." This does not convey much information about driving habits, does it? However, if the person indicated he or she drove an average of 10 miles a day and also reported the range, we would have a much better idea of his or her driving habits.

Two more commonly used measures of dispersion are the variance and the standard deviation. They are related because the latter is the square root of the former. That is, take the square root of the variance and you get the standard deviation.

In order to discuss these concepts further, it is important to first consider the concept of the normal distribution, commonly known as the bell curve. Chances are you have seen something like this, or at least heard of the bell curve, somewhere along the way:

[Figure: the bell curve of a normal distribution]

Often when we take a measurement of different subjects in a sample, the distribution is best represented as a bell curve, as depicted above. Let's consider a variable like weight, for example, and its relationship with blood alcohol dosages. If we take the weight measurement of 3,000 randomly selected adult males, we will likely find a few individuals who are much lighter than average and a few who are much heavier, but most people will be somewhere in the middle, much closer to the average. This is why the bell curve is shaped the way it is. The left tail represents the few very light individuals in the population and the right tail represents the few very heavy individuals, but most males fall somewhere in the middle, around the mean, which is represented by the middle and highest point of the curve.
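The measures introduced so far (median, mean, mode, and range) can be checked against the rodent data with a short Python sketch. It assumes Python 3.8 or later, which provides `statistics.multimode`. The variable names and the `data_range` helper are illustrative choices and not part of the original tutorial.

```python
from statistics import mean, median, multimode

# Hours before a positive response, as listed in the tutorial (outlier included).
hours = [2, 11, 14, 15, 15, 17, 17, 17, 17, 19, 19, 20, 20, 21, 22, 378]

# The same sample with the 378-hour outlier removed.
trimmed = [h for h in hours if h != 378]


def data_range(values):
    """Range: the highest value minus the lowest value."""
    return max(values) - min(values)


print("median:", median(hours))             # middle of the sorted values
print("mean:", mean(hours))                 # pulled upward by the outlier
print("modes:", multimode(hours))           # every value tied for most frequent
print("range with outlier:", data_range(hours))
print("range without outlier:", data_range(trimmed))
```

Running it should reproduce the figures in the text: a median of 17, a mean of 39, a single mode of 17, and a range of 20 once the outlier is removed. `multimode` returns a list precisely because a sample can have more than one mode.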

Variance and Standard Deviation

A more commonly used measure of dispersion in most sciences is the standard deviation. This value roughly represents the typical distance of the data from the mean. Let's take a look at the normal distribution diagram below, where we set our mean to zero. Please note that the Greek letter σ (sigma) is used here to represent the population standard deviation.

[Figure: normal distribution with the mean set to zero. Source: Standard Deviation. (n.d.)]

In a normally distributed sample, about 68.2% of the observations fall within one standard deviation on either side of the mean. Considering our adult male sample above, that would mean that 68.2% of all males, or about 2,046 of the 3,000 individuals, weighed in within one standard deviation of the mean.

It is also important to note that the standard deviation is calculated from the data in the sample, so its value depends on the variation within that sample. Let's say we have two samples of 3,000 individuals from different countries that we will weigh for a study. We calculate a standard deviation of 35 pounds for group A and 10 pounds for group B. In group B, most people (68.2%, in fact) are within 10 pounds of the average. In group A, however, the same 68.2% are spread over a wider band, within 35 pounds of the average. We can say, therefore, that there is a lot more variation in group A than in group B.

This is why the standard deviation is frequently reported along with the mean. It gives the reader an idea of how much variation exists in the population or sample being considered. If one hears, for instance, that the average weight of a group is 170 pounds with a standard deviation of 10.7 pounds, it gives a much better picture than reporting the mean alone.
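The 68.2% figure can be checked with a minimal Python sketch. It assumes a perfectly normal distribution and uses the standard identity that the fraction within k standard deviations of the mean is erf(k / √2). The function name `fraction_within` and the constant `SAMPLE_SIZE` are illustrative and not from the tutorial.

```python
import math


def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


SAMPLE_SIZE = 3000  # the adult male sample discussed in the text

share = fraction_within(1.0)

# Head count implied by the tutorial's rounded figure of 68.2%.
expected_count = round(0.682 * SAMPLE_SIZE)

print(f"share within one standard deviation: {share:.4f}")
print("individuals within one standard deviation:", expected_count)
```

The computed share comes out slightly above the rounded 68.2% quoted in the text (closer to 68.3%), and the head count for 3,000 individuals is 2,046. Real samples are only approximately normal, so observed counts will differ somewhat.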
Calculating Standard Deviation

We already know how to calculate the mean, but as indicated above, it is also good to be able to determine the standard deviation of a sample. It is quite a bit of work to calculate, but it is doable if it is undertaken step by step.

The first step in calculating the standard deviation is to calculate the variance. The variance is the square of the standard deviation, so once the variance is calculated, one only needs to press the square root button on the calculator to get the standard deviation.

Above we pointed out that the standard deviation of a population is typically depicted with the Greek letter σ. When dealing with samples (as opposed to an entire population), the value is reported as an italicized letter s. Since the sample variance is simply the square of the sample standard deviation, it is commonly depicted as s². For the purposes of this tutorial, we will not get into too much discussion regarding the usefulness of the variance value, except to indicate that it needs to be calculated first in order to determine the standard deviation.

Here is how we calculate the variance of a sample:

$$ s^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1} $$

And here is the formula for calculating the standard deviation of a sample:

$$ s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}} $$

(Formulas from Standard Deviation (n.d.). Here x_i is each value in the sample, x̄ is the sample mean, and N is the number of values in the sample.)

Again, both of these look like fairly scary formulas, but they are nothing to be overly concerned about. For our purposes, we will calculate the variance using a step-by-step process and save deciphering these seemingly complex equations for another time. Here are the steps for calculating a sample standard deviation (the formulas above actually instruct us to perform these steps):

1. Calculate the mean.
2. List all of the values of the sample in a column.
3. Subtract the mean from the value in each row.
4. Square the result in each row.
5. Add all of these squared values together.
6. Divide this sum by the number of values in the sample minus 1 to get the variance.
7. Take the square root of the variance to get the standard deviation.

OK, now that you know the steps, let's give this a whirl. Say we are going to do a preliminary study on the diameter of a skin rash exhibited on rodents after being treated with a very small quantity of chemical A. Since we are just trying to get a general idea of the response, we only treat eight individuals. We get the following values in millimeters:

10, 8, 10, 8, 6, 4, 12, 6

The first step is to find the mean. We add the values together and divide the total by the number of values in the sample (8). We end up with an average of 8 mm (64 / 8 = 8).

The next step is to list each value in the sample and subtract the mean:

10 - 8 = 2
8 - 8 = 0
10 - 8 = 2
8 - 8 = 0
6 - 8 = -2
4 - 8 = -4
12 - 8 = 4
6 - 8 = -2

The next step is to square each result we just obtained:

2² = 4
0² = 0
2² = 4
0² = 0
(-2)² = 4
(-4)² = 16
4² = 16
(-2)² = 4

The next step is to add these results:

4 + 0 + 4 + 0 + 4 + 16 + 16 + 4 = 48

Finally, to get our variance, we divide this total by N - 1 (the sample size minus 1):

48 / 7 = 6.86

So 6.86 is our variance. Now do you remember how to determine the sample standard deviation from the sample variance? Correct! You just hit the square root button on your calculator:

√6.86 ≈ 2.62

Based on this information, we would report our sample mean as 8.0 mm and our standard deviation as 2.62 mm. This is, of course, a very small sample, and it clearly does not reflect a perfectly normal distribution. A larger sample needs to be obtained, and it is likely that a much larger sample would be much closer to normally distributed. Regardless, you now know the steps for calculating the mean, variance, and standard deviation.

References:

Standard Deviation. (n.d.). Wikipedia. Retrieved from: https://en.wikipedia.org/wiki/standard_deviation

Note: Although using Wikipedia sources is typically discouraged because they have the potential to be unreliable, this tutorial used such sources primarily to obtain images. The writer of this tutorial is well versed in the use of statistics and therefore able to evaluate the reliability of the images used.
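As a cross-check on the worked example in this tutorial, the seven steps for the sample standard deviation can be sketched in Python. The function and variable names are illustrative assumptions, not part of the original tutorial, and the final comparison against `statistics.stdev` is included only to confirm that the hand-rolled steps agree with the standard library.

```python
import math
import statistics

# Rash diameters in millimeters from the eight treated rodents.
diameters_mm = [10, 8, 10, 8, 6, 4, 12, 6]


def sample_variance(values):
    """Sample variance, following steps 1 to 6 of the tutorial."""
    n = len(values)
    mean = sum(values) / n                    # step 1: calculate the mean
    deviations = [x - mean for x in values]   # steps 2-3: subtract the mean from each value
    squared = [d ** 2 for d in deviations]    # step 4: square each result
    total = sum(squared)                      # step 5: add the squared values
    return total / (n - 1)                    # step 6: divide by the sample size minus 1


def sample_std_dev(values):
    """Sample standard deviation: the square root of the sample variance (step 7)."""
    return math.sqrt(sample_variance(values))


variance = sample_variance(diameters_mm)
std_dev = sample_std_dev(diameters_mm)

print("variance:", round(variance, 2))
print("standard deviation:", round(std_dev, 2))
print("matches statistics.stdev:", math.isclose(std_dev, statistics.stdev(diameters_mm)))
```

Rounded to two decimals, this should give the same variance (6.86) and standard deviation (2.62) as the hand calculation.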