Descriptive Statistics

Let's work through an exercise in developing descriptive statistics. The data below represent the number of text messages each student in a sample of 8 students received yesterday. We begin by putting the data in ascending order (lowest number to highest).

  3   8   8   8   9   9   9   18

This should be the first step in any exercise for which we wish to use statistics to describe the data.

With the data in (ascending) order, we can readily identify the mode and median.

  3   8   8   8   9   9   9   18

First, the mode. There are 2 modes in this distribution: 8 occurs 3 times AND 9 occurs 3 times. No other number occurs more than once.

Now, the median. Notice that N = 8, an even number. This means that the median (the point that splits the distribution exactly in half) will lie between the 2 middle values. The fourth and fifth values, 8 and 9, are the middle values in this distribution; hence Median = (8 + 9)/2 = 17/2 = 8.5. Even though there is no 8.5 in the original data, it can be seen that there are 4 numbers lower than 8.5 and 4 numbers greater than 8.5.
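As a quick check on the mode and median, here is a minimal Python sketch. It assumes the sample is the eight values 3, 8, 8, 8, 9, 9, 9, 18 (the set consistent with the sum, median, and mean given in this exercise); the variable names are illustrative only.

```python
from statistics import median, multimode

# Assumed sample: eight values consistent with the figures in the text.
messages = [9, 3, 8, 18, 9, 8, 9, 8]

ordered = sorted(messages)   # step 1: put the data in ascending order
modes = multimode(ordered)   # all values tied for the highest frequency
mid = median(ordered)        # with an even N, averages the two middle values

print(ordered)
print("modes:", modes)
print("median:", mid)
```

Note that `multimode` (Python 3.8 or later) returns every tied value, which matters here because the distribution has 2 modes.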

Now, we find the mean, the arithmetic average. First, we SUM all the values of the X variable:

  3 + 8 + 8 + 8 + 9 + 9 + 9 + 18 = 72

Then we DIVIDE by the number of values of X:

  72/8 = 9

We have just described the TYPICAL number of text messages students receive in a 24-hour period. The modes are 8 and 9; the median is 8.5; and the mean (average) is 9. Before we leave this topic, let us look at what would happen to these measures if one of the values were to change.

Suppose the largest number in the distribution were changed from 18 to 34. The distribution would look like this.

   X
   3
   8   First mode
   8
   8
       Median = 8.5
   9   Second mode
   9
   9
  34

  Sum = 88
  N = 8
  Mean = Sum of X's / N = 88/8 = 11

In this new distribution, the modes would remain the same: they would still be 8 and 9. The median, likewise, would not change: it would still be 8.5. But the mean would change. The sum of the X values would now be 88, N would still be 8, and the resulting mean would be 11. (Remember, the mean of the original distribution was 9.)

This illustrates an important point about the mean. Unlike the mode and median, the mean is affected by EXTREMELY HIGH or EXTREMELY LOW values of X. Distributions with one or a few extreme values at either end of the distribution are said to be SKEWED (unbalanced). (We will look at skewed distributions when we do graphs in SPSS.) For now, we need to be aware that the mean may not be the most appropriate measure of central tendency for a skewed distribution. (The median is probably better.)
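The effect of a single extreme value can be checked with a short Python sketch. It assumes the original sample is 3, 8, 8, 8, 9, 9, 9, 18 and that only the largest value changes to 34; the helper name `summarize` is hypothetical.

```python
from statistics import mean, median, multimode

# Assumed original sample and the same sample with its largest value changed.
original = [3, 8, 8, 8, 9, 9, 9, 18]
changed = original[:-1] + [34]   # replace the largest value (18) with 34

def summarize(values):
    """Return the three measures of central tendency for a list of numbers."""
    return {
        "modes": multimode(values),
        "median": median(values),
        "mean": mean(values),
    }

before = summarize(original)
after = summarize(changed)

for label, stats in (("original", before), ("changed", after)):
    print(label, stats)
```

Only the `mean` entry differs between the two summaries, which is the point made above about skewed distributions.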

Now let's calculate some percentages. Here, once again, are the original data (remember, data is a plural term, hence the use of the plural verb form "are").

    X     N    Percent
    3     1    1/8 = .125*100 = 12.5
    8     1    1/8 = .125*100 = 12.5
    8     1    1/8 = .125*100 = 12.5
    8     1    1/8 = .125*100 = 12.5
    9     1    1/8 = .125*100 = 12.5
    9     1    1/8 = .125*100 = 12.5
    9     1    1/8 = .125*100 = 12.5
   18     1    1/8 = .125*100 = 12.5
  Total   8                     100.0

Set up this way, this table illustrates several important points about data distributions and percentages. First, it illustrates how percentages are calculated. This is simply an application of a formula we have presented in another slide. Second, notice that the results of dividing the number in the N column by the Total (8) are decimal numbers. These types of numbers will be discussed in a later section of the course where we learn about the concept of probability. [Probability is the link between this early part of the course (descriptive statistics) and the second part, called inferential statistics.] On the next slide, we will look at how we can use the percentage figures.
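The percentage column can be reproduced with a few lines of Python. This is a sketch only, assuming the eight values 3, 8, 8, 8, 9, 9, 9, 18 with one table row per student; the variable names are illustrative.

```python
# Assumed sample: one row per student, as in the table above.
messages = [3, 8, 8, 8, 9, 9, 9, 18]
total = len(messages)

rows = []
for x in messages:
    proportion = 1 / total       # the decimal number, e.g. .125
    percent = proportion * 100   # the percentage, e.g. 12.5
    rows.append((x, 1, proportion, percent))

total_percent = sum(row[3] for row in rows)

print(f"{'X':>5} {'N':>3} {'Proportion':>11} {'Percent':>8}")
for x, n, proportion, percent in rows:
    print(f"{x:>5} {n:>3} {proportion:>11.3f} {percent:>8.1f}")
print(f"{'Total':>5} {total:>3} {'':>11} {total_percent:>8.1f}")
```

The `proportion` values are the decimal numbers mentioned above; multiplying each by 100 gives the percentage.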

We have repeated the distribution from the previous slide to help our presentation.

    X     N    Percent
    3     1    1/8 = .125*100 = 12.5
    8     1    1/8 = .125*100 = 12.5
    8     1    1/8 = .125*100 = 12.5
    8     1    1/8 = .125*100 = 12.5
    9     1    1/8 = .125*100 = 12.5
    9     1    1/8 = .125*100 = 12.5
    9     1    1/8 = .125*100 = 12.5
   18     1    1/8 = .125*100 = 12.5
  Total   8                     100.0

The third point illustrated by this distribution is how we can use the percentage figures in our data analysis. For example, if we want to know what percent of these students received exactly 3 text messages, we simply look at the line for X = 3 and see the percentage is 12.5.

If we want to know what percent of these students received exactly 8 text messages, we ADD the percentages for the 3 students who received 8 messages. In this instance, the percentages are 12.5 + 12.5 + 12.5 = 37.5. We could accomplish the same end if we find the number of students who received 8 messages (3) and divide this number by the total number in the sample (8): 3/8 = .375*100 = 37.5%.

If we want to know what percent of students received MORE than 8 text messages, we have 2 options. We could simply COUNT the students who received 9 or more messages (4) and divide this number by the total number in the sample (8): 4/8 = .50*100 = 50.0%. Or, we could add the percentages of students who received 8 or fewer messages (50%) and SUBTRACT that number from 100. Either way, we arrive at the same figure: 50 percent.
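The calculations in this part of the exercise can be sketched in Python as follows, assuming the sample 3, 8, 8, 8, 9, 9, 9, 18. The helper `percent_where` is a hypothetical name introduced for illustration.

```python
# Assumed sample of eight students.
messages = [3, 8, 8, 8, 9, 9, 9, 18]

def percent_where(values, condition):
    """Percent of values for which condition(value) is true."""
    matches = sum(1 for v in values if condition(v))
    return matches / len(values) * 100

exactly_3 = percent_where(messages, lambda v: v == 3)
exactly_8 = percent_where(messages, lambda v: v == 8)

# Two routes to "more than 8": count directly, or subtract the rest from 100.
more_than_8 = percent_where(messages, lambda v: v > 8)
via_subtraction = 100 - percent_where(messages, lambda v: v <= 8)

print(exactly_3, exactly_8, more_than_8, via_subtraction)
```

Both routes give the same answer because every student falls into exactly one of the two groups (8 or fewer, more than 8).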

Here, again, is our data distribution.

    X     N    Percent
    3     1    1/8 = .125*100 = 12.5
    8     1    1/8 = .125*100 = 12.5
    8     1    1/8 = .125*100 = 12.5
    8     1    1/8 = .125*100 = 12.5
    9     1    1/8 = .125*100 = 12.5
    9     1    1/8 = .125*100 = 12.5
    9     1    1/8 = .125*100 = 12.5
   18     1    1/8 = .125*100 = 12.5
  Total   8                     100.0

Finally, suppose we want to know the percentage of students who received either 8 or 9 text messages. Again, we have 2 options. We could simply ADD the percentage of students who received 8 (37.5) to the percentage who received 9 (37.5) and find the total to be 75%. Or, we could add the percentages of students who did not receive either 8 or 9 text messages (1 received 3 [12.5%] and 1 received 18 [12.5%], for a total of 25%) and subtract that figure from 100. Again, either way, we arrive at the same conclusion: 75% received either 8 or 9 text messages.
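Grouping identical values makes the "either/or" question easier to answer. The sketch below assumes the sample 3, 8, 8, 8, 9, 9, 9, 18 and uses `collections.Counter` to build one row per distinct value rather than one row per student; this grouped layout is an illustration, not the table used in the slides.

```python
from collections import Counter

# Assumed sample of eight students.
messages = [3, 8, 8, 8, 9, 9, 9, 18]

counts = Counter(messages)   # frequency of each distinct value
percent = {x: n / len(messages) * 100 for x, n in counts.items()}

# Option 1: add the percentages for 8 and for 9.
either_8_or_9 = percent[8] + percent[9]

# Option 2: add the percentages for everything else and subtract from 100.
neither = percent[3] + percent[18]
via_subtraction = 100 - neither

for x in sorted(percent):
    print(x, counts[x], percent[x])
print(either_8_or_9, via_subtraction)
```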