Chapter 1 Data and Descriptive Statistics


Edward Wilkins

1.1 Introduction

Statistics is the art and science of collecting, summarizing, analyzing and interpreting data. The field of statistics can be broadly divided into two branches: (i) descriptive statistics and (ii) inferential statistics. In descriptive statistics, we simply describe a given set of data in ways that make it understandable to the user or the decision maker. There are various approaches through which we can describe data, such as summarizing numbers, tabulating numbers and visualizing them through various graphics. We will study some of these approaches in this chapter.

In inferential statistics, statisticians try to make useful statements about a population based on an analysis of sample data. In general, a decision maker is interested in statements about a population, but collecting data for an entire population is usually not feasible, either practically or economically. Statisticians therefore collect data from a small sample and make statements about the population based on the sample data. In later chapters, when we study inferential statistics, we will learn the process of making statements about a population based on sample data.

1.2 What is Data?

A single unit of data is the value of some variable of interest. For example, the number 5 is a single unit of data. It could represent the number of customers waiting in line at a given time, the number of days it takes to ship an item, or the weight in pounds of a package being shipped. In these examples, the number of customers waiting in line, the number of days it takes to ship an item and the weight in pounds of a package are all variables. Data need not always be a number; it could also be a non-numeric value, such as red or high. The data value red might represent the color of the next car you see on the road.
It might also represent the favorite color of your best friend, or the most popular color for clothes among women. The data value high may represent the degree of satisfaction of my customers, or the degree of perceived quality of a product. As a statistician, whenever you see a collection of data, you must understand what each data item represents. There is always a real-world entity that a piece of data represents or describes.

1.3 Types of Data

Quantitative vs. Qualitative Data

Broadly, there are two types of data: (i) numeric or quantitative data and (ii) non-numeric, qualitative or categorical data. Whether the data is quantitative or qualitative depends on whether the underlying variable is quantitative or qualitative. If the variable is weight, height, time taken, number of people etc., it is inherently quantitative, and therefore data that describes these variables will be quantitative. If the underlying variable is qualitative in nature, such as the color of a dress or the degree of satisfaction, then the data is also qualitative.

Sometimes a qualitative variable is represented by a number, which can create some confusion. For example, in some countries the zip code is numeric (e.g. in the USA), while in others it is not (e.g. in Canada). Even when it is represented as a number, a zip code is essentially a qualitative variable, because it is simply a label for a neighborhood that facilitates mail delivery. One way to test whether a variable is truly numeric is to see whether it makes sense to perform arithmetic on its values. If it does, the variable is truly numeric; if not, it is not. For example, while it makes sense to add two values of weight, it makes no sense to add two zip codes. The sum of two zip codes does not produce any meaningful number, whereas the sum of two weight values produces a meaningful value.

A third type of data is the date.
It is neither completely quantitative nor completely qualitative; it has elements of both. For example, part of a date represents the month, which is a non-numeric quantity such as January, February, etc. The fact that months can be represented as numbers does not make them numeric: adding 1 and 2 gives 3, but adding the corresponding months, January and February, does not give March. In fact, adding January and February does not give anything meaningful. Yet some arithmetic can be performed with dates. For example, you can calculate the date 60 days from today.

Discrete vs. Continuous Variables

Quantitative variables are of two types: discrete and continuous. If the values of a variable are discrete, such as 3 or 5, the variable is discrete. A discrete variable can assume only certain values; it cannot take just any value in between them. A continuous variable, on the other hand, can assume any value on a continuum, including fractional values. Height and weight are examples of continuous variables. If we had a scale of extremely fine resolution that measured height to six decimal places, a height could be any number with up to six decimal places. But because we tend to round heights to the nearest integer, height appears to be discrete. Similarly, weight is a continuous variable, because a very sensitive weighing machine can weigh a person to several decimal places. The number of people in a line is an example of a discrete variable. The number of countries visited by a person is also a discrete variable; nobody can visit a fractional number of countries, for example. Most discrete variables are those whose values result from counting, such as the number of customers who enter a store in an hour, the number of cars that pass through a traffic light in a day, or the number of students enrolled in a course. Most continuous variables are those whose values result from measurement, such as distance, weight and temperature.

Scales of Measurement

There are four different scales of measurement, namely (i) nominal scale, (ii) ordinal scale, (iii) interval scale and (iv) ratio scale. A qualitative (or categorical) variable may have a nominal scale or an ordinal scale. A quantitative variable may have an interval scale or a ratio scale.

A variable with a nominal scale is a categorical variable whose values cannot be ordered. For example, color is a nominal variable because its values cannot be ordered: how would you order Red, Green, Brown and Blue in a way that makes sense? Another example of a nominal variable is gender. Values of an ordinal scale variable can be ordered. For example, when filling out a survey on customer satisfaction, you might choose among the categories Poor, Below Average, Average, Above Average and Excellent.
These values can be ordered, and therefore the variable customer satisfaction is an ordinal variable. An interval scale variable is a quantitative variable whose values do not have a true zero; consequently the ratio of two values is meaningless, and only the interval (difference) between two values is meaningful. For example, temperature in Fahrenheit is a variable whose zero point is an arbitrary temperature. A temperature of zero degrees Fahrenheit does not correspond to zero heat, and therefore this variable does not have a true zero. The ratio of 40 degrees to 20 degrees is 2, but that does not imply that a temperature of 40 degrees corresponds to twice the heat of a temperature of 20 degrees. So the ratio of two values is meaningless. In business examples, we rarely come across interval scale variables. A ratio scale variable is a quantitative variable with a true zero, for which ratios are therefore meaningful. For example, sale price, height, length and weight are all ratio scale variables.

Population vs. Sample

When learning statistics, we must learn to clearly distinguish between a population and a sample. A population consists of all entities of interest. A sample is a subset of entities from a population. Usually, though not always, it is infeasible to collect data about the entire population of interest; only in rare cases, when the population is small, is it feasible. For example, if I am interested in the income distribution of everyone in a city of a million residents, it would be quite infeasible to collect data on each resident's income. If, however, we are interested in the income distribution of everyone in a small town of 15 residents, we may be able to collect data on the entire population. Whenever collecting population data is infeasible, we have no choice but to work with sample data.

1.4 Descriptive Statistics

We will now discuss how data is described using descriptive statistics.
It is important to recognize the type of data before deciding how to describe it, because the descriptive statistics for quantitative data are different from those for qualitative data.

Descriptive Statistics for Quantitative Variables

Sometimes data for a quantitative variable is given as a list of raw numbers, also called ungrouped data, and sometimes it is given as grouped data. An example of ungrouped data is this list of raw numbers: 2, 5, 7, 9, 4, 3, 3, 4, 6, 8, 14, 4, 20, 6, 10, 4, 5, 9, 11, 1, 6, 9, 4, 5, 13, 18, 7, 6, 9, 10. These numbers could represent any quantitative variable, such as the number of cars sold per day in April at a car dealership. Grouped data appears as a frequency table for different groups of values, such as:

Table 1.1: Grouped Data

Num of cars sold in a day in April | Count (or frequency)
1 to 3 | 4
4 to 6 | 12
7 to 9 | 7
10 to 12 | 3
13 to 15 | 2
16 to 18 | 1
19 to 21 | 1

Depending on whether the data is grouped or ungrouped, the way we describe it is different. We describe data either by summary measures or by visual graphs.

Summary Measures for Quantitative Variables

There are four types of summary measures: (i) measures of central tendency, (ii) measures of variation, (iii) measures of location and (iv) measures of shape.

Measures of Central Tendency

In general, data tends to crowd around a center, and it helps to know what this center is. There are three measures of central tendency: (i) mean, (ii) median and (iii) mode.

The mean is simply the average of all the values. We calculate the mean by summing all the values and dividing by the total number of values. For example, the mean of the 30 values listed above can be determined by summing them and dividing by 30. The sum of all these values is 222, so the mean is 222/30 = 7.4. Mathematically, the mean is given by the formula $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, where $n$ is the number of values and the $x_i$ are the data values. We can say that the mean number of cars sold per day in April is 7.4.

If the data is grouped, we calculate the mean differently. We compute the middle value of each group, multiply each group's frequency by its middle value, add up these products and divide by the total frequency.

Table 1.2: Computing the Mean for Grouped Data

Num of cars sold in a day in April | Middle group value | Count (or frequency) | Product of middle value and freq
1 to 3 | 2 | 4 | 8
4 to 6 | 5 | 12 | 60
7 to 9 | 8 | 7 | 56
10 to 12 | 11 | 3 | 33
13 to 15 | 14 | 2 | 28
16 to 18 | 17 | 1 | 17
19 to 21 | 20 | 1 | 20
Total | | 30 | 222

The mean is 222/30 = 7.4. Although in this example the means of the ungrouped and grouped data turned out to be the same, this is not always the case.

The second measure of central tendency, the median, is the middle value.
The middle value can be determined by arranging the data in either ascending or descending order and finding the value in the middle of the sorted list. The median is easier to obtain when there is an odd number of values, because there is only one middle value. If there is an even number of values, as in our example, there are two middle values and the median is the average of the two. If we arrange our data in ascending order, it looks like this:

1, 2, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 8, 9, 9, 9, 9, 10, 10, 11, 13, 14, 18, 20

Since there are 30 values, there are two middle values: the fifteenth and the sixteenth. Both of them happen to be 6, so their average is also 6 and the median for this data is 6. If the fifteenth value had been 6 and the sixteenth value had been 7, the median would have been 6.5.

For grouped data, we compute the cumulative frequency column and look for the group that contains the middle value(s). For example, in Table 1.3 the 15th and the 16th values are the two middle values, and both fall in the group 4 to 6, so the median is in the group 4 to 6. Within that group, we estimate each middle value in a prorated manner. The group contains the 5th through the 16th values: 4 values come before the group (its cumulative frequency starts after 4) and the group holds 12 values. Prorating, the 15th value is 4 + (6-4)*(15-4)/12 ≈ 5.83 and the 16th value is 4 + (6-4)*(16-4)/12 = 6. So the median is the average of 5.83 and 6, or about 5.92.

Table 1.3: Cumulative Frequency Table for Grouped Data

Num of cars sold in a day in April | Count (or frequency) | Cum. Freq
1 to 3 | 4 | 4
4 to 6 | 12 | 16
7 to 9 | 7 | 23
10 to 12 | 3 | 26
13 to 15 | 2 | 28
16 to 18 | 1 | 29
19 to 21 | 1 | 30

The third measure of central tendency, the mode, is the value that appears most often. In our example, the value 4 appears most often: five times. No other value appears five or more times, so the mode for this data is 4. Sometimes there may be more than one mode. For example, if on one of the two days when 10 cars were sold only 9 cars had been sold, there would have been five days on which 9 cars were sold. In that case there would be two modes, 4 and 9.
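The ungrouped calculations above can be checked with a minimal sketch in Python (an assumption of this sketch; the chapter itself works in Excel). It uses the standard `statistics` module to compute the mean, median and mode of the chapter's data, and `statistics.multimode` to show that the hypothetical variant just described, where one of the two 10s becomes a 9, has two modes.

```python
import statistics

# Number of cars sold per day in April (the chapter's ungrouped data)
cars = [2, 5, 7, 9, 4, 3, 3, 4, 6, 8, 14, 4, 20, 6, 10,
        4, 5, 9, 11, 1, 6, 9, 4, 5, 13, 18, 7, 6, 9, 10]

mean_value = statistics.mean(cars)      # sum of the values divided by their count
median_value = statistics.median(cars)  # averages the two middle values when the count is even
mode_value = statistics.mode(cars)      # the single most frequent value

# Variant from the text: one of the two days with 10 cars becomes a day with 9 cars
variant = list(cars)
variant[variant.index(10)] = 9
all_modes = statistics.multimode(variant)  # every value tied for the highest frequency

print("mean:", mean_value)
print("median:", median_value)
print("mode:", mode_value)
print("modes of the variant:", sorted(all_modes))
```

Note that `multimode` simply reports every tied value rather than averaging them, which matches the convention described in the next paragraph.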
When a data set has two modes, we do not try to find the average of the two modes; we simply report that there are two (or more) modes. For grouped data, the mode is the middle value of the group with the highest frequency. In our example, the highest frequency (12) is for the group 4 to 6, whose middle value is 5, so the mode is 5.

Which of the three measures of central tendency you should use depends on how the data is distributed. If the data is distributed roughly symmetrically around the center, the mean is the most appropriate measure of central tendency, because it acts as a good representative of the data. If the data is not distributed symmetrically, i.e. it has a long tail on one side, then the median is a better measure of central tendency. For example, suppose that instead of selling 20 cars on one day, we had sold 150 cars. The sorted data would then look like this: 1, 2, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 8, 9, 9, 9, 9, 10, 10, 11, 13, 14, 18, 150. The extremely high value of 150 makes the data non-symmetric about the center. Because of this one value, the mean would be pulled up to 352/30 ≈ 11.7, whereas the median would stay at 6. Since 6 is a more representative value for this data, it is more appropriate to use the median in this example. In real life, incomes tend to be non-symmetric because some people have very large incomes, and therefore median income is a better measure of central tendency than mean income. Similarly, house prices tend to be non-symmetric, and therefore median house prices, rather than mean house prices, are normally reported.

Measures of Variation

So now we know that data tends to crowd around the center and that we measure that center using one of the measures of central tendency. But some of the data lies away from the center, either very far from it or not so far. How far the data is spread around the center is described by the various measures of variation.
Measures of variation are also sometimes called measures of spread or measures of dispersion. The commonly used measures of variation are (i) variance, (ii) standard deviation, (iii) mean absolute deviation, (iv) range and (v) interquartile range.

The variance is obtained by finding the deviation of each value from the mean, squaring each deviation and averaging these squared deviations. So the variance is basically the mean squared deviation. Mathematically, the variance is

$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$ for a population, and $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$ for a sample.

The difference between these two will be explained later. In these formulas, $\mu$ (for a population) and $\bar{x}$ (for a sample) denote the mean. In our example, $\sum (x_i - \bar{x})^2 = 575.2$. So the population variance is 575.2/30 ≈ 19.17 and the sample variance is 575.2/29 ≈ 19.83.

The standard deviation is simply the square root of the variance. The population standard deviation is $\sigma = \sqrt{\sigma^2}$ and the sample standard deviation is $s = \sqrt{s^2}$. The standard deviation has the same unit as the original data, whereas the unit of the variance is the square of the unit of the original data. Therefore, it is easier to interpret the value of the standard deviation than the value of the variance. In our example, the population standard deviation is $\sqrt{19.17} \approx 4.38$ and the sample standard deviation is $\sqrt{19.83} \approx 4.45$.

The mean absolute deviation is the mean of the absolute deviations of the values from the mean, $\frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$. In our example, the sum of the absolute deviations from the mean is 102.4, so the mean absolute deviation is 102.4/30 ≈ 3.41.

The range is simply the highest value minus the lowest value. In our example, the range is 20 minus 1, or 19. The interquartile range is the difference between the third and the first quartile. Quartiles are explained next.

Measures of Location

There are four measures of location: (i) highest value, (ii) lowest value, (iii) quartiles and (iv) percentiles. The highest and lowest values are straightforward: they are the maximum and minimum values in the data set, respectively. In our example, the highest value is 20 and the lowest is 1. There are four quartiles: the first quartile, the second quartile, the third quartile and the fourth quartile.
The first quartile is the value such that a quarter of the values are below it. The second quartile is the value such that half of the values are below it; the second quartile is the same as the median. The third quartile is the value such that 75% of the values are below it. The fourth quartile is simply the highest value. In our example, if we arrange all the values in increasing order, the average of the 7th and 8th values is the first quartile, the average of the 15th and 16th values is the second quartile, and the average of the 22nd and 23rd values is the third quartile. For our example, the first quartile is 4, since both the 7th and the 8th values are 4. The second quartile is 6, since both the 15th and the 16th values are 6. The third quartile is 9, since both the 22nd and the 23rd values are 9. Now that we understand quartiles, we can compute the interquartile range, the measure of spread discussed earlier. In our example, the interquartile range is 9 minus 4 = 5.

A percentile is best described with the help of an example. A percentile can be anywhere from the 1st to the 99th percentile. The 68th percentile, for example, is the value such that 68% of the values are below it. The 93rd percentile is the value such that 93% of the values are below it. So the first quartile and the 25th percentile are the same; the second quartile, the 50th percentile and the median are the same; and the third quartile and the 75th percentile are the same. Note that percentiles are not written with the % sign, which is reserved for percent, a quite different concept from a percentile.

Measures of Shape

There are two measures of shape: (i) skewness and (ii) kurtosis.

Skewness measures the asymmetry of the distribution. If data is distributed symmetrically about the mean, the skewness is zero. If the right tail of the distribution is longer, we say the distribution is positively skewed, or skewed to the right. Similarly, if the left tail is longer, we say the distribution is negatively skewed, or skewed to the left. In general, although not always, for a right-skewed distribution the mean is greater than the median and the median is greater than the mode. Similarly, for a left-skewed distribution the mean is smaller than the median and the median is smaller than the mode. In our example, the skewness is 1.218, so we can say our data is skewed to the right, or positively skewed, or that it has a long right tail. Consistent with this, our mean (7.4) is greater than our median (6), which is greater than our mode (4).

Kurtosis is a measure of the peakedness of a distribution, that is, the narrowness of its peak. The higher the kurtosis value, the taller the peak; a negative kurtosis suggests a more rounded peak. In our example, the kurtosis is about 1.52.

Descriptive Measures for Qualitative Variables

When we have qualitative or categorical variables, such as gender, color or race, it is not possible to compute an average, a median, a variance and so on. It makes no sense to talk about the average of Red and Green, for example. The measures that do make sense are (i) frequency (or count), (ii) mode and (iii) relative frequency. For example, for the following data about the colors of ten cars, we can generate some descriptive measures as follows:

Data: Red, White, Black, Blue, White, Red, Green, Red, Black, Red

Table 1.4: Frequency and Relative Frequency Table for Categorical Variable

Color | Frequency | Relative Frequency
Red | 4 | 0.4
White | 2 | 0.2
Black | 2 | 0.2
Blue | 1 | 0.1
Green | 1 | 0.1

The mode for this data is Red, since it is the most frequently occurring color.

Descriptive Statistics using Excel

In the pre-Excel days, one had to spend a considerable amount of time computing these summary measures.
But with Excel, a few keystrokes give you all the measures, and you can spend the time you save interpreting the measures and telling an interesting story. Graphs are also easily drawn using Excel. There are a number of statistical functions that quickly give the various summary measures discussed above, such as the mean, the median, the mode and the variance. In addition, Excel comes with an add-in called the Analysis ToolPak, which adds a Data Analysis command that can perform some basic descriptive statistics. You just have to activate it, which takes only a few keystrokes.

To see whether the Data Analysis add-in is already active, select the Data ribbon and check whether Data Analysis appears on the right of the ribbon. If you don't see it, you need to activate it; most likely the add-in will be inactive on your computer. To activate it, click File, then Options, then Add-Ins. You will see a list of active and inactive add-ins. Click the Go button to manage add-ins, check Analysis ToolPak and click OK. Now, if you look at the Data ribbon, you will see Data Analysis on the right of the ribbon. Note that some versions of Excel for Mac do not include the Data Analysis add-in.

Type the following data in Excel in cells B1 through B30. This is the same data as given above:

2, 5, 7, 9, 4, 3, 3, 4, 6, 8, 14, 4, 20, 6, 10, 4, 5, 9, 11, 1, 6, 9, 4, 5, 13, 18, 7, 6, 9, 10

Type each number in a different cell. There are 30 data values, so you will use 30 cells, as shown in Figure 1.1. As you can see, the data is in the range B1:B30. Some rows have been hidden to save space.

Figure 1.1: Data for our Example

Now click Data Analysis on the Data ribbon and select Descriptive Statistics (see Figure 1.2). You will see a dialog box as in Figure 1.3. Type $B$1:$B$30 as the Input Range and type $D$1 as the Output Range, so that the output appears in a range starting with cell D1. Also check the box for Summary Statistics and click OK. The output appears as shown in Figure 1.4.

Figure 1.2: Data Analysis Add-In

Figure 1.3: Data Analysis Dialog Box

Figure 1.4: Output of Descriptive Statistics using Data Analysis Add-In

Figure 1.4 shows the various summary measures that we have discussed above.
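For readers without the Analysis ToolPak, here is a minimal Python sketch (assumed tooling, not part of the chapter's Excel workflow) that recomputes the measures Excel's Descriptive Statistics output reports, using the standard `statistics` and `math` modules. The helpers `excel_skew` and `excel_kurt` are hypothetical names; they implement the adjusted sample formulas that Microsoft documents for Excel's SKEW and KURT. The standard error is computed as the sample standard deviation divided by the square root of n. The sketch also adds the population variance, population standard deviation and mean absolute deviation discussed earlier, which the Excel output does not list.

```python
import math
import statistics

# Number of cars sold per day in April (the chapter's ungrouped data)
cars = [2, 5, 7, 9, 4, 3, 3, 4, 6, 8, 14, 4, 20, 6, 10,
        4, 5, 9, 11, 1, 6, 9, 4, 5, 13, 18, 7, 6, 9, 10]


def excel_skew(xs):
    """Sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3)."""
    n = len(xs)
    m = statistics.mean(xs)
    s = statistics.stdev(xs)  # sample standard deviation (n - 1 denominator)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in xs)


def excel_kurt(xs):
    """Excess kurtosis with the small-sample adjustment used by Excel's KURT."""
    n = len(xs)
    m = statistics.mean(xs)
    s = statistics.stdev(xs)
    a = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    b = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return a * sum(((x - m) / s) ** 4 for x in xs) - b


n = len(cars)
mean = statistics.mean(cars)
summary = {
    "Mean": mean,
    "Standard Error": statistics.stdev(cars) / math.sqrt(n),
    "Median": statistics.median(cars),
    "Mode": statistics.mode(cars),
    "Standard Deviation": statistics.stdev(cars),   # sample standard deviation
    "Sample Variance": statistics.variance(cars),   # divides by n - 1
    "Kurtosis": excel_kurt(cars),
    "Skewness": excel_skew(cars),
    "Range": max(cars) - min(cars),
    "Minimum": min(cars),
    "Maximum": max(cars),
    "Sum": sum(cars),
    "Count": n,
}
# Measures discussed in the text that Excel's summary output does not include
summary["Population Variance"] = statistics.pvariance(cars)  # divides by n
summary["Population Std. Dev."] = statistics.pstdev(cars)
summary["Mean Absolute Deviation"] = sum(abs(x - mean) for x in cars) / n

for name, value in summary.items():
    print(f"{name:<25}{value:10.4f}")
```

The printed values should agree with Figure 1.4 and with the hand calculations in Section 1.4 up to rounding.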


The only measure we have not discussed is the standard error. The standard error is defined as the sample standard deviation divided by the square root of the sample size. In our example, it is 4.45/sqrt(30) ≈ 0.81.

1.5 Converting ungrouped data into grouped data in Excel

To convert ungrouped data into grouped data, we first create some categories (also called bins, classes or groups) of values, such as 1 to 3, 4 to 6, etc., and then count the number of observations that fall in each category. We can do the counting manually or write some Excel formulas. For a dataset as small as our example, manual counting is possible, but imagine if we had data for 1,000 days instead of 30 days: it would be really hard to count manually. So we will learn how to count the frequencies in Excel.

We will define our groups as 1 to 3, 4 to 6, 7 to 9, etc. Note that the group size is arbitrary, so we could have used groups like 1 to 4, 5 to 8, etc. as well. These groups are also called classes. Each class has a lower class limit and an upper class limit. So the class 1 to 3 has a lower class limit of 1 and an upper class limit of 3. Type all the lower class limits in one column and the upper class limits in the adjacent column, side by side, as in Figure 1.5.

Figure 1.5: Lower and Upper Class Limits

Next, we want to type a formula to get the frequency of observations in each class. There is an Excel function called =COUNTIF(). We will first explain how this function works. Suppose in cells B1 through B5 we have this data: 2, 5, 7, 9, 4. If we write the function =COUNTIF($B$1:$B$5, ">=6"), Excel counts the number of values in the range B1 through B5 that are greater than or equal to 6. So the result of this function will be 2, because there are two numbers (7 and 9) that are greater than or equal to 6. If the function were =COUNTIF($B$1:$B$5, ">=5"), the result would be 3.

So in column J, write this function: =COUNTIF($B$1:$B$30, ">=1"), or equivalently =COUNTIF($B$1:$B$30, ">="&H3). Since cell H3 has the value 1, these two formulas are equivalent. See Figure 1.6 for the formulas in column J. Note that you don't have to type all the formulas: just type the formula in cell J3 and then copy and paste it into cells J4 through J9. The result of this function in cell J3 is 30, because all 30 values are greater than or equal to 1. The result in the last cell (J9) is 1, because there is only one value greater than or equal to 19.

Now write another =COUNTIF() function in column K, as shown in Figure 1.7. Note that the result of the function in cell K9 is 0, because there are no values greater than 21. Note also that for the upper class limit, our formula uses the ">" operator instead of ">=". The reason for this will become clear after the next step.

Figure 1.6: The COUNTIF formula for the lower class limit and its result


Figure 1.7: The COUNTIF formula for the upper class limit and its result

Now in column L, in cell L3, write the formula =J3-K3 and then copy it down to cells L4 through L9, as shown in Figure 1.8.

Figure 1.8: The difference of columns J and K gives the frequency of each class

Please convince yourself that this formula in column L gives the frequency of each class. Let's look at cell L3. Since there are 26 observations greater than 3 and 30 observations greater than or equal to 1, their difference, 4, is the number of observations from 1 to 3. This is why the upper-limit formula uses ">" rather than ">=": with ">=", the values equal to the upper class limit would also be subtracted and would wrongly drop out of the class.

Figure 1.9: The Frequency Table

So this is how the frequency table in Table 1.1 was created. You could have simply counted these frequencies manually, but that approach will not work for large datasets, and hopefully, by going through this exercise, you have learned a little Excel. Once you get this far, creating the rest of the columns is fairly straightforward, and so is creating charts.
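The COUNTIF procedure above can be mirrored in a short Python sketch (an illustration only; the function name `count_if` is hypothetical and merely plays the role of Excel's COUNTIF). Each class frequency is the count of values greater than or equal to the lower limit minus the count of values greater than the upper limit, just as columns J, K and L do. The sketch then uses the resulting table to compute the grouped mean, the prorated grouped median and the grouped mode described in Section 1.4.

```python
# Number of cars sold per day in April (the chapter's ungrouped data)
cars = [2, 5, 7, 9, 4, 3, 3, 4, 6, 8, 14, 4, 20, 6, 10,
        4, 5, 9, 11, 1, 6, 9, 4, 5, 13, 18, 7, 6, 9, 10]

# Class limits as in Figure 1.5: 1 to 3, 4 to 6, ..., 19 to 21
lower_limits = list(range(1, 20, 3))            # 1, 4, 7, 10, 13, 16, 19
upper_limits = [lo + 2 for lo in lower_limits]  # 3, 6, 9, 12, 15, 18, 21


def count_if(values, condition):
    """Hypothetical analogue of Excel's COUNTIF: count values meeting a condition."""
    return sum(1 for v in values if condition(v))


rows = []  # (lower, upper, middle value, frequency, cumulative frequency)
cumulative = 0
for lo, hi in zip(lower_limits, upper_limits):
    at_least_lower = count_if(cars, lambda v: v >= lo)  # column J: ">="&lower limit
    above_upper = count_if(cars, lambda v: v > hi)      # column K: ">"&upper limit
    freq = at_least_lower - above_upper                 # column L: =J3-K3
    cumulative += freq
    rows.append((lo, hi, (lo + hi) / 2, freq, cumulative))

n = cumulative

# Grouped mean: sum of (middle value x frequency) divided by total frequency
grouped_mean = sum(mid * f for _, _, mid, f, _ in rows) / n


def prorated_value(position):
    """Estimate the value at a 1-based position by prorating within its class:
    lower + (upper - lower) * (position - cumulative count before class) / class frequency."""
    cum_before = 0
    for lo, hi, _, f, cum in rows:
        if position <= cum:
            return lo + (hi - lo) * (position - cum_before) / f
        cum_before = cum
    raise ValueError("position is larger than the total frequency")


# Grouped median: average the two middle positions when n is even
if n % 2 == 0:
    grouped_median = (prorated_value(n // 2) + prorated_value(n // 2 + 1)) / 2
else:
    grouped_median = prorated_value((n + 1) // 2)

# Grouped mode: middle value of the class with the highest frequency
grouped_mode = max(rows, key=lambda r: r[3])[2]

for lo, hi, mid, f, cum in rows:
    print(f"{lo:>2} to {hi:<2}  mid={mid:4.1f}  freq={f:2d}  cum={cum:2d}")
print(f"grouped mean = {grouped_mean:.2f}, median = {grouped_median:.2f}, mode = {grouped_mode}")
```

Changing `lower_limits` (for example to classes of width 4) regroups the data without touching the rest of the code, which is the same flexibility the spreadsheet layout gives.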

In the above example, we have shown you how to perform some basic descriptive statistics using Excel. If you were able to follow the entire example, you will appreciate the power of Excel for performing descriptive statistics. We will now summarize some common terms used in statistics.

A Summary of Descriptive Measures for Quantitative Variables

A set of data for a quantitative variable may be described using measures of central tendency, measures of location, measures of variation and measures of shape.

Measures of Central Tendency

Term | Description | Formula | Excel Function
Mean | The average of all numbers in a given data set | $\bar{x} = \sum x_i / n$ | =AVERAGE(array)
Median | The middle value in the data set | No formula but a procedure: arrange the values in ascending order and find the middle value; for an even number of values, find the average of the two middle values | =MEDIAN(array)
Mode | The most frequently occurring value in the data set | No formula but a procedure: find the frequency of each unique value and take the value with the highest frequency | =MODE(array)

Measures of Location

Term | Description | Formula | Excel Function
Highest Value | The highest value in the data set | No formula but an obvious procedure | =MAX(array)
Lowest Value | The lowest value in the data set | No formula but an obvious procedure | =MIN(array)
First Quartile | The value at the first quarter point | Arrange the values in ascending order and find the median of the first half of the data set | =QUARTILE(array,1)
Second Quartile | Same as the median | See median | =QUARTILE(array,2)
Third Quartile | The value at the third quarter point | Arrange the values in ascending order and find the median of the second half of the data set | =QUARTILE(array,3)
Xth Percentile | The value at the X percent point in the data set | Arrange the data set in ascending order and find the value at the X percent point | =PERCENTILE(array,x), with x between 0 and 1

Measures of Variation

Term | Description | Formula | Excel Function
Range | The difference between the highest and the lowest values | Highest value - lowest value | =MAX(array)-MIN(array)
Population Variance | The average squared difference of each value from the mean | $\sigma^2 = \sum (x_i - \mu)^2 / N$ | =VARP(array) or =VAR.P(array)
Sample Variance | The average squared difference of each value from the mean, using n - 1 as the denominator | $s^2 = \sum (x_i - \bar{x})^2 / (n - 1)$ | =VAR(array) or =VAR.S(array)
Population Standard Deviation | Square root of the population variance | $\sigma = \sqrt{\sigma^2}$ | =STDEVP(array), =STDEV.P(array) or =SQRT(VARP(array))
Sample Standard Deviation | Square root of the sample variance | $s = \sqrt{s^2}$ | =STDEV(array), =STDEV.S(array) or =SQRT(VAR(array))
Inter-Quartile Range | The difference between the third and the first quartile | Third quartile - first quartile | =QUARTILE(array,3)-QUARTILE(array,1)

Measures of Shape

Term | Description | Formula | Excel Function
Skewness | A measure of asymmetry with respect to the mean | A very complex formula | =SKEW(array)
Kurtosis | A measure of the thickness of the tails of a distribution | A very complex formula | =KURT(array)

1.6 Charts

Box and Whisker Chart

A box-and-whisker chart plots the lowest value, first quartile, median, mean, third quartile and highest value, as follows:

[Figure: box-and-whisker chart with markers labeled Lowest Value, First Quartile, Median, Mean, Third Quartile and Highest Value]

A box-and-whisker chart provides a good overall visual of the various measures of location and central tendency for ungrouped data. It is not drawn for grouped data.

Histogram, Frequency Polygon, Ogive

A chart of the frequencies of classes is called a histogram. When we connect the midpoints of the tops of the columns in a histogram, we get a frequency polygon. When we plot cumulative frequencies, we get an ogive.
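As a rough cross-check of the numbers these charts display, here is a Python sketch (assumed tooling; the chapter draws its charts in Excel). It computes the six values marked on a box-and-whisker chart, taking quartiles from `statistics.quantiles(..., method="inclusive")`, which uses the same linear interpolation between sorted values as Excel's QUARTILE, and the cumulative frequencies an ogive would plot. Plotting cumulative frequency against each upper class limit is one common convention and is an assumption here. The helper `excel_percentile` is a hypothetical illustration of a linear-interpolation percentile, not an Excel API.

```python
import math
import statistics

# Number of cars sold per day in April (the chapter's ungrouped data)
cars = [2, 5, 7, 9, 4, 3, 3, 4, 6, 8, 14, 4, 20, 6, 10,
        4, 5, 9, 11, 1, 6, 9, 4, 5, 13, 18, 7, 6, 9, 10]


def excel_percentile(values, k):
    """Percentile by linear interpolation between sorted values, in the style
    of Excel's PERCENTILE (k is a fraction between 0 and 1)."""
    xs = sorted(values)
    h = (len(xs) - 1) * k          # zero-based fractional rank
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)  # stay inside the list when k == 1
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


# Values marked on a box-and-whisker chart
q1, q2, q3 = statistics.quantiles(cars, n=4, method="inclusive")
box = {
    "Lowest Value": min(cars),
    "First Quartile": q1,
    "Median": q2,
    "Mean": statistics.mean(cars),
    "Third Quartile": q3,
    "Highest Value": max(cars),
}

# Ogive: cumulative frequency up to each upper class limit (classes 1 to 3, ..., 19 to 21)
upper_limits = range(3, 22, 3)
ogive = [(u, sum(1 for x in cars if x <= u)) for u in upper_limits]

for name, value in box.items():
    print(f"{name:<15} {value}")
print("Interquartile range:", q3 - q1)
print("Ogive points:", ogive)
print("25th percentile:", excel_percentile(cars, 0.25))
```

The 25th percentile printed at the end equals the first quartile, matching the statement in Section 1.4 that the two are the same.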

Charts for Categorical Variables

Once we have the frequency table, we can draw a bar chart (also known as a column chart) of the frequencies or relative frequencies. We can also draw a pie chart of the frequencies or relative frequencies.

Chapter Summary

1. Statistics is the science of collecting, summarizing, analyzing and interpreting data.
2. Statistics can broadly be divided into descriptive and inferential statistics.
3. In descriptive statistics, we describe data using numerical summary measures, tables and charts.
4. In inferential statistics, we make statements about a population based on sample data.
5. Data can be quantitative or qualitative.
6. Qualitative data can be of nominal scale or ordinal scale.
7. Quantitative data can be of interval scale or ratio scale.
8. A quantitative variable may be discrete or continuous.
9. Data can be ungrouped (raw) or grouped into classes.
10. Numerical summary measures used to describe quantitative data are measures of central tendency, variation, location and shape.
11. Numerical summary measures used to describe qualitative data are frequency, relative frequency and mode.
12. Charts to describe quantitative variables are box-and-whisker plots, histograms, frequency polygons and ogives.
13. Charts to describe qualitative variables are column or bar charts and pie charts.