e-learning Student Guide

Size: px
Start display at page:

Download "e-learning Student Guide"

Transcription

1 e-learning Student Guide Basic Statistics Student Guide Copyright TQG Page 1 of 16

2 The material in this guide was written as a supplement for use with the Basic Statistics e-learning curriculum produced by The Quality Group under the authorship of Gary Klipp, Chief Technical Officer for TQG. Please send any reader comments to your distributor or to The Quality Group at: All comments with respect to changes, errors, or additions may be used by The Quality Group at its discretion and may distribute any of the information you supply in any way it believes appropriate without incurring any obligations. You may, of course, continue to use the information you supply. This guide can only be utilized in conjunction with a paid up license for the Basic Statistics e-learning curriculum produced by The Quality Group and sold through its licensed distributors. Any unauthorized copying or reproduction of this guide without the written permission of The Quality Group would be considered an infringement of copyright laws. Copyright The Quality Group 1992, 1995, 2000, All rights reserved. Basic Statistics Student Guide Copyright TQG Page 2 of 16

3 Table of Contents What is Statistics? STATISTICS LEARNING OBJECTIVES 4 STATISTICS 4 Statistics defined 5 Branches of Statistics 5 Definitions 6 Data defined 6 Population defined 6 Census defined 6 Sample defined 7 SOURCES OF DATA 7 Sampling Methods 7 VARIABLES AND DATA 9 Variable defined 9 Types of Variables 9 Qualitative Variables 9 Quantitative Variables 10 Discrete variables 10 Continuous Variables 10 Measurement Scales 11 Statistically Designed Experiments 12 Data Analysis Techniques 13 SUMMARY 14 Glossary of Terms 14 Basic Statistics Student Guide Copyright TQG Page 3 of 16

4 STATISTICS LEARNING OBJECTIVES STATISTICS Upon completion of this course: Students will know the difference between discrete and continuous data and the concept of variation. Given some basic terms used in statistics, students will be able to identify their definition. Given an example of a variable used in statistics, students will be able to classify the variable as quantitative or qualitative Given a scenario for collecting data, students will be able to describe the difference between a population and a sample. Given an application for the use of statistics, students will understand the difference between using descriptive and inferential statistics. Statistics are all around us and are used to present information and to guide social understanding. For instance: Are men paid more than women for doing the same job when all other factors such as educational level, experience, and age are considered? What percent of all American adults read news on the Internet at least once a week? Is air travel still the safest way? These questions and others can be answered by using statistics. Businesses and governments use statistics every day in quality control, inventory control, forecasting, market surveys, and in the creation of the Consumer Price Index. Whether we like it or not, statistics is involved in our everyday life. Basic Statistics Student Guide Copyright TQG Page 4 of 16

5 STATISTICS DEFINED Statistics is the science of collecting, organizing, analyzing, and interpreting information. Statistics consists of methods and procedures to reduce a lot of information into a more manageable form. Statistics has its own terminology and vocabulary. Some example terms are: population and sample parameter and statistics random sampling time-series data BRANCHES OF STATISTICS The use of statistics will help you make informed decisions. Information is collected, organized, summarized, and analyzed in order to draw conclusions. Conclusions are factbased and data-driven. With this in mind, the application of statistics can be divided into two areas or branches -- "descriptive statistics" and "inferential statistics." Descriptive statistics is the use of graphical and numerical methods to organize, summarize and present data in a meaningful way for the purpose of making more informed decisions. Descriptive statistics: Involves using charts, tables, and graphs to summarize "raw data" for the purpose of extracting useful information. Inferential statistics uses information gathered from sample data to make estimates, predictions, or other generalizations about a larger set of data called a population. Inferential statistics: Basic Statistics Student Guide Copyright TQG Page 5 of 16

6 Involves making an inference about the value of a population parameter on the basis of a sample statistic. Uses information gathered from sample data to make estimates, predictions, or other generalizations about a larger set of data called a "population." Involves making an inference about the value of a population parameter on the basis of a sample statistic. A demonstrative example of inferential statistics might be to make an inference about all married couples living in a specified city from sample data. The characteristic of interest is the percentage of married couples where the women is the primary investor. Therefore: The population is: all married couples living in the specified city. The parameter is an unknown percentage of all married couples where the woman is the primary investor. The inference is that in 56% of all married couples, the woman is the primary investor. DEFINITIONS DATA DEFINED Data is information collected and evaluated. Both branches of statistics require the use of data. We make conclusions from data.. Data can be collected from various sources. However, some sources of data are more reliable than others. When working with data, we need to know the origin of the data and what the data represents. This is critical if conclusions drawn from the analysis are to be true. In statistics, there are two important terms used to describe the collection of data elements: population and sample. POPULATION DEFINED A population is a collection of objects or individuals with at least one characteristic in common. For instance, a population may be the group of people who have credit cards. Or, a population may be the collection of people who have an account at a specific bank. A population provides all outcomes, responses, measurements or counts that are of interest. Collecting data from every element in the population is called a "census." CENSUS DEFINED A census is a method of data collection from an entire population. It is a count or measure of an entire population. Every unit within the population must be counted or measured. It is often costly and difficult to perform. Basic Statistics Student Guide Copyright TQG Page 6 of 16

7 The United States Government conducts a census every ten years in order to learn the demographic make up of the United States. Census data provides information such as the number of members in a household, number of years living at a residence, household income, and the median rent paid to lease an apartment. SAMPLE DEFINED Sampling is another method for collecting data from a population. A sample is a subset of a population. Every element in a sample is an element of the population. It represents a portion of the population. There are various sampling methods that are used by statisticians to ensure that a sample is unbiased, accurate, precise, and representative of the population. Let s learn about some different sampling techniques. A census is a method of data collection from a population. It is a count or measure of an entire population. Every unit within the population must be counted or measured. It is often costly and difficult to perform. SOURCES OF DATA Both branches of statistics require the use of data. "Data is information collected and evaluated. We make conclusions from data. Questionnaires Surveys Observations Interviews Warranty registrations Production logs Shipping records Legal records Public records Experiments Data can be collected from various sources. However, some sources of data are more reliable than others. SAMPLING METHODS Listed here are some methods used to collect data by sampling. Some methods are more reliable and effective than others depending on the population being studied. Statistical expertise should be consulted prior to developing any complex sampling plan. Random Sampling is the process by which every member of the population has an equal chance of being selected for the sample. Samples need to be unbiased. Faulty data will result in faulty results. Systematic Sampling is the process by which the population is ordered in some way. Members are then selected at regular intervals. A quality control inspector may collect every 45th item, for example, as it comes from a production line. Basic Statistics Student Guide Copyright TQG Page 7 of 16

8 Cluster Sampling is when some of the clusters (group of elements) are selected, and then sampling from each of the selected clusters occurs. It is often used in populations where natural grouping of elements occurs. Each cluster should be a small scale version of the total population. Random sampling is used to select which clusters are sampled. In single-stage cluster sampling, all the elements within a cluster are sampled. In two-stage cluster sampling, a random sample of elements is selected from each chosen cluster. Stratified Sampling is used when subgroups within a population vary considerably. It is advantageous to sample each subgroup, or stratum, independently. The process of grouping members of the population into relatively similar subgroups before sampling is called stratification. Random sampling is used to select elements from each stratum. Sampling with Replacement is a method that selects an element from the population and replaces it prior to the selection of the next element. In this method, the same element can be selected more than once. Sampling without Replacement is a method where an element is selected from the population and removed prior to the selection of the next element. In this method, the same element can only be selected once. Example Population vs. Sample To practice differentiating population and sample, let's look at some examples. Example 1: People often refer to the Cola War as the competition between Coca-Cola and Pepsi in their marketing campaigns. As part of a marketing research experiment, 500 cola drinkers were given a taste test for a new brand of cola. Each cola drinker is asked to state a preference for either Cola Brand A or Cola Brand B. Based upon the information presented for this marketing research study, which one of the following best describes the population for this study? What is the population of the study? A) All Pepsi drinkers B) All Coca-Cola drinkers C) All cola drinkers Answer: Based on the information given, the answer is C because the sample consisted of cola drinkers not only Pepsi drinkers but Coca-Cola drinkers. Pepsi and Coca-Cola drinkers are both elements of the population. Example 2: Is computing the average time of all riders to complete the Tour de France an example of population or sample? Answer: The average time is being computed for all riders. We are not selecting a sample of a few riders to estimate the average time. This describes a population. Basic Statistics Student Guide Copyright TQG Page 8 of 16

9 Example 3: Is computing the average time per day a student spends watching TV for selected students in a local high school an example of population or sample? Answer: The average time is being computed from selected students. This describes a sample of students selected from the larger population of all students attending the local high school. Example 4: Is the average number of cell phones owned by 100 households in the United States an example of population or sample? Answer: The average number of cell phones owned is computed from a sample of 100 households selected from the population of all the households in the United States. This describes a measurement from a sample. Example 5: Is selecting households from specific zip codes when considering a large city with many zip codes an example of population or sample? Answer: The selected households from specific zip codes describe a sample of households that represent the larger population of all households in the city. VARIABLES AND DATA Data is obtained by measuring or counting the values of one or more variables in the sample or population of interest. A variable may be classified as qualitative or quantitative. Similarly, the data collected on a variable is called qualitative data or quantitative data. VARIABLE DEFINED In studying a population or a sample, we focus on one or more characteristics or properties of the units within the population. A variable is any characteristic on which the elements of a population or sample differ from each other. For instance, we may be interested in age, gender, or years of education of a country s workforce. Age, gender, and years of education would be defined as variables for this population. TYPES OF VARIABLES Data is obtained by measuring or counting the values of one or more variables in the sample or population of interest. A variable may be classified as qualitative or quantitative. Similarly, the data collected on a variable is called qualitative data or quantitative data. QUALITATIVE VARIABLES Qualitative data cannot be measured on a numerical scale. Cereal brand, make of coffee, color of car, political party affiliation, and zip code are all examples of qualitative data. Basic Statistics Student Guide Copyright TQG Page 9 of 16

10 Qualitative data is classified into categories and is also referred to as categorical data. Often we assign arbitrary numerical values to qualitative data for ease of computer entry and analysis. The assigned numerical values are simply codes and cannot be added, subtracted, multiplied, or divided. An example of qualitative data: When assessing the quality of a product, we could assign a "0" for poor, a "1" for average, or a "2 for superior. QUANTITATIVE VARIABLES Quantitative variables provide numerical measures of the units of a population or sample. Quantitative data comes from measurements or counts. Examples of quantitative data are: weight, length, time, temperature, age, and scores on tests. Arithmetic operations such as addition and subtraction can be performed on quantitative data and provide meaningful results. Quantitative variables can be divided into two types: discrete variables and continuous variables. Discrete variables Discrete variables produce discrete data which is countable Discrete variables are quantitative variables that produce a countable number of possible values. In these three variables, the values are data that can be counted: CONTINUOUS VARIABLES The number of occupants in an elevator at 5 pm The number of customer care letters sent per day The number of stocks whose share price increased on a given Continuous variables produce continuous data which is measured. Continuous variables are quantitative variables that have an infinite number of possible values that are not countable. Basic Statistics Student Guide Copyright TQG Page 10 of 16

11 Continuous variables require a measurement. It is impossible to list all the possible values for a continuous variable. Examples of continuous variables are: MEASUREMENT SCALES The amount of water used per day from a water cooler The amount of time it takes to complete a customer service call. The following detail four measurement scales in order of statistical desirability. There is a relationship between the level of measurement and the use of appropriate statistical procedures. Nominal Scale Simplest form of measurement Allocated to categories (qualitative variables) with no relationship between categories Counts or Frequencies (Frequency Distributions) Mode, Contingency Tables (Chi-Square Tests) Examples: Type of automobile, Voter preference, Place of birth Ordinal Scale Ranking or ordering Scores or numbers should not be treated as cardinal numbers and can not be added or subtracted The zero point is arbitrarily chosen Non-parametric statistics can be applied Examples: Order of finish in a race, Rating scale of 1-5 used for opinion on a specific subject Interval Scale Ratio Scale Scores are based on equal unit of measurements No true zero point. Zero does not imply the absence of the characteristic. Compare in order and distance between them (but not ratio) Add, subtract and multiply by constants without changing meaning Almost all statistical procedures can apply Examples: Temperature on Fahrenheit scale, All properties of Interval but has an absolute zero point (absence of characteristic) Compare distance between data and order of magnitude All statistical procedures can apply Examples are height, weight, and time Basic Statistics Student Guide Copyright TQG Page 11 of 16

12 STATISTICALLY DESIGNED EXPERIMENTS There are many sources of available data. From the short and incomplete list of sources of data that we looked at above, the last in the list is Experiments. In business, we often define a problem, identify the variables, and discover we lack the necessary data to solve the problem. In statistics, one important method often used to collect data is to run a statistically designed experiment. A statistically designed experiment results from a treatment that is applied to part of a population and responses are observed. The researcher exerts strict control over the entire experiment. We often think of experiments as being run in a lab. But experiments are widely used in all facets of business to solve problems. For example, testing the effectiveness of a new prescription drug or changing the marketing strategy for one section of a territory will provide comparative information for effectiveness. Then the information provided will support a decision about a more effective strategy that a company should use. In some experiments, a second part of the population, called a control group, is used. The control group does not receive the treatment. Results from both groups are compared. Care must be taken in designing the experiment and the sampling methods used to ensure that both treatment and control groups are similar. News media frequently offers evidence of a control group in medical testing where a control group receives a placebo or sugar pill and not the active pill. Results are discussed using the control group as a base and the active group as either improved or not improved. Basic Statistics Student Guide Copyright TQG Page 12 of 16

13 DATA ANALYSIS TECHNIQUES Examples of Data Type of Data Time to service a customer by bank teller Quantitative Continuous Number of calls received per hour for customer service Quantitative Discrete Package weight Quantitative Continuous Number of vacation days used per month by an employee Quantitative Discrete Amount of water a person consumes per day Quantitative Continuous Best selling brand of personal computer Qualitative NA Type of mail service used to ship a package Qualitative NA Basic Statistics Student Guide Copyright TQG Page 13 of 16

14 SUMMARY Congratulations! You have completed this introductory module. You should now have an understanding of some of the terminology and vocabulary used in statistics. Here are some of the terms just learned and a flow diagram illustrating the key points of the module. GLOSSARY OF TERMS Census A complete enumeration of the population for a given variable of interest. Cluster Sampling Occurs when some of the strata are selected and then sampling is undertaken within each of the selected strata. Continuous Variable Quantitative variables that have an infinite number of possible values over one or more intervals and are not countable. Cross-section Data As opposed to time series data is taken at a specific time for different elements. Discrete Variable - Quantitative variables that produce a countable number of possible values. Descriptive Statistics The use of graphical and numerical methods to organize, summarize and present data in a meaningful way for the purpose of making more informative decisions. Descriptive Variables Quantitative variables that have a countable number of possible values. Inferential Statistics Uses information gathered from sample data to make estimates, predications, or other generalizations about a population. Parameter A numerical measurement of a population characteristic is called a parameter. Population The entire collection of all elements under study. Qualitative Variable (Data) Qualitative variables product data which can not be measured on a numerical scale but can be classified into one of a group of categories. This type of data is called qualitative data. Quantitative Variable Provide numerical measures of the units of a population or sample. The numerical measures are called quantitative data. Random Sample Occurs when every subset of a fixed size from a population has the same chance of being chosen. Representative Sample A sample that exhibits the same characteristics as the population it was chosen from. Basic Statistics Student Guide Copyright TQG Page 14 of 16

15 Sampling with Replacement Occurs when an element chosen from the population is replaced before the next element is chosen for the purpose of selecting a sample from the population. Sampling without Replacement Occurs when an element chosen from the population is not replaced before the next element is chosen for the purpose of selecting a sample from the population. Statistic A numerical description of a sample characteristic. Statistics - The science of collecting, organizing, analyzing, and interpreting information. Systematic Sampling A rule or pattern used to select a sample from a population by choosing a starting point and then choosing every nth element. Time-series Data Recording information about a characteristic at different time periods or intervals, like hourly, daily, monthly, or yearly. Variable A characteristic of the elements of a population or sample under study that assumes different values. Basic Statistics Student Guide Copyright TQG Page 15 of 16

16 Basic Statistics Student Guide Copyright TQG Page 16 of 16