What is Statistics? Stat Camp for the MBA Program. Where Is Statistics Needed?


Stat Camp for the MBA Program
Daniel Solow
Lecture 1: Exploratory Data Analysis

What Is Statistics?
Statistics is the art and science of collecting, analyzing, presenting, and interpreting data, which are information you have or can obtain. Business statistics helps managers make more informed decisions.
Descriptive Statistics: Describes properties of large data sets with a few summary numbers or graphs.
Inferential Statistics: Helps you make decisions when you can obtain only a portion of the desired data.

Where Is Statistics Needed?
Market survey/research: A market survey says your market share is 19% with a margin of error of 3%. What does this mean?
Manpower planning: A bank wants to know how many tellers it should have during the busiest time on a given day.
Quality control: A machine is set to produce parts with a length of 2 inches. A part just produced has a length of 2.1 inches. Should you stop production and reset the machine?
Forecasting: How much in sales can I expect next quarter?
Premiums and warranties: What should the insurance premium be for a particular class of customers? You have just introduced a new automobile tire in the market. How many miles of warranty should you offer on this product?
Fun and games: I bet that this class has at least two persons with the same birthday (day and month). Should you take this bet?

Inferential Statistics
Example 1: Suppose you want to know the average length of iron bars produced by your machine.
Population: All iron bars produced on that machine.
Number of interest for each item: Length of the bar.
Parameter: $\mu$ = average length of all iron bars.
In such situations, there are a large number of items you are interested in, which is called the population. Every item in the population has a number of interest. You want to know the value of one number associated with the whole population, called the parameter.

Inferential Statistics
Example 2: You want to know your market share (the fraction of customers that purchase your product).
Population: All people that buy this product.
Number associated with each item in the population: 1 if that person buys your product, 0 if that person does not buy your product.
Parameter: The fraction of the population that buys your product.

Inferential Statistics
In general, you can never know the value of the parameter of a population (why?), because there are too many items in the population.
In such cases, you should compute your best estimate (statistic) from a manageable subset of data (sample) collected randomly from the population.
[Diagram: a random sample is drawn from the population. The parameter is unknown; the statistic computed from the sample is the best estimate of it.]

Inferential Statistics
Example 1 (Iron Bars): Collect a sample of n iron bars (iron bar i has a length $x_i$). Compute the following statistic (sample mean):
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
Example 2 (Market Share): Collect a sample of n people from the population of people that buy the product (each person i has a value $x_i$ of 1 or 0). Compute the following statistic (sample proportion):
$\frac{y}{n}$, where y = number in the sample who buy your product.
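As a minimal sketch of the two statistics above, the following Python snippet computes a sample mean and a sample proportion. The sample values are hypothetical, made up only for illustration; they do not come from the lecture's data files.

```python
# Hypothetical sample of n = 5 iron-bar lengths in inches (made-up values).
bar_lengths = [2.0, 2.1, 1.9, 2.0, 2.2]

# Sample mean: the sum of the x_i divided by n.
sample_mean = sum(bar_lengths) / len(bar_lengths)

# Hypothetical sample of n = 10 people: 1 = buys your product, 0 = does not.
buys_product = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]

# Sample proportion: y / n, where y counts the 1s in the sample.
y = sum(buys_product)
sample_proportion = y / len(buys_product)

print(f"sample mean       = {sample_mean:.2f}")
print(f"sample proportion = {sample_proportion:.2f}")
```

Because each person is coded as 1 or 0, the sample proportion is simply the sample mean of those 0/1 values.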

Data
Data are information that is collected, summarized, and analyzed for presentation and interpretation.
Cross-Sectional: Data collected at the same point in time.
Time Series: Data collected over several time periods.
Example: The Data Files web site on the first page of these notes has the following file shadow02.xls with data on certain stocks.
[Table: shadow-stocks data. Exchange is a qualitative variable with classes OTC, AMEX, NYSE. Mkt Cap is a quantitative variable with classes 0-50, 50-100, 100-150, 150-200, 200-250.]

Data Sets
As shown on the previous slide:
Elements: Entities on which data are collected (the 25 different companies in the shadow-stocks example).
Variable: A characteristic of the elements you are interested in and whose value varies (Exchange, Ticker Symbol, and so on).
Class: A group consisting of one or more values for a variable.

Types of Statistical Data
Qualitative (non-numeric)
  Nominal: values cannot be compared in terms of order (color, stock exchange, and so on).
  Ordinal: values can be compared in terms of order (rank, quality level, satisfaction).
Quantitative (numeric)
  Interval: the difference between values is meaningful (birth year, customer arrival time).
  Ratio: the ratio of two values is meaningful (income, age, height, inventory level).

Example: MBA Survey
Identify the data type:
What is your height in inches? RATIO
What is your gender? NOMINAL
Attitude toward this course on a 1 to 6 scale (1 = seriously worried, strongly dreading this; 6 = enthused and confident, eager to start). ORDINAL
Do you smoke? NOMINAL
WWW purchases (in $) over the past year. RATIO

Descriptive Statistics
Descriptive statistics is the art of summarizing a data set using either:
  Graphical methods (charts)
  Numerical methods
All done with computer software packages.
Used all the time in annual reports, news articles, and research studies.
Different for qualitative and quantitative data.

Summarizing Qualitative Data
File: SoftDrink.xls
Variable: Soft Drink
Frequency Distribution: A table listing the number of elements in each class.

Value          Frequency
Coke Classic   19
Diet Coke       8
Dr. Pepper      5
Pepsi-Cola     13
Sprite          5
Total          50

Using SPSS for Frequency Table
(See the files UsingSPSS_ Intro.ppt and UsingSPSS_ Descriptive Stats.ppt)
To open an EXCEL file: Click on file/open/data. Under Files of Type, use .xls files.

SPSS Output
The relative frequency table shows the proportion (or fraction) of elements in each class.
You can display both the frequency and relative frequency tables in a graphical form for easy visualization.

Using SPSS for a Bar Graph
Bar Graph: A graph with the classes on the x-axis and the frequencies (or percentages) on the y-axis.
Click on Graphs/Legacy Dialogs/Bar.
Click on Simple, then Define.
Drag the variable to the Category axis and click either N of Cases or % of Cases.

SPSS Output
[Figure: SPSS output]

Using SPSS for a Pie Chart
Pie Chart: A circle having one slice for each class, with the size of each slice proportional to the relative frequency of that value.
Click on Graphs/Legacy Dialogs/Pie.
Click Define.
Move the variable into the Slice By box and click % of Cases.
Click OK.
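The relative frequencies that a bar graph or pie chart displays can be computed directly from the soft drink frequency distribution. The following Python sketch is only an illustration of that arithmetic; it types the counts in by hand rather than reading SoftDrink.xls.

```python
# Frequency distribution from the soft drink slide (counts typed in by hand).
frequencies = {
    "Coke Classic": 19,
    "Diet Coke": 8,
    "Dr. Pepper": 5,
    "Pepsi-Cola": 13,
    "Sprite": 5,
}

total = sum(frequencies.values())

# Relative frequency = class frequency / total number of elements.
relative = {drink: count / total for drink, count in frequencies.items()}

for drink, share in relative.items():
    print(f"{drink:<13} {frequencies[drink]:>3} {share:>6.0%}")
print(f"{'Total':<13} {total:>3}")
```

Each slice of the pie chart is proportional to the corresponding relative frequency, so the shares sum to 1.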

SPSS Output
[Figure: SPSS output]

Summarizing Quantitative Data
With quantitative data, the classes have to be determined by the statistician.
Given the minimum and maximum data values:
Determine the number of non-overlapping classes (usually 5 to 20).
  Too few classes: variation does not show.
  Too many classes: too much detail.
The class widths and class limits are then determined from the number of classes.
[Diagram: adjacent classes covering the range from min to max, each with a lower limit, an upper limit, and the same width.]

Graphical Methods for Summarizing Quantitative Data
Tabular Summaries
  Frequency distributions: number of items in each class.
  Relative frequency: percentage of items in each class.
  Cumulative: everything up to a certain value.
Graphical Summaries
  Histograms (like a bar chart).

Example: Audit Times
File: audit.xls
Here, try 5 classes, so
Class Width = (max - min) / classes = (33 - 12) / 5 = 4.2, rounded up to 5.
Class limits show the smallest and largest values in the class:
10-14, 15-19, 20-24, 25-29, 30-34
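The class-width arithmetic for the audit times can be sketched in Python as follows. The helper name `class_limits` and the starting value 10 are assumptions for illustration (the slide starts its first class at 10, just below the minimum of 12), and the sketch assumes whole-number data such as days.

```python
import math


def class_limits(minimum, maximum, n_classes, start):
    """Return (width, limits) for integer-valued data.

    `start` is the lower limit of the first class, chosen by the analyst.
    """
    # Round the raw width up so the classes span the whole data range.
    width = math.ceil((maximum - minimum) / n_classes)
    limits = [
        (start + k * width, start + (k + 1) * width - 1)
        for k in range(n_classes)
    ]
    # Guard against a choice of `start` that leaves data outside the classes.
    if start > minimum or limits[-1][1] < maximum:
        raise ValueError("classes do not cover the data range")
    return width, limits


width, limits = class_limits(minimum=12, maximum=33, n_classes=5, start=10)
print("class width:", width)
for lower, upper in limits:
    print(f"{lower}-{upper}")
```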

Frequency Table
The frequency table is constructed by counting how many data items fall within each class (relative frequency table for percentages).

Audit time (days)   Frequency   Rel. Frequency (%)
10-14               4           20%
15-19               8           40%
20-24               5           25%
25-29               2           10%
30-34               1            5%

Histogram
A histogram is a plot of a frequency distribution.
Classes on the x-axis.
Frequencies or relative frequencies on the y-axis.
Similar to a bar graph, only now the bars are not separated.
In SPSS: Choose Graph/Legacy Dialogs/Histogram, move the variable to the Variable box, and then customize the plot.
In EXCEL: First create a column of bins (upper class limits), then choose Tools/Data Analysis/Histogram.

Histogram of Audit Times
[Figure: histogram of audit times]

EXCEL Histogram of Audit Times
[Figure: EXCEL histogram of audit times]
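Starting from the audit-time frequency table, the relative and cumulative percentages can be derived with a few lines of Python. This is only a sketch of the arithmetic: the raw audit times in audit.xls are not reproduced here, so the class counts are typed in by hand, and the row of `#` characters is a rough text stand-in for a histogram bar.

```python
from itertools import accumulate

# Class labels and frequencies copied from the audit-time frequency table.
classes = ["10-14", "15-19", "20-24", "25-29", "30-34"]
frequencies = [4, 8, 5, 2, 1]

total = sum(frequencies)

# Relative frequency (%) of each class.
percent = [100 * f / total for f in frequencies]

# Cumulative percentage: everything up to and including each class.
cumulative = list(accumulate(percent))

for label, f, p, c in zip(classes, frequencies, percent, cumulative):
    print(f"{label}  {f:>2}  {p:>5.1f}%  {c:>6.1f}%  {'#' * f}")
```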

Numerical Summaries of Data
Location, Average, Central Tendency
  Mean
  Median, Percentiles, Quartiles
  Mode
Variation (how spread out the numbers are)
  Range
  Variance, Standard Deviation
Shape
  Skewness

MEAN
MEAN = arithmetic average

Example: Invention Development Time (Develop.xls)

Invention                  Development Time
Automatic Transmission     16
Ballpoint Pen               7
Filter Cigarettes           2
Frozen Foods               15
Helicopter                 37
Instant Coffee             22
Minute Rice                18
Nylon                      12
Photography                56
Radar                      35
Radio                      24
Roll-On Deodorant           7
Telegraph                  18
Television                 63
Transistor                 16
Video Cassette Recorder     6
Xerox Copying              15
Zipper                     30

An invention takes 22.167 years to develop on average.
In Excel: AVERAGE(range)

MEDIAN (splits data in half)
MEDIAN = middle value when data values are sorted from low to high.
At least 50% of the values are at or below the median and at least 50% are at or above the median.
If the sample size (n) is even, the median is the mean of the two middle values.
What is the median development time?

Example: Invention Development Time
Median = (16 + 18) / 2 = 17
In Excel: MEDIAN(range)

Mean vs. Median
The mean is the most commonly used measure of location.
However, the mean is affected by extremely large or small values. In those cases the median may be a more reliable measure of location.

Example: Salaries

Employee   Salary
John        30,000
Doe         32,000
Smith       32,000
Perry       33,000
Sweeney    200,000

Mean = 65,400
Median = 32,000

Example: Invention Development Time
Median = 17
Mean = 22.167
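The mean and median figures in the two examples can be checked with Python's standard `statistics` module. The sketch below types the slide data in by hand; the variable names are illustrative only.

```python
from statistics import mean, median

# Salaries from the salary example.
salaries = [30_000, 32_000, 32_000, 33_000, 200_000]

# Development times (years) from the invention example.
development_years = [16, 7, 2, 15, 37, 22, 18, 12, 56,
                     35, 24, 7, 18, 63, 16, 6, 15, 30]

print("salaries:   mean =", mean(salaries), " median =", median(salaries))
print("inventions: mean =", round(mean(development_years), 3),
      " median =", median(development_years))

# Dropping the one extreme salary shows how strongly it pulls the mean,
# while the median barely moves.
without_outlier = salaries[:-1]
print("without 200,000: mean =", mean(without_outlier),
      " median =", median(without_outlier))
```

In both data sets the mean exceeds the median, which is what a long right-hand tail produces.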

SYMMETRIC DATA
[Figure: symmetric distribution with 50% of the data on each side of the center.]
Mean = Median

RIGHT SKEWED DATA
[Figure: distribution with a long right-hand tail; the median lies to the left of the mean.]
Mean > Median

LEFT SKEWED DATA
[Figure: distribution with a long left-hand tail; the mean lies to the left of the median.]
Mean < Median

Percentiles
Think about your numerical data values lying on a line:
[Diagram: at least p% of the values are ≤ the p-th percentile and at least (100 - p)% of the values are ≥ it.]
The p-th percentile is a number such that:
About p% of your data values are ≤ that number, and
about (100 - p)% of your data values are ≥ that number.
Example: The 90th percentile on the GMAT is a score such that about 90% of people's GMAT scores are ≤ that number and about 10% are ≥ that number.

Quartiles
Q1 = First quartile = 25th percentile = a value such that about 25% of the elements are ≤ that value and about 75% are ≥ that value.
Q2 = Second quartile = 50th percentile = a value such that about 50% of the elements are ≤ that value and about 50% are ≥ that value = the median.
Q3 = Third quartile = 75th percentile = a value such that about 75% of the elements are ≤ that value and about 25% are ≥ that value.

Percentiles in EXCEL (file salary.xls)
[Figure: EXCEL percentile calculation]

Percentiles in SPSS (file salary.xls)
Analyze; Descriptive Statistics; Frequencies; then move the desired variable to the Variable(s) box; then click on Statistics; then click Percentile(s), type your desired percentiles, and click Add; then click Continue and OK.

MODE
The mode of a variable is the value or category that occurs most often in the batch of data.
A data set can have more than one mode (bimodal, trimodal).
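To make the quartile definitions above concrete, here is a small Python sketch using `statistics.quantiles`. Since salary.xls is not reproduced in this transcription, the sketch reuses the invention development times instead. Software packages interpolate between data values in different ways, so the choice of `method="inclusive"` is an assumption, and other tools may report slightly different quartiles for the same data.

```python
from statistics import quantiles

# Development times (years) from the invention example.
development_years = [16, 7, 2, 15, 37, 22, 18, 12, 56,
                     35, 24, 7, 18, 63, 16, 6, 15, 30]

# n=4 asks for the three cut points that split the data into quarters.
# method="inclusive" treats the data as the whole population and
# interpolates linearly between neighboring sorted values.
q1, q2, q3 = quantiles(development_years, n=4, method="inclusive")

print("Q1 =", q1)
print("Q2 (median) =", q2)
print("Q3 =", q3)
```

The second quartile agrees with the median of 17 found earlier for this data set.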

Example: Invention Development Time
Modes: 7, 15, 16, 18
In Excel: MODE(range), which returns only one of these values.

Do It Yourself Example: Blood Problem
Suppose that the number of pints per day of whole blood used in transfusions at a hospital over the previous 11 days is:
25, 18, 61, 12, 18, 15, 20, 25, 17, 19, 28.
Use the file blood.xls and Excel to find and interpret the mean, median, and mode(s).

Is the Mean Enough?
In the Blood Problem, an average of 23.45 pints of blood are used per day.
Question: Does this mean you should have exactly 23.45 pints of blood available? No. Why not?
Answer: Because the amount of blood you need varies, that is, there is variation in the blood data.
Question: How much variation is there?
Answer: What is needed is a numerical value to represent how much variation there is in the data.
Example: Range = Largest Value - Smallest Value

Variance
Variance is a number ≥ 0 that measures how close the data values are to the mean.
[Figure: two data sets around the mean µ; when the values cluster tightly around µ the variance is small, and when they are spread out the variance is larger.]
Variance is generally a relative measure.
More reliable measure of variation than the range.
Uses all the data.
There are two different formulas, depending on whether you are computing the population variance or the sample variance (see the handout formulas.pdf).
Consider the following example for managing the amount of blood at a hospital (file blood.xls).
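For readers who want to check the Blood Problem outside Excel, the following Python sketch computes the mean, median, mode(s), and range from the 11 daily values listed above. It uses only the standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median, multimode

# Pints of whole blood used per day over the previous 11 days.
pints = [25, 18, 61, 12, 18, 15, 20, 25, 17, 19, 28]

average = mean(pints)
middle = median(pints)

# multimode returns every most-frequent value, unlike a function
# that reports only a single mode.
modes = multimode(pints)

# Range = largest value - smallest value.
data_range = max(pints) - min(pints)

print(f"mean   = {average:.2f}")
print(f"median = {middle}")
print(f"modes  = {sorted(modes)}")
print(f"range  = {data_range}")
```

The single day with 61 pints pulls the mean (23.45) above the median (19) and accounts for most of the range.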

Example: Blood Problem (blood.xls)
[Figure: blood data]

Population Variance
$\mu$ = population mean
$x_i$ = value of the i-th item
$(x_i - \mu)$ = deviation of the i-th item from $\mu$
$(x_i - \mu)^2$ = squared deviation of the i-th item
Variance = average of the squared deviations:
$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$
In Excel: VAR.P(range)

Sample Variance
$\bar{x}$ = sample mean
$x_i$ = value of the i-th item
$(x_i - \bar{x})$ = deviation of the i-th item from $\bar{x}$
$(x_i - \bar{x})^2$ = squared deviation
Sample Variance = $S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
In Excel: VAR.S(range)

Standard Deviation
Square root of the variance.
Expressed in the same units as the data.
More intuitive measure of variability.
Blood Problem:
Sample Variance = $S^2$ = 177.07
Sample Standard Deviation = $S = \sqrt{177.07}$ = 13.31
In Excel:
Sample Std. Dev. = STDEV.S(range)
Pop. Std. Dev. = STDEV.P(range)
(Under circumstances you will learn soon, the std. dev. has a useful interpretation.)
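The variance and standard deviation figures for the Blood Problem can be reproduced with Python's standard `statistics` module, as a rough counterpart to the Excel functions named above. The sketch uses the 11 daily values from the Blood Problem (25, 18, 61, 12, 18, 15, 20, 25, 17, 19, 28).

```python
from statistics import pstdev, pvariance, stdev, variance

# Pints of whole blood used per day over the previous 11 days.
pints = [25, 18, 61, 12, 18, 15, 20, 25, 17, 19, 28]

# Sample versions divide the sum of squared deviations by n - 1.
sample_var = variance(pints)
sample_sd = stdev(pints)

# Population versions divide by the number of items instead.
population_var = pvariance(pints)
population_sd = pstdev(pints)

print(f"sample variance     = {sample_var:.2f}")
print(f"sample std. dev.    = {sample_sd:.2f}")
print(f"population variance = {population_var:.2f}")
print(f"population std. dev.= {population_sd:.2f}")
```

The sample variance is always a little larger than the population variance computed from the same numbers, because it divides by n - 1 rather than n.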

Using EXCEL and SPSS to Compute Descriptive Statistics
Both EXCEL and SPSS can automatically compute all of the descriptive statistics.
In EXCEL: Tools/Data Analysis/Descriptive Statistics
In SPSS: Analyze/Descriptive Statistics/Frequencies. Click on the Statistics box and select all of the descriptive statistics you want (including the percentiles).
EXCEL and SPSS are now illustrated on the data in the file salary.xls.

Descriptive Statistics in Excel
To compute descriptive statistics in EXCEL, in the Data tab, use the Data Analysis add-in and choose Descriptive Statistics.

EXCEL Salary Example
[Figure: EXCEL descriptive statistics output for salary.xls]

Descriptive Statistics in SPSS
To compute descriptive statistics in SPSS, use Analyze/Descriptive Statistics/Frequencies; then, at the bottom of the screen, click on Statistics and choose the statistics you want reported.

SPSS Salary Example
[Figure: SPSS descriptive statistics output for salary.xls]

Relationship Between Two Variables
So far you have seen ways to analyze information about a single variable.
One is often interested in the relationship between two or more variables.
Examples of relationships:
  Advertising expenditures and sales.
  Company profits and stock price.
  Home size and sales price.

Example: Stereo Store
File: stereo.xls
Is there any relationship between the number of commercials and the sales levels?

Scatter Diagrams in Excel
In Excel, select the two columns of data; click on the Insert tab; then on the Scatter icon; then on the top left diagram.
Number of commercials on the x-axis.
Sales levels on the y-axis.

Scatter Diagrams in SPSS
Plot of two variables on the same graph.
In SPSS, choose Graphs/Legacy Dialogs/Scatter, then choose Simple and click on Define.
Number of commercials on the x-axis.
Sales levels on the y-axis.

Covariance and Correlation
The sample and population covariance of two variables X and Y are numbers whose signs have the following meaning:
COV(X,Y) > 0 means that the two variables tend to move in the same direction: if one increases (decreases), then the other increases (decreases).
COV(X,Y) < 0 means that the two variables tend to move in opposite directions: if one increases (decreases), then the other decreases (increases).
The value of the covariance is hard to interpret, so the covariance is converted to a number between -1 and +1, called the correlation of X and Y, that indicates how strongly X and Y are correlated.

Covariance and Correlation
For two variables X and Y for which you have n pairs of data in the form $(x_1, y_1), \ldots, (x_n, y_n)$, the covariance and correlation are computed by:
Population:
COV(X, Y): $\sigma_{XY} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu_X)(y_i - \mu_Y)$
COR(X, Y): $\rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$
Sample:
COV(X, Y): $S_{XY} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$
COR(X, Y): $r_{XY} = \frac{S_{XY}}{S_X S_Y}$

Cov. and Correlation in EXCEL
[Figure: EXCEL covariance and correlation calculation]
Note: COVARIANCE.P and COVARIANCE.S in Excel compute the population and sample covariance, respectively. CORREL computes the sample correlation, which equals the population correlation.
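The covariance and correlation calculations can be sketched in plain Python as follows. The function names and the five data pairs are hypothetical and chosen only for illustration; they are not the values in stereo.xls.

```python
from math import sqrt


def covariance(xs, ys, sample=True):
    """Covariance of paired data; divides by n - 1 if sample, else by n."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / (n - 1 if sample else n)


def correlation(xs, ys, sample=True):
    """Covariance divided by the product of the two standard deviations."""
    sd_x = sqrt(covariance(xs, xs, sample))  # covariance of X with itself
    sd_y = sqrt(covariance(ys, ys, sample))  # is the variance of X
    return covariance(xs, ys, sample) / (sd_x * sd_y)


# Hypothetical data: number of commercials and sales level (made-up values).
commercials = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]

print("sample covariance     =", covariance(commercials, sales))
print("population covariance =", covariance(commercials, sales, sample=False))
print("sample correlation     =", round(correlation(commercials, sales), 4))
print("population correlation =",
      round(correlation(commercials, sales, sample=False), 4))
```

The divisor (n or n - 1) cancels between the covariance and the two standard deviations, which is why the sample and population correlations coincide.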

Cov. and Correlation in SPSS
In SPSS, choose Analyze/Correlate/Bivariate.
On the next menu, click on Options.
Select Cross-Product Deviations and Covariances.
Click Continue and, on the previous menu, OK.