Chapter 1 Introduction: The Role of Statistics in Engineering

Example: The manufacturer of a medical laser used in ophthalmic surgery wants to quote quality characteristics of the laser to potential customers. One such characteristic is the average lifetime of the laser under normal use. The manufacturer could obtain the exact average lifetime by running every laser produced until it wears out, recording each lifetime, and averaging over all lasers produced. The drawback to this procedure is that there would be no product left to sell. To stay in business and still advertise quality characteristics to potential customers, the manufacturer needs a way to estimate the average lifetime from a relatively small sample of lasers. Because the estimate is based on only some of the lasers produced, not all, it will have some uncertainty. One major use of statistics is to quantify the degree of uncertainty in such situations.

Definition: Statistics is the branch of applied mathematics that deals with the collection, organization, and interpretation of numerical data, especially the analysis of population characteristics by inference from sampling.

There are two general branches of statistics: 1) descriptive statistics and 2) inferential statistics.

Definition: Descriptive statistics consists of methods of organizing, summarizing, and presenting data in an informative way.

Graphical techniques allow us to present summaries of data in pictorial form, so that data characteristics may be easily seen.

Numerical techniques provide various summary values that represent the characteristics of the data set.

Definition: A unit is a single entity, usually an object or a person, whose characteristics are of interest to the researcher.

Definition: A population of units is the set of all items of interest in a statistical problem.
Example 1: All registered voters in Florida in November, 2012.
Example 2: All cars of a certain model coming off an assembly line in October, 2009.
Example 3: All 12-oz. cans of Pepsi-Cola produced at a certain factory in the year 2009.

Definition: A statistical population is the set of all measurements corresponding to each unit in the entire population of units.

Note: We will generally use the term population to refer to either a population of units or a statistical population.

Definition: A parameter is a numerical characteristic of a population.
Example 1: Proportion of all registered voters in Florida who intend to vote for Pres. Barack Obama in November, 2012.
Example 2: Average time until the first major repair job for all cars of a certain model coming off an assembly line in October, 2009.
Example 3: Average amount of Pepsi-Cola, by weight, in all 12-oz. cans of Pepsi-Cola produced at a certain factory in the year 2009.

Definition: A sample is a subset of a population. We will also use the term sample to denote the subset of measurements that are actually collected by the researcher.
Example 1: One thousand randomly selected registered voters from across Florida in October, 2012, selected from among those who voted in the previous election or are newly registered.
Example 2: Every 50th car of a certain model coming off an assembly line in October, 2009.
Example 3: Every 100th 12-oz. can of Pepsi-Cola produced at a certain factory in the year 2009.

Note 1: The description of a sample must refer to the population from which the sample was selected.
Note 2: Depending on the method of selection, a sample may or may not be representative of the population from which it was selected.

Definition: A statistic is a numerical characteristic of a sample.
Example 1: The proportion of one thousand randomly selected voters from across Florida in October, 2012 who intend to vote for Pres. Barack Obama.
Example 2: The average time until the first major repair job for a sample consisting of every 50th car of a certain model coming off an assembly line in October, 2009.
Example 3: The average amount of Pepsi-Cola in a sample consisting of every 100th can of Pepsi-Cola produced at a certain factory in the year 2009.

Definition: Inferential statistics consists of methods of drawing conclusions about the characteristics of a population based on the information obtained from a sample selected from the population.

Inferential statistics is divided into the fields of 1) estimation of parameters and 2) hypothesis testing. An example of inferential statistics is the estimation of the average lifetime for the population of medical lasers based on data from a small sample of the lasers.

Note: Inferential statistics amounts to making decisions based on incomplete information.
Note: The particular statistical inferential technique used depends strongly on the method by which the sample was selected from the population.

Definition: A representative sample is a sample whose characteristics reflect the characteristics of the population from which the sample was selected.
Example 1 (Florida voters): Is this sample representative? If the sample was randomly chosen, then it has a good chance of being representative of the population.
Example 2 (every 50th car): Is this sample representative? What if there were some cyclically occurring flaw in the manufacturing process that affected every 50th car produced?
Example 3 (every 100th can): Is this sample representative? What if there were some flaw in the manufacturing process that led to overfilling of every 100th can?

Note: A primary reason for working with samples instead of entire populations is that the populations are often too large to handle easily.

Example 1: All registered voters in Florida in October, 2012. There are approximately 12,000,000 of them.
Example 2: How easy would it be to actually examine the entire month's production of cars, following them over time to see when the first major repair job was required?
Example 3: Would we actually want to weigh the amount of Pepsi-Cola in every 12-oz. can coming off the assembly line at a certain factory in 2009?

Whenever we infer a population characteristic based on sample data, there is always the chance that our inference will be incorrect.

Example: A public opinion poll conducted in 1936 for Literary Digest Magazine (R.I.P.) predicted that Alf Landon would defeat Franklin Delano Roosevelt in the Presidential election by a 3 to 2 margin. Actually, F.D.R. won 62% of the ballots. Why was the prediction so incorrect?
1) The pollsters sent out 10 million sample ballots to prospective voters, based on the magazine's subscription list and on telephone directories. (Poor identification of population.)
2) Only 2.3 million of the mailed ballots were actually returned. (Self-selection of sample.)

Note: To do valid statistical inference, we need a sample that is likely to be representative of the population.

We want to build into our statistical inferential procedures measures of reliability, which will tell us how likely it is that our inference is correct or incorrect. These measures of reliability depend on the sampling method used. For estimation of parameters based on sample statistics, the measure of reliability is called the confidence level. For testing hypotheses about parameters based on sample statistics, the measure of reliability is the significance level.
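To see why the sampling method matters for representativeness, the car example can be turned into a small simulation. The sketch below is illustrative only: the production run of 5,000 cars, the fault that hits exactly every 50th car, and all variable names are assumptions made for the demonstration, not data from these notes. It compares the population parameter with the statistic from an every-50th-car sample and from a random draw of the same size.

```python
import random

# Hypothetical production run: 5,000 cars, numbered 1..5000 in build order.
# Assumption for illustration: a cyclic fault hits every 50th car.
N_CARS = 5000
PERIOD = 50
flawed = {car_id: car_id % PERIOD == 0 for car_id in range(1, N_CARS + 1)}

def proportion_flawed(sample_ids):
    """Fraction of the given cars that are flawed."""
    return sum(flawed[i] for i in sample_ids) / len(sample_ids)

# Parameter: proportion of flawed cars in the whole population.
population_rate = proportion_flawed(range(1, N_CARS + 1))

# Systematic sample: every 50th car, which lines up with the fault cycle.
systematic_ids = list(range(PERIOD, N_CARS + 1, PERIOD))
systematic_rate = proportion_flawed(systematic_ids)

# Random sample of the same size, drawn without replacement.
rng = random.Random(2009)  # fixed seed so the run is repeatable
random_ids = rng.sample(range(1, N_CARS + 1), len(systematic_ids))
random_rate = proportion_flawed(random_ids)

print(f"population: {population_rate:.3f}")
print(f"systematic: {systematic_rate:.3f}")
print(f"random:     {random_rate:.3f}")
```

In this constructed worst case the systematic sample contains only flawed cars, so its statistic says nothing useful about the parameter, while the random draw has no built-in alignment with the fault cycle.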

Definition: A simple random sample (SRS) of size n is a sample drawn from a population by a method that makes every sample of size n equally likely to be chosen. As a consequence, every member of the population has an equal chance of being selected. (The converse does not hold: a systematic sample with a random starting point gives every member an equal chance, yet it is not a simple random sample.)

Steps in choosing an SRS of size n:
1) Obtain a list of all members of the population; this list is called a sampling frame. (Note: This is the most difficult step in the whole process, and is also error-prone.)
2) Assign a unique ID number to each member of the population.
3) Go to a table of random numbers; choose a convenient starting point; go down the column, recording numbers within the range of the assigned ID numbers, until n distinct numbers are selected. (This step may also be done using your calculator.)
4) The population members that have the ID numbers obtained by this process make up the SRS of size n.

Note: We can never be absolutely certain that our sample is representative, but simple random sampling gives us a good chance.

Example: I want to estimate the average height of the class, without gathering height data for every person in the class. I will select a simple random sample of size 3 and use the average height of the members of the sample as the estimate of the average height of the members of the class. I assign a unique ID number to each person in the class; the first person on the class roll will have the ID number 001, the second person 002, etc. I then go to a table of random numbers, open it, and blindly choose a starting point. Reading down the column from the starting point, I find 3 distinct three-digit numbers within the range of the values of the ID numbers. The class members with these 3 ID numbers constitute the SRS.

Collecting Engineering Data

Observational study: members of the sample are simply observed during routine operation, with measurements taken.
- Used to build empirical models
- Cause-and-effect relationships cannot be confirmed

Designed experiment: the engineer makes deliberate, purposeful changes in controllable variables (called factors) and observes the results of these changes.
- Experiments can be designed and run very efficiently
- Cause-and-effect relationships can be examined, using hypothesis testing and parameter estimation

A carefully designed data collection procedure (including the method of selecting a sample from the population) will usually lead to interpretable and useful results; a poorly designed data collection procedure will often lead to worthless data. As R. A. Fisher observed, to consult the statistician after an experiment is finished is often merely to ask for a post-mortem examination: he can perhaps say what the experiment died of.

Example of an Observational Study to Build an Empirical Model¹

The table contains data collected on three variables in an observational study conducted at a semiconductor manufacturing plant. In this plant, the semiconductor is wire-bonded to a frame. The variables are: Pull Strength, the force required to break the bond; Wire Length; and Die Height. We want to be able to predict the Pull Strength by knowing the Wire Length and Die Height.
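Returning to the simple random sampling steps and the class-height example earlier in this section, the random-number-table procedure can be mimicked in software. The sketch below is a minimal illustration under stated assumptions: the class roll of eight students and their heights are made-up values, and the helper name `simple_random_sample` is hypothetical. It relies on Python's `random.Random.sample`, which draws without replacement.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct members from the sampling frame.

    random.Random.sample selects without replacement, so every subset
    of size n is equally likely to be chosen.
    """
    if n > len(frame):
        raise ValueError("sample size exceeds population size")
    rng = random.Random(seed)
    return rng.sample(frame, n)

# Hypothetical class roll: heights in inches keyed by ID (made-up values).
heights = {
    "001": 64.0, "002": 70.5, "003": 67.0, "004": 72.0,
    "005": 61.5, "006": 68.5, "007": 66.0, "008": 69.0,
}

frame = sorted(heights)                          # steps 1-2: frame of unique IDs
chosen = simple_random_sample(frame, 3, seed=1)  # step 3: pick 3 distinct IDs
estimate = sum(heights[i] for i in chosen) / len(chosen)  # statistic
true_mean = sum(heights.values()) / len(heights)          # parameter

print("sampled IDs:", chosen)
print(f"sample mean = {estimate:.2f}, class mean = {true_mean:.2f}")
```

The sample mean is the statistic and the class mean is the parameter it estimates; rerunning with a different seed gives a different sample and, in general, a different estimate.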

¹ Montgomery, D. C.; Runger, G. C.; and Hubele, N. F. Engineering Statistics, 3rd Edition, John Wiley & Sons, Inc. (2004).

The linear regression model that we want to estimate has the following form (we will cover linear regression in Chapter 11):

$$\text{Pull Strength} = \beta_0 + \beta_1(\text{Wire Length}) + \beta_2(\text{Die Height}) + \varepsilon$$

We wish to: a) test whether this model adequately represents the relationship between Pull Strength and the two predictors, Wire Length and Die Height (is the relationship linear?), and b) estimate the values of the constant term $\beta_0$ and the coefficients $\beta_1$ and $\beta_2$ in this equation, so that we will have a useful model. (The term $\varepsilon$ in the equation is a random error term.)
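As a rough preview of how the constant term and the two coefficients might be estimated, the sketch below fits the model by ordinary least squares. Two assumptions are worth stating plainly: least squares is chosen here for illustration (these notes defer the estimation method to Chapter 11), and the eight observations are made-up numbers, not the semiconductor data from the table. The helper names `solve_linear_system` and `fit_least_squares` are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_least_squares(rows, y):
    """Return [b0, b1, b2] minimising the sum of squared residuals."""
    design = [[1.0, wire, die] for wire, die in rows]  # column of 1s for b0
    p = 3
    # Normal equations: (X^T X) b = X^T y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(r[i] * y_k for r, y_k in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)

# Made-up observations: (wire length, die height) and pull strength.
rows = [(2, 100), (4, 300), (5, 150), (7, 400),
        (9, 250), (10, 500), (12, 350), (14, 600)]
y = [9.4, 16.5, 18.9, 27.3, 30.8, 37.6, 41.2, 50.1]

b0, b1, b2 = fit_least_squares(rows, y)
# Residuals are the sample counterparts of the random error term.
residuals = [y_k - (b0 + b1 * w + b2 * d) for (w, d), y_k in zip(rows, y)]

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.4f}")
print("largest |residual| =", round(max(abs(e) for e in residuals), 3))
```

Plotting the residuals against each predictor is one informal way to judge whether a linear form is adequate; a pattern in the residuals would suggest that it is not.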