CHAPTER 4. Labeling Methods for Identifying Outliers
- Trevor Woods
4.1 Introduction

Data mining is the extraction of hidden predictive knowledge from large databases. Outlier detection is one of its most powerful techniques. Authors have defined outliers in different ways; Hawkins (1980) defined an outlier as "an observation that deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism." Outliers are also referred to as discordants, deviants, abnormalities or anomalies in the data mining and statistics literature (Aggarwal, 2005). Outlier labeling methods such as the Standard Deviation (SD) method, the MADe method and the Median rule are commonly used. These methods are quite reasonable when the data are normally distributed. In non-random distributions, outliers can decrease normality. When data depart from a normal distribution, a transformation to normality is a common step in order to identify outliers
using a method that is quite effective under a normal distribution. Quesenberry & David (1961), discussing the rejection and location of outlying observations, noted that there might be several ways of approaching the problem, depending to a large extent on the object in view. One might be particularly interested in identifying the genuinely exceptional observations in order to gain new insight into the phenomena under study, on the basis of risks of misclassification rather than of estimation errors. Grubbs (1969) gave statistical procedures for determining whether the highest observation, the highest and lowest observations, the two highest observations, the lowest observations, or more of the observations in the sample are statistical outliers.

Most outlier labeling methods are informal tests: they generate an interval or criterion for outlier detection instead of testing a hypothesis, and any observation beyond the interval or criterion is considered an outlier. Various location and scale parameters are employed in each labeling method to define a reasonable interval or criterion for outlier detection. There are two reasons for using an outlier labeling method:
1. To find possible outliers as a screening device before conducting a formal test.
2. To find the extreme values away from the majority of the data regardless of the distribution.
While formal tests usually require test statistics based on distributional assumptions and a hypothesis to determine whether the largest extreme value is a true outlier of the distribution, most outlier labeling methods present the interval using the
location and scale parameters of the data. Although a labeling method is usually simple to use, some observations outside the interval may turn out to be falsely identified outliers after a formal test, when outliers are defined only as observations that deviate from the assumed distribution. However, if the purpose of outlier detection is not a preliminary step to find the extreme values violating the distributional assumptions of the main statistical analyses (such as the t-test, ANOVA and regression) but mainly to find the extreme values away from the majority of the data regardless of the distribution, outlier labeling methods may be applicable. In addition, for a large data set that is statistically problematic, e.g. when it is difficult to identify the distribution of the data or to transform it into a proper distribution such as the normal distribution, labeling methods can be used to detect outliers.

Issues of Outliers

Iglewicz & Hoaglin (1993) categorized the following three issues with regard to outliers.

Outlier labeling: flag potential outliers for further investigation (for example, are the potential outliers erroneous data, or indicative of an inappropriate distributional model).

Outlier accommodation: use robust statistical techniques that will not be unduly affected by outliers. That is, if we cannot determine that potential outliers are erroneous observations, do we need to modify our statistical analysis to account for these observations more appropriately?
Outlier identification: formally test whether observations are outliers.

This chapter focuses on the outlier labeling technique and on issues of outlier identification. Many real data sets contain outliers that have unusually large or small values compared with the other values in the data set. Outliers may have a negative effect on data analyses based on distributional assumptions, such as ANOVA and regression, or may provide useful information about the data when we look into an unusual response in a given study. Thus, outlier detection is an important part of data analysis in both cases.

Several outlier labeling methods have been developed. Some are sensitive to extreme values, like the SD method, and others are resistant to extreme values, like Tukey's method. Although these methods are quite powerful with large normal data, it may be problematic to apply them to non-normal data or small samples without knowledge of their characteristics in these circumstances. This is because each labeling method uses different measures to detect outliers, and the expected percentage of outliers changes with the sample size and the distribution type of the data. Many kinds of public health data are skewed, usually to the right, and lognormal distributions can often be applied to such skewed data, for instance surgical procedure times, blood pressure, and assessments of toxic compounds in environmental analysis.
4.2 Methods of Analysis

Several outlier labeling methods are available. Four of them are used in this study: Z-Scores, Modified Z-Scores, the Median Absolute Deviation (MADe) method and Tukey's method (boxplot).

FIGURE 4.1: Flowchart for Outlier Labeling Methods
Z-Scores

A Z-score is a statistical measurement of a score's relationship to the mean of a group of scores. A Z-score of 0 means the score is the same as the mean. A Z-score can be positive or negative, indicating whether the score is above or below the mean and by how many standard deviations. The Z-score method identifies outliers in a data set using the mean and standard deviation:

\[ Z_i = \frac{x_i - \bar{x}}{s}, \tag{4.1} \]

where

\[ s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}. \]

The method is based on the property that if $X$ follows a normal distribution $N(\mu, \sigma^2)$, then $Z = (X - \mu)/\sigma$ follows the standard normal distribution $N(0, 1)$; Z-scores that exceed 3 in absolute value are generally considered outliers. The method is simple, and it is equivalent to the 3 SD method when the criterion for an outlier is an absolute Z-score of at least 3. According to Shiffler (1988), the maximum possible Z-score depends on the sample size and is $(n-1)/\sqrt{n}$. For $n = 10$ this bound is about 2.85, so no Z-score exceeds 3 in a sample of size 10 or less, and the Z-score method is not well suited to outlier labeling in small data sets. Another
limitation of this rule is that the standard deviation can be inflated by a few, or even a single, extreme observation. This can cause a masking problem, i.e., the less extreme outliers go undetected because of the most extreme outlier(s). Although it is common practice to use Z-scores to identify possible outliers, this can be misleading (particularly for small sample sizes) because the maximum Z-score is at most $(n-1)/\sqrt{n}$.

Interpretation of Z-Scores

Z-scores are interpreted as follows.
1. A z-score less than 0 represents an element less than the mean.
2. A z-score greater than 0 represents an element greater than the mean.
3. A z-score equal to 0 represents an element equal to the mean.
4. A z-score equal to 1 represents an element that is 1 standard deviation greater than the mean; a z-score equal to 2, 2 standard deviations greater than the mean; etc.
5. A z-score equal to -1 represents an element that is 1 standard deviation less than the mean; a z-score equal to -2, 2 standard deviations less than the mean; etc.
6. If the number of elements in the set is large and the data are approximately normal, about 68% of the elements have a z-score between -1 and 1; about 95% have a z-score between -2 and 2; and about 99.7% have a z-score between -3 and 3.
Standard scores are also called z-values, z-scores, normal scores, and standardized variables; the use of "Z" is because the normal distribution is also known as the "Z
distribution". They are most frequently used to compare a sample to a standard normal deviate, though they can be defined without assumptions of normality.

FIGURE 4.2: Comparison of the various grading methods in a normal distribution, including standard deviations, cumulative percentages, percentile equivalents, Z-scores, T-scores, standard nines and the percentage in each stanine.

The z-score is only defined if one knows the population parameters; if one only has a sample, then the analogous computation with the sample mean and sample standard deviation yields the Student's t-statistic. The z-score is often used in the z-test in standardized testing, the analog of the Student's t-test for a population whose parameters are known rather than estimated. As it is very unusual to know the entire population, the t-test is much more widely used. A few applications of z-scores include the following:
1. What percentage of people fall below a specific value?
2. What values can be deemed extreme? For example, in an IQ test, what scores represent the top 5%?
3. What is the relative score of one distribution versus another? For example, Michael is taller than the average male and Emily is taller than the average female, but who is relatively taller within their own gender?
These types of questions can be answered using a z-score. As a general rule, z-scores less than -1.96 or greater than 1.96 are considered unusual and generally very interesting. They are also associated with statistical significance and with outliers.

Modified Z-Scores

The problem with Z-scores is that the two estimators they use, the sample mean ($\bar{x}$) and the sample standard deviation ($s$), can be affected by a few extreme values or even by a single extreme value. To resolve this problem, the median and the median absolute deviation (MAD) are employed in the modified Z-scores instead of the sample mean and standard deviation, respectively (Iglewicz & Hoaglin, 1993):

\[ MAD = \mathrm{median}\{ |x_i - \tilde{x}| \}, \tag{4.2} \]

where $\tilde{x}$ is the sample median, and

\[ M_i = \frac{0.6745\,(x_i - \tilde{x})}{MAD}, \tag{4.3} \]

where $E(MAD) = 0.675\sigma$ for large normal data. Iglewicz & Hoaglin (1993) suggested
that observations be labeled outliers when $|M_i| > 3.5$, based on a simulation with pseudo-normal observations for sample sizes of 10, 20 and 40. The $M_i$ score is effective for normal data in the same way as the Z-score.

Median Absolute Deviation (MADe)

The Median Absolute Deviation (MADe) method is one of the basic robust methods, which are largely unaffected by the presence of extreme values in the data set. The approach is similar to the SD method, but the median and MADe are employed instead of the mean and standard deviation. It is defined as follows:

\[ \text{2 MADe method: } \mathrm{Median} \pm 2\,MAD_e \tag{4.4} \]

\[ \text{3 MADe method: } \mathrm{Median} \pm 3\,MAD_e \tag{4.5} \]

where $MAD_e = 1.483 \times MAD$ for large normal data and is an estimator of the spread of the data, similar to the standard deviation, with

\[ MAD = \mathrm{median}\{ |x_i - \mathrm{median}(x)| \}, \quad i = 1, 2, \ldots, n. \tag{4.6} \]

When the MAD is scaled by a factor of 1.483, it is similar to the standard deviation in a normal distribution. The median absolute deviation (the absolute deviation around the median) is a robust measure of dispersion.
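The three scores defined so far can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the computation used in this study (which was done in R): the function names and the ten-value sample are hypothetical, and the constants 0.6745 and 1.483 are the usual normal-consistency factors for the modified Z-score and MADe.

```python
import math
import statistics


def z_scores(data):
    """Classical Z-scores using the sample mean and sample SD (n - 1 divisor)."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return [(x - mean) / sd for x in data]


def mad(data):
    """Median of the absolute deviations from the sample median."""
    med = statistics.median(data)
    return statistics.median([abs(x - med) for x in data])


def modified_z_scores(data):
    """Modified Z-scores M_i = 0.6745 * (x_i - median) / MAD."""
    med = statistics.median(data)
    spread = mad(data)
    if spread == 0:
        raise ValueError("MAD is zero; modified Z-scores are undefined")
    return [0.6745 * (x - med) / spread for x in data]


def made_interval(data, k=2):
    """Interval median +/- k * MADe, with MADe = 1.483 * MAD (assumed factor)."""
    med = statistics.median(data)
    made = 1.483 * mad(data)
    return med - k * made, med + k * made


# Hypothetical sample of size 10 with one extreme value (not the study data).
sample = [98, 102, 105, 107, 110, 112, 115, 118, 121, 400]
n = len(sample)

# Shiffler's bound on the largest possible |Z| in a sample of size n.
max_possible_z = (n - 1) / math.sqrt(n)

flagged_z = [x for x, z in zip(sample, z_scores(sample)) if abs(z) > 3]
flagged_m = [x for x, m in zip(sample, modified_z_scores(sample)) if abs(m) > 3.5]
low, high = made_interval(sample, k=2)
flagged_made = [x for x in sample if x < low or x > high]

print("largest possible |Z| for n = 10:", round(max_possible_z, 3))
print("flagged by |Z| > 3:", flagged_z)
print("flagged by |M| > 3.5:", flagged_m)
print("flagged by median +/- 2 MADe:", flagged_made)
```

With this sample the |Z| > 3 rule flags nothing, because a sample of ten cannot produce a Z-score above about 2.85, while the two median-based rules both flag the value 400.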
Tukey's Method (Box Plot)

Tukey's (1977) method, constructing a boxplot, is a well-known, simple graphical tool for displaying information about continuous univariate data, such as the median, lower quartile, upper quartile, lower extreme and upper extreme of a data set. This method uses the interquartile range to filter out very large or very small values. The formulas are:

\[ \text{Low outliers} = Q_1 - 1.5\,(Q_3 - Q_1) = Q_1 - 1.5\,(IQR) \tag{4.7} \]

\[ \text{High outliers} = Q_3 + 1.5\,(Q_3 - Q_1) = Q_3 + 1.5\,(IQR) \tag{4.8} \]

where $Q_1$ = first quartile, $Q_3$ = third quartile and IQR = interquartile range. These equations give two values, or fences, that cordon off the outliers from the values in the bulk of the data. The steps for finding outliers using the IQR are:
Step 1 Find the interquartile range and median.
Step 2 Find Q1 and Q3. Q1 can be thought of as the median of the lower half of the data, and Q3 as the median of the upper half. Subtract Q1 from Q3.
Step 3 Calculate 1.5 × IQR and subtract it from Q1 to get the lower fence.
Step 4 Add 1.5 × IQR to Q3 to get the upper fence.
Step 5 Add the fences to the data to identify outliers.

4.3 Computation Results and Discussion

In this study, the diabetes data were obtained from the primary health center in Tirunelveli. The data set has 50 observations of patients' diabetes levels. The labeling methods were computed with the open-source R software package. Several labeling methods are employed in this study, and each uses a different measure for identifying outliers in the data set, which shows how the methods behave with respect to skewness and sample size.

Computation of Z-Scores

In Table 4.1 (Case 1), with all the data included, observation 50 appears to be an outlier, yet no observations exceed 3 in absolute value. For Table 4.2 (Case 2), the most extreme observation (50) is excluded from the data, and observations 49 and 48 are then considered outliers. This is because the multiple extreme values artificially inflated the standard deviation.

Computation of Modified Z-Scores

The computation results for this method are tabulated below and compared with the Z-scores. Table 4.3 shows the modified Z-scores in absolute value; of these, three observations (236, 236, and 525) may well be
outliers.

TABLE 4.1: Computation and Masking Problem of the Z-Scores (Case 1)
Obs. No. | x_i | Z-Score | Obs. No. | x_i | Z-Score

Computation of MADe

For this data set, Median = 110, MAD = 19 and MADe = 1.483 × MAD ≈ 28.18. The 2 MADe method identifies six outliers: 172, 525, 175, 236, 169 and 236. The 3 MADe method identifies three outliers: 525, 236 and 236.
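The MADe limits can be checked by hand from the reported median and MAD alone. The short Python sketch below does that arithmetic; it assumes the usual scale factor 1.483 for MADe, and the variable names are illustrative only. It does not reproduce the study's R computation and uses no data beyond the values quoted above.

```python
# Values reported in the text for the 50 diabetes observations.
MEDIAN = 110.0
MAD = 19.0

# Assumption: the usual normal-consistency factor for MADe.
SCALE = 1.483

made = SCALE * MAD


def made_limits(k):
    """Lower and upper limits of the interval median +/- k * MADe."""
    return MEDIAN - k * made, MEDIAN + k * made


lo2, hi2 = made_limits(2)
lo3, hi3 = made_limits(3)

# Outliers listed in the text for each rule.
reported_2made = [169, 172, 175, 236, 236, 525]
reported_3made = [236, 236, 525]

print("MADe:", round(made, 3))
print("2 MADe limits:", round(lo2, 3), round(hi2, 3))
print("3 MADe limits:", round(lo3, 3), round(hi3, 3))
print("all reported 2 MADe outliers exceed the upper limit:",
      all(x > hi2 for x in reported_2made))
print("all reported 3 MADe outliers exceed the upper limit:",
      all(x > hi3 for x in reported_3made))
```

Under this assumption the upper limits are roughly 166.35 for the 2 MADe rule and 194.53 for the 3 MADe rule, which is consistent with 169, 172 and 175 being flagged only by the 2 MADe rule.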
TABLE 4.2: Computation and Masking Problem of the Z-Scores (Case 2)
Obs. No. | x_i | Z-Score | Obs. No. | x_i | Z-Score

Dot Plot for MADe

A dotplot is made up of dots plotted on a graph. It is interpreted as follows.
1. Each dot represents a specific number of observations from a set of data. (Unless otherwise indicated, assume that each dot represents one observation. If a dot represents more than one observation, that should be explicitly noted on the plot.)
2. The dots are stacked in a column over a category, so that the height of the
column represents the relative or absolute frequency of observations in the category.
3. The pattern of data in a dotplot can be described in terms of symmetry and skewness only if the categories are quantitative. If the categories are qualitative (as they often are), a dotplot cannot be described in those terms.

TABLE 4.3: Computation of Z-Scores compared with the Modified Z-Scores
i | x_i | Z-Scores (Case 1) | Z-Scores (Case 2) | Modified Z-Scores
Compared to other types of graphic display, dotplots are used most often to plot frequency counts within a small number of categories, usually with small sets of data.

FIGURE 4.3: Dotplot for visualizing the data with outliers

In Figure 4.3, the extreme value at x = 525 has dragged the outlier cutoff, x̄ + 2s, above the two points at x = 236. Only the point at x = 525 is therefore caught as an outlier, even though the points at x = 236 are clearly also outliers.

Computation of Tukey Method (Box Plot)

The results obtained for the data set with this method are:

TABLE 4.4: Tukey method outlier detection using IQR
Sample size | 50
Lowest value |
Highest value |
Arithmetic mean |
Median |
Standard deviation |
Coefficient of Skewness | (P<0.0001)
Coefficient of Kurtosis | (P<0.0001)
Suspected outliers (Tukey 1977): Outside values |
Suspected outliers (Tukey 1977): Far-out values | 525
FIGURE 4.4: Box and Whisker plot for visualizing outliers

The quartiles are Q1 = 95.25 and Q3 = 133.5, so the IQR (interquartile range) is Q3 − Q1 = 38.25. Thus the inner fences are [37.875, 190.875] and the outer fences are [−19.5, 248.25]. The two extreme values, 236 and 525, are identified as probable outliers by this method. Figure 4.4 is a boxplot of the data set. The central box represents the values from the lower to the upper quartile (25th to 75th percentile). The middle line represents the median. The horizontal line extends from the minimum to the maximum value, excluding outside and far-out values, which are displayed as separate points. An outside value is defined as a value that is smaller than the lower quartile minus 1.5 times the interquartile range, or larger than the upper quartile plus 1.5 times the interquartile range (inner fences). A far-out value is defined as a value that is smaller than the lower quartile minus 3 times the interquartile range, or larger than the upper quartile plus 3 times the interquartile range (outer fences).
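The fence definitions above reduce to a few lines of arithmetic. The Python sketch below starts from the reported quartiles (Q1 = 95.25, Q3 = 133.5) rather than from raw data, because quartile values depend on the convention a given package uses. The function names `tukey_fences` and `classify` are hypothetical, and this is an illustration rather than the R computation used in this study.

```python
def tukey_fences(q1, q3):
    """Return the IQR, the inner fences (1.5 * IQR) and the outer fences (3 * IQR)."""
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)
    return iqr, inner, outer


def classify(x, inner, outer):
    """Label a value as 'inside', 'outside' (beyond the inner fences only)
    or 'far out' (beyond the outer fences)."""
    if x < outer[0] or x > outer[1]:
        return "far out"
    if x < inner[0] or x > inner[1]:
        return "outside"
    return "inside"


# Quartiles reported in the text.
iqr, inner, outer = tukey_fences(95.25, 133.5)

print("IQR:", iqr)
print("inner fences:", inner)
print("outer fences:", outer)
for value in (110, 236, 525):
    print(value, "->", classify(value, inner, outer))
```

With these quartiles, 236 lies beyond the inner fence but within the outer fence, so it is an outside value, while 525 lies beyond the outer fence and is a far-out value.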
TABLE 4.5: Number of outliers detected by different outlier labeling methods
Methods | Cases | Cutoff value | Outliers
Z-Scores | I | |Z_i| > 3 | 525
Z-Scores | II | |Z_i| > 3 | 236, 236
Modified Z-Scores (MAD) | | |M_i| > 3.5 | 525, 236, 236
MAD: 2MADe | | MAD > 2 | 169, 172, 175, 236, 236, 525
MAD: 3MADe | | MAD > 3 | 525, 236, 236
Tukey's Method: Outside values | | [37.875, 190.875] | 236, 236
Tukey's Method: Far outside values | | [−19.5, 248.25] | 525

Conclusion

The performance of the outlier labeling methods (Z-Scores, Modified Z-Scores, MADe and Tukey's method) has been studied using a real data set to evaluate which method is more powerful for detecting and handling outliers. Most intervals used to identify possible outliers in labeling methods are effective under the normal distribution. The Z-Scores and Tukey methods are affected by the masking problem, and for this reason their detection sensitivity is low. One of the most common ways of finding outliers in one-dimensional data is to mark as a potential outlier any point that lies more than two standard deviations from the center; the MADe method follows the same idea but uses the median and MADe in place of the mean and standard deviation. The MADe and Modified Z-Scores methods are both based on the MAD, and they identify three values (525, 236, 236) as outliers. All the methods find that the most extreme value is 525. In the MADe method, the 2 MADe criterion identifies six outliers (169, 172, 175, 236, 236, 525) and the 3 MADe criterion identifies three (525, 236 and 236). In the univariate case, the median absolute deviation is one of the most robust dispersion scales in the presence of outliers, and we therefore recommend the MADe method for outlier detection.
Biostat Exam 10/7/03 Coverage: StatPrimer 1 4 Part A (Closed Book) INSTRUCTIONS Write your name in the usual location (back of last page, near the staple), and nowhere else. Turn in your Lab Workbook at
More informationVIII. STATISTICS. Part I
VIII. STATISTICS Part I IN THIS CHAPTER: An introduction to descriptive statistics Measures of central tendency: mean, median, and mode Measures of spread, dispersion, and variability: range, variance,
More informationReview Materials for Test 1 (4/26/04) (answers will be posted 4/20/04)
Review Materials for Test 1 (4/26/04) (answers will be posted 4/20/04) Prof. Lew Extra Office Hours: Friday 4/23/04 10am-10:50am; Saturday 12:30pm-2:00pm. E- mail will be answered if you can send it before
More informationSTAT/MATH Chapter3. Statistical Methods in Practice. Averages and Variation 1/27/2017. Measures of Central Tendency: Mode, Median, and Mean
STAT/MATH 3379 Statistical Methods in Practice Dr. Ananda Manage Associate Professor of Statistics Department of Mathematics & Statistics SHSU 1 Chapter3 Averages and Variation Copyright Cengage Learning.
More informationLearning Area: Mathematics Year Course and Assessment Outline. Year 11 Essentials Mathematics COURSE OUTLINE
Learning Area: Mathematics Year 209 Course and Assessment Outline Year Essentials Mathematics COURSE OUTLINE SEM/ TERM WEEKS LEARNING CONTENT- Unit ASSESSMENTS Topic. Basic calculations, percentages and
More informationData Visualization. Prof.Sushila Aghav-Palwe
Data Visualization By Prof.Sushila Aghav-Palwe Importance of Graphs in BI Business intelligence or BI is a technology-driven process that aims at collecting data and analyze it to extract actionable insights
More informationDavid M. Rocke Division of Biostatistics and Department of Biomedical Engineering University of California, Davis
David M. Rocke Division of Biostatistics and Department of Biomedical Engineering University of California, Davis Outline RNA-Seq for differential expression analysis Statistical methods for RNA-Seq: Structure
More informationBasic Statistics, Sampling Error, and Confidence Intervals
02-Warner-45165.qxd 8/13/2007 5:00 PM Page 41 CHAPTER 2 Introduction to SPSS Basic Statistics, Sampling Error, and Confidence Intervals 2.1 Introduction We will begin by examining the distribution of scores
More informationStudents will understand the definition of mean, median, mode and standard deviation and be able to calculate these functions with given set of
Students will understand the definition of mean, median, mode and standard deviation and be able to calculate these functions with given set of numbers. Also, students will understand why some measures
More informationImproved overlay control using robust outlier removal methods
Improved overlay control using robust outlier removal methods John C. Robinson 1, Osamu Fujita 2, Hiroyuki Kurita 2, Pavel Izikson 3, Dana Klein 3, and Inna Tarshish-Shapir 3 1 KLA-Tencor Corporation,
More informationDistinguish between different types of numerical data and different data collection processes.
Level: Diploma in Business Learning Outcomes 1.1 1.3 Distinguish between different types of numerical data and different data collection processes. Introduce the course by defining statistics and explaining
More information+? Mean +? No change -? Mean -? No Change. *? Mean *? Std *? Transformations & Data Cleaning. Transformations
Transformations Transformations & Data Cleaning Linear & non-linear transformations 2-kinds of Z-scores Identifying Outliers & Influential Cases Univariate Outlier Analyses -- trimming vs. Winsorizing
More informationDescriptive Statistics Tutorial
Descriptive Statistics Tutorial Measures of central tendency Mean, Median, and Mode Statistics is an important aspect of most fields of science and toxicology is certainly no exception. The rationale behind
More informationCEE3710: Uncertainty Analysis in Engineering
CEE3710: Uncertainty Analysis in Engineering Lecture 1 September 6, 2017 Why do we need Probability and Statistics?? What is Uncertainty Analysis?? Ex. Consider the average (mean) height of females by
More informationISO 13528:2015 Statistical methods for use in proficiency testing by interlaboratory comparison
ISO 13528:2015 Statistical methods for use in proficiency testing by interlaboratory comparison ema training workshop August 8-9, 2016 Mexico City Class Schedule Monday, 8 August Types of PT of interest
More informationPreprocessing Methods for Two-Color Microarray Data
Preprocessing Methods for Two-Color Microarray Data 1/15/2011 Copyright 2011 Dan Nettleton Preprocessing Steps Background correction Transformation Normalization Summarization 1 2 What is background correction?
More informationThe SPSS Sample Problem To demonstrate these concepts, we will work the sample problem for logistic regression in SPSS Professional Statistics 7.5, pa
The SPSS Sample Problem To demonstrate these concepts, we will work the sample problem for logistic regression in SPSS Professional Statistics 7.5, pages 37-64. The description of the problem can be found
More informationANALYSING QUANTITATIVE DATA
9 ANALYSING QUANTITATIVE DATA Although, of course, there are other software packages that can be used for quantitative data analysis, including Microsoft Excel, SPSS is perhaps the one most commonly subscribed
More informationSample Exam 1 Math 263 (sect 9) Prof. Kennedy
Sample Exam 1 Math 263 (sect 9) Prof. Kennedy 1. In a statistics class with 136 students, the professor records how much money each student has in their possession during the first class of the semester.
More informationApplying Regression Techniques For Predictive Analytics Paviya George Chemparathy
Applying Regression Techniques For Predictive Analytics Paviya George Chemparathy AGENDA 1. Introduction 2. Use Cases 3. Popular Algorithms 4. Typical Approach 5. Case Study 2016 SAPIENT GLOBAL MARKETS
More informationChapter 3C Assignment Sheet
Stat Tech Chapter 3C Assignment Sheet Percentiles Day 1 1. The reaction time to a stimulus for a certain test has a mean of 2.5 seconds and a standard deviation of 0.3 seconds. Find the corresponding z-score
More informationIntroduction to Statistics. Measures of Central Tendency
Introduction to Statistics Measures of Central Tendency Two Types of Statistics Descriptive statistics of a POPULATION Relevant notation (Greek): µ mean N population size sum Inferential statistics of
More informationSession 7. Introduction to important statistical techniques for competitiveness analysis example and interpretations
ARTNeT Greater Mekong Sub-region (GMS) initiative Session 7 Introduction to important statistical techniques for competitiveness analysis example and interpretations ARTNeT Consultant Witada Anukoonwattaka,
More informationStatistics, Data Analysis, and Decision Modeling
- ' 'li* Statistics, Data Analysis, and Decision Modeling T H I R D E D I T I O N James R. Evans University of Cincinnati PEARSON Prentice Hall Upper Saddle River, New Jersey 07458 CONTENTS Preface xv
More information+? Mean +? No change -? Mean -? No Change. *? Mean *? Std *? Transformations & Data Cleaning. Transformations
Transformations Transformations & Data Cleaning Linear & non-linear transformations 2-kinds of Z-scores Identifying Outliers & Influential Cases Univariate Outlier Analyses -- trimming vs. Winsorizing
More informationExploratory Data Analysis
Exploratory Data Analysis Brawijaya Professional Statistical Analysis BPSA MALANG Jl. Kertoasri 66 Malang (0341) 580342 Exploratory Data Analysis Exploring data can help to determine whether the statistical
More informationClovis Community College Class Assessment
Class: Math 110 College Algebra NMCCN: MATH 1113 Faculty: Hadea Hummeid 1. Students will graph functions: a. Sketch graphs of linear, higherhigher order polynomial, rational, absolute value, exponential,
More information5 CHAPTER: DATA COLLECTION AND ANALYSIS
5 CHAPTER: DATA COLLECTION AND ANALYSIS 5.1 INTRODUCTION This chapter will have a discussion on the data collection for this study and detail analysis of the collected data from the sample out of target
More informationAdvanced Higher Statistics
Advanced Higher Statistics 2018-19 Advanced Higher Statistics - 3 Unit Assessments - Prelim - Investigation - Final Exam (3 Hours) 1 Advanced Higher Statistics Handouts - Data Booklet - Course Outlines
More informationAn Introduction to Descriptive Statistics (Will Begin Momentarily) Jim Higgins, Ed.D.
An Introduction to Descriptive Statistics (Will Begin Momentarily) Jim Higgins, Ed.D. www.bcginstitute.org Visit BCGi Online While you are waiting for the webinar to begin, Don t forget to check out our
More informationInvestigating Common-Item Screening Procedures in Developing a Vertical Scale
Investigating Common-Item Screening Procedures in Developing a Vertical Scale Annual Meeting of the National Council of Educational Measurement New Orleans, LA Marc Johnson Qing Yi April 011 COMMON-ITEM
More informationOrdered Array (nib) Frequency Distribution. Chapter 2 Descriptive Statistics: Tabular and Graphical Methods
Chapter Descriptive Statistics: Tabular and Graphical Methods Ordered Array (nib) Organizes a data set by sorting it in either ascending or descending order Advantages & Disadvantages Useful in preparing
More informationChapter 4: Foundations for inference. OpenIntro Statistics, 2nd Edition
Chapter 4: Foundations for inference OpenIntro Statistics, 2nd Edition Variability in estimates 1 Variability in estimates Application exercise Sampling distributions - via CLT 2 Confidence intervals 3
More informationANOVA The Effect of Outliers
ANOVA The Effect of Outliers By Markus Halldestam Bachelor s thesis Department of Statistics Uppsala University Supervisor: Inger Persson 2016 Abstract This bachelor s thesis focuses on the effect of outliers
More informationAP Statistics Test #1 (Chapter 1)
AP Statistics Test #1 (Chapter 1) Name Part I - Multiple Choice (Questions 1-20) - Circle the answer of your choice. 1. You measure the age, marital status and earned income of an SRS of 1463 women. The
More informationContinuous Improvement Toolkit. Graphical Analysis. Continuous Improvement Toolkit.
Continuous Improvement Toolkit Graphical Analysis The Continuous Improvement Map Managing Risk FMEA Understanding Performance Check Sheets Data Collection PDPC RAID Log* Risk Assessment* Fault Tree Analysis
More informationDevelopment of the Project Definition Rating Index (PDRI) for Small Industrial Projects. Wesley A. Collins
Development of the Project Definition Rating Index (PDRI) for Small Industrial Projects by Wesley A. Collins A Dissertation Presented in Partial Fulfillment of the Requirements for the Degree Doctor of
More informationTopic 1: Descriptive Statistics
Topic 1: Descriptive Statistics Econ 245_Topic 1 page1 Reference: N.C &T.: Chapter 1 Objectives: Basic Statistical Definitions Methods of Displaying Data Definitions: S : a numerical piece of information
More informationBasic applied techniques
white paper Basic applied techniques Choose the right stat to make better decisions white paper Basic applied techniques 2 The information age has changed the way many of us do our jobs. In the 1980s,
More informationSetting the Bar and Establishing In-Study Cut Points for Immunogenicity Testing. Ron Bowsher, Ph.D. 16-May-2016
Setting the Bar and Establishing In-Study Cut Points for Immunogenicity Testing Ron Bowsher, Ph.D. 16-May-2016 B2S Consulting Team: o Rocco Brunelle, M.S. (Statistics) o Kim Krug, M.S. (Statistics) o Paula
More informationFUNDAMENTALS OF QUALITY CONTROL AND IMPROVEMENT. Fourth Edition. AMITAVA MITRA Auburn University College of Business Auburn, Alabama.
FUNDAMENTALS OF QUALITY CONTROL AND IMPROVEMENT Fourth Edition AMITAVA MITRA Auburn University College of Business Auburn, Alabama WlLEY CONTENTS PREFACE ABOUT THE COMPANION WEBSITE PART I PHILOSOPHY AND
More informationBusiness Quantitative Analysis [QU1] Examination Blueprint
Business Quantitative Analysis [QU1] Examination Blueprint 2014-2015 Purpose The Business Quantitative Analysis [QU1] examination has been constructed using an examination blueprint. The blueprint, also
More informationGlossary of Standardized Testing Terms https://www.ets.org/understanding_testing/glossary/
Glossary of Standardized Testing Terms https://www.ets.org/understanding_testing/glossary/ a parameter In item response theory (IRT), the a parameter is a number that indicates the discrimination of a
More informationMAS187/AEF258. University of Newcastle upon Tyne
MAS187/AEF258 University of Newcastle upon Tyne 2005-6 Contents 1 Collecting and Presenting Data 5 1.1 Introduction...................................... 5 1.1.1 Examples...................................
More information10.2 Correlation. Plotting paired data points leads to a scatterplot. Each data pair becomes one dot in the scatterplot.
10.2 Correlation Note: You will be tested only on material covered in these class notes. You may use your textbook as supplemental reading. At the end of this document you will find practice problems similar
More informationSIDDHARTH INSTITUTE OF ENGINEERING & TECHNOLOGY (AUTONOMOUS) :: PUTTUR Siddharth Nagar, Narayanavanam Road QUESTION BANK (DESCRIPTIVE)
S SIDDHARTH INSTITUTE OF ENGINEERING & TECHNOLOGY (AUTONOMOUS) :: PUTTUR Siddharth Nagar, Narayanavanam Road 517583 QUESTION BANK (DESCRIPTIVE) Subject with Code : Course & Branch: MBA IYear I-Sem Regulation:
More informationUnderstanding Inference: Confidence Intervals II. Questions about the Assignment. Summary (From Last Class) The Problem
Questions about the Assignment Part I The z-score is not the same as the percentile (eg, a z-score of 98 does not equal the 98 th percentile) The z-score is the number of standard deviations the value
More informationGroundwater Monitoring Statistical Methods Certification
Groundwater Monitoring Statistical Methods Certification WEC Temporary Ash Disposal Area Whelan Energy Center Public Power Generation Agency/ Hastings Utilities February 9, 2018 This page intentionally
More informationBusiness Intelligence, 4e (Sharda/Delen/Turban) Chapter 2 Descriptive Analytics I: Nature of Data, Statistical Modeling, and Visualization
Business Intelligence, 4e (Sharda/Delen/Turban) Chapter 2 Descriptive Analytics I: Nature of Data, Statistical Modeling, and Visualization 1) One of SiriusXM's challenges was tracking potential customers
More informationStatistics in Market Research
Introducing Statistics in Market Research Second Edition Prepared by Leo Cremonezi Statistical Scientist January 2018 1 Introduction 3 1 Descriptive Statistics 4 2 Sampling 10 3 Tests of Significance 18
More information