An ordered array is an arrangement of data in either ascending or descending order.


2.1 Ordered Array

An ordered array is an arrangement of data in either ascending or descending order.

Example 1
People across Hong Kong participate in various walks to raise funds for charity. Recently, national and local businesses sponsored a 5K Race for the Community in Tseung Kwan O. Participants could either run or walk the race. The time to complete the race appeared on a large display as each participant enthusiastically crossed the finish line. The times (rounded to the nearest minute) for a random sample of 5 participants who walked this race are:

45, 53, 45, 50, 48

Solution:
The ordered array, in ascending order, for these times to walk the 5K Race is:

45, 45, 48, 50, 53

2.2 Stem-and-leaf Display

1. List all scores or values in an ordered array.
2. Split each score or value into two sets of digits. The first or leading set of digits is the stem and the second set of digits is the leaf.
3. For each score in the data set, write down the leaf on the line labeled by the appropriate stem.

Example 2
Obtain a stem-and-leaf display for the following data.

Number of Questions Answered Correctly on an Aptitude Test
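The three construction steps listed above can be sketched in Python. This is only an illustration, applied to the walking times of Example 1; the function names are hypothetical, and the sketch assumes non-negative integer values whose last digit serves as the leaf.

```python
from collections import defaultdict


def ordered_array(values, descending=False):
    """Return the values arranged in ascending (default) or descending order."""
    return sorted(values, reverse=descending)


def stem_and_leaf(values):
    """Map each stem (all digits but the last) to its list of leaves (last digits).

    Assumes non-negative integers.
    """
    display = defaultdict(list)
    for value in ordered_array(values):   # step 1: ordered array
        stem, leaf = divmod(value, 10)    # step 2: split into stem and leaf
        display[stem].append(leaf)        # step 3: write the leaf on its stem's line
    return dict(display)


times = [45, 53, 45, 50, 48]  # walking times from Example 1
print(ordered_array(times))
for stem, leaves in sorted(stem_and_leaf(times).items()):
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Because the values are sorted before they are split, the leaves on each stem come out already in ascending order.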

Solution:
We write the last digit as the leaf and the other digits as the stem. For example, the value 68 has stem 6 and leaf 8, and the value 104 has stem 10 and leaf 4.

Number of Questions Answered Correctly on an Aptitude Test
Stem    Leaf (last digit)

Advantages:
1. The stem-and-leaf display is easier to construct.
2. Within an interval, the stem-and-leaf display provides more information than the histogram, since it shows the actual data values.

2.3 Dot diagram

Another way to graphically represent a frequency distribution is by means of a dot diagram. We construct a dot diagram as follows. We use a horizontal axis to represent the data values. Above each distinct data value on the horizontal axis we place dots, with the number of dots equal to the frequency of the data value.

Example 3
Construct a dot diagram for the data set
{1, 1, 3, 3, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 11}

Solution:
Step 1. Draw a horizontal line and label it with the data values.
Step 2. Place a dot above the corresponding data value for each observation.

The resulting dot diagram has two dots above 1, two above 3, four above 5, seven above 6, four above 7, and one above 11.

2.4 Organizing and Graphing Quantitative Data

Frequency distribution for quantitative data

Frequency Distribution
A frequency distribution for quantitative data lists all the classes and the number of values that belong to each class. Data presented in the form of a frequency distribution are called grouped data.

Grouping Data
When we are faced with finding the mean for a very large number of observed values, we can materially decrease the labor involved by grouping the data into a frequency distribution and then finding the statistics for the grouped data. However, grouping loses information, and the statistics obtained from the grouped data will therefore only approximate those of the ungrouped data. If the number of observations is large and the class intervals are small, the approximation will be very good.

Constructing Frequency Distribution Tables

The three steps necessary to define the classes for a frequency distribution with quantitative data are as follows:

(i) Determine the number of nonoverlapping classes. Between 5 and 15 classes are recommended. Large data sets require a larger number of classes; small data sets require a smaller number of classes.

(ii) Determine the width of each class.

    Approximate class width = (Largest data value - Smallest data value) / Number of classes

(iii) Determine the class limits and class boundaries.

Lower class limit
The lower class limit identifies the smallest possible data value assigned to the class.

Upper class limit
The upper class limit identifies the largest possible data value assigned to the class.

Class Boundary
The class boundary is the midpoint between the upper limit of a class and the lower limit of the next class.

Class Midpoint or Class Mark

    Class midpoint (class mark) = (Lower limit + Upper limit) / 2

Relative Frequency and Percentage Distributions

Relative Frequency Distribution

    Relative frequency of a class = Frequency of that class / Sum of all frequencies = f / Σf
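As a rough sketch of steps (ii) and (iii), the following Python function derives class limits, boundaries and midpoints from the smallest value, the largest value and a chosen number of classes. The function name is hypothetical. The sketch assumes integer-valued data (so consecutive class limits differ by 1), rounds the approximate width up, and places the first lower boundary half a unit below the first lower limit by convention. It is applied to the range of the Example 3 data set (1 to 11) with 6 classes, a choice made here only for illustration.

```python
import math


def define_classes(smallest, largest, num_classes):
    """Return one (lower limit, upper limit, lower boundary, upper boundary, midpoint)
    tuple per class. Assumes integer-valued data."""
    # Approximate class width, rounded up to a whole number.
    width = math.ceil((largest - smallest) / num_classes)
    # Widen by one unit if the classes would otherwise stop short of the largest value.
    if smallest + num_classes * width - 1 < largest:
        width += 1
    classes = []
    for k in range(num_classes):
        lower = smallest + k * width
        upper = lower + width - 1
        # A boundary lies midway between an upper limit and the next lower limit.
        classes.append((lower, upper, lower - 0.5, upper + 0.5, (lower + upper) / 2))
    return classes


classes = define_classes(smallest=1, largest=11, num_classes=6)
for lower, upper, lb, ub, mid in classes:
    print(f"{lower:>2}-{upper:<2}  boundaries {lb:>4}-{ub:<4}  mark {mid}")
```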

Percentage Distribution

    Percentage = (Relative frequency) × 100

Example 4
The following data are the times required to complete year-end audits for a sample of 20 clients of a small public accounting firm. Construct a frequency distribution, relative frequency distribution and percentage distribution.

Year-End Audit Times (in Days)

Solution:
The frequency distribution, relative frequency distribution and percentage distribution are shown as follows:

Frequency and Relative Frequency Distribution for the Audit-Time Data
Audit Time (days)    Class boundaries    Class mark    Frequency    Relative Frequency    Percentage
Total
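The calculation of frequency, relative frequency and percentage can be sketched as follows. Since the audit-time values are not available here, the sketch uses the data set of Example 3 instead; the class limits 1-2, 3-4, ..., 11-12 and the function name are assumptions made for illustration.

```python
def frequency_distribution(values, class_limits):
    """Return rows of (class label, frequency, relative frequency, percentage)."""
    total = len(values)
    rows = []
    for lower, upper in class_limits:
        f = sum(lower <= v <= upper for v in values)  # frequency of the class
        rel = f / total                               # relative frequency = f / sum of f
        rows.append((f"{lower}-{upper}", f, rel, rel * 100))
    return rows


# Data set of Example 3; the class limits below are an assumed grouping.
data = [1, 1, 3, 3, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 11]
limits = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10), (11, 12)]

rows = frequency_distribution(data, limits)
for label, f, rel, pct in rows:
    print(f"{label:>5}  {f:>2}  {rel:.2f}  {pct:5.1f}")
```

The relative frequencies sum to 1 and the percentages to 100, which is a quick check that every value fell into exactly one class.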

2.4.3 Graphing Grouped Data

Histogram
A histogram is usually used to present frequency distributions graphically. It is constructed by drawing a rectangle over each class. The area of each rectangle should be proportional to the frequency of its class. In a histogram, the rectangles are drawn adjacent to each other.

Constructing a Histogram
1. Examine the data to determine the smallest and the largest measurements.
2. Divide the interval between the smallest and the largest measurements into between 5 and 20 equal subintervals called classes. These classes should satisfy the following requirement: each measurement falls into one and only one subinterval. Note that this requirement implies that no measurement falls on a boundary of a subinterval.
3. Compute the frequency or relative frequency of measurements falling within each subinterval.
4. Using a vertical axis of about three-fourths the length of the horizontal axis, plot each frequency or relative frequency as a rectangle over the corresponding subinterval.

Figure 1: Histogram

Polygon
A graph formed by joining the midpoints of the tops of successive bars in a histogram with straight lines is called a polygon.

Cumulative frequency distribution
A cumulative frequency distribution gives the total number of values that fall below the upper boundary of each class.

Figure 2: Histogram & Polygon

Cumulative Relative Frequency

    Cumulative relative frequency = Cumulative frequency / Total observations in the data set

Cumulative Percentage

    Cumulative percentage = (Cumulative relative frequency) × 100

Example 5
Construct a cumulative frequency distribution, cumulative relative frequency distribution and cumulative percentage distribution for the data given in Example 4.

Cumulative Frequency Distribution for the Audit-Time Data
Audit Time (days)    Cumulative Frequency    Cumulative Relative Frequency    Cumulative Percentage
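The cumulative quantities defined above can be computed from a list of class frequencies as in the sketch below. The frequencies used are those of the Example 3 data set grouped into the classes 1-2, 3-4, ..., 11-12; that grouping and the function name are assumptions for illustration and are not the audit-time figures of Example 4.

```python
from itertools import accumulate


def cumulative_distribution(frequencies):
    """Return rows of (cumulative frequency, cumulative relative frequency,
    cumulative percentage), one row per class in order."""
    total = sum(frequencies)
    rows = []
    for cum in accumulate(frequencies):  # running total of the class frequencies
        rel = cum / total
        rows.append((cum, rel, rel * 100))
    return rows


# Assumed grouping of the Example 3 data set into classes 1-2, 3-4, ..., 11-12.
class_frequencies = [2, 2, 11, 4, 0, 1]
upper_boundaries = [2.5, 4.5, 6.5, 8.5, 10.5, 12.5]

cumulative_rows = cumulative_distribution(class_frequencies)
for boundary, (cum, rel, pct) in zip(upper_boundaries, cumulative_rows):
    print(f"< {boundary:>4}  {cum:>2}  {rel:.2f}  {pct:5.1f}")
```

Each row is labeled by the upper boundary of its class, since a cumulative frequency counts the values that fall below that boundary.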

Ogive
An ogive is a curve drawn for the cumulative frequency distribution by joining with straight lines the dots marked above the upper boundaries of classes at heights equal to the cumulative frequencies of the respective classes.

2.5 Scatter diagram

The scatter diagram provides an overview of the data and enables us to draw preliminary conclusions about a possible relationship between the variables. Values for the independent variable are shown on the horizontal axis and the corresponding values for the dependent variable are shown on the vertical axis.

Example 6
A sociologist was hired by a large city hospital to investigate the relationship between the number of unauthorized days that an employee is absent per year and the distance (miles) between home and work for the employees. A sample of 10 employees was chosen, and the following data were collected.

Distance to Work (miles)    Number of Days Absent

Develop a scatter diagram for these data. Does a linear relationship appear reasonable?

Solution:

2.6 Organizing and Graphing Qualitative Data

Frequency Distributions

A frequency distribution for Qualitative Data
A frequency distribution for qualitative data lists all categories and the number of elements that belong to each of the categories.

Relative Frequency and Percentage Distributions

Relative Frequency of a Category

    Relative frequency of a category = Frequency of that category / Sum of all frequencies

Percentage

    Percentage = (Relative frequency) × 100

Example 7
The following data were obtained from a sample of 50 new car purchases. Find the frequency distribution, relative frequency distribution and percentage distribution for the given data.

Honda Accord, Ford Escort, Toyota Echo, Ford Escort, Toyota Echo
Chevrolet Cavalier, Honda Accord, Chevrolet Cavalier, Honda Accord, Ford Escort
Toyota Echo, Chevrolet Cavalier, Honda Accord, Hyundai Excel, Hyundai Excel
Ford Escort, Ford Escort, Hyundai Excel, Chevrolet Cavalier, Ford Escort
Toyota Echo, Chevrolet Cavalier, Honda Accord, Hyundai Excel, Honda Accord
Ford Escort, Honda Accord, Ford Escort, Toyota Echo, Chevrolet Cavalier
Hyundai Excel, Hyundai Excel, Honda Accord, Ford Escort, Toyota Echo
Ford Escort, Honda Accord, Ford Escort, Ford Escort, Hyundai Excel
Ford Escort, Honda Accord, Chevrolet Cavalier, Honda Accord, Chevrolet Cavalier
Chevrolet Cavalier, Toyota Echo, Ford Escort, Hyundai Excel, Toyota Echo

Solution:
The frequency distribution, relative frequency distribution and percentage distribution are shown as follows.

Automobile Purchased    Frequency    Relative Frequency    Percentage
Chevrolet Cavalier       9           0.18                   18
Ford Escort             14           0.28                   28
Toyota Echo              8           0.16                   16
Honda Accord            11           0.22                   22
Hyundai Excel            8           0.16                   16
Total:                  50           1.00                  100

Graphical Presentation of Qualitative Data

Bar Graphs
A graph made of bars whose heights represent the frequencies of respective categories is called a bar graph.

Constructing a Bar Graph
1. Label frequencies along one axis and categories of the variable along the other.
2. Construct a rectangle at each category of the variable with a height equal to the frequency (number of observations) in the category.

3. Leave a space between each category to connote distinct, separate categories and to clarify the presentation.

Figure 3: Bar Chart

Pie Chart
A circle divided into portions that represent the relative frequencies or percentages of a population or a sample belonging to different categories is called a pie chart.

Constructing a Pie Chart
1. Choose a small number of categories for the variable, preferably about five or six. Too many categories make the pie chart difficult to interpret.
2. Whenever possible, construct the pie chart so that percentages are in either ascending or descending order.

Figure 4: Pie Chart

2.7 Pareto Diagram

A Pareto diagram is a bar chart for a categorical variable that results from a quality improvement investigation. Each category of the variable represents a factor that results in a nonconformity or a problem for a product or service. The lengths of the bars on the chart are equal to the frequencies (or relative frequencies or percentages) for the categories. Categories are ordered according to their frequency counts, and it often turns out that the majority of nonconformities or difficulties can be traced to a few factors. Quality improvement efforts can then be directed to the more important factors. The diagram is at times extended to a combination bar and line chart, where the line shows the cumulative frequency (cumulative relative frequency or cumulative percentage) over the ordered categories.

Example 8
The city manager of a city is concerned with water usage, particularly in single-family homes. She would like to develop a plan to reduce the water usage in the city. To investigate, she selects a sample of 100 homes and determines the typical daily water usage for various purposes. These sample results are as follows.

Reasons for Water Usage    Gallons per Day
Laundering                 24.9
Watering lawn
Personal bathing
Cooking                     5.1
Swimming pool              28.3
Dishwashing                12.3
Car washing                10.4
Drinking                    7.9

What is the area of greatest usage? Where should she concentrate her efforts to reduce the water usage?

Solution:
A Pareto chart is useful for identifying the major areas of water usage and focusing on those areas where the greatest reduction can be achieved. The first step is to convert each of the activities to a percent and then to order them from largest to smallest. The total water usage per day is found by totaling the gallons used in the eight activities. The activity with the largest use is watering lawns, which accounts for 42.4 percent of the amount of water used. The next largest category is personal bathing, which accounts for 31.4 percent of the water used. These two activities account for 73.8 percent of the water usage.

Reasons for Water Usage    Gallons per Day    Percent    Cumulative Percent
Watering lawn
Personal bathing
Swimming pool usage
Laundering
Dishwashing
Car washing
Drinking
Cooking
Total

To draw the Pareto chart, we begin by scaling the number of gallons used on the left vertical axis and the corresponding percent on the right vertical axis. Next we draw a vertical bar with the height of the bar corresponding to the activity with the largest number of occurrences. In this example, we draw a vertical bar for the activity watering lawns to a height equal to its gallons per day. (We call this the count.) We continue this procedure for the other activities, as shown in the following figure. The cumulative percents are plotted above the vertical bars. In this example, the activities of watering lawn, personal bathing, and pools account for 82.1% of water usage. The city manager can attain the greatest gain by looking to reduce the water usage in these three areas.

2.8 Summarizing Two Qualitative Variables

In Section 2.5 you learned that the first step in organizing and summarizing data for a single variable is to create a frequency table. The frequency table lists the number of occurrences of each value of the variable expressed as a count, fraction, decimal, or percentage. When data are collected on two related variables, they are organized by means of a crosstabulation or contingency table.
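As an illustration of a single-variable frequency table, combined with the ordering and cumulative percentages used in a Pareto diagram, the following sketch tabulates the 50 car purchases of Example 7. The one-letter codes, variable names and function name are hypothetical conveniences introduced here, and applying a Pareto-style ordering to this sample is a choice made only for illustration.

```python
from collections import Counter

# One-letter codes (an abbreviation introduced for this sketch).
CODES = {"C": "Chevrolet Cavalier", "F": "Ford Escort", "H": "Honda Accord",
         "T": "Toyota Echo", "X": "Hyundai Excel"}

# The 50 purchases of Example 7, in the order listed there.
SAMPLE = ("H F T F T C H C H F T C H X X F F X C F T C H X H "
          "F H F T C X X H F T F H F F X F H C H C C T F X T").split()


def pareto_table(counts):
    """Return rows (category, frequency, percent, cumulative percent),
    ordered from the largest frequency to the smallest."""
    total = sum(counts.values())
    rows, running = [], 0
    for category, f in sorted(counts.items(), key=lambda item: -item[1]):
        running += f
        rows.append((category, f, 100 * f / total, 100 * running / total))
    return rows


counts = Counter(CODES[code] for code in SAMPLE)
pareto_rows = pareto_table(counts)
for category, f, pct, cum in pareto_rows:
    print(f"{category:<20}{f:>3}{pct:>7.1f}{cum:>7.1f}")
```

In this sample the two most frequent models together account for half of the purchases, which is the kind of concentration a Pareto diagram is meant to expose.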
A contingency table is a table with rows that represent the possible values of one variable and columns that represent the possible values for a second variable. The entries in the table are the numbers of times that each pair of values occurs. Creating a contingency table for a set of bivariate data is similar to creating a frequency table for a single variable. The basic layout of a contingency table is shown in the following table.

                            Variable 2
Variable 1      Category 1    Category 2    ...    Category n    Total
Category 1      f_11          f_12          ...    f_1n
Category 2      f_21          f_22          ...    f_2n
...
Category m      f_m1          f_m2          ...    f_mn
Total

Each category for one of the variables is represented by a column, whereas each category for the other variable is represented by a row. The numbers in the table, the values of f_ij, represent the number of observations in the data set that are in category i of variable 1 and category j of variable 2.

By using a contingency table, we can look at the number, proportion, or percentage of the data that fall into each pair of values for variable 1 and variable 2. Adding a second variable to an analysis gives us a different, and sometimes valuable, perspective on the data we have collected.

In order to explore any possible pattern or relationship between variable 1 and variable 2, it is useful to first convert these results into percentages based on the following three totals:
1. The overall total
2. The row totals
3. The column totals

Example 9
The committee at a community college knows that other variables should be factors in determining faculty salaries, so they decide to look at the gender of faculty members in addition to the division in which they teach. The college has three divisions: Business (BUS), Communication & Social Sciences (CSS), and Sciences & Technology (S&T). They create a contingency table for the data:

                Division
Gender      BUS    CSS    S&T    Total
Female
Male
Total

Making comparisons with the actual frequencies is difficult, so the committee decides to use the relative frequency for each cell. The relative frequency table is shown here:

Contingency table displaying gender and division (percentages based on overall totals)

                Division (%)
Gender      BUS    CSS    S&T    Total (%)
Female
Male
Total (%)

The committee sees that the distribution of faculty over the divisions is not uniform. The divisions of BUS and S&T have 81% of the faculty, whereas CSS has only 19%. They also notice that the distribution of gender differs in each of the divisions.

Contingency table displaying gender and division (percentages based on row totals)

                Division (%)
Gender      BUS    CSS    S&T    Total (%)
Female
Male
Total (%)

From the row totals table, they find that the distributions of female and male faculty over the three divisions are similar.

Contingency table displaying gender and division (percentages based on column totals)

                Division (%)
Gender      BUS    CSS    S&T    Total (%)
Female
Male
Total (%)

Although the percentage of females is always lower, in CSS and S&T the percentages of males and females are almost the same, whereas in BUS they are not.
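The three kinds of percentage table (based on the overall total, the row totals and the column totals) can be sketched as follows. The counts, the labels R1, R2, C1, C2, C3 and the function name are hypothetical and chosen purely for illustration; they are not the faculty counts of Example 9.

```python
def percentage_tables(table):
    """Given table[row][column] = count, return three tables of percentages based on
    the overall total, the row totals and the column totals, respectively."""
    columns = list(next(iter(table.values())))
    row_totals = {r: sum(cells.values()) for r, cells in table.items()}
    col_totals = {c: sum(table[r][c] for r in table) for c in columns}
    grand_total = sum(row_totals.values())

    overall = {r: {c: 100 * table[r][c] / grand_total for c in columns} for r in table}
    by_row = {r: {c: 100 * table[r][c] / row_totals[r] for c in columns} for r in table}
    by_col = {r: {c: 100 * table[r][c] / col_totals[c] for c in columns} for r in table}
    return overall, by_row, by_col


# Hypothetical counts f_ij for illustration only (NOT the data of Example 9).
counts = {
    "R1": {"C1": 10, "C2": 5, "C3": 15},
    "R2": {"C1": 30, "C2": 5, "C3": 35},
}

overall, by_row, by_col = percentage_tables(counts)
print(overall["R1"]["C1"])  # share of all observations in cell (R1, C1)
print(by_row["R1"])         # distribution of row R1 over the columns
print(by_col["R1"])         # share of each column that belongs to row R1
```

Percentages based on row totals sum to 100 across each row, while percentages based on column totals sum to 100 down each column, so the two tables answer different questions about the same counts.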