Graphs
BAR CHARTS. Display frequency distributions for nominal or ordinal data. E.g., injury deaths of 100 children, ages 5-9, USA, 1980-85.
[Figure: bar chart of the number of injury deaths by cause (Motor, Drowning, Fire, Homicide, Other).]
HISTOGRAMS. Display frequency distributions for continuous or discrete data. E.g., birthweights from 100 consecutive deliveries at a Boston hospital.
[Figure: histogram of birthweights. Horizontal axis: weights; vertical axis: frequency.]
Histograms
To construct a frequency histogram, draw two axes: a horizontal axis labeled with the class intervals and a vertical axis labeled with the frequencies. Construct a rectangle over each class interval with a height equal to the number of measurements falling in that interval.
In a relative frequency histogram the vertical axis is labeled as relative frequency, and the rectangle over each class interval has a height equal to the class relative frequency.
The two histograms have the same shape, because every height is divided by the same total number of measurements.
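The construction above can be sketched numerically. This is a minimal illustration only: the measurements and class intervals are hypothetical (they are not taken from the slides), and NumPy's `histogram` is used simply to count the observations per interval.

```python
import numpy as np

# Hypothetical measurements, made up for illustration only.
data = np.array([52, 61, 67, 70, 74, 78, 81, 85, 88, 90, 93, 97, 104, 110, 125])

# Hypothetical class intervals: [50, 70), [70, 90), [90, 110), [110, 130]
edges = np.array([50, 70, 90, 110, 130])

# Frequency: number of measurements falling in each class interval.
freq, _ = np.histogram(data, bins=edges)

# Relative frequency: class frequency divided by the total number of measurements.
rel_freq = freq / freq.sum()

for lo, hi, f, r in zip(edges[:-1], edges[1:], freq, rel_freq):
    print(f"{lo}-{hi}: frequency = {f}, relative frequency = {r:.3f}")
```

Since `rel_freq` is `freq` divided by one constant, the two sets of rectangle heights differ only in the scale of the vertical axis.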
Histograms
A histogram with one major peak is called unimodal.
[Figure: unimodal histogram; vertical axis: frequency.]
Histograms
A histogram with two major peaks is called bimodal.
[Figure: bimodal histogram; vertical axis: frequency.]
Histograms
A histogram with roughly the same number of observations per interval is called a uniform histogram.
[Figure: uniform histogram; vertical axis: frequency.]
Histograms
When the right side of the histogram, which contains the larger observations, extends a greater distance than the left side, the histogram is referred to as skewed to the right.
[Figure: right-skewed histogram; vertical axis: frequency.]
Histograms
When the left side of the histogram extends a greater distance than the right side, the histogram is referred to as skewed to the left.
[Figure: left-skewed histogram; vertical axis: frequency.]
Frequency Polygons
FREQUENCY POLYGONS. Use same axes as histograms. Useful when comparing two data sets.
Cumulative frequency polygons display cumulative relative frequencies and are used to obtain percentiles of the data.
[Figure: frequency polygon of birthweights; vertical axis: frequency.]
Example 2.1
Example 2.1 (P & G). Assume we want to compare the serum cholesterol levels for two age groups.
[Figure: relative frequency polygons and cumulative frequency polygons of serum cholesterol level for ages 25-34 and ages 55-64.]
Example 2.1 (cont.)
Note that the cumulative frequency polygon for 55-64-year-old males lies to the right of the polygon for 25-34-year-old males for each value of serum cholesterol level; that is, the distribution for older men is stochastically larger than the distribution for younger men.
E.g., the 60th percentile of the serum cholesterol levels for the 25-34-year-olds is approximately 175 mg/100 ml, while the 60th percentile for the 55-64-year-olds is about 220 mg/100 ml.
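Reading a percentile off a cumulative frequency polygon amounts to linear interpolation between the plotted points. The sketch below assumes the polygon is drawn through the class boundaries; the boundaries and relative frequencies are hypothetical numbers chosen for illustration, not the values of Example 2.1, and `percentile_from_polygon` is an illustrative helper name.

```python
import numpy as np

# Hypothetical class boundaries (mg/100 ml) and relative frequencies (%);
# made up for illustration, not read off the slides.
edges = np.array([80, 120, 160, 200, 240, 280], dtype=float)
rel_freq = np.array([10, 30, 40, 15, 5], dtype=float)

# Cumulative relative frequency at each class boundary, starting from 0%.
cum_freq = np.concatenate(([0.0], np.cumsum(rel_freq)))

def percentile_from_polygon(p):
    """Value at which the cumulative frequency polygon reaches p percent."""
    # The polygon joins the (boundary, cumulative %) points with straight
    # lines, so inverting it is a linear interpolation of edges on cum_freq.
    return float(np.interp(p, cum_freq, edges))

p60 = percentile_from_polygon(60)
print(f"60th percentile: {p60:.1f} mg/100 ml")
```

With these made-up numbers, 60% falls halfway between the cumulative values 40% (at 160) and 80% (at 200), so the interpolated 60th percentile is 180.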
Percentiles
Percentiles are useful for describing the shape of a distribution. For example, if the 40th and 60th percentiles lie an equal distance away from the midpoint, and the same is true for the 30th and 70th percentiles, the 20th and 80th, and so on, the data are symmetric.
If there are a number of outlying observations on one side of the midpoint only, the data are skewed. If these outlying observations are smaller than the rest of the values, the data are skewed to the left. If they are larger than the rest, the data are skewed to the right.
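The symmetry check described above can be sketched as a comparison of paired percentile distances from the median. The helper name `percentile_gaps` and both samples are hypothetical, and NumPy's default (linearly interpolated) percentiles are an assumption; other percentile conventions give slightly different numbers.

```python
import numpy as np

def percentile_gaps(data, lower_ps=(40, 30, 20, 10)):
    """For each p, return (p, median - P_p, P_(100-p) - median)."""
    data = np.asarray(data, dtype=float)
    median = np.percentile(data, 50)
    gaps = []
    for p in lower_ps:
        below = median - np.percentile(data, p)        # distance on the left
        above = np.percentile(data, 100 - p) - median  # distance on the right
        gaps.append((p, float(below), float(above)))
    return gaps

# Hypothetical samples, made up for illustration only.
symmetric = list(range(1, 102))                 # 1, 2, ..., 101
right_skewed = [1, 2, 2, 3, 3, 3, 4, 5, 9, 20]  # a few large outlying values

for name, sample in (("symmetric", symmetric), ("right skewed", right_skewed)):
    print(name)
    for p, below, above in percentile_gaps(sample):
        print(f"  P{p} and P{100 - p}: {below:.2f} below, {above:.2f} above")
```

Roughly equal distances in every pair suggest symmetry; distances that are consistently larger above the median suggest skewness to the right, and the reverse suggests skewness to the left.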
Box Plots
BOX PLOTS. Display a summary of the data as follows:
- the central box extends from the 25th percentile to the 75th percentile (these are the quartiles of the data);
- a line is drawn at the 50th percentile;
- lines projecting out of the box extend to the adjacent values, i.e. the most extreme observations in the data that are not more than 1.5 times the height of the box beyond either quartile;
- outliers (points outside the above range) are represented as circles.
Box Plots
[Figure: box plot with the points A, B and C marked on the box.]
A: 25th percentile. B: 50th percentile. C: 75th percentile.
Adjacent values: the smallest and largest observations $x_1$ and $x_2$ such that $x_1 \ge A - 1.5\,(C - A)$ and $x_2 \le C + 1.5\,(C - A)$.
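As a worked illustration of these definitions, the sketch below computes the box, the adjacent values and the outliers for the 25 durations of hospitalization listed in Example 2.5 of these notes. The quartiles use NumPy's default linear interpolation, which is an assumption; other quartile conventions can move A and C slightly.

```python
import numpy as np

# Duration of hospitalization (days) for the 25 patients of Example 2.5.
duration = np.array([5, 10, 6, 11, 5, 14, 30, 11, 17, 3, 9, 3, 8,
                     8, 5, 5, 7, 4, 3, 7, 9, 11, 11, 9, 4])

# A, B, C: 25th, 50th and 75th percentiles (the box and its middle line).
a, b, c = np.percentile(duration, [25, 50, 75])
box_height = c - a

# Limits beyond which an observation is drawn as an outlier.
lower_limit = a - 1.5 * box_height
upper_limit = c + 1.5 * box_height

# Adjacent values: most extreme observations still within the limits.
within = duration[(duration >= lower_limit) & (duration <= upper_limit)]
x1, x2 = int(within.min()), int(within.max())

# Outliers: observations outside the limits.
outliers = duration[(duration < lower_limit) | (duration > upper_limit)]

print(f"A = {a}, B = {b}, C = {c}")
print(f"adjacent values: x1 = {x1}, x2 = {x2}")
print(f"outliers: {outliers.tolist()}")
```

For these data the box runs from 5 to 11, the limits are -4 and 20, the lines extend to 3 and 17, and the 30-day stay is plotted as an outlier.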
Example 1.6 (cont.)
Boxplot of birthweights in a Boston hospital.
[Figure: boxplot of birthweights.]
Example 2.3
Goal: assess the potency of various constituents of orchard sprays in repelling honeybees.
Individual cells of dry comb were filled with measured amounts of lime sulphur emulsion in sucrose solutions. Eight concentrations were used as treatments. The responses were obtained by releasing 100 bees into the chamber for 2 hours and measuring the decrease in volume of the solutions.
[Figure: decrease in volume for treatments A to H.]
Two-Way Scatter Plots
TWO-WAY SCATTER PLOTS. Used to find relationships between two variables.
Example 2.4. Speed of cars vs. distances taken to stop.
[Figure: scatter plot of stopping distance (dist) against speed.]
Other graphs
LINE GRAPHS, TIME SERIES PLOTS. Similar to the previous graphs, but usually the points are connected by straight lines and the scale along the horizontal axis represents time.
[Figure: time series plot of pairs of jeans (in 1000s) by year, 1980-1986.]
Example 2.5
Example 2.5: (Rosner, p 39, ex 1) Infectious Disease. The data are a sample from a larger data set collected on persons discharged from a selected Pennsylvania hospital as part of a retrospective chart review of antibiotic usage in hospitals. It is of clinical interest to know if the duration of hospitalization is affected by whether or not a patient has received antibiotics.

ID  Duration  Age  Sex (1=M, 2=F)  Antibiotics (1=Y, 2=N)
 1      5      30        2                  2
 2     10      73        2                  2
 3      6      40        2                  2
 4     11      47        2                  2
 5      5      25        2                  2
 6     14      82        1                  1
 7     30      60        1                  1
 8     11      56        2                  2
 9     17      43        2                  2
10      3      50        1                  2
11      9      59        2                  2
12      3       4        1                  2
13      8      22        2                  1
14      8      33        2                  1
15      5      20        2                  2
16      5      32        1                  2
17      7      36        1                  1
18      4      69        1                  2
19      3      47        1                  1
20      7      22        1                  2
21      9      11        1                  2
22     11      19        1                  1
23     11      67        2                  2
24      9      43        2                  2
25      4      41        2                  2
Example 2.5 (cont.)
Q. What types of variables do we have?
The duration of hospitalization is a discrete variable; age is a discrete variable; sex is nominal (binary); and antibiotics is also binary.
Q. Using numeric methods, describe the duration of hospitalization for the 25 patients.
Q. Is the duration of hospitalization affected by whether or not a patient has received antibiotics?
Example 2.5 (cont.)
We can summarize the duration of hospitalization as follows:

Min.   1st Qu.   Median   Mean   3rd Qu.   Max.
 3.0     5.0       8.0     8.6    11.0     30.0

Range   Int. Qu. Range   Variance   SD
27.0         6.0          32.67     5.72
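The summary above can be reproduced from the durations in the data table. The sketch below is one way to do so, assuming linearly interpolated quartiles (NumPy's default) and the sample variance with divisor n - 1; both choices agree with the figures in the table.

```python
import numpy as np

# Duration of hospitalization (days) for the 25 patients of Example 2.5.
duration = np.array([5, 10, 6, 11, 5, 14, 30, 11, 17, 3, 9, 3, 8,
                     8, 5, 5, 7, 4, 3, 7, 9, 11, 11, 9, 4])

q1, median, q3 = np.percentile(duration, [25, 50, 75])

summary = {
    "min": float(duration.min()),
    "q1": float(q1),
    "median": float(median),
    "mean": float(duration.mean()),
    "q3": float(q3),
    "max": float(duration.max()),
    "range": float(duration.max() - duration.min()),
    "iqr": float(q3 - q1),
    "variance": float(duration.var(ddof=1)),  # sample variance, divisor n - 1
    "sd": float(duration.std(ddof=1)),        # sample standard deviation
}

for name, value in summary.items():
    print(f"{name:>8}: {value:.2f}")
```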
Example 2.5 (cont.)
[Figure: histogram of duration. Horizontal axis: duration of hospitalization; vertical axis: frequency.]
Example 2.5 (cont.)
Duration of hospitalization for patients who received antibiotics:

Min.   1st Qu.   Median   Mean    3rd Qu.   Max.
3.00    7.50      8.00    11.57    12.50    30.00

Range   Int. Qu. Range   Variance   SD
27.00        5.00         77.62     8.81

Duration of hospitalization for patients who did not receive antibiotics:

Min.   1st Qu.   Median   Mean   3rd Qu.   Max.
3.00    5.00      6.50    7.44    9.75     17.00

Range   Int. Qu. Range   Variance   SD
14.00        4.75         13.67     3.70
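The two group summaries can be obtained by splitting the durations on the antibiotics code from the data table (1 = yes, 2 = no). The sketch below is a minimal illustration; the helper name `summarize` is hypothetical, and the quartile and variance conventions are the same assumptions as for the overall summary (linear interpolation, divisor n - 1).

```python
import numpy as np

# Columns of the Example 2.5 data table, in order of patient ID 1..25.
duration = np.array([5, 10, 6, 11, 5, 14, 30, 11, 17, 3, 9, 3, 8,
                     8, 5, 5, 7, 4, 3, 7, 9, 11, 11, 9, 4])
antibiotics = np.array([2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 1,
                        1, 2, 2, 1, 2, 1, 2, 2, 1, 2, 2, 2])  # 1 = yes, 2 = no

def summarize(x):
    """Numeric summary of one group of durations."""
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    return {
        "n": int(x.size),
        "median": float(median),
        "mean": float(x.mean()),
        "iqr": float(q3 - q1),
        "variance": float(x.var(ddof=1)),  # sample variance, divisor n - 1
        "sd": float(x.std(ddof=1)),
    }

by_group = {
    "antibiotics": summarize(duration[antibiotics == 1]),
    "no antibiotics": summarize(duration[antibiotics == 2]),
}

for label, stats in by_group.items():
    print(label, {k: round(v, 2) for k, v in stats.items()})
```

The group sizes are unequal (7 patients with antibiotics, 18 without), which is worth keeping in mind when comparing the two summaries.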
Example 2.5 (cont.)
[Figure: duration of hospitalization by group (Antibiotics, No Antibiotics).]