HW1 Solution. #1.6) (a) Plot the marginal dot diagrams for all the variables.


Figure 1: Dot plots of the variables in the air pollution dataset (panels: wind, solar radiation, CO, NO, NO2, O3, HC).

(b) Construct the x̄, S_n, and R arrays, and interpret the entries in R.

Table 1: The sample means x̄ of the variables (Wind, Solar, CO, NO, NO2, O3, HC).
Table 2: The sample variance-covariance matrix S_n of the variables.
Table 3: The sample correlation matrix R of the variables.

Most of the variables have only weak linear associations, with correlations close to zero. The pollutants are mostly positively correlated with one another. Wind is negatively correlated with the pollutants, while solar radiation is positively correlated with them.
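The arrays in part (b) follow directly from their definitions: x̄ is the vector of column means, S_n is the variance-covariance matrix with divisor n, and R rescales S_n by r_ik = s_ik / sqrt(s_ii s_kk). A minimal pure-Python sketch, assuming the data are stored as a list of rows; the four "days" below are hypothetical values, not the air pollution data, and the function names are illustrative:

```python
from math import sqrt

def mean_vector(X):
    """x-bar: the column means of an n x p data matrix stored as a list of rows."""
    n, p = len(X), len(X[0])
    return [sum(row[k] for row in X) / n for k in range(p)]

def covariance_n(X):
    """S_n: sample variance-covariance matrix with divisor n (not n - 1)."""
    n, p = len(X), len(X[0])
    xbar = mean_vector(X)
    return [[sum((row[i] - xbar[i]) * (row[k] - xbar[k]) for row in X) / n
             for k in range(p)] for i in range(p)]

def correlation(S):
    """R: entries r_ik = s_ik / sqrt(s_ii * s_kk)."""
    p = len(S)
    return [[S[i][k] / sqrt(S[i][i] * S[k][k]) for k in range(p)]
            for i in range(p)]

# Hypothetical data: 4 days x 3 variables (think wind, solar radiation, CO)
X = [[6.0, 90.0, 5.0],
     [8.0, 100.0, 3.0],
     [7.0, 95.0, 4.0],
     [11.0, 75.0, 4.0]]

xbar = mean_vector(X)      # [8.0, 90.0, 4.0]
S_n = covariance_n(X)
R = correlation(S_n)
```

Because R is a rescaling of the covariance matrix, using divisor n - 1 instead of n changes S but leaves R unchanged.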

Figure 2: Star plots of the air pollution variables (panels: Windy and Sunny; Windy and Not Sunny; Not Windy and Sunny; Not Windy and Not Sunny).

To investigate whether wind and sunshine have an effect on air pollution, we can split the wind and solar radiation variables at their medians, which divides the days into four groups, and then draw the star plots in Figure 2. The stars suggest that solar radiation has some effect on air pollution: when it is sunny, most pollutants are at a relatively low level. Within each group, however, the star patterns are quite different. There is a fair amount of variation in each of the four groups, since every group contains days with very little pollution as well as days with quite a bit of air pollution.
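The median split described above can be sketched as follows. The tie rule (a value equal to the median counts as "Not Windy" or "Not Sunny") is an assumption, as are the six hypothetical days used to exercise it; the function name is illustrative:

```python
from statistics import median

def weather_groups(wind, solar):
    """Label each day by whether its wind and solar radiation exceed their medians.

    Assumption: only values strictly above the median count as "Windy"/"Sunny";
    ties at the median go to the "Not" side.
    """
    wind_med = median(wind)
    solar_med = median(solar)
    labels = []
    for w, s in zip(wind, solar):
        windy = "Windy" if w > wind_med else "Not Windy"
        sunny = "Sunny" if s > solar_med else "Not Sunny"
        labels.append(f"{windy} and {sunny}")
    return labels

# Hypothetical wind speeds and solar radiation readings for six days
wind = [5, 8, 7, 10, 6, 9]
solar = [60, 95, 72, 88, 101, 70]
groups = weather_groups(wind, solar)
```

Each label matches one panel of Figure 2, so the star plots for a panel are drawn from the days carrying that label.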

#1.17) In this dataset, the first three variables (100m, 200m, 400m) are measured in seconds, while the last four (800m, 1500m, 3000m, Marathon) are measured in minutes.

Table 4: The sample means x̄ for the track records.
Table 5: The sample variance-covariance matrix for the track record variables (100m, 200m, 400m, 800m, 1500m, 3000m, Marathon).
Table 6: The sample correlation matrix for the track record variables.

All seven variables are strongly positively correlated, and the correlations tend to be larger when the distances are close to each other. For example, the correlation between 100m and 200m is 0.95, while the correlation between 100m and the marathon is lower. This makes sense, since runners who are strong at one distance tend to be strong at races of similar length.
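Because the first three records are in seconds and the last four in minutes, the minute records have to be multiplied by 60 before distance can be divided by time to give speeds in meters/second, as in #1.18. A minimal sketch, assuming hypothetical record values and the standard marathon length of 42,195 m; the names DISTANCES_M, IN_MINUTES, and to_speed are illustrative:

```python
# Race distances in meters (the marathon is 42,195 m)
DISTANCES_M = {"100m": 100, "200m": 200, "400m": 400, "800m": 800,
               "1500m": 1500, "3000m": 3000, "Marathon": 42195}

# The last four records are given in minutes; the first three in seconds
IN_MINUTES = {"800m", "1500m", "3000m", "Marathon"}

def to_speed(race, record):
    """Convert one record (seconds or minutes) to a speed in meters/second."""
    seconds = record * 60 if race in IN_MINUTES else record
    return DISTANCES_M[race] / seconds

# Hypothetical records for one country, in the units described above
row = {"100m": 11.0, "200m": 22.5, "400m": 50.0, "800m": 2.0,
       "1500m": 4.0, "3000m": 8.5, "Marathon": 140.0}
speeds = {race: to_speed(race, t) for race, t in row.items()}
```

Since the conversion divides by time, speed is a nonlinear (reciprocal) function of the recorded time, not just a change of units.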

#1.18) Table 7: The sample mean track records, measured in meters/second.
Table 8: The sample variance-covariance matrix of the track records measured in meters/second (100m, 200m, 400m, 800m, 1500m, 3000m, Marathon).
Table 9: The sample correlation matrix of the track records measured in meters/second.

The results are similar to those I obtained in Exercise 1.17: all the variables have strong, positive linear relationships with each other, and races that are closer in distance have stronger relationships. The differences among the correlations are slightly smaller. This cannot be caused by the common units alone, since a correlation does not change when a variable is rescaled; the change comes from speed being the reciprocal of time, which is a nonlinear transformation.

Compute the sample variance-covariance matrix (call it S). Obtain the spectral decomposition S = PΛP' (also called the eigenvalue decomposition; use eigen if you are using R) of the variance-covariance matrix. Next, post-multiply the observation matrix (call it X) by P. Plot the pairwise scatter plots of the first three columns.
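The spectral-decomposition steps above can be sketched without external libraries. This version uses the cyclic Jacobi rotation method, a swapped-in technique (R's eigen relies on LAPACK routines instead), runs on a small hypothetical data matrix, and leaves out the plotting step. All helper names are illustrative:

```python
from math import atan2, cos, sin

def matmul(A, B):
    """Matrix product for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def sample_cov(X):
    """Sample variance-covariance matrix S with divisor n - 1."""
    n, p = len(X), len(X[0])
    m = [sum(row[k] for row in X) / n for k in range(p)]
    return [[sum((row[a] - m[a]) * (row[b] - m[b]) for row in X) / (n - 1)
             for b in range(p)] for a in range(p)]

def jacobi_eigen(S, tol=1e-24, max_sweeps=100):
    """Spectral decomposition S = P diag(lams) P' of a symmetric matrix.

    Cyclic Jacobi method: repeatedly rotate A = P'SP so that one
    off-diagonal entry at a time becomes zero. Eigenvalues are returned in
    decreasing order; column j of P is the unit eigenvector for lams[j].
    """
    p = len(S)
    A = [row[:] for row in S]
    P = [[float(i == j) for j in range(p)] for i in range(p)]
    total = sum(x * x for row in S for x in row)   # unchanged by rotations
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off <= tol * total:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if A[i][j] == 0.0:
                    continue
                # Angle that makes the (i, j) entry of J'AJ zero
                theta = 0.5 * atan2(2 * A[i][j], A[j][j] - A[i][i])
                c, s = cos(theta), sin(theta)
                J = [[float(r == k) for k in range(p)] for r in range(p)]
                J[i][i], J[j][j], J[i][j], J[j][i] = c, c, s, -s
                A = matmul(transpose(J), matmul(A, J))
                P = matmul(P, J)                   # keeps A = P'SP
    order = sorted(range(p), key=lambda k: A[k][k], reverse=True)
    lams = [A[k][k] for k in order]
    P = [[row[k] for k in order] for row in P]
    return lams, P

# Hypothetical 5 x 3 observation matrix X (not the homework data)
X = [[1.0, 2.0, 3.0],
     [2.0, 1.0, 4.0],
     [3.0, 5.0, 2.0],
     [4.0, 3.0, 6.0],
     [5.0, 6.0, 5.0]]

S = sample_cov(X)
lams, P = jacobi_eigen(S)
Y = matmul(X, P)                       # post-multiply X by P
first_three = [row[:3] for row in Y]   # columns for the pairwise scatter plots
```

Because the sample covariance matrix of Y = XP equals P'SP = Λ, the columns of Y are uncorrelated, so sample_cov(Y) should come out diagonal up to rounding error.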

Figure 3: Pairwise scatter plots of the first three columns of Y = XP (axes: var 1, var 2, var 3).

From the pairwise scatter plots, we could not find any obvious relationships among these three variables. The first two columns may appear to have a slight negative relationship and the last two a slight positive one, but neither can be a linear association: the sample covariance matrix of Y = XP is P'SP = Λ, the diagonal matrix of eigenvalues of S, so the columns of Y are uncorrelated.

#1.26) (a)

Table 10: The sample means x̄ of the variables in the bulls dataset (Breed, SaleP, YearlingHT, FFBody, PctFFBody, Frame, Back.fat, SaleHT, Salewt).

Table 11: The sample variance-covariance matrix of the variables in the bulls dataset.
Table 12: The sample correlation matrix of the variables in the bulls dataset.

Only a few variables (frame, yearling height, and sale height) have strong relationships with each other. I do not think the breeds are well separated in this system, since none of the correlations between breed and the other variables is strong. The best candidate for distinguishing between breeds is back fat, which has the strongest linear relationship with breed.

(b) I did not find any obvious outliers in Figure 4. The three-dimensional plot shows that most bulls of breed 8 (Simental) have less back fat and a larger frame, while the back fat and frame values for breed 1 (Angus) are more spread out.

Figure 4: A three-dimensional plot.

(c) This time the points are more tightly clustered, so it is easier to separate the three breeds. Bulls of breed 8 (Simental) have higher fat-free body weight and greater sale height, while the fat-free body weight and sale height values for breed 1 (Angus) are more spread out.

Figure 5: A three-dimensional plot.

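#2.20)

$$A = \begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix}$$

With the spectral decomposition $A = P\Lambda P'$,

$$A^{1/2} = P\Lambda^{1/2}P', \qquad A^{-1/2} = P\Lambda^{-1/2}P', \qquad A^{1/2}A^{-1/2} = I.$$

#2.23)

$$
V^{1/2}\rho V^{1/2}
= \begin{bmatrix}
\sqrt{\sigma_{11}} & 0 & \cdots & 0 \\
0 & \sqrt{\sigma_{22}} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sqrt{\sigma_{pp}}
\end{bmatrix}
\begin{bmatrix}
1 & \rho_{12} & \cdots & \rho_{1p} \\
\rho_{12} & 1 & \cdots & \rho_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
\rho_{1p} & \rho_{2p} & \cdots & 1
\end{bmatrix}
\begin{bmatrix}
\sqrt{\sigma_{11}} & 0 & \cdots & 0 \\
0 & \sqrt{\sigma_{22}} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sqrt{\sigma_{pp}}
\end{bmatrix}
$$

$$
= \begin{bmatrix}
\sigma_{11} & \rho_{12}\sqrt{\sigma_{11}}\sqrt{\sigma_{22}} & \cdots & \rho_{1p}\sqrt{\sigma_{11}}\sqrt{\sigma_{pp}} \\
\rho_{12}\sqrt{\sigma_{11}}\sqrt{\sigma_{22}} & \sigma_{22} & \cdots & \rho_{2p}\sqrt{\sigma_{22}}\sqrt{\sigma_{pp}} \\
\vdots & \vdots & \ddots & \vdots \\
\rho_{1p}\sqrt{\sigma_{11}}\sqrt{\sigma_{pp}} & \rho_{2p}\sqrt{\sigma_{22}}\sqrt{\sigma_{pp}} & \cdots & \sigma_{pp}
\end{bmatrix}
= \begin{bmatrix}
\sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\
\sigma_{12} & \sigma_{22} & \cdots & \sigma_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{1p} & \sigma_{2p} & \cdots & \sigma_{pp}
\end{bmatrix}
= \Sigma,
$$

since $\rho_{ik} = \sigma_{ik} / (\sqrt{\sigma_{ii}}\sqrt{\sigma_{kk}})$, so each off-diagonal entry $\rho_{ik}\sqrt{\sigma_{ii}}\sqrt{\sigma_{kk}}$ equals $\sigma_{ik}$.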
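The numerical values of A^{1/2} and A^{-1/2} in #2.20 can be checked with a short pure-Python sketch that uses the closed-form eigen-decomposition of a symmetric 2×2 matrix; it also checks the identity from #2.23 on a hypothetical 3×3 covariance matrix. The helper names and the matrix Sigma are illustrative assumptions:

```python
from math import sqrt

def matmul(A, B):
    """Matrix product for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def eig_sym_2x2(A):
    """Eigenvalues (decreasing) and unit eigenvectors (columns of P) of a
    symmetric 2x2 matrix whose off-diagonal entry is nonzero."""
    a, b, d = A[0][0], A[0][1], A[1][1]
    centre = (a + d) / 2
    radius = sqrt(((a - d) / 2) ** 2 + b ** 2)
    lams = [centre + radius, centre - radius]
    cols = []
    for lam in lams:
        v0, v1 = b, lam - a              # solves (A - lam*I) v = 0 when b != 0
        norm = sqrt(v0 ** 2 + v1 ** 2)
        cols.append((v0 / norm, v1 / norm))
    P = [[cols[0][0], cols[1][0]],
         [cols[0][1], cols[1][1]]]
    return lams, P

def sym_power(A, t):
    """A^t = P Lambda^t P' for a symmetric positive definite 2x2 matrix A."""
    lams, P = eig_sym_2x2(A)
    D = [[lams[0] ** t, 0.0], [0.0, lams[1] ** t]]
    return matmul(matmul(P, D), transpose(P))

# Exercise 2.20
A = [[2.0, 1.0], [1.0, 3.0]]
A_half = sym_power(A, 0.5)              # A^{1/2}
A_neg_half = sym_power(A, -0.5)         # A^{-1/2}
product = matmul(A_half, A_neg_half)    # should be the 2x2 identity

# Exercise 2.23: numerical check of V^{1/2} rho V^{1/2} = Sigma
Sigma = [[4.0, 2.0, 0.6], [2.0, 9.0, -1.5], [0.6, -1.5, 1.0]]  # hypothetical
p = len(Sigma)
V_half = [[sqrt(Sigma[i][i]) if i == j else 0.0 for j in range(p)] for i in range(p)]
rho = [[Sigma[i][k] / sqrt(Sigma[i][i] * Sigma[k][k]) for k in range(p)]
       for i in range(p)]
rebuilt = matmul(matmul(V_half, rho), V_half)
```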