Review Materials for Test 1 (4/26/04) (answers will be posted 4/20/04)


Prof. Lew

Extra Office Hours: Friday 4/23/04 10am-10:50am; Saturday 12:30pm-2:00pm. E-mail will be answered if you send it before noon on Sunday 4/24/04. No guarantees for an answer if it is sent after that time. There are no office hours on Monday 4/26/04.

Exam coverage: Chapter 1, Chapter 2, Chapter , Chapter 6.1

PLEASE BRING SOME FORM OF PHOTO IDENTIFICATION (e.g. Bruin Card, Driver's License, etc.) TO THE EXAM. ATTENDANCE WILL BE TAKEN. PLEASE REMEMBER TO BRING WRITING INSTRUMENTS AND A CALCULATOR. WE WILL PROVIDE AN EXAM PACKET AND SCRATCH PAPER.

FORMULA SHEET: ONE SIDE OF A SINGLE 8.5 x 11 piece of paper with formulas is allowed into the exam. Typed, laser printed, cut and paste, handwritten, ink/pencil/crayon, small fonts, fancy fonts, solved problems, highlighting, different colors, prayers, curses, are acceptable. NO POST-ITS. It is yours, you have to use it, I do not collect it.

Additional review: Know the following Stata commands (these are only examples, but you should know them more generally with different variables) and their associated output:

    tabulate gender
    summarize height
    summarize height, detail
    graph box income, over(gender)

Review the lecture notes before the exam. Everything covered in lecture is eligible for the exam. If something was covered in the textbook but not covered in any lecture, it will not be on an exam (for example, the equation box on page 5 was not covered in lecture and will not appear on the first exam).

How many digits? You will be absolutely safe if you carry 4 digits after the decimal place. If you round too severely (e.g. some people will round to 4.0), you will probably lose a point because that number diverges too far from the correct answer. If were the final answer, you could round to 3.5 and be safe. If goes into another calculation, however, I'd round it to and then round any resulting final answer to one or two digits after the decimal (depends). So for example, 1.52 * = 5.3472, you can round it to 5.35 or 5.4 even. I have seen some students give an answer like 8 (instead of 5.35 or 5.4) because they had rounded too severely too soon.

SHOW YOUR WORK FOR FULL CREDIT. All solved problems with numerical answers require evidence that you did the work. Nothing is obvious to your professor, who is the dumbest person in the room during any exam and during the grading process.

WHAT FOLLOWS ARE ACTUAL EXAM QUESTIONS USED IN PAST CLASSES. YOUR TEST WILL NOT BE THIS LONG. IN GENERAL, EXAMS CONSIST OF 3, PERHAPS 4, QUESTIONS WITH MULTIPLE PARTS.
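The rounding advice above can be illustrated with a short Python sketch. The intermediate value 3.5179 is an assumption chosen for illustration (the number in the handout is not legible in this copy), and the variable names are hypothetical. The sketch compares carrying 4 decimal places against rounding to a whole number before the next calculation.

```python
# Hypothetical intermediate result; the handout's own number is not legible here.
intermediate = 3.5179
factor = 1.52

# Carry 4 digits after the decimal, round only the final answer.
careful = round(factor * round(intermediate, 4), 2)

# Round too severely too soon (to a whole number), then multiply.
severe = round(factor * round(intermediate, 0), 2)

print(careful)  # final answer when 4 decimals are carried
print(severe)   # final answer after premature rounding
```

Under these assumed numbers the two answers differ by more than 0.7, which is the kind of divergence that would cost a point.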

1. The Dull Computer Company makes its own computers and delivers them directly to customers who order them via the Internet. Dull's market dominance has arisen from its quick delivery and competitive pricing. The CEO of Dull has stated publicly that if customers make unassisted online purchases of their computers, these computers will have a mean delivery time of 48 hours from time of purchase with a standard deviation of 11 hours and a mean cost of $2,433 with a standard deviation of $800. The quickest delivery time was 12 hours and the slowest delivery time was 102 hours. The median delivery time was 36 hours and the median cost of the computers was $2,988. He also went on to state that 28% of those computers are delivered in less than 24 hours.

   a. Suppose there is a transportation strike and delivery times are doubled plus an extra hour for loading/unloading of freight for all computer purchases. So for example, a computer which once took 39 hours for delivery will now require (39*2) + 1 = 79 hours for delivery. Using this information, please calculate the new range of delivery times and write the word YES with your answer. If you think it is not possible to calculate the new range with the existing information, please write NO and then explain why not. If you calculate a new range AND decide to write NO in the hopes of covering all the bases, you will not get any credit for your answer; there is only one right answer here. (4 points)

   b. Are the distributions of delivery time and of cost symmetric or skewed?
      i. They are both symmetric
      ii. Time is symmetric, Cost is negatively (left) skewed
      iii. Time is negatively (left) skewed, Cost is positively (right) skewed
      iv. Time is positively (right) skewed, Cost is negatively (left) skewed
      v. Time is positively (right) skewed, Cost is symmetric
      vi. Not enough information to make a determination

   c. Grateway, a Dull competitor, hopes to make some money during the transportation strike because it employs a different carrier service. Grateway computers have a mean cost of $2,452 with a standard deviation of $400; their median cost is $2,311. Based only on the information that you have been given and what you have learned in Statistics 11 so far, which company probably sells its computers at lower cost to the majority of its customers? (choose only one): DULL GRATEWAY
      Justify your choice in the space below:
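For part a of question 1, the effect of "double it and add one hour" on the range can be checked with a minimal Python sketch. It uses only the quickest (12 hours) and slowest (102 hours) delivery times stated in the question; the function name `strike_time` is a hypothetical label, and this is an illustration of the reasoning rather than the posted answer key.

```python
def strike_time(hours):
    """Delivery time during the strike: doubled, plus one hour for loading."""
    return 2 * hours + 1

old_min, old_max = 12, 102          # quickest and slowest times from the question
new_min = strike_time(old_min)
new_max = strike_time(old_max)

old_range = old_max - old_min
new_range = new_max - new_min

print(new_min, new_max)             # transformed extremes
print(old_range, new_range)         # the added hour cancels; the range doubles
```

Because the same constant is added to both extremes, it drops out of the subtraction, so only the multiplier affects the range.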

2. We recently downloaded the current market information on publicly traded securities from There are 4,367 active securities in the dataset. The next three questions refer to the following Stata output on the variable "pricechange", which is the percentage change in price in the last 26 weeks (about six months).

   . summarize pricechange, detail

   [Stata output for "Price % Change 26 weeks": percentiles, smallest and largest values, Obs, Sum of Wgt., Mean, Std. Dev., Variance, Skewness, Kurtosis. The numeric values are not legible in this copy.]

   a. What is the interquartile range for this variable? Please show your work. (2 points)

   b. Are there suspected outliers for this variable? Answer yes or no first and then show mathematically how you arrived at your answer. (3 points)

   c. (Circle one) This is an observational study (2 points): TRUE FALSE
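Parts a and b of question 2 rest on the interquartile range and the common 1.5 x IQR rule for suspected outliers. The Python sketch below shows that calculation under stated assumptions: the quartiles and extremes passed in are made-up values (the Stata output's numbers are not available here), and `iqr_fences` is a hypothetical helper name.

```python
def iqr_fences(q1, q3):
    """Return (IQR, lower fence, upper fence) using the 1.5 * IQR rule."""
    iqr = q3 - q1
    return iqr, q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Made-up quartiles and extremes, for illustration only.
q1, q3 = -10.0, 20.0
smallest, largest = -80.0, 150.0

iqr, lower, upper = iqr_fences(q1, q3)
has_suspected_outliers = smallest < lower or largest > upper

print(iqr, lower, upper)
print(has_suspected_outliers)
```

With real output, Q1 and Q3 are the 25% and 75% rows of `summarize, detail`, and the smallest and largest values come from the "Smallest" and "Largest" columns.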

3. Congratulations, you graduated from UCLA and own your own company. Your Chief Information Officer (CIO) has developed a table of the number of computer failures per day at your firm by type of computer (Windows machines and Macs). To make your life difficult, she left out some information:

   a. You need to complete the tables: (4 points)

      Windows Machines
      outcome (number of failures per day) | proportion
      [table values are not legible in this copy]

      Macintosh Machines
      outcome (number of failures per day) | proportion
      [table values are not legible in this copy]

   b. Please calculate the mean failures for each type of machine. (4 points)

   c. Please calculate the standard deviation for the failures for each type of machine. (8 points)

   d. Is it possible to calculate a median number of failures for the two types of machines? Answer yes or no. If you answer yes, please write down the medians for each type of machine. If you answer no, please explain why it is not possible to calculate it. Be brief. (4 points)
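Parts a and d of question 3 rely on two facts: the proportions in a complete table sum to 1, and the median is the first outcome at which the cumulative proportion reaches 0.5. The Python sketch below illustrates both with an invented table (the exam's table values are not available here). The helper names are hypothetical, and the sketch assumes exactly one missing proportion.

```python
def fill_missing(table):
    """Replace the single missing proportion (None) so the table sums to 1."""
    known = sum(p for p in table.values() if p is not None)
    return {k: (round(1 - known, 10) if p is None else p) for k, p in table.items()}

def median_outcome(table):
    """First outcome whose cumulative proportion reaches 0.5.
    (If the cumulative proportion hits 0.5 exactly, the usual convention
    averages this outcome with the next one; that case is not handled here.)"""
    cumulative = 0.0
    for outcome in sorted(table):
        cumulative += table[outcome]
        if cumulative >= 0.5:
            return outcome

# Invented example: failures per day -> proportion, one entry left out.
example = {0: 0.2, 1: 0.5, 2: None, 3: 0.1}
completed = fill_missing(example)

print(completed)
print(median_outcome(completed))
```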

4. You work for a credit card issuer and it is your job to issue cards to new customers. Since you also go to school, you decide to randomly issue cards to college students. Suppose it is known that 30% of all college students will eventually fail to pay their credit card debt within the first year of possessing a credit card.

   A. (3 points) You issue credit cards to 3 students selected at random. What is the chance that at least one of them will fail to pay their credit card debt within the first year? Assume independence.
      (a) 10% or .10
      (b) 30% or .30
      (c) 34% or .34
      (d) 66% or .66
      (e) 70% or .70
      (f) 90% or .90
      (g) less than 10% (less than .10)
      (h) greater than 90% (greater than .90)

   B. (3 points) Suppose once a student fails to pay on a credit card, the chance that the student will fail to pay on the next credit card that is issued rises to 60%. If a student has not failed to pay on a credit card, the chance still remains 30% for the next card that is issued. A randomly chosen student has been issued two cards. What is the chance that the student will fail to pay on at least one of the two cards? (Hint: a branch might be helpful here):
      (a) 60% or .60
      (b) 30% or .30
      (c) 36% or .36
      (d) 49% or .49
      (e) 51% or .51
      (f) 64% or .64
      (g) 90% or .90
      (h) greater than 90% (greater than .90)

5. Indicate whether the following statements are true or false.

   T  F  Statement
   A. Randomization is necessary to ensure equality between treated and control subjects in an experiment.
   B. Control groups are necessary in experiments. They allow us to compare the results from the treatment group properly.
   C. Bias in samples is unavoidable.
   D. Bias in observational studies can be reduced by including as many confounding factors as possible in analysis.
   E. Confounding may cause observational studies to be misleading.
   F. Random assignment in experiments counteracts confounding.
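Both parts of question 4 are "at least one" probabilities, which are easiest to find through the complement "none". The Python sketch below works through the arithmetic using only the probabilities stated in the question; the variable names and the dictionary of branches are illustrative choices, not part of the exam.

```python
p_fail = 0.30                     # chance a student fails to pay (from the question)

# Part A: three independent students; "at least one" = 1 - "none of the three".
p_none_of_three = (1 - p_fail) ** 3
p_at_least_one_of_three = 1 - p_none_of_three

# Part B: two cards for one student, written out as a branch (tree) diagram.
p_second_fail_after_fail = 0.60   # rises to 60% after a first failure
p_second_fail_after_pay = 0.30    # stays at 30% otherwise

branches = {
    ("fail", "fail"): p_fail * p_second_fail_after_fail,
    ("fail", "pay"):  p_fail * (1 - p_second_fail_after_fail),
    ("pay", "fail"):  (1 - p_fail) * p_second_fail_after_pay,
    ("pay", "pay"):   (1 - p_fail) * (1 - p_second_fail_after_pay),
}
p_at_least_one_of_two = 1 - branches[("pay", "pay")]

print(round(p_at_least_one_of_three, 3))
print(round(p_at_least_one_of_two, 2))
```

Note that in Part B the 60% figure never enters the final number: once the first card has failed, the "at least one" event has already happened regardless of the second card.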

6. The box plots below summarize the percentage change in stock price for 4 industries (each industry has many companies) for January Please answer the following questions on the basis of this graphic.

   (a) For the Biotech industry, the 75th percentile for percentage change in stock price is (3 points):
       A. A little over 90%
       B. About 75%
       C. About 60%
       D. Slightly less than 40%
       E. Not calculable from this graphic

   (b) Is the distribution of percentage change in stock price nearly symmetrical for any of these industries? (circle one, 2 points) YES NO NEED MORE INFORMATION

   (c) Please tell us how you arrived at your answer in part (b) above; use calculations where appropriate. (5 points)

7. Congratulations! You graduated from UCLA and got a job as the marketing director for a large grocery store chain which just created an internet site that allows its existing customers to purchase products on-line. Your supervisor is wondering whether allowing existing customers to use special store coupons during their on-line purchases will increase average sales. You decide to randomly sample 200 on-line customers, gather information about their purchasing behavior and demographics, then using a random method, give 100 of the on-line customers special store coupons for use during their next on-line purchase and you do not give the other 100 any special store coupons. You find that the 100 who were given the special store coupons purchased $25 more on average than the 100 who did not receive the special store coupons. Data on 2.5 million existing customers reveals that the use of special store coupons during non-online transactions increased purchases by $18 on average. Please answer the following questions:

   a. (2 points) The parameter is:
      i. 2.5 million customers
      ii. 200 customers
      iii. 100 customers
      iv. 25 dollars
      v. 18 dollars

   b. (2 points) The statistic is:
      i. 2.5 million customers
      ii. 200 customers
      iii. 100 customers
      iv. 25 dollars
      v. 18 dollars

   c. (2 points) The population is:
      i. 2.5 million existing customers
      ii. all on-line customers only
      iii. 200 customers
      iv. 100 customers
      v. none of the above

   d. (2 points) The sample is:
      i. 2.5 million existing customers
      ii. all on-line customers only
      iii. 200 customers
      iv. 100 customers
      v. none of the above

   e. (2 points) This study is:
      i. an Observational Study with controls
      ii. an Observational Study that uses a random probability method for sample selection
      iii. a Randomized Experiment without Controls, but it is blind
      iv. a Randomized Experiment without Controls, but it is double-blind
      v. a Randomized Controlled Experiment

8. Congratulations! You became a traveling salesperson for a large manufacturer. You make 2 calls per year on each client. Your chance of a sale each time you call is 75%. The next two questions ask you about what can happen after one year.

   a. Please complete this table of events (9 points)

      outcome (number of possible sales in one year for each client) | relative frequency (proportion, probability)

   b. Using the information in your table (it's OK if the values are wrong, pretend that they are correct) and supposing they represent the pattern of sales over countless calls, what is the mean number of sales? (6 points)

   c. Using the information in your table (it's OK if the values are wrong, pretend that they are correct) and supposing they represent the pattern of sales over countless calls, what is the standard deviation for the number of sales? (5 points)

9. Please indicate whether the statements below are true or false (1 point each)

   True  False  Statement
   A. Variables whose values are categorical but not quite quantitative are often called ordinal (ordered) variables
   B. A categorical variable is a variable that labels categories with text or even numbers
   C. A relative frequency table differs from a frequency table by giving percentages rather than counts of the values in each category of a categorical variable
   D. A representative sample is a sample whose statistics reflect the corresponding population parameters accurately
   E. Placebos are the best way to blind subjects from knowing whether they are receiving the treatment or not
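Question 8 can be worked as a small probability table followed by the usual mean and standard deviation formulas for a discrete distribution. The Python sketch below does this for 2 calls with a 75% chance of a sale on each. It assumes the two calls are independent, which the question does not state explicitly, and the function names are hypothetical.

```python
from math import comb, sqrt

def binomial_table(n, p):
    """Probability of k sales in n independent calls, for k = 0..n."""
    return {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

def mean_and_sd(table):
    """Mean and standard deviation of a discrete distribution {outcome: probability}."""
    mean = sum(x * pr for x, pr in table.items())
    variance = sum((x - mean) ** 2 * pr for x, pr in table.items())
    return mean, sqrt(variance)

sales_table = binomial_table(2, 0.75)   # 2 calls per year, 75% chance each
mean_sales, sd_sales = mean_and_sd(sales_table)

for outcome, prob in sales_table.items():
    print(outcome, prob)
print(mean_sales, round(sd_sales, 4))
```

The same `mean_and_sd` helper applies to any completed outcome/proportion table, such as the ones in questions 3 and 10, once the missing proportions are filled in.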

10. The CEO of company A bids on consulting jobs. The CEO of company B bids on consulting jobs too. Each company has a table of the number of jobs the company is awarded per year (over countless years) and the relative frequency or proportions for each; unfortunately, there is a bit of missing information:

   a. Fill in the missing information in their tables (2 points each company, 4 points total)

      Company A
      Number of jobs | Proportion
      [table values are not legible in this copy]

      Company B
      Number of jobs | Proportion
      [table values are not legible in this copy]

   b. What is the mean profit for each company? (3 points each company, 6 points total)

   c. Find the standard deviation of the number of consulting jobs awarded per year for each company. (5 points each company, 10 points total)

11. The UCLA-USC football game is the number one party event of the year for Bruins, exceeding even commencement celebrations (mostly because parents are present at commencement). Suppose it is known that the typical Bruin football party has 15 UCLA students on average with a standard deviation of 4.3 UCLA students. Many activities will occur on that day and for all UCLA students attending football parties, they will result in a mean financial change of -$18 with a standard deviation of $21.

   The UCLA Management School decided to study the effects of attending football parties on the resulting financial changes experienced by college students. 1,050 college students were selected randomly from registration lists obtained from all 3,800 colleges and universities in America. Of that 1,050, 370 students reported that they had attended a football party, and 140 did not attend a party but watched the football game on television at home. The remainder did not attend a party or watch the game on television. The financial change experienced by the partygoers had a mean of -$12 with a standard deviation of $11. The financial change experienced by the non-partygoers had a mean of -$1 with a standard deviation of $6. Among the partygoers, 77% reported getting drunk; only 5% of the non-partygoers reported getting drunk.

   Please use the information above to answer the following questions.

   A. Please identify the following for the UCLA Management School:
      The population parameter of greatest interest (3 points)     Answer:
      The population (2 points)                                    Answer:
      The sample (1 point)                                         Answer:
      An example of a sample statistic from their study (2 points) Answer:

12. Some computer output from a database of 907 movies produced between The variable totalgrossreceipts is the total amount of money earned (in millions) domestically.

   [Stata detail output for totalgrossreceipts: percentiles, smallest and largest values, Obs, Sum of Wgt., Mean, Std. Dev., Variance, Skewness, Kurtosis. The numeric values are not legible in this copy.]

   A. The distribution of total gross receipts is: (2 points)
      i. Symmetric, with mean and Standard Deviation
      ii. Right (positively) skewed with mean > median
      iii. Right (negatively) skewed with mean > median
      iv. Left (positively) skewed with mean > median
      v. Left (negatively) skewed with mean > median
      vi. None of the above

   B. (1 point) Are there outliers present in this variable? (circle one) YES NO

   C. (3 points) Justify your answer in the space below (an answer utilizing numbers is required for full credit)

   D. Please calculate the range (not the interquartile range, AND SHOW YOUR WORK FOR FULL CREDIT) and list the values of the quartiles (i.e. Q1, Q2, Q3) for this variable in the space below. (5 points)

13. [Box plot of POINTS (vertical axis) by GROUP (horizontal axis) for four groups; the figure is not reproduced here.] The horizontal axis should be labeled GROUP and the vertical axis should be labeled POINTS. The dark dots should look like open circles.

   Using the box plot shown above, please answer the following questions:

   a) Is there enough information present to estimate the inter-quartile range (IQR) for group 3? (circle one) YES NO
      If you answered YES, please give an approximate estimate of that value in the space below; if you answered NO, please explain why it is not possible to estimate the IQR in this situation. (3 points total)

   b) Which group appears to be the most symmetrical of the four groups? (circle one) (2 points) Not enough information

   c) Which group appears to be the most left skewed? (circle one) (2 points) Not enough information

14. Classify the following variables as categorical, ordinal or numeric (quantitative) by checking the correct box. If it is a quantitative variable, further classify the variable as either discrete or continuous:

   Variable                                  | Categorical | Ordinal | Numeric (Quantitative) | Discrete | Continuous
   A. Hair Color                             |
   B. Frozen Food Brand                      |
   C. Number of students in a classroom      |
   D. Your age                               |
   E. Stages of economic development         |
   F. Grades on high school report cards     |

15. The Dull Computer Company makes its own computers and delivers them directly to customers who order them via the Internet. Dull's market dominance has arisen from its quick delivery and competitive pricing. The CEO of Dull has stated publicly that if customers make unassisted online purchases of their computers, they will have a mean delivery time of 35 hours from time of purchase with a standard deviation of 11 hours and a mean cost of $1,603 with a standard deviation of $400.

   A consumer research organization decided to test the CEO's mean delivery time claim by purchasing 100 computers from Dull at randomly selected times and days. The 100 purchases were randomly divided into two groups: 51 were purchased by telephone and involved talking to a live salesperson; the remaining 49 were unassisted online purchases. The delivery time of the 49 had a mean of 36 hours with a standard deviation of 16 hours and they also had a mean cost of $1,588 with a standard deviation of $ of the 49 computers were delivered in less than 24 hours.

   15A. (2 points) The population of interest is
      (a) all Dull Computers
      (b) all Dull Computers purchased online and unassisted
      (c) 100 computers purchased from Dull by the consumer research organization
      (d) 51 computers purchased by telephone
      (e) 49 computers purchased online and unassisted

   15B. (2 points) The sample of interest is
      (a) all Dull Computers
      (b) all Dull Computers purchased online and unassisted
      (c) 100 computers purchased from Dull by the consumer research organization
      (d) 51 computers purchased by telephone
      (e) 49 computers purchased online and unassisted

   15C. (2 points) The statistic of interest is
      (a) 36 hours (b) 11 hours (c) 35 hours (d) 16 hours (e) $1,603 (f) $1,588 (g) $400

   15D. (2 points) The parameter of interest is
      (a) 36 hours (b) 11 hours (c) 35 hours (d) 16 hours (e) $1,603 (f) $1,588 (g) $400