Midterm Review Statistics Summer 2015

Name:

Problem 1: Statistics is the study and analysis of data. All of the following are great uses of statistics EXCEPT:
a) Deriving valid conclusions in the social sciences, psychology, medicine, business, and many other fields of knowledge
b) Testing the validity of a social norm
c) Identifying the likelihood of developing a certain disease by studying and doing statistical analysis on people who suffered from this disease in the past
d) Influencing and promoting the agenda of a few people in the popular media, such as TV or news channels, by presenting statistical studies where the data may not represent the population

Problem 2: Identify the data type (qualitative/quantitative) for each:
1) Your service experience at the local bank where you do your banking
2) The number of homes that were sold last year in your city
3) The information collected from local businesses about the state of their current business
4) The amount of personal debt you have
5) The ratio of the U.S. dollar to the euro

Population: The entire set of items/things/objects that are under consideration
Sample: Any subset of the population under consideration

Problem 3: Determine whether each of these represents a sample, a population, or cannot be determined:
a) Study subjects: the collection of all the students that your teacher taught over the last decade
b) The students in your class, while the subjects of study are all the students in your college
c) The people of the city where you live
d) A subset of a collection of objects under study
e) The 500 stocks in the S&P 500 index in the U.S.

Data Collection: It should consider the context and the sampling method, and the sample should represent the population. Data collection should also be a random process, to avoid bias and to represent the population.

Problem 4: Do the following represent the population?
a) Studying the economic wellbeing of a city using a sample from the wealthy neighborhoods of that city
b) Visiting a church on a Sunday morning and asking people whether they support abortion rights

Levels of measurement: Nominal, Ordinal, Interval, Ratio

Problem 5: Identify the level of measurement for each:
a) The numbers on the jerseys of our state champion Hartnell College men's soccer team
b) The coaching rank of Daniel Ortega, who won the 2013 National Coach of the Year Award for Junior College Men Division III
c) Your educational expenses in the current academic year
d) Temperatures today in Salinas, CA and Dhaka, Bangladesh
e) The ranking of the four-year schools that you are planning to attend
f) Your yearly income
g) The years in which these great human beings were born: Jesus, the Prophet Mohammed, Buddha, MLK, and Gandhi
h) The ranking of all the Super Bowl teams
i) The meaningless score that you receive on an IQ test, which is NOT INDICATIVE of your true intellectual potential
j) The different types of mortgage products/loans that you can obtain to purchase a home
k) Your total debt
l) The World Bank's ease of doing business ranking, as reported by the Doing Business ranking: the U.S. ranks 4th while Singapore ranks first
m) The distance that you travel from your home to your school

Problem 6: Continuous versus discrete:
a) The average life expectancy in the U.S.
b) One half, one third, one fourth, and one fifth of the 60 people who are in a store
c) The gradual speeding up of a car, also known as acceleration
d) The birth rate in the U.S.

Problem 7: Statistic versus parameter:
a) The black-eyed people in the U.S. population
b) The left-wing politicians in the U.S. Congress
c) The age information of all the U.S. senators
d) The average weekly expenses of 5 households in your neighborhood, given that there are more than five households in your neighborhood
e) The racial composition information of all the people who live in Pebble Beach, given that our study object is the population of Pebble Beach

Problem 8: Descriptive statistics versus inferential statistics:
a) Presenting information in the form of a bar graph
b) Collecting data from a sample that represents the population, finding the average of that data, and claiming that to be the population average
c) Finding the median home price in the city that you live in and claiming that to be the median home price in your county
d) Creating a pie chart of collected data

Problem 9: Which ones are probability samples? Remember that in probability sampling every item/object/person has some probability of being randomly or semi-randomly selected, even though the probability of being selected may differ.
a) Referral of new clients to a mortgage broker by his/her existing or previous clients
b) Selecting a person by simple random sampling; remember the definition of simple random sampling
c) Systematic sampling
d) Convenience sampling
e) Cluster sampling
f) Stratified sampling
g) Asking only friends to find out what movies adults in your city like

Problem 10: Sampling types: Identify the sampling type (simple random sample, convenience, systematic, stratified, or cluster) in each case:
a) The U.S. educational system is divided into instruction (61%), support (35%), and noninstruction (4%). You create a random sample by collecting 61% of the sample data from instruction, 35% from support, and 4% from noninstruction.
b) Assign a unique number to every item in the population, randomly select the first item, then select every fifth item, and collect data from the selected items.
c) Divide the population of your city into groups based on their religious preference, or lack thereof; then collect data randomly from one group.
d) A computer program randomly generates a number between 1 and n, where n is the total number of objects in your population. You take the random number from the computer, select the object that is associated with that number, and collect data from it.
e) You find the average number of hours people watch unproductive TV by collecting data from your neighborhood. Just for your information, most of TV is unproductive, like most of social media such as FB; alas!!! You are wasting your precious time making others rich instead of building your own life.

Problem 11: Which of the following sampling methods do you think may be most representative of the population, and which is most likely to be biased or least likely to represent the population?
a) Cluster
b) Stratified
c) Systematic
d) Convenience
e) Simple Random Sample
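The sampling types in Problems 9 to 11 can be made concrete with a small Python sketch. The population below is a hypothetical list of 100 numbered people with made-up group labels (nothing here comes from the sheet), and the seed is fixed only so the run is reproducible. Stratified sampling draws randomly from every group, while cluster sampling picks whole groups at random and keeps everyone in them; convenience sampling is left out because it involves no random selection at all.

```python
import random

# HYPOTHETICAL population: 100 people numbered 1..100, each given a made-up
# group label ("A" to "D") so stratified and cluster sampling have groups to use.
population = [{"id": i, "group": "ABCD"[i % 4]} for i in range(1, 101)]

def groups_of(pop):
    # Split the population into lists keyed by group label.
    groups = {}
    for person in pop:
        groups.setdefault(person["group"], []).append(person)
    return groups

def simple_random_sample(pop, n, rng):
    # Every subset of size n has the same chance of being chosen.
    return rng.sample(pop, n)

def systematic_sample(pop, k, rng):
    # Random start among the first k items, then every k-th item after it.
    return pop[rng.randrange(k)::k]

def stratified_sample(pop, n_per_group, rng):
    # Draw a simple random sample from EVERY group (stratum).
    return [p for members in groups_of(pop).values() for p in rng.sample(members, n_per_group)]

def cluster_sample(pop, n_clusters, rng):
    # Randomly pick whole groups (clusters) and keep ALL of their members.
    groups = groups_of(pop)
    chosen = rng.sample(sorted(groups), n_clusters)
    return [p for g in chosen for p in groups[g]]

rng = random.Random(2015)  # fixed seed only so the run is reproducible
srs_sample = simple_random_sample(population, 10, rng)
sys_sample = systematic_sample(population, 5, rng)
strat_sample = stratified_sample(population, 3, rng)
clus_sample = cluster_sample(population, 1, rng)
print(len(srs_sample), len(sys_sample), len(strat_sample), len(clus_sample))
```

With this population the sketch selects 10, 20, 12, and 25 people respectively. A proportional stratified design like Problem 10 a) would instead size each stratum's draw by that stratum's share of the population.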

Problem 12: The situations below would be described as:
a) Diabetes is about to become the world's most common disease, thanks to fast food and junk food. You study a group of people living in a city that has a high incidence of diabetes, and you attribute that to the genetic predisposition of the people who live in that city. But your friend Ms. Smarty says it is due to lack of exercise and the fast food and junk food that they consume.
b) In Monterey County, particularly in Salinas or areas near the Ag industry, a higher proportion of people suffer from cancer. They say it is hard to determine what is causing these cancers. But as a teacher I see students who live near Salinas and/or Ag areas developing cancers, and I propose that it may have something to do with the pesticides. Note: Over the years, I have found many students, both young and old, who live in the Salinas area or Ag areas and suffer from cancer.

Problem 13: Determine the study types: longitudinal/prospective, cross-sectional, or retrospective:
a) In a study, data were collected over a period of time in the past from two groups of nurses, one that smoked and one that did not. Then researchers looked at the rate of lung cancer occurrences among these two groups.
b) It is realistically not possible to stage real-life car crashes, so a researcher studies car crashes as they happen in recent times.
c) Observing the tremendous human suffering caused by the dropping of the atomic bombs on Hiroshima and Nagasaki during the Second World War, and studying the lives and health of the survivors over the years

Problem 14: Critical and/or statistical thinking:
a) By definition the U.N., located in NYC, is an independent state/country that enjoys immunity from the laws of the land in the U.S. This is kind of like the Vatican in Italy. So your friend Mr. Smart says Native American reservations are also like independent states of their own and hence enjoy similar opportunities in the U.S. as the U.N. does. Your friend is:
   a) Smart   b) Cunning or dumb   c) Has no statistical basis for his conclusion
b) One can change the order of the questions or change the wording slightly to influence the outcomes of a study. True False
c) Collecting data from a large number of poor-neighborhood schools in the U.S. and concluding that the students in poor neighborhoods are underperforming, and hence not as intellectually gifted, would be a valid conclusion because a large number of schools were considered in the study. True False
d) In Bangladesh two women, Hasina and Khaleda Zia, have been ruling the country for the last 20 years. Hence we conclude that Bangladesh is more liberal than the U.S. in terms of women's equality, at least in the political arena. Another very interesting fact: in Bangladesh, around 10% of the cabinet seats (like the House of Congress in the U.S.) are reserved for women, and I am not sure how many countries in the world have that. So would you conclude that Bangladesh has better political integration of women than many of the Western nations? Yes No
e) My little sister, who at the time lived in Bangladesh, had American TV channels as her sole source of information about the U.S. Based on watching American TV channels, she concluded that crimes in the U.S. are mostly committed by Black people. Her experience, or lack thereof, was due to her limited access to information about the U.S. But you live in the U.S.: do you believe that you are being biased, as my sister was, in making conclusions, even though you have all the information at your disposal while she did not? Yes No

Problem 15: Identify the level of measurement for (a) through (d):
(a) Heights of buildings in the city of Salinas: Nominal Ordinal Interval Ratio
(b) Temperature on different days of the year in Salinas: Nominal Ordinal Interval Ratio
(c) Possible letter grades you may receive in the class: Nominal Ordinal Interval Ratio
(d) Names of 10 students from the class: Nominal Ordinal Interval Ratio
(e) Identify the type of sampling: At a buffet they have different types of food: American food, Asian food, Indian food, and Italian food. You eat two items randomly from each category of food. Convenience Systematic Stratified Clustering
(f) Determine if the following example represents discrete or continuous data: A candy store sells candies only in the following amounts: 0.2 lbs, 0.4 lbs, 0.6 lbs. Continuous Discrete

Problem 16: Find the following for this data set: {9, 1, 5, 3, 6, 8, 8, 4, 3, 2, 1, 1, 8, 9, 7}
Mean:
Median:
Mode:
Range:
Midrange:

Problem 17: Your grade in the class consists of 2 midterms at 15% each, homework 10%, project 5%, attendance 10%, and the final 30%. Find the weighted mean if your scores in the class are as follows:
Midterm 1: 85
Midterm 2: 90
Final: 80
Homework: 95
Project: 100
Attendance: 75
Weighted Mean: $\bar{x} = \frac{\sum (w \cdot x)}{\sum w}$

Problem 18: Let us assume that you are taking 4 courses this semester and you got the following grades:
English: A (4.0), 3 units
Stats: A (4.0), 5 units
History: B (3.0), 3 units
Psych: C (2.0), 4 units
Calculate your weighted average grade (GPA) for this semester.
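As a check on the hand calculations in Problems 16 to 18, here is a minimal Python sketch using the standard-library `statistics` module. `statistics.multimode` (Python 3.8+) reports every value tied for most frequent, which matters for data sets with more than one mode. The weighted mean divides by the sum of the weights; the weights listed in Problem 17 add up to 85%, not 100%, so this division is what turns them into proportions.

```python
import statistics

data = [9, 1, 5, 3, 6, 8, 8, 4, 3, 2, 1, 1, 8, 9, 7]

mean = statistics.mean(data)
median = statistics.median(data)
modes = statistics.multimode(data)        # ALL values tied for the highest count
data_range = max(data) - min(data)
midrange = (max(data) + min(data)) / 2
print("Mean:", mean, "Median:", median, "Mode(s):", modes,
      "Range:", data_range, "Midrange:", midrange)

def weighted_mean(values, weights):
    # x-bar = sum(w * x) / sum(w); the weights need not add up to 100.
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Problem 17: (score, weight in percent)
grades = {
    "Midterm 1": (85, 15), "Midterm 2": (90, 15), "Homework": (95, 10),
    "Project": (100, 5), "Attendance": (75, 10), "Final": (80, 30),
}
course_grade = weighted_mean([s for s, _ in grades.values()],
                             [w for _, w in grades.values()])

# Problem 18: (grade points, units) for English, Stats, History, Psych
courses = [(4.0, 3), (4.0, 5), (3.0, 3), (2.0, 4)]
gpa = weighted_mean([g for g, _ in courses], [u for _, u in courses])
print("Course grade:", course_grade, "GPA:", round(gpa, 2))
```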

Problem 19: For the given data set, answer the following questions: {9, 1, 5, 3, 6, 8, 8, 4, 3, 2, 1, 1, 8, 9, 7}. To find class frequencies, count the number of data values in each class.
a) Find the class boundaries, class frequencies, and class midpoints:
Class Limits | LCB (lower class boundary) | UCB (upper class boundary) | Frequency | Class Midpoint
b) Create a frequency histogram for the above data set
c) Create a frequency polygon for the above data set
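The sheet does not say how many classes to use, so the sketch below assumes a class width of 2 starting at the smallest value (classes 1-2, 3-4, ..., 9-10); change `width` to try other choices. Because the data are whole numbers, each class boundary sits 0.5 below or above a class limit, and the midpoint is the average of the two limits. A row of asterisks stands in for each histogram bar.

```python
data = [9, 1, 5, 3, 6, 8, 8, 4, 3, 2, 1, 1, 8, 9, 7]

# ASSUMPTION: class width 2, first lower limit = smallest data value.
width = 2
lower = min(data)

table = []  # rows of (class limits, LCB, UCB, frequency, midpoint)
while lower <= max(data):
    upper = lower + width - 1                   # upper class limit
    freq = sum(lower <= x <= upper for x in data)
    lcb, ucb = lower - 0.5, upper + 0.5         # boundaries split the gap between classes
    midpoint = (lower + upper) / 2
    table.append((f"{lower}-{upper}", lcb, ucb, freq, midpoint))
    lower += width

print(f"{'Class':>6} {'LCB':>5} {'UCB':>5} {'Freq':>5} {'Mid':>5}")
for limits, lcb, ucb, freq, mid in table:
    print(f"{limits:>6} {lcb:>5} {ucb:>5} {freq:>5} {mid:>5}")

# Text histogram: one '*' per data value in each class.
for limits, _, _, freq, _ in table:
    print(f"{limits:>6} | {'*' * freq}")
```

A frequency polygon uses the same numbers: plot each frequency above its class midpoint and join the points.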

Problem 20: For the given data set, create a relative frequency polygon and an ogive: {9, 1, 5, 3, 6, 8, 8, 4, 3, 2, 1, 1, 8, 9, 7}. To find class frequencies, count the number of data values in each class.
a) Find the relative frequencies and cumulative relative frequencies:
Class Limits | LCB/UCB | Relative Frequency | Cumulative Relative Frequency
b) Create a relative frequency polygon for the above data set
c) Create an ogive
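Continuing with assumed width-2 classes (1-2, 3-4, ..., 9-10, not specified on the sheet), relative frequency is the class frequency divided by n = 15, and cumulative relative frequency is its running total, which must end at 1. `fractions.Fraction` keeps the arithmetic exact. The polygon plots relative frequency at each class midpoint; the ogive plots cumulative relative frequency at each upper class boundary, starting from 0 at the first lower boundary.

```python
from fractions import Fraction

data = [9, 1, 5, 3, 6, 8, 8, 4, 3, 2, 1, 1, 8, 9, 7]
n = len(data)

# ASSUMPTION: classes of width 2 starting at 1: (1, 2), (3, 4), ..., (9, 10).
classes = [(lo, lo + 1) for lo in range(1, 10, 2)]

rows = []  # (class limits, LCB, UCB, relative freq, cumulative relative freq)
cumulative = Fraction(0)
for lo, hi in classes:
    rel = Fraction(sum(lo <= x <= hi for x in data), n)   # class frequency / n
    cumulative += rel                                      # running total, ends at 1
    rows.append((f"{lo}-{hi}", lo - 0.5, hi + 0.5, rel, cumulative))

for limits, lcb, ucb, rel, cum in rows:
    print(f"{limits:>5}  {lcb}-{ucb}  rel={float(rel):.3f}  cum={float(cum):.3f}")

# Points to plot: (midpoint, relative freq) and (upper boundary, cumulative freq).
polygon_points = [((lo + hi) / 2, float(row[3])) for (lo, hi), row in zip(classes, rows)]
ogive_points = [(rows[0][1], 0.0)] + [(ucb, float(cum)) for _, _, ucb, _, cum in rows]
print(polygon_points)
print(ogive_points)
```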

Problem 21: The following are the ages of some students in a stats class; create a stem-and-leaf diagram for them using teens, twenties, thirties, and so on: 18, 29, 33, 40, 16, 22, 28, 19, 39, 20, 18, 19, 17, 27, 33, 55, 18, 49, 24, 29, 17, 26, 25, 31

Problem 22: Assume that you take a random sample of 200 people from Salinas and find that their average income is $48,000 per year with a standard deviation of $9,000; also assume that the income distribution is bell-shaped:
(a) What can you say about the number of people who make between $25,500 and $70,500 out of the sample of 200 people?
(b) What can you say about the number of people who make more than $70,500 out of the sample of 200 people?

Problem 23: An online survey of 164 undergraduates at Baylor University found that they spend the most time texting, with an average of 94.6 minutes a day. That was followed by sending emails (48.5 minutes), checking Facebook (38.6 minutes), surfing the Internet (34.4 minutes), and listening to music (26.9 minutes). (Source: Psychcentral.com) Ms. Popular Undergraduate spends about 150 minutes per day texting. Given that the standard deviation of texting time is 25 minutes per day, what is her z-score? Would she be considered an outlier?

Problem 24: Find the first quartile (25th percentile) and the third quartile (75th percentile) for this data set, and then create a box plot: {9, 1, 5, 3, 6, 8, 8, 4, 3, 2, 1, 1, 8, 9, 7, 11, 2, 20, 19, 31, 0, 5, 3, 12}

Problem 25:
a) Give one example of a situation where you would use the median as a measure of center as opposed to the mean or the mode
b) Give an example of a situation where the mode would be a more appropriate choice for the measure of center than the mean or the median
c) Give one example where the mean would be a better choice as a measure of center than the median or the mode
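A Python sketch for Problems 21 to 24. The stem is the tens digit and the leaf the ones digit. For Problem 22, $25,500 and $70,500 are each 2.5 standard deviations from $48,000, while the Empirical Rule only quotes 1, 2, and 3 standard deviations, so the sketch brackets the answer between the 95% and 99.7% figures and, separately, gives an estimate that assumes an exact normal distribution (via `math.erf`). The z-score cutoffs of 2 and 3 are common conventions, not fixed rules. Quartile conventions also differ between textbooks and software; this sketch takes the medians of the lower and upper halves, and other methods (for example, interpolation in some software) can give slightly different quartiles.

```python
import math
import statistics

# Problem 21: stem = tens digit, leaf = ones digit.
ages = [18, 29, 33, 40, 16, 22, 28, 19, 39, 20, 18, 19,
        17, 27, 33, 55, 18, 49, 24, 29, 17, 26, 25, 31]
stems = {}
for age in sorted(ages):                        # sorting first keeps each row's leaves in order
    stems.setdefault(age // 10, []).append(age % 10)
for stem in range(min(stems), max(stems) + 1):  # an empty stem would still get a row
    print(f"{stem} | {' '.join(str(leaf) for leaf in stems.get(stem, []))}")

# Problem 22: distance of each cutoff from the mean, in standard deviations.
mean_income, sd_income, n_people = 48_000, 9_000, 200
k = (70_500 - mean_income) / sd_income          # 25,500 is the same distance below the mean
# Empirical Rule: about 95% within 2 SD and 99.7% within 3 SD, so k = 2.5 lies in between.
within_low, within_high = 0.95 * n_people, 0.997 * n_people
above_low, above_high = (1 - 0.997) / 2 * n_people, (1 - 0.95) / 2 * n_people
# Sharper estimate, ASSUMING an exact normal distribution: P(|Z| < k) = erf(k / sqrt(2)).
normal_within = math.erf(k / math.sqrt(2))
print(f"k = {k}: between {within_low:.0f} and {within_high:.1f} people in the range; "
      f"between {above_low:.1f} and {above_high:.1f} people above $70,500")
print(f"Normal-model estimate in the range: {normal_within * n_people:.1f} people")

# Problem 23: z-score of 150 minutes of texting.
z = (150 - 94.6) / 25
print(f"z = {z:.3f}")  # |z| > 2 is often treated as unusual, |z| > 3 as extreme

# Problem 24: quartiles as medians of the lower and upper halves (one of several conventions).
values = sorted([9, 1, 5, 3, 6, 8, 8, 4, 3, 2, 1, 1, 8, 9, 7,
                 11, 2, 20, 19, 31, 0, 5, 3, 12])
half = len(values) // 2
q1 = statistics.median(values[:half])
q3 = statistics.median(values[-half:])          # for odd n, the middle value is left out of both halves
five_number = (values[0], q1, statistics.median(values), q3, values[-1])
print("Five-number summary (min, Q1, median, Q3, max):", five_number)
```

The five-number summary is exactly what a box plot draws: whiskers at the minimum and maximum, and a box from Q1 to Q3 with a line at the median.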

Problem 26: State the Empirical Rule.

Problem 27: A couple wants to have three babies over the next three years, only one baby per year, each either a boy or a girl, with no other possibility. Create a sample space, i.e. the collection of all simple events:
Find the probability that they will have two girls and a boy.
Find the probability that they will have at least one girl.
Find the probability that they will have no more than two boys.
Find the probability that they will have no girls.
Find the probability that they will have between one and three girls.

Problem 28: You need to answer two questions; one is a true/false question and the other is a multiple choice question with 6 choices. Create the sample space:
Find the following probabilities:
Getting both questions right:
Getting both wrong:
Getting at most two wrong:
Getting at most two right:
Getting at least one right:
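Problems 27 and 28 can be checked by listing the sample space with `itertools.product` and counting favorable outcomes. The sketch assumes a boy and a girl are equally likely at each birth and that the student guesses both answers at random. The answer key ("T" for the true/false question, "A" for the multiple choice question) is hypothetical and only marks which outcome counts as right. "Between one and three girls" is read inclusively here.

```python
from fractions import Fraction
from itertools import product

def prob(event, space):
    # Classical probability: favorable outcomes / total equally likely outcomes.
    return Fraction(sum(1 for s in space if event(s)), len(space))

# Problem 27: ASSUMES the 8 birth sequences are equally likely.
births = ["".join(p) for p in product("BG", repeat=3)]   # 'BBB', 'BBG', ..., 'GGG'
p_two_girls_one_boy = prob(lambda s: s.count("G") == 2, births)
p_at_least_one_girl = prob(lambda s: s.count("G") >= 1, births)
p_at_most_two_boys = prob(lambda s: s.count("B") <= 2, births)
p_no_girls = prob(lambda s: s.count("G") == 0, births)
p_one_to_three_girls = prob(lambda s: 1 <= s.count("G") <= 3, births)  # inclusive reading

# Problem 28: HYPOTHETICAL key, "T" correct for Q1 and "A" correct for Q2.
answers = list(product("TF", "ABCDEF"))                  # 2 * 6 = 12 equally likely pairs

def n_right(pair):
    return (pair[0] == "T") + (pair[1] == "A")

p_both_right = prob(lambda a: n_right(a) == 2, answers)
p_both_wrong = prob(lambda a: n_right(a) == 0, answers)
p_at_most_two_wrong = prob(lambda a: 2 - n_right(a) <= 2, answers)
p_at_most_two_right = prob(lambda a: n_right(a) <= 2, answers)
p_at_least_one_right = prob(lambda a: n_right(a) >= 1, answers)

print(births)
print(p_two_girls_one_boy, p_at_least_one_girl, p_at_most_two_boys,
      p_no_girls, p_one_to_three_girls)
print(p_both_right, p_both_wrong, p_at_most_two_wrong,
      p_at_most_two_right, p_at_least_one_right)
```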

Problem 29: A box contains 10 black socks and 10 white socks. If you close your eyes and pick socks, how many socks do you have to pick in order to make sure that you have a pair of the same color?
Find the probability of picking 3 black socks in a row without replacement:
Find the probability of picking 2 black socks and then a white sock in a row without replacement:

Problem 30: Quality Control: As a quality control manager in a clothing company, you randomly select 5 shirts from a collection of 2000 shirts that just came to your company from Bangladesh. You will reject all the shirts if you find at least one faulty shirt. Assume that there are 20 faulty shirts in the lot of 2000 shirts. Find the probability of accepting all the shirts in this lot.

Problem 31: A student takes a multiple-choice test. Each question has 5 different choices and there are 5 different questions. Find the probability that the student gets at least one question right by pure guessing.

Problem 32: An access code has 6 characters. The first four are digits and the last two are letters, which are case sensitive. A thief trying to break this code has a probability of success of:

Problem 33: You have the option of buying one car from 5 different types of cars, and you may pick one insurance plan from 4 different choices. What is the total number of car and insurance combinations you can have?
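Problems 29 to 33 reduce to a few lines of exact arithmetic with `fractions.Fraction`. Problem 30 is computed two equivalent ways: as a ratio of combinations (choose 5 of the 1,980 good shirts over choose 5 of all 2,000) and as a product of draw-by-draw conditional probabilities. For Problem 32 the sketch assumes 26 English letters in two cases (52 symbols), repetition allowed, and a single random guess; the sheet does not spell these out.

```python
from fractions import Fraction
from math import comb, prod

# Problem 29: with only two colors, 3 socks guarantee a matching pair (pigeonhole principle).
socks_needed = 2 + 1
# Without replacement, each draw changes what is left in the box.
p_three_black = Fraction(10, 20) * Fraction(9, 19) * Fraction(8, 18)
p_black_black_white = Fraction(10, 20) * Fraction(9, 19) * Fraction(10, 18)  # in that order

# Problem 30: accept the lot only if all 5 sampled shirts are good (1980 good of 2000).
p_accept = Fraction(comb(1980, 5), comb(2000, 5))
p_accept_seq = prod(Fraction(1980 - i, 2000 - i) for i in range(5))  # same value, draw by draw

# Problem 31: P(at least one right) = 1 - P(all five wrong), 4 wrong choices out of 5 each time.
p_at_least_one = 1 - Fraction(4, 5) ** 5

# Problem 32: ASSUMES 10 digits and 52 case-sensitive letters, repetition allowed, one guess.
n_codes = 10 ** 4 * 52 ** 2
p_break = Fraction(1, n_codes)

# Problem 33: fundamental counting rule.
n_car_insurance = 5 * 4

print(socks_needed, p_three_black, p_black_black_white, float(p_accept),
      p_at_least_one, p_break, n_car_insurance)
```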

Problem 34: A student committee consists of 13 members. They need to elect a president, a vice president, and a treasurer. In how many different ways can this be accomplished?

Problem 35: Permutation and combination: What are your chances of winning the Mega Millions Lottery? You pick 5 numbers from 1 to 56 without replacement and 1 number from 1 to 46.

Problem 36: Age discrimination: Among 13 managers, the company laid off the 3 oldest managers. Based on your calculations, do you think discrimination was involved in the process?
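Problems 34 to 36 separate ordered selections (`math.perm`) from unordered ones (`math.comb`), both available from Python 3.8. Problem 36 assumes, for the sake of the calculation, that the three managers were laid off at random; a very small probability of picking exactly the three oldest by chance is what would make discrimination a reasonable suspicion.

```python
from fractions import Fraction
from math import comb, perm

# Problem 34: the three offices are different, so order matters -> permutation.
n_officer_slates = perm(13, 3)            # 13 * 12 * 11

# Problem 35: 5 of 56 with order irrelevant, times 1 of 46 for the extra number.
n_tickets = comb(56, 5) * 46
p_jackpot = Fraction(1, n_tickets)

# Problem 36: ASSUMES the 3 layoffs were a random choice among the 13 managers;
# exactly one of the C(13, 3) equally likely groups is "the 3 oldest".
p_three_oldest = Fraction(1, comb(13, 3))

print(n_officer_slates, n_tickets, p_three_oldest, float(p_three_oldest))
```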

Problem 37: Your chance of passing the statistics class is 80% and your chance of passing another class that you are currently taking is 70%. The probability that you will pass both classes is 65%. What is the probability that you will pass at least one of the classes?
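Problem 37 is the addition rule for two events that can happen together: add the two probabilities and subtract the overlap so it is not counted twice. A minimal sketch, using `Fraction` to avoid floating-point rounding:

```python
from fractions import Fraction

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B).
p_stats = Fraction(80, 100)
p_other = Fraction(70, 100)
p_both = Fraction(65, 100)

p_at_least_one = p_stats + p_other - p_both
print(p_at_least_one, float(p_at_least_one))
```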