Introduction to Statistics I

1 Introduction to Statistics I Keio University, Faculty of Economics Course Description and Introduction Simon Clinet (Keio University) Intro to Stats September 27, / 26

2 General information
Instructor: Simon Clinet (Faculty of Economics), clinet@keio.jp
Office: South Building (7th Floor), Mita Campus
Course schedule: Thursday, 9:00-10:30 (Room D201), 14 sessions

3 Course description and definition of statistics
The aim of this introductory course is to give an overview of basic statistical tools.
Definition (Statistics): Statistics consists of a body of methods for collecting and analyzing data.

4 Reference books
The course does not closely follow any book on statistics, although I sometimes borrow ideas, examples and exercises from:
Cheng-Few Lee, John C. Lee, and Alice C. Lee (2013), Statistics for Business and Financial Economics, parts I-IV, 3rd edition, Springer.
It is available online on the Keio Media Center website.

5 Webpage of the class
You can find the webpage of this course at:
Slides, other files related to the course, exercises, and changes in the schedule will be uploaded regularly.

6 Teaching Assistant
Teaching assistant: TBA.
There will be Q&A sessions for students who want or need help with this course.

7 Evaluation
One final exam (late January).
One optional homework, around November.
Final evaluation if you do the homework: 20% homework, 80% final.
Final evaluation if you don't do the homework: final only.

8 Calculator
Each student will need a calculator capable of performing basic operations: $+$, $-$, $\times$, $\div$, $\sqrt{\ }$, and $x^2$.
It is not necessary to have a graphing calculator.

9 Introduction
What is statistics?
Definition (Statistics): Statistics consists of a body of methods for collecting and analyzing data.
...and what is data?
Definition (Dataset): A dataset is a collection of measurements related to a set of individuals (persons, objects, ...).

10 Datasets - Examples
A few examples of datasets:
Medical measurements (blood pressure, ...) for a given group of people.
Set of physical characteristics (miles per gallon, weight, acceleration, ...) for a group of cars.
List of unemployment rates by country.
List of asset prices on a stock market.

11 Statistical problems - Examples
...and possible related statistical problems:
What is the appropriate drug dose for a treatment?
What are the characteristics of the cars with the highest performance?
What will be the unemployment rate in Japan next year?
Are stock markets riskier than they were five years ago?

12 Population and Sample
Population and sample are two fundamental notions in statistics.
Definition (Population): The population is the collection of all individuals or items under consideration in a statistical study.
Definition (Sample): The sample is the part of the population from which information is collected.
Definition (Sample size): The sample size is the number of individuals in the sample.
In most cases, a sample is just a subset of the population.

13 Sample, Population - Example
Example (Sample vs Population): To determine who will win the next national election, we often conduct an opinion poll. The population in this study is the set of all people who can vote in the country. This set is very large, so the poll is usually conducted on a sample of, say, 1,000 people.

14 How to sample from the population?
The population is always the primary target of a statistical investigation.
Therefore, we need the sample to be representative of the whole population.
As much as possible, the sample should be picked from the population at random. A sample selected in a way that systematically favors some individuals over others is said to be biased.
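
As a minimal sketch of what "picked at random" can mean in practice, the Python snippet below draws a simple random sample from a made-up population of voter IDs. The names `population` and `draw_random_sample`, and the sizes used, are illustrative assumptions rather than part of the course material.

```python
import random

def draw_random_sample(population, sample_size, seed=None):
    """Return `sample_size` distinct individuals picked uniformly at random."""
    rng = random.Random(seed)  # fixing the seed makes the draw reproducible
    return rng.sample(population, sample_size)

# Hypothetical population: voter IDs 0..99999 (made-up numbers).
population = list(range(100_000))
sample = draw_random_sample(population, sample_size=1_000, seed=0)

print(len(sample))       # the sample size
print(len(set(sample)))  # same number: no individual is picked twice
```

Every individual has the same chance of being selected here, which is what separates this draw from, say, only polling the first 1,000 IDs.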

15 Sampling - Example
We conduct an opinion poll in Japan. What may be the sources of bias in the following methods?
Conduct a door-to-door survey in Tokyo.
Call random cellphone numbers.

16 Organization of the data - Variables
Definition (Variable): A variable is a characteristic that varies from one member of the population (person, object, ...) to another.
Example: Examples of variables for humans are age, height, weight, number of siblings, marital status, and so on.
Exercise: Give a few examples of variables for countries.

17 Organization of the data - types of variables
Definition (Quantitative/Qualitative variable):
A variable is quantitative if it is represented by numerical values. It is typically associated with a unit of measurement (meter, second, kilogram, ...).
A variable is qualitative (or categorical) if it is represented by non-numerical values (called categories, or labels).
Example: For humans, age, height, weight, and number of siblings are quantitative. Marital status is qualitative.
Exercise: Are body temperature, eye color, and yearly wage quantitative or qualitative variables?
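
The distinction can be sketched in Python as a rough rule of thumb: a variable whose recorded values are all numbers is treated as quantitative, and anything else as qualitative. The function `variable_type` and the sample values are hypothetical. The rule is also only an approximation, since numeric labels such as postal codes would be wrongly classified as quantitative.

```python
from numbers import Number

def variable_type(values):
    """Classify a variable from its recorded values (rough rule of thumb)."""
    is_numeric = all(
        isinstance(v, Number) and not isinstance(v, bool) for v in values
    )
    return "quantitative" if is_numeric else "qualitative"

# Hypothetical observations for three people (made-up values).
age = [23, 41, 35]                                 # years
height_m = [1.70, 1.82, 1.65]                      # meters
marital_status = ["single", "married", "married"]  # labels

for name, values in [("age", age), ("height", height_m),
                     ("marital status", marital_status)]:
    print(name, "->", variable_type(values))
```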

18 Organization of the data - basic representation
Observing the values of the variables of individuals yields data. Each individual piece of data is thus called an observation. Usually, we represent a dataset as follows:
Observation #   Variable1   Variable2
1               $x_{11}$    $x_{12}$
2               $x_{21}$    $x_{22}$
3               $x_{31}$    $x_{32}$
4               $x_{41}$    $x_{42}$
5               $x_{51}$    $x_{52}$
Each row corresponds to one individual from the sample, each column represents one variable, and the $x_{ij}$ are the values of the different variables. For example, $x_{11}$ is the value of Variable1 for the first individual of the list.
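
One possible way to hold such a table in Python is a list of rows, with one inner list per observation. The sketch below is only illustrative: the values are made up, and the helpers `value` and `column` are hypothetical names chosen to mirror the row/column indexing of the table.

```python
# Hypothetical dataset: 3 observations, 2 variables (all values made up).
variables = ["Variable1", "Variable2"]
rows = [
    [4.2, "a"],  # observation 1
    [3.7, "b"],  # observation 2
    [5.1, "a"],  # observation 3
]

def value(rows, i, j):
    """Return x_ij: the value of variable j for observation i (1-indexed)."""
    return rows[i - 1][j - 1]

def column(rows, j):
    """Return every value of variable j, one per observation."""
    return [row[j - 1] for row in rows]

print(value(rows, 1, 1))  # x_11: Variable1 for the first individual
print(column(rows, 2))    # all values of Variable2
print(len(rows))          # sample size
```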

19 Example: life expectancy vs gender
Columns: Person #, Lifespan (years), Sex (M/F).
Sex column: M F M M F F M M M F M F M M F M M M
One quantitative variable (lifespan), one qualitative variable (sex).
What is the value of lifespan for observation 10? What is the sample size?

20 Example: Stock price

21 Example: Stock price (2)
Columns: trade #, price ($).

22 Univariate, multivariate dataset
Definition (Univariate, bivariate, multivariate): We say that a dataset is
univariate if it contains only one variable;
multivariate if it contains two or more variables.
We also say bivariate if it contains exactly two variables.
Example: The life expectancy dataset is bivariate (and multivariate). The stock price dataset is univariate.
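
The definition amounts to counting variables, which the short Python sketch below makes explicit. The function name `dataset_kinds` is a hypothetical choice. It returns every label that applies, so a two-variable dataset is reported as both bivariate and multivariate.

```python
def dataset_kinds(n_variables):
    """Return the labels that apply to a dataset with `n_variables` variables."""
    if n_variables < 1:
        raise ValueError("a dataset needs at least one variable")
    kinds = []
    if n_variables == 1:
        kinds.append("univariate")
    if n_variables == 2:
        kinds.append("bivariate")
    if n_variables >= 2:
        kinds.append("multivariate")
    return kinds

print(dataset_kinds(1))  # one variable, e.g. price only
print(dataset_kinds(2))  # two variables, e.g. lifespan and sex
```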

23 Exercise
We consider the following car dataset:
Columns: car #, miles per gallon, cylinders, horsepower, weight (lbs.), acceleration (time in sec. to accelerate from 0 to 60 mph), name.
Names: chevrolet chevelle malibu; buick skylark; plymouth satellite; amc rebel sst; ford torino; ford galaxie 500.
Exercise (car dataset):
1 What is the sample size? How many variables are reported in the above table? Is the dataset univariate? Bivariate? Multivariate?
2 Are there qualitative variables? If yes, which ones?
3 Does miles per gallon vary across the sampled cars? Same question for cylinders.

24 Conclusion/summary
In order to derive statistical information about a population, we gather data from a sample (subset) of the population.
A dataset is the collection of all the observations of a given sample.
Each observation contains the values of a set of variables related to one member of the sample.
A variable is anything that may vary from one individual to another.
Quantitative variables are measured with numbers, whereas qualitative variables are reported with labels (non-numerical values).
A raw dataset is often presented as a table, where each row corresponds to an observation and each column represents a variable.

25 Maths for introductory statistics: what you need to know
Basic operations on numbers: $+$, $-$, $\times$, $\div$, $\sqrt{\ }$.
Exponentiation: $x^2$, $x^3$, ...
Absolute value: $|x| = x$ if $x \geq 0$, and $|x| = -x$ if $x < 0$.
Functions: $x \mapsto f(x)$, in particular linear functions $f(x) = ax + b$.
Graph of a function.
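
As an illustration, the absolute value and a linear function can be written out in a few lines of Python. This is only a sketch: the helper names `absolute_value` and `linear`, and the coefficients a = 2 and b = 1, are hypothetical choices.

```python
def absolute_value(x):
    """|x| = x if x >= 0, and |x| = -x if x < 0."""
    return x if x >= 0 else -x

def linear(a, b):
    """Return the linear function f(x) = a*x + b."""
    def f(x):
        return a * x + b
    return f

f = linear(a=2, b=1)  # hypothetical coefficients

print(absolute_value(-3))  # distance of -3 from zero
print(f(0))                # value at x = 0 is the intercept b
print(f(3))                # 2*3 + 1
```

Python's built-in `abs` computes the same absolute value. The explicit version is shown only to mirror the two-case definition.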

26 Maths for introductory statistics: what you need to know (2)
We often use the symbol $\sum$ (sum).
Definition ($\sum$): For $n$ numbers $x_1, \dots, x_n$, we write $\sum_{i=1}^{n} x_i$ for the total sum $x_1 + \dots + x_n$ of those numbers.
Exercise: We consider four numbers ($n = 4$) of the form $x_1 = 1$, $x_2 = 0$, $x_3 = 2$, $x_4 = 1$.
Calculate $\sum_{i=1}^{4} x_i$.
Calculate $\sum_{i=1}^{2} x_i$.
Calculate $\sum_{i=1}^{4} x_i^2$.
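
The summation notation maps directly onto a loop over indices. The Python sketch below uses a hypothetical helper `sigma` and a made-up list of five numbers, deliberately different from the numbers in the exercise, to show how the lower index, the upper index, and the summed term each play a role.

```python
def sigma(xs, start, stop, term=lambda x: x):
    """Sum term(x_i) for i = start, ..., stop (1-indexed, inclusive)."""
    return sum(term(xs[i - 1]) for i in range(start, stop + 1))

xs = [3, -1, 4, 2, 5]  # hypothetical x_1, ..., x_5 (not the exercise's numbers)

total = sigma(xs, 1, 5)                      # x_1 + ... + x_5
first_three = sigma(xs, 1, 3)                # x_1 + x_2 + x_3
squares = sigma(xs, 1, 5, lambda x: x ** 2)  # x_1^2 + ... + x_5^2

print(total, first_three, squares)
```

Changing only the upper index changes how many terms are added, while changing the term (here, squaring) changes what is added for each index.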