INTRODUCTION TO STATISTICS
|
|
- Pearl Hawkins
- 5 years ago
- Views:
Transcription
1 INTRODUCTION TO STATISTICS Pierre Dragicevic Oct 2017
2 WHAT YOU WILL LEARN Statistical theory Applied statistics This lecture
3 GOALS Learn basic intuitions and terminology Perform basic statistical inference with R Focus on high-level principles Accent on estimation rather than null hypothesis testing ("the New Statistics")
4 ORGANIZATION Part I - Elementary notions Part II - Tutorial with R Part III - Assignments
5 A DEFINITION Statistics is the study of the collection, analysis, interpretation, presentation and organization of data. Dodge, Y. (2006) The Oxford Dictionary of Statistical Terms, OUP.
6 WHAT ARE STATS? A set of tools and methods With an old tradition: Origins in demographics Anchored in mathematics & probability theory Visual representations play a role A generally strong focus on (computationally cheap) numerical calculations
7 WHAT ARE STATS? Good for: - Summarizing data for presentation - Answering empirical questions rigorously - Making predictions - Making rational, evidence-based decisions - A long accumulated experience!
8 STATS & VISUALIZATION Exploratory data analysis is sometimes compared to detective work: it is the process of gathering evidence. Confirmatory data analysis is comparable to a court trial: it is the process of evaluating evidence. Exploratory analysis and confirmatory analysis can and should proceed side by side (Tukey; 1977). Quoted from the SAS Institute
9 German bombings in London during WWII
10 German bombings in London during WWII
11 German bombings in London during WWII
12 German bombings in London during WWII
13 STATISTICAL TOOLS DESCRIPTIVE STATISTICS INFERENTIAL STATISTICS
14 STATISTICAL TOOLS DESCRIPTIVE STATISTICS INFERENTIAL STATISTICS
15 AN EXAMPLE Selling encyclopedias Robert Steve Paul Roger Geoffrey Dan
16
17
18 Seller 1 Seller 2 Seller 3 Seller 4 Seller 5 Seller 6
19 Seller 1 Seller 2 Seller 3 Seller 4 Seller 5 Seller 6
20 CENTRAL TENDENCY From Kalid Azad
21 CENTRAL TENDENCY Mode Median Sales in euro
22 CENTRAL TENDENCY From Shreya Sethi
23 CENTRAL TENDENCY What is the best measure of central tendency? Income
24 DISPERSION Standard Deviation Image from Wikipedia
25 DEPENDENCE Correlation Source unknown
26 DEPENDENCE Correlation Image from Wikipedia
27 DEPENDENCE Correlation r = Seller 2 Seller 1
28 Seller 1 Seller 2 Seller 3 Seller 4 Seller 5 Seller 6
29 Seller 1 Seller 2 Seller 3 Seller 4 Seller 5 Seller 6
30
31
32 How much can we trust this chart? September 2014
33 LET US TRAVEL TO THE FUTURE
34 September 2014
35 October 2014
36 November 2014
37 December 2014
38 September 2014
39 October 2014
40 November 2014
41 December 2014
42 BACK TO THE PRESENT
43 September 2014
44 How much can we trust this chart? September 2014
45 STATISTICAL TOOLS DESCRIPTIVE STATISTICS INFERENTIAL STATISTICS
46 STATISTICAL INFERENCE
47 STATISTICAL INFERENCE Terminology: Sample vs. population Mean, median, standard deviation, correlation, etc: A sample statistic (e.g., M ) A population parameter (e.g., μ )
48 STATISTICAL INFERENCE Unit of statistical analysis = the thing that I m sampling from a larger population"
49 September 2014 What is the unit of statistical analysis?
50 SAMPLING ERROR From Lisa Sullivan
51 SAMPLING DISTRIBUTION "The sampling distribution of a statistic is the distribution of that statistic, considered as a random variable, when derived from a random sample of size n. [ ] It may be considered as the distribution of the statistic for all possible samples from the same population of a given size"
52 SAMPLING DISTRIBUTION Demo
53 SAMPLING DISTRIBUTION
54 SAMPLING DISTRIBUTION But we don t know the population distribution!
55 SAMPLING DISTRIBUTION Resampling techniques Bootstrapping
56 SAMPLING ERROR Bootstrapping From Germain Salvato-Vallverdu
57 SAMPLING ERROR Bootstrapping From Germain Salvato-Vallverdu
58 SAMPLING ERROR Bootstrapping From Germain Salvato-Vallverdu
59 SAMPLING ERROR Bootstrapping resample with replacement From Germain Salvato-Vallverdu
60 SAMPLING ERROR Bootstrapping From Germain Salvato-Vallverdu
61 SAMPLING ERROR Bootstrapping Bootstrap distribution From Germain Salvato-Vallverdu
62 SAMPLING ERROR Bootstrapping Bootstrap distribution From Germain Salvato-Vallverdu
63 SAMPLING DISTRIBUTION How to summarize a sampling distribution?
64 SAMPLING DISTRIBUTION How to summarize a sampling distribution? With an error bar
65 SAMPLING DISTRIBUTION
66 SAMPLING DISTRIBUTION M Standard error 95% confidence interval
67 SAMPLING DISTRIBUTION How did people do before computers?
68 NORMAL DISTRIBUTION Sir Francis Galton Bean Machine or Galton Board:
69 NORMAL DISTRIBUTION Central Limit Theorem Given certain conditions, the arithmetic mean of a sufficiently large number of iterates of independent random variables, each with a well-defined expected value and well-defined variance, will be approximately normally distributed
70 NORMAL DISTRIBUTION Exact t-based confidence intervals t ~ 1.96 for large samples
71 CONFIDENCE INTERVALS
72 CONFIDENCE INTERVALS M True sampling distribution
73 CONFIDENCE INTERVALS M 95% confidence interval
74 tinyurl.com/danceptrial2 μ Different random samples
75 CONFIDENCE INTERVALS Several interpretations «a range of plausible values for μ. Values outside the CI are relatively implausible.» (Cumming and Finch, 2005) Examples of presentation formats: 2.2m, 95% CI [1.6m, 2.8m] 2.2m +/- 0.6m from 1.6m to 2.8m
76 CONFIDENCE INTERVALS «a range of plausible values for μ. Values outside the CI are relatively implausible.» (Cumming and Finch, 2005) Drug A Drug B blood pressure
77 CONFIDENCE INTERVALS «a range of plausible values for μ. Values outside the CI are relatively implausible.» (Cumming and Finch, 2005) Drug A Drug B blood pressure
78 CONFIDENCE INTERVALS «a range of plausible values for μ. Values outside the CI are relatively implausible.» (Cumming and Finch, 2005) A - B A - C A - D diff. in blood press.
79 CONFIDENCE INTERVALS values close to our M are the best bet for μ, and values closer to the limits of our CI are successively less good bets. (Cumming, 2013) Plausibility Confidence interval
80 BACK TO OUR EXAMPLE Selling encyclopedias
81
82