INTRODUCTION TO STATISTICS
|
|
- Aldous Shelton
- 5 years ago
- Views:
Transcription
1 INTRODUCTION TO STATISTICS Slides by Pierre Dragicevic
2 WHAT YOU WILL LEARN Statistical theory Applied statistics This lecture
3 GOALS Learn basic intuitions and terminology Perform basic statistical inference with R Focus on high-level aspects Accent on estimation rather than hypothesis testing ("the New Statistics")
4 ORGANIZATION Part I - Elementary notions Part II - Tutorial with R Part III - Assignments
5 A DEFINITION Statistics is the study of the collection, analysis, interpretation, presentation and organization of data. Dodge, Y. (2006) The Oxford Dictionary of Statistical Terms, OUP.
6 ORIGINS 1750s German Statistik analysis of data about the state Quickly adopted in England (previously called political arithmetics ) Achenwall, Gottfried: Abriß der neuesten Staatswissenschaft der vornehmsten Europäischen Reiche und Republicken. Göttingen, 1749
7 ORIGINS John Graunt, 1662 Observations on the bills of mortality
8
9
10 ORIGINS John Graunt, 1662 Observations on the bills of mortality First life tables Dispelled several myths about the plague First analysis of sex ratio First realistic estimate of the population in London
11 ORIGINS Prompted collection of more data Parallel developments in probability theory Statistics then developed into a more rigorous discipline and was applied to: Business & industry Medicine Science...
12 STATS & VISUALIZATION Statistical Charts William Playfair
13 STATS & VISUALIZATION Exploratory Data Analysis Tukey, 1977
14
15
16 Statistical Graphics AT&T Bell Labs Video, 1985
17
18
19
20 German bombings in London during WWII
21 German bombings in London during WWII
22 German bombings in London during WWII
23 STATS & VISUALIZATION Confirmatory data analysis For answering questions rigorously Example: is this new drug effective? Strong focus on automatic procedures, computation and objectivity Looking at data can impair objectivity: Cherry picking, snooping, fishing, data mining
24 German bombings in London during WWII
25 Exploratory analysis and confirmatory analysis can and should proceed side by side (Tukey; 1977). Quoted from the SAS Institute STATS & VISUALIZATION Exploratory data analysis is sometimes compared to detective work: it is the process of gathering evidence. Confirmatory data analysis is comparable to a court trial: it is the process of evaluating evidence.
26 WHAT ARE STATS? A set of tools and methods With an old tradition: Origins in demographics Anchored in mathematics & probability theory Visual representations play a role A generally strong focus on (computationally cheap) numerical calculations
27 WHAT ARE STATS? Good for: - Summarizing data for presentation - Answering questions rigorously - Making predictions - Making rational, evidence-based decisions - A long accumulated experience!
28 STATISTICAL TOOLS
29 STATISTICAL TOOLS DESCRIPTIVE STATISTICS INFERENTIAL STATISTICS
30 STATISTICAL TOOLS DESCRIPTIVE STATISTICS
31 AN EXAMPLE Selling encyclopedias Robert Steve Paul Roger Geoffrey Dan
32
33
34 Seller 1 Seller 2 Seller 3 Seller 4 Seller 5 Seller 6
35 Seller 1 Seller 2 Seller 3 Seller 4 Seller 5 Seller 6
36 CENTRAL TENDENCY From Kalid Azad
37 CENTRAL TENDENCY Mode Median day Sales in euro
38 CENTRAL TENDENCY When are the mean and the median equal? When do they differ?
39 CENTRAL TENDENCY negative skew symmetric positive skew From Shreya Sethi
40 CENTRAL TENDENCY
41 CENTRAL TENDENCY
42 CENTRAL TENDENCY From Shreya Sethi
43 CENTRAL TENDENCY What is the best measure of central tendency? Income
44 DISPERSION Standard Deviation Image from Wikipedia
45 Source unknown DEPENDENCE Correlation
46 DEPENDENCE Correlation Image from Wikipedia
47 Seller 2 DEPENDENCE Correlation r = Seller 1
48 Seller 1 Seller 2 Seller 3 Seller 4 Seller 5 Seller 6
49 Seller 1 Seller 2 Seller 3 Seller 4 Seller 5 Seller 6
50
51
52 September 2016 How much can we trust this chart?
53 LET US TRAVEL TO THE FUTURE
54 September 2016
55 October 2016
56 November 2016
57 December 2016
58 September 2016
59 October 2016
60 November 2016
61 December 2016
62 BACK TO THE PRESENT
63 September 2016
64 September 2016 How much can we trust this chart?
65 STATISTICAL TOOLS INFERENTIAL STATISTICS
66 STATISTICAL INFERENCE
67 STATISTICAL INFERENCE
68 STATISTICAL INFERENCE Terminology: Sample vs. population Mean, median, standard deviation, correlation, etc: A sample statistic A population parameter
69 STATISTICAL INFERENCE Unit of statistical analysis = the thing that I m sampling from a larger population"
70 STATISTICAL INFERENCE Unit of statistical analysis
71 STATISTICAL INFERENCE Unit of statistical analysis
72 STATISTICAL INFERENCE Unit of statistical analysis
73 STATISTICAL INFERENCE Unit of statistical analysis
74 STATISTICAL INFERENCE Unit of statistical analysis
75 SAMPLING DISTRIBUTION
76 SAMPLING DISTRIBUTION "The sampling distribution of a statistic is the distribution of that statistic, considered as a random variable, when derived from a random sample of size n. [ ] It may be considered as the distribution of the statistic for all possible samples from the same population of a given size"
77 SAMPLING DISTRIBUTION From Lisa Sullivan
78 SAMPLING DISTRIBUTION Demo
79 SAMPLING DISTRIBUTION
80 SAMPLING DISTRIBUTION Standard error 95% confidence interval
81 SAMPLING DISTRIBUTION
82 SAMPLING DISTRIBUTION Resampling techniques Bootstrapping
83 From Germain Salvato-Vallverdu SAMPLING ERROR Bootstrapping
84 From Germain Salvato-Vallverdu SAMPLING ERROR Bootstrapping
85 From Germain Salvato-Vallverdu SAMPLING ERROR Bootstrapping
86 From Germain Salvato-Vallverdu SAMPLING ERROR Bootstrapping resample with replacement
87 From Germain Salvato-Vallverdu SAMPLING ERROR Bootstrapping
88 From Germain Salvato-Vallverdu SAMPLING ERROR Bootstrapping
89 SAMPLING DISTRIBUTION Bootstrapping video
90 SAMPLING DISTRIBUTION How did people this do before computers?
91 MORE HISTORY Abraham De Moivre
92 MORE HISTORY Abraham De Moivre
93 MORE HISTORY Abraham De Moivre
94 MORE HISTORY Abraham De Moivre
95 MORE HISTORY Abraham De Moivre
96 MORE HISTORY Abraham De Moivre
97 MORE HISTORY
98 NORMAL DISTRIBUTION
99 NORMAL DISTRIBUTION Sir Francis Galton Bean Machine or Galton Board:
100 NORMAL DISTRIBUTION Central Limit Theorem Given certain conditions, the arithmetic mean of a sufficiently large number of iterates of independent random variables, each with a well-defined expected value and well-defined variance, will be approximately normally distributed
101 NORMAL DISTRIBUTION Exact Confidence Intervals t ~ 1.96 for large samples
102 CONFIDENCE INTERVALS
103 CONFIDENCE INTERVALS M margin of error = length of blue line
104 CONFIDENCE INTERVALS M 95% confidence interval
105 tinyurl.com/danceptrial2 μ Different random samples
106 CONFIDENCE INTERVALS Several interpretations «a range of plausible values for μ. Values outside the CI are relatively implausible.» (Cumming and Finch, 2005) Examples of presentation formats: 2.2m, 95% CI [1.6m, 2.8m] 2.2m +/- 0.6m from 1.6m to 2.8m
107 CONFIDENCE INTERVALS «a range of plausible values for μ. Values outside the CI are relatively implausible.» (Cumming and Finch, 2005) Pill A Pill B weight loss
108 CONFIDENCE INTERVALS «a range of plausible values for μ. Values outside the CI are relatively implausible.» (Cumming and Finch, 2005) Pill A Pill B weight loss
109 CONFIDENCE INTERVALS «a range of plausible values for μ. Values outside the CI are relatively implausible.» (Cumming and Finch, 2005) A - B A - C A - D diff. in weight loss
110 CONFIDENCE INTERVALS values close to our M are the best bet for μ, and values closer to the limits of our CI are successively less good bets. (Cumming, 2013)
111 BE VAGUE IN YOUR DESCRIPTION A - B A - C A - D diff. in weight loss the figure provides good evidence that B outperforms A, whereas C and A seem very similar, and results are largely inconclusive concerning the difference between D and A.
112 PUBLISHED EXAMPLE
113 PUBLISHED EXAMPLE
114 BACK TO OUR EXAMPLE Selling encyclopedias
115
116
117
118
119