statistical computing in R

Size: px
Start display at page:

Download "statistical computing in R"

Transcription

1 statistical computing in R 1 R in Sage the language and environment R Monte Carlo integration making plots with R in Sage 2 Statistical Computing with R histograms of data fitting linear models 3 Data Sets exploring available data sets looking at a downloaded data set 4 Big Data R and Hadoop MCS 507 Lecture 37 Mathematical, Statistical and Scientific Software Jan Verschelde, 20 November 2013 Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

2 statistical computing with R 1 R in Sage the language and environment R Monte Carlo integration making plots with R in Sage 2 Statistical Computing with R histograms of data fitting linear models 3 Data Sets exploring available data sets looking at a downloaded data set 4 Big Data R and Hadoop Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

3 R R is a language and environment for statistical computing and graphics, released under the GNU General Public License. R is an implementation of the S language developed at Bell Labs by Rick Becker, John Chambers, and Allan Wilks. We run R explicitly in Sage via 1 the class R, do help(r); or 2 opening a Terminal Session with R, type r_console(); in a Sage terminal. rpy2 is a Python interface to R, at sourceforge, developed by Laurent Gautier. Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

4 no multiprecision R is NOT a computer algebra package: > choose(5,2) [1] 10 > choose(52,5) [1] > choose(100,30) [1] e+25 Instead of displaying an exact integer of 25 decimal places, the outcome of choose switched to a float. Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

5 statistical computing with R 1 R in Sage the language and environment R Monte Carlo integration making plots with R in Sage 2 Statistical Computing with R histograms of data fitting linear models 3 Data Sets exploring available data sets looking at a downloaded data set 4 Big Data R and Hadoop Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

6 estimating π We estimate x 2 dx = π 4 > u <- runif( ,min=0,max=1) > 4*mean(sqrt(1-u^2)) [1] To check, we can use integrate: using 1,000,000 samples: > integrand <- function(x) { sqrt(1-x^2) } > integrate(integrand,lower=0,upper=1) with absolute error < > pi/4 [1] Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

7 Monte Carlo integration We estimate Applied to b a π/2 0 f(x)dx by (b a) 1 n 4 sin(2x)e x2 dx: x [a,b] f(x). > u <- runif( ,min=0,max=pi/2) > pi/2*mean(4*sin(2*u)*exp(-u^2)) [1] Checking: > integrand <- function(x) { 4*sin(2*x)*exp(-x^2) } > integrate(integrand,lower=0,upper=pi/2) with absolute error < 2.4e-14 Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

8 statistical computing with R 1 R in Sage the language and environment R Monte Carlo integration making plots with R in Sage 2 Statistical Computing with R histograms of data fitting linear models 3 Data Sets exploring available data sets looking at a downloaded data set 4 Big Data R and Hadoop Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

9 r.plot in Sage After generating 10 points with coordinates x, y [0, 1], uniformly distributed, we make a plot: sage: filename = tmp_filename() +.png sage: r.png(filename= "%s" %filename) NULL sage: x = r("runif(10)") sage: y = r("runif(10)") sage: r.plot(x,y) null device 1 sage: filename /Users/jan/.sage//temp/getafix.local/333/\ /tmp_0.png Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

10 the plot in tmp_0.png Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

11 plotting to a specific file Plotting 10 random points: sage: filename = /Users/jan/Pictures/points.png sage: r.png(filename= "%s" % filename) NULL sage: x = r("runif(10)") sage: y = r("runif(10)") sage: r.plot(x,y) null device 1 sage: filename /Users/jan/Pictures/points.png We can see the picture as specified by filename. Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

12 a density function Plotting the density function of the normal distribution: sage: filename = tmp_filename() +.png sage: r.png(filename= "%s" %filename) NULL sage: r.plot( dnorm,-3,+3) null device 1 sage: filename /Users/jan/.sage//temp/getafix.local/650/\ /tmp_0.png Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

13 the plot in tmp_0.png Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

14 a distribution function With pnorm we get the distribution function of the normal distribution, to plot it with Sage: sage: filename = tmp_filename() +.png sage: r.png(filename= "%s" %filename) NULL sage: r.plot( pnorm,-3,+3) null device 1 sage: filename /Users/jan/.sage//temp/getafix.local/650/\ /tmp_1.png Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

15 the plot in tmp_1.png Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

16 using rpy2 in python Plotting 10 random points: $ python Python (v2.7.3:70274d53c1dd, Apr , 20:52:43) [GCC (Apple Inc. build 5666) (dot 3)] on darwin Type "help", "copyright", "credits" or "license" for more i >>> import rpy2.robjects as robjects >>> r = robjects.r >>> r.x11() rpy2.rinterface.null >>> r.plot(r.runif(10),r.runif(10)) rpy2.rinterface.null After r.x11() a new window pops up and the figure gets show in this window by the r.plot. Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

17 statistical computing with R 1 R in Sage the language and environment R Monte Carlo integration making plots with R in Sage 2 Statistical Computing with R histograms of data fitting linear models 3 Data Sets exploring available data sets looking at a downloaded data set 4 Big Data R and Hadoop Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

18 sampled data We take 100 samples of the normal distribution. > R = rnorm(100) > R [1] [7] > sort(r) [1] [7] A summary of sorted data is in a stem and leaf plot. Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

19 stem and leaf plot > stem(r) The decimal point is at the Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

20 histogram > h = hist(r) > h $breaks [1] [7] $counts [1] $intensities [1] [7] $density [1] [7] $mids [1] [7] Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

21 in a Sage session sage: r.quartz() # for OS X NULL sage: data = r.call( rnorm,100) sage: type(data) <class sage.interfaces.r.relement > sage: r.call( hist,data) $breaks [1] \ $counts [1] Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

22 the histogram plot Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

23 making a boxplot sage: r.quartz() NULL sage: r.call( boxplot,data) $stats [,1] [1,] [2,] [3,] [4,] [5,] Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

24 the boxplot Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

25 statistical computing with R 1 R in Sage the language and environment R Monte Carlo integration making plots with R in Sage 2 Statistical Computing with R histograms of data fitting linear models 3 Data Sets exploring available data sets looking at a downloaded data set 4 Big Data R and Hadoop Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

26 calling lm For x we generate 50 points in [0, 10] and y equals x plus some random noise: > x <- 10*runif(50) > y <- x + rnorm(50) > gx <- lm(y ~ x) lm is used to fit linear models, the method applied is QR decomposition. > gx$coefficients (Intercept) x Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

27 plotting the data sage: filename = tmp_filename() +.png sage: r.png(filename= "%s" %filename) NULL sage: x = r("x <- 10*runif(50)") sage: y = r("y <- x + rnorm(50)") sage: r.plot(x,y) null device 1 sage: filename /Users/jan/.sage//temp/getafix.local/1312/\ /tmp_0.png Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

28 the data points Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

29 the output of lm > summary(gx) Call: lm(formula = y ~ x) Residuals: Min 1Q Median 3Q Max Coefficients: Estimate Std. Error t value Pr(> t ) (Intercept) x <2e-16 *** --- Signif. codes: 0 *** ** 0.01 * Residual standard error: on 48 degrees of freedom Multiple R-squared: , Adjusted R-squared: F-statistic: on 1 and 48 DF, p-value: < 2.2e-16 Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

30 statistical computing with R 1 R in Sage the language and environment R Monte Carlo integration making plots with R in Sage 2 Statistical Computing with R histograms of data fitting linear models 3 Data Sets exploring available data sets looking at a downloaded data set 4 Big Data R and Hadoop Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

31 the package datasets Typing data() at the R prompt shows a list of data sets in the package datasets. The last one in the list is women, which contains "Average Heights and Weights for American Women" The output of help(women) shows that the data set appears to have been taken from the American Society of Actuaries for some unknown year earlier than Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

32 printing the entire data set > print(women) height weight Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

33 a summary of the data > summary(women) height weight Min. :58.0 Min. : st Qu.:61.5 1st Qu.:124.5 Median :65.0 Median :135.0 Mean :65.0 Mean : rd Qu.:68.5 3rd Qu.:148.0 Max. :72.0 Max. :164.0 The correlation matrix: > cor(women) height weight height weight To make a scatterplot: > pairs Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

34 a scatterplot Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

35 statistical computing with R 1 R in Sage the language and environment R Monte Carlo integration making plots with R in Sage 2 Statistical Computing with R histograms of data fitting linear models 3 Data Sets exploring available data sets looking at a downloaded data set 4 Big Data R and Hadoop Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

36 data about tornadoes NOAA = National Oceanic and Atmospheric Administration NOAA s National Weather Service provides free data. Downloaded from the file 2011_torn.csv is a comma separated file with data about tornadoes in The data file comes without a header in its first row. Information about the data in the columns is provided in a separate.pdf file. The file 2011_torn.csv with an extra header row is renamed into tornadoes.csv. Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

37 reading data into R > torn <- read.csv("tornadoes.csv") > attributes(torn) $names [1] "number" "year" "month" [4] "day" "date" "time" [7] "timezone" "state" "statefips" [10] "statenum" "scale" "injuries" [13] "fatalities" "loss" "croploss" [16] "startinglatitude" "startinglongitude" "endinglatitude" [19] "endinglongitude" "length" "width" [22] "statesaffected" "statenum.1" "tornsegment" [25] "county1fips" "county2fips" "county3fips" [28] "county4fips" These names are the headers for the 28 columns. There are 1778 tornadoes recorded on file. Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

38 selecting data > torn[,8] shows all states in column 8 > torn[8,] shows row 8. > torn[8,8] shows the state: row 8 and column 8. To select specific rows and columns: > torn[c(5,20),c(3,8,11)] month state scale 5 1 MS HI 0 > hist(torn[,11]) Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

39 a histogram of the scales of tornadoes Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

40 correlating scale, injuries, fatalities, and loss > cor(torn[,c(11,12,13,14)]) scale injuries fatalities loss scale injuries fatalities loss Plotting starting latitude and longitudes: > plot(torn[,c(16,17)]) Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

41 starting latitudes and longitudes Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

42 statistical computing with R 1 R in Sage the language and environment R Monte Carlo integration making plots with R in Sage 2 Statistical Computing with R histograms of data fitting linear models 3 Data Sets exploring available data sets looking at a downloaded data set 4 Big Data R and Hadoop Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

43 Large Data Files Process big data with parallel and distributed computing: MySQL scales well. overview of Hadoop: Hadoop has become the de facto standard for processing big data. Hadoop is used by Facebook to store photos, by LinkedIn to generate recommendations, and by Amazon to generate search indices. Hadoop works by connecting many different computers, hiding the complexity from the user: work with one giant computer. Hadoop uses a model called Map/Reduce. RHadoop by Antonio Piccolboni is a project for R and Hadoop, hosted at Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

44 the Map/Reduce model Goal: help with writing efficient parallel programs. Common data processing tasks: filtering, merging, aggregating data and many (not all) machine learning algorithms fit into Map/Reduce. Two steps: Map: Tasks read in input data as a set of records, process the records, and send groups of similar records to reducers. The mapper extracts a key from each input record. Hadoop will then route all records with the same key to the same reducer. Reduce: Tasks read in a set of related records, process the records, and write the results. The reducer iterates through all results for the same key, processing the data and writing out the results. Strength: the map and reduce steps run well in parallel. Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

45 example: predicting user behavior Problem: how likely is a user to purchase an item from a website? Suppose we have already computed (maybe using Map/Reduce) a set of variables describing each user: most common locations, the number of pages viewed, the number of purchases made in the past. Calculate forecast using random forests: Random forests work by calculating a set of regression trees and then averaging them together to create a single model. It can be time consuming to fit the random trees to the data, but each new tree can be calculated independently. One way to tackle this problem is to use a set of map tasks to generate random trees, and then send the models to a single reducer task to average the results and produce the model. Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

46 Summary + Exercises Available at Joseph Adler: R in a Nutshell. 2nd edition, O Reilly, Richard Cotton: Learning R. A Step-by-Step Function Guide to Data Analysis. O Reilly, Write a wrapper around r.plot which takes two arguments: a string with the argument for r.plot and the name of the file for the plot. 2 Annual U.S. Killer Tornado Statistics are listed at Use python to get the annual data from the web page and into R data sets, making one data set for every year. Project Three due Wednesday 27 November at 9AM. Homework Six is due on Wednesday 4 December at 9AM: ex 3 of L-27; ex 2 of L-28; ex 1 of L-29; ex 1 of L-30; ex 2 of L-31; ex 3 of L-32; ex 1 of L-33; ex 2 of L-34; ex 1 of L-35; and ex 1 of L-36. Scientific Software (MCS 507 L-37) statistical computing with R 20 November / 46

= = Intro to Statistics for the Social Sciences. Name: Lab Session: Spring, 2015, Dr. Suzanne Delaney

= = Intro to Statistics for the Social Sciences. Name: Lab Session: Spring, 2015, Dr. Suzanne Delaney Name: Intro to Statistics for the Social Sciences Lab Session: Spring, 2015, Dr. Suzanne Delaney CID Number: _ Homework #22 You have been hired as a statistical consultant by Donald who is a used car dealer

More information

= = Name: Lab Session: CID Number: The database can be found on our class website: Donald s used car data

= = Name: Lab Session: CID Number: The database can be found on our class website: Donald s used car data Intro to Statistics for the Social Sciences Fall, 2017, Dr. Suzanne Delaney Extra Credit Assignment Instructions: You have been hired as a statistical consultant by Donald who is a used car dealer to help

More information

SPSS 14: quick guide

SPSS 14: quick guide SPSS 14: quick guide Edition 2, November 2007 If you would like this document in an alternative format please ask staff for help. On request we can provide documents with a different size and style of

More information

STAT 2300: Unit 1 Learning Objectives Spring 2019

STAT 2300: Unit 1 Learning Objectives Spring 2019 STAT 2300: Unit 1 Learning Objectives Spring 2019 Unit tests are written to evaluate student comprehension, acquisition, and synthesis of these skills. The problems listed as Assigned MyStatLab Problems

More information

More Multiple Regression

More Multiple Regression More Multiple Regression Model Building The main difference between multiple and simple regression is that, now that we have so many predictors to deal with, the concept of "model building" must be considered

More information

Analytical Capability Security Compute Ease Data Scale Price Users Traditional Statistics vs. Machine Learning In-Memory vs. Shared Infrastructure CRAN vs. Parallelization Desktop vs. Remote Explicit vs.

More information

1. Contingency Table (Cross Tabulation Table)

1. Contingency Table (Cross Tabulation Table) II. Descriptive Statistics C. Bivariate Data In this section Contingency Table (Cross Tabulation Table) Box and Whisker Plot Line Graph Scatter Plot 1. Contingency Table (Cross Tabulation Table) Bivariate

More information

Statistical Modelling for Social Scientists. Manchester University. January 20, 21 and 24, Modelling categorical variables using logit models

Statistical Modelling for Social Scientists. Manchester University. January 20, 21 and 24, Modelling categorical variables using logit models Statistical Modelling for Social Scientists Manchester University January 20, 21 and 24, 2011 Graeme Hutcheson, University of Manchester Modelling categorical variables using logit models Software commands

More information

SPSS Guide Page 1 of 13

SPSS Guide Page 1 of 13 SPSS Guide Page 1 of 13 A Guide to SPSS for Public Affairs Students This is intended as a handy how-to guide for most of what you might want to do in SPSS. First, here is what a typical data set might

More information

Statistical Modelling for Business and Management. J.E. Cairnes School of Business & Economics National University of Ireland Galway.

Statistical Modelling for Business and Management. J.E. Cairnes School of Business & Economics National University of Ireland Galway. Statistical Modelling for Business and Management J.E. Cairnes School of Business & Economics National University of Ireland Galway June 28 30, 2010 Graeme Hutcheson, University of Manchester Luiz Moutinho,

More information

4.3 Nonparametric Tests cont...

4.3 Nonparametric Tests cont... Class #14 Wednesday 2 March 2011 What did we cover last time? Hypothesis Testing Types Student s t-test - practical equations Effective degrees of freedom Parametric Tests Chi squared test Kolmogorov-Smirnov

More information

Checklist before you buy software for questionnaire based surveys

Checklist before you buy software for questionnaire based surveys Checklist before you buy for questionnaire based surveys Continuity, stability and development...2 Capacity...2 Safety and security...2 Types of surveys...3 Types of questions...3 Interview functions (Also

More information

Core vs NYS Standards

Core vs NYS Standards Core vs NYS Standards Grade 5 Core NYS Operations and Algebraic Thinking -------------------------------------------------------------------------- 5.OA Write / Interpret Numerical Expressions Use ( ),

More information

Lab 1: A review of linear models

Lab 1: A review of linear models Lab 1: A review of linear models The purpose of this lab is to help you review basic statistical methods in linear models and understanding the implementation of these methods in R. In general, we need

More information

Data from a dataset of air pollution in US cities. Seven variables were recorded for 41 cities:

Data from a dataset of air pollution in US cities. Seven variables were recorded for 41 cities: Master of Supply Chain, Transport and Mobility - Data Analysis on Transport and Logistics - Course 16-17 - Partial Exam Lecturer: Lidia Montero November, 10th 2016 Problem 1: All questions account for

More information

Using R for Introductory Statistics

Using R for Introductory Statistics R http://www.r-project.org Using R for Introductory Statistics John Verzani CUNY/the College of Staten Island January 6, 2009 http://www.math.csi.cuny.edu/verzani/r/ams-maa-jan-09.pdf John Verzani (CSI)

More information

AP Statistics Scope & Sequence

AP Statistics Scope & Sequence AP Statistics Scope & Sequence Grading Period Unit Title Learning Targets Throughout the School Year First Grading Period *Apply mathematics to problems in everyday life *Use a problem-solving model that

More information

Creative Commons Attribution-NonCommercial-Share Alike License

Creative Commons Attribution-NonCommercial-Share Alike License Author: Brenda Gunderson, Ph.D., 2015 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution- NonCommercial-Share Alike 3.0 Unported License:

More information

Loblolly Example: Correlation. This note explores estimating correlation of a sample taken from a bivariate normal distribution.

Loblolly Example: Correlation. This note explores estimating correlation of a sample taken from a bivariate normal distribution. Math 3080 1. Treibergs Loblolly Example: Correlation Name: Example Feb. 26, 2014 This note explores estimating correlation of a sample taken from a bivariate normal distribution. This data is taken from

More information

JMP TIP SHEET FOR BUSINESS STATISTICS CENGAGE LEARNING

JMP TIP SHEET FOR BUSINESS STATISTICS CENGAGE LEARNING JMP TIP SHEET FOR BUSINESS STATISTICS CENGAGE LEARNING INTRODUCTION JMP software provides introductory statistics in a package designed to let students visually explore data in an interactive way with

More information

Mathematical Computing

Mathematical Computing Mathematical Computing IMT2b2β/MSP3b9β Department of Mathematics University of Ruhuna A.W.L. Pubudu Thilan Department of Mathematics University of Ruhuna, Mathematical Computing, IMT2b2β/MSP3b9β 1/77 Method

More information

Mathematical Computing

Mathematical Computing Mathematical Computing IMT2b2β Department of Mathematics University of Ruhuna A.W.L. Pubudu Thilan Department of Mathematics University of Ruhuna, Mathematical Computing, IMT2b2β 1/76 Method of evaluation

More information

Incorporating Predictive Models for Operational Intelligence

Incorporating Predictive Models for Operational Intelligence Incorporating Predictive Models for Operational Intelligence Presented by Curt Hertler Partner Solutions Architect, OSIsoft Copyright 2015 OSIsoft, LLC History The only has thing a way new of repeating

More information

ADVANCED DATA ANALYTICS

ADVANCED DATA ANALYTICS ADVANCED DATA ANALYTICS MBB essay by Marcel Suszka 17 AUGUSTUS 2018 PROJECTSONE De Corridor 12L 3621 ZB Breukelen MBB Essay Advanced Data Analytics Outline This essay is about a statistical research for

More information

LIR 832: MINITAB WORKSHOP

LIR 832: MINITAB WORKSHOP LIR 832: MINITAB WORKSHOP Opening Minitab Minitab will be in the Start Menu under Net Apps. Opening the Data Go to the following web site: http://www.msu.edu/course/lir/832/datasets.htm Right-click and

More information

Machine Learning Models for Sales Time Series Forecasting

Machine Learning Models for Sales Time Series Forecasting Article Machine Learning Models for Sales Time Series Forecasting Bohdan M. Pavlyshenko SoftServe, Inc., Ivan Franko National University of Lviv * Correspondence: bpavl@softserveinc.com, b.pavlyshenko@gmail.com

More information

Using Excel s Analysis ToolPak Add-In

Using Excel s Analysis ToolPak Add-In Using Excel s Analysis ToolPak Add-In Bijay Lal Pradhan, PhD Introduction I have a strong opinions that we can perform different quantitative analysis, including statistical analysis, in Excel. It is powerful,

More information

Risk Simulation in Project Management System

Risk Simulation in Project Management System Risk Simulation in Project Management System Anatoliy Antonov, Vladimir Nikolov, Yanka Yanakieva Abstract: The Risk Management System of a Project Management System should be able to simulate, to evaluate

More information

REPORTING CATEGORY I NUMBER, OPERATION, QUANTITATIVE REASONING

REPORTING CATEGORY I NUMBER, OPERATION, QUANTITATIVE REASONING Texas 6 th Grade STAAR (TEKS)/ Correlation Skills (w/ STAAR updates) Stretch REPORTING CATEGORY I NUMBER, OPERATION, QUANTITATIVE REASONING (6.1) Number, operation, and quantitative reasoning. The student

More information

Using SPSS for Linear Regression

Using SPSS for Linear Regression Using SPSS for Linear Regression This tutorial will show you how to use SPSS version 12.0 to perform linear regression. You will use SPSS to determine the linear regression equation. This tutorial assumes

More information

Background Information. Instructions. Problem Statement. HOMEWORK HELP PROJECT INSTRUCTIONS Homework #3 Help Charleston Federal Grant Problem

Background Information. Instructions. Problem Statement. HOMEWORK HELP PROJECT INSTRUCTIONS Homework #3 Help Charleston Federal Grant Problem Background Information As one might expect, it is generally in a city s best financial interest to have more residents. A larger population generally yields a larger tax base and more income. There is

More information

Unravelling Airbnb Predicting Price for New Listing

Unravelling Airbnb Predicting Price for New Listing Unravelling Airbnb Predicting Price for New Listing Paridhi Choudhary H John Heinz III College Carnegie Mellon University Pittsburgh, PA 15213 paridhic@andrew.cmu.edu Aniket Jain H John Heinz III College

More information

Linear model to forecast sales from past data of Rossmann drug Store

Linear model to forecast sales from past data of Rossmann drug Store Abstract Linear model to forecast sales from past data of Rossmann drug Store Group id: G3 Recent years, the explosive growth in data results in the need to develop new tools to process data into knowledge

More information

MATH 2070 Final Exam Mixed Practice (MISSING 6.5 & 6.6)

MATH 2070 Final Exam Mixed Practice (MISSING 6.5 & 6.6) The profit for Scott s Scooters can be estimated by P(x, y) = 0.4x + 0.1y + 0.3xy + 2y + 1.5x hundred dollars, where x is the number of mopeds sold and y is the number of dirt bikes sold. (Check: P(3,2)

More information

STAT 350 (Spring 2016) Homework 12 Online 1

STAT 350 (Spring 2016) Homework 12 Online 1 STAT 350 (Spring 2016) Homework 12 Online 1 1. In simple linear regression, both the t and F tests can be used as model utility tests. 2. The sample correlation coefficient is a measure of the strength

More information

Statistics, Data Analysis, and Decision Modeling

Statistics, Data Analysis, and Decision Modeling - ' 'li* Statistics, Data Analysis, and Decision Modeling T H I R D E D I T I O N James R. Evans University of Cincinnati PEARSON Prentice Hall Upper Saddle River, New Jersey 07458 CONTENTS Preface xv

More information

Quantitative Methods

Quantitative Methods THE ASSOCIATION OF BUSINESS EXECUTIVES DIPLOMA PART 2 QM Quantitative Methods afternoon 4 June 2003 1 Time allowed: 3 hours. 2 Answer any FOUR questions. 3 All questions carry 25 marks. Marks for subdivisions

More information

Introduction of STATA

Introduction of STATA Introduction of STATA News: There is an introductory course on STATA offered by CIS Description: Intro to STATA On Tue, Feb 13th from 4:00pm to 5:30pm in CIT 269 Seats left: 4 Windows, 7 Macintosh For

More information

Weka Evaluation: Assessing the performance

Weka Evaluation: Assessing the performance Weka Evaluation: Assessing the performance Lab3 (in- class): 21 NOV 2016, 13:00-15:00, CHOMSKY ACKNOWLEDGEMENTS: INFORMATION, EXAMPLES AND TASKS IN THIS LAB COME FROM SEVERAL WEB SOURCES. Learning objectives

More information

Opening SPSS 6/18/2013. Lesson: Quantitative Data Analysis part -I. The Four Windows: Data Editor. The Four Windows: Output Viewer

Opening SPSS 6/18/2013. Lesson: Quantitative Data Analysis part -I. The Four Windows: Data Editor. The Four Windows: Output Viewer Lesson: Quantitative Data Analysis part -I Research Methodology - COMC/CMOE/ COMT 41543 The Four Windows: Data Editor Data Editor Spreadsheet-like system for defining, entering, editing, and displaying

More information

Business Data Analytics

Business Data Analytics MTAT.03.319 Business Data Analytics Lecture 4 Marketing and Sales Customer Lifecycle Management: Regression Problems Customer lifecycle Customer lifecycle Moving companies grow not because they force people

More information

Chapter 5 Regression

Chapter 5 Regression Chapter 5 Regression Topics to be covered in this chapter: Regression Fitted Line Plots Residual Plots Regression The scatterplot below shows that there is a linear relationship between the percent x of

More information

Business Quantitative Analysis [QU1] Examination Blueprint

Business Quantitative Analysis [QU1] Examination Blueprint Business Quantitative Analysis [QU1] Examination Blueprint 2014-2015 Purpose The Business Quantitative Analysis [QU1] examination has been constructed using an examination blueprint. The blueprint, also

More information

Problem Points Score USE YOUR TIME WISELY SHOW YOUR WORK TO RECEIVE PARTIAL CREDIT

Problem Points Score USE YOUR TIME WISELY SHOW YOUR WORK TO RECEIVE PARTIAL CREDIT STAT 512 EXAM I STAT 512 Name (7 pts) Problem Points Score 1 40 2 25 3 28 USE YOUR TIME WISELY SHOW YOUR WORK TO RECEIVE PARTIAL CREDIT WRITE LEGIBLY. ANYTHING UNREADABLE WILL NOT BE GRADED GOOD LUCK!!!!

More information

Advanced analytics at your hands

Advanced analytics at your hands 2.4 Advanced analytics at your hands Today, most organizations are stuck at lower-value descriptive analytics. But more sophisticated analysis can bring great business value. TARGET APPLICATIONS Business

More information

Hadoop Course Content

Hadoop Course Content Hadoop Course Content Hadoop Course Content Hadoop Overview, Architecture Considerations, Infrastructure, Platforms and Automation Use case walkthrough ETL Log Analytics Real Time Analytics Hbase for Developers

More information

Data Analytics with MATLAB Adam Filion Application Engineer MathWorks

Data Analytics with MATLAB Adam Filion Application Engineer MathWorks Data Analytics with Adam Filion Application Engineer MathWorks 2015 The MathWorks, Inc. 1 Case Study: Day-Ahead Load Forecasting Goal: Implement a tool for easy and accurate computation of dayahead system

More information

EFFICACY OF ROBUST REGRESSION APPLIED TO FRACTIONAL FACTORIAL TREATMENT STRUCTURES MICHAEL MCCANTS

EFFICACY OF ROBUST REGRESSION APPLIED TO FRACTIONAL FACTORIAL TREATMENT STRUCTURES MICHAEL MCCANTS EFFICACY OF ROBUST REGRESSION APPLIED TO FRACTIONAL FACTORIAL TREATMENT STRUCTURES by MICHAEL MCCANTS B.A., WINONA STATE UNIVERSITY, 2007 B.S., WINONA STATE UNIVERSITY, 2008 A THESIS submitted in partial

More information

Spreadsheets for Accounting

Spreadsheets for Accounting Osborne Books Tutor Zone Spreadsheets for Accounting Practice material 1 Osborne Books Limited, 2016 2 s p r e a d s h e e t s f o r a c c o u n t i n g t u t o r z o n e S P R E A D S H E E T S F O R

More information

1. A/an is a mathematical statement that calculates a value. 2. Create a cell reference in a formula by typing in the cell name or

1. A/an is a mathematical statement that calculates a value. 2. Create a cell reference in a formula by typing in the cell name or Question 1 of 20 : Select the best answer for the question. 1. A/an is a mathematical statement that calculates a value. A. argument B. function C. order of operations D. formula Question 2 of 20 : Select

More information

Business Analytics using R

Business Analytics using R Business Analytics using R 64 hours 12 weekends 3 hours/day www.greycampus.com Program Overview The future to decision making is data and no industry will remain untouched by it. However, data comes with

More information

Solving Transportation Logistics Problems Using Advanced Evolutionary Optimization

Solving Transportation Logistics Problems Using Advanced Evolutionary Optimization Solving Transportation Logistics Problems Using Advanced Evolutionary Optimization Transportation logistics problems and many analogous problems are usually too complicated and difficult for standard Linear

More information

ISE480 Sequencing and Scheduling

ISE480 Sequencing and Scheduling ISE480 Sequencing and Scheduling INTRODUCTION ISE480 Sequencing and Scheduling 2012 2013 Spring term What is Scheduling About? Planning (deciding what to do) and scheduling (setting an order and time for

More information

Connected Mathematics 2, 7th Grade Units 2009 Correlated to: Washington Mathematics Standards for Grade 7

Connected Mathematics 2, 7th Grade Units 2009 Correlated to: Washington Mathematics Standards for Grade 7 Grade 7 7.1. Core Content: Rational numbers and linear equations (Numbers, Operations, Algebra) 7.1.A Compare and order rational numbers using the number line, lists, and the symbols , or =. 7.1.B

More information

Chapter 10 Regression Analysis

Chapter 10 Regression Analysis Chapter 10 Regression Analysis Goal: To become familiar with how to use Excel 2007/2010 for Correlation and Regression. Instructions: You will be using CORREL, FORECAST and Regression. CORREL and FORECAST

More information

Getting Started with HLM 5. For Windows

Getting Started with HLM 5. For Windows For Windows Updated: August 2012 Table of Contents Section 1: Overview... 3 1.1 About this Document... 3 1.2 Introduction to HLM... 3 1.3 Accessing HLM... 3 1.4 Getting Help with HLM... 3 Section 2: Accessing

More information

Connected Mathematics 2, 7th Grade Units 2009 Correlated to: Washington Mathematics Standards for Grade 7

Connected Mathematics 2, 7th Grade Units 2009 Correlated to: Washington Mathematics Standards for Grade 7 Grade 7 Connected Mathematics 2, 7th Grade Units 2009 7.1. Core Content: Rational numbers and linear equations (Numbers, Operations, Algebra) 7.1.A Compare and order rational numbers using the number line,

More information

GUIDED PRACTICE: SPREADSHEET FORMATTING

GUIDED PRACTICE: SPREADSHEET FORMATTING Guided Practice: Spreadsheet Formatting Student Activity Student Name: Period: GUIDED PRACTICE: SPREADSHEET FORMATTING Directions: In this exercise, you will follow along with your teacher to enter and

More information

Statistics 201 Summary of Tools and Techniques

Statistics 201 Summary of Tools and Techniques Statistics 201 Summary of Tools and Techniques This document summarizes the many tools and techniques that you will be exposed to in STAT 201. The details of how to do these procedures is intentionally

More information

Solutions Implementation Guide

Solutions Implementation Guide Solutions Implementation Guide Salesforce, Winter 18 @salesforcedocs Last updated: November 30, 2017 Copyright 2000 2017 salesforce.com, inc. All rights reserved. Salesforce is a registered trademark of

More information

Table. XTMIXED Procedure in STATA with Output Systolic Blood Pressure, use "k:mydirectory,

Table. XTMIXED Procedure in STATA with Output Systolic Blood Pressure, use k:mydirectory, Table XTMIXED Procedure in STATA with Output Systolic Blood Pressure, 2001. use "k:mydirectory,. xtmixed sbp nage20 nage30 nage40 nage50 nage70 nage80 nage90 winter male dept2 edu_bachelor median_household_income

More information

Creating Simple Report from Excel

Creating Simple Report from Excel Creating Simple Report from Excel 1.1 Connect to Excel workbook 1. Select Connect Microsoft Excel. In the Open File dialog box, select the 2015 Sales.xlsx file. 2. The file will be loaded to Tableau, and

More information

RiskyProject Lite 7. Getting Started Guide. Intaver Institute Inc. Project Risk Management Software.

RiskyProject Lite 7. Getting Started Guide. Intaver Institute Inc. Project Risk Management Software. RiskyProject Lite 7 Project Risk Management Software Getting Started Guide Intaver Institute Inc. www.intaver.com email: info@intaver.com Chapter 1: Introduction to RiskyProject 3 What is RiskyProject?

More information

STA Rev. F Learning Objectives. Learning Objectives (Cont.) Module 2 Organizing Data

STA Rev. F Learning Objectives. Learning Objectives (Cont.) Module 2 Organizing Data STA 2023 Module 2 Organizing Data Rev.F08 1 Learning Objectives Upon completing this module, you should be able to: 1. Classify variables and data as either qualitative or quantitative. 2. Distinguish

More information

Super-marketing. A Data Investigation. A note to teachers:

Super-marketing. A Data Investigation. A note to teachers: Super-marketing A Data Investigation A note to teachers: This is a simple data investigation requiring interpretation of data, completion of stem and leaf plots, generation of box plots and analysis of

More information

RiskyProject Lite 7. User s Guide. Intaver Institute Inc. Project Risk Management Software.

RiskyProject Lite 7. User s Guide. Intaver Institute Inc. Project Risk Management Software. RiskyProject Lite 7 Project Risk Management Software User s Guide Intaver Institute Inc. www.intaver.com email: info@intaver.com 2 COPYRIGHT Copyright 2017 Intaver Institute. All rights reserved. The information

More information

Accelerating Microsoft Office Excel 2010 with Windows HPC Server 2008 R2

Accelerating Microsoft Office Excel 2010 with Windows HPC Server 2008 R2 Accelerating Microsoft Office Excel 2010 with Windows HPC Server 2008 R2 Technical Overview Published January 2010 Abstract Microsoft Office Excel is a critical tool for business. As calculations and models

More information

Who Are My Best Customers?

Who Are My Best Customers? Technical report Who Are My Best Customers? Using SPSS to get greater value from your customer database Table of contents Introduction..............................................................2 Exploring

More information

ExpressMaintenance Release Notes

ExpressMaintenance Release Notes ExpressMaintenance Release Notes ExpressMaintenance Release 9 introduces a wealth exciting features. It includes many enhancements to the overall interface as well as powerful new features and options

More information

optislang 5 David Schneider Product manager

optislang 5 David Schneider Product manager optislang 5 David Schneider Product manager 1. Postprocessing 2. optislang for ANSYS 6. Algorithms 3. Customization 5. SPDM 4. Workflows 2 Update optislang Dynardo GmbH Postprocessing Predefined modes

More information

ECDL / ICDL Spreadsheets Sample Part-Tests

ECDL / ICDL Spreadsheets Sample Part-Tests The following are sample part-tests for. This sample part-test contains 16 questions giving a total of 16 marks. The actual certification test contains 32 questions giving a total of 32 marks. The candidate

More information

Supporting Online Material for

Supporting Online Material for www.sciencemag.org/cgi/content/full/318/5853/1107/dc1 Supporting Online Material for Hurricane Katrina s Carbon Footprint on U.S. Gulf Coast Forests Jeffrey Q. Chambers,* Jeremy I. Fisher, Hongcheng Zeng,

More information

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other The New Era of Credit Risk Modeling and Validation Presenter Boaz Galinson, V.P, Credit Risk Manager, Leumi Bank Boaz Galinson is Head of the Group Credit Risk Modeling and Measurement in the Risk Management

More information

Hardness Example: Tukey s Multiple Comparisons

Hardness Example: Tukey s Multiple Comparisons Math 3080 1. Treibergs Hardness Example: Tukey s Multiple Comparisons Name: Example January 25, 2016 This R c program explores Tukey s Honest Significant Differences test. This data was taken from An investigation

More information

Brian Macdonald Big Data & Analytics Specialist - Oracle

Brian Macdonald Big Data & Analytics Specialist - Oracle Brian Macdonald Big Data & Analytics Specialist - Oracle Improving Predictive Model Development Time with R and Oracle Big Data Discovery brian.macdonald@oracle.com Copyright 2015, Oracle and/or its affiliates.

More information

mbusa Instagram account report Jan 01, May 28, 2016

mbusa Instagram account report Jan 01, May 28, 2016 mbusa Instagram account report Jan 01, 2016 - May 28, 2016 mbusa / Audience Total Followers Count 1,116,679 25.37% Followers Change +245,517 Max. Followers Change 19,132 Jan 04 Avg. Followers Change +11,691.29

More information

Transforming Analytics with Cloudera Data Science WorkBench

Transforming Analytics with Cloudera Data Science WorkBench Transforming Analytics with Cloudera Data Science WorkBench Process data, develop and serve predictive models. 1 Age of Machine Learning Data volume NO Machine Learning Machine Learning 1950s 1960s 1970s

More information

Predictive Modeling Using SAS Visual Statistics: Beyond the Prediction

Predictive Modeling Using SAS Visual Statistics: Beyond the Prediction Paper SAS1774-2015 Predictive Modeling Using SAS Visual Statistics: Beyond the Prediction ABSTRACT Xiangxiang Meng, Wayne Thompson, and Jennifer Ames, SAS Institute Inc. Predictions, including regressions

More information

If you want to flag a question for later review, select the "Mark for review" button.

If you want to flag a question for later review, select the Mark for review button. Exam Number: 584002RR Lesson Name: Microsoft Excel 2016 Exam Guidelines: This exam is now available only in the online assessment system. If your study guide still contains an exam, that exam is no longer

More information

A is used to answer questions about the quantity of what is being measured. A quantitative variable is comprised of numeric values.

A is used to answer questions about the quantity of what is being measured. A quantitative variable is comprised of numeric values. Stats: Modeling the World Chapter 2 Chapter 2: Data What are data? In order to determine the context of data, consider the W s Who What (and in what units) When Where Why How There are two major ways to

More information

ACCOUNTING SOFTWARE Certification Program

ACCOUNTING SOFTWARE Certification Program FINAL CERTIFICATION AWARDED BY IMRTC - USA ACCOUNTING SOFTWARE Certification Program This training program is highly specialized with the duration of 72 Credit hours, where the program covers all the major

More information

Data Description U.S. Air Pollution Dataset: data("usairpollution", package = "HSAUR2")

Data Description U.S. Air Pollution Dataset: data(usairpollution, package = HSAUR2) MÀSTER DE SUPPLY CHAIN, TRANSPORT I MOBILITAT (UPC). CURS 14-15 Q1 FINAL EXAM Anàlisi de Dades de Transport i Logística (ADTL). (Data: 8/7/2015 17:00-19:00 h Lloc: Aula H2.5 Professor responsable: Lídia

More information

Sage 300 ERP Sage 300 ERP Intelligence Release Notes

Sage 300 ERP Sage 300 ERP Intelligence Release Notes Sage 300 ERP Intelligence Release Notes The software described in this document is protected by copyright, and may not be copied on any medium except as specifically authorized in the license or non disclosure

More information

Understanding Total Dissolved Solids in Selected Streams of the Independence Mountains of Nevada

Understanding Total Dissolved Solids in Selected Streams of the Independence Mountains of Nevada Understanding Total Dissolved Solids in Selected Streams of the Independence Mountains of Nevada Richard B. Shepard, Ph.D. January 12, 211 President, Applied Ecosystem Services, Inc., Troutdale, OR Summary

More information

Biostat Exam 10/7/03 Coverage: StatPrimer 1 4

Biostat Exam 10/7/03 Coverage: StatPrimer 1 4 Biostat Exam 10/7/03 Coverage: StatPrimer 1 4 Part A (Closed Book) INSTRUCTIONS Write your name in the usual location (back of last page, near the staple), and nowhere else. Turn in your Lab Workbook at

More information

Integrating Market and Credit Risk Measures using SAS Risk Dimensions software

Integrating Market and Credit Risk Measures using SAS Risk Dimensions software Integrating Market and Credit Risk Measures using SAS Risk Dimensions software Sam Harris, SAS Institute Inc., Cary, NC Abstract Measures of market risk project the possible loss in value of a portfolio

More information

GRADE 7 MATHEMATICS CURRICULUM GUIDE

GRADE 7 MATHEMATICS CURRICULUM GUIDE GRADE 7 MATHEMATICS CURRICULUM GUIDE Loudoun County Public Schools 2010-2011 Complete scope, sequence, pacing and resources are available on the CD and will be available on the LCPS Intranet. Grade 7 Mathematics

More information

TIM/UNEX 270, Spring 2012 Homework 1

TIM/UNEX 270, Spring 2012 Homework 1 TIM/UNEX 270, Spring 2012 Homework 1 Prof. Kevin Ross, kross@soe.ucsc.edu April 11, 2012 Goals: Homework 1 starts out with a review of essential concepts from statistics in Problem 1 we will build on these

More information

Add Sophisticated Analytics to Your Repertoire with Data Mining, Advanced Analytics and R

Add Sophisticated Analytics to Your Repertoire with Data Mining, Advanced Analytics and R Add Sophisticated Analytics to Your Repertoire with Data Mining, Advanced Analytics and R Why Advanced Analytics Companies that inject big data and analytics into their operations show productivity rates

More information

Data Viz Helping Your Staff Become a Data Wiz. Charlie Farah Director Market Development, Healthcare & Public Sector, Qlik APAC August 2017

Data Viz Helping Your Staff Become a Data Wiz. Charlie Farah Director Market Development, Healthcare & Public Sector, Qlik APAC August 2017 Data Viz Helping Your Staff Become a Data Wiz Charlie Farah Director Market Development, Healthcare & Public Sector, Qlik APAC August 2017 65%?% Organizations where managers would follow their gut feel

More information

Practices of Business Intelligence

Practices of Business Intelligence Tamkang University Practices of Business Intelligence II Tamkang University (Descriptive Analytics II: Business Intelligence and Data Warehousing) 1071BI05 MI4 (M2084) (2888) Wed, 7, 8 (14:10-16:00) (B217)

More information

STA Module 2A Organizing Data and Comparing Distributions (Part I)

STA Module 2A Organizing Data and Comparing Distributions (Part I) STA 2023 Module 2A Organizing Data and Comparing Distributions (Part I) 1 Learning Objectives Upon completing this module, you should be able to: 1. Classify variables and data as either qualitative or

More information

Learn What s New. Statistical Software

Learn What s New. Statistical Software Statistical Software Learn What s New Upgrade now to access new and improved statistical features and other enhancements that make it even easier to analyze your data. The Assistant Let Minitab s Assistant

More information

ICTCM 28th International Conference on Technology in Collegiate Mathematics

ICTCM 28th International Conference on Technology in Collegiate Mathematics MULTIVARIABLE SPREADSHEET MODELING AND SCIENTIFIC THINKING VIA STACKING BRICKS Scott A. Sinex Department of Physical Sciences & Engineering Prince George s Community College Largo, MD 20774 ssinex@pgcc.edu

More information

DATA ANALYTICS WITH R, EXCEL & TABLEAU

DATA ANALYTICS WITH R, EXCEL & TABLEAU Learn. Do. Earn. DATA ANALYTICS WITH R, EXCEL & TABLEAU COURSE DETAILS centers@acadgild.com www.acadgild.com 90360 10796 Brief About this Course Data is the foundation for technology-driven digital age.

More information

POST GRADUATE PROGRAM IN DATA SCIENCE & MACHINE LEARNING (PGPDM)

POST GRADUATE PROGRAM IN DATA SCIENCE & MACHINE LEARNING (PGPDM) OUTLINE FOR THE POST GRADUATE PROGRAM IN DATA SCIENCE & MACHINE LEARNING (PGPDM) Module Subject Topics Learning outcomes Delivered by Exploratory & Visualization Framework Exploratory Data Collection and

More information

Forecast Scheduling Microsoft Project Online 2018

Forecast Scheduling Microsoft Project Online 2018 Forecast Scheduling Microsoft Project Online 2018 Best Practices for Real-World Projects Eric Uyttewaal, PMP President ProjectPro Corporation Published by ProjectPro Corporation Preface Publishing Notice

More information

Oracle Hyperion Planning for Interactive Users ( )

Oracle Hyperion Planning for Interactive Users ( ) Oracle University Contact Us: Local: 1800 103 4775 Intl: +91 80 40291196 Oracle Hyperion Planning 11.1.2 for Interactive Users (11.1.2.3) Duration: 3 Days What you will learn This course will teach you

More information

CEE3710: Uncertainty Analysis in Engineering

CEE3710: Uncertainty Analysis in Engineering CEE3710: Uncertainty Analysis in Engineering Lecture 1 September 6, 2017 Why do we need Probability and Statistics?? What is Uncertainty Analysis?? Ex. Consider the average (mean) height of females by

More information

Online appendix for THE RESPONSE OF CONSUMER SPENDING TO CHANGES IN GASOLINE PRICES *

Online appendix for THE RESPONSE OF CONSUMER SPENDING TO CHANGES IN GASOLINE PRICES * Online appendix for THE RESPONSE OF CONSUMER SPENDING TO CHANGES IN GASOLINE PRICES * Michael Gelman a, Yuriy Gorodnichenko b,c, Shachar Kariv b, Dmitri Koustas b, Matthew D. Shapiro c,d, Dan Silverman

More information