Data Mining Applications with R


Yanchang Zhao, Senior Data Miner, RDataMining.com, Australia
Yonghua Cen, Associate Professor, Nanjing University of Science and Technology, China

Elsevier: Amsterdam, Boston, Heidelberg, London, New York, Oxford, Paris, San Diego, San Francisco, Sydney, Tokyo
Academic Press is an imprint of Elsevier

Contents

Preface
Acknowledgments
Review Committee
Foreword

Chapter 1: Power Grid Data Analysis with R and Hadoop
Introduction; A Brief Overview of the Power Grid; Introduction to MapReduce, Hadoop, and RHIPE; MapReduce; Hadoop; RHIPE: R with Hadoop; Other Parallel R Packages; Power Grid Analytical Approach; Data Preparation; Exploratory Analysis and Data Cleaning; Event Extraction; Discussion and Conclusions; Appendix; References

Chapter 2: Picturing Bayesian Classifiers: A Visual Data Mining Approach to Parameters Optimization
Introduction; Related Works; Motivations and Requirements; R Packages Requirements; Probabilistic Framework of NB Classifiers; Choosing the Model; Estimating the Parameters; Two-Dimensional Visualization System; Design Choices; Visualization Design; 2.6 A Case Study: Text Classification; Description of the Dataset; Creating Document-Term Matrices; Loading Existing Term-Document Matrices; Running the Program; Conclusions; Acknowledgments; References

Chapter 3: Discovery of Emergent Issues and Controversies in Anthropology Using Text Mining, Topic Modeling, and Social Network Analysis of Microblog Content
Introduction; How Many Messages and How Many Twitter-Users in the Sample?; Who Is Writing All These Twitter Messages?; Who Are the Influential Twitter-Users in This Sample?; What Is the Community Structure of These Twitter-Users?; What Were Twitter-Users Writing About During the Meeting?; What Do the Twitter Messages Reveal About the Opinions of Their Authors?; What Can Be Discovered in the Less Frequently Used Words in the Sample?; What Are the Topics That Can Be Algorithmically Discovered in This Sample?; Conclusion; References

Chapter 4: Text Mining and Network Analysis of Digital Libraries in R
Introduction; Dataset Preparation; Manipulating the Document-Term Matrix; The Document-Term Matrix; Term Frequency-Inverse Document Frequency; Exploring the Document-Term Matrix; Clustering Content by Topics Using the LDA; The Latent Dirichlet Allocation; Learning the Various Distributions for LDA; Using the Log-Likelihood for Model Validation; Topics Representation; Plotting the Topics Associations; Using Similarity Between Documents to Explore Document Cohesion; Computing Similarities Between Documents; Using a Heatmap to Illustrate Clusters of Documents; 4.6 Social Network Analysis of Authors; Constructing the Network as a Graph; Author Importance Using Centrality Measures; Conclusion; References

Chapter 5: Recommender Systems in R
Introduction; Business Case; Evaluation; Collaborative Filtering Methods; Latent Factor Collaborative Filtering; Simplified Approach; Roll Your Own; Final Thoughts; References

Chapter 6: Response Modeling in Direct Marketing: A Data Mining-Based Approach for Target Selection
Introduction/Background; Business Problem; Proposed Response Model; Modeling Detail; Data Collection; Data Preprocessing; Feature Construction; Feature Selection; Data Sampling for Training and Test; Class Balancing; Classifier (SVM); Prediction Result; Model Evaluation; Conclusion; References

Chapter 7: Caravan Insurance Customer Profile Modeling with R
Introduction; Data Description and Initial Exploratory Data Analysis; Variable Correlations and Logistic Regression Analysis; Classifier Models of Caravan Insurance Holders; Overview of Model Building and Validating; Review of Four Classifier Methods; RP Model; Bagging Ensemble; Support Vector Machine; LR Classification; Comparison of Four Classifier Models: ROC and AUC; Model Comparison: Recall-Precision, Accuracy-v-Cut-off, and Computation Times; Discussion of Results and Conclusion; Appendix A: Details of the Full Data Set Variables; Appendix B: Customer Profile Data-Frequency of Binary Values; Appendix C: Proportion of Caravan Insurance Holders vis-a-vis other Customer Profile Variables; Appendix D: LR Model Details; Appendix E: R Commands for Computation of ROC Curves for Each Model Using Validation Dataset; Appendix F: Commands for Cross-Validation Analysis of Classifier Models; References

Chapter 8: Selecting Best Features for Predicting Bank Loan Default
Introduction; Business Problem; Data Extraction; Data Exploration and Preparation; Null Value Detection; Outlier Detection; Missing Imputation; Relevance Analysis; Data Set Balancing; Feature Selection; Modeling; Model Evaluation; Finding and Model Deployment; Lessons and Discussions; Appendix: Selecting Best Features for Predicting Bank Loan Default; References

Chapter 9: A Choquet Integral Toolbox and Its Application in Customer Preference Analysis
Introduction; Background; Aggregation Functions; Choquet Integral; Fuzzy Measure Representation; Shapley Value and Interaction Index; 9.3 Rfmtool Package; Installation; Toolbox Description; Preference Analysis Example; Case Study; Traveler Preference Study and Hotel Management; Data Collection and Experiment Design; Model Evaluation; Result Analysis; Discussion; Conclusions; References

Chapter 10: A Real-Time Property Value Index Based on Web Data
Introduction; Housing Prices and Indices; A Data Mining Approach; Data Capture; Geocoding; Price Evolution; Real Estate Pricing Models; Model 1: Hedonic Model Plus Smooth Term; Model 2: GWR Plus a Smooth Term; Relationship to Other Work; Conclusion; Acknowledgments; References

Chapter 11: Predicting Seabed Hardness Using Random Forest in R
Introduction; Study Region and Data Processing; Study Region; Data Processing of Seabed Hardness; Predictors; Dataset Manipulation and Exploratory Analyses; Features of the Dataset; Exploratory Data Analyses; Application of RF for Predicting Seabed Hardness; Model Validation Using rfcv; Optimal Predictive Model; Application of the Optimal Predictive Model; Discussion and Conclusions; Selection of Relevant Predictors and the Consequences of Missing the Most Important Predictors; Issues with Searching for the Most Accurate Predictive Model Using RF; Predictive Accuracy of RF and Prediction Maps of Seabed Hardness; Limitations; Acknowledgments; Appendix A: Dataset of Seabed Hardness and 15 Predictors; Appendix B: R Function, rf.cv, Shows the Cross-Validated Prediction Performance of a Predictive Model; References

Chapter 12: Supervised Classification of Images, Applied to Plankton Samples Using R and Zooimage
Background; Challenges; Data Extraction and Exploration; Data Preprocessing; Modeling; Model Evaluation; Model Deployment; Lessons, Discussion, and Conclusions; Acknowledgments; References

Chapter 13: Crime Analyses Using R
Introduction; Problem Definition; Data Extraction; Data Exploration and Preprocessing; Visualizations; Modeling; Model Evaluation; Discussions and Improvements; References

Chapter 14: Football Mining with R
Introduction to the Case Study and Organization of the Analysis; Background of the Analysis: The Italian Football Championship; Data Extraction and Exploration; Data Extraction; Data Exploration; Data Preprocessing; Variable Importance Evaluation; Composite Indicators Construction; Model Development: Building Classifiers; Learning Step; Model Selection; Model Refinement; Model Deployment; Concluding Remarks; Acknowledgments; References

Chapter 15: Analyzing Internet DNS(SEC) Traffic with R for Resolving Platform Optimization
Introduction; Data Extraction from PCAP to CSV File; Data Importation from CSV File to R; Dimension Reduction Via PCA; Initial Data Exploration Via Graphs; Variables Scaling and Samples Selection; Clustering for Segmenting the FQDN; Building Routing Table Thanks to Clustering; Building Routing Table Thanks to Mixed Integer Linear Programming; Building Routing Table Via a Heuristic; Final Evaluation; Conclusion; References

Index

Preface to the third edition Preface to the first edition Acknowledgments

Preface to the third edition Preface to the first edition Acknowledgments Contents Foreword Preface to the third edition Preface to the first edition Acknowledgments Part I PRELIMINARIES XXI XXIII XXVII XXIX CHAPTER 1 Introduction 3 1.1 What Is Business Analytics?................

More information

From Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques. Full book available for purchase here.

From Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques. Full book available for purchase here. From Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques. Full book available for purchase here. Contents List of Figures xv Foreword xxiii Preface xxv Acknowledgments xxix Chapter

More information

From Profit Driven Business Analytics. Full book available for purchase here.

From Profit Driven Business Analytics. Full book available for purchase here. From Profit Driven Business Analytics. Full book available for purchase here. Contents Foreword xv Acknowledgments xvii Chapter 1 A Value-Centric Perspective Towards Analytics 1 Introduction 1 Business

More information

Leveraging Analytics and. User Segmentation

Leveraging Analytics and. User Segmentation Freemium Economics Leveraging Analytics and User Segmentation to Drive Revenue Eric Benjamin Seufert ELSEVIER AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE

More information

Effective CRM Using. Predictive Analytics. Antonios Chorianopoulos

Effective CRM Using. Predictive Analytics. Antonios Chorianopoulos Effective CRM Using Predictive Analytics Antonios Chorianopoulos WlLEY Contents Preface Acknowledgments xiii xv 1 An overview of data mining: The applications, the methodology, the algorithms, and the

More information

Strategic Marketing Planning

Strategic Marketing Planning Strategic Marketing Planning Second edition Colin Gilligan Emeritus Professor of Marketing Sheffield Hallam University and Visiting Professor, Newcastle Business School and Richard M. S. Wilson Emeritus

More information

Security Risk Management

Security Risk Management Security Risk Management Building an Information Security Risk Management Program from the Ground Up Evan Wheeler Technical Editor Kenneth Swick ELSEVIER AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD

More information

Implementing Analytics

Implementing Analytics Implementing Analytics A Blueprint for Design, Development, and Adoption Nauman Sheikh ELSEVIER AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE SYDNEY TOKYO Morgan

More information

Exploring Engineering

Exploring Engineering Exploring Engineering An Introduction to Engineering and Design Third Edition Philip Kosky Robert Balmer William Keat George Wise ELSEVIER AMSTERDAM BOSTON HI'IDIU.HURG LONDON * NliW YORK OXFORD PARIS

More information

CONTENT STRATEGY AT WORK

CONTENT STRATEGY AT WORK CONTENT STRATEGY AT WORK REAL-WORLD STORIES TO STRENGTHEN EVERY INTERACTIVE PROJECT MARGOT BLOOMSTEIN WITH A FOREWORD BY KRISHNA HALVORSON %& && PT SFA/TPR AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD

More information

IFFICULT PROJECT: Andre A. Costin AMSTERDAM BOSTON HEIDELBERG LONDON OXFORD NEW YORK

IFFICULT PROJECT: Andre A. Costin AMSTERDAM BOSTON HEIDELBERG LONDON OXFORD NEW YORK IFFICULT PROJECT: Andre A. Costin ELSEVIER AMSTERDAM BOSTON HEIDELBERG LONDON OXFORD NEW YORK PARIS * SAN DIEGO SAN FRANCISCO SINGAPORE SYDNEY TOKYO Butterworth-Heinmann is an imprint of Elsevier Contents

More information

Power Generation Technologies

Power Generation Technologies Power Generation Technologies Paul Breeze AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE SYDNEY TOKYO ELSEVIER Newnes is an imprint of Elsevier Newnes Contents

More information

Copyr i g ht 2012, SAS Ins titut e Inc. All rights res er ve d. ENTERPRISE MINER: ANALYTICAL MODEL DEVELOPMENT

Copyr i g ht 2012, SAS Ins titut e Inc. All rights res er ve d. ENTERPRISE MINER: ANALYTICAL MODEL DEVELOPMENT ENTERPRISE MINER: ANALYTICAL MODEL DEVELOPMENT ANALYTICAL MODEL DEVELOPMENT AGENDA Enterprise Miner: Analytical Model Development The session looks at: - Supervised and Unsupervised Modelling - Classification

More information

TNM033 Data Mining Practical Final Project Deadline: 17 of January, 2011

TNM033 Data Mining Practical Final Project Deadline: 17 of January, 2011 TNM033 Data Mining Practical Final Project Deadline: 17 of January, 2011 1 Develop Models for Customers Likely to Churn Churn is a term used to indicate a customer leaving the service of one company in

More information

DATA ANALYTICS WITH R, EXCEL & TABLEAU

DATA ANALYTICS WITH R, EXCEL & TABLEAU Learn. Do. Earn. DATA ANALYTICS WITH R, EXCEL & TABLEAU COURSE DETAILS centers@acadgild.com www.acadgild.com 90360 10796 Brief About this Course Data is the foundation for technology-driven digital age.

More information

Data Analytics with MATLAB Adam Filion Application Engineer MathWorks

Data Analytics with MATLAB Adam Filion Application Engineer MathWorks Data Analytics with Adam Filion Application Engineer MathWorks 2015 The MathWorks, Inc. 1 Case Study: Day-Ahead Load Forecasting Goal: Implement a tool for easy and accurate computation of dayahead system

More information

Big Data. Methodological issues in using Big Data for Official Statistics

Big Data. Methodological issues in using Big Data for Official Statistics Giulio Barcaroli Istat (barcarol@istat.it) Big Data Effective Processing and Analysis of Very Large and Unstructured data for Official Statistics. Methodological issues in using Big Data for Official Statistics

More information

Building the In-Demand Skills for Analytics and Data Science Course Outline

Building the In-Demand Skills for Analytics and Data Science Course Outline Day 1 Module 1 - Predictive Analytics Concepts What and Why of Predictive Analytics o Predictive Analytics Defined o Business Value of Predictive Analytics The Foundation for Predictive Analytics o Statistical

More information

Predicting the Odds of Getting Retweeted

Predicting the Odds of Getting Retweeted Predicting the Odds of Getting Retweeted Arun Mahendra Stanford University arunmahe@stanford.edu 1. Introduction Millions of people tweet every day about almost any topic imaginable, but only a small percent

More information

Engineering. Gas and Oil Reliability. Modeling and Analysis. Dr. Eduardo Calixto ELSEVIER

Engineering. Gas and Oil Reliability. Modeling and Analysis. Dr. Eduardo Calixto ELSEVIER Gas and Oil Reliability Engineering Modeling and Analysis Dr. Eduardo Calixto ELSEVIER AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE SYDNEY TOKYO Gulf Professional

More information

2015 The MathWorks, Inc. 1

2015 The MathWorks, Inc. 1 2015 The MathWorks, Inc. 1 MATLAB 을이용한머신러닝 ( 기본 ) Senior Application Engineer 엄준상과장 2015 The MathWorks, Inc. 2 Machine Learning is Everywhere Solution is too complex for hand written rules or equations

More information

SPM 8.2. Salford Predictive Modeler

SPM 8.2. Salford Predictive Modeler SPM 8.2 Salford Predictive Modeler SPM 8.2 The SPM Salford Predictive Modeler software suite is a highly accurate and ultra-fast platform for developing predictive, descriptive, and analytical models from

More information

2016 INFORMS International The Analytics Tool Kit: A Case Study with JMP Pro

2016 INFORMS International The Analytics Tool Kit: A Case Study with JMP Pro 2016 INFORMS International The Analytics Tool Kit: A Case Study with JMP Pro Mia Stephens mia.stephens@jmp.com http://bit.ly/1uygw57 Copyright 2010 SAS Institute Inc. All rights reserved. Background TQM

More information

Thermodynamics of. Turbomachinery. Fluid Mechanics and. Sixth Edition. S. L. Dixon, B. Eng., Ph.D. University of Liverpool, C. A. Hall, Ph.D.

Thermodynamics of. Turbomachinery. Fluid Mechanics and. Sixth Edition. S. L. Dixon, B. Eng., Ph.D. University of Liverpool, C. A. Hall, Ph.D. Fluid Mechanics and Thermodynamics of Turbomachinery Sixth Edition S. L. Dixon, B. Eng., Ph.D. Honorary Senior Fellow, Department of Engineering, University of Liverpool, UK C. A. Hall, Ph.D. University

More information

Data Mining and Applications in Genomics

Data Mining and Applications in Genomics Data Mining and Applications in Genomics Lecture Notes in Electrical Engineering Volume 25 For other titles published in this series, go to www.springer.com/series/7818 Sio-Iong Ao Data Mining and Applications

More information

Handbook of Small Modular Nuclear

Handbook of Small Modular Nuclear Woodhead Publishing Series in Energy: Number 64 Handbook of Small Modular Nuclear Reactors Edited by Mario D. Carelli and Daniel T. Ingersoll WP ELSEVIER AMSTERDAM BOSTON CAMBRIDGE HEIDELBERG LONDON NEW

More information

Intelligence and. Vivek Kaie

Intelligence and. Vivek Kaie Enterprise Performance Intelligence and Decision Patterns Vivek Kaie /0\ CRC Press \CtJ Taylor & Francis Croup V- 'S Boca Raton London New York CRC Press is an imprint of the Taylor & Francis Group, an

More information

Marketing Communications in Tourism and Hospitality

Marketing Communications in Tourism and Hospitality Marketing Communications in Tourism and Hospitality This page intentionally left blank Marketing Communications in Tourism and Hospitality Concepts, Strategies and Cases Scott McCabe AMSTERDAM BOSTON HEIDELBERG

More information

Progress Report: Predicting Which Recommended Content Users Click Stanley Jacob, Lingjie Kong

Progress Report: Predicting Which Recommended Content Users Click Stanley Jacob, Lingjie Kong Progress Report: Predicting Which Recommended Content Users Click Stanley Jacob, Lingjie Kong Machine learning models can be used to predict which recommended content users will click on a given website.

More information

Effective CRM Using Predictive Analytics

Effective CRM Using Predictive Analytics Effective CRM Using Predictive Analytics Effective CRM Using Predictive Analytics Antonios Chorianopoulos This edition first published 2016 2016 John Wiley & Sons, Ltd Registered Office John Wiley & Sons,

More information

Business Intelligence

Business Intelligence The Profit Impact of Business Intelligence Steve Williams Nancy Williams ELSEVIER AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS. SAN DIEGO SAN FRANCISCO. SINGAPORE SYDNEY TOKYO Morgan Kaufmann

More information

BIG DATA SKILLS: CHALLENGES FOR THE UNIVERSITY WORLD CREATING A NEW GENERATION OF DATA SCIENTISTS. Massimiliano Marcellino Bocconi University

BIG DATA SKILLS: CHALLENGES FOR THE UNIVERSITY WORLD CREATING A NEW GENERATION OF DATA SCIENTISTS. Massimiliano Marcellino Bocconi University BIG DATA SKILLS: CHALLENGES FOR THE UNIVERSITY WORLD CREATING A NEW GENERATION OF DATA SCIENTISTS Massimiliano Marcellino Bocconi University CES 2017 Seminar on the new generation of statisticians and

More information

TABLE OF CONTENTS ix

TABLE OF CONTENTS ix TABLE OF CONTENTS ix TABLE OF CONTENTS Page Certification Declaration Acknowledgement Research Publications Table of Contents Abbreviations List of Figures List of Tables List of Keywords Abstract i ii

More information

Applications of Machine Learning to Predict Yelp Ratings

Applications of Machine Learning to Predict Yelp Ratings Applications of Machine Learning to Predict Yelp Ratings Kyle Carbon Aeronautics and Astronautics kcarbon@stanford.edu Kacyn Fujii Electrical Engineering khfujii@stanford.edu Prasanth Veerina Computer

More information

IT Architectures and Middleware

IT Architectures and Middleware IT Architectures and Middleware Second Edition Strategies for Building Large, Integrated Systems Chris Britton Peter Bye AAddison-Wesley TT Boston San Francisco New York Toronto Montreal London Munich

More information

Practical Application of Predictive Analytics Michael Porter

Practical Application of Predictive Analytics Michael Porter Practical Application of Predictive Analytics Michael Porter October 2013 Structure of a GLM Random Component observations Link Function combines observed factors linearly Systematic Component we solve

More information

3 Ways to Improve Your Targeted Marketing with Analytics

3 Ways to Improve Your Targeted Marketing with Analytics 3 Ways to Improve Your Targeted Marketing with Analytics Introduction Targeted marketing is a simple concept, but a key element in a marketing strategy. The goal is to identify the potential customers

More information

PROJECT MANAGEMENT. Systems, Principles, and Applications. Taylor & Francis Group Boca Raton London New York

PROJECT MANAGEMENT. Systems, Principles, and Applications. Taylor & Francis Group Boca Raton London New York PROJECT MANAGEMENT Systems, Principles, and Applications Adedeji B. Badiru C R C P r e s s Taylor & Francis Group Boca Raton London New York CRC Press is an imprint of the Taylor & Francis Group, an informa

More information

Software Metrics. Practical Approach. A Rigorous and. Norman Fenton. James Bieman THIRD EDITION. CRC Press CHAPMAN & HALIVCRC INNOVATIONS IN

Software Metrics. Practical Approach. A Rigorous and. Norman Fenton. James Bieman THIRD EDITION. CRC Press CHAPMAN & HALIVCRC INNOVATIONS IN CHAPMAN & HALIVCRC INNOVATIONS IN SOFTWARE ENGINEERING AND SOFTWARE DEVELOPMENT Software Metrics A Rigorous and Practical Approach THIRD EDITION Norman Fenton Queen Mary University of London. UK James

More information

Brian Macdonald Big Data & Analytics Specialist - Oracle

Brian Macdonald Big Data & Analytics Specialist - Oracle Brian Macdonald Big Data & Analytics Specialist - Oracle Improving Predictive Model Development Time with R and Oracle Big Data Discovery brian.macdonald@oracle.com Copyright 2015, Oracle and/or its affiliates.

More information

Shobeir Fakhraei, Hamid Soltanian-Zadeh, Farshad Fotouhi, Kost Elisevich. Effect of Classifiers in Consensus Feature Ranking for Biomedical Datasets

Shobeir Fakhraei, Hamid Soltanian-Zadeh, Farshad Fotouhi, Kost Elisevich. Effect of Classifiers in Consensus Feature Ranking for Biomedical Datasets Shobeir Fakhraei, Hamid Soltanian-Zadeh, Farshad Fotouhi, Kost Elisevich Effect of Classifiers in Consensus Feature Ranking for Biomedical Datasets Dimension Reduction Prediction accuracy of practical

More information

Credit Scoring, Response Modelling and Insurance Rating

Credit Scoring, Response Modelling and Insurance Rating Credit Scoring, Response Modelling and Insurance Rating Also by Steven Finlay THE MANAGEMENT OF CONSUMER CREDIT CONSUMER CREDIT FUNDAMENTALS Credit Scoring, Response Modelling and Insurance Rating A Practical

More information

KnowledgeENTERPRISE FAST TRACK YOUR ACCESS TO BIG DATA WITH ANGOSS ADVANCED ANALYTICS ON SPARK. Advanced Analytics on Spark BROCHURE

KnowledgeENTERPRISE FAST TRACK YOUR ACCESS TO BIG DATA WITH ANGOSS ADVANCED ANALYTICS ON SPARK. Advanced Analytics on Spark BROCHURE FAST TRACK YOUR ACCESS TO BIG DATA WITH ANGOSS ADVANCED ANALYTICS ON SPARK Are you drowning in Big Data? Do you lack access to your data? Are you having a hard time managing Big Data processing requirements?

More information

Advanced Job Daimler. Julian Leweling, Daimler AG

Advanced Job Daimler. Julian Leweling, Daimler AG Advanced Job Analytics @ Daimler Julian Leweling, Agenda From Job Ads to Knowledge: Advanced Job Analytics @ Daimler About Why KNIME? Our Inspiration Use Case KNIME Walkthrough Application Next steps Advanced

More information

Predictive Modeling Using SAS Visual Statistics: Beyond the Prediction

Predictive Modeling Using SAS Visual Statistics: Beyond the Prediction Paper SAS1774-2015 Predictive Modeling Using SAS Visual Statistics: Beyond the Prediction ABSTRACT Xiangxiang Meng, Wayne Thompson, and Jennifer Ames, SAS Institute Inc. Predictions, including regressions

More information

Real Estate Modelling and Forecasting

Real Estate Modelling and Forecasting Real Estate Modelling and Forecasting Chris Brooks ICMA Centre, University of Reading Sotiris Tsolacos Property and Portfolio Research CAMBRIDGE UNIVERSITY PRESS Contents list of figures page x List of

More information

Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Knowledge Discovery and Data Mining Unit # 19 1 Acknowledgement The following discussion is based on the paper Mining Big Data: Current Status, and Forecast to the Future by Fan and Bifet and online presentation

More information

DATA MINING AND BUSINESS ANALYTICS WITH R

DATA MINING AND BUSINESS ANALYTICS WITH R DATA MINING AND BUSINESS ANALYTICS WITH R DATA MINING AND BUSINESS ANALYTICS WITH R Johannes Ledolter Department of Management Sciences Tippie College of Business University of Iowa Iowa City, Iowa Copyright

More information

MARKETING RESEARCH AN APPLIED APPROACH FIFTH EDITION NARESH K. MALHOTRA DANIEL NUNAN DAVID F. BIRKS. W Pearson

MARKETING RESEARCH AN APPLIED APPROACH FIFTH EDITION NARESH K. MALHOTRA DANIEL NUNAN DAVID F. BIRKS. W Pearson MARKETING RESEARCH AN APPLIED APPROACH FIFTH EDITION NARESH K. MALHOTRA DANIEL NUNAN DAVID F. BIRKS W Pearson Marlow, England London New York Boston San Francisco Toronto Sydney Dubai Singapore Hong Kong

More information

IBM SPSS & Apache Spark

IBM SPSS & Apache Spark IBM SPSS & Apache Spark Making Big Data analytics easier and more accessible ramiro.rego@es.ibm.com @foreswearer 1 2016 IBM Corporation Modeler y Spark. Integration Infrastructure overview Spark, Hadoop

More information

Introduction to Logistics Systems Management

Introduction to Logistics Systems Management Introduction to Logistics Systems Management Second Edition Gianpaolo Ghiani Department of Innovation Engineering, University of Salento, Italy Gilbert Laporte HEC Montreal, Canada Roberto Musmanno Department

More information

Business Risk Management Handbook

Business Risk Management Handbook Business Risk Management Handbook A sustainable approach Linda Spedding Adam Rose i*" ""''SS^IH AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD ELSEVIER PARIS SAN DIEGO SAN FRANCISCO SINGAPORE SYDNEY

More information

Natural Resource and Environmental Economics

Natural Resource and Environmental Economics Natural Resource and Environmental Economics Third Edition Roger Perman Yue Ma James McGilvray Michael Common PEARSON Addison Wesley Harlow, England London New York Boston San Francisco Toronto Sydney

More information

Data mining and Renewable energy. Cindi Thompson

Data mining and Renewable energy. Cindi Thompson Data mining and Renewable energy Cindi Thompson June 2012 Analytics, Big Data, and Data Science 1 What is Analytics? makes extensive use of data, statistical and quantitative analysis, explanatory and

More information

Analytics for Banks. September 19, 2017

Analytics for Banks. September 19, 2017 Analytics for Banks September 19, 2017 Outline About AlgoAnalytics Problems we can solve for banks Our experience Technology Page 2 About AlgoAnalytics Analytics Consultancy Work at the intersection of

More information

KnowledgeSTUDIO. Advanced Modeling for Better Decisions. Data Preparation, Data Profiling and Exploration

KnowledgeSTUDIO. Advanced Modeling for Better Decisions. Data Preparation, Data Profiling and Exploration KnowledgeSTUDIO Advanced Modeling for Better Decisions Companies that compete with analytics are looking for advanced analytical technologies that accelerate decision making and identify opportunities

More information

Predicting Reddit Post Popularity Via Initial Commentary by Andrei Terentiev and Alanna Tempest

Predicting Reddit Post Popularity Via Initial Commentary by Andrei Terentiev and Alanna Tempest Predicting Reddit Post Popularity Via Initial Commentary by Andrei Terentiev and Alanna Tempest 1. Introduction Reddit is a social media website where users submit content to a public forum, and other

More information

Methodological challenges of Big Data for official statistics

Methodological challenges of Big Data for official statistics Methodological challenges of Big Data for official statistics Piet Daas Statistics Netherlands THE CONTRACTOR IS ACTING UNDER A FRAMEWORK CONTRACT CONCLUDED WITH THE COMMISSION Content Big Data: properties

More information

Stock Price Prediction with Daily News

Stock Price Prediction with Daily News Stock Price Prediction with Daily News GU Jinshan MA Mingyu Derek MA Zhenyuan ZHOU Huakang 14110914D 14110562D 14111439D 15050698D 1 Contents 1. Work flow of the prediction tool 2. Model performance evaluation

More information

Multiple Attribute Decision Making

Multiple Attribute Decision Making Multiple Attribute Decision Making M E T H O D S AND A P P L I C A T I O N S Gwo-Hshiung Tzeng Jih-Jeng Huang CRC Press Taylor Si Francis Croup Boca Raton London New York CRC Press is an imprint of the

More information

ML Methods for Solving Complex Sorting and Ranking Problems in Human Hiring

ML Methods for Solving Complex Sorting and Ranking Problems in Human Hiring ML Methods for Solving Complex Sorting and Ranking Problems in Human Hiring 1 Kavyashree M Bandekar, 2 Maddala Tejasree, 3 Misba Sultana S N, 4 Nayana G K, 5 Harshavardhana Doddamani 1, 2, 3, 4 Engineering

More information

E-Commerce Sales Prediction Using Listing Keywords

E-Commerce Sales Prediction Using Listing Keywords E-Commerce Sales Prediction Using Listing Keywords Stephanie Chen (asksteph@stanford.edu) 1 Introduction Small online retailers usually set themselves apart from brick and mortar stores, traditional brand

More information

advanced analysis of gene expression microarray data aidong zhang World Scientific State University of New York at Buffalo, USA

advanced analysis of gene expression microarray data aidong zhang World Scientific State University of New York at Buffalo, USA advanced analysis of gene expression microarray data aidong zhang State University of New York at Buffalo, USA World Scientific NEW JERSEY LONDON SINGAPORE BEIJING SHANGHAI HONG KONG TAIPEI CHENNAI Contents

More information

MISSING DATA CLASSIFICATION OF CHRONIC KIDNEY DISEASE

MISSING DATA CLASSIFICATION OF CHRONIC KIDNEY DISEASE MISSING DATA CLASSIFICATION OF CHRONIC KIDNEY DISEASE Wala Abedalkhader and Noora Abdulrahman Department of Engineering Systems and Management, Masdar Institute of Science and Technology, Abu Dhabi, United

More information

POST GRADUATE PROGRAM IN DATA SCIENCE & MACHINE LEARNING (PGPDM)

POST GRADUATE PROGRAM IN DATA SCIENCE & MACHINE LEARNING (PGPDM) OUTLINE FOR THE POST GRADUATE PROGRAM IN DATA SCIENCE & MACHINE LEARNING (PGPDM) Module Subject Topics Learning outcomes Delivered by Exploratory & Visualization Framework Exploratory Data Collection and

More information

Power Plants. Structural Alloys for. Operational Challenges and. High-temperature Materials. Edited by. Amir Shirzadi and Susan Jackson.

Power Plants. Structural Alloys for. Operational Challenges and. High-temperature Materials. Edited by. Amir Shirzadi and Susan Jackson. Woodhead Publishing Series in Energy: Number 45 Structural Alloys for Power Plants Operational Challenges and High-temperature Materials Edited by Amir Shirzadi and Susan Jackson AMSTERDAM BOSTON CAMBRIDGE

More information

Machine Learning Techniques For Particle Identification

Machine Learning Techniques For Particle Identification Machine Learning Techniques For Particle Identification 06.March.2018 I Waleed Esmail, Tobias Stockmanns, Michael Kunkel, James Ritman Institut für Kernphysik (IKP), Forschungszentrum Jülich Outlines:

More information

Aircraft Structures for Engineering Students

Aircraft Structures for Engineering Students, Fifth Edition. T. H. G. Megson. ELSEVIER: AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE SYDNEY TOKYO. Butterworth-Heinemann

DETECTING COMMUNITIES BY SENTIMENT ANALYSIS

DETECTING COMMUNITIES BY SENTIMENT ANALYSIS OF CONTROVERSIAL TOPICS. SBP-BRiMS 2016. Kangwon Seo (1), Rong Pan (1), and Aleksey Panasyuk (2); (1) Arizona State University, (2) Air Force Research Lab. July 1, 2016. OUTLINE

Professor Dr. Gholamreza Nakhaeizadeh

Statistic Methods in Mining Business Understanding Understanding Preparation Deployment Modelling Evaluation Mining Process (Part 3) Professor Dr. Gholamreza Nakhaeizadeh

A Smart Tool to analyze the Salary trends of H1-B Workers

Akshay Poosarla, Ramya Vellore Ramesh. Under the guidance of Prof. Meiliu Lu. Abstract: Limiting the H1-B visas is bad news for skilled workers in

New Customer Acquisition Strategy

New Customer Acquisition Strategy Based on Customer Profiling Segmentation and Scoring Model. Introduction: A customer profile is a snapshot of who your customers are, how to reach them, and

Chapter 13 Knowledge Discovery Systems: Systems That Create Knowledge

Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2007 Prentice Hall. Chapter Objectives: To explain how knowledge is discovered

Salford Predictive Modeler. Powerful machine learning software for developing predictive, descriptive, and analytical models.

The Company: Minitab helps companies and institutions to spot trends, solve problems and discover valuable

INTRODUCTION TO BIOMEDICAL ENGINEERING

ACADEMIC PRESS SERIES IN BIOMEDICAL ENGINEERING. ELSEVIER ACADEMIC PRESS. INTRODUCTION TO BIOMEDICAL ENGINEERING, SECOND EDITION. JOHN SUSAN JOSEPH ENDERLE

Contents PREFACE 1 INTRODUCTION The Role of Scheduling The Scheduling Function in an Enterprise Outline of the Book 6

Pinedo. PREFACE. 1 INTRODUCTION: 1.1 The Role of Scheduling; 1.2 The Scheduling Function in an Enterprise; 1.3 Outline of

Predicting Corporate Influence Cascades In Health Care Communities

Shouzhong Shi, Chaudary Zeeshan Arif, Sarah Tran. December 11, 2015. Part A: Introduction. The standard model of drug prescription choice

DATA SCIENCE: HYPE AND REALITY PATRICK HALL

About me SAS Enterprise Miner, 2012 Cloudera Data Scientist, 2014 Do you use Kolmogorov Smirnov often? Statistician No, I mix my martinis with gin. Data Scientist

Who Is Likely to Succeed: Predictive Modeling of the Journey from H-1B to Permanent US Work Visa

Shibbir Dripto Khan. ABSTRACT: The purpose of this study is to help US employers and legislators predict which employees are most

Application of Machine Learning to Financial Trading

January 2, 2015. Some slides borrowed from: Andrew Moore's lectures, Yaser Abu Mustafa's lectures. About Us. Our Goal: To use advanced mathematical and

Modular Design for Machine Tools

Yoshimi Ito, Dr.-Eng., C.Eng., FIET, Professor Emeritus, Tokyo Institute of Technology. McGraw-Hill: New York Chicago San Francisco Lisbon London Madrid Mexico City Milan

Achieve Better Insight and Prediction with Data Mining

Clementine 12.0 Specifications. Data mining provides organizations with a clearer view of current conditions and deeper insight into future events.

Fundamentals of Preparative and Nonlinear Chromatography

Georges Guiochon, University of Tennessee and Oak Ridge National Laboratory, Distinguished Scientist, Knoxville, Tennessee. Sadroddin Golshan Shirazi

Advanced analytics at your hands

Today, most organizations are stuck at lower-value descriptive analytics. But more sophisticated analysis can bring great business value. TARGET APPLICATIONS Business

Mining Heterogeneous Urban Data at Multiple Granularity Layers

Antonio Attanasio. Supervisor: Prof. Silvia Chiusano. Co-supervisor: Prof. Tania Cerquitelli. collection Urban data analytics Added value urban

e-marketing: Applications of information technology and the Internet within marketing

Cor Molenaar. Routledge, Taylor & Francis Group, London and New York. Contents: List of figures, List of tables, List

Hadoop Course Content

Hadoop Overview, Architecture Considerations, Infrastructure, Platforms and Automation Use case walkthrough ETL Log Analytics Real Time Analytics Hbase for Developers

Natural Resource and Environmental Economics

Fourth Edition. Roger Perman, Yue Ma, Michael Common, David Maddison, James McGilvray. Addison Wesley is an imprint of Harlow, England London New York Boston San

Transforming Analytics with Cloudera Data Science WorkBench

Process data, develop and serve predictive models. Age of Machine Learning [figure: data volume by decade from the 1950s, no machine learning vs. machine learning]

Data Mining. Chapter 7: Score Functions for Data Mining Algorithms. Fall 2011, Ming Li

Ming Li, Department of Computer Science and Technology, Nanjing University. The merit of score function: Score function indicates

New restaurants fail at a surprisingly

Predicting New Restaurant Success and Rating with Yelp. Aileen Wang, William Zeng, Jessica Zhang, Stanford University. aileen15@stanford.edu, wizeng@stanford.edu, jzhang4@stanford.edu. December 16, 2016. Abstract

FINAL PROJECT REPORT IME672. Group Number 6

Ayushya Agarwal 14168, Rishabh Vaish 14553, Rohit Bansal 14564, Abhinav Sharma 14015, Dil Bag Singh 14222. Introduction: Cell2Cell, The Churn Game. The cellular telephone

DASI: Analytics in Practice and Academic Analytics Preparation

Mia Stephens, mia.stephens@jmp.com. Copyright 2010 SAS Institute Inc. All rights reserved. Background: TQM Coordinator/Six Sigma MBB Founding

Predictive Modelling for Customer Targeting A Banking Example

Pedro Ecija Serrano, 11 September 2017. Customer Targeting: What is it? Why should I care? How do I do it? What Is Customer

Machine Learning Models for Sales Time Series Forecasting

Article. Bohdan M. Pavlyshenko, SoftServe, Inc., Ivan Franko National University of Lviv. Correspondence: bpavl@softserveinc.com, b.pavlyshenko@gmail.com

Hotel Industry Demand Curves

Cornell University School of Hotel Administration, The Scholarly Commons. Articles and Chapters, School of Hotel Administration Collection, 2012. John B. Corgel, Cornell University,

Data Analytics for Semiconductor Manufacturing

2016 The MathWorks, Inc. Competitive Advantage. What do we mean by Data Analytics? Analytics uses data to drive decision making, rather than gut feel or

CS229 Project Report Using Newspaper Sentiments to Predict Stock Movements Hao Yee Chan Anthony Chow

haoyeec@stanford.edu, ac1408@stanford.edu. Problem Statement: It is often said that stock prices are determined

Analytical Capability Security Compute Ease Data Scale Price Users Traditional Statistics vs. Machine Learning In-Memory vs. Shared Infrastructure CRAN vs. Parallelization Desktop vs. Remote Explicit vs.
