The Hunt for the Data Scientist
Giewee Hammond, MSCAN, MSCAS
Lead Data Scientist, Aramco Services Company

2 Overview
Highlight that data science has domain-specific specialties.
Clarify which human resources your data science projects may need.
Lay a foundation for what is necessary to begin a data science team.

3 Outline
Data Scientist vs. Data Analyst
The origin of data science
The need for a Data Scientist
Distinction: Selecting the right Data Scientist(s)
Consequences of not choosing the right Data Scientist(s)
Where to find the right Data Scientists

4 Data Scientist vs. Data Analyst
Capabilities, ordered from the Data Analyst end to the Data Scientist end:
Business Administration: domain-specific; responsible for understanding, analyzing, and documenting/reporting business processes; intermediate statistics; BI tools.
Data Exploration, Analysis & Insight: deriving a story from the data and asking more questions.
Advanced Algorithms & Machine Learning: going beyond what is obvious to distill a problem into a set of distinct hypotheses.
Data Product Engineering: enhancing existing software or developing new software that solves data problems and creates value.

5 The origin of data science
Data Collation
Computer Science
Information Science
Mathematics
Data Visualization

6 Reasons data science projects fail
Lack of software access
Wrong technology
Missing data science leader
Unreceptive company culture
Wrong Data Scientist

7 Consequences of not selecting the right Data Scientist
Solving the wrong problem
Lack of data integrity
Lack of data availability
Unscalable solution

8 The need for a Data Scientist
E&P Operator A: 90-day production rates from horizontal wells increased by 250%, and the costs to drill, complete, and operate its wells have fallen by as much as 40%.
E&P Operator B: for any prospective oil field, geologists can easily compare potential drilling sites and recommend drilling locations with better accuracy.

9 Distinction: Selecting the right Data Scientists

10 Roles: UX Strategy and Design, Data Scientist #1, Application Developer
Degrees: BA or MS; MS; BS; Ph.D. or MS/PhD
Fields: Information Design; Graphic Design; user-centered design; STEM field
Skills: Parallel Computing; Project Management and Product Development; Domain Expertise; R + Python; Hadoop + Spark; Java + HTML5

11 Roles: Data Analyst, AI Researcher, Data Scientist #2, OR (Operations Research) Scientist, Data Engineer
Degrees: MS or PhD; Ph.D.; PhD
Fields: Operations Research; Computer Science; Artificial Intelligence; Optimization; Data Mining
Skills: Strong Deep Learning; Supply Chain Knowledge; ETL Knowledge; Optimization + Data Mining; Domain Expertise; SAS + R + Python; API + MATLAB; Enterprise Systems Knowledge; SQL

12 To Begin
Data Engineer (data storage & scalability): source data, store data, convert & ETL
Data Scientist (machine learning tools): transform data, exploratory analysis, model building & uncovering insights
Application Developer (developer tools): visualization, production
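The hand-offs between the three starting roles can be sketched as a tiny pipeline. This is a minimal, hypothetical illustration, not the author's method: the function names, the toy records, and the mean-baseline "model" are all assumptions chosen only to show where each role's responsibility begins and ends.

```python
from statistics import mean, pstdev

# --- Data Engineer: source data, store data, convert & ETL ---------------
def source_data() -> list[dict]:
    """Hypothetical raw records standing in for a real source system."""
    return [
        {"id": "r1", "value": "10.0"},
        {"id": "r2", "value": "12.5"},
        {"id": "r3", "value": ""},  # incomplete record
        {"id": "r4", "value": "11.5"},
    ]

def etl(raw: list[dict]) -> list[dict]:
    """Convert text to numbers and drop records that cannot be parsed."""
    stored = []
    for rec in raw:
        try:
            stored.append({"id": rec["id"], "value": float(rec["value"])})
        except ValueError:
            continue  # float("") raises ValueError, so the bad row is skipped
    return stored

# --- Data Scientist: transform, explore, build a model -------------------
def explore(rows: list[dict]) -> dict:
    """Summary statistics a first exploratory pass might compute."""
    values = [r["value"] for r in rows]
    return {"n": len(values), "mean": mean(values), "stdev": pstdev(values)}

def build_model(summary: dict):
    """Placeholder baseline (an assumption): predict the mean for any input."""
    baseline = summary["mean"]
    return lambda _features=None: baseline

# --- Application Developer: visualization and production -----------------
def report(summary: dict) -> str:
    """Text stand-in for a dashboard or chart."""
    return f"n={summary['n']} mean={summary['mean']:.2f} stdev={summary['stdev']:.2f}"

def run_pipeline():
    rows = etl(source_data())
    summary = explore(rows)
    model = build_model(summary)
    return report(summary), model()

if __name__ == "__main__":
    print(run_pipeline())
```

Keeping each role's work behind its own function boundary mirrors the division of labor on the slide: the data scientist never touches raw text, and the application developer only consumes a summary and a callable model.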

13 Finding the right Data Scientists
Competitions / Repositories: Kaggle, CrowdAnalytix, GitHub
Buddy Training
Courses: Coursera / DataCamp, Galvanize Bootcamp, Data Science Dojo Bootcamp
Conferences: Energy Conference Network, Datacon, REWORK
Clubs: Houston Energy Conference Network, R Group, Python Group

14 Summary
Data Engineer
Data Scientist
Software Engineer

15 Thank You

16 Data Scientist #1
PhD in a quantitative discipline (Math, Engineering, Computer Science, Chemistry)
Parallel computing skills
Languages and tools: R, Hadoop, Spark, Java, HTML5; Python is a plus
Strong visualization skills with open-source tools
Domain expertise
Project management
Works with multi-functional teams focused on enterprise data, data quality, business intelligence, operational data, and enterprise resource planning (ERP) integration to deliver data-driven insights.
Soft skills

17 Data Scientist #2
MS degree (Econometrics, Statistics, Operations Research, Optimization, Data Mining, Machine Learning, Physics, or another quantitative discipline)
At least 5 years of experience with data mining methods, developing algorithms, statistical analysis, and developing predictive models for pricing, demand forecasting, and supply chain optimization
SAS, R, and SQL (Python a plus)
Experience with enterprise systems (SAP)
Experience with time series and highly correlated data (ARIMA, ARMA)
Project management experience
Hands-on experience using and/or developing deep learning models
Soft skills