Data Science End to End

Big Bang Data Science Solutions
13 Week DS Program, 5th Batch, April 2018
BBDS Inc.

"Demand for deep analytical talent in the United States could be 50 to 60 percent greater than its projected supply by 2018." Source: The McKinsey Global Institute

Table of Contents

1. Introduction
2. Course Overview
3. Course Information
4. What You Get in Return?
5. Your Portfolio
6. Course Curriculum
7. Course Fee
8. Instructor
9. About BBDS

"Global BI and Analytics market would grow to $22.8 billion by 2020." Source: Gartner

"Advanced Analytics market to grow from $7.04 billion in 2014 to $29.53 billion in 2019." Source: Markets and Markets

Why Data Science?

1. Introduction

Congratulations! You are interested in learning everything necessary to become a Data Scientist and join the ranks of the fastest growing jobs in IT. "Data science" wasn't even a term 10 years ago, and now everyone is asking the same question: what's the deal? We are here to help you sort through the data.

Data science is the new big thing. Companies in every industry are recognizing the strategic importance of using data to stay competitive, and organizations across the globe are desperate to hire data scientists. A shortage of data scientists and business analysts means the employment outlook for professionals with the required knowledge and technical skills in these areas is extremely positive. The Harvard Business Review article "Data Scientist: The Sexiest Job of the 21st Century" notes that the shortage of data scientists is becoming a serious constraint in some sectors.

2. Rewarding Career!

The BBDS 13 Week course assumes that students know close to nothing about Data Science and Machine Learning. Its goal is to give you the concepts and intuitions you need to actually implement programs capable of learning from data. We will cover a large number of techniques, from the simplest and most commonly used (such as Linear Regression) to some Deep Learning techniques that regularly win competitions.

- Working model to showcase your skills
- Data Science Methodology
- Skills matrix for roles best fit
- Extensive and Flexible Live Online Training
- Instructor-Led Course
- Training Video Recordings
- Quality Training Materials
- Two-Way Interactive Sessions
- Graded Assignments & Professional Certificate
- Mock Exam/Assessment
- Career path to learn, grow and prosper
- Repeat anytime at no cost
- Possible apprenticeship, 2-3 projects
- Post-training support in career search
- Resume & Interview Prep
- Job-Oriented Training
- Job Placement and Placement Guidance

The McKinsey Global Institute agrees. Demand for deep analytical talent in the United States could be 50 to 60 percent greater than its projected supply by 2018. The result may be a shortage of 140,000 to 190,000 people with deep analytical skills, as well as 1.5 million managers and analysts to analyze big data and make decisions based on their findings.

Data science careers are growing in virtually every sector: manufacturing, construction, transportation, warehousing, communication, science, health care, computer science, information technology, retail, sales, marketing, finance, insurance, education, government, security, law enforcement, and more.

The future is bright for the budding data scientist, but the path to get there is not so clear. The advent of open data, open source computing, and inexpensive processing power is clearing the way for participation in the data revolution, but these elements need to be combined in an intelligent fashion. People from all walks of life can become data scientists; the requisites are hard work, talent, a desire to discover, and the right skills and tools.

Big Bang Data Science Solutions now offers a complete, intensive Data Science 13 Week Program that covers topics including Data Mining, Data Analytics, Applied Machine Learning, Predictive Analytics, Statistical Analysis, Regression Analysis, Classification Analysis, Clustering Analysis, and Data Visualization.

3. Course Overview

Total Session Hours: 140+
Total Coding Hours: 80+
Effort Needed: HRS/WEEK
Projects & Case Studies: 10+
Prospective Careers: Data Scientist, AI/ML Engineer
Course Duration: 12 Weeks + 1 Week for Resume & Interview Preparation
Session Duration: 3 HRS (1 Hr 45 Mins Concept Learning, 45 Mins Concept Implementation)

Collaborate with mentors on coding assignments and projects in the last 45+ minutes of every session.

"By 2020, 60% of Information Delivered to Decision Makers Will Be Considered by Them Always Actionable, Doubling the Rate from the Current (2015) Level." Source: Cloudera

4. Course Information

Course Dates: April 21st to July 14th, 2018

Meeting Times:
Sundays 3:00 pm - 6:00 pm EST
Mondays 8:00 pm - 11:00 pm EST
Fridays 8:00 pm - 11:00 pm EST
- OR -
Watch recorded sessions anytime

Location:
ATC Innovation Center, 2972 Webb Bridge Road, Alpharetta, GA
- OR -
Join remotely

Available for extra time if needed.

5. Building a Stunning Portfolio

By the end of the program, students will be able to prove their skills with a public portfolio of Data Science, Machine Learning and Deep Learning projects. Many employers want to see your public GitHub page or Azure Notebooks.

All our projects follow standard guidelines and will be turned in through Azure Notebooks or public Git repositories, with the intention of making them look good to future employers.

5.1 Business Decision Framing (DecisionsFirst)

Using DecisionsFirst for framing analytics requirements

We define the data that needs to be provided, we identify the analytic technology to be used, and we define the workflow for delivering this data to decision-makers. A more effective way to define requirements for analytic projects, and to frame those requirements accurately, is to model the decision-making to be improved. This DecisionsFirst requirements approach provides critical information for successful analytic projects, complements workflow requirements, and correctly identifies the data that will be required for the effort.

DecisionsFirst is used to frame the decision for the projects used in the program.

5.2 Data Science Process (CRISP-DM)

The program follows the Cross-Industry Standard Process for Data Mining (CRISP-DM). CRISP-DM puts business understanding front and center at the beginning of the project. CRISP-DM remains the top methodology for data mining projects, with essentially the same percentage as in 2007 (43% vs 42%).

[Figure: Data science phases at a high level. Phases: Business Understanding & Use (business focused), Data Understanding & Use (data focused), Analysis & Assessment (analytics focused), Implementation (operations focused).]

5.3 Fundamentals of Python and R

Data scientists must know how to code. Start by learning the fundamentals of two popular programming languages, Python and R.

- Basics of Python and R
- Conditionals and loops
- String and list objects
- Functions & OOP concepts
- Exception handling

6. Course Curriculum

Week 1: Data Science Fundamentals

Learn Data Science basics and gain a good understanding of Data Science and Data Analytics. Overview of the EMC and CAP certificates; introduction to concepts, methodologies, and best practices.

- Introduction to Data Science
- Learning Path, CAP Certificate
- Crash Course in R
- Tableau: Introduction & Basics: Bar chart
- RapidMiner: Introduction to RapidMiner
- Capstone Project: Project Selection (Open datasets & problems)

Week 2: Business Understanding & Problem Framing

Determine business objectives, assess the situation, determine data mining goals, and produce a project plan. Gain a good understanding of problem framing, decision framing, decision analysis and decision implementation using DecisionsFirst Modeler.

- Data Understanding & Data Preparation
- Exploratory Data Analysis in R
- Exploratory Data Analysis in Python
- Tableau: Time series, Aggregation, and Filters
- RapidMiner: Data Preparation & Correlation Analysis
- Capstone Project: Analytics Approach 1 (Data preparation & classification)

Week 3: Data Understanding & Data Preparation

Exploratory Data Analysis using R & Python: descriptive statistics, hypothesis testing, data preprocessing, missing value imputation, and data transformation. Dive deep into the R programming language, from basic syntax to advanced packages and data visualization (e.g. reshape2, dplyr, string manipulation, ggplot2, R Shiny).

- Data Understanding & Data Preparation
- Exploratory Data Analysis in R
- Exploratory Data Analysis in Python
- Tableau: Time series, Aggregation, and Filters
- RapidMiner: Data Preparation & Correlation Analysis
- Capstone Project: Analytics Approach 1 (Data preparation & classification)

5.4 Machine Learning in R/Python and RapidMiner

Machines have increased the ability to interpret large volumes of complex data. Combine aspects of computer science with statistics to formulate algorithms that help machines draw insights from structured and unstructured data. You will build models using the following algorithms:

- Linear and logistic regression
- Decision trees
- Support vector machines (SVMs) and kernel SVM
- Random forests
- XGBoost
- K nearest neighbor & hierarchical clustering
- Principal component analysis
- Text analytics and time series forecasting

[Figure: From Raw Data To Intelligent Decisions. Layers: Data Types (Structured, Semi-structured, Unstructured), Data Infrastructure, Business Intelligence, Advanced Analytics, Decisions.]

Week 4: Classification Analysis - Part 1

Time to build your first models. You'll start in machine learning with supervised models, covering everything from the classics to cutting-edge techniques. Get to know R packages and scikit-learn, the most widely used modeling and machine learning package in Python.

Here we'll focus on situations where we have a knowable and observable outcome. You'll start with some of the classical models of machine learning, like decision trees and OLS, and learn techniques for both regression and classification. You'll then move to more and more sophisticated models, with random forest and boosting.

- Classification Analysis
- Decision Tree & Random Forest in R
- Decision Tree & Random Forest in Python
- Tableau Basics: Maps, Scatter-plots
- RapidMiner: Decision Tree
- Capstone Project: Analytic Approach II (Machine learning techniques)

Week 5: Classification Analysis - Part 2

- Logistic Regression, KNN, Naïve Bayes in R
- Logistic Regression, KNN, Naïve Bayes in Python
- SVM, Kernel SVM in R & Python
- Tableau: Joining and Blending Data, PLUS: Dual Axis Charts
- RapidMiner: Logistic Regression, KNN, Naïve Bayes
- Capstone Project: Project Analysis Techniques (Presentation techniques)

5.5 Data Visualization: Matplotlib, Seaborn, ggplot & Tableau

Complex data sets call for simple representations that are easy to follow. Visualize and communicate key insights derived from data effectively by using tools like Matplotlib and Tableau.

- Interactive visualizations with Matplotlib
- Data visualizations using Tableau
- Tableau dashboard and storyboard
- Tableau and R integration

5.6 Data Visualization: Matplotlib, Seaborn, ggplot & Tableau

Lastly, manage your infrastructure with a data engineering platform like Spark, so that your efforts can be focused on solving data problems rather than machine problems.

- Introduction to Big Data & Spark
- RDDs in Spark, data frames & Spark SQL
- Spark Streaming, MLlib & GraphX
- Linear algebra

5.7 Capstone Projects

The course is project based. There are 9 projects in total, and students will be working on ONE team project starting from day 1. By the end, you'll present a capstone after going off to gather real data and build your own models using machine learning techniques.
- Customer Churn Project
- Classification Problem Projects
- Regression Problem Projects
- Clustering Problem Projects
- Time Series Analysis Projects
- Fraud & Anomalies Detection Projects

5.8 Analytical Tools Used

9 analytics tools are used throughout the program. Students will learn how to use them all to transform data into products and information.

- R & Python for Data Science
- Tableau
- BigML
- RapidMiner
- SAS Enterprise Miner
- IBM Watson Analytics
- IBM Bluemix

Week 6: Regression Analysis - Part 1

- Data Mining & Machine Learning (Regression Analysis)
- Decision Tree & Random Forest in R
- Decision Tree & Random Forest in Python
- Tableau: Table Calculations, Advanced Dashboards, Storytelling
- Capstone Project: Project Analysis Techniques (Data visualization techniques)

Week 7: Regression Analysis - Part 2

- Simple Linear, Multiple Linear, Polynomial Linear in R
- Simple Linear, Multiple Linear, Polynomial Linear in Python
- Support Vector Machine in R & Python
- Tableau: Table Calculations, Advanced Dashboards, Storytelling
- RapidMiner: Linear Regression
- Capstone Project: Data Analysis Execution Plan (Data visualization tools)

Week 8: Clustering, Association Rules, Dimensionality Reduction

This week covers the second largest branch of machine learning: various forms of unsupervised models, useful for things like feature reduction and clustering.

Unsupervised learning doesn't focus on an outcome, but rather finds associations and relationships present in data. You'll start with some classic clustering algorithms like k-means to break almost any data set into groups. You'll then get into cutting-edge techniques like anomaly detection and time series analysis.

- Clustering Analysis
- K-means, Hierarchical Clustering in R
- K-means, Hierarchical Clustering in Python
- Association Rules
- Apriori, Eclat in R
- Apriori, Eclat in Python
- Dimensionality Reduction: PCA, LDA & Kernel PCA in R and Python
- RapidMiner: Data Preparation & Correlation Analysis
- Capstone Project: Data Analysis Review (Interpretation)

"Organizations That Analyze All Relevant Data and Deliver Actionable Information Will Achieve Extra $430 Billion in Productivity Gains Over Their Less Analytically Oriented Peers by 2020." Source: Cloudera

Week 9: Anomaly Detection, Time Series Analysis, Text Mining & NLP

- Text Analysis & NLP
- Anomaly Detection, Time Series Analysis
- Capstone Project: Project Selection (Open datasets & problems)

Week 10: Deep Learning & Reinforcement Learning

Deepen your machine learning skills with R and scikit-learn. Deep learning with Theano, TensorFlow & Keras; neural networks and convolutional neural networks.

- Deep Learning and Neural Nets
- ANN in R & Python
- CNN in Python
- Reinforcement Learning: Random Selection, UCB, Thompson Sampling in R
- Reinforcement Learning: Random Selection, UCB, Thompson Sampling in Python
- Tableau: Advanced Data Preparation
- Capstone Project: Data Analysis Execution Plan (Data visualization tools)
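
To give a flavor of the Week 10 reinforcement learning topics, here is a minimal sketch of Upper Confidence Bound (UCB) selection in plain Python. It is only an illustration, not course material: the function names `ucb1_select` and `run_bandit`, the simulated click rates, and the choice of the common UCB1 variant are all assumptions made for this example.

```python
import math
import random


def ucb1_select(counts, rewards, t):
    """Pick the option (arm) to try at round t, using the UCB1 rule.

    counts[i]  = how many times option i has been tried so far
    rewards[i] = total reward collected from option i so far
    """
    # Try every option once before comparing confidence bounds.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    # Score = average reward + exploration bonus that shrinks with more tries.
    scores = [
        rewards[arm] / counts[arm] + math.sqrt(2 * math.log(t) / counts[arm])
        for arm in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)


def run_bandit(click_rates, rounds, seed=0):
    """Simulate UCB1 on options with hypothetical (made-up) click rates."""
    rng = random.Random(seed)
    counts = [0] * len(click_rates)
    rewards = [0.0] * len(click_rates)
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, rewards, t)
        # Simulated outcome: reward 1 with probability click_rates[arm].
        reward = 1.0 if rng.random() < click_rates[arm] else 0.0
        counts[arm] += 1
        rewards[arm] += reward
    return counts, rewards


# Hypothetical example: three ad variants with made-up click rates.
counts, rewards = run_bandit([0.05, 0.10, 0.30], rounds=2000)
print("times each option was chosen:", counts)
```

Random Selection would pick an option uniformly at random each round; UCB instead favors options that either look good so far or have not been tried often, so over time most rounds tend to go to the better options.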

Week 11: Model Assessment, Validation, Optimization & Tuning

Introduction to Cost Function, Objective Function, Model Optimization, Model Tuning, Regularization, Gradient Boosting, Grid and Random Search. Analyze the performance of each algorithm.

- Session Assessment: CM, ROC, Rank-Ordered Approach, R2, MSE, MAE, Median Error, Median Absolute Error, Correlation
- k-fold Cross Validation & Grid Search in R
- k-fold Cross Validation & Grid Search in Python
- XGBoost, AdaBoost in R & Python
- Lasso Regression in R & Python
- Ridge Regression in R & Python
- Elastic Net Regression in R & Python
- RapidMiner: Cross-Validation
- Capstone Project: Presentation (Research and trends in data analytics)

Week 12: Predictive Analytics, Cognitive Computing & Big Data

Introduction to Predictive Analytics, Introduction to SAS Enterprise Miner, Decision Tree Classification with SAS, Decision Tree Regression with SAS, Neural Network with SAS. Learn the concepts of high-performance computing with parallel computing, cognitive computing, and Watson Analytics. Introduction to MapReduce, Hadoop, Hive, Spark, and Spark MLlib.

Week 13: Resume and Interview Preparation

Students will have access to more than 250 real interview questions, followed by a session with answers and resume preparation.

- Interview Preparation
- Resume Preparation
- Job Placement and Placement Guidance

Class Price

The class fee is $2499. You can repeat multiple times at no cost.

Class Interest Group

Registration: register-for-a-course
Contact: mo.medwani@bigbang-datascience.com

ABOUT BBDS

Big Bang Data Science Solutions

BBDS trains students (individuals and corporations) in Data Science and Data Analytics to help you uncover actionable insights, drive competitive advantage and capture business value. We train on integrating and operationalizing data analytics solutions, enabling you to gain visibility into previously opaque or hard-to-measure processes. This empowers you to make smarter business decisions. Our team of data experts, consultants and data scientists leverages proven analytics methodologies, tools and best practices to define the right analytics solutions for you: solutions that solve complex business challenges and specific use cases and drive future growth.

Mo Medwani
BBDS Founder, Data Scientist
mo.medwani@bigbang-datascience.com

- 12+ years of experience in IT (Service Delivery Management)
- 7+ years of experience in Data Analytics
- 3+ years of experience in Data Science and related technologies
- 3 Master's degrees (MBA, MS-IT, MS-Data Science)
- Ph.D. Candidate (Data Science and Data Analytics)
- Training individuals and corporations in Data Science since Dec 2016