Building Data Teams. Business Data Science Use Cases
- Edmund Clifton Sims
- 5 years ago
Transcription
1 Building Data Teams Business Data Science Use Cases
2 Agenda: OTTO (mail order company), XING (social network), Unique Digital (online marketing agency), Jimdo (web-pages to the people!) 2012-
3 OTTO - Catalogue Targeting SENN* Neural Networks *Software Environment for Neural Networks ( com/innovation/apps/pof_microsite/_pof-fall-2011/_html_en/neural-networks. html)
4 OTTO - Catalogue Targeting: Sample, Explore, Modify, Model, Assess
5 OTTO - Catalogue Targeting
6 OTTO - Lessons Learned: Standard processes, transparency, and reproducibility might matter more than top model performance
7 XING - Recsys
8 XING - Recsys Tag Graph Source: Data-Driven Ontologies for Recommender Engines in Social Networks, I. Bax, J. Moldvay, 2009
9 XING - Recsys-Tag Graph Clustering (figure; tags shown: SQL, Data Analytics, BI, Data Mining, Python, SAS, Multivariate Analysis, R, Predictive Analytics, Java, Data Science, Hive, Statistical Modelling, HBase, Machine Learning, Big Data, MapReduce, Spark)
10 XING - Recsys-Tag Graph Clustering (same figure; labelled values: R 3.3, Java 3.5)
11 XING - Recsys-Tag Graph Clustering (same figure; labelled values: R 3.3, Java 3.5, =6.3, =6.8)
12 XING - Recsys
13 XING - Lessons Learned: A/B-test performance (against random and top-10 baselines); evaluate and tune your recsys according to how it is integrated (top 3 or top 10); implementing the first productive recsys was guerrilla warfare
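The evaluation advice above can be sketched offline before any live A/B test. The following is a minimal illustration, not XING's actual setup: it measures hit rate at k for a popularity baseline and a random baseline, where k mirrors how many recommendations the product shows (top 3 or top 10). All data, names, and the `hit_rate_at_k` helper are hypothetical.

```python
import random
from collections import Counter

def hit_rate_at_k(recommend, held_out, k):
    """Share of users whose held-out item appears in their top-k list."""
    hits = sum(1 for user, item in held_out.items() if item in recommend(user)[:k])
    return hits / len(held_out)

# Hypothetical toy data: tags each user already has, plus one held-out tag per user.
history = {
    "u1": ["sql", "python"],
    "u2": ["sql", "r"],
    "u3": ["python", "spark"],
    "u4": ["sql", "python"],
}
held_out = {"u1": "spark", "u2": "python", "u3": "sql", "u4": "r"}
catalogue = ["sql", "python", "r", "spark", "hive", "sas"]

popularity = Counter(tag for tags in history.values() for tag in tags)
_rng = random.Random(0)  # fixed seed so the random baseline is repeatable

def most_popular(user):
    """Popularity baseline: globally most frequent tags the user does not have yet."""
    ranked = [tag for tag, _ in popularity.most_common()]
    return [tag for tag in ranked if tag not in history[user]]

def random_baseline(user):
    """Random baseline: shuffled tags the user does not have yet."""
    candidates = [tag for tag in catalogue if tag not in history[user]]
    _rng.shuffle(candidates)
    return candidates

for k in (1, 3):
    print(k, hit_rate_at_k(most_popular, held_out, k), hit_rate_at_k(random_baseline, held_out, k))
```

A real recommender would be passed to `hit_rate_at_k` in the same way as the two baselines, so all three are scored at the k that matches the product integration.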
14 Unique Digital (Online Marketing)
15 Unique Digital (Online Marketing): Independent Variables: Display Ad View, Google Ad Click, Facebook Ad Click, Affiliate Click; Model: logreg; Dependent Variable: P(Sale=1)
16 Unique Digital (Online Marketing) P(Sale=1)= Display += Display Model P(Sale=1)= Affiliate += Display, Affiliate Model P(Sale=1)= Search += Display, Affiliate, Search Model P(Sale=1)= 0.02
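As a rough illustration of the modelling idea on these slides, the sketch below fits a logistic regression that maps channel touchpoints to P(Sale=1). The slides state that the original modelling was done in R; this is a plain-Python stand-in with hand-written gradient descent, and the toy data, the column order, and the `fit_logreg` and `predict` helpers are assumptions made for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, row):
    """P(Sale=1) for one user; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], row)))

def fit_logreg(X, y, lr=0.5, epochs=2000):
    """Fit weights by batch gradient descent on the log loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for row, target in zip(X, y):
            error = predict(w, row) - target
            grad[0] += error
            for j, value in enumerate(row):
                grad[j + 1] += error * value
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

# Hypothetical touchpoint data, one row per user:
# [display ad view, Google ad click, Facebook ad click, affiliate click]
X = [
    [1, 0, 0, 0], [1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0],
    [0, 1, 0, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 1, 0, 1],
    [0, 0, 0, 1], [1, 0, 0, 1], [0, 0, 0, 0], [0, 0, 1, 0],
]
y = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0]  # 1 = sale

weights = fit_logreg(X, y)
p_none = predict(weights, [0, 0, 0, 0])
p_display = predict(weights, [1, 0, 0, 0])
p_display_affiliate = predict(weights, [1, 0, 0, 1])
print(p_none, p_display, p_display_affiliate)
```

Comparing the predicted probability as touchpoints are added one at a time gives a simple view of each channel's incremental contribution, which seems to be the point of the step-by-step slide.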
17 Unique Digital - Lessons Learned: Went from local MySQL, to AWS RDS, to AWS S3 and EMR. All modelling done in R.
18 Jimdo - Design Recommender
19 Jimdo - Design Recommender dists = cosine_similarity(df_train) adapted from:
20 Jimdo - Design Recommender dists = cosine_similarity(df_train) adapted from:
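21 Jimdo - Design Recommender
import numpy as np
# Assumes dists is a pandas DataFrame of pairwise similarities, labelled by product on both axes
def get_top_2(products):
    # Sum each candidate's similarity to all given products
    p = dists[products].apply(lambda row: np.sum(row), axis=1)
    p = p.sort_values(ascending=False)
    # Exclude the input products themselves and keep the two best matches
    return p.index[~p.index.isin(products)][:2]
print(get_top_2(["334", "de_de"]))
Output shown on the slide: Index([u'283', u'278'], dtype='object')
adapted from: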
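For readers without the original data, here is a self-contained toy version of the same idea: build a table of pairwise cosine similarities, sum each candidate's similarity to the given products, and return the best candidates that are not already in the input. It uses plain Python instead of the pandas and `cosine_similarity` calls on the slides, and the feature vectors and the `get_top_n` name are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical feature vectors per design; not the Jimdo data.
designs = {
    "334": [5, 1, 0],
    "283": [4, 2, 0],
    "278": [3, 1, 1],
    "101": [0, 1, 5],
}

# Pairwise similarity table, playing the role of `dists` on the slides.
dists = {a: {b: cosine(va, vb) for b, vb in designs.items()} for a, va in designs.items()}

def get_top_n(products, n=2):
    """Rank candidates by summed similarity to the given products."""
    scores = {
        candidate: sum(dists[candidate][p] for p in products)
        for candidate in dists
        if candidate not in products
    }
    return sorted(scores, key=scores.get, reverse=True)[:n]

print(get_top_n(["334"]))  # ['283', '278']
```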
22 Jimdo - No Data Princelings Data
23 Jimdo - Self Service DWH Data Self Service DWH
24 Jimdo - Lessons Learned: Data Driven through: AB-Testing & Self Service SQL. Serve the business (Recsys might come later).
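One common way to read an A/B test of the kind mentioned above is a two-proportion z-test on conversion counts. The sketch below is a generic illustration, not Jimdo's tooling; the function name and the example counts are made up.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates B - A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF via the error function
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return z, 2 * (1 - cdf)

# Hypothetical counts: variant A converts 200 of 5000 users, variant B 260 of 5000.
z, p_value = two_proportion_z_test(200, 5000, 260, 5000)
print(round(z, 2), round(p_value, 4))
```

The same counts can usually be pulled with a single SQL query against a self-service warehouse, which is one reason the two practices pair well.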
25 Building Data Teams Finding Data Scientist Unicorns Source: &
26 Building Data Teams Source:
27 Building Data Teams: DATA TEAM = Hacking & Engineering; Statistics & Analytics; Business Analysis & Communication
28 Building Data Teams: Recruiting fairs: OTTO; Create positions & projects for talent: XING & Jimdo; Internships: Unique Digital & Jimdo; PhDs & academics!: Jimdo
29 What's next? - Street Fighting DS* *Source:
30 What's next? Soft Skills matter! Source:
31 What's next?
32 Contact blog: blabladata.com