Building Data Teams. Business Data Science Use Cases

1 Building Data Teams Business Data Science Use Cases

2 Agenda: OTTO (mail order company), XING (social network), Unique Digital (online marketing agency), Jimdo (web-pages to the people!) 2012-

3 OTTO - Catalogue Targeting: SENN* neural networks (*Software Environment for Neural Networks; com/innovation/apps/pof_microsite/_pof-fall-2011/_html_en/neural-networks.html)

4 OTTO - Catalogue Targeting: Sample, Explore, Modify, Model, Assess (SEMMA)
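The five SEMMA steps (the data-mining process popularised by SAS) can be sketched as one small pipeline for catalogue targeting. This is a minimal illustration under stated assumptions: the customer fields (`days_since_order`, `orders_last_year`, `responded`) and the synthetic data are hypothetical, and the "Model" step uses a simple per-segment response rate as a stand-in for the SENN neural networks named on the previous slide.

```python
import random
from collections import defaultdict


def sample(customers, holdout_share=0.3, seed=42):
    """Sample: split customers into a training set and a holdout set."""
    rng = random.Random(seed)
    train, holdout = [], []
    for c in customers:
        (holdout if rng.random() < holdout_share else train).append(c)
    return train, holdout


def explore(rows):
    """Explore: baseline response rate of the previous catalogue mailing."""
    return sum(r["responded"] for r in rows) / len(rows)


def modify(row):
    """Modify: derive a coarse recency x frequency segment (hypothetical features)."""
    recency = "recent" if row["days_since_order"] <= 90 else "lapsed"
    frequency = "frequent" if row["orders_last_year"] >= 3 else "occasional"
    return recency, frequency


def model(rows):
    """Model: estimate the response rate of each segment on the training data."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in rows:
        segment = modify(r)
        totals[segment] += 1
        hits[segment] += r["responded"]
    return {segment: hits[segment] / totals[segment] for segment in totals}


def assess(rows, rates, mail_share=0.05):
    """Assess: lift of mailing the top-scored share versus mailing at random."""
    ranked = sorted(rows, key=lambda r: rates.get(modify(r), 0.0), reverse=True)
    k = max(1, int(len(ranked) * mail_share))
    top_rate = sum(r["responded"] for r in ranked[:k]) / k
    return top_rate / explore(rows)


# Hypothetical synthetic customers: recent, frequent buyers respond far more often
customers = [
    {
        "days_since_order": (i * 37) % 365,
        "orders_last_year": i % 6,
        "responded": int(((i * 37) % 365 <= 90 and i % 6 >= 3) or i % 20 == 0),
    }
    for i in range(1000)
]
train, holdout = sample(customers)
rates = model(train)
print(round(assess(holdout, rates), 2))
```

Assessing on the holdout rather than the training data is what keeps the lift figure honest; a lift of 1.0 would mean the model does no better than mailing at random.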

5 OTTO - Catalogue Targeting

6 OTTO - Lessons Learned: Standard processes, transparency, and reproducibility can matter more than top model performance

7 XING - Recsys

8 XING - Recsys: Tag Graph (Source: Data-Driven Ontologies for Recommender Engines in Social Networks, I. Bax, J. Moldvay, 2009)

9 XING - Recsys: Tag Graph Clustering [figure: graph of skill tags SQL, Data Analytics, BI, Data Mining, Python, SAS, Multivariate Analysis, R, Predictive Analytics, Java, Data Science, Hive, Statistical Modelling, HBase, Machine Learning, Big Data, MapReduce, Spark]

10 XING - Recsys: Tag Graph Clustering [figure: the same tag graph, with the values 3.3 annotated at R and 3.5 annotated at Java]

11 XING - Recsys: Tag Graph Clustering [figure: the same tag graph, annotated with 3.3 (R), 3.5 (Java), =6.3 and =6.8]
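The annotated values on the tag-graph slides look like distances summed along paths between tags, but the exact weighting is not recoverable from the transcript and is not necessarily the method of the cited Bax & Moldvay paper. As a minimal sketch under that assumption, tags can be linked whenever they co-occur on a profile, with distance `1 / co-occurrence count` (a hypothetical choice), and related tags found with Dijkstra's shortest-path algorithm. The profiles below are made up.

```python
import heapq
from collections import defaultdict
from itertools import combinations


def build_tag_graph(profiles):
    """Link two tags whenever they co-occur on a profile."""
    counts = defaultdict(int)
    for tags in profiles:
        for a, b in combinations(sorted(tags), 2):
            counts[(a, b)] += 1
    graph = defaultdict(dict)
    for (a, b), n in counts.items():
        # Assumption: frequent co-occurrence means a short distance
        graph[a][b] = graph[b][a] = 1.0 / n
    return graph


def tag_distances(graph, source):
    """Dijkstra: summed edge distances from `source` to every reachable tag."""
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, tag = heapq.heappop(queue)
        if d > dist[tag]:
            continue  # stale queue entry, a shorter path was already found
        for neighbour, w in graph[tag].items():
            nd = d + w
            if nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                heapq.heappush(queue, (nd, neighbour))
    return dist


def nearest_tags(graph, source, k=3):
    """Recommend the k closest tags other than the source itself."""
    dist = tag_distances(graph, source)
    ranked = sorted((d, t) for t, d in dist.items() if t != source)
    return [t for _, t in ranked[:k]]


# Hypothetical member profiles (sets of skill tags)
profiles = [
    {"R", "Statistical Modelling", "Data Mining"},
    {"Python", "Data Mining", "Machine Learning"},
    {"Python", "Machine Learning"},
    {"Java", "Hive", "MapReduce"},
    {"Hive", "HBase", "Big Data"},
]
graph = build_tag_graph(profiles)
print(nearest_tags(graph, "Python", k=2))
```

Tags in a different connected component (here the Hadoop-style tags) never appear in the distances at all, which is one crude way such a graph falls apart into clusters.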

12 XING - Recsys

13 XING - Lessons Learned: A/B-test performance (against random and top-10 baselines); evaluate and tune your recsys according to how it is integrated (top 3 or top 10); implementing the first production recsys was guerrilla warfare
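Both lessons can be made concrete with two small helpers: a two-proportion z-test for the A/B comparison against a random-recommendation control, and precision@k evaluated at the number of slots the page actually shows. This is a sketch; the click and view counts and the `job_*` ids are hypothetical, and the z-test is one standard choice, not necessarily what XING used.

```python
import math


def two_proportion_ztest(clicks_a, views_a, clicks_b, views_b):
    """Two-sided z-test for a difference in click-through rate between A and B."""
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


def precision_at_k(recommended, relevant, k):
    """Share of the first k recommendations the user actually engaged with."""
    top = recommended[:k]
    return sum(item in relevant for item in top) / k


# Hypothetical counts: recsys variant (A) vs. random recommendations (B)
z, p = two_proportion_ztest(120, 1000, 80, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")

# Evaluate at the k the UI shows (e.g. 3 slots) rather than a generic top 10
recs = ["job_17", "job_4", "job_99", "job_3", "job_42"]
clicked = {"job_4", "job_42"}
print(precision_at_k(recs, clicked, 3), precision_at_k(recs, clicked, 5))
```

The same ranking can look better or worse depending on k, which is why tuning should target the integration (top 3 vs. top 10) rather than an abstract offline metric.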

14 Unique Digital (Online Marketing)

15 Unique Digital (Online Marketing): logistic regression (logreg) model. Independent variables: Display Ad View, Google Ad Click, Facebook Ad Click, Affiliate Click. Dependent variable: P(Sale=1)
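A minimal sketch of the logreg setup: each customer journey becomes a vector of 0/1 channel flags, and the model estimates P(Sale=1). The original modelling was done in R; this Python version uses plain batch gradient descent so it needs no libraries, and the toy journeys are hypothetical.

```python
import math

CHANNELS = ["display_view", "google_click", "facebook_click", "affiliate_click"]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """P(Sale=1) for one journey encoded as 0/1 channel flags."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def fit_logreg(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the logistic (cross-entropy) loss."""
    weights, bias = [0.0] * len(X[0]), 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, target in zip(X, y):
            err = predict(weights, bias, x) - target
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias


# Hypothetical toy journeys: [display, google, facebook, affiliate] -> sale?
X = ([[1, 0, 0, 0]] * 5 + [[0, 1, 0, 0]] * 4 + [[1, 0, 1, 0]] * 3
     + [[0, 0, 0, 1]] * 3 + [[0, 0, 0, 0]] * 10)
y = ([1, 0, 0, 0, 0] + [1, 1, 1, 0] + [0, 1, 0]
     + [1, 0, 0] + [1] + [0] * 9)
weights, bias = fit_logreg(X, y)
for name, w in zip(CHANNELS, weights):
    print(f"{name:16s} {w:+.2f}")
```

A positive coefficient means that touchpoint raises the log-odds of a sale, holding the other channels fixed.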

16 Unique Digital (Online Marketing): channels are added to the model one at a time (Display model; Display, Affiliate model; Display, Affiliate, Search model), and each added channel (Display +=, Affiliate +=, Search +=) is credited with the resulting change in P(Sale=1). P(Sale=1) = 0.02
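The incremental idea can be sketched as follows, assuming P(Sale=1) is already available for each nested channel set (for example from a fitted logreg). Each channel is credited with the rise in probability when it joins the model. The probabilities below are made up for illustration and are not the slide's figures.

```python
def incremental_credit(prob, channel_order):
    """Credit each channel with the rise in P(Sale=1) when it joins the model.

    `prob` maps a frozenset of channels to P(Sale=1), e.g. from a fitted model.
    """
    credits = {}
    seen = frozenset()
    for channel in channel_order:
        with_channel = seen | {channel}  # frozenset | set -> frozenset
        credits[channel] = prob[with_channel] - prob[seen]
        seen = with_channel
    return credits


# Made-up probabilities for illustration only
prob = {
    frozenset(): 0.005,
    frozenset({"Display"}): 0.008,
    frozenset({"Display", "Affiliate"}): 0.012,
    frozenset({"Display", "Affiliate", "Search"}): 0.02,
}
credits = incremental_credit(prob, ["Display", "Affiliate", "Search"])
print(credits)
```

The credits always sum to P(all channels) minus P(no channels), but they depend on the order in which channels are added; averaging over all orders (Shapley values) is one common way to remove that dependence.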

17 Unique Digital - Lessons Learned: Went from a local MySQL to AWS RDS, then to AWS S3 and EMR; all modelling done in R

18 Jimdo - Design Recommender

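19 Jimdo - Design Recommender
    import pandas as pd
    from sklearn.metrics.pairwise import cosine_similarity
    dists = pd.DataFrame(cosine_similarity(df_train), index=df_train.index, columns=df_train.index)
adapted from: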

20 Jimdo - Design Recommender
    dists = cosine_similarity(df_train)
adapted from:

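21 Jimdo - Design Recommender
    # Assumes dists is a DataFrame of pairwise similarities labelled by product id
    def get_top_2(products):
        # Score every product by its summed similarity to the given products
        p = dists[products].sum(axis=1)
        p = p.sort_values(ascending=False)
        # Drop the query products themselves and keep the two best matches
        return p.index[~p.index.isin(products)][:2]

    print(get_top_2(["334", "de_de"]))
    # Index(['283', '278'], dtype='object')
adapted from: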
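The slides build the recommender on scikit-learn's `cosine_similarity`. The same idea can be sketched without dependencies for binary data, assuming (hypothetically) that each website is described by the set of designs and locales it uses. Cosine similarity of two binary vectors reduces to the overlap size divided by the geometric mean of the set sizes. Site names, design ids, and data are made up.

```python
import math
from collections import defaultdict


def item_users(sites):
    """Invert site -> items into item -> set of sites using that item."""
    users = defaultdict(set)
    for site, items in sites.items():
        for item in items:
            users[item].add(site)
    return dict(users)


def cosine(a, b):
    """Cosine similarity of two binary vectors given as sets of active indices."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def get_top_n(sites, query, n=2):
    """Score every other item by its summed similarity to the query items."""
    users = item_users(sites)
    scores = {
        item: sum(cosine(users[item], users.get(q, set())) for q in query)
        for item in users
        if item not in query
    }
    # Highest score first; ties broken alphabetically for a stable result
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:n]


# Hypothetical toy data: which designs and locales each website uses
sites = {
    "s1": {"d1", "de_de", "d2"},
    "s2": {"d1", "d2"},
    "s3": {"d3", "de_de"},
    "s4": {"d3", "en_us"},
    "s5": {"d4", "en_us"},
}
print(get_top_n(sites, ["d1", "de_de"]))
```

Mixing designs and locales in one item space, as the `["334", "de_de"]` query on the slide suggests, lets a locale pull in designs that are popular in that market.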

22 Jimdo - No Data Princelings Data

23 Jimdo - Self Service DWH Data Self Service DWH

24 Jimdo - Lessons Learned: Become data-driven through A/B testing and self-service SQL; serve the business first (a recsys might come later)
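As an illustration of self-service SQL for A/B testing, the query below is the kind an analyst could run directly against the warehouse. The table `ab_visits`, its columns, and the data are hypothetical, and SQLite stands in for whatever DWH is actually used.

```python
import sqlite3

# Hypothetical warehouse table: one row per visitor in an A/B test
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ab_visits (visitor_id INTEGER, variant TEXT, converted INTEGER)")
conn.executemany(
    "INSERT INTO ab_visits VALUES (?, ?, ?)",
    [(i, "A" if i % 2 else "B", int(i % 10 in (1, 2, 4))) for i in range(1, 1001)],
)

# Conversion rate per variant, without needing help from the data team
QUERY = """
SELECT variant,
       COUNT(*)        AS visitors,
       SUM(converted)  AS conversions,
       AVG(converted)  AS conversion_rate
FROM ab_visits
GROUP BY variant
ORDER BY variant
"""
for row in conn.execute(QUERY):
    print(row)
```

SQLite's `AVG` always returns a floating-point value, so the conversion rate needs no explicit cast.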

25 Building Data Teams Finding Data Scientist Unicorns Source: &

26 Building Data Teams Source:

27 Building Data Teams: a DATA TEAM combines Hacking & Engineering, Statistics & Analytics, and Business Analysis & Communication

28 Building Data Teams: recruiting fairs (OTTO); create positions & projects for talent (XING & Jimdo); internships (Unique Digital & Jimdo); PhDs & academics! (Jimdo)

29 What's next? - Street Fighting DS* *Source:

30 What's next? Soft Skills matter! Source:

31 What's next?

32 Contact blog: blabladata.com