Building Data Teams Business Data Science Use Cases
Agenda OTTO (mail order company) 2003-2007; XING (social network) 2007-2009; Unique Digital (online marketing agency) 2011-2012; Jimdo (web pages to the people!) 2012-
OTTO - Catalogue Targeting SENN* Neural Networks *Software Environment for Neural Networks (http://www.siemens.com/innovation/apps/pof_microsite/_pof-fall-2011/_html_en/neural-networks.html)
OTTO - Catalogue Targeting Process (SEMMA): Sample, Explore, Modify, Model, Assess
OTTO - Lessons Learned Standard processes, transparency and reproducibility might matter more than model performance
XING - Recsys
XING - Recsys Tag Graph Source: Data-Driven Ontologies for Recommender Engines in Social Networks, I. Bax, J. Moldvay, 2009
XING - Recsys-Tag Graph Clustering
Tags in the graph: SQL, Data Analytics, BI, Data Mining, Python, SAS, Multivariate Analysis, R, Predictive Analytics, Java, Data Science, Hive, Statistical Modelling, HBase, Machine Learning, Big Data, MapReduce, Spark
Edge weights shown in the figure: 4.1, 2.2, 3.3, 3.5
Summed edge weights: 2.2 + 4.1 = 6.3 and 3.5 + 3.3 = 6.8
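The slide does not say what the two sums stand for. One plausible reading, and it is only an assumption, is that a tag is assigned to the cluster whose members have the largest summed edge weight to it. A minimal Python sketch of that reading follows; the cluster names `cluster_a` and `cluster_b` and the function `best_cluster` are placeholders, only the four weights come from the slide.

```python
def best_cluster(edge_weights_by_cluster):
    """Return (cluster, total) for the cluster with the largest summed edge weight."""
    totals = {name: sum(weights) for name, weights in edge_weights_by_cluster.items()}
    best = max(totals, key=totals.get)
    return best, totals[best]


# Edge weights taken from the slide; the grouping into two clusters is assumed.
candidate_edges = {
    "cluster_a": [2.2, 4.1],  # 2.2 + 4.1 = 6.3
    "cluster_b": [3.5, 3.3],  # 3.5 + 3.3 = 6.8
}

cluster, total = best_cluster(candidate_edges)
print(cluster, round(total, 1))  # cluster_b 6.8
```

Under this reading the tag would join the second cluster, because 6.8 is larger than 6.3.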
XING - Lessons Learned A/B-test performance (against random and top-10 baselines). Evaluate and tune your recsys according to how it is integrated (top 3 or top 10). Implementing the first production recsys was guerrilla warfare.
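An A/B test of a recommender against a baseline can be evaluated in many ways; the talk does not name a metric or a test. As one possible sketch, assuming click-through rate as the metric and a two-proportion z-test as the comparison, the check could look like this in Python. The function name and all counts are made up for illustration.

```python
import math


def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Two-sided z-test for a difference between two click-through rates."""
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value


# Made-up counts: (clicks, views) for a baseline (e.g. random or top 10) and the recsys.
baseline = (100, 10000)
recsys = (150, 10000)

z, p = two_proportion_z_test(*baseline, *recsys)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The same comparison would be run separately for each integration (top 3 versus top 10), since the slide recommends tuning to the way the recommendations are displayed.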
Unique Digital (Online Marketing)
Unique Digital (Online Marketing) Independent variables: Display Ad View, Google Ad Click, Facebook Ad Click, Affiliate Click. Model: logreg (logistic regression). Dependent variable: P(Sale=1).
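For readers unfamiliar with logistic regression, the model on this slide has the form P(Sale=1) = 1 / (1 + exp(-(b0 + b1*x1 + ... + bk*xk))), where each x is a 0/1 touchpoint indicator. The Python sketch below shows only that scoring step. The talk did its modelling in R, and the intercept and coefficients here are hypothetical values chosen for illustration, not fitted results from the talk.

```python
import math


def sale_probability(intercept, coefficients, touchpoints):
    """P(Sale=1) under a logistic regression with 0/1 touchpoint indicators."""
    score = intercept + sum(coefficients[name] * value for name, value in touchpoints.items())
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical parameters, NOT taken from the talk.
intercept = -6.9
coefficients = {
    "display_ad_view": 0.7,
    "google_ad_click": 1.5,
    "facebook_ad_click": 0.9,
    "affiliate_click": 1.2,
}

# One user journey: saw a display ad and clicked a Google ad.
journey = {
    "display_ad_view": 1,
    "google_ad_click": 1,
    "facebook_ad_click": 0,
    "affiliate_click": 0,
}

p = sale_probability(intercept, coefficients, journey)
print(f"P(Sale=1) = {p:.4f}")
```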
Unique Digital (Online Marketing)
Model P(Sale=1) as touchpoints are added, with each increase credited to the newest channel:
Start: P(Sale=1) = 0.001
Display: Model P(Sale=1) = 0.002, Display += 0.001
Display, Affiliate: Model P(Sale=1) = 0.008, Affiliate += 0.006
Display, Affiliate, Search: Model P(Sale=1) = 0.02, Search += 0.012
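The arithmetic on this slide can be sketched in a few lines of Python: each channel is credited with the rise in the model's P(Sale=1) at the moment it joins the journey. The probabilities are the ones on the slide; reading the first value, 0.001, as the probability before any touchpoint is an assumption, and the function name `attribute_increments` is made up for illustration.

```python
def attribute_increments(baseline, journey):
    """Credit each channel with the increase in P(Sale=1) it adds.

    journey is an ordered list of (channel, probability_after_channel).
    """
    credits = {}
    previous = baseline
    for channel, probability in journey:
        credits[channel] = credits.get(channel, 0.0) + (probability - previous)
        previous = probability
    return credits


# Values from the slide; 0.001 is assumed to be the probability with no touchpoints.
baseline = 0.001
journey = [("Display", 0.002), ("Affiliate", 0.008), ("Search", 0.02)]

credits = attribute_increments(baseline, journey)
for channel, credit in credits.items():
    print(f"{channel} += {credit:.3f}")
# Display += 0.001
# Affiliate += 0.006
# Search += 0.012
```

The credits plus the starting value add up to the final probability of 0.02, so no part of the predicted conversion probability is left unassigned.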
Unique Digital - Lessons Learned Went from local MySQL to AWS RDS to AWS S3 and EMR. All modelling done in R.
Jimdo - Design Recommender
Jimdo - Design Recommender dists = cosine_similarity(df_train) adapted from: http://docs.yhathq.com/scienceops/deploying-models/examples/python/deploy-a-beer-recommender.html
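Jimdo - Design Recommender
def get_top_2(products):
    # Sum each item's similarity to the given products (dists is a DataFrame of item-item similarities)
    p = dists[products].sum(axis=1)
    # Series.order() was removed from pandas; sort_values() replaces it
    p = p.sort_values(ascending=False)
    # Drop the input products and keep the two most similar remaining items
    return p.index[~p.index.isin(products)][:2]

print(get_top_2(["334", "de_de"]))
Output shown on the slide: Index([u'283', u'278'], dtype='object')
adapted from: http://docs.yhathq.com/scienceops/deploying-models/examples/python/deploy-a-beer-recommender.html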
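To make the idea on this slide concrete without the original data, here is a dependency-free Python sketch of the same technique: item-item cosine similarity, summed over the chosen items, with the chosen items excluded from the result. The slide itself uses pandas and a `cosine_similarity` function; the design names and the user-by-design matrix below are invented for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_n(item_vectors, chosen, n=2):
    """Rank items not in `chosen` by their summed similarity to the chosen items."""
    scores = {}
    for item, vector in item_vectors.items():
        if item in chosen:
            continue
        scores[item] = sum(cosine(vector, item_vectors[c]) for c in chosen)
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Invented data: one vector per design, one entry per user (1 = user picked the design).
item_vectors = {
    "design_a": [1, 1, 0, 0],
    "design_b": [1, 1, 1, 0],
    "design_c": [1, 0, 1, 1],
    "design_d": [0, 0, 0, 1],
}

print(top_n(item_vectors, ["design_a"]))  # ['design_b', 'design_c']
```

Summing similarities over several chosen items mirrors the slide's call with two inputs, where both a design id and a locale-like key are passed together.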
Jimdo - No Data Princelings Data
Jimdo - Self Service DWH Data Self Service DWH
Jimdo - Lessons Learned Data-driven through A/B testing and self-service SQL. Serve the business (recsys might come later).
Building Data Teams Finding Data Scientist Unicorns Source: http://www.forbes.com/sites/danwoods/2012/03/08/hilary-mason-what-is-a-data-scientist/ & http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
Building Data Teams Source: http://www.oreilly.com/data/free/files/analyzing-the-analyzers.pdf
Building Data Teams DATA TEAM: Hacking & Engineering; Statistics & Analytics; Business Analysis & Communication
Building Data Teams Recruiting fairs: OTTO; Create positions & projects for talent: XING & Jimdo; Internships: Unique Digital & Jimdo; PhDs & academics!: Jimdo
What's next? - Street Fighting DS* *Source: http://de.slideshare.net/pskomoroch/street-fighting-data-science-12072010
What's next? Soft Skills matter! Source: http://data-informed.com/soft-skills-matter-data-science/
Contact twitter: @jmoldvay email: janos@jimdo.com blog: blabladata.com