Develop once, deploy everywhere
Advanced Text Analytics with KNIME Server
Stefan Weingärtner, DYMATRIX CONSULTING GROUP GmbH
KNIME User Day UK, 25th June 2013
Agenda
1 Company Introduction
2 The growing importance of text analytics
3 DYMATRIX Text Mining Process
4 Benefits
5 Application Integration I: DynaSocial
6 Application Integration II: Advanced Email Classification
7 New Developments
8 Q & A
Company Introduction
DYMATRIX: The Analytical CRM Company
» Solution provider for Customer Intelligence, Marketing Automation and Advanced Predictive Analytics
» Consulting, development and implementation know-how based on more than 900 projects with mid- and large-cap companies across industries
» Goal- and client-oriented project execution based on award-winning, established solutions
» Owner-managed and independent
Our Consulting Competence Centers
Business Intelligence
» Conception of (big) data warehouse and business intelligence architectures
» Corporate group reporting systems
» Dashboards
» Sales controlling
» Planning & forecasting
» Balanced scorecard
Advanced Analytics
» Customer segmentation
» Customer value analysis
» Propensity Modeling (Cross-sell/Upsell/Churn)
» Shopping basket analysis
» Credit rating analysis & credit scoring
» Text Mining
» Data mining automation
Campaign Management
» Design and optimization of campaign processes and workflows
» Implementation of campaign management systems
» Integration of Data Mining Models in Campaign Processes
» Campaign Optimization
» Consulting & Implementation of Next Best Activity Processes
E-commerce Insight
» Web tracking
» Web controlling
» Web mining
» Real-time recommendation
» eCRM
» Social media tracking & analysis
» Web performance measurement
» Customer Journey Analytics
» Big Data Analytics
Analysis of client-oriented processes: initial situation, analysis, conception of processes for customer retention and their optimization (customer reactivation and new customer activation), benchmarking against industry leaders
Solution Portfolio: The Customer Insight Suite
DynaCampaign
» Intelligent multi-touchpoint campaign management platform
» Planning, target group selection, execution and response measurement of campaigns
» Event-triggered real-time campaigning
DynaMine
» End-to-end automation of data mining processes
» Intelligent model management for automating preprocessing, training & scoring of models
DynaCision
» Real-time decision management platform
» Design & execution of complex embedded decision processes
DynaSocial
» Social CRM platform to listen to, track, identify and quantify customer needs and sentiments
Our KNIME Solution Nodes & KNIME Consulting Services
PMML2SQL / PMML2SAS Converter
» Convert PMML to executable SQL code for in-database scoring
» Convert PMML to executable SAS code for model scoring within SAS
Big Data Integration
» Access any Hadoop large-scale distributed batch-processing infrastructure from KNIME
» Efficiently distribute large amounts of data & preprocessing across a set of machines
Uplift Modeling
» Train a predictive model that predicts the incremental response to marketing actions
» For up-sell, cross-sell, churn and retention activities
Interactive Scorecard Builder
» Powerful interactive scorecard-building nodes for designing credit or marketing scorecards
KNIME Consulting Services: + Business Consulting + Analytical Consulting + Technical Consulting + Training
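The converter's internals are not described here. As a rough illustration of the PMML-to-SQL idea, the following is a minimal Python sketch, assuming a PMML 4.1 document containing a single linear RegressionModel with numeric predictors only. The function `regression_pmml_to_sql`, the table `customers` and the columns `age`/`tenure` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: PMML 4.1 namespace and a single linear RegressionModel.
NS = {"p": "http://www.dmg.org/PMML-4_1"}


def regression_pmml_to_sql(pmml_text: str, table: str) -> str:
    """Translate a linear PMML RegressionModel into a SQL scoring query.

    Hypothetical helper for illustration: only NumericPredictor terms are
    handled (no categorical predictors, no normalization or link functions).
    """
    root = ET.fromstring(pmml_text)
    reg_table = root.find(".//p:RegressionModel/p:RegressionTable", NS)
    if reg_table is None:
        raise ValueError("no RegressionModel/RegressionTable found")

    terms = [reg_table.get("intercept", "0")]
    for pred in reg_table.findall("p:NumericPredictor", NS):
        name = pred.get("name")
        coefficient = pred.get("coefficient")
        exponent = int(pred.get("exponent", "1"))
        # x^k is written as repeated multiplication to stay SQL-dialect neutral.
        power = " * ".join([name] * exponent)
        terms.append(f"{coefficient} * {power}")

    return f"SELECT *, {' + '.join(terms)} AS score FROM {table}"


EXAMPLE_PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_1" version="4.1">
  <RegressionModel functionName="regression">
    <RegressionTable intercept="1.5">
      <NumericPredictor name="age" coefficient="0.02"/>
      <NumericPredictor name="tenure" coefficient="-0.3"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

print(regression_pmml_to_sql(EXAMPLE_PMML, "customers"))
# SELECT *, 1.5 + 0.02 * age + -0.3 * tenure AS score FROM customers
```

Because the generated query runs inside the database, scores are computed where the data lives instead of exporting rows to the analytics tool first.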
References
Telecommunication
Travel, Transportation
Retail, Service Providers
References
Banks, Insurance
Media
Utilities, Industry, Public Sector
Schwäbisch Hall
The growing importance of Text Analytics
Big Data is not just about structured data
80% of the world's data is unstructured.
Unstructured data is growing at 15 times the rate of structured data.
Source: Google Trends, April 6, 2012
Challenge: Big Data Collection & Integration
[Figure: customer lifecycle stages: Needs, Remember, Possibilities, Service & Support, Decisions, Usage, Approach, Delivery, Purchase. Source: Phil Winters, 2011]
Imagine being able
» to classify all customer-related text messages by source/origin, sentiment, product or service, business transaction, context, etc.
» to identify unknown trends
» to identify cause-and-effect relationships
» to react to that information, e.g. technical problems, needs, usability, competition, etc.
The KNIME platform supports these efforts with comprehensive Text Analytics & Network Analytics capabilities!
DYMATRIX Text Mining Process
DYMATRIX Text Mining Process
Text Datasources → Text Enrichment → Subject Matching → Sentiment Classification → Information Delivery
Text Datasources
» Datasources: Facebook, Twitter, emails, data providers like GNIP, Datasift etc.
» For machine learning: provide training data for classification (e.g. sentiment)
Text Enrichment
» Language detection: English, German, many more
» Language-specific NLP, POS tagging: Penn Treebank tagger, STTS tagger
» Text cleansing: stop words, stemming, punctuation
» Sentiment amplifier
» Matching of sentiment and emoticon dictionaries
Subject Matching
» Text tagging with any subjects: products, brands, business transactions, service, complaints, requests, etc.
» Fuzzy matching with the Dictionary Tagger: matching of subject dictionaries
Sentiment Classification
» Text vectorization: creation of text predictors to predict sentiments
» Machine learning: classification with predictive analytics (e.g. decision tree)
» Retraining interface: adjustment of misclassified messages for continuous optimization of the classification
Information Delivery
» Text data mart: make information available in a central text data mart for visualization, alerting etc.
» Fields of application: email routing, event-triggered campaign management, etc.
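Language detection comes first because every later step (POS tagger, stop words, dictionaries) is language-specific. The method used in the workflow is not stated; the following minimal sketch swaps in a simple stop-word-overlap heuristic. The word lists and the function name `detect_language` are illustrative assumptions, not part of the DYMATRIX workflow.

```python
import re

# Illustrative stop-word lists (assumption); production detectors use much
# larger lists or character n-gram profiles.
STOP_WORDS = {
    "en": {"the", "and", "is", "of", "to", "in", "not", "but", "your", "it"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "aber", "ich", "mit", "es"},
}


def detect_language(text: str) -> str:
    """Return the language whose stop words occur most often, or 'unknown'."""
    tokens = re.findall(r"\w+", text.lower())
    scores = {
        lang: sum(token in words for token in tokens)
        for lang, words in STOP_WORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


print(detect_language("Why not sort your signal issues out instead of bringing new phones out!!!!"))
# en
```

The detected code would then select the matching tagger (e.g. Penn Treebank for English, STTS for German) and the matching dictionaries.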
DYMATRIX Text Mining Process: Datasources
Access any text datasource to start the text mining process:
» Facebook
» Twitter
» Emails
» Crawler
» Data providers like GNIP, Datasift etc.
Example: a post on the Vodafone UK Facebook fan page
DYMATRIX Text Mining Process: Text Enrichment
Original Facebook message:
Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap signal but yet paying FULL monthly contract! Vodafone sort it.
Sentiment amplifier:
Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap [----] signal but yet paying FULL monthly contract! Vodafone sort it.
Penn Treebank POS tagger (English messages):
Why[WRB] not[RB] sort[VBG] your[PRP] signal[VBP] issues[VBZ] out[IN] instead[RB] of[IN] bringing[VBG] new[JJ] phones[NNS] !!!![SYM] Wk[NNP] 3[CD] of[IN] crap[NN] but[CC] yet[RB] paying[VBG] FULL[NNP] monthly[RB] contract[NN] ![SYM] Vodafone[NNP] sort[VBG] it[PRP] .[SYM]
Removal of stop words & punctuation:
sort[VBG] signal[VBP] issues[VBZ] instead[RB] bringing[VBG] phones[NNS] Wk[NNP] 3[CD] crap[NN] paying[VBG] monthly[RB] contract[NN] Vodafone[NNP]
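The slide shows the effect of the amplifier and the cleansing step but not their rules. A minimal sketch, assuming two amplifier cues (runs of repeated exclamation marks and all-caps words) and a tiny hypothetical stop-word list; it works on (token, tag) pairs that a POS tagger has already produced, and does not reimplement the tagger itself.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical stop-word list, just large enough for the example message.
STOP_WORDS = {"why", "not", "your", "out", "of", "new", "but", "yet", "it", "the", "a"}


def amplifier_features(text: str) -> Dict[str, int]:
    """Count two assumed amplifier cues: '!!'-style runs and all-caps words."""
    return {
        "exclamation_runs": len(re.findall(r"!{2,}", text)),
        "shouted_words": len(re.findall(r"\b[A-Z]{2,}\b", text)),
    }


def remove_stop_words(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Drop stop words and punctuation-only tokens from (token, POS tag) pairs."""
    return [
        (token, tag)
        for token, tag in tagged
        if token.lower() not in STOP_WORDS and re.search(r"\w", token)
    ]


MESSAGE = ("Why not sort your signal issues out instead of bringing new phones out!!!! "
           "Wk 3 of crap signal but yet paying FULL monthly contract! Vodafone sort it.")
TAGGED = [("Why", "WRB"), ("not", "RB"), ("sort", "VBG"), ("your", "PRP"),
          ("signal", "VBP"), ("!!!!", "SYM"), ("Vodafone", "NNP")]

print(amplifier_features(MESSAGE))  # {'exclamation_runs': 1, 'shouted_words': 1}
print(remove_stop_words(TAGGED))    # [('sort', 'VBG'), ('signal', 'VBP'), ('Vodafone', 'NNP')]
```

Computing amplifier cues before cleansing matters: the punctuation and capitalization they rely on are removed by the stop-word and punctuation step.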
DYMATRIX Text Mining Process: Subject Matching
Original Facebook message:
Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap signal but yet paying FULL monthly contract! Vodafone sort it.
Subject matching (fuzzy matching):
Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap signal [NETWORK] but yet paying FULL monthly contract! Vodafone sort it [COMPLAINT].
BUSINESS TRANSACTION: Complaint
NETWORK: No Signal
PRODUCT: Nokia Lumia 925
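In the workflow, fuzzy subject matching is done with dictionaries. As a stand-in, the following minimal sketch uses Python's `difflib.SequenceMatcher` similarity ratio with an assumed threshold of 0.8, sliding a window as long as each dictionary term over the message tokens. The subject dictionaries and the function `match_subjects` are hypothetical.

```python
import re
from difflib import SequenceMatcher

# Hypothetical subject dictionaries: subject -> list of (possibly multi-word) terms.
SUBJECT_DICTIONARIES = {
    "NETWORK": ["signal", "coverage", "reception"],
    "COMPLAINT": ["sort it", "complaint", "refund"],
}


def match_subjects(text, dictionaries=SUBJECT_DICTIONARIES, threshold=0.8):
    """Return the subjects whose terms fuzzily match a window of the text."""
    tokens = re.findall(r"\w+", text.lower())
    found = set()
    for subject, terms in dictionaries.items():
        for term in terms:
            n = len(term.split())
            # Compare the term with every run of n consecutive tokens.
            for i in range(len(tokens) - n + 1):
                window = " ".join(tokens[i:i + n])
                if SequenceMatcher(None, window, term).ratio() >= threshold:
                    found.add(subject)
                    break
    return found


MESSAGE = ("Why not sort your signal issues out instead of bringing new phones out!!!! "
           "Wk 3 of crap signal but yet paying FULL monthly contract! Vodafone sort it.")
print(sorted(match_subjects(MESSAGE)))  # ['COMPLAINT', 'NETWORK']
```

The similarity threshold is what makes the match fuzzy: a misspelling such as "signall" still scores about 0.92 against "signal" and is tagged NETWORK.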
DYMATRIX Text Mining Process: Sentiment Classification
Original Facebook message:
Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap signal but yet paying FULL monthly contract! Vodafone sort it.
Output from text enrichment → text vectorization (transformation) → text classification with decision tree → resulting classification
Predictors relevant for text classification, e.g.:
- Emoticons positive/negative
- Length of message
- Fragments positive/negative
- Likes
- Words positive/negative
- Comments
- Author-related inputs
- Other linguistic inputs
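The vectorization step turns each message into numeric predictors of the kind listed above, which a decision tree can then split on. A minimal sketch, assuming tiny hypothetical lexicons and a depth-1 decision tree (a decision stump) trained by exhaustive search; a real workflow would learn a deeper tree from far more training data. The names `vectorize`, `train_stump` and the example messages are illustrative.

```python
import re

# Hypothetical mini-lexicons; the real workflow matches full sentiment
# and emoticon dictionaries during text enrichment.
POSITIVE_WORDS = {"great", "love", "thanks", "good"}
NEGATIVE_WORDS = {"crap", "bad", "worst", "issues"}
POSITIVE_EMOTICONS = {":)", ":-)", ":D"}
NEGATIVE_EMOTICONS = {":(", ":-("}


def vectorize(text, likes=0, comments=0):
    """Turn one message into a fixed set of numeric predictors."""
    words = re.findall(r"\w+", text.lower())
    tokens = text.split()
    return {
        "pos_words": sum(w in POSITIVE_WORDS for w in words),
        "neg_words": sum(w in NEGATIVE_WORDS for w in words),
        "pos_emoticons": sum(t in POSITIVE_EMOTICONS for t in tokens),
        "neg_emoticons": sum(t in NEGATIVE_EMOTICONS for t in tokens),
        "length": len(text),
        "likes": likes,
        "comments": comments,
    }


def majority(labels):
    """Most frequent label; ties broken alphabetically for determinism."""
    return max(sorted(set(labels)), key=labels.count)


def train_stump(rows, labels):
    """Fit a depth-1 decision tree (decision stump) by exhaustive search.

    Every (feature, threshold) split is tried and the one with the fewest
    training errors wins; each side predicts its majority label.
    """
    best = None
    for feature in rows[0]:
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            left_label = majority(left) if left else majority(labels)
            right_label = majority(right) if right else majority(labels)
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, left_label, right_label)
    _, feature, threshold, left_label, right_label = best
    return lambda row: left_label if row[feature] <= threshold else right_label


training = [
    ("Why not sort your signal issues out!!!! Wk 3 of crap signal", "negative"),
    ("Worst network ever :(", "negative"),
    ("Love the new phone, thanks :)", "positive"),
    ("Great service today", "positive"),
]
predict = train_stump([vectorize(t) for t, _ in training], [y for _, y in training])
print(predict(vectorize("Thanks, great network")))  # positive
```

Misclassified messages corrected through the retraining interface would simply be appended to `training` before the model is fitted again.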
DYMATRIX Text Mining Process: Information Delivery
Make the information available in a central text data mart, e.g. for visualization in DynaSocial.
Original Facebook message:
Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap signal but yet paying FULL monthly contract! Vodafone sort it.
Enriched with: + Sentiment + Business Transaction + Product + Relevance + Network
Other fields of application:
» Subject-oriented email classification & email routing
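The slide does not specify the data mart's technology or schema. A minimal sketch using SQLite from Python's standard library, with a hypothetical table `message_classification` whose columns mirror the dimensions shown for the example message; the sentiment value "negative" and the message ID are illustrative, and the query shows the kind of aggregate an alerting rule could run.

```python
import sqlite3

# Hypothetical schema: one row per message plus its classification dimensions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS message_classification (
    message_id           TEXT PRIMARY KEY,
    source               TEXT,
    message              TEXT,
    sentiment            TEXT,
    business_transaction TEXT,
    product              TEXT,
    network              TEXT
)"""


def store(conn, row):
    """Insert or update one classified message (row is a dict keyed by column)."""
    conn.execute(
        "INSERT OR REPLACE INTO message_classification VALUES "
        "(:message_id, :source, :message, :sentiment, "
        ":business_transaction, :product, :network)",
        row,
    )
    conn.commit()


def negative_complaints_by_product(conn):
    """Example alerting query: number of negative complaints per product."""
    return conn.execute(
        "SELECT product, COUNT(*) FROM message_classification "
        "WHERE sentiment = 'negative' AND business_transaction = 'Complaint' "
        "GROUP BY product ORDER BY COUNT(*) DESC"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
example_row = {
    "message_id": "fb-1",
    "source": "Facebook",
    "message": "Why not sort your signal issues out instead of bringing new phones out!!!! "
               "Wk 3 of crap signal but yet paying FULL monthly contract! Vodafone sort it.",
    "sentiment": "negative",
    "business_transaction": "Complaint",
    "product": "Nokia Lumia 925",
    "network": "No Signal",
}
store(conn, example_row)
print(negative_complaints_by_product(conn))  # [('Nokia Lumia 925', 1)]
```

Keying rows by message ID means a message that is reclassified after retraining overwrites its old row instead of being counted twice.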
DYMATRIX Text Mining Process: KNIME Workflow
Benefits
Develop once, deploy everywhere!
» The KNIME Server-based text enrichment & classification workflow can be used to classify any electronic text messages (e.g. social content, blogs, emails).
» The text enrichment & classification workflow is deployed as a web service and can easily be called from any other application.
Benefits
» Uniform sentiment and classification handling for all customer-related messages
» Batch or real-time execution from any application
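Because the workflow is exposed as a web service, a calling application only needs to send a message and read back the classification. A minimal client sketch using Python's standard library; the endpoint URL, the JSON fields `message`/`source` and the response format are hypothetical, since the actual interface depends on the KNIME Server version and on how the workflow is exposed.

```python
import json
import urllib.request

# Hypothetical endpoint; the real URL and payload format depend on the
# KNIME Server version and on how the workflow is exposed.
ENDPOINT = "https://knime-server.example.com/classify"


def build_request(message: str, source: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a JSON POST request carrying one message to classify."""
    body = json.dumps({"message": message, "source": source}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(message: str, source: str, timeout: float = 10.0) -> dict:
    """Send the request and decode the (assumed) JSON classification result."""
    with urllib.request.urlopen(build_request(message, source), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Real-time execution would call `classify` once per incoming message; batch execution would loop over a list of stored messages. The workflow behind the endpoint stays the same in both cases.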
Application Integration I: DynaSocial
Social Media Monitoring & Analytics
DynaSocial Social Media Excellence Architecture
[Architecture diagram. Components: Social Media Analytics Content Extractor (Facebook, Twitter, Social Media Data Provider); Advanced Social Media Analytics (Text Mining & Network Mining: Text Enrichment & Classification, Network Insights); Social Media Analytics Data Management; Social Media Analytics Dashboard; Social Service Platforms; Generic Big Data Model; Client-individual Sources; Emails; Social Engagement (Integrated Social Inbox including all Social Touchpoints); DynaSocial Configuration Center (Data Sources, Sentiments & Classifications, Reports & Dashboard)]
DynaSocial Management Dashboard
» Activities
» Platform distribution
» Overall sentiments
» Sentiment ratio
» Trends compared to competition (share of voice)
» Top keywords
» Key influencers
» Geographic distribution
» Flexible selection of time windows
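Two of the dashboard metrics reduce to simple ratios. Share of voice is conventionally a brand's fraction of all mentions of itself and its competitors; the slide does not define the sentiment ratio, so the sketch below assumes positive messages divided by negative messages. Brand names are illustrative.

```python
from collections import Counter


def share_of_voice(brand_mentions):
    """Each brand's fraction of all brand mentions (own brand + competitors)."""
    counts = Counter(brand_mentions)
    total = sum(counts.values())
    return {brand: n / total for brand, n in counts.items()}


def sentiment_ratio(sentiments):
    """Positive-to-negative message ratio (assumed definition); None if no negatives."""
    counts = Counter(sentiments)
    negatives = counts["negative"]
    return counts["positive"] / negatives if negatives else None


print(share_of_voice(["Vodafone", "Vodafone", "Competitor A", "Competitor B"]))
# {'Vodafone': 0.5, 'Competitor A': 0.25, 'Competitor B': 0.25}
print(sentiment_ratio(["positive", "negative", "positive", "neutral"]))  # 2.0
```

Both functions take the per-message output of the classification workflow, so the dashboard can recompute them for any selected time window by filtering messages first.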
DynaSocial Management Dashboard (Project Example)
Application Integration II: Advanced Email Classification
Multidimensional real-time email classification
Email Classification: MS Exchange Connector (.NET)
1 An incoming email arrives at Microsoft Exchange.
2 A .NET procedure (real-time or batch) is called and transfers the email content to KNIME Server via a web service call.
3 KNIME Server calls the KNIME text enrichment & classification workflows and returns the classification results.
4 The classification results are returned to the Exchange Server and saved persistently as object categories.
5 Any client with access to the Exchange Server (Microsoft Outlook, Microsoft Outlook Web Access, other email clients) sees the same classification.
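The connector itself is .NET code running against Exchange. The Python sketch below only illustrates the mapping logic around the step where classification results are saved as object categories, plus the subject-oriented routing mentioned earlier as a field of application. The "Dimension: Label" category naming, the routing table and the mailbox addresses are hypothetical.

```python
# Hypothetical routing table: the first matching (dimension, label) wins.
ROUTING_TABLE = {
    ("Business Transaction", "Complaint"): "complaints@example.com",
    ("Network", "No Signal"): "network-support@example.com",
}


def to_categories(result):
    """Turn {dimension: label} results into 'Dimension: Label' category names."""
    return sorted(f"{dim}: {label}" for dim, label in result.items() if label)


def route(result, routing_table=ROUTING_TABLE, default="service@example.com"):
    """Pick a target mailbox from the classification result."""
    for (dimension, label), mailbox in routing_table.items():
        if result.get(dimension) == label:
            return mailbox
    return default


result = {"Sentiment": "Negative", "Business Transaction": "Complaint",
          "Network": "No Signal", "Product": None}
print(to_categories(result))
# ['Business Transaction: Complaint', 'Network: No Signal', 'Sentiment: Negative']
print(route(result))  # complaints@example.com
```

Storing the results as categories on the Exchange side, rather than in a single client, is what lets Outlook, Outlook Web Access and other clients show the same classification.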
Live Demo: Real-time Email Classification
New Developments
New Developments
Multilingual NLP
» In addition to German, English, French and Italian, Spanish and Chinese are now also supported.
Influencer Detection
» Influencer identification (DYMATRIX algorithm)
Social Earthquake Warning System / Social Hype Detection
» Early warning of hypes by monitoring how news spreads
» Comparison with the spreading patterns of past hypes/hot topics
Authenticity Score
» Authenticity score for automatic identification of fake reviews (e.g. reviews of products or hotels)
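The hype-detection method itself is not described. To illustrate the idea of comparing a topic's spread with past hypes, the following minimal sketch swaps in plain Pearson correlation between the current topic's mentions per period and the first periods of stored past hype curves, with an assumed warning threshold of 0.9. The curves and function names are hypothetical.

```python
from statistics import mean, pstdev


def pearson(a, b):
    """Pearson correlation of two equally long series; 0.0 if one is constant."""
    ma, mb = mean(a), mean(b)
    sa, sb = pstdev(a), pstdev(b)
    if sa == 0 or sb == 0:
        return 0.0
    covariance = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)
    return covariance / (sa * sb)


def hype_warning(current, past_hypes, threshold=0.9):
    """Compare the current topic's mentions per period with the first periods
    of past hypes; warn if the best correlation reaches the threshold."""
    n = len(current)
    scores = [pearson(current, past[:n]) for past in past_hypes if len(past) >= n]
    best = max(scores, default=0.0)
    return best >= threshold, best


past_hypes = [[1, 2, 4, 8, 16, 30, 50, 40, 20]]  # mentions per day of a past hype
warn, score = hype_warning([10, 21, 39, 82], past_hypes)
print(warn, round(score, 3))  # True 0.999
```

Correlation compares the shape of the growth curve rather than its volume, so a topic that is still small but doubling like a past hype is flagged early.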
Contact
DYMATRIX CONSULTING GROUP GmbH
Zeppelin Carré
Lautenschlagerstrasse 2
D-70173 Stuttgart
Your contact: Stefan Weingaertner
Phone: +49.711.22.007.88-12
Fax: +49.711.22.007.88-88
E-Mail: s.weingaertner@dymatrix.de
Web: www.dymatrix.de
Thank you for your attention. We are happy to answer any of your questions!