Transforming Social Data into Business Insights. Marie Wallace, Vincent Burckhardt

2 Notices and Disclaimers
Copyright 2015 by International Business Machines Corporation (IBM). No part of this document may be reproduced or transmitted in any form without written permission from IBM. U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM. Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for accuracy as of the date of initial publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to update this information. THIS DOCUMENT IS DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY, EITHER EXPRESS OR IMPLIED. IN NO EVENT SHALL IBM BE LIABLE FOR ANY DAMAGE ARISING FROM THE USE OF THIS INFORMATION, INCLUDING BUT NOT LIMITED TO, LOSS OF DATA, BUSINESS INTERRUPTION, LOSS OF PROFIT OR LOSS OF OPPORTUNITY. IBM products and services are warranted according to the terms and conditions of the agreements under which they are provided. Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice. Performance data contained herein was generally obtained in controlled, isolated environments. Customer examples are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating environments may vary. References in this document to IBM products, programs, or services do not imply that IBM intends to make such products, programs or services available in all countries in which IBM operates or does business. Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the views of IBM.
All materials and discussions are provided for informational purposes only, and are neither intended to, nor shall constitute legal or other guidance or advice to any individual participant or their specific situation. It is the customer's responsibility to ensure its own compliance with legal requirements and to obtain advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer's business and any actions the customer may need to take to comply with such laws. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the customer is in compliance with any law.

3 Notices and Disclaimers (cont.)
Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the ability of any such third-party products to interoperate with IBM's products. IBM EXPRESSLY DISCLAIMS ALL WARRANTIES, EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other intellectual property right. IBM, the IBM logo, ibm.com, Aspera, Bluemix, Blueworks Live, CICS, Clearcase, Cognos, DOORS, Emptoris, Enterprise Document Management System, FASP, FileNet, Global Business Services, Global Technology Services, IBM ExperienceOne, IBM SmartCloud, IBM Social Business, Information on Demand, ILOG, Maximo, MQIntegrator, MQSeries, Netcool, OMEGAMON, OpenPower, PureAnalytics, PureApplication, purecluster, PureCoverage, PureData, PureExperience, PureFlex, purequery, purescale, PureSystems, QRadar, Rational, Rhapsody, Smarter Commerce, SoDA, SPSS, Sterling Commerce, StoredIQ, Tealeaf, Tivoli, Trusteer, Unica, urban{code}, Watson, WebSphere, Worklight, X-Force and System z, z/OS are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at:

4 Please Note:
IBM's statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM's sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion. Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.

5 Are you an Analytics Rockstar?
You are the custodian of the most valuable data within the enterprise, IF you can release it for business value.

7 Organizations with a highly engaged workforce significantly outperform those without
Can we use analytics to better understand employee engagement and its impact on the business?
The shift to digital now makes analysis of engagement networks possible.

8 Capture & Understand your Enterprise Network
[Diagram: Management, Employee]

10 ODPi (Open Data Platform Initiative, odpi.org)
ODPi is an industry effort to promote and advance the state of Apache Hadoop and Big Data technology for the enterprise. It currently has 24 member companies. IBM is a founding member of ODPi and is one of 4 members to release a data platform based on the ODP core: IBM Open Platform.
Priorities:
* Certifications for ODPi-compatible distributions
* Guidelines for ODPi ISVs and consumers
* Introduce more big data projects into ODPi

11 IBM Open Platform (ibm.biz/ibmopenplatform)
[Diagram: Data Exchange; Data Scientist & Developer; Platform Services; Analytic Services; Data Processing & Management]

12 IBM Engagement Analytics (ibm.com/engage)
[Diagram: Data Exchange; Data Scientist & Developer; Platform Services; Analytic Services; Data Processing & Management]

14 The Personal Social Dashboard
* Activity: measure of your activity
* Reaction: measure of how people respond to your activity
* Eminence: measure of how people respond to you
* Network: measure of the quality of your network and your role within it
Helps each employee better understand their engagement and reputation, and activate their network more effectively for maximum value.

15 The Organizational Dashboard
* Shows connectivity within & between teams
* Identifies people who play key roles
* Highlights organizational brittleness
Helps management better understand overall engagement and organizational health, identify issues and act accordingly.

16 Organizational Health

17 Many analyses are actionable with recommendations
* Understand your engagement & reputation within the social network
* Act on your personal recommendations to drive improvement
Based on Recommendation Templates & Network Analysis:
* Employee Matching: based on a person's social activity, determine if, and to what level, they fit a specific social engagement trait
* Template Instantiation: generate recommendations that, if followed, can change and strengthen their engagement patterns

18 Innovation & Advocacy
#1 Collaboration Does Impact Business Outcome: engaged employees are 120% more likely to generate Innovation and 150% more likely to demonstrate Customer Advocacy
#2 Optimal Behavior is Different for Everyone: a variety of interactions most effectively contributes to business outcome
#3 Discovering & Disseminating Optimal Behaviors is Key to Improving Business Outcome: the Personal Social Dashboard provides such a channel

19 Employee Retention
Does engagement change prior to an attrition event?
* Analyzed organizational, social, and retention data
* Inspected 10,000 random employees as a control group and 1,188 employees who quit
Yes! And engagement analytics can help to predict attrition events:
* Social Behavior Patterns: less engaged, with differences in types of activity
* Volume of Activity: less activity several months prior to the attrition event
* Network: attrition is viral (common manager, passive and active network)

21 Transforming discrete data into insights

22 Big Data Analytics

23 Our scope: making sense of the data
[Diagram: scattered data -> Analytics -> Business Insights]

24 Single location
Each record: Date/time, Latitude, Longitude

25 Locations for one person over one year
Records: Date/time, Latitude, Longitude
What the locations can reveal:
* Where the person lives: house, apartment, ...; type of neighbourhood
* Where the person works: potential income indications; type of work
* Where the person shops: type of supermarket
* Whether the person practices sport (cycling, running, ...)

26 Locations for multiple people over one year
Records: Date/time, Latitude, Longitude, Person (Vincent, Bob, Sally, James, ...)
* Social connections: 2 or more people at the same location on a regular basis
* Build general patterns to predict preferences and behaviors: "People who live in X and shop in Y tend to like Z"
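The co-location idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' method: records are (person, datetime, latitude, longitude) tuples, coordinates are snapped to a grid cell of about 0.001 degrees and timestamps to the hour (both arbitrary choices), and two people count as connected when they share a cell and hour on at least `min_meetings` occasions. The function name and the sample records are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from itertools import combinations


def colocation_pairs(records, cell_deg=0.001, min_meetings=3):
    """Return {(person1, person2): meetings} for pairs seen together often enough.

    records: iterable of (person, datetime, latitude, longitude).
    Points just either side of a cell boundary land in different cells;
    a production version would also compare neighbouring cells.
    """
    # Group people by (grid cell, hour) slot.
    slots = defaultdict(set)
    for person, ts, lat, lon in records:
        cell = (round(lat / cell_deg), round(lon / cell_deg))
        hour = ts.replace(minute=0, second=0, microsecond=0)
        slots[(cell, hour)].add(person)

    # Count the slots each pair of people shared.
    meetings = defaultdict(int)
    for people in slots.values():
        for pair in combinations(sorted(people), 2):
            meetings[pair] += 1

    return {pair: n for pair, n in meetings.items() if n >= min_meetings}


records = []
for day in (1, 2, 3):
    records.append(("Vincent", datetime(2015, 10, day, 12, 5), 53.3498, -6.2603))
    records.append(("Bob", datetime(2015, 10, day, 12, 20), 53.3499, -6.2602))
records.append(("Sally", datetime(2015, 10, 1, 12, 30), 53.3498, -6.2603))
records.append(("James", datetime(2015, 10, 1, 12, 10), 51.5074, -0.1278))

print(colocation_pairs(records))  # {('Bob', 'Vincent'): 3}
```

Lowering `min_meetings` to 1 also reports the single lunchtime Sally shared with Bob and Vincent, which shows why a threshold on "regular basis" matters.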

27 Value of IBM Connections
A collaboration tool... that lets you look under the hood
* Connections generates discrete events about who did what in the system, at a very granular level
* Applying analytics to a large number of events allows you to define patterns, statistics... business insights
[Diagram: IBM Connections -> Social events -> Analytics -> Business Insights]

28 Extracting meaningful data from your social platform

29 IBM Connections
* Profiles: find the people you need
* Forums: exchange ideas with, and benefit from the expertise of, others
* Communities: work with people who share common roles and expertise
* Blogs: present your own ideas, and learn from others
* Files: post, share, and discover documents, presentations, images, and more
* Micro-blogging: reach out for help from your social network
* Wikis: create web content together
* Bookmarks: save, share, and discover bookmarks
* Activities: organize your work and tap your professional network
* Home page: see what's happening across your social network

30 Connections Maximizes the Value of Social Data
IBM Connections provides APIs and SPIs that allow the value of social data to be maximized by external systems:
* ALL Connections data can be accessed by external systems: open, transparent, breaking down silos
* Pull data from IBM Connections: programmatically access much of the same information that you can through the IBM Connections user interface
* Have Connections push data to you: all data-change (create, update, delete; CUD) events in all IBM Connections components can be supplied to external consumers

33 Connections Architecture
[Diagram: clients (Browser, Mashups, Feed Reader, Sametime, Lotus Notes, Microsoft Office, Portlets, Your App) use the Connections Atom / REST API (GET, POST, PUT, DELETE; Atom Feed, Atom Entry, JSON, HTML, HTML Form, JavaScript) through the HTTP Server & Proxy Cache. The IBM Connections Apps run on Common Services (Search, Administration via JMX / WSAdmin, Person Card, User Directory, Navigational Header), backed by an RDB, the file system and the directory. The Event SPI feeds Your App, an integration bus and other enterprise services.]

34 The Event SPI is the social data fire-hose
* Designed to allow 3rd parties to get notified whenever a data change happens in any IBM Connections service
* Real-time events generated by IBM Connections include all create, update, and delete (CUD) operations
* Potential to represent the complete interaction footprint of the enterprise, allowing you to capture, persist, model, analyze, visualize and monetize your enterprise network
SPI (System Programming Interface) vs API (Application Programming Interface)
* The SPI is at a lower level than the APIs: you contribute Java code at system level
* By contributing Java code written to this SPI, 3rd parties can listen to creation, deletion and update (and more!) events for content within IBM Connections

35 Event SPI programming aspects
Events: collections of data generated when activities (data-modifying actions, notifications) occur in IBM Connections
* In the SPI, an event is represented by a Java bean / object
* An event encapsulates data such as the type of action and the object (and container) involved in the action
Events are delivered to event handlers:
* An event handler is a Java class implemented by a 3rd party (you!)
* Event handlers are registered in an XML file (events-config.xml), which specifies what types of event to send to a given handler
* Connections delivers the Java bean representing the event to the registered event handler(s)
[Diagram: Events -> Event SPI (events-config.xml) -> Handler 1, Handler 2, ... Handler N]

36 Cloud considerations
The Event SPI relies on event handlers written in Java to allow vendors to listen to and process events generated by the system:
* Running external (untrusted) code on Cloud is not possible
* Running 3rd party code on the same WebSphere servers as our applications is not safe
* Multitenancy issues
Introducing Switchbox
* Our plan is to allow customers/vendors to listen to events generated for their own organization on our Cloud applications, without running code on our system
* Already leveraged by compliance solutions
* Currently being implemented for broader consumption; not available as of now

37 Cloud considerations (cont.)
* Reliable delivery mechanism: delivery at least once; supports and recovers from network failure
* Latency tolerant
* Ease of transition between on-premises and Cloud: Java event handlers implemented for the Event SPI can be run by the Switchbox client. The main difference is that the event handlers are deployed and run on customer infrastructure, outside the IBM Connections datacenter; the Switchbox client invokes the event handlers upon reception of an event
Switchbox is not currently available. The diagram shows our desire to provide such a solution, allowing customers to consume events from their own organization on Cloud.
[Diagram: IBM Connections Cloud infrastructure (Event SPI -> Switchbox handler -> Switchbox server; base for generating events from most IBM social apps (Sametime)) -> customer infrastructure (Switchbox client -> Handler 1, Handler 2)]
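"At least once" delivery means a customer-side handler may receive the same event twice, for example after a network failure. A common remedy is an idempotent handler. The sketch below assumes each delivered event carries a unique identifier (the `event_id` key is hypothetical, as is the in-memory, bounded record of recently seen ids); it is not part of Switchbox or the Event SPI.

```python
from collections import OrderedDict


class IdempotentHandler:
    """Wraps a processing function so that redelivered events are processed once.

    Assumption: every event is a dict with a unique "event_id" key (hypothetical).
    Only the most recent `max_remembered` ids are kept, in memory.
    """

    def __init__(self, process, max_remembered=10_000):
        self._process = process
        self._seen = OrderedDict()  # event ids, oldest first
        self._max = max_remembered

    def handle(self, event):
        event_id = event["event_id"]
        if event_id in self._seen:
            return False  # duplicate delivery: already processed
        # If processing raises, the id is not recorded, so a redelivery is retried.
        self._process(event)
        self._seen[event_id] = None
        if len(self._seen) > self._max:
            self._seen.popitem(last=False)  # forget the oldest id
        return True


processed = []
handler = IdempotentHandler(processed.append)
handler.handle({"event_id": "e1", "name": "blog.entry.created"})
handler.handle({"event_id": "e1", "name": "blog.entry.created"})  # redelivered
handler.handle({"event_id": "e2", "name": "blog.entry.created"})
print(len(processed))  # 2
```

Recording the id only after processing succeeds keeps the at-least-once guarantee intact; a real deployment would persist the seen ids so that a restart does not reprocess recent events.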

38 Event SPI: available data in each event
Example, blog.entry.created: Amy Jones posted a blog entry in the blog named XYZ
* Actor: the person who initiated this action. Details: external id, name and, if not disabled, email address
* Type: type of action, e.g. CREATE, UPDATE, DELETE, NOTIFY, MEMBERSHIP, ...
* Item: general concept for representing an individual entity within a container. Details: id, name, textual content, HTML and Atom paths
* Container: general concept for representing a "bucket" or "container" that contains other items. Details: id, name

39 Event SPI: available data in each event (cont.)
Many more data fields are encapsulated in events:
* Correlation item set, representing a parent-child relationship (e.g. events about a commenting action)
* Target set, allowing you to deduce interactions between content and people
* Membership delta field, indicating who has been added to or removed from a community, activity, ...
See the Event SPI documentation (Javadoc) for the full list.
Key point: the event model encapsulates all of the data needed to understand the interaction between people, content and containers in the platform.
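The fields on these two slides map naturally onto a small record type. The sketch below is not the Event SPI's Java bean: `Event`, its field names and the relation labels are hypothetical stand-ins for the actor, type, item, container, correlation, target and membership-delta data described above, and the comment event name `blog.comment.created` is made up for illustration. The function shows how one event can be turned into (subject, relation, object) interaction tuples.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Event:
    """Hypothetical stand-in for the data an Event SPI event carries (not the Java bean)."""
    name: str                                  # e.g. "blog.entry.created"
    action: str                                # the event Type: CREATE, UPDATE, DELETE, ...
    actor_id: str                              # external id of the person who acted
    item_id: Optional[str] = None              # the individual entity
    container_id: Optional[str] = None         # the "bucket" holding the item
    correlation_item_id: Optional[str] = None  # parent item, e.g. the entry a comment belongs to
    target_ids: List[str] = field(default_factory=list)       # people the action is aimed at
    members_added: List[str] = field(default_factory=list)    # membership delta
    members_removed: List[str] = field(default_factory=list)


def interactions(event):
    """Turn one event into (subject, relation, object) tuples."""
    out = []
    if event.item_id:
        out.append((event.actor_id, event.action.lower(), event.item_id))
        if event.correlation_item_id:
            out.append((event.item_id, "child_of", event.correlation_item_id))
    for person in event.target_ids:
        out.append((event.actor_id, "interacts_with", person))
    for person in event.members_added:
        out.append((person, "member_of", event.container_id))
    for person in event.members_removed:
        out.append((person, "left", event.container_id))
    return out


comment = Event(name="blog.comment.created", action="CREATE", actor_id="bob",
                item_id="comment1", container_id="blogXYZ",
                correlation_item_id="entry1", target_ids=["amy"])
print(interactions(comment))
# [('bob', 'create', 'comment1'), ('comment1', 'child_of', 'entry1'), ('bob', 'interacts_with', 'amy')]
```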

40 Event SPI in the context of an analytic solution
Challenges of analytics:
* Large incoming event stream: 100+ CUD events per second, growing over the longer term
* Scalable framework for analysis: horizontal scaling to address growth
* (Near) real-time indexing
* No data loss

41 Taming the fire-hose... (1/2)
Analysis, even basic, is time consuming, thus:
[Diagram: Event SPI -> Event Handler -> Data backbone (storage for asynchronous processing) -> Analytics Service]
* Goal: retain as many events as possible for further analysis
* Analysis should not occur in the event handler, but in an external system (the Analytics Service)
* The event handler should not wait until the Analytics Service processes the event: that would result in an accumulation of events at the Connections level, which is problematic because the Connections queue retaining events to be delivered to event handlers has a limited depth
=> Design the event handler to consume and process events as fast as possible, i.e. as the interface between IBM Connections and an external system
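The "hand off fast, analyze elsewhere" rule can be illustrated with Python's standard library. In this sketch a bounded in-memory `queue.Queue` stands in for the data backbone and a worker thread stands in for the external Analytics Service; `handle_event` only enqueues and returns, so it never waits on analysis. The function names are hypothetical and this is not the Connections EventHandler interface.

```python
import queue
import threading

# Stand-in for the data backbone (Kafka, MQ, a database, ...): a bounded in-memory queue.
backbone = queue.Queue(maxsize=100_000)
analysed = []


def handle_event(event):
    """Event-handler side: hand the event off and return immediately."""
    try:
        backbone.put_nowait(event)
        return True
    except queue.Full:
        return False  # backbone saturated: log, retry or alert; never block here


def analytics_service():
    """Stand-in for the external Analytics Service, running on its own thread."""
    while True:
        event = backbone.get()
        try:
            if event is None:  # sentinel: stop the worker
                return
            analysed.append(event["id"])  # placeholder for the slow analysis
        finally:
            backbone.task_done()


worker = threading.Thread(target=analytics_service, daemon=True)
worker.start()
for i in range(3):
    handle_event({"id": i, "name": "blog.entry.created"})
backbone.put(None)  # ask the worker to stop once the queue drains
backbone.join()     # wait until every queued item has been marked done
print(analysed)     # [0, 1, 2]
```

An in-process queue is lost when the process dies, which is why the slides call for a distributed, highly available backbone rather than memory.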

42 Taming the fire-hose... (2/2)
Characteristics of the data backbone:
* Distributed and highly available
* Horizontal scale
* High throughput
* Agnostic to consumers' state
Multiple options: message broker (MQ / MQTT / ActiveMQ / Apache Kafka), database...

43 Integration with a message broker: Apache Kafka
* Java class implementing the EventHandler interface
* Sends a JSON representation of the event
* Serialization to JSON through the open-source Gson library
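The handler on this slide serializes each event to JSON with Gson before publishing it. The Python sketch below shows the same idea with the standard `json` module under stated assumptions: the event is a plain dictionary with hypothetical keys, and the message key is the item id so that, with Kafka's default key-based partitioning, all events about one object land in the same partition and keep their relative order. The producer call itself is left out so the example runs without a broker.

```python
import json
from datetime import datetime, timezone


def to_message(event):
    """Serialize an event dict to (key, value) bytes for a Kafka producer.

    Keying by the (hypothetical) "item_id" field keeps all events about one
    object in the same partition under Kafka's default key-based partitioning.
    """
    key = (event.get("item_id") or event["event_id"]).encode("utf-8")
    # default=str turns non-JSON values such as datetimes into strings.
    value = json.dumps(event, sort_keys=True, separators=(",", ":"), default=str)
    return key, value.encode("utf-8")


event = {
    "event_id": "e42",
    "name": "blog.entry.created",
    "type": "CREATE",
    "actor": {"id": "amy", "name": "Amy Jones"},
    "item_id": "entry1",
    "container": {"id": "blogXYZ", "name": "XYZ"},
    "timestamp": datetime(2015, 10, 1, 12, 0, tzinfo=timezone.utc),
}
key, value = to_message(event)
print(key)  # b'entry1'
```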

44 Integration with a message broker: Apache Kafka
Registration through events-config.xml:
* Java class implementing the EventHandler interface
* Subscriptions define the events delivered by the SPI to the event handler, filtered by event name, source (IBM service), and/or type (CREATE, UPDATE, DELETE, ...)
* Properties: name/value pairs injected into the event handler Java class, typically used to pass configuration settings
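Subscription filtering by name, source and type can be modelled independently of the XML syntax. This sketch only illustrates the matching logic and makes two assumptions: a field left as `*` matches any value, and a handler receives an event when at least one of its subscriptions matches. The source value `BLOGS` and the event name `forum.topic.updated` are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscription:
    """One subscription; "*" means "any value" (an assumption for this sketch)."""
    source: str = "*"  # IBM service emitting the event, e.g. "BLOGS" (hypothetical value)
    type: str = "*"    # CREATE, UPDATE, DELETE, ...
    name: str = "*"    # event name, e.g. "blog.entry.created"

    def matches(self, event):
        criteria = {"source": self.source, "type": self.type, "name": self.name}
        return all(wanted == "*" or event.get(key) == wanted
                   for key, wanted in criteria.items())


def is_delivered(subscriptions, event):
    """A handler receives an event if at least one of its subscriptions matches."""
    return any(sub.matches(event) for sub in subscriptions)


subs = [Subscription(source="BLOGS", type="CREATE"), Subscription(type="DELETE")]
print(is_delivered(subs, {"source": "BLOGS", "type": "CREATE", "name": "blog.entry.created"}))  # True
print(is_delivered(subs, {"source": "FORUMS", "type": "UPDATE", "name": "forum.topic.updated"}))  # False
```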

45 Integration with a message broker: Apache Kafka
Deployment: the jar and its dependencies are made available to the SPI (running in the IBM Connections News application) through a Shared Library in WebSphere Application Server.

46 3rd party events can also participate in the social analytics solution
* IBM Connections provides OpenSocial Activity Streams APIs allowing 3rd parties to push their own events to the Activity Stream
* Since Connections 4.5, events pushed through the Activity Stream APIs are also surfaced in the Event SPI
* An option allows you NOT to surface an event in the Activity Stream APIs, i.e. to surface it only through the Event SPI
=> 3rd party applications can also participate in the social analytics graph simply by publishing to the Connections Activity Stream APIs

47 Pulling data: when is it needed?
You can pull all data from Connections... but is it really needed?
Good news:
* In most cases, events surface all the data needed for analytics purposes (including the content the event is about)
* Events about the same object repeat its data: if there are X events about the same object, the item/correlation data set will always contain the most up-to-date information about the referenced object
In a nutshell, for an analytic solution this means that the Event SPI should be sufficient in most cases.
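Because every event repeats the current item data, an analytics store can keep an up-to-date view of each object simply by upserting on the object id. A minimal sketch, assuming events carry a type, a comparable timestamp and an item dictionary with an id (all hypothetical keys), and that events may arrive out of order:

```python
def apply_event(snapshots, event):
    """Upsert the item data carried by an event, ignoring events older than the snapshot."""
    item_id = event["item"]["id"]
    current = snapshots.get(item_id)
    if current is not None and event["timestamp"] < current["timestamp"]:
        return  # late, stale event: keep the newer snapshot
    snapshots[item_id] = {
        **event["item"],
        "timestamp": event["timestamp"],
        "deleted": event["type"] == "DELETE",  # keep a tombstone rather than dropping the id
    }


snapshots = {}
events = [
    {"type": "CREATE", "timestamp": 1, "item": {"id": "entry1", "title": "Draft", "tags": []}},
    {"type": "UPDATE", "timestamp": 3, "item": {"id": "entry1", "title": "Final", "tags": ["kafka"]}},
    {"type": "UPDATE", "timestamp": 2, "item": {"id": "entry1", "title": "Second draft", "tags": []}},  # arrives late
]
for e in events:
    apply_event(snapshots, e)
print(snapshots["entry1"]["title"])  # Final
```

The tombstone keeps a late UPDATE from resurrecting an object that a newer DELETE already removed.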

48 Pulling data: when is it needed?
The push approach (Event SPI) is sufficient to build most analytic solutions:
* All necessary content (textual content, tags, ...) is surfaced in every single event
* All operations changing relationships (e.g. adding/removing a member, colleague, follower) are surfaced as events
Pull approaches (REST APIs) should stay limited to:
1. Bootstrapping the Analytics Service from a Connections system with data that existed before the event handler used in your analytic solution was introduced. This essentially means building membership/network data (as needed); seeding the content should not be needed, as it is repeated whenever an event about the content is generated
2. Fetching data not available through the Event SPI, which is relatively rare for events generated from Connections

49 Pulling data from Connections
2 main approaches for pulling data from Connections:
1. REST APIs (Atom / OpenSocial format)
* REST-style HTTP-based APIs (XML, JSON format)
* Transparency: programmatically access much of the same information that can be accessed through the IBM Connections UI
* Drink your own champagne: the public APIs are used internally by plug-ins, mobile, and even some Web UI components (Activity Stream, Activities, ...)
2. Seedlist
* Designed to allow crawling of Connections data for indexing by a search engine
* Surfaces all content in the system, so it can be of some value for an analytic solution
* HTTP-based APIs (Atom XML format)
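Both pull approaches return Atom XML. The sketch below parses an Atom feed with Python's standard `xml.etree.ElementTree`. The Atom namespace is the standard one from RFC 4287, but the sample feed is made up, and real Connections feeds carry additional service-specific elements that are not modelled here. Fetching and authenticating are omitted so the example runs offline.

```python
import xml.etree.ElementTree as ET

ATOM = {"atom": "http://www.w3.org/2005/Atom"}


def parse_entries(feed_xml):
    """Return id, title, author name and updated timestamp for each Atom entry."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM):
        entries.append({
            "id": entry.findtext("atom:id", namespaces=ATOM),
            "title": entry.findtext("atom:title", namespaces=ATOM),
            "author": entry.findtext("atom:author/atom:name", namespaces=ATOM),
            "updated": entry.findtext("atom:updated", namespaces=ATOM),
        })
    return entries


sample = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Sample feed</title>
  <entry>
    <id>urn:example:entry1</id>
    <title>Hello</title>
    <author><name>Amy Jones</name></author>
    <updated>2015-10-01T12:00:00Z</updated>
  </entry>
</feed>"""

print(parse_entries(sample))
```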

50 Seedlist
Example: /forums/seedlist/myserver returns ALL forum entries in the system, with textual content, author, number of comments, number of recommendations, parent id and ACL.

51 Authentication aspects for the REST APIs
* REST APIs support basic authentication, form-based authentication and (for most APIs) OAuth
* Private data: access is strictly enforced on API calls, which is not very convenient for access by an analytic system...
Super user: access control checks on private data are bypassed
* On-premises: the super user is a user mapped to the JEE admin role across all Connections services
* On Cloud: impersonation support can help to fetch data for a range of users (progressively being disclosed)
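With basic authentication the client sends an `Authorization` header containing the Base64-encoded `user:password` pair (RFC 7617). A minimal sketch using Python's standard library; the host name and credentials are placeholders, the path reuses the seedlist example above, and the request is built but not sent. Basic credentials should only travel over HTTPS, all the more so for a super-user account that bypasses access control.

```python
import base64
import urllib.request


def basic_auth_request(url, user, password):
    """Build (but do not send) a GET request carrying HTTP basic credentials."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


req = basic_auth_request("https://connections.example.com/forums/seedlist/myserver",
                         "analytics", "secret")
print(req.get_header("Authorization"))  # Basic YW5hbHl0aWNzOnNlY3JldA==
# urllib.request.urlopen(req) would send it.
```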

52 Pulling data from Connections: what to use, when?
REST APIs (Atom / OpenSocial APIs)
* Pros: fine granularity (access content / metadata for a specific object / container); access to relationship information (APIs are available for fetching membership lists, network information, who liked a given object, ...)
* Cons: lack of batch retrieval capabilities
Seedlist
* Pros: batch retrieval of textual content; incremental updates (but the Event SPI is much more suitable for this purpose)
* Cons: focused around content; does not expose all the data (missing tags, membership information)
In some very specific cases, data is not available in a form that is easy to consume for building an analytic solution (example: getting the list of followers for a given object in the system). Only in these specific cases, query the Connections databases directly, keeping in mind that the database schema is private and can change over time.

53 Key Points
* Leverage the Event SPI as much as possible: it provides (most of) the data needed for any elaborate analytics solution. Just let Connections push data to you! It is easier and performs well
* Fill the gaps by pulling data from the Atom/Seedlist APIs: initial loading of relationship / content data; data not available through the Event SPI
One final warning: your analytic solution has access to private data through the Event SPI and the Atom/Seedlist APIs (with admin role) => ensure your solution does not leak private data to unauthorized users.

54 Analytics and Connections data

56 Key parts of a typical analytic pipeline (credit: Paco Nathan)

59 Representing Connections data as a graph
A property graph has:
* vertices and edges that can have any number of properties
* directed relationships
Example: Person A creates a Status Update; Person B creates a Comment, which comments on the Status Update.
The graph structure is ideal to represent relationships between entities (people, objects):
* Context around the event
* Cause and effect of an event
* Artefacts related to an event
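The Person A / Person B example can be held in plain Python structures. This sketch assumes no particular graph database: vertices and edges both carry arbitrary property dictionaries and edges are directed. Vertex ids are derived deterministically from a source label and the object's external id with `uuid.uuid5`, so repeated events about the same object map to the same vertex; that id scheme, the class and the source labels are hypothetical choices for illustrating the "assign unique id to objects" cleaning step.

```python
import uuid


class PropertyGraph:
    """Directed property graph: vertices and edges carry arbitrary properties."""

    def __init__(self):
        self.vertices = {}   # vertex id -> properties
        self.out_edges = {}  # vertex id -> [(label, target id, properties)]

    @staticmethod
    def vertex_id(source, external_id):
        # Deterministic id: the same object always maps to the same vertex.
        return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{source}/{external_id}"))

    def add_vertex(self, source, external_id, **props):
        vid = self.vertex_id(source, external_id)
        self.vertices.setdefault(vid, {}).update(props)
        return vid

    def add_edge(self, src, label, dst, **props):
        self.out_edges.setdefault(src, []).append((label, dst, props))

    def neighbours(self, vid, label):
        return [dst for lbl, dst, _ in self.out_edges.get(vid, []) if lbl == label]


g = PropertyGraph()
a = g.add_vertex("profiles", "personA", kind="Person", name="Person A")
b = g.add_vertex("profiles", "personB", kind="Person", name="Person B")
status = g.add_vertex("microblog", "s1", kind="StatusUpdate", text="Shipping today!")
comment = g.add_vertex("microblog", "c1", kind="Comment", text="Congrats!")
g.add_edge(a, "creates", status)
g.add_edge(b, "creates", comment)
g.add_edge(comment, "comments_on", status)

# Who commented on the status update?
commenters = [
    g.vertices[p]["name"]
    for p in g.vertices
    if any(status in g.neighbours(c, "comments_on") for c in g.neighbours(p, "creates"))
]
print(commenters)  # ['Person B']
```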

61 Key parts of a typical analytic pipeline
Source: IBM Connections!
* Extract: consume events
* Transform: transform the format
* Load: load the transformed data to a database / disk
* Clean data: fetch specific data fields from events, assign a unique id to objects
* Represent social relationships as a graph
* Query the graph to generate insights (activity, eminence, reaction, network); store the scores per user and per organization
* API / UI to surface the scores generated in the previous step
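The "query the graph to generate insights" step can be illustrated with two of the four dimensions. The definitions below are deliberately simple assumptions, not the scoring used by IBM Engagement Analytics: activity counts every action a person performs, and reaction counts responses (from a hypothetical set of response actions) that other people make to content the person created. Input is a list of (actor, action, object) edges plus a map from each content object to its creator.

```python
from collections import Counter

RESPONSES = {"comments_on", "likes", "shares"}  # hypothetical response actions


def activity_and_reaction(edges, creator_of):
    """edges: (actor, action, object) tuples; creator_of: object -> person who created it."""
    activity = Counter()
    reaction = Counter()
    for actor, action, obj in edges:
        activity[actor] += 1
        owner = creator_of.get(obj)
        if action in RESPONSES and owner is not None and owner != actor:
            reaction[owner] += 1  # someone else responded to the owner's content
    return activity, reaction


edges = [
    ("amy", "creates", "post1"),
    ("bob", "comments_on", "post1"),
    ("carol", "likes", "post1"),
    ("amy", "comments_on", "post1"),  # replying to one's own post is not a reaction
    ("bob", "creates", "post2"),
]
creator_of = {"post1": "amy", "post2": "bob"}
activity, reaction = activity_and_reaction(edges, creator_of)
print(dict(activity))  # {'amy': 2, 'bob': 2, 'carol': 1}
print(dict(reaction))  # {'amy': 2}
```

Keeping the per-person counters separate makes it straightforward to store them per user and roll them up per organization, as the pipeline's last steps describe.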

62 4 dimensions of Big Data
* Velocity: 100s of events per second
* Volume: ~500 bytes per event plus bulk data => 180 GB per hour, 4.3 TB per day
* Veracity: not an issue with Connections; the events it generates can be trusted
* Variety: semi-structured data with time and spatial aspects; easy to represent as a graph
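The volume figures can be sanity-checked with a little arithmetic. 180 GB per hour is 50 MB per second; at ~500 bytes per event, the event stream alone would need 100,000 events per second to reach that, so at hundreds of events per second most of the 180 GB per hour must come from the bulk data. The rate of 500 events per second below is an assumed example of "100s of events per second", and decimal units (1 GB = 10^9 bytes) are assumed.

```python
GB = 10**9
TB = 10**12

event_size = 500           # bytes per event (from the slide)
events_per_second = 500    # assumed example of "100s of events per second"
total_per_hour = 180 * GB  # from the slide, including bulk data

event_stream_per_hour = event_size * events_per_second * 3600
per_day = total_per_hour * 24
events_needed = total_per_hour / 3600 / event_size  # events/s if events alone made up the total

print(event_stream_per_hour / GB)  # 0.9
print(per_day / TB)                # 4.32
print(events_needed)               # 100000.0
```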

66 Key points
* Value of collaboration data: from discrete events to generating deep insights about people, networks and the whole organization
* Key insights by leveraging Big Data Analytics on events: insights are only limited by the data and your own ability to process it
* IBM Connections has its own powerful set of APIs to access most interactions in the system: fully available on-premises, being unlocked on Cloud
* Analytic platform available (IBM Open Platform): get started with IBM Open Platform and build on top of it

67 More resources online
* IBM Open Platform: ibm.biz/ibmopenplatform
* IBM Engagement Analytics: ibm.com/engage
* Event SPI: ibm.biz/eventspi, with Javadoc: ibm.biz/eventspijavadoc
* SocialBiz User Group on LinkedIn: ibm.biz/socbizlinkedin; participate in our Social Business group; give us a Like
* Follow us: @marie_wallace
* Social Business Insights blog: ibm.com/blogs/socialbusiness; join the conversation!

68 Thank you

69 Your Feedback Is Important!
Based upon your session attendance, a customized list of surveys will be built for you. Please complete your surveys via the conference kiosks or any web-enabled device at or through IBM Event Connect.