Monetizing the Lake
Kirk Haslbeck, Hortonworks
Dan Kernaghan, Pitney Bowes
Hadoop Is Lower Cost and More Scalable
[Bar chart: Cost Per Terabyte (axis 0 to 14,000) for HDP, Oracle X, Teradata, Netezza]
Hortonworks Inc. 2011-2016. All Rights Reserved
Cost Drivers: The Big Picture
- Insights: Produce more valuable and more holistic insights
- Security: Apply security policies in one place instead of repeating them in each silo
- Collaborate: Curate feature vectors for our data scientists and promote collaboration
- Time: Get models into production faster. Human time is still the most costly
- Storage: Store data in an accessible file system at the lowest cost
[Diagram labels: Time-to-Market, Insights, Storage, Security, Collaborate]
Various Data Types
Structured (DB2, Oracle)
First_Name  SSN     Net_Worth
Joe         233-33  100,000
Mark        456-77  200,000
Time-Series (KDB)
[Line chart: values 0 to 40 plotted from 12:05 to 12:20]
Unstructured (File System)
"Best Buy released their earnings this quarter and beat analyst expectations. Earnings per share increased by 0.02"
HDP Stack: Attack the Data with the Right Tool
Limitations of Building a Model on a Traditional Platform
- If you need a lot of data to build a good model, what tools can you use?
- Data volumes can eliminate the possibility of desktop tools: R and Eclipse are all limited to 8 GB of RAM on the desktop machine.
- Sampling? We had better get an even distribution of true and false positives in each sample, but that requires data munging, which brings us back to the question of what tools we can use.
- Security concerns? Extracting data from its secure resting place and pushing it into other environments, often unsecured files or desktops where Matlab or R can be installed.
- Collaboration
Push processing to the data using modern distributed tooling.
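The sampling concern on this slide can be illustrated with a small stratified sample. This is a minimal sketch, assuming labelled rows held in memory; the function name `stratified_sample`, the `fraud` field, and the synthetic transactions are all hypothetical and are not taken from the presentation.

```python
import random
from collections import defaultdict

def stratified_sample(rows, label_of, per_class, seed=0):
    """Draw up to `per_class` rows from each label group (hypothetical helper)."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[label_of(row)].append(row)
    sample = []
    for label in sorted(groups):
        members = groups[label]
        # Never ask for more rows than the group actually holds.
        k = min(per_class, len(members))
        sample.extend(rng.sample(members, k))
    return sample

# Synthetic data (an assumption): 1,000 transactions, 2% labelled as fraud.
transactions = [{"id": i, "fraud": i % 50 == 0} for i in range(1000)]
balanced = stratified_sample(transactions, lambda r: r["fraud"], per_class=20)
print(len(balanced), sum(r["fraud"] for r in balanced))
```

Even this small step needs a grouping pass over the full dataset, which is the slide's point: the munging has to happen where the data lives.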
Apache Zeppelin: Web-based Notebook for Interactive Analytics
Features
- Ad-hoc experimentation
- Deeply integrated with Spark + Hadoop
- Supports multiple language backends
- Incubating at Apache
Use Case
- Data exploration and discovery
- Visualization
- Interactive snippet-at-a-time experience
Modern Data Science Studio
Data Science Notebooks: Collaborate
Insider Trading
Banking: Credit Card Fraud Detection
Discovery
- Gathered all credit card transactions
- Problem is they didn't make sense
- No identifiable patterns, no log-normal curves
- Gas $45, Chipotle $8.50, Steak dinner $88, Amazon shoes $55
Classification
Outlier Detection: identify abnormal patterns
Example: identify anomalies
Features:
- Time frequency
- Category
- Amount
- Distance
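One simple way to score the Amount feature is a robust z-score built on the median and the median absolute deviation (MAD). This is a sketch of a generic technique, not the method used in the presentation. The first four amounts come from the Discovery slide; the remaining three values and the 3.5 cutoff are assumptions chosen for illustration.

```python
from statistics import median

def robust_z_scores(values):
    """Score each value by its distance from the median, in scaled MAD units."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # All values (or most of them) are identical; nothing stands out.
        return [0.0 for _ in values]
    # 1.4826 makes the MAD comparable to a standard deviation for normal data.
    return [(v - med) / (1.4826 * mad) for v in values]

# 45.00, 8.50, 88.00, 55.00 are from the slide; the rest are made up.
amounts = [45.00, 8.50, 88.00, 55.00, 12.75, 40.00, 2400.00]
scores = robust_z_scores(amounts)

# 3.5 is a commonly used cutoff for this score (an assumption here).
flagged = [a for a, z in zip(amounts, scores) if abs(z) > 3.5]
print(flagged)
```

The median and MAD are used instead of the mean and standard deviation because a single large charge would otherwise inflate the scale and hide itself. The other listed features (time frequency, category, distance) would need their own scoring.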
Hortonworks Data Flow
Pitney Bowes and Hortonworks: Spatially Enabling the Data Lake
Pitney Bowes Data: Global Coverage
- Global coverage built on a legacy of accuracy and precision
- Recognized leader for LI data and capabilities
- AMER: 764 datasets
- EMEA: 3079 datasets
- APAC: 719 datasets
- Local datasets for 240 countries
Pitney Bowes Data: Unparalleled Depth
Pitney Bowes Partner Program Overview, February 14, 2017
Risk of Relying Solely on Public Data
Incorrect information for this property:
- Last sale date
- Last sale price
- # of bedrooms
- # of rooms
- Finished basement
- # of spaces (garage)
- Structure type
- Lot width
- Parcel boundary
[Screenshot residue: property listing values, including $207,000, July 1997, Unfinished]
Easy to Deploy and Use
[Architecture diagram, top to bottom]
- Client Applications: Spatial Visualization, Reporting, Big Data Ecosystem Tools, Analytics, Custom Applications
- Spectrum Data Quality for Big Data
- Spectrum Addressing for Big Data
- Spectrum Spatial for Big Data
- Spectrum Geocoding for Big Data
- Spectrum Routing for Big Data
- Pitney Bowes Data Products
- Distributed Cluster: NoSQL Database, HDFS, Reference Datasets, Hive, Spark
Pitney Bowes, April 19, 2017
Enriching Data with a Location Stack
For a given location:
- POI (carries attributes)
- Retail (business) footprint polygon
- Building footprint
- Parcel (lot)
- Isochrone (travel time)
- Demographics, lifestyle attributes, financial and consumer vitality, etc.
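Resolving which layers of the stack contain a given location comes down to a point-in-polygon test. This is a minimal sketch using the standard ray-casting technique on planar coordinates. The layer names, the square polygons, and the `location_stack` helper are hypothetical; this is not the Pitney Bowes API.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; `polygon` is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical layers in arbitrary planar units (assumption).
layers = {
    "parcel": [(0, 0), (10, 0), (10, 8), (0, 8)],
    "building_footprint": [(2, 2), (7, 2), (7, 6), (2, 6)],
}

def location_stack(x, y):
    """Return the names of all layers whose polygon contains the point."""
    return [name for name, poly in layers.items() if point_in_polygon(x, y, poly)]

print(location_stack(3, 3))
print(location_stack(9, 1))
```

A production system would use geodetic coordinates and a spatial index rather than scanning every polygon. The sketch only shows the containment logic behind the layering.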
Hydrating the Spatial Data Lake
Property Data
- 180M+ Property Addresses
- Geocode
- Property Attributes
Risk Data
- Property Boundaries
- Distance to Water
- Flood Risk
- Wild Fire Risk
Market Data
- GeoDemographics
- Neighborhood Boundaries
- Zip Code Boundaries
- Points of Interest
- Walkability Scores
Plus: Transactions, IoT Sensors, Social Media
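An attribute such as Distance to Water can be derived by measuring from a geocoded property to the nearest known water point. This is a sketch under stated assumptions: the haversine formula on a spherical Earth, water features reduced to points, and hypothetical field names (`lat`, `lon`, `distance_to_water_km`) and coordinates. It is not how the Pitney Bowes datasets are built.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a spherical Earth."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def enrich_with_distance_to_water(prop, water_points):
    """Return a copy of `prop` with the distance to the nearest water point."""
    nearest = min(
        haversine_km(prop["lat"], prop["lon"], lat, lon)
        for lat, lon in water_points
    )
    return {**prop, "distance_to_water_km": round(nearest, 2)}

# Hypothetical property and water points (assumptions).
prop = {"address": "1 Example St", "lat": 40.0, "lon": -75.0}
water_points = [(40.0, -75.1), (41.0, -75.0)]
print(enrich_with_distance_to_water(prop, water_points))
```

On a cluster the same function could be applied per record, which fits the earlier slide's point about pushing processing to the data.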
Case studies: Drive superior business outcomes and gain a deeper understanding of customers.
Online Mortgage Loan Provider
- By consolidating data and running real-time address validation, the provider gained a complete view of customers, enabling more effective marketing and accelerating mortgage origination so that loans are processed in days, not weeks.
Financial service firm gains richer profiles
- Restored missing address data through data standardization, data augmentation and geocoding.
- Enabled the firm to run targeted multichannel promotions via web and smartphone apps.
Global US-Based Wealth Management Organization
- Increase customer lifetime value and provide an ideal customer experience by optimizing every contact with its mass-affluent customers, with a 35% increase in revenues and a 55% improvement in client satisfaction.
Large US Online Loan Provider: Property Analytics Case Study
Business Challenge: Close loans more quickly and improve client experience while mitigating the lender's risk. This lender, unlike most others, relies on wholesale funding to make its loans and uses online applications rather than a system of branches.
Close Loans Faster
- The lender found that many specific requirements delayed loan funding and closure, causing clients to abandon the online process.
- Integration of Pitney Bowes data through the pb key enabled the analysis of loan requests to provide an accurate qualification of the property for a loan, reducing abandonment rates and accelerating revenue.
Mitigating Risk
- The accurate and complete attributes provided by the spatial data lake correctly assessed the risks associated with a loan, enabling more accurate pricing and profitability.
Desired Outcomes
- Improved real-time and long-term decisions
- Access to accurate data for 180M properties in the US
- Sharing information with partners (e.g. Fannie Mae)
- Complete picture of property, risk and market
Benefits
- Accurate qualification of property for a particular loan type
- Faster loan processing and closure
- Improved risk assessment of a loan to a particular property
Five reasons to modernize with Pitney Bowes Big Data SDKs
1. They're easy
- Simple and intuitive user experience
- Program in SQL to run processes in the Hortonworks Spatial Data Lake
2. They're powerful
- Take advantage of more data
- Answer questions that were too big before
3. They're incredibly fast
- Process enormous amounts of data in a fraction of the time
4. They're practical
- Avoid large capital outlays
- They'll run in the cloud
5. They're secure
- Extend and enforce your Hadoop permissions
- Easy to manage and configure