Building Enterprise OLAP on Hadoop for Financial Services Industry

1 Building Enterprise OLAP on Hadoop for Financial Services Industry
Luke, Co-founder & CEO of Kyligence; Creator & VP of Apache Kylin; Microsoft Regional Director & MVP

2 About Kyligence
o Kyligence booth: #855
o Formed by creators of Apache Kylin in 2016
o Offers Enterprise and Cloud versions of Apache Kylin
o Funding from Redpoint, Cisco, CBC and Shunwei
o Member of Microsoft Accelerator Shanghai 2017
o Dual HQ in Silicon Valley & Shanghai, China

3 Transition to Big Data
o How about your traditional data warehouse?
o How about your existing OLAP/BI application?

4 Data Warehouse/OLAP in Financial Services Industry
o The biggest industry relying on DW/OLAP applications
o Thousands of applications built on top of the EDW
o Experienced analysts with a decade of expertise in data, but not in technologies

5 Enterprise Data Warehouse Architecture
Diagram layers: Presentation, Visualization, OLAP, Data Mart, Enterprise Data Warehouse, Data Source
o Optimized for mission-critical analytics
o Well modeled
o Best practices of the industry
o Thriving ecosystem
o Trained experts everywhere

6 But you are asked to
o Migrate or rebuild existing OLAP/BI apps on Big Data
o Deliver better performance just because you have Big Data now
o Train yourself to learn MR/Spark/ML and AI

7 OLAP: The Missing Part of Big Data
Diagram layers: Presentation, Visualization, Data Lake, Data Source
Engines: Hive, Impala, Spark SQL, Drill, MapReduce, Spark
o Too many options
o Low performance
o Long learning curve
o Compatibility issue
o Technology vs Data

8 Apache Kylin: Bring OLAP back to Big Data
Diagram layers: Presentation, Visualization, OLAP, Data Mart, Data Lake, Data Source
Engines: Hive, Impala, Spark SQL, Drill, MapReduce, Spark
o MOLAP on Hadoop
o Simplified Data Modeling
o Optimized for aggregation query
o ANSI SQL
o Native on Hadoop
o On-Prem & In the Cloud
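The "MOLAP" and "optimized for aggregation query" points can be illustrated with a toy sketch. The idea is to precompute an aggregate for every combination of dimensions (each combination is a "cuboid"), so that a GROUP BY query becomes a lookup instead of a scan. The fact rows, dimension names, and function names below are hypothetical, and this in-memory version only illustrates the concept; it is not how Apache Kylin itself builds or stores cubes.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows; the dimensions and the measure are made up for illustration.
FACTS = [
    {"region": "east", "product": "loan", "year": 2016, "amount": 100},
    {"region": "east", "product": "card", "year": 2016, "amount": 40},
    {"region": "west", "product": "loan", "year": 2017, "amount": 70},
    {"region": "west", "product": "loan", "year": 2016, "amount": 30},
]
DIMENSIONS = ("region", "product", "year")


def build_cube(rows, dimensions, measure):
    """Precompute SUM(measure) for every subset of dimensions (one cuboid each)."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            cuboid = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in dims)
                cuboid[key] += row[measure]
            cube[dims] = dict(cuboid)
    return cube


def query(cube, group_by):
    """Answer a GROUP BY from the matching precomputed cuboid, without rescanning rows."""
    dims = tuple(d for d in DIMENSIONS if d in group_by)
    return cube[dims]


CUBE = build_cube(FACTS, DIMENSIONS, "amount")
```

With three dimensions the cube holds 2^3 = 8 cuboids. That count doubles with every added dimension, which is why the number of dimensions and their combinations is the main modeling decision in a MOLAP design.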

9 Kylin vs Hive: Star-Schema Benchmark
[Chart: response time in seconds by data volume (scale factor), lower is better; series: KAP, Apache Kylin, Apache Hive]
* Based on 4 nodes, 16-core CPU, 96 GB memory per node

10 Global Users
500+ use cases in production globally
Industries: Internet, FSI, Telecom, Manufacturing, Others
Users: eBay, ABC, China Mobile, SAIC, MachineZone, Yahoo! Japan, CCB, China Telecom, HUAWEI, Glispa, Baidu, CMB, China Unicom, Lenovo, Inovex, Meituan, CPIC, AT&T, OPPO, Adobe, NetEase, Citic Bank, XIAOMI, iflytec, Expedia, JD.com, VIP.com, 360, Toutiao, China Unionpay, HUATAI Securities, GUOTAI Securities, Lufax, VIVO
Data collected from public information and the Kylin community

11 Enterprise OLAP on Hadoop

12 Kyligence: Enterprise OLAP on Hadoop
Kyligence Solutions: Kyligence Analytics Platform (KAP)
o Kyligence Robot: Online Optimize & Tuning Services
o KyAnalyzer: Agile BI
o Apache Kylin: Open Source OLAP on Hadoop
o KyStudio: Model Designer
o KyManager: Administrator Tool
o KyStorage: Columnar Storage
o Security: Cell Level ACL
o On-Demand Deployment: On-Premises, Hybrid, In the Cloud

13 Kyligence: Enterprise OLAP on Hadoop
Kyligence Analytics Platform (KAP)
o Query Pushdown: minutes latency (Hive, Spark SQL, Impala), for Data Exploration/Discovery
o Cube Access: sub-second latency (Intelligent Cubing by KAP), for Mission Critical Analytics
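The two access paths on this slide can be sketched as a routing decision: if a cube already covers the dimensions and measures a query asks for, serve it from the cube; otherwise push the query down to a SQL-on-Hadoop engine. The class names, fields, and the `route` function below are hypothetical and assume the query has already been parsed; this is an illustration of the idea, not KAP's actual routing logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CubeDefinition:
    """Hypothetical description of what one cube has precomputed."""
    name: str
    dimensions: frozenset
    measures: frozenset


@dataclass(frozen=True)
class Query:
    """Hypothetical, already-parsed aggregation query."""
    group_by: frozenset
    measures: frozenset


def route(query, cubes):
    """Return ("cube", name) if some cube covers the query, else ("pushdown", None)."""
    for cube in cubes:
        covers_dims = query.group_by <= cube.dimensions
        covers_measures = query.measures <= cube.measures
        if covers_dims and covers_measures:
            return ("cube", cube.name)
    # No cube can answer it: fall back to the slower SQL-on-Hadoop path.
    return ("pushdown", None)


CUBES = [
    CubeDefinition(
        name="policy_cube",
        dimensions=frozenset({"region", "product", "year"}),
        measures=frozenset({"sum_premium"}),
    ),
]
```

A query grouped by `region` would be served from `policy_cube`, while one grouped by a column the cube does not model, such as a hypothetical `agent_id`, would fall through to pushdown.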

14 Support Data Exploration and Discovery

15 TPC-DS
o Hive: 33 queries unsupported or timed out
o KAP: all 99 queries supported
o Queries are routed between SQL on Hadoop and Apache Kylin

16 Speed Up Mission Critical Analytics

17 TPC-H Benchmark: KAP vs SparkSQL 2.1 (lower is better)
[Chart: per-query results for Q1 through Q22; series: SparkSQL 2.1, KAP 2.4]

18 Kyligence Studio: Data Modeling Designer
o Drag & Drop
o Smart Data Modeling
o Intelligent Optimization

19 Integrate with Business Intelligence tools

20 Seamless Integration with BI tools
o KyAnalyzer
o Tableau
o Power BI/Excel
o IBM Cognos
o MicroStrategy
o Superset
o Zeppelin
o Saiku

21 Enhanced Security and Management Cell Level ACL/SSO/LDAP/Kerberos

22 Use Case: CPIC

23 CPIC: China Pacific Insurance (Group) Co., LTD
o Global Fortune 500 insurance company
o Top 2 insurance company in China
o $40+ billion revenue
o 8+ million customers
o 97,000+ employees

24 Challenges
o Legacy IBM Cognos + DB2 solution can't support Big Data scenarios
o Long waiting time (minutes to hours for reporting)
o Low concurrency (100,000+ employees!)
o High cost

25 Journey of Kyligence Analytics Platform
o ~ KAP POC: Performance Testing (Query Latency, Concurrency)
o ~ Development (Fixed Reports, Flexible Reports)
o ~ KAP POC: Compatibility (Cognos Connection, Cognos Syntax)
o ~ Go live (All dataset aggregation and testing, Fixed Reports released)
Notes: No changes on Hadoop side. No additional engineers required. Most of the work done by analysts.

26 KAP + Cognos: Deployment
o Reporting & Dashboard: Dynamic Report, Fixed Report
o Connectivity: JDBC, ODBC
o OLAP & Data Mart: KAP Query Server
o Big Data Platform

27 Benefits after Adopting Kyligence
o One-stop BI platform generates complicated reports
o Over 90% of queries return within 3 seconds (including high-dimensional queries)
o Seamless integration with IBM Cognos, no change at the front end
o 2 KAP cubes replaced IBM Cognos cubes
o Cost reduced significantly by adopting open source technology

28 Customer Quote
"Kyligence enables us to find valuable insights faster from every insurance policy within seconds. Kyligence's platform allows us to achieve more with less. Our lean management system has improved significantly." -- Minchen Wu, Deputy GM of IT, CPIC

29 China Construction Bank (CCB): 2nd Largest Bank in the World
Fusion Big Data Platform
"Apache Kylin is the last piece of the puzzle for serving data asset management between the legacy DW and the new Big Data." -- Zhi Zhu, Vice Senior Manager of Tech Dept, CCB
o Open: Connect to Teradata/Greenplum and IBM Cognos/Saiku
o Flexible: Self-service for end users
o Efficiency: Speed up the PC and mobile analytics experience

30 Enterprise OLAP on Hadoop: Speed Up Mission Critical Analytics
Booth #855