Modernizing Data Integration To Accommodate New Big Data and New Business Requirements Philip Russom Research Director for Data Management, TDWI December 16, 2015
Sponsor
Speakers
  Philip Russom, TDWI Research Director, Data Management
  Ron Agresta, Principal Product Manager, Data Management, SAS
New Checklist Report from TDWI on Data Integration Modernization
  The report discusses common modernizations users are applying to data integration programs today.
  In this webinar, we'll discuss some of the report's findings.
  Stay tuned to learn how to get a free copy of the report.
Agenda
  Background
  Defining Data Integration (DI) Modernization
  Technology and Business Drivers
  High-Priority DI Modernization Tasks
    1. Multiple data ingestion techniques
    2. Agile data prep
    3. Self-service data access
    4. New data platform types
    5. Right-time data movement
    6. Non-traditional data
    7. Integrated tool platforms
  Concluding Summary
PLEASE TWEET @prussom, #TDWI, @SASDataMGMT, #BigData, #Analytics, #DataIntegration
DEFINING Data Integration Modernization
  Upgrades
    Newer versions of current data integration software and other middleware
    Bigger and faster hardware
  Additions to existing data integration solutions
    New data sources, transforms, targets, etc.
    More server instances, nodes, storage
  Use more functions in your existing DI tools
    Move from exclusively batch to more diverse interfaces and processing
    Turn on real-time functions for federation, virtualization, replication
    Turn on event processing to embrace streaming data
    Turn on text analytics to embrace unstructured data
  Acquire new specialized tools to complement the old ones
    Wide range of natural language processing (NLP) tools
    Native tools for Hadoop or other new environments
  Rip and Replace
    A few users may modernize by migrating to a different toolset
Big Data Drivers for Data Integration Modernization
  New big data sources
  New business analytics
  New data integration techniques
  Old and new are coexisting
NUMBER ONE
Complement the high latency of older DI practices with a broader range of data ingestion techniques.
  Data ingestion is how, where, and how frequently data entering an environment is landed or loaded into targets.
  Some new sources of data generate data frequently.
  Business practices requiring fresh data continue to grow.
  Data ingestion practices need many speeds and frequencies.
  Repurposing data is increasingly done on the fly, at run time, instead of prior to load time.
  Many functions support varied data ingestion:
    Event stream processing, data federation, self-service data access, data prep, micro batch, etc.
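As one illustration of the micro batch technique named above, here is a minimal sketch in Python. It is not taken from the webinar or from any vendor tool; the function names `micro_batches` and `load_in_micro_batches`, the size-based batching rule, and the in-memory `target` list are all assumptions made for illustration. The idea is simply that a stream of incoming records is landed in small groups at frequent intervals rather than in one large nightly batch.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def micro_batches(events: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of up to batch_size events from a stream."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(events)
    while True:
        # Take the next batch_size items; the last batch may be shorter.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def load_in_micro_batches(events: Iterable[T], batch_size: int, target: List[T]) -> int:
    """Hypothetical loader: append each batch to an in-memory target.

    Returns the number of batches loaded. A real target would be a
    staging table, a file system, or a message queue.
    """
    loaded = 0
    for batch in micro_batches(events, batch_size):
        target.extend(batch)
        loaded += 1
    return loaded
```

A real deployment would more likely trigger batches on a timer as well as on size; the size-only rule here keeps the sketch short.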
NUMBER TWO
Embrace the new practices and tools of data prep, for agility, speed, simplicity, and ease of use.
  Data prep (short for "data preparation") is "DI Light," a subset of DI functionality, trimmed down for usability and performance.
    Synonyms: data wrangling, munging, blending
  Data prep functions are built into many tool types:
    Data integration, quality, profiling
    Data exploration, visualization, analytics
  Data prep complements traditional data mgt:
    Data prep originated for data exploration and discovery-oriented analytics.
    Permanent designs or highly accurate reports still require in-depth traditional data preparation.
NUMBER THREE
Integrate data in ways that enable self-service access to new big data for a wide range of users.
  Self-service data access functions are important
    They give data workers spontaneity, speed, agility, autonomy
  TDWI Survey identified top self-service tasks users want
    Data discovery, viz, dashboard authoring, data prep
  Modern DI integrates data specifically for self-service access
    Data warehouses and marts are still relevant
    But new big data may require new database types: data lakes, vaults, enterprise data hubs, maybe on Hadoop
  Depend on special tool functions or characteristics for self-service data access
    Ease of use, biz-friendly data views, data prep
NUMBER FOUR
Modernize your data integration infrastructure by leveraging new data platform types like Hadoop.
  Hadoop is an effective data landing area for many feed speeds and data types.
  Hadoop is a scalable data staging area.
  Hadoop is also suited to data archiving.
  Hadoop scales with push-down processing.
  Hadoop can offload your DI platform or hub.
  Other relatively new platforms
    Those based on columns, appliances, NoSQL, open source, etc.
NUMBER FIVE
Keep adding more right-time functions as you modernize your data integration solutions.
  New practices discussed earlier demand right-time DI:
    Data ingestion assumes multiple DI speeds and frequencies
    Data prep tends toward near-time federation & micro batch
    Data exploration assumes immediate response for the user
  Many right-time DI functions are available today:
    High performance (for fast extracts & loads), micro batch (running frequently during the day), data federation (for time-sensitive metrics)
    Configurable to run at multiple right-time speeds: data replication & changed data capture
    Millisecond real time: streaming, event processing
  DI modernization often involves using more of the above.
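To make changed data capture (CDC) concrete, the sketch below compares two keyed snapshots of a table and emits only the differences. This is an assumption-laden simplification for illustration: production CDC tools commonly read database transaction logs instead of comparing snapshots, and the name `capture_changes` and the dictionary-based row format are hypothetical, not drawn from the webinar.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
# Each change is (operation, key, row), where operation is
# "insert", "update", or "delete".
Change = Tuple[str, Any, Row]


def capture_changes(previous: Dict[Any, Row], current: Dict[Any, Row]) -> List[Change]:
    """Return the inserts, updates, and deletes that turn previous into current.

    Both arguments map a primary key to a row. Snapshot comparison is a
    stand-in here for log-based capture.
    """
    changes: List[Change] = []
    for key, row in current.items():
        if key not in previous:
            changes.append(("insert", key, row))
        elif previous[key] != row:
            changes.append(("update", key, row))
    for key, row in previous.items():
        if key not in current:
            # Carry the last known row so the target can audit the delete.
            changes.append(("delete", key, row))
    return changes
```

Moving only the changes, instead of reloading the full table, is what lets replication and CDC run at the frequent intervals that right-time DI calls for.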
NUMBER SIX
Modernize your data integration functionality, for business value from non-traditional data.
  Non-traditional data is anything that's not relational or other structured data:
    Unstructured: from human language text to video
    Semi-structured: hierarchies in JSON or XML
    Multi-structured: a mix of the above
  For biz value from non-traditional data, modernize 5 layers of DI:
    Capture
    Storage
    Processing
    Structure
    Metadata
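For the "Structure" layer, one common step with semi-structured data is flattening a JSON hierarchy into column-like names that a relational target can hold. The sketch below shows one way this might look in Python; the `flatten` function and its dotted-path and bracket-index naming scheme are assumptions chosen for illustration, not a convention from the webinar or any particular tool.

```python
import json
from typing import Any, Dict


def flatten(value: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts and lists into a single-level dict.

    Nested object keys are joined with dots and list positions are
    written as [index], for example "customer.tags[0]".
    """
    flat: Dict[str, Any] = {}
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            flat.update(flatten(child, path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}[{index}]"))
    else:
        # Scalars (strings, numbers, booleans, None) become column values.
        flat[prefix] = value
    return flat


# Example document with one nested object and one nested list.
document = json.loads('{"id": 7, "customer": {"name": "Ann", "tags": ["a", "b"]}}')
columns = flatten(document)
```

Here `columns` holds the keys `id`, `customer.name`, `customer.tags[0]`, and `customer.tags[1]`. Empty objects and lists produce no columns in this sketch, which a real pipeline might want to handle explicitly.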
NUMBER SEVEN
Consider modernizing your DI tool portfolio with an integrated platform of multiple data mgt tools.
  Defining the DI integrated platform
    DI and/or DQ tool at its heart, plus tools for MDM, metadata mgt, stewardship, governance, CDC, replication, event processing, data services, data profiling, data monitoring, etc.
    Not just a suite. All tools are integrated by sharing metadata, biz rules, master data, development artifacts, collaborative functions
  Strongest trend in data integration tools, by both users & vendors
    Away from separate best-of-breed tools toward a unified toolset
  Practical reasons for using a unified DI platform
    Greater collaboration among multiple DI/DM developers and others
    Single DM solutions that combine multiple DM capabilities
    Most of the traditional and big data functionality mentioned today in one integrated platform
CONCLUDING SUMMARY
Data Integration Modernization
  Multiple data ingestion techniques
  Agile data prep
  Self-service data access
  New data platform types
  Right-time data movement
  Non-traditional data
  Integrated tool platforms
Download a free copy of the TDWI Checklist Report about Data Integration Modernization Download the report in a PDF file at: bit.ly/dataintmod
DATA INTEGRATION MODERNIZATION WITH SAS
Copyright 2013, SAS Institute Inc. All rights reserved.
PLATFORM SAS DATA MANAGEMENT
Ingestion, Data Prep, Self-Service, Hadoop, Right-Time DI, New Data, Integrated Platform
  SAS delivers a complete, integrated platform for data access, quality, integration, management, transformation, monitoring, mastering, and governance across a wide range of use cases.
IN-HADOOP SAS DATA LOADER FOR HADOOP
  Integrated environment for self-service data preparation with data profiling, data quality, data transformation, and code execution actions processed directly in Hadoop
  Business-user-oriented web application with a guided workflow experience
  Automatic optimization that uses the most appropriate run-time execution available
IN-STREAM SAS EVENT STREAM PROCESSING
  Enables processing on huge volumes of streaming data flowing at very high rates with very low latency
  Delivers in-stream advanced analytics, decisions, and data quality transformations
  Supports varied use cases such as clickstream analysis, IoT sensor analysis, decision management, fraud detection, and risk monitoring
IN-FEDERATED VIEW SAS FEDERATION SERVER
  Federated view building application that creates dynamic views of heterogeneous data and makes them available to other systems through ODBC, JDBC, or web services
  Supports data masking, caching, and in-view data quality transformations
  Offers table, row, and column level data access controls
IN-DATABASE SAS IN-DATABASE TECHNOLOGIES
  Data transformation, data quality processing, and analytics performed directly in the database or in Hadoop
  Data Quality Accelerators move the power of SAS data quality algorithms to the data, taking advantage of database parallel computing capabilities
  Embedded processing can be invoked from a number of different execution environments
SAS WANT TO LEARN MORE?
  Learn more about SAS Data Management: http://sas.com/data
  Join the SAS Data Management Community: https://communities.sas.com/
  Follow us on Twitter: @sasdatamgmt
  Like us on Facebook: SAS Software
Questions?
Contact Information
If you have further questions or comments:
  Philip Russom, TDWI: prussom@tdwi.org
  Ron Agresta, SAS: ron.agresta@sas.com