7 Steps to Virtualized Data Preparation



Overview

Today, enterprises are constantly challenged by inconsistent, incomplete and inaccurate data from myriad data sources. Data preparation is the process by which data analysts transform and organize that data into new data sets suitable for exploration and analysis. Data preparation is also a highly interactive experience, one that must enable data analysts to quickly, accurately and independently prepare data for trusted reporting and analytics. Data is exposed to analysts in a spreadsheet-like interface, where they can easily improve its quality, as well as enrich and shape it, to meet their analytical needs.

One of the key challenges with traditional data preparation is that it is a disjointed process. Typically, data preparation and analytics are conducted in separate software applications, making it slow, hard and expensive to prepare data and then integrate it into a second tool for reporting and analysis. Data analysts use those dedicated data preparation tools to work with relatively small data sets, which limits the flexibility and speed of data provisioning that organizations require to feed their Business Intelligence (BI) and analytics initiatives.

The second major challenge encountered with most data preparation processes relates to the complications that arise from an ungoverned self-service approach. This approach, while perhaps delivering short-term gratification, leads to a proliferation of ad-hoc data sets across the organization. This lack of data management increases the costs and risks of data preparation practices while decreasing their reliability, as well as their capacity for scalability, security and governance.

However, with virtualized data preparation integrated into your analytics solution, you can avoid these complications and limitations.
Unlike traditional data preparation tools (which extract and then prepare a data set in a separate environment), integrating data preparation capabilities into your BI platform means you do not have to move your data. Rather, you define the rules you wish to apply to your data, and those rules are then automatically executed when you run your query.

© 2016 Yellowfin International Pty Ltd
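The rules-at-query-time idea can be illustrated with a short sketch. This is not Yellowfin's engine or API; the source table, rule functions and `run_query` helper below are all invented for illustration. The point is that preparation rules are defined once and applied on-the-fly as each query runs, rather than materializing a prepared copy of the data in a separate environment.

```python
# Illustrative sketch only: preparation rules stored as functions and
# applied at query time, so the source data is never copied elsewhere.

# Hypothetical source table, standing in for a database your BI tool connects to.
SOURCE = [
    {"region": "north", "revenue": 1200},
    {"region": "SOUTH", "revenue": None},
    {"region": "north", "revenue": 800},
]

# Preparation rules defined once, as they would be in a metadata layer.
RULES = [
    lambda row: {**row, "region": row["region"].title()},  # standardize case
    lambda row: {**row, "revenue": row["revenue"] or 0},   # fill missing values
]

def run_query(predicate):
    """Apply every rule on-the-fly while answering the query."""
    for row in SOURCE:
        for rule in RULES:
            row = rule(row)
        if predicate(row):
            yield row

# The query sees only cleaned, standardized rows; SOURCE itself is untouched.
north = list(run_query(lambda r: r["region"] == "North"))
```

Because the rules live with the query logic rather than in a separate tool, every report built on this model sees the same prepared view of the data.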

Data preparation performed in a virtual environment utilizes the compute power of the server in which your data is stored. This approach reduces the costs and complexity of managing data migration processes. It also leverages the fact that compute speed continues to rapidly increase, whilst the cost of that compute power plummets. So why not make use of faster and more powerful computing, rather than trying to wrangle limited data in a desktop environment?

Further, moving data preparation practices from a desktop environment into the metadata layer of your analytics solution enables those processes to be uniformly reflected across your data as a single source of truth. Virtualized integrated data preparation empowers IT to maintain data governance and control, data analysts to accurately integrate more data sources into your BI environment in less time, and business users to trust the validity of the data and the accuracy of their decision-making.

"I am convinced Yellowfin and CBIG Consulting share some important thinking around virtualized data preparation and the usage of cheap compute power, rather than traditional staging of data, to deliver business users what they want."
Henry Lindsay Smith, Principal, CBIG Consulting - APAC EMEA

7 Critical Capabilities of Virtualized Data Preparation

This section highlights the seven steps to preparing data for exploration, reporting and analysis via a virtualized data preparation module integrated within your analytics platform. Undertaking data preparation in this manner avoids having to move data from one environment to another, with selected data preparation rules applied to your data on-the-fly when queries are submitted.

"Leaders in data preparation are also more likely to have a centralized system of record that connects with disparate applications and other discrete data sources. They use processes and tools to help centralize information, and feed decision-makers with cleaner, more useful data."
Judith Niederschelp, Managing Director, Aberdeen Group Europe

The seven steps: Model, Profile, Clean, Shape, Enrich, Secure, Publish.

1. Model

Directly connect to and model any of your data sources held within your analytics platform. Choose the attributes you want to make available to end-users for analysis. Apply any conditions to tables to ensure only relevant data is returned. Data modeling performed within a virtualized data preparation module integrated into your analytics platform defines the logic by which queries will be run, delivering consistency across all data sets used for enterprise reporting and analysis. Virtualized data modeling ensures your data preparation processes and reporting logic can be reused throughout your BI environment.

2. Profile

Confirm that the data being presented to users for analysis is complete, consistent and correct with data profiling and data preview capabilities. Easily profile your entire data set at once, or selected rows and columns, for the types of analysis you want to perform. See the number and distribution of records profiled, including statistics on the values within each column selected (such as minimum, maximum, median, average, empty and distinct values, or outliers).
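As a rough sketch, the column statistics just described (minimum, maximum, median, average, plus empty and distinct counts) can be computed over a plain list of rows. The data set and function name below are invented for illustration; Yellowfin computes these figures inside the platform itself.

```python
# Minimal column-profiling sketch over an in-memory list of rows.
from statistics import mean, median

def profile_column(rows, column):
    raw = [row[column] for row in rows]
    values = [v for v in raw if v is not None]   # ignore empties for the stats
    return {
        "records": len(raw),
        "empty": len(raw) - len(values),
        "distinct": len(set(values)),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
    }

# Hypothetical data set with one empty value in the profiled column.
orders = [{"amount": 10}, {"amount": 30}, {"amount": None}, {"amount": 30}]
stats = profile_column(orders, "amount")
```

A profile like this, shown per column, is what lets an analyst spot missing values and outliers before any report is built on the data.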

Effectively and efficiently prepare data for analysis. Embrace guided best practice metadata modeling that dynamically suggests ways to clean, shape and enrich your data based on the contents of each column profiled. You can even build data quality reports with trigger-based alerts that automatically notify you when data quality metrics fall outside predefined thresholds, enabling continuous data quality monitoring. Deliver more trustworthy data sources and analysis throughout the business in less time.

"Despite me knowing that it's really important for data to be prepared well in order to facilitate user acceptance and visibility, there's also that tradeoff of time. To literally be able to do my data profiling in one click with Yellowfin, that's incredibly powerful."
Steve Remington, Principal Consultant, Minerra - Melbourne Singapore

3. Clean

Apply filters, formatting and case statements to clean all your data used for enterprise analytics in one place, from data warehouses to cloud applications and spreadsheets. Ensure the correct values, formats and patterns are being displayed. Use data cleansing to identify, then append, standardize or remove, inaccurate, incomplete or irrelevant data. Improve the quality and trustworthiness of your data by ensuring consistency within, and between, similar data sets throughout your analytics environment. Safeguard the validity, accuracy, completeness, consistency and uniformity of your data to avoid costly errors and uphold the quality of your data-driven decision-making.
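A minimal cleansing pass might standardize formats first and then remove records that remain incomplete. The column names and rules in this sketch are invented for illustration; it shows the general shape of a cleansing step, not how Yellowfin implements it.

```python
# Illustrative cleansing pass: standardize, then drop incomplete records.
import re

def clean(rows):
    cleaned = []
    for row in rows:
        email = (row.get("email") or "").strip().lower()   # standardize format
        phone = re.sub(r"\D", "", row.get("phone") or "")  # keep digits only
        if email and phone:                                # remove incomplete records
            cleaned.append({"email": email, "phone": phone})
    return cleaned

customers = [
    {"email": " Ana@Example.COM ", "phone": "(03) 9090-1234"},
    {"email": "", "phone": "555-0000"},   # incomplete: dropped
]
print(clean(customers))
# [{'email': 'ana@example.com', 'phone': '0390901234'}]
```

Running the same rules over every source keeps similar data sets consistent with one another, which is the point of cleansing in one place.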

4. Shape

Shaping your data into consistent structures and formats is critical, but often involves manual processes and separate tools not designed for the influx of today's data volumes and sources. Remove the bottleneck. Shape your data quickly and spend more time analyzing and acting on it with integrated data shaping capabilities. Use calculated fields, case statements and binning to quickly transform data into your desired formats for analysis. Create aggregations, new fields, custom groupings and more to deliver clean and consistent data sets that can be analyzed and combined to quickly drive your analytics initiative forward. Easily leverage your organization's plethora of disparate data sources for competitive advantage.

5. Enrich

Add context to your data, deliver deeper insights and unearth interrelationships between your data sets with data enrichment. Use automated enrichment recommendations to easily merge prepackaged geospatial and demographic data with existing location-based data. Add value to your data by quickly detecting correlations between your business data and spatially relevant third-party data. Effortlessly geocode address data and key demographic statistics, down to zip code level on-the-fly, to quickly produce stunning maps and actionable insights.
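Two of the shaping operations described above, a calculated field and simple binning, can be sketched in a few lines. The column names and bin boundary are invented for illustration only.

```python
# Illustrative shaping step: derive a calculated field, then bin it.

def shape(rows):
    shaped = []
    for row in rows:
        total = row["price"] * row["qty"]              # calculated field
        band = "small" if total < 100 else "large"     # binning / custom grouping
        shaped.append({**row, "total": total, "band": band})
    return shaped

sales = [{"price": 20, "qty": 3}, {"price": 90, "qty": 2}]
print([r["band"] for r in shape(sales)])   # ['small', 'large']
```

Once shaped this way, the derived `band` column can be grouped and aggregated like any native column, which is what makes downstream analysis fast.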

6. Secure

Enterprise data needs to be both secure and accessible. Few people require rights to see all data, but many need access to certain subsets of data to support their specific business role. Use access filters to ensure particular user types and individual users can only access the data that they are permitted to see. The same balance between security and accessibility is required for data analysis. Ensure that only authorized users have the rights to access and edit your metadata layer in an approved workflow.
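An access filter of the kind described above can be modeled as a per-user row predicate. The user names, regions and permission table below are invented for illustration; they are not Yellowfin's security model, just the underlying idea of row-level filtering.

```python
# Illustrative row-level access filter: each user sees only permitted rows.

PERMITTED = {"alice": {"APAC"}, "bob": {"APAC", "EMEA"}}   # assumed permissions

def access_filter(rows, user):
    allowed = PERMITTED.get(user, set())   # unknown users see nothing
    return [row for row in rows if row["region"] in allowed]

data = [{"region": "APAC", "sales": 5}, {"region": "EMEA", "sales": 7}]
print(len(access_filter(data, "alice")))   # 1
print(len(access_filter(data, "bob")))     # 2
```

Because the filter is applied when rows are fetched, the same report definition safely serves users with different permissions.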

7. Publish

Provide access to your clean, prepared and consistent data for analytics with a single click. Give analysts across your organization uniform access to a single source of truth for all data and data preparation processes via a governed metadata layer. Role-based access and collaboration capabilities also facilitate the sharing and reusing of data set preparations between data analysts. Use in-platform approval workflows to ensure the accuracy of all data sets from all data sources before they become part of your live reporting environment.

7 Benefits of Virtualized Data Preparation

Yellowfin delivers an enterprise-ready integrated data preparation module that supports the needs of data analysts, IT and business users. Empower data analysts to quickly and independently prepare data for report building and analysis, enable IT to maintain data governance and consistency, and deliver business users deeper insights from more data sources in less time.

"I think Yellowfin's Data Preparation Module is very useful functionality to have integrated in a BI tool. It gives you the opportunity to learn and explore your data in the preparation phase of your BI project."
Peter Michael Sorensen, Partner & Senior Consultant, Viteco ApS - Denmark

Providing data preparation processes in this manner accelerates data provisioning. Give data analysts the freedom to work the way they want, and business users fast access to the analytical insights they need, while enabling IT to efficiently manage cost and security considerations.

The benefits of Yellowfin's virtualized approach to data preparation are significant. Yellowfin enables organizations to reduce the time needed to provision new data sets for analysis, enhances data quality and manageability, and provides increased trust through robust data lineage. Better still, all this is achieved at significantly lower cost and complexity compared to using separate data preparation and analytics tools.

The seven benefits: reduced costs; improved data quality; increased trust through better data lineage; increased data security; reduced time to deploy and reduced time to value; real-time data access; increased governance.

"The quest for competitive advantage and the increasing volume, variety and velocity of data have placed more pressure on organizations to rethink traditional methods of preparing data for reporting, analysis, sharing and use."
David Stodder, Senior Director of Research, TDWI

Here are the top seven benefits you can expect from embracing Yellowfin's virtualized approach to integrated data preparation for analytics:

1. Reduced costs

Organizations do not have to procure multiple tool sets for data preparation and analysis. Through a single integrated product, Yellowfin clients benefit from having all the functionality they require to go from data preparation to decision-making in one place, without the need to pay for multiple product licenses, extra servers and infrastructure, or additional training costs to learn two applications. Organizations also avoid further expenses and complexity associated with the integration and management of data migration and disjointed data preparation practices.

A range of independent industry research from firms such as TDWI, IDC and Blue Hill Research indicates that most data analysts spend more time preparing data than analyzing it. In fact, Blue Hill Research quantified the dollar cost of inefficient data preparation practices. Blue Hill Research compared the national average salary of US data analysts with average US working hours. Then, they compared that average hourly wage to the average amount of time data analysts spend on data preparation using traditional ad-hoc methods, such as custom scripts or Microsoft Excel. The findings suggested that organizations could save US$22,000 per data analyst per year by implementing purpose-built data preparation technology. Imagine what further cost savings could be achieved using purpose-built data preparation functionality that was both integrated and virtualized within your analytics platform.

2. Improved data quality

Whenever you move data, you run the risk of introducing data quality issues. By connecting directly to your data source, and preparing your data virtually, you mitigate these risks and ensure that your users have access to data they can trust. In addition, Yellowfin enables you to set up automated data quality reports. Be notified anytime your data fails to meet data quality thresholds with trigger-based alerts.

Then, address issues immediately with Yellowfin Smart Tasks and closed-loop decision-making workflows. Yellowfin Smart Tasks improve accountability and instantly turn insight into action by automatically generating tasks from Yellowfin broadcasts. If data falls outside predefined thresholds, a task is instantly created, assigned and given a deadline for completion.
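The threshold-to-task flow described above can be sketched as follows. The metric names, threshold values and task fields are all invented for illustration; Yellowfin's Smart Tasks implement this pattern inside the platform.

```python
# Illustrative sketch: a data quality metric outside its threshold
# triggers creation of an assigned task with a deadline.
from datetime import date, timedelta

def check_quality(metrics, thresholds, assignee="data-steward"):
    tasks = []
    for name, value in metrics.items():
        lo, hi = thresholds[name]
        if not lo <= value <= hi:   # metric breached its predefined threshold
            tasks.append({
                "task": f"Investigate {name}={value}",
                "assignee": assignee,
                "due": date.today() + timedelta(days=3),
            })
    return tasks

tasks = check_quality(
    metrics={"null_rate": 0.12, "duplicate_rate": 0.01},
    thresholds={"null_rate": (0.0, 0.05), "duplicate_rate": (0.0, 0.02)},
)
print(len(tasks))   # 1 — only null_rate breached its threshold
```

The closed loop comes from the task carrying an owner and a deadline, so a quality breach always has someone accountable for fixing it.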

3. Increased trust through better data lineage

Everyone wants to know where their data comes from, especially when questioning the validity of that data. Having a complete end-to-end view allows you to diagnose and rectify issues faster. If you extract data from one system, prepare it in another, and then move it to a third for reporting, your users lose visibility and control. Under such circumstances, there is no mechanism that enables end-users of your reporting application to understand where the data came from, or how it was manipulated through the data preparation process. With Yellowfin's fully integrated data preparation module, users can see the end-to-end data transformation process from data source to dashboard. You can even toggle between business descriptions and native database table or column names in Yellowfin's metadata layer, enabling you to easily track data back to the specific database table or schema from which it came.

4. Increased data security

Data preparation isn't just about the shape and quality of your data; it's about ensuring the security of that data as well. Moving data between applications increases the risk of unauthorized data access. Safeguard your data preparation and analytics processes in one integrated platform. Yellowfin's centralized IT-curated data security mechanisms ensure that only those users with appropriate access permissions to the data sources can prepare data for analysis. Further, fine-grained row level security also ensures end-users of that analysis can only access the subsets of data for which they have permission.

5. Reduced time to deploy and reduced time to value

The more manual steps and applications you add into your data pipeline, the longer it takes to deliver new data sets for analysis to your end-users. It's not just the fact that you have to move data between multiple tools; you also have to learn how to use each of those applications. Yellowfin's fully integrated data preparation module reduces the time it takes to deliver data for analysis and insights to the business. According to itbusinessedge.com, organizations that undertake Big Data projects without cohesive data preparation technology and strategy spend 46 percent of their time preparing data and 52 percent of their time chasing data quality and consistency issues. That doesn't leave a lot of time for building reports or undertaking analysis. Working in a single integrated application also improves cohesion between analysts responsible for data preparation and those responsible for analytics, leading to better, faster outcomes.

6. Real-time data access

Traditional data preparation extracts data from its source, manipulates it, and then pushes it to an analytics tool for exploration, analysis and report building. This process takes time, and a lot of it. Industry research shows that data analysts who use traditional data preparation processes form a serious bottleneck, spending 60 to 80 percent of their time wrangling data instead of delivering new analytics-ready data sets or performing data analysis. And, 37 percent of respondents to TDWI's Improving Data Preparation for Business Analytics report were dissatisfied with how easily they could find and use relevant data for BI and analytics projects. So, if data freshness or real-time analytics is important to you, then traditional data preparation processes simply won't satisfy your business requirements. Only a virtualized integrated data preparation layer can achieve the outcome you need.

7. Increased governance

Publishing approved data sets is one thing. Being able to track and audit their usage is another. When data preparation is fully integrated into a BI and analytics platform, you get the full benefits of the data governance and control incorporated within that platform. Not only do you benefit from the workflow processes, collaboration features, usage reports, as well as task and data quality management capabilities, you can also harness Yellowfin's dozens of functional, role, content and data-based security permissions. More than that, a virtualized data preparation module ensures any data transformations performed will be uniformly reflected across all content based on that metadata layer, from reports and charts to dashboards and Storyboards. Ensure consistency, governance and trust across all your analytics content. Know that the right people can access the right BI content, at the right time, with Yellowfin's governance capabilities. No exceptions.

Summary

Organizations throughout all major industries are now looking to leverage data to improve decision-making around all facets of their operations and strategies, from processes to products and services. The swiftness with which organizations are seeking to exploit increasingly large, numerous and complex data sources demands a new approach to business analytics. An approach that recognizes that data inputs and outputs are inextricably linked.

It no longer makes sense to conduct data preparation in one application, and then perform data exploration, analysis and report building in another. Not only is a multi-application approach to data preparation for analytics slow and costly, migrating data between tools adds unnecessary complexity and security risks to the process. Further, the self-service approach to data preparation driven and enabled by a multi-tool process negates the ability to scale and govern those practices. Continuing to entrench the unwanted outcomes of these typical data preparation methods (such as non-repeatable manual processes, a lack of trust and islands of disparate data) simply makes no sense in the age of Big Data.

Robust, uniform and repeatable data preparation processes are vital for producing accurate and insightful analytics. Yellowfin's data preparation capabilities uniquely support the needs of IT, data analysts and business users in this quest. Yellowfin empowers IT to effectively and efficiently govern enterprise data and securely manage access to both data sources and analytical outputs. Yellowfin empowers data analysts to consistently perform best practice data preparation, and thereby build better analytics content and deliver deeper insights, in less time. And, Yellowfin empowers business users to quickly make accurate data-driven decisions, from more data sources, with confidence. Prepare for analytics success with Yellowfin's virtualized and fully integrated Data Preparation Module.
Virtualized data preparation, integrated at the metadata layer of an analytics platform, overcomes the challenges and undesirable byproducts of traditional data preparation. Going from data source to dashboard in one integrated virtualized environment is the ideal approach: it efficiently delivers a single, consistent and governable source of truth for all data preparation practices and analytics content.

The seven critical capabilities of virtualized data preparation allow organizations to easily model, profile, clean, shape, enrich, secure and publish all data desired for BI straight into a single analytics environment. A virtualized and integrated approach to data preparation also delivers organizations seven core benefits, facilitating reduced costs, improved data quality, increased trust through better data lineage, increased data security, reduced time to deploy, real-time data access and increased governance.

Try for yourself

Discover how to get your analytics project up and running fast with virtualized data preparation. Simply visit and click Try It Free.