Datameer for Data Preparation: Empowering Your Business Analysts

As businesses strive to be data-driven organizations, self-service data preparation becomes a critical cog in the analytic process. Self-service data preparation allows organizations to evolve from being IT-led to IT-enabled, empowering business analysts to answer more questions in greater detail than ever before.

The modern analytic process is a virtuous cycle in which data and information feed each other. A day in the life of a progressive business analyst begins with a high-priority question, followed by integrating data, preparing it, performing analysis, drawing some conclusions and iterating until the answers are revealed.

To support this, self-service data preparation must go well beyond the typical integrating and cleansing of data. It is a continuous operational process in which business analysts are supplied with datasets, yet have the power to further blend and enrich the data as needed. The end goal is a collaborative process that is frictionless and operational, yet properly governed.

Datameer for Data Preparation

Datameer offers the complete set of capabilities you need for collaborative, operational data preparation pipelines to serve the needs of both your data and business analysts. With its data preparation functionality, Datameer allows you to:

- Build rich domain-specific datasets
- Organize the data for analysis
- Collaborate and share datasets
- Empower analysts to further enrich data
- Enable analytic enrichment
- Govern and secure datasets
- Operationalize data pipelines

Create Rich Datasets for Analysis

The data complexity in modern business can be difficult for analysts to navigate. Both data and business analysts need an easy-to-use environment to define rich, domain-specific datasets that feed the deeper, diagnostic analysis the business requires.

Datameer has over 70 data connectors, allowing you to bring together data from practically any system in any format, including unstructured data such as social media, web logs or machine data. Whether it's residing in applications, sitting on servers or already in Hadoop, Datameer can retrieve the data.

The Datameer Flipside provides instantaneous data profiling of your data at any step in the data preparation and analytic process, so you have an accurate understanding of your data. With profiling data from the Flipside, you can quickly determine the next data preparation steps for your data.

Datameer's familiar spreadsheet UI and drag-and-drop functions allow analysts to combine, cleanse and transform data as needed. They can explore the data and apply analytic enrichment or virtual columns that help pre-discover insights for other downstream business analysts.
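To make the idea of column profiling mentioned above more concrete, here is a minimal sketch in plain Python. It is not Datameer's implementation or API; the function name `profile_column` and the sample values are hypothetical, and the statistics chosen (row count, null count, distinct count, most frequent values) are only one plausible subset of what a profiling view might show.

```python
from collections import Counter


def profile_column(values):
    """Return simple profile statistics for one column of raw values.

    Hypothetical helper for illustration only; None and the empty
    string are both treated as missing values (an assumption).
    """
    non_null = [v for v in values if v is not None and v != ""]
    counts = Counter(non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        # The three most frequent values, as (value, count) pairs.
        "most_common": counts.most_common(3),
    }


# Made-up sample column with one None and one blank entry.
country = ["US", "DE", "US", None, ""]
profile = profile_column(country)
```

A profile like this is typically what suggests the next preparation step, for example removing blanks when the null count is high.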

Flexible Data Organization

To find deeper diagnostic insights, big data analytics go beyond the standard slicing-and-dicing of information to find hidden patterns and trends in the data. This requires organizing the data during the data preparation phase in different ways than the typical dimensions and metrics used by standard analytics. Datameer provides built-in functions that make it easy to organize data for these more advanced analytics, including:

- Sessionization
- Custom binning
- Time windowing
- Statistical grouping

These help drive general analytical use cases such as clickstream and time-series analysis, as well as specific ones such as fraud, preventive maintenance, buying patterns and more.

Collaborate With Full Governance

Organizations need a highly collaborative environment for effective data preparation pipelines and deeper analytic insights. Sharing both analytic datasets and content enables analysts to cast wide question nets while ensuring data and results are in line with corporate accuracy standards and compliance requirements.

Datameer keeps a full catalog of datasets, models and derived content, and enables sharing with virtual team members across the organization. Fine-grained, role-based security is applied to these items, ensuring full control over the data and models.

To maintain compliance and consistency, Datameer maintains full lineage, auditing the definition of and any changes to artifacts and calculations, supporting even the most rigid compliance processes. Datameer also includes a deep suite of encryption and obfuscation features, allowing you to keep data such as Personally Identifiable Information (PII) safe and secure.
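As an illustration of sessionization, the first of the organization functions listed under Flexible Data Organization, the following sketch groups clickstream-style events into sessions in plain Python. It does not represent Datameer's built-in function; the name `sessionize`, the `(user, timestamp)` event shape and the 30-minute inactivity gap are all assumptions chosen for the example.

```python
from datetime import datetime, timedelta


def sessionize(events, gap=timedelta(minutes=30)):
    """Assign a per-user session number to each (user, timestamp) event.

    A new session starts when a user has been inactive for longer than
    `gap`. The 30-minute default is an assumption for this sketch.
    """
    last_seen = {}   # user -> timestamp of that user's previous event
    session_no = {}  # user -> current session number
    out = []
    # Process events grouped by user and in time order.
    for user, ts in sorted(events):
        if user not in last_seen or ts - last_seen[user] > gap:
            session_no[user] = session_no.get(user, 0) + 1
        last_seen[user] = ts
        out.append((user, ts, session_no[user]))
    return out


# Made-up events: user "a" returns after a 50-minute pause.
events = [
    ("a", datetime(2024, 1, 1, 9, 0)),
    ("b", datetime(2024, 1, 1, 9, 5)),
    ("a", datetime(2024, 1, 1, 9, 10)),
    ("a", datetime(2024, 1, 1, 10, 0)),
]
sessions = sessionize(events)
```

Once every event carries a session number, session-level measures such as length or event count can be computed with ordinary grouping.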

Empower Further Enrichment

For the business analyst, data blending, conditioning and enrichment are an important part of the cycle to understand data relationships and the meaning of the analytics. Once certain conclusions are drawn, the analyst may see the need to use additional data or explore it in different ways, requiring additional data preparation.

Datameer's self-service data preparation approach allows business analysts to build their own distinct models and use their own data or existing curated datasets. Unique dynamic modeling capabilities don't require pre-built schemas, automatically deriving the models as more data is added or different enrichment is applied.

For example, a common customer behavior model could be created by a domain expert data analyst, then shared with business analysts in marketing, sales and customer service. The downstream analysts can link their own models to the original, extending as needed. The original model would be read-only to maintain its integrity and validity, and new extensions would further enrich the model to the specific needs of that business unit.

Enable Analytic Enrichment

When curating common datasets or preparing final datasets for analysis, domain experts will use the data discovery process to find additional analytical nuggets of information that are helpful to other analysts. This requires the application of analytic functions and algorithms that add new virtual columns showcasing these analytical nuggets. Datameer provides a rich array of capabilities to enrich data sets with analytics, including:

- Path construction
- Graph analytics
- Statistical functions
- Text mining
- Sentiment analysis

One can also apply algorithmic analytic enrichment via Datameer Smart Analytics for clustering, decision trees, data dependencies and recommendations.

Operationalize

Answering modern business questions is not a one-time affair. As new data arrives, answers to the same question need to be continuously fed to business teams so they can perform timely, accurate actions. Datameer gives you a full suite of capabilities to operationalize your data preparation pipelines to deliver results to downstream analysts and business teams. This includes:

- Sophisticated job scheduling and management
- Adaptable data retention and management policies
- A scalable intelligent execution framework
- Full lineage, auditing and logging
- Complete security, including encryption and masking

Conclusion

Data preparation goes well beyond the simple tasks of integrating and cleansing data. It is highly intertwined with the analysis process and must give analysts the ability to apply manipulation, grouping and analytic functions to organize and enrich data for faster data discovery.

While self-service is highly important, data preparation is not a standalone task. It is a collaborative process utilizing the expertise of various analysts across the organization in a governed manner. Datameer empowers business analysts with both curated datasets and the capability to prepare and enrich data on their own so they can ask more questions and solve more problems in a frictionless environment that is secure and governed.

Irrespective of the organization of your analyst teams (centralized, virtual or distributed), Datameer has the complete suite of data preparation capabilities needed for modeling, enrichment, collaboration, governance and operationalization.

To learn more, please visit our website at

Datameer Data Preparation Features and Capabilities

Capability: Connectivity & Integration
Datameer Features: 70+ data connectors; structured, semi-structured and unstructured data; any data locations: application, database, Hadoop, filesystem, cloud and more
Benefit(s): Use and integrate any data, in any format and in any location to drive richer datasets and deeper analysis

Capability: Profiling
Datameer Features: Flipside for inline data profiling and histograms
Benefit(s): Instantaneous data profiling at any point in the data preparation pipeline

Capability: Cleansing
Datameer Features: Removal of nulls, blanks and duplicates; filtering outliers; expression rules for complex cleansing; ebbing out malformed data
Benefit(s): Easy cleansing of dirty data and irregularities

Capability: Transformation
Datameer Features: Column splitting; column and row pivoting; advanced text parsing; working with lists
Benefit(s): Easy transformation of complex datasets

Capability: Organization
Datameer Features: Sessionization; custom binning; time windowing; statistical grouping
Benefit(s): Multiple, flexible methods to organize data specifically to analytic needs

Capability: Data Enrichment
Datameer Features: Expression language to enrich datasets inline; easy addition of external data including profiles, demographics, geo-spatial and more
Benefit(s): Combine additional data sources or knowledge into models for deeper analysis

Capability: Analytic Enrichment
Datameer Features: Path construction; graph analytics; math and statistical functions; text mining functions; sentiment analysis
Benefit(s): Enrich data with advanced analytics for easier downstream analysis

Capability: Algorithmic Enrichment
Datameer Features: Smart Analytics for enrichment via clustering, decision trees, data dependencies and recommendations
Benefit(s): Automatically provide hidden insights to analyst teams

Capability: Validation
Datameer Features: Validation of data preparation logic on full big data sets
Benefit(s): Easy assurance of consistency across the entire dataset

Capability: Visualization
Datameer Features: Built-in visualization; connectivity to third-party visualization tools
Benefit(s): In-line visualization to identify trends during data preparation; feed already-existing visualization and BI tools to reuse investment and skills

Capability: Collaboration
Datameer Features: Sharing of datasets, analytic models and visualizations; graphical data lineage
Benefit(s): Collaboration and sharing for faster analytic cycles

Capability: Governance and Security
Datameer Features: Granular, role-based security; encryption; masking; lineage; auditing
Benefit(s): Fully governed and secure environment that meets compliance requirements

Capability: Operationalization
Datameer Features: Smart Execution; job scheduling; data retention policies; data management policies; auditing
Benefit(s): Operationalize your data pipelines for regular feeds to analysts, applications and business teams