Analyze Big Data Live Yourself
Hands-on Workshop on IBM InfoSphere BigInsights
Harald Gröger, Wilfried Hoge, Gerhard Wenzel (IBM)
2013 IBM Corporation
Agenda
- 15:00-15:10 Introduction to the IBM Big Data platform and BigInsights
- 15:15-15:25 Lab 1: Managing your big data environment
- 15:25-16:05 Lab 2: Analyzing big data with BigSheets
- 16:05-16:10 Demo: BigSheets highlights
- 16:10-16:20 Demo: Text analytics highlights
What is Big Data?
- Volume (Data at Scale): Terabytes to petabytes of data.
- Variety (Data in Many Forms): Structured, unstructured, text, multimedia.
- Velocity (Data in Motion): Analysis of streaming data to enable decisions within fractions of a second.
- Veracity (Data Uncertainty): Managing the reliability and predictability of inherently imprecise data types.
The IBM Big Data Zone Architecture
[Architecture diagram. Labels recoverable from the slide, in slide order: Real-time Analytics; Intelligence Analysis; Data in Motion; Ingestion and Integration; Streams; Integrated Exploration; Decision Management; Data at Rest; ETL, Quality, MDM; Landing, Analytics and Archive; Warehouse / Marts; BI and Predictive Analytics; Data in Many Forms; MapReduce; Navigation and Discovery; Hadoop; Information Governance, Security and Business Continuity]
What is Hadoop?
Apache Hadoop is an open source software project that enables the distributed processing of large data sets across clusters of commodity servers.
- MapReduce: The framework that understands and assigns work to the nodes in a cluster.
- HDFS: A file system that spans all the nodes in a Hadoop cluster for data storage. It links together the file systems on many local nodes to make them into one big file system. HDFS assumes nodes will fail, so it achieves reliability by replicating data across multiple nodes.
Properties:
- Scalable: Add nodes without changing data formats, how data is loaded, how jobs are written, or the applications on top.
- Cost effective: Massively parallel computing on commodity servers with a sizeable decrease in storage cost, which makes it affordable to model all your data.
- Flexible: Schema-less, can absorb any type of data. Data from multiple sources can be joined and aggregated in arbitrary ways, enabling deep analyses.
- Fault tolerant: When a node is lost, work is redirected to another location of the data and processing continues.
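To make the MapReduce idea more concrete, here is a minimal single-process sketch in Python of the classic word-count job. It is not Hadoop code and does not use any Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and only mimic the three stages a real cluster would distribute across nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key.

    On a real cluster the framework does this between the map and
    reduce phases; here it is a plain in-memory dictionary.
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse all values of one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence (single process, no cluster)."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data at scale", "data in motion", "big data in many forms"]
    for word, count in word_count(sample).items():
        print(word, count)
```

Because each call to `map_phase` depends only on its own input line, and each call to `reduce_phase` only on one key, both steps can in principle run on different nodes in parallel, which is the property Hadoop exploits.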
Scope of the IBM BigInsights Hadoop Distribution
[Chart axes: Enterprise class vs. Breadth of capabilities]
Editions:
- Quick Start Edition: New for V2.1. Free. Non-production only.
- Basic Edition: Apache Hadoop, free download, Jaql, integrated install.
- Enterprise Edition: Sold by # of terabytes managed.
- PureData for Hadoop: Appliance simplicity.
Enterprise ready:
- Integrated web console
- Administrative tools, security
- RDBMS, warehouse connectivity
- Enterprise integration
- Performance optimization
- Pre-built applications
Analytics included:
- Visualization capabilities
- Spreadsheet-style tool
- Big SQL
- Text analytics
- Eclipse development
- Accelerators
PureData for Hadoop brings BigInsights to the market in an appliance form factor.
General Information
- Hostname of the VM: bivm
- Login user: biadmin
- Password: biadmin
Tutorial - Managing your Big Data environment
Duration: approx. 10 minutes.
Start the BigInsights Web Console via the desktop icon, then continue with Chapter 2 / Lesson 1 / Step 3 (page 4).
Tutorial - Analyzing Big Data with BigSheets
Duration: approx. 40 minutes.
All prerequisites are already met. The data has been downloaded and imported.
Start in the Files tab of the BigInsights Web Console with Lesson 1 / Step 3 (page 14), (hdfs/biginsights/sheets/watson_data_preloaded).
End after Lesson 6 / Step 3 (page 21).
Console Demo
BigSheets Demo
From unstructured text to formatted spreadsheets and charts.
[Diagram labels: Blog, News, Spreadsheet Format, Chart]
Text Analytics Demo
From unstructured text documents to a text analytics result table.
- Unstructured text: highlight text to create labels / examples.
- AQL Regex / Dictionary: generated from the labels / examples.
- AQL Candidates: created as a combination of regex and dictionaries plus distance, case, ...
- AQL Filter: removes duplicates, irrelevant candidates, ...
- Result Table.
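The combination of dictionary matches, regex matches, a distance constraint and a final filter can be sketched in plain Python. This is not AQL and does not reproduce what the BigInsights tooling generates: the dictionary, the pattern, the sample text and all function names are hypothetical, and the filter here only removes duplicates.

```python
import re
from typing import List, NamedTuple, Sequence, Tuple


class Span(NamedTuple):
    start: int
    end: int
    text: str


# Hypothetical dictionary and regex, standing in for generated extractors.
PRODUCT_DICTIONARY = ("BigInsights", "Hadoop")
VERSION_PATTERN = re.compile(r"\bV?\d+\.\d+\b", re.IGNORECASE)


def dictionary_matches(text: str, terms: Sequence[str]) -> List[Span]:
    """Find every case-insensitive occurrence of a dictionary term."""
    spans = []
    for term in terms:
        for m in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
            spans.append(Span(m.start(), m.end(), m.group()))
    return sorted(spans)


def regex_matches(text: str, pattern: "re.Pattern[str]") -> List[Span]:
    """Find every match of the regular expression."""
    return [Span(m.start(), m.end(), m.group()) for m in pattern.finditer(text)]


def candidates(products: List[Span], versions: List[Span],
               max_gap: int = 3) -> List[Tuple[str, str]]:
    """Pair a product with a version that follows within max_gap characters."""
    found = []
    for product in products:
        for version in versions:
            gap = version.start - product.end
            if 0 <= gap <= max_gap:
                found.append((product.text, version.text))
    return found


def filter_candidates(cands: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Drop duplicates, comparing case-insensitively and keeping the first."""
    seen = set()
    result = []
    for product, version in cands:
        key = (product.lower(), version.upper())
        if key not in seen:
            seen.add(key)
            result.append((product, version))
    return result


def extract(text: str) -> List[Tuple[str, str]]:
    """Full pipeline: dictionary + regex -> candidates -> filtered result."""
    products = dictionary_matches(text, PRODUCT_DICTIONARY)
    versions = regex_matches(text, VERSION_PATTERN)
    return filter_candidates(candidates(products, versions))


if __name__ == "__main__":
    sample = ("BigInsights V2.1 adds Big SQL. Download biginsights v2.1 today. "
              "Hadoop runs on 12.5 racks.")
    print(extract(sample))
```

In the sample text, the number after "Hadoop" is too far away to satisfy the distance constraint, and the second, lower-case mention of the product is dropped as a duplicate by the filter step.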
Thank You!