IBM Software The rise of machine data: Are you prepared?


Take action in the Internet of Things through data-in-motion analytics

Contents
The Internet of Things: Fueling the big data phenomenon
What to do about the rise of machine data: Analyze it
Capitalize on real-time analytics of machine data
The connected car: Innovating for sustainability
Uncovering actionable insights through the IBM big data platform
Resources

The Internet of Things: Fueling the big data phenomenon

Millions of people rely on the Internet to socialize, collaborate and conduct business worldwide. But these numbers are eclipsed by the multitudes of Internet-connected objects: more than 10 billion wirelessly connected devices today, with over 30 billion devices expected in the coming years. [1] This network of linked devices has the potential to transform the world, according to the Internet of Things (IoT) concept.

Coined at a time when many people were still figuring out how to use the Internet (famously pegged by the "On the Internet, nobody knows you're a dog" cartoon in the New Yorker [2]), the IoT was conceived as a network of objects equipped with radio-frequency identification (RFID) chips and similar technologies so that the objects could communicate and interact with one another. At first, proponents of the IoT focused on its benefits to the retail sphere: how it could help improve inventory management, reduce waste and keep up with client demand.

Now, however, the thousands of sensors, monitors and other technologies in the IoT are leading to the rise of the machine: they generate huge volumes of data that are applicable to many other industries and a broad range of business activities.

Machine data typically comes from logs or sensor output. But machine data can be broadly defined as any information that is automatically created by a computer process, application or other machine without human intervention. Such sources include health monitors, security cameras, computer networks, call detail records, financial instrument trades and more.

Consider, for example, a Boeing 747 aircraft. The airliner generates data across hundreds of parameters every second, which means a three-hour flight creates more than two million pieces of data. Since most planes take multiple flights per day, the amount of data aggregated over weeks, months and years is astonishing. The bottom line: data automatically generated in the IoT provides a fantastic fuel source for the era of big data.

Speaking of fuel, the McKinsey Global Institute estimates that the automotive industry will become the second-largest generator of data. [3] This estimate is not surprising, since some plug-in hybrid vehicles generate 25 GB of data in just one hour. Just as unsurprising is the leader in machine data: the utilities industry, with its bevy of smart meters, usage trackers, geographic sensors and other monitoring technologies.

Sensor sensibility
Sensors are devices that detect or measure a physical property and then record or otherwise respond to that property. Examples include acoustics, vibration, chemicals, electric current, electric potential, magnetism, radio frequencies, environment, weather, moisture, humidity, flow, fluid velocity, light, imaging and photons, just to name a few.
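The volume figures above are simple arithmetic. The following is a minimal sketch, assuming a hypothetical rate of 200 parameters per second (the text says only "hundreds") and decimal units (1 GB = 1,000 MB); the function names are illustrative only.

```python
def data_points_per_flight(params_per_second: int, flight_hours: float) -> int:
    """Return the number of readings produced over one flight."""
    seconds = flight_hours * 3600
    return int(params_per_second * seconds)


def gb_per_hour_to_mb_per_second(gb_per_hour: float) -> float:
    """Convert an hourly data volume in GB to a sustained rate in MB/s (decimal units)."""
    return gb_per_hour * 1000 / 3600


# Assumed rate: 200 parameters per second (hypothetical; the text says "hundreds").
print(data_points_per_flight(200, 3))               # 2160000
# 25 GB per hour, the plug-in hybrid figure quoted in the text.
print(round(gb_per_hour_to_mb_per_second(25), 2))   # 6.94
```

At the assumed rate, a three-hour flight yields about 2.16 million readings, which is consistent with the "more than two million" figure.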

What to do about the rise of machine data: Analyze it

Emerging big data technologies enable a new generation of applications designed to analyze large volumes of multistructured, often in-motion machine data to gain insights. Performing analytics on machine data can help you answer the following questions:

Do you have real-time visibility into your business operations, such as customer experience and behavior?
Are you able to analyze all your machine data and combine it with existing security data to predict and take action on a security threat?
Are you proactively monitoring end-to-end infrastructure, such as wireless networks, smart grids or manufacturing supply chains, to optimize expensive resources and deliver services when and where they are most needed?

Handling machine data involves several specific challenges. Formats vary and are complex, and few standards exist. A mix of streaming and at-rest data complicates the correlation and visualization of data sets. Machine data is also likely to be time sensitive, with a mixture of data refresh rates. In addition, the data may or may not have context. The solution? Continuous, extremely fast analysis of massive volumes of data in motion.

Big data has always been with us. What's new is our ability to capture and analyze more of it to achieve results faster.
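To make "analysis of data in motion" concrete, here is a minimal sketch of a time-based sliding-window average computed as each reading arrives, rather than after the data has been stored. It is an illustration only, not how any particular product implements windowing; the function name and the `(timestamp, value)` tuple layout are assumptions, and readings are assumed to arrive in timestamp order.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def sliding_average(
    readings: Iterable[Tuple[float, float]], window_seconds: float
) -> Iterator[Tuple[float, float]]:
    """Yield (timestamp, average of the values seen in the last window_seconds)."""
    window: Deque[Tuple[float, float]] = deque()
    total = 0.0
    for timestamp, value in readings:
        window.append((timestamp, value))
        total += value
        # Evict readings that have fallen out of the time window.
        while window[0][0] <= timestamp - window_seconds:
            _, old_value = window.popleft()
            total -= old_value
        yield timestamp, total / len(window)


# Hypothetical sensor readings with an irregular refresh rate.
stream = [(0, 10.0), (1, 20.0), (2, 30.0), (5, 40.0)]
for ts, avg in sliding_average(stream, window_seconds=3):
    print(ts, avg)
```

Because the window is defined by time rather than by count, the same code handles sources with different refresh rates, which is one of the challenges listed above.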

Capitalize on real-time analytics of machine data

Real-time analytic platforms consist of three core components.

1. Development environment. Developers should be able to easily and rapidly build applications and connect to new data sources. Drag-and-drop editors, wizards, visualization tools, and runtime monitoring and debuggers are a must.
2. Scale-out architecture. The supporting infrastructure should adapt to rapidly changing data formats, types and messaging protocols. It should also read from and write to a vast number of data sources. A massively parallel architecture is designed to deliver unlimited compute potential.
3. Analytic toolkits and accelerators. Tools should facilitate sophisticated analytics, such as geospatial, voice, image and text, and also update models on the fly.

Of the three components, the analytic toolkit is the most challenging to get right and the most important.
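One common way to scale stream processing out across parallel workers is to partition events by key, so that all events for one device land on the same worker. The sketch below shows that general idea only; it is not a description of how InfoSphere Streams distributes work, and the function names and the `(key, payload)` event layout are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

Event = Tuple[str, Any]  # (device key, payload); a hypothetical layout


def partition_for(key: str, num_workers: int) -> int:
    """Map a key to a worker index with a hash that is stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % num_workers


def partition_events(events: Iterable[Event], num_workers: int) -> Dict[int, List[Event]]:
    """Group events by the worker that should process them."""
    partitions: Dict[int, List[Event]] = defaultdict(list)
    for key, payload in events:
        partitions[partition_for(key, num_workers)].append((key, payload))
    return dict(partitions)


# Hypothetical events from three devices, spread over four workers.
events = [("meter-1", 4.2), ("meter-2", 3.9), ("meter-1", 4.3), ("truck-7", 88)]
for worker, batch in sorted(partition_events(events, num_workers=4).items()):
    print(worker, batch)
```

`zlib.crc32` is used instead of the built-in `hash` because Python randomizes string hashes per process, which would send the same key to different workers on different runs.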

Accelerating real-time analytic processing

InfoSphere Streams is an advanced analytic platform that allows user-developed applications to quickly ingest, analyze and correlate information as it arrives from real-time sources. InfoSphere Streams is designed to handle very high data throughput rates, up to millions of events per second. A market leader in providing sophisticated analytics for the IoT, IBM received the 2013 Ventana Research award for Operational Intelligence in the IT Innovation category for InfoSphere Streams.

Capability
Cleansing, filtering, aggregation and analytics for sensor and log data
Ultra-high velocity runtime to support fast data
Support for and analysis of unstructured data such as text, audio and video
Powerful analytics to model real-world events with data from sensors and social media
Industry solutions

Value
Verify veracity of data as it streams into your organization
Save and persist only pertinent data
Keep pace with continuing growth of sensor capabilities
React in real time to real-world-aware devices
Leverage toolkits and samples for more rapid time to value
Use new insights from social media to better understand customers
Reuse existing tools and models built in IBM SPSS, SAS, R, Matlab, Java and C++
Predict future events from current sensor data with advanced cognitive and predictive tools
Gain customer insight to better target prospects through social media
Utilize IBM Research-developed analytic accelerators
Deploy InfoSphere Streams inside a variety of applications, including operational detection, intelligent transportation and predictive insights

IBM InfoSphere Streams provides all three of these core components, including more than 10 toolkits and accelerators, to enable a variety of analytics out of the box.

Geospatial: Provide high-performance processing of geospatial data
Geospatial analysis requires complex mathematics such as set theory and geospatial geometry. It is used for location intelligence and location-based services for security and surveillance, geographic information systems, traffic patterns and more. The city of Dublin, Ireland, uses InfoSphere Streams to analyze 50 bus locations per second for its fleet of roughly 1,000 buses.
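As a small taste of the geometry involved, the sketch below computes the great-circle distance between two latitude/longitude points with the haversine formula and uses it for a simple "is this vehicle near this stop?" check. It is a generic illustration, not the geospatial toolkit's API; the function names and the 6,371 km mean Earth radius are assumptions of this sketch.

```python
import math
from typing import Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation

Point = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    dphi = math.radians(b[0] - a[0])
    dlambda = math.radians(b[1] - a[1])
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def within_radius(vehicle: Point, stop: Point, radius_km: float) -> bool:
    """Return True if the vehicle is within radius_km of the stop."""
    return haversine_km(vehicle, stop) <= radius_km


# One degree of longitude along the equator is roughly 111 km.
print(round(haversine_km((0.0, 0.0), (0.0, 1.0)), 2))
```

A real-time system would run a check like `within_radius` on every incoming position update, which is why per-event cost matters at tens of updates per second.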

Text: Parse unstructured text to detect meaning and understand context
In the IoT, text analytics can be used to derive meaning from vast amounts of text, including social data, to determine sentiment and identify illegal or suspicious activities. InfoSphere Streams can complete unstructured text analysis faster and more precisely than traditional methods. It analyzes millions of messages in real time [4] and helps ensure data is trusted and secured.

Time series: Detect and predict patterns and anomalies in real time
Knowing the order of events can have profound impacts, for example in predicting the path of a natural disaster or picking the next best stock trade. InfoSphere Streams helps insurance companies plan for natural disasters and enables real-time public alerts. It also performs real-time analysis of sensor data collected from the Hudson River, one of the most instrumented bodies of water in the world.

Telecommunications: Process call data in real time
Telecommunications service providers continue to experience huge growth in smartphone and mobile device use. Growing text and data usage creates a deluge of context- and time-sensitive data. InfoSphere Streams enables telecommunications providers to analyze billions of call data records per day to detect fraud, ensure high asset utilization and create accurate customer profiles for heightened customer service and retention. Using InfoSphere Streams, Sprint reduced storage costs by 90 percent.
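Real-time anomaly detection on a time series can be sketched with a running mean and standard deviation that are updated one reading at a time (Welford's online algorithm), flagging readings that fall far from what has been seen so far. This is a generic illustration, not the time series toolkit itself; the class name, the 3-sigma threshold and the warm-up length are assumptions.

```python
import math


class RunningAnomalyDetector:
    """Flag readings more than `threshold` standard deviations from the running mean."""

    def __init__(self, threshold: float = 3.0, warmup: int = 5) -> None:
        self.threshold = threshold
        self.warmup = warmup  # readings to observe before flagging anything
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> bool:
        """Score the reading against history, then fold it into the statistics."""
        is_anomaly = False
        if self.count >= self.warmup:
            std = math.sqrt(self.m2 / (self.count - 1))
            if std > 0 and abs(value - self.mean) / std > self.threshold:
                is_anomaly = True
        # Welford's online update of the mean and squared deviations.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        return is_anomaly


# Hypothetical sensor readings with one obvious spike.
detector = RunningAnomalyDetector()
readings = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 25.0, 10.1]
print([detector.update(r) for r in readings])
```

Note that this sketch folds flagged readings back into the statistics, which widens the tolerance after a spike; a production detector might exclude or down-weight them.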

The connected car: Innovating for sustainability

Here's a real-world example from the transportation industry to show how InfoSphere Streams makes real-time analytics possible: enabling connected vehicles.

The automotive industry is being reshaped by changing markets and new technologies (see Figure 1). Alliances are forming as consumer and environmental requirements create the need for more collaborative partner ecosystems. Increased globalization is driving more integration within and between automotive OEMs and their suppliers. Sophisticated consumers are increasing demand for innovative and sustainable vehicles.

Figure 1. Dramatic forces affecting the automotive industry require new approaches. Forces shown:
Increased globalization drives more integration within automotive companies
Consumer, regulatory and environmental requirements drive the creation of collaborative partner ecosystems to promote innovation
Sophisticated consumers demand innovative and sustainable vehicles
New technologies and capabilities make vehicles more intelligent
Rapidly integrating enterprises drive increasingly dynamic operations

As a result, automotive leaders are seeking new ways to quickly launch sustainable vehicles, capitalize on services opportunities, optimize the global value chain and transform the retail environment.

Modern vehicles generate myriad data: driver weight, car speed, weather conditions, road status, geospatial positioning, fuel levels, tire pressure and more. Many people may be uncomfortable knowing that their weight is out on the IoT, but that data point has been used to save lives: knowing the weight of the driver and passengers enables the car to intelligently deploy airbags during a crash.

Moreover, connected cars can give you a personalized passenger experience. Let's say every morning Dad drops off his kids at school, goes to a coffee shop and then heads to work. Dad's experience could be more pleasant if he knew what time to leave to optimize his commute time or if he received a coupon on his phone for a free coffee at a shop on his route. Suppose Dad has been going to a fast-food restaurant every morning as well. His insurance company may choose to send him information on healthy eating habits or a discount for a personal trainer.

Here's another example: Mom has to run errands after work. She might want to know the optimal path between her office and five stores, given the time of day and traffic and weather conditions. She might also appreciate alerts about the stores' closing times, so she can adjust her route accordingly. If Mom gets distracted, a connected car could automatically apply the brakes when it senses a car nearby.
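The errand-routing example is a small instance of a shortest-route problem. With only five stores, every visiting order can simply be tried. The sketch below does that, assuming a hypothetical cost function (straight-line distance between made-up coordinates); a real service would substitute travel times that reflect traffic, weather and closing times.

```python
import math
from itertools import permutations
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def route_cost(route: Sequence[Point], cost: Callable[[Point, Point], float]) -> float:
    """Total cost of visiting the points in the given order."""
    return sum(cost(a, b) for a, b in zip(route, route[1:]))


def best_route(
    start: Point, stops: Sequence[Point], cost: Callable[[Point, Point], float]
) -> List[Point]:
    """Try every visiting order and return the cheapest route from start."""
    best_order = min(
        permutations(stops),
        key=lambda order: route_cost((start, *order), cost),
    )
    return [start, *best_order]


# Hypothetical office and store coordinates; math.dist is straight-line distance.
office = (0.0, 0.0)
stores = [(3.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
print(best_route(office, stores, math.dist))
```

Five stops mean only 120 orderings, so brute force is fine here; the number of orderings grows factorially, so larger problems need heuristics or a dedicated solver.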

Commercial freight companies also benefit from connected vehicles. Trucking companies use connected vehicles to assess the health of drivers and understand weather and road conditions. This data can inform their response to minimize insurance charges; for example, they may choose not to deploy a truck under harsh road conditions since insurance premiums are higher during inclement weather. They also use machine data to improve driver safety and vehicle-to-vehicle communication. Wouldn't it be useful if windshield wipers automatically turned on when raindrops fell, or if an emergency vehicle were automatically routed to the scene of an accident and advised of its severity level?

Data from connected vehicles gives automobile manufacturers valuable insight into fuel efficiency, engine performance, temperature and more, helping them engineer longer-lasting, more fuel-efficient cars. Cars that monitor their own health can automatically alert drivers and service departments when service is needed, giving drivers proactive updates and enabling service teams to order parts in advance. Moreover, service departments can quickly analyze warranty claims, diagnostic and fault codes, customer usage data and service records with traceability through the entire value chain. Dealers and car manufacturers can also analyze social media data in response to promotions, ads or trade shows.

InfoSphere Streams enables analytics for these and other use cases, empowering a broad ecosystem (including car manufacturers, retailers, insurance companies, trucking companies and consumers) to be safer and more productive on the road (see Figure 2).

Real-time analytics are used both during and after the manufacturing process to achieve exceptional outcomes, including:

Profitable aftermarket services and products
Improved, interactive driving experience and safety by real-time analysis of weather-based data or road-congestion alerts
Integrated vehicle data available to third parties such as insurance companies, retailers and emergency medical services
Improved quality and functionality of future products
Optimization of the global value chain to improve the environment

Figure 2. Data from connected vehicles helps improve the efficiency of the extended automotive experience.
Groups shown around the connected vehicle: business development; alliances and partners; sales and marketing; supply-chain manufacturing; service and warranty; warehouse and inventory; engineers; product development; sales, marketing and customer service.
Benefits shown: monetize telematics; differentiate the driving experience; manage the supply chain to optimize inventory; improve product quality; predict parts failures; improve the design of future products; improve customer care; generate more leads and vehicle sales; improve sales of aftermarket products and services.

Uncovering actionable insights through the IBM big data platform

InfoSphere Streams plays an important role in the IoT, particularly in the connected car scenario. But the primary value of the IBM approach is in the comprehensive big data portfolio available to support action in the IoT (see Figure 3).

The IBM big data platform is designed to help you gain insight from big data by delivering enterprise-class data management and advanced analytics. The platform supports ad hoc data exploration, discovery and unstructured analysis as well as structured, repeatable tasks to improve business insight regardless of the volume, variety, velocity or veracity of data.

Figure 3. Connected vehicle reference architecture for the IBM big data platform. Labeled components: stream computing (InfoSphere Streams), visualization, IBM MessageSight, software development, requirements management, data warehouse (IBM PureData Systems), discovery, and Hadoop system (IBM InfoSphere BigInsights).

The IBM big data platform effectively manages and analyzes data in its native format: unstructured, structured, at rest or in motion. To accelerate time-to-value, all platform components are pre-integrated. By leveraging this extensible set of capabilities, you can start with a single project using one capability and add others as needed. Components of the IBM big data platform include the following:

IBM InfoSphere BigInsights for Hadoop brings the power of the Apache Hadoop framework to the enterprise. The open source Hadoop software is used to reliably manage large volumes of structured and unstructured data.

InfoSphere Streams helps you turn burgeoning, fast-moving volumes and varieties of data into actionable information and business insights.

IBM Watson Explorer provides federated discovery, search and navigation over a broad range of data sources, helping you get started quickly with big data initiatives and gain more value from your information.

IBM PureData System for Operational Analytics, part of the IBM PureSystems family, is an expert integrated data system designed and optimized specifically for the demands of an operational analytic workload.

IBM PureData System for Analytics is a high-performance, scalable, massively parallel system that enables you to gain deep insight from your data and perform analytics on enormous data volumes.

IBM MessageSight is a full-featured messaging appliance designed for machine-to-machine (M2M) and mobile environments. It processes large volumes of events in near real time to deliver the performance, value and simplicity you need to accommodate a growing number of mobile devices and sensors.

Resources

The massive amount of machine data available today (originating from IT machines, sensors, meters and more) requires complex analysis and correlation across different types of data sets. Companies that can perform this analysis across a variety of data, and do so with speed and accuracy, stand to reap rewards in business efficiency, customer satisfaction and strategic success. The IBM approach to big data takes all of this into account to help you gain business insights and real-time visibility into the customer experience.

To learn more about the Internet of Things, InfoSphere Streams and the IBM big data platform, check out these resources:

Big data at the speed of business: Operations analysis use case
Big data in action: Industry examples
The InfoSphere Streams Playbook, for deeper technical content about InfoSphere Streams
InfoSphere Streams Quick Start edition, available as a free copy

Copyright IBM Corporation 2014

IBM Corporation Software Group Route 100 Somers, NY

Produced in the United States of America March 2014

IBM, the IBM logo, ibm.com, BigInsights, IBM Watson, InfoSphere, PureData, PureSystems, and SPSS are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at "Copyright and trademark information" at ibm.com/legal/copytrade.shtml

Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.

This document is current as of the initial date of publication and may be changed by IBM at any time. Not all offerings are available in every country in which IBM operates.

The performance data and client examples cited are presented for illustrative purposes only. Actual performance results may vary depending on specific configurations and operating conditions. THE INFORMATION IN THIS DOCUMENT IS PROVIDED AS IS WITHOUT ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING WITHOUT ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OR CONDITION OF NON-INFRINGEMENT. IBM products are warranted according to the terms and conditions of the agreements under which they are provided.

The client is responsible for ensuring compliance with laws and regulations applicable to it. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the client is in compliance with any law or regulation.

[1] More Than 30 Billion Devices Will Wirelessly Connect to the Internet of Everything in ABI Research press release, May 9,
[2] Peter Steiner. The New Yorker wiki/file:internet_dog.jpg
[3] James Manyika, et al. Big Data: The Next Frontier for Innovation, Competition, and Productivity. McKinsey Global Institute. June technology/big_data_the_next_frontier_for_innovation
[4] IBM Redbooks Solution Guide. Turning Big Data into Actionable Insight with IBM InfoSphere Streams. August

Please Recycle

IMM14142-USEN-01