Big Data Analytics: Future Architectures, Skills and Roadmaps for the CIO


White Paper
September 2011
By Philip Carter
Sponsored by

Brave New World of Big Data

The Big Data era has arrived: multi-petabyte data warehouses, social media interactions, real-time sensory data feeds, geospatial information and other new data sources are presenting organisations with a range of challenges, but also significant opportunities. IDC believes that as CIOs start to adopt the new class of technologies required to process, discover and analyse these massive data sets, which cannot be dealt with using traditional databases and architectures, it will become clear that the real value will be derived from the high-end analytics that can be performed on the increasing volume, velocity and variety of data that organisations are generating. This is Big Data analytics.

One of the key differences between analytics in the traditional mode and analytics in the Big Data era is that we are gathering data that we may or may not need. From the perspective of analysis, this means "we don't know what we don't know"; hence, the variables and models are likely to be entirely new, requiring a different infrastructure strategy and, perhaps most importantly, new skill sets.

The objective of this white paper is to explore the initial impact that Big Data is having on organisations, particularly on IT departments, which are being forced to re-assess architectures, delivery models and future roadmaps. It will explore the following areas in more detail:

Defining Big Data. This is not in the context of the quantity or threshold that actually quantifies Big Data (as this is changing all the time, and will be applied differently depending on the vertical and market segment), but more in terms of a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data by enabling high-speed capture, discovery and/or analysis.

Hadoop, MapReduce, Key Value Store? There is a lot of hype around the new technologies that are being used by the market to deal with the Big Data phenomenon. We will highlight some of these and their relative importance.

The Value of Big Data in Analytics. The bottom line here is that it is getting more complicated to process and analyse these large and growing data sets, which essentially requires a re-assessment of the broader information management strategies of the majority of organisations that have started their business analytics journey.

Why Big Data Analytics is Important (and Different). Many have asked the question: what is new with this trend? This section will highlight the traditional use of business analytics in the old, pre-Big Data world versus Big Data analytics in the New World. It will also look at the use cases that IDC expects to be most common across a variety of industries.

The Skill Factor: the Rise of the Data Scientist. With the raft of new technologies and organisational structures that need to be put in place as the Big Data phenomenon becomes a reality, there will be increasing demand for data scientists: the next-generation analytical professionals who are able to extract information from large data sets and present it as content of business value to non-data experts, and who also have the unique skill of understanding the new models that need to be put in place.

Mapping out the Big Data Analytics Journey. The Big Data analytics journey will be an iterative one; it is therefore important to map it out in the context of a broader framework. This section aims to do exactly that, and also provides some recommendations to CIOs as they embark on this exciting journey into the brave new world of Big Data analytics.

Situation Overview

The Rise of Business Analytics

Much has been written on how the amount of data in the world is exploding in volume. According to the recent IDC Digital Universe study, the amount of information created and replicated will surpass 1.8 zettabytes (1.8 trillion gigabytes) in 2011, growing by a factor of 9 in just five years. Big Data is a dynamic that seemed to appear from almost nowhere. But in reality, Big Data is not new; it is moving into the mainstream and getting a lot more attention.
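As a rough worked example, the "factor of 9 in just five years" figure quoted above can be converted into an implied annual growth rate. This is only a sketch that assumes steady compounding over the five-year period; the symbol r (the annual growth rate) is introduced here for illustration and does not appear in the IDC study.

```latex
\[
  (1 + r)^{5} = 9
  \quad\Longrightarrow\quad
  r = 9^{1/5} - 1 \approx 0.55
\]
```

Under that assumption, data volumes grow by roughly 55% per year, which is the kind of rate that infrastructure and storage planning would need to absorb.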
The growth of Big Data is being enabled by inexpensive storage, a proliferation of sensor and data capture technology, increasing connections to information via the cloud and virtualised storage infrastructure, as well as innovative software and analysis tools. It is no surprise, then, that business analytics as a technology area is rising on the radars of CIOs and line-of-business (LOB) executives. To validate this: in a recent survey of 5,722 end users in the US market, business analytics ranked in the top five IT initiatives of organisations. The key drivers for business analytics adoption remained conservative or defensive. The focus on cost control, customer retention and optimising operations is likely a reflection of the continued economic uncertainty. However, top drivers vary significantly by organisation size and industry.

Similarly, IDC surveyed 693 European organisations in February 2011, and 51% of respondents said that BI and analytics are high-priority technologies. In emerging markets such as Asia/Pacific, the focus is very much on capturing the next wave of growth. According to more than 1,000 CIOs and LOB executives who were interviewed as part of the Asia/Pacific C-Suite Barometer in February 2011, business analytics was rated as the number one technology area that would enable their organisations to gain a competitive edge in the year ahead.

Figure 1: The Rise of Business Analytics
Q: You (CIO/CTO) mentioned harnessing ICT to gain competitive advantage. Which of the following technologies or solutions would be your leading choice to better harness ICT?
Top 5 (bar chart, axis in %): Business intelligence/analytics; Network; Social media/online channel; Collaboration (including video, mobility); Cloud computing/services
Source: IDC, 2011

As more businesses in Asia invest in IT to ride the hyper-growth wave in emerging markets, they are harnessing analytics-led solutions to gain better customer insights, manage risk and financial metrics more effectively and, at the same time, strive for unique market differentiation.

Historically, organisations have made significant investments in applications with the objective of automating business processes and capturing data to improve operational efficiency. Many of these projects are still ongoing, but what is becoming increasingly clear to the senior management of these entities is that they (and their business managers) have not been able to get the right information (mainly due to poorly integrated systems and questionable data quality) at the right time (due to performance and scalability issues) to the right stakeholders within their organisations to support the critical decisions needed to drive the necessary business impact.

Where they are unable to do this, lines of business are procuring and deploying their own solutions in a new wave of "shadow IT" investments focusing on business analytics, thereby forcing CIOs to re-examine these issues with a specific focus on driving better IT-business alignment. This is taking place even without the Big Data dynamic in the picture; when that is added, it creates the perfect storm for Big Data analytics to take centre stage.

A Note on Terminology: BI or Analytics?

We have some challenges when defining and using terminology for business analytics. Because the BI market is mature, many terms have been around for a long time and have either become obsolete or have been redefined over the years. For example, the term BI itself is sometimes used in a narrow sense (only query, reporting, and analysis [QRA] technology) and at times in a broad sense, to refer to the whole of what IDC calls business analytics (including data warehousing and analytic applications in addition to front-end tools). The term analytics is relatively new and its meaning is often unclear: does it refer to advanced analytics (including predictive analytics, optimisation and forecasting) or to analytic applications? In some submarkets, such as Web analytics, the term analytics simply means a dashboard on top of some data.

For the purpose of this white paper, we interpret BI to mean either QRA tools (in its narrow definition) or business analytics across the board (in its broad definition) in IDC terminology. We interpret analytics to mean either advanced analytics (data mining, statistics, optimisation and forecasting) or analytic applications (financial performance and strategy management [FPSM], CRM and marketing analytics, supply chain analytics, etc.).

Business analytics is a combination of the above (and also includes data warehousing technologies), as highlighted by IDC's Business Analytics Taxonomy for 2011 (see Figure 2 below):

Figure 2: IDC Business Analytics Taxonomy

Performance Management & Analytic Applications
  Financial Performance & Strategy Management: budgeting, planning, consolidation, profitability, strategy management
  CRM Analytic Applications: sales, customer service, contact centre, marketing, Web site analytics, price optimisation
  Supply Chain Analytic Applications: procurement, logistics, inventory, manufacturing
  Production Planning Analytic Applications: demand, supply, and production planning
  Services Operations Analytic Applications: financial services, education, government, healthcare, communications services, etc.
  Workforce Analytic Applications

Business Intelligence Tools
  Query, Reporting, and Analysis Tools: dashboards, production reporting, OLAP, ad-hoc query
  Advanced Analytics Tools: data mining and statistics
  Content Analysis Tools
  Spatial Information Analytics Tools

Data Warehouse Management Platform
  Data Warehouse Management
  Data Warehouse Generation: data extraction, transformation, loading; data quality

Source: IDC, 2011

Defining Big Data

Big Data is not so much about the content that is created, nor is it even about consumption. It is more about the analysis of the data and how that needs to be done. It is not really a "thing", but instead a dynamic or activity that crosses many IT borders. IDC defines Big Data in this way:

Big Data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high velocity capture, discovery and/or analysis.

Figure 3: Defining Big Data
Axes: Data Volumes, Time. Labels: Unstructured Data (video, rich media etc.); Semi-Structured (e.g. Weblogs, social media feeds); Data = Big, Complex, High Velocity & Wide Variety
Source: IDC, 2011

The Volume. Volume has two sources. One sits mostly in the structured data realm. Some of this is held in transactional data stores and is linked to the ever-present electronic trail that individuals and businesses create in the wake of rapidly increasing online activity. Sensory (machine-to-machine) data contributes to this area too. The other is existing data warehouses or data marts, which have grown over time to petabyte scale.

The Variety. The other aspect of the Big Data phenomenon is the need to analyse semi-structured and unstructured data. Text, video and other forms of media will require completely different architectures and technologies to perform the required analysis. For example, if you look at the social media phenomenon, many marketing departments are looking at ways to do sentiment and brand analysis based on what is being posted on Facebook, Twitter and YouTube. This dynamic becomes more complex in Asia, with local social media sites like RenRen in China and Nate in Korea.

The Velocity. There will also be demand to analyse this data more frequently: for example, taking into account all transactions rather than a sample to obtain a more complete view of risk on a trade in real time.

In summary, Big Data refers to data sets whose volume, variety, velocity and complexity make them impossible for current databases and architectures to store and manage. IDC intentionally does not define Big Data as larger than a certain threshold (e.g. a number of terabytes), mainly because this threshold would be a moving target depending on the sector and would obviously grow over time. More important is the value that organisations can derive from this phenomenon and the resulting need to rethink their information strategies to extract that value.

Other Definitions: Hadoop, MapReduce, Key Value Store

With the focus on Big Data going mainstream, a range of new technologies has hit the market. The table below gives an overview of these technologies, with associated context (note that the list is not exhaustive).

Table 1: Big Data Technologies (Terminology)

| Technology | Context |
| --- | --- |
| Big Table | Proprietary distributed database system built on the Google File System. Inspiration for HBase. |
| Cassandra | An open source (free) database management system designed to handle huge amounts of data on a distributed system. It was originally developed at Facebook and is now managed as a project of the Apache Software Foundation. |
| Data Warehouse & Analytical Appliance | Consists of an integrated set of servers, storage, operating system(s), database, business intelligence, data mining and other software specifically pre-installed and pre-optimised for data warehousing. |
| Distributed System | Multiple computers, communicating through a network, used to solve a common computational problem. The problem is divided into multiple tasks, each of which is solved by one or more computers working in parallel. Benefits include an improved price-performance ratio, higher reliability and greater scalability. |
| Google File System | Proprietary distributed file system developed by Google; part of the inspiration for Hadoop. |
| Hadoop | An open source (free) software framework for processing huge data sets on certain kinds of problems on a distributed system. Its development was inspired by Google's MapReduce and Google File System. It was originally developed at Yahoo! and is now managed as a project of the Apache Software Foundation. |
| HBase | An open source (free), distributed, non-relational database modelled on Google's Big Table. It was originally developed by Powerset and is now managed as a project of the Apache Software Foundation as part of Hadoop. |
| MapReduce | A software framework introduced by Google for processing huge data sets on certain kinds of problems on a distributed system. Also implemented in Hadoop. |
| Non-relational database / Key Value Store | A non-relational database is one that does not store data in tables (rows and columns), in contrast to a relational database. Key Value Stores allow for the management of schema-less (NoSQL) entities. |

Although some of these terms will be used throughout this white paper, the focus is not to examine them in too much detail because, as one IT executive recently mentioned, "to know the technology is one thing, but to apply it in the right environment is something entirely different". The new technology needs to be tied back to business requirements as much as possible, rather than examined for its own sake. Having said that, most IT executives are not aware of the technologies and trends developing in this area. Where they are aware of them, their strategy is to assign a couple of people in their enterprise architecture team to experiment with the new technologies (e.g. in-memory, Hadoop, MapReduce, Key Value Stores) that are being used to deal with the Big Data phenomenon.
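To make the MapReduce entry in Table 1 a little more concrete, the following is a minimal single-process sketch of the programming model in Python, using the classic word-count example. It is an illustration only, not Hadoop's or Google's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real framework would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence (a cluster would parallelise them)."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    posts = ["big data big value", "data at scale"]
    print(word_count(posts))
```

Because each map call looks at one record and each reduce call looks at one key, the work can be split across machines without the steps needing to coordinate, which is what lets the model scale to very large data sets.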

Big Data Analytics: The Old World vs. The New Era

Many have asked the question: what is new with this trend? This section highlights the traditional use of business analytics in the old, pre-Big Data world versus Big Data analytics in the Brave New World. It also looks at the use cases that IDC expects to be most common across a variety of industries.

The majority of IT organisations have progressed in terms of their infrastructure architectures over time: from predominantly mainframe-based environments in the 1980s, to a focus on client-server in the 1990s and the Web at the turn of the century, to what is now popularly known as private cloud. This supposed state of nirvana constitutes a consolidated, virtualised set of infrastructure resources (server, storage and network) that can be self-provisioned in an automated fashion by business users, complete with SLAs whose security, performance, availability and cost profiles are transparent to all in the form of a service catalog. Very few organisations, if any, have achieved this state of infrastructure nirvana; most are still battling with a spaghetti-like tangle of compute resources in their datacenters. And now the external force of Big Data, as mentioned earlier, is forcing CIOs to re-architect their infrastructure, particularly in the context of how analytics capabilities are deployed in an enterprise-wide fashion. Below is an overview of the changes that IDC sees happening in the infrastructure world that are increasingly impacting the Big Data analytics world:

Table 2: Old World vs. New Era (Big Data Infrastructure)

| | Old World | New Era |
| --- | --- | --- |
| Tenancy | Infrastructure silos | Pooled resources |
| Architecture | Performance tuned | Linear scalability (linked to distributed parallel processing and in-memory storage) |
| Delivery Model | On premise | Hybrid (with cloud bursting capabilities) and widespread use of the appliance |

Based on IDC's research in this space, here are three suggestions for CIOs in dealing with these issues:

Cloud Bursting. The private cloud journey will line up well with the enterprise-wide analytical requirements highlighted earlier, but CIOs need to ensure that workload assessments are conducted rigorously and that risk is mitigated where possible. Critical to this approach will be the evaluation of cloud bursting capabilities from external vendors (i.e. infrastructure as a service), particularly as organisations start to leverage more real-time analytics environments, to ensure that the use of infrastructure resources maps closely to demand and that there are no issues in terms of performance and availability.

Analytical Appliance. In terms of delivery models, IDC has seen significant performance benefits from analytical appliances for customers that are dealing with the impact of Big Data. In addition, since the software is optimised and pre-integrated with appliances, deployment timeframes are typically shorter. In a recent global survey of CIOs, 10% of the respondents indicated that they will be looking at analytical appliances as a delivery model in 2011. IDC also believes that the demand for reference architectures will rise as CIOs look to integrate these appliances within existing data warehousing environments. In line with this increased adoption of the analytical appliance as a delivery model, IDC believes that IT departments will allocate less budget to technical skills (i.e. installation, configuration and management) and more to the high-end analytical skills needed to help drive the necessary business impact across multiple functions.

Enterprise Architecture. Enterprise analytics needs an enterprise architecture that scales effectively with growth, and the rise of Big Data analytics means that this issue needs to be addressed more urgently. Organisations need to look at creating a high-performance analytical environment that leverages in-database analytics, parallel processing and in-memory storage to deal with the increased volume, velocity and variety of data. In particular, for dealing with unstructured data, more attention needs to be paid to Hadoop, an open source software framework managed by the Apache Software Foundation that allows for the distributed processing of large data sets across clusters of computers. However, there will be an ongoing tension between global standards and local requirements, and the use of Hadoop is a good example of this. Another is the ability to process mixed workloads (e.g. analytical and operational) in the same infrastructure environment, such as the appliance mentioned earlier. CIOs need to consider ways in which they can deliver value by solving specific business problems while, at the same time, being cognizant of global architecture standards and specifications. While certain global governance models will not allow for the usage of some of these technologies in a production environment, business expectations will force IT departments to re-assess the way the enterprise architecture agenda is applied at a local level.

The bottom line here is that it is getting more complicated to process and analyse these large, complex and growing data sets, which essentially requires a re-assessment of the broader information management strategy of the majority of organisations that have started their business analytics journey. But the impact is potentially enormous. If you look at optimising the price of every item in a global retail chain, or detecting fraud in real time, you get a sense of the type of problems that Big Data analytics can be used to solve.

Table 3: Old World vs. New Era (Big Data Analytics)

| | Old World | New Era |
| --- | --- | --- |
| Data Sets | Predefined | All-encompassing and iterative |
| Data Velocity | Batch | Proactive and dynamic (real-time where appropriate) |
| Data Analysis | Predominantly historic | Predictive, forecasting and optimisation |

However, despite the clear potential of such analytics, it is important to understand that it will not necessarily be relevant or applicable to every use case. IDC believes that these use cases can best be mapped out across two of the Big Data dimensions, namely velocity and variety, as outlined below:

Figure 4: Potential Use Cases for Big Data Analytics
Axes: Data Velocity (Batch to Real time); Data Variety (Structured, Semi-structured, Unstructured)
Use cases plotted: Credit & Market Risk in Banks; Fraud Detection (Credit Card) & Financial Crimes (AML) in Banks (including Social Network Analysis); Event-based Marketing in Financial Services and Telecoms; Markdown Optimization in Retail; Claims and Tax Fraud in Public Sector; Predictive Maintenance in Aerospace; Social Media Sentiment Analysis; Demand Forecasting in Manufacturing; Disease Analysis on Electronic Health Records; Traditional Data Warehousing; Text Mining; Video Surveillance/Analysis
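To give a flavour of what the real-time fraud detection use case in Figure 4 can look like in code, here is a deliberately simple Python sketch that keeps a transaction history per entity (for example, per card) and flags amounts that fall far outside that entity's norm. It is not a method described by IDC or used by any particular bank: the z-score rule, the threshold of 3.0, and the names `is_outlier` and `score_transaction` are all assumptions chosen for illustration, and production systems would rely on far richer predictive models.

```python
from statistics import mean, stdev


def is_outlier(history, amount, threshold=3.0):
    """Return True if `amount` is more than `threshold` standard deviations
    away from the mean of the entity's past amounts (a simple z-score rule)."""
    if len(history) < 2:
        return False  # too little history to define "the norm"
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return amount != mu
    return abs(amount - mu) / sigma > threshold


def score_transaction(histories, entity_id, amount, threshold=3.0):
    """Check one incoming transaction against its entity's history.
    Unflagged amounts are added to the history; flagged ones are held back."""
    history = histories.setdefault(entity_id, [])
    flagged = is_outlier(history, amount, threshold)
    if not flagged:
        history.append(amount)
    return flagged


if __name__ == "__main__":
    # Hypothetical card history, for illustration only.
    histories = {"card-001": [42.0, 38.5, 51.0, 45.25, 40.0]}
    print(score_transaction(histories, "card-001", 47.0))
    print(score_transaction(histories, "card-001", 900.0))
```

Keying the history by entity identifier is what allows the same check to be applied to cards, accounts, customers or terminals, and scoring every transaction (rather than a sample) is the velocity requirement discussed earlier.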

A better sense of the potential impact of deploying Big Data analytics can be gained by exploring these use cases in more detail:

Real-time Fraud Detection in Banks. This involves the ability to detect, prevent and manage fraud across multiple products, lines of business and channels for a bank. It requires the ability to capture the history for the different types of entities involved in transactions (e.g. card, account, customer, terminal ID or IP address), improving accuracy in detecting customer behaviours that fall outside the norm during point-of-sale (POS) transactions. This information can be used by multiple predictive models for fraud detection and credit risk assessment.

Markdown Optimisation in Retail. The ability for retailers to optimise prices for a wide range of products in real time, based on demand forecasting scenarios (that include the impact of promotions, seasonality and important calendar events), has a major impact on margins. These capabilities can also be augmented by social media sentiment analysis to ascertain customer demand for certain products on a more real-time basis.

Disease Analysis on Electronic Health Records. As healthcare services evolve, analysts can get hold of a patient's entire medical history in electronic format. This will present a major opportunity for Big Data analytics. For example, in the case of a disease such as diabetes, the ability to correlate patient medical history with dietary data (potentially from market basket analysis in retail) and optimised exercise schedules will provide medical practitioners with insights they could previously only dream of.

The Skill Factor

As highlighted earlier, IDC believes that the real value from Big Data will be derived from the high-end analytics that can be performed on the increasing volume, velocity and variety of data that organisations are generating.
In Asia, most organisations are not aware of the type and level of skills that are required (some MNCs are the exception, because their efforts are mainly driven out of the US and Europe). IDC also believes that this is linked to the general lack of awareness and skills historically available in the high-end analytics arena (regardless of the Big Data phenomenon). High-end analytics will require new sets of skills in two key categories:

Technical skills. These cover the new class of technologies required to process, discover and analyse the massive data sets that cannot be dealt with using traditional databases and architectures (e.g. in-memory, Hadoop, MapReduce and key-value stores). Some of these technologies will be delivered as an appliance, so skills to better understand how the software interacts with the hardware to leverage the data will be required.

The new type of business analyst/statistician. One of the key differences between analytics in the Old World and what we are dealing with in the Big Data era is that we are gathering data that we may or may not need. From the perspective of analysis, this means "we don't know what we don't know", i.e. there is so much unstructured data that the variables and analytical models are likely to be entirely new. There is therefore a need to re-think the way analytical power users approach their work by creating a "Sandbox Mentality" where discovery is always the starting point. Generally, a background in data mining and statistics would be a good starting point for this type of analysis.

Moving forward, there will be increasing demand for data scientists: next-generation business analysts with strong statistical skills who are able to extract information from large data sets and present its value to non-analytical experts, and who also have the unique skill of understanding the new algorithms and analytical models that will have the most significant business impact in the short term. Globally, IDC is seeing a lot of interest in this more analytically inclined skill set. Roles and responsibilities have not yet been defined, but the skill set fits the earlier point that "we don't know what we don't know", i.e. there is so much unstructured data that the variables and analytical models are likely to be entirely new. It requires out-of-the-box thinking and creativity in the analytics that need to be done on these new data types and structures.

For example, if you look at the social media phenomenon (which contributes to the semi-structured and unstructured part of Big Data), many marketing departments are looking at ways to do sentiment and brand analysis based on what is being posted on Facebook, Twitter and YouTube (in massive amounts, as you can expect). This dynamic becomes more complex in Asia with local social media sites like RenRen in China and Nate in Korea. Currently, IT is not the first port of call for the chief marketing officer, since it lacks the skills to understand what needs to be done (and in many cases is still trying to work out what role it should play in the policy or governance of the use of social media).
The make-up of the IT department therefore needs to be re-assessed in terms of technical, business and relationship skills. The maturity model below highlights how IDC sees these skills (both technical and business) mapping out across organisations that have adopted business analytics over time, with a view to how this could evolve in the era of Big Data analytics:

Figure 5: The Big Data Analytics Maturity Model

Phase: Old World to New Era

| Impact | Pilot | Departmental Analytics | Enterprise Analytics | Big Data Analytics |
|---|---|---|---|---|
| Staff Skills (IT) | Little or no expertise in analytics; basic knowledge of BI tools | Data warehouse team focused on performance, availability and security | Advanced data modelers and stewards key part of the IT department | Business Analytics Competency Centre (BACC) that includes data scientists |
| Staff Skills (Business/IT) | Functional knowledge of BI tools | Few business analysts; limited usage of advanced analytics | Savvy analytical modelers and statisticians utilised | Complex problem solving integrated into Business Analytics Competency Centre (BACC) |
| Technology & Tools | Simple historical BI reporting and dashboards | Data warehouse implemented, broad usage of BI tools, limited analytical data marts | In-database mining, and limited usage of parallel processing and analytical appliance | Widespread adoption of appliance for multiple workloads. Architecture and governance for emerging technologies |
| Financial Impact | No substantial financial impact. No ROI models in place | Certain revenue-generating KPIs in place with ROI clearly understood | Significant revenue impact (measured and monitored on a regular basis) | Business strategy and competitive differentiation is based on analytics |
| Data Governance | Little or none (skunk works) | Initial data warehouse model and architecture | Data definitions and models standardised | Clear master data management strategy |
| Line of Business (LOB) | Frustrated | Visible | Aligned (including LOB executives) | Cross-departmental (with CEO visibility) |
| CIO Engagement | Hidden | Limited | Involved | Transformative |
| % of Customers (IDC Estimates) | 20% | 65% | 10% | 5% |

In terms of capturing and developing the right skills in the era of Big Data analytics, the creation of a Business Analytics Competency Centre that sits across the business and IT departments will be critical.
IDC believes that this type of structure not only clarifies the roles and responsibilities of key stakeholders for this transformation, but also drives internal visibility, provides a mechanism for education and bridges the IT/business gap. Key individuals from the marketing and sales teams in particular will need to be represented, since improving decision making amongst front-office staff will be the primary focus of these projects. In conjunction with the skills dimension, IDC believes that this structure should be involved in the following areas:

- Technology identification and deployment
- Business case creation and ROI justification
- Data governance frameworks with clear policies and guidelines around master data management, data quality and data models
- IT/business alignment, by involving the critical stakeholders at the right time
- Involvement of the CIO as the supporter of the necessary transformation from an IT perspective, which will in turn create the necessary business impact

Very few organisations have reached the level of maturity that can truly harness the potential that Big Data analytics represents. Practically speaking, it is a major challenge to tick off all the relevant boxes, but this transformation is a necessary one for organisations to truly differentiate themselves in the current economic environment. The CIO (and the IT department) needs to play a critical role in this transformation. The next section highlights some suggestions that IDC believes should be taken into account in the context of this journey.

The CIO Big Data Analytics Checklist

Architect for the Future. Historically, a lot of work in analytics has focused on workarounds for the limited scalability of the underlying hardware. As a result, many IT departments would create materialised views or pre-calculated data structures so that business users could work off these without impacting the performance of the systems that were processing the underlying data. Clustering, parallel processing and in-memory technologies mean that all of that underlying data can now be used in the analytical environment. However, it is important not to fall into the trap of blindly adding capacity based on availability. There is a need to assess multiple delivery models (cloud, particularly for bursting capabilities; analytical appliances; and the traditional client/server or three-tiered Web architecture approach) on a case-by-case basis, as one size will definitely not fit all.

Create a Sandbox Mentality. One of the key differences between analytics in the traditional old-school batch mode and what we are dealing with in the Big Data era is that we are gathering data that we may or may not need. From an analysis perspective, this means "we don't know what we don't know", i.e. there is so much unstructured data that the variables and analytical models are likely to be entirely new. There is therefore a need to re-think the way that analytical power users go about developing their models by creating more of a "Sandbox Mentality" where a discovery process is always the starting point, particularly in terms of drawing linkages between unstructured, semi-structured and structured data. As part of this, new types of skills will need to be brought on board to understand social media nuance (i.e. people more likely to be from Gen Y, Z or even the Millennials).

Not Too Much Tinkering.
Whenever a new set of cool technologies hits the market, there is a tendency for IT departments to tinker, which reduces the immediate business benefits. So while a certain amount of experimentation is a good thing (as outlined in the context of the Sandbox Mentality highlighted earlier; Hadoop and MapReduce definitely fit into this category), CIOs need to be careful that not too much time is spent on experimentation at the expense of delivering business value.

Get the Team Right. The first step in this process involves the CIO assessing his/her own IT department to examine relevant skill levels and organisational structures. In some cases, it will necessitate an internal transformation to get the business to take notice of the change. It then requires that the right people are empowered to execute the IT analytics strategy, with the relevant processes and governance structures in place to enable them to deliver on business expectations. Part of this will require the CIO to gain a much deeper understanding of the capabilities of the underlying analytics technology, but it will also involve working with LOB executives to hire the right type of analytically minded managers and knowledge workers who can make the best use of the underlying technological capabilities.

Take Analytics to the Enterprise. The majority of IT projects in this space have focused on building a data warehouse combined with a variety of BI tools to surface the underlying information to end users. However, when it comes to sophisticated analytics functionality, the lack of IT skills has meant that these projects have been largely departmental and tactical in nature, leading to a siloed mentality. As a result, assessing something such as risk-adjusted profitability (which combines financial, credit scoring and customer data) would be impossible.
This needs to change, and it requires a different level of IT/business collaboration to do so, with the CIO personally focused on an enterprise-wide approach to deploying analytics to ensure that these projects are successful.

Governance and Enablement. This is where existing investments made in data warehousing technologies, if done correctly, will pay dividends. The data models and reference architecture that IT has in place will ensure that data definitions and standards are consistent across the various business departments. Further work needs to be done in the master data management (MDM) space to bridge the operational and analytical gap around data governance, but fundamentally this platform should provide the management and control that IT requires. When it comes to business enablement, IDC sees a new class of projects emerging that combines business analytics with business process management capabilities. More specifically, these projects use decision management software components that include tools for rule management, data mining, query and reporting, complex event processing (CEP), collaboration, BPM suites, search and content analysis. IDC believes that IT departments that can complement previous investments in data warehousing and business intelligence technologies with a better understanding of the decision-making process in their organisations, and of the underlying decision management software, will be best placed to manage the IT governance versus business enablement dilemma.

Conclusion

Despite the varying levels of maturity and adoption of business analytics, businesses are definitely gearing up to use more advanced solutions and offerings in this space. In line with this, organisations need to plan strategically and build a robust roadmap before adopting business analytics. The new generation of business managers is more aware of the benefits of competing on business analytics and will be looking to drive adoption of this technology area more aggressively.
Moving forward, IDC believes that a new approach is required to proactively effect the necessary change, with a specific focus on the following areas:

- Elevating the status of the CIO to one with more transformative impact on the organisation, by playing an integral role in the deployment of the enterprise analytics strategy and ensuring that these technologies have the expected business impact
- An assessment of alternative delivery models (such as the appliance, in-memory and Hadoop for Big Data)
- Capturing higher-level LOB attention and visibility as the next wave of business analytics projects is integrated with complex event processing (CEP) and business activity monitoring (BAM) technologies to drive a new class of projects that IDC defines as decision management

The role of the CIO is gradually becoming much more important in the boardroom, and the CIO is playing a key role in purchasing decisions for advanced applications such as business analytics. Moreover, the CIO and the IT department need to leverage a broader set of business analytics capabilities to create a new information management strategy that deals with the emerging Big Data dynamic and delivers improved decision-making capabilities to business stakeholders across the organisation.

#AP14962U

ABOUT THIS PUBLICATION

This publication was produced by IDC Go-to-Market Services. IDC Go-to-Market Services makes IDC content available in a wide range of formats for distribution by various companies. A license to distribute IDC content does not imply endorsement of or opinion about the licensee.

COPYRIGHT AND RESTRICTIONS

Any IDC information or reference to IDC that is to be used in advertising, press releases, or promotional materials requires prior written approval from IDC. For permission requests, contact the GMS information line at 65-6829-7757 or gmsap@idc.com. Translation and/or localization of this document requires an additional license from IDC. For more information on IDC, visit www.idc.com. For more information on IDC GMS, visit www.idc.com/gms.

IDC Asia/Pacific, 80 Anson Road, #38-00 Fuji Xerox Towers, Singapore 079970. P. 65.6226.0330 F. 65.6220.6116 www.idc.com.

Copyright 2011 IDC. Reproduction is forbidden unless authorized. All rights reserved.