HADOOP USERS ZERO IN ON BUSINESS BENEFITS OF BIG DATA

SearchDataManagement E-Guide

This expert e-guide explores the history of Hadoop, starting at its inception and looking ahead toward its uncertain future, and provides examples of how Hadoop has helped enterprises with their analytics capabilities. Learn how Hadoop makes it feasible to process and use the data collected by companies great and small.

HADOOP ARCHITECTURES PROPEL WIDER MOVE TO DATA-DRIVEN ANALYTICS

Craig Stedman, Executive Editor, SearchBusinessAnalytics

Hadoop turned 10 this year by some historical measures: when it became an Apache subproject and was given a name, when the first code was released, when the first users deployed that code. To mark the occasion(s), executives from Hadoop vendors are giving the distributed processing framework gift-wrapped accolades for its impact on data management and analytics processes over the past decade.

Not surprisingly, the celebrants include some of the people who played central roles in getting Hadoop off the ground. Doug Cutting, co-creator of the technology and now chief architect at Hadoop distribution vendor Cloudera, said Hadoop architectures have enabled businesses to become much more data-driven -- and not on the periphery of organizations, but in the center. Fellow co-creator Mike Cafarella, a computer science professor and CEO of startup Lattice Data, chimed in to say that before Hadoop, companies were "leaving huge amounts of really interesting [analytics] work on the table" because of the processing limitations of relational databases.

There's more. "Almost any enterprise you find that cares about its data is somewhere on a journey with Hadoop," said Sean Suchter, whose web search technology team at Yahoo became the first production user of Hadoop in 2006; Suchter is now CEO of performance management startup Pepperdata. Raymie Stata, chief architect for search and advertising systems at Yahoo 10 years ago and now head of cloud services provider Altiscale, lauded Hadoop for giving programmers and analysts direct access to all of the data in the enterprise, bypassing the high priests of data who could slow everything down in traditional data warehouse environments.

You could be forgiven for taking words of praise for Hadoop from its progenitors with a grain of salt. In this case, though, there's merit in the meritorious views toward Hadoop.

Hadoop can't be credited with starting the business world down the data-driven path; data warehouses and business intelligence (BI) systems began finding their way into companies more than two decades ago. And self-service BI tools that put analytics power in the hands of business users emerged in the mid-2000s. But Hadoop architectures have taken things to a different level, opening up new types of data for analysis and making it more feasible -- technically and economically -- to collect, process and use all the information flowing into organizations.

Take Uber, for example. The ride-sharing company was in danger of stalling out on analytics until it deployed a Hadoop data lake last year along with the Spark processing engine and other technologies. "We had data sets that weren't available [for analysis] within the company before -- now they are," said Vinoth Chandar, a senior software engineer at Uber. The Hadoop environment has become the source of truth for all data, he added, noting that Uber is looking to make every decision data-driven.

General Electric's GE Power Services unit is another organization that's using a Hadoop-based system architecture -- front-ended by self-service BI software -- to create a more data-driven culture. Chief enterprise architect Don Perigo said GE Power Services went from 120 workers using a conventional BI and reporting system four years ago to 22,000 users of the Hadoop platform. Executives set a goal of 50% utilization in individual business units -- in some departments, Perigo said, the adoption rate is up to 98%.

The University of Texas MD Anderson Cancer Center envisions the same sort of thing happening there. Right now, a lot of its data is "just dark -- we can't get to it and we can't use it," said Bryan Lari, director of institutional analytics and informatics. The goal is to get to where everyone, from executives to admins, is using data to drive decisions. The vehicle: a Hadoop cluster that began operation in March.

The 10-year milestones come at a time when the future of Hadoop as we've known it is in question. Spark is pushing aside the MapReduce engine in many architectures, and possible data storage alternatives to the Hadoop Distributed File System -- the framework's other original core component -- are springing up. Hadoop may morph into a different set of components, or it could slowly fade from the scene, its throne usurped by other tools that have grown up around it.

But even if the latter happens, Hadoop will have accomplished far more than Cutting likely imagined it would when he famously named the technology after his son's stuffed elephant a decade ago. And the data-driven environments it has fostered will remain -- which is worthy of some congratulatory pats on the back.

MAINSTREAM HADOOP USERS ZERO IN ON BUSINESS BENEFITS OF BIG DATA

Craig Stedman, Executive Editor, SearchBusinessAnalytics

The need to prove the business benefits of big data applications and Hadoop platforms has taken center stage in a growing number of mainstream organizations, and it isn't always an easy task for IT and analytics managers.

For example, a Hadoop deployment wasn't a slam-dunk decision for Blue Cross Blue Shield of Michigan. "For a lot of organizations like ours, [big data] has not yet become a core foundation of running the business," said Beata Puncevic, director of analytics, data engineering and data management at the medical insurer. "When you go in and talk to a lot of [executives] about investing in a platform, it completely does not resonate with the challenges of the day."

At Blue Cross and other healthcare businesses, those challenges include low profit margins that don't leave a lot of money for technology innovation, plus resource and skill-set issues, and a relatively conservative culture, according to Puncevic. As a result, she and her colleagues had to put in some extra effort to get approval and funding for a data lake that went into use in May.

Puncevic set up a team to develop an ROI framework for the data lake project, with metrics on the projected benefits based on before-and-after calculations. In building the business case, she also focused on three IT-related improvements: reducing data processing and management costs, enabling more insightful analytics, and creating a more agile and adaptable technology architecture.

In addition, Puncevic said she worked to obtain corporate-level funding for the initial rollout and subsequent project phases, "so we don't have to worry about getting funding from individual business units" for different aspects of the initiative. The strategy worked, and the Detroit-based insurer is on a path to fully constructing the platform over the next three to five years.

The benefits of big data are potentially tremendous for the healthcare industry as a whole, Puncevic said at Hadoop Summit 2016 in San Jose, Calif., last week. Besides lower IT expenses, she cited an opportunity to reduce healthcare costs, while also improving the quality of patient care and boosting preventive medicine efforts -- all through better analytics.

ON THE ROAD TO BIG DATA BENEFITS

The value of big data is definitely real for Progressive Casualty Insurance Co. and its auto policy customers, said Brian Durkin, an innovation strategist in the company's enterprise architecture group. Progressive uses a Hadoop cluster partly to power its Snapshot program, which awards discounts to safe drivers based on operational data collected from their vehicles.

The insurer has handed out more than $560 million worth of discounts since launching the program in 2008, Durkin said in another conference session. "It's not some little science experiment that we're running," he said. "We're fully invested in it, and it means a lot to our customers."

To track participating drivers and calculate discounts, huge volumes of data get processed and analyzed in the cluster, which, like the one at Blue Cross, is based on the Hortonworks distribution. Progressive has collected data on 2.4 billion trips, and it retains all of the information. For analyzing driving patterns to identify bad habits drivers can be alerted to, "it's the older data that's more valuable," Durkin said. "So, we have to keep everything and analyze everything."

Crunching the data requires a lot of processing resources, and Progressive has deployed various advanced tools for its data scientists to use, including SAS, the R programming language and H2O. But business executives have been willing to foot the bill, said Pawan Divakarla, data and analytics business leader at the Mayfield Village, Ohio, insurer. "It's a very data-driven company," he said. "We want people to have intuition and ideas, but they need to prove them out with data."

HADOOP'S HIGHER-VALUE PROPOSITION

Retailer Macy's Inc. runs a mix of BI and analytics applications off of a Hortonworks-based system to support marketing, merchandising, product management and other business operations. On a daily basis, thousands of business users access hundreds of BI dashboards fed by the cluster -- making it a key component in decision making, said Seetha Chakrapany, director of marketing analytics and customer relationship management systems at Cincinnati-based Macy's.

"You don't want to just see Hadoop as a cheap storage solution," Chakrapany said. "Its value is much higher than that."

Hadoop is still maturing and has a lot of rough edges, he cautioned, saying new users should expect some instability and missing IT management functionality. "If you come in with the typical IT mindset that this has to be rock-solid, it's not going to be the right [technology]." Nonetheless, he said he thinks Hadoop could truly be an enterprise data platform for Macy's.

But Chakrapany isn't taking the benefits of Hadoop and Hadoop-based BI applications for granted. Last year, he set up a team of evangelists to sell the merits of the environment internally and lobby more business units to use it. His group also tracks the business benefits generated by the platform, in both qualitative and quantitative ways.

"We don't want to just be counting the number of users, the number of queries, how much data [is analyzed] -- those are just numbers," Chakrapany said. "The key piece is, what has this done for the business?"

FREE RESOURCES FOR TECHNOLOGY PROFESSIONALS

TechTarget publishes targeted technology media that address your need for information and resources for researching products, developing strategy and making cost-effective purchase decisions. Our network of technology-specific Web sites gives you access to industry experts, independent content and analysis and the Web's largest library of vendor-provided white papers, webcasts, podcasts, videos, virtual trade shows, research reports and more -- drawing on the rich R&D resources of technology providers to address market trends, challenges and solutions. Our live events and virtual seminars give you access to vendor-neutral, expert commentary and advice on the issues and challenges you face daily. Our social community IT Knowledge Exchange allows you to share real-world information in real time with peers and experts.

WHAT MAKES TECHTARGET UNIQUE?

TechTarget is squarely focused on the enterprise IT space. Our team of editors and network of industry experts provide the richest, most relevant content to IT professionals and management. We leverage the immediacy of the Web, the networking and face-to-face opportunities of events and virtual events, and the ability to interact with peers -- all to create compelling and actionable information for enterprise IT professionals across all industries and markets.