Rethinking Storage in an Always-On World: Big Data Puts the Spotlight on Hyperscale Computing


A Frost & Sullivan White Paper

frost.com

Contents
Creating an Environment that Supports Big Data
Traditional Storage Solutions Don't Deliver
Hyperscale, or Scale-Out, Storage Offers a Solution to the Big Data Problem
What to Look for in a Hyperscale Storage Provider
Questions to Consider Before Deploying Hyperscale Storage
Conclusion

The term Big Data is getting a lot of play these days, and justifiably so. As companies look for ways to stay ahead of the competition, collecting and analyzing the terabytes of data their employees and customers create and act upon can deliver significant advantages, especially in the areas of R&D, manufacturing, strategic planning, product marketing, and customer engagement. But Big Data presents a challenge for IT, because much of the most useful information is unstructured data that is produced, and often resides, in a host of applications including email, chat, Microsoft Office documents, and social media. These unstructured elements also differ from structured data in that they require instant analytics, as opposed to the batch processing often applied to structured data, which is typically handled overnight and on weekends when IT staff can manage and control access and resources. Finally, Big Data is ideally available not just to IT analysts but to other employees in the organization, which requires immediacy and enterprise-wide access. One critical element in solving the Big Data challenge is storage, which in this case must not only handle extremely large amounts of data and scale as the content grows daily, but also provide the input/output operations per second (IOPS) necessary to deliver data immediately to analytics tools and, eventually, to the end users who need it. Enter hyperscale computing, which leverages vast numbers of commodity servers with direct-attached storage (DAS) to ensure scalability, redundancy and failover. While traditionally the choice of very large web-based businesses, hyperscale storage is becoming a viable option for businesses that want the ability to handle large capacity and deliver low latency for Big Data analytics.
In a recent Frost & Sullivan survey of 880 IT decision-makers, respondents listed their top data center concerns:

43% Capital budget restraints
34% Growth of data storage requirements
31% Aging/inefficient servers and equipment
31% High cost of maintaining the data center

This paper identifies the benefits and challenges of Big Data, outlines the storage needs Big Data imposes, and suggests best practices for choosing a provider and deploying the right storage technology for 2014 and beyond.

CREATING AN ENVIRONMENT THAT SUPPORTS BIG DATA

We are living in an age of unprecedented digital data creation. Thanks to the rise of cloud-based communications, virtualization, social media, mobile devices and applications that enable everyone, everywhere to create and consume content 24/7, the amount of digital information produced and, ideally, available to employees is growing exponentially. But this content is only as good as its analytical use. Today, most of the unstructured data residing in corporate systems, whether on premises or in the cloud, goes unused. That has a significant impact on organizations, which are missing the opportunity to analyze information from a wide range of sources to alter everything from product development to customer engagement to competitive positioning. Companies are generally quite good at collecting, storing and analyzing structured data, including that which populates enterprise resource planning (ERP), sales force automation (SFA), and customer relationship management (CRM) systems. But just as those systems changed the business landscape at the turn of the most recent century, Big Data, which treats unstructured information such as emails, comments, blogs, and rich media content like audio and video files as rigorously as those earlier systems approached structured data, will open new opportunities and markets for those companies that get it right. Before they can analyze this wealth of information, however, companies must get a handle on it, from collection and storage to provisioning and access.

Figure 1: Enterprise Data Sources: Broad. The figure groups enterprise data sources into six categories: online/digital; transaction-based; other/external; communication & messaging; networks & systems; and organizational. Source: Frost & Sullivan.

FROST & SULLIVAN HAS IDENTIFIED SEVERAL KEY BIG DATA TRENDS:

1. The term Big Data is often used to describe the sheer volume of data, which can easily overwhelm many existing data infrastructures. The larger challenge, however, is dealing with "many data," referring to new types of data that existing relational databases were never designed to manage because those data types did not exist, or were not known, when relational databases were created.
2. Frost & Sullivan has identified nearly 40 sources of data that enterprises must contend with (and should be considering, if they are not already doing so), and more than a dozen additional sources that telecommunications providers must manage.
3. Companies in all industries are grappling with Big Data, from the energy business to retail, healthcare to financial services. The information ranges, too, from climate patterns to buying trends, disease progression to risk analysis and compliance.
4. It is not enough to capture, store and manage all relevant data; it is increasingly critical to leverage the information using real-time analytics, so that organizations can take advantage of their vast knowledge store to drive decision making, influence product strategies and positively impact their bottom line.
5. This next-generation data requires next-generation storage solutions to enable extraction, transformation, and loading (ETL) of data outside of a traditional system.

The ability to access, understand, and effectively act on data, and even to shape the data to create desired outcomes, is quickly becoming an essential tool for today's global organizations. If businesses want to succeed in increasingly competitive and global markets, they must capture all relevant information from all available sources and leverage it to support, shape, and drive business objectives.
TRADITIONAL STORAGE SOLUTIONS DON'T DELIVER

The change in data collection and analysis has put pressure on IT departments and their vendors to change the way they approach storage. Companies today need solutions that offer better scalability, performance and provisioning without increasing infrastructure or management costs. Today's organizations are spending an enormous amount on storage, and even then they are not getting the performance they need from their investments. As they add capacity, they are not really adding scalability: additional traditional storage systems can handle more data, but they can't scale up or out, and they add enormous complexity to the overall infrastructure, which increasingly resembles an ad hoc collection of servers, processors, memory and interfaces. None of this makes it easy, or even possible, to serve up the stored data in a way that makes it usable by the employees who need it.

HYPERSCALE, OR SCALE-OUT, STORAGE OFFERS A SOLUTION TO THE BIG DATA PROBLEM

Software-based hyperscale storage increases performance, capacity and throughput by leveraging a loose assortment of hardware resources, effectively delivering parallel computing for the 21st century. Hyperscale storage offers many of the same benefits as scale-out server architectures, which have made high-performance computing cost-effective. Hyperscale computing environments work with multiple petabytes of storage and tens of thousands of servers that, along with their direct-attached storage (DAS), form a network of physical devices. In the event of a failure, work fails over to another server, and the faulty hardware can be removed for repair. Because data is spread across multiple servers, a single server failure doesn't affect service or impact users. The use of multiple servers also distributes processing power, delivering the high performance necessary for managing the large workloads demanded by Big Data and its real-time analytics requirements. Hyperscale storage delivers multiple benefits, including: flexible deployment; significantly lower costs; the ability to handle unstructured data; an open, extensible architecture; reliability and redundancy; and a better user experience. The goal of hyperscale storage solutions is to combine large numbers of servers and leverage their disk, CPU, and I/O resources to enable a virtualized storage pool that delivers high performance and can be managed centrally and, often, automatically. This enables on-demand scalability, which is especially critical for processing and, ultimately, analyzing unstructured data in real time. The benefits of hyperscale storage go beyond supporting Big Data. As more employees need faster and better access to enterprise information and documents from anywhere, a cloud-based file-share system is critical.
But enterprise-grade file-sharing systems require a reliable, versatile and agile back-end storage solution to support read-only, write-only, and read-write capabilities, as employees go from downloading and uploading files to actually working on them in collaborative environments. In these cases, file sizes will range from a few megabytes to scores of megabytes, especially as companies use more video and multimedia files throughout the organization. But the total data stores will quickly grow to terabytes as hundreds or thousands of users take advantage of these newly effective content services; and usage will be concurrent, meaning hundreds or thousands of users will be accessing and working with the information at any given time. All of this demands hyperscale computing that can deliver high throughput for long periods of time and support a wide variety of clients and devices.
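The replicate-and-fail-over behavior described above, in which data is spread across many commodity servers so that the loss of any one machine never interrupts service, can be sketched in a few lines. This is a toy model, not any vendor's design; the node names, replication factor, and hash-based placement are all illustrative assumptions:

```python
import hashlib

class HyperscaleStore:
    """Toy hyperscale store: each object is written to several commodity
    servers, so reads survive the failure of any single replica holder."""

    def __init__(self, servers, replicas=3):
        self.servers = list(servers)
        self.replicas = replicas
        self.data = {s: {} for s in self.servers}  # server -> {key: value}
        self.failed = set()

    def _placement(self, key):
        # Rank servers by a hash of (server, key); the top N hold replicas.
        ranked = sorted(self.servers,
                        key=lambda s: hashlib.md5(f"{s}:{key}".encode()).hexdigest())
        return ranked[:self.replicas]

    def put(self, key, value):
        for server in self._placement(key):
            if server not in self.failed:
                self.data[server][key] = value

    def get(self, key):
        # Any surviving replica can serve the read -- this is the failover.
        for server in self._placement(key):
            if server not in self.failed and key in self.data[server]:
                return self.data[server][key]
        raise KeyError(key)

    def fail(self, server):
        self.failed.add(server)

store = HyperscaleStore([f"node{i}" for i in range(8)])
store.put("/sales/q3.csv", b"report bytes")
store.fail(store._placement("/sales/q3.csv")[0])  # take down one replica holder
assert store.get("/sales/q3.csv") == b"report bytes"  # service continues
```

Real systems add rebalancing, consistency protocols, and disk-level redundancy on top of this idea, but the core property is the same: no single server is a point of failure for either data or service.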

Hyperscale storage is not new; cloud providers and heavy users of virtualization have been relying on it for its scalability in their virtual and hosted environments. The same capabilities that make hyperscale storage so effective for those use cases make it ideal for handling unstructured data, which is also growing rapidly and presenting many of the same computing challenges. It is also ideal for private cloud and hybrid cloud infrastructures, where companies deliver services to their end users through a corporate-controlled cloud. As they need even more performance, companies can easily add more servers. And, because software-based solutions can run on commodity servers, costs remain reasonable, allowing organizations of all sizes to support today's demanding data processing.

DATA DIFFERENCES MATTER

Structured data can fit into predefined data tables and rows, consists of standard text (numbers and letters), and exhibits these characteristics:

It can be generated from online forms, check boxes, and other clearly defined parameters: zip codes, phone numbers, metadata, operational and sensor data.
It can be generated, managed, and optimized through enterprise resource planning (ERP), customer relationship management (CRM), and similar productivity solutions.
It exists in relational databases and can typically be queried with Structured Query Language (SQL) data management tools.

Unstructured and semi-structured data have no easily identifiable or consistent structure or parameters and are not easily captured or categorized in traditional relational databases. These newer data types:

Emanate from digital sources such as web sites, text messages, email, XML files, social media, corporate documents, online video, and other rich media.
Can be captured from mobile, social, video, clickstream, and other interactive data.
Cannot typically be queried with traditional SQL tools; they require keyword- and context-based NoSQL ("not only SQL") databases and database tools, some of which are accessible through keyword searches, and others that serve merely as data stores.
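The contrast above can be made concrete in a few lines. Structured records answer exact SQL predicates; unstructured documents have no columns to predicate on, so retrieval falls back to keyword or context matching of the kind NoSQL document stores provide. The table, columns, and sample documents below are invented for illustration:

```python
import sqlite3

# Structured data: predefined columns, queried precisely with SQL.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (name TEXT, zip_code TEXT, phone TEXT)")
db.execute("INSERT INTO customers VALUES ('Acme Corp', '94041', '555-0100')")
rows = db.execute(
    "SELECT name FROM customers WHERE zip_code = '94041'").fetchall()
assert rows == [('Acme Corp',)]

# Unstructured data: free-form text with no schema. A SQL WHERE clause has
# no column to match against, so we search by keyword instead.
documents = [
    {"id": 1, "body": "Support ticket: the quarterly report upload keeps failing."},
    {"id": 2, "body": "Chat log: customer praised the new mobile app."},
]
hits = [d["id"] for d in documents if "upload" in d["body"].lower()]
assert hits == [1]
```

Production NoSQL systems replace the naive keyword scan with inverted indexes and relevance ranking, but the structural difference is exactly this: one query addresses known fields, the other searches raw content.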

HADOOP: NOT A BIG DATA PANACEA, BUT MAKING BIG DATA MORE AFFORDABLE

Hadoop, an open-source distributed computing framework created by the Apache Software Foundation and based on Google's MapReduce data framework, is fast becoming the most popular solution in the Big Data world. When data is too large for a single database, when it becomes cost-prohibitive to index data updates, and when an organization has many simultaneous users, Hadoop can help. It enables an organization to take a grid computing approach to Big Data, allocating and reallocating data as necessary across a company's data infrastructure for maximum efficiency. As important as any other feature of Hadoop is its ability to integrate traditional (structured) data with the newer forms of data (unstructured and semi-structured) that companies are facing today. But Hadoop's greatest contribution to the advancement of Big Data solutions may be its impact on cost. Deploying a Hadoop cluster can reduce the cost of a Big Data implementation from hundreds of millions of dollars to around $1 million.

WHAT TO LOOK FOR IN A HYPERSCALE STORAGE PROVIDER

Many vendors claim to support hyperscale storage and computing, but it's important to choose providers that can deliver on both the hardware and the software. Indeed, the best option is often to go with two providers working in partnership: one with the software expertise, and one to deliver the most appropriate, cost-effective and high-performance hardware. Companies should look for providers that can offer a unique set of critical capabilities, including elastic scalability; integrated solutions that eliminate storage silos; centralized, simplified global management; cost-effective systems that deliver clear ROI even at scale; and multi-protocol client support. Let's take a look at the key criteria:

Elasticity.
A software-based storage solution treats the data as independent from the hardware, making it easy to grow or shrink volumes as needed, with no service interruptions.

Scalability. Any effective storage solution must support not just terabytes, but petabytes of data, right from the start.

High performance and availability. By spreading data across the server infrastructure, a hyperscale solution should eliminate hot spots, I/O bottlenecks and latency, and offer very fast access to information. Automatic replication should deliver a high level of data protection and resiliency.

Standards-based and open source. An effective storage solution must support any and every file type, while also benefiting from the collective collaboration inherent in any open-source system, which results in constant testing, feedback and improvements.

Leading-edge architecture and design. Any hyperscale storage solution should include a complete storage OS stack; operate in user space; come with a modular, stackable architecture; store data in native formats; and avoid metadata indexing, instead storing and locating data algorithmically.
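The last criterion, locating data algorithmically rather than through a metadata index, is commonly realized with a hashing ring: every client hashes the file path and derives the owning server itself, so there is no central lookup service to query or to become a bottleneck. A minimal sketch, with server names, virtual-node count, and the SHA-1-based ring as illustrative assumptions rather than any product's actual scheme:

```python
import hashlib
from bisect import bisect_right

def build_ring(servers, vnodes=64):
    """Place several hash points per server on a ring; more virtual nodes
    spread the keys more evenly across servers."""
    return sorted(
        (int(hashlib.sha1(f"{s}#{v}".encode()).hexdigest(), 16), s)
        for s in servers for v in range(vnodes)
    )

def locate(ring, path):
    """Hash the file path and walk clockwise to the next server point.
    Every client computes the same answer -- no metadata index needed."""
    h = int(hashlib.sha1(path.encode()).hexdigest(), 16)
    points = [p for p, _ in ring]
    i = bisect_right(points, h) % len(ring)
    return ring[i][1]

servers = [f"storage{i}" for i in range(12)]
ring = build_ring(servers)

# The same path always resolves to the same server, from any client.
assert locate(ring, "/archive/video/demo.mp4") == locate(ring, "/archive/video/demo.mp4")
assert locate(ring, "/archive/video/demo.mp4") in servers
```

A side benefit of this design is elasticity: adding or removing a server shifts only the keys adjacent to its points on the ring, rather than forcing a full redistribution.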

According to the vendors, a 5 PB installation based on Red Hat Storage and HP ProLiant SL4540 Gen8 servers can deliver savings of 30 percent, including infrastructure, space, power, and cooling costs, over a comparable JBOD configuration across a three-year period. These benefits come from:

50% less space
61% less power
75% less administration time
63% fewer cables

HP and Red Hat Team Up for Scale-Out Storage that Works

HP ProLiant SL4540 Gen8 servers loaded with Red Hat Storage are a purpose-built solution that lets organizations effectively store and manage petabyte-scale data. Red Hat Storage is a software-defined, scale-out file and object storage solution for private cloud or datacenter, public cloud, and hybrid cloud environments. It is open source and, combined with HP ProLiant SL4540 Gen8 servers, is designed to meet unstructured, semi-structured, and big data storage requirements. Red Hat Storage enables organizations to combine large numbers of HP ProLiant SL4540 Gen8 servers into a high-performance, virtualized, and centrally managed storage pool; the scale-out architecture of this solution provides massive linear scalability. The HP ProLiant SL4540 Gen8 server's efficient converged design delivers the right combination of capacity and performance in the least amount of space and at a low cost, with enterprise-class reliability and manageability. It includes HP SmartMemory for performance improvements and enhanced functionality; HP Smart Array RAID controllers; HP Agentless Management; and HP Intelligent Provisioning. The ProLiant SL4500 server family features a converged and balanced architecture that is ideal for handling Big Data. Red Hat Storage running on the HP ProLiant SL4540 Gen8 server can easily serve as a large file and object store; is available as a service; and can be used as a near-line archival solution for backup and archiving.
QUESTIONS TO CONSIDER BEFORE DEPLOYING HYPERSCALE STORAGE

Although the power of Big Data is currently getting a lot of attention, the reality is that for a Big Data implementation to work, analysts must have access to extremely large unstructured data arrays, often both internal and external, and the company must have the systems and people in place to organize, access, and run analytical queries against that information. That requires a new type of storage infrastructure. When considering investing in hyperscale storage, it is critical to determine the business impact; this includes assessing the effort required to get there and the return on investment, now and into the future. Questions for a company looking at a Big Data hyperscale storage solution include:

What are the company's expectations for a Big Data implementation, including ROI?
Can a hyperscale storage system resolve the pressing problems that are either holding the company back or, once resolved, would rocket it ahead of the competition?

How will you evaluate the effectiveness of a hyperscale storage system over time?
Will you need to invest in new technology skills or experts?
What else must you invest in to complete the value chain, processes, and applications so the solution will really make an impact?
What is the market reality, and what will it do for the company?
Are company stakeholders and leadership taking a serious approach to Big Data, or are they imagining a magical, fix-all scenario that's unlikely to occur?
Are competitors, suppliers and buyers pushing for a Big Data implementation; and, if so, what is their objective?
Is the timing right for an investment in a hyperscale storage system?

Figure 2: Top 5 Business Values Derived from Effectively Managing Big Data

Data-driven organization. Makes an organization data-driven through policies and practices designed to transform actionable information into organizational action.

Data decisions. Forces the business to make decisions about which KPIs and areas of data it wants to focus on. While this is essential from a data standpoint, it has a ripple effect throughout the business because, in the process, the organization decides what is truly important.

No (data) fear. Addresses the reality that most of what people are talking about when they say Big Data is unstructured and semi-structured data, which do not fit easily into tables or surrender themselves to standard SQL queries/lookups. By deploying systems and best practices that render unstructured and semi-structured data as readily manageable as structured data, an organization no longer has to fear the data it captures, analyzes and puts to good use to build and support the business.

Agile organization. Means the organization has truly learned to collect, assimilate, normalize, analyze and act on data from all sources, both online and offline.
Data for all. Represents the culmination of the dream that a company's most important data need not be locked away and available exclusively to IT or data analysts, but instead be accessible to all employees. This empowers people with the data they need to better perform their jobs.

Source: Frost & Sullivan

CONCLUSION

Data is at the heart of business and life in 2014; survival and success for both organizations and individuals increasingly depend on the ability to access and act on all relevant information. The key is to craft effective strategies to distill raw data into meaningful, actionable analytics. The payoff can include better, smarter, faster business decisions, creating a truly agile organization and empowering people with the data they need. Any analysis of Big Data will require a new kind of storage solution, one that can support and manage unstructured and semi-structured data as readily as if it were structured information. Hyperscale computing and storage leverages vast numbers of commodity servers with direct-attached storage (DAS) to ensure scalability, redundancy and failover, and to provide the input/output operations per second (IOPS) necessary to deliver data immediately to analytics tools and the end users who need it. Companies ready to embrace this high-performance yet cost-effective solution will see business benefits, including better and faster decision making, improved customer relationships and retention, and a clear competitive advantage.

Frost & Sullivan offices: Auckland, Bahrain, Bangkok, Beijing, Bengaluru, Buenos Aires, Cape Town, Chennai, Colombo, Delhi/NCR, Detroit, Dubai, Frankfurt, Iskander Malaysia/Johor Bahru, Istanbul, Jakarta, Kolkata, Kuala Lumpur, London, Manhattan, Miami, Milan, Mumbai, Moscow, Oxford, Paris, Pune, Rockville Centre, San Antonio, São Paulo, Sarasota, Seoul, Shanghai, Shenzhen, Silicon Valley, Singapore, Sophia Antipolis, Sydney, Taipei, Tel Aviv, Tokyo, Toronto, Warsaw, Washington, DC

Silicon Valley: 331 E. Evelyn Ave., Suite 100, Mountain View, CA — Tel Fax
San Antonio: 7550 West Interstate 10, Suite 400, San Antonio, Texas — Tel Fax
London: 4 Grosvenor Gardens, London SW1W 0DH — Tel +44 (0) Fax +44 (0)

GoFrost myfrost@frost.com

Frost & Sullivan, the Growth Partnership Company, works in collaboration with clients to leverage visionary innovation that addresses the global challenges and related growth opportunities that will make or break today's market participants. For more than 50 years, we have been developing growth strategies for the Global 1000, emerging businesses, the public sector and the investment community. Is your organization prepared for the next profound wave of industry convergence, disruptive technologies, increasing competitive intensity, Mega Trends, breakthrough best practices, changing customer dynamics and emerging economies?

For information regarding permission, write: Frost & Sullivan, 331 E. Evelyn Ave., Suite 100, Mountain View, CA 94041