Stay Grounded With Cloud Data Management

Database managers shouldn't be afraid of moving processes to the cloud, but they should ensure best practices go with them and take into account challenges with the new environment.

EDITOR'S NOTE

Data Management Must: Good Cloud Cover

Managing and analyzing data in the cloud has yet to achieve widespread adoption, but the cloud user base is becoming sizable. As of January 2016, 31% of 635 respondents to an ongoing TechTarget survey said they were looking to deploy new BI purchases in either software as a service (SaaS) or public cloud setups. That finding is similar to what a TDWI survey conducted in May 2015 found. In that case, 35% of 309 respondents said their organizations had active cloud data management or analytics deployments.

Cloud platforms promise some alluring benefits, most notably lower IT costs and increased flexibility for scaling systems up or down as data processing needs change. But they don't magically wash away the need for effective data management practices. Just like on-premises information, data in the cloud needs to be cleansed, integrated and governed to ensure that it's accurate and consistent. For example, during a presentation at the 2016 TDWI Executive Summit in Las Vegas, Naveen Jain, director of digital analytics at First Tech Federal Credit Union, said that applying data governance processes to a new Hadoop data repository running in the Microsoft Azure cloud was one of his top priorities.

In this guide, we examine cloud data management issues and technologies to aid IT managers who are planning or considering deployments. First, consultant David Loshin explores the data integration challenges of cloud systems and discusses a possible technology solution. Next, reporter Jack Vaughan details a social media startup's use of a hybrid cloud and on-premises architecture. Consultant David A. Teich closes with a look at the merits of cloud-based BI.

Craig Stedman, Executive Editor, SearchDataManagement

TOOLS

Cloud Data Management Takes Cross-Platform Turn

The economics of cloud computing have enabled many organizations to invest in IT initiatives and business applications that remained tantalizingly out of their reach when on-premises deployments were the only option. In essence, using the cloud transforms IT costs from capital expenditures on hardware and software to ongoing operational expenses. It also streamlines cash flow and potentially lowers costs by allowing a company to pay only for the technology it really needs and to expand its IT systems and its budget only as necessary. In addition, the organization need not worry about its hardware becoming outdated, as the cloud provider can be tasked with continually upgrading the systems within its environment.

Organizations attracted by the promise of those benefits are using cloud computing technologies in a number of different ways, primarily centered on these three uses:

1. Using cloud services as a straightforward replacement for on-premises IT systems. In this scenario, the IT team retains responsibility for the end-to-end design, development, testing, implementation and management of cloud-based applications. That reduces technology acquisition outlays while allowing IT to keep full control of the application platform.

2. Using software as a service (SaaS) applications, such as Salesforce. In addition to reducing capital equipment costs, the SaaS approach simplifies implementation and management of the application software supporting key corporate functions, such as sales, marketing, customer service, finance and human resources.

3. Using fully managed platform as a service (PaaS) environments. In PaaS setups, the cloud provider also handles the design, deployment and management of your back-end processing and data management resources.

Despite all of the benefits the cloud offers, though, there's a significant potential drawback: the proliferation of platforms, applications, tools and locations in which your corporate data resides. While cloud systems provide increased convenience, lower costs and faster time-to-value for users, they also establish a new pattern of data distribution that not only spans different systems, but also crosses organizational and administrative boundaries.

This data diffusion leads to a number of questions about managing and using data in the cloud. For starters, what kind of control do you have over the models and metadata for the various data sets that are being managed in cloud-based systems? Going further, how can all of that data be accessed, and what are the synchronization requirements for enabling information in different data sets to be used in a coordinated way, no matter where it's located?

Such questions are particularly pertinent for business intelligence, reporting and analytics applications. Methods must be implemented to facilitate data integration across different cloud platforms, applications and data stores (and on-premises systems, too) while also providing a workable user interface for business analysts, data scientists and other BI and analytics users.

In fact, that effectively defines one possible solution to the problem: software products that support cross-platform data integration. These tools, also known as self-service data preparation software, provide connectors to mainstream relational database management systems and newer NoSQL databases; they can also link to Hadoop clusters and data lakes to access information in the Hadoop Distributed File System and related data repositories.
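To make the idea of cross-platform integration concrete, here is a minimal sketch in Python of combining rows from a relational source with records from a SaaS-style export. It is an illustration under stated assumptions, not a description of any vendor's product: the `customers` table, the `saas_opportunities` records, the `field_map` dictionary and the `normalize` and `combine` helpers are all hypothetical names invented for this example, and an in-memory SQLite database stands in for an on-premises system.

```python
import sqlite3

# Hypothetical on-premises relational source, held in memory for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Acme Corp"), (2, "Globex")],
)

# Hypothetical records exported from a SaaS application (field names invented).
saas_opportunities = [
    {"AccountId": 1, "Amount": 12000.0},
    {"AccountId": 1, "Amount": 3000.0},
    {"AccountId": 2, "Amount": 7500.0},
]

# Minimal mapping from the SaaS field names to shared business terms.
field_map = {"AccountId": "customer_id", "Amount": "deal_amount"}


def normalize(record, mapping):
    """Rename source-specific fields to the shared business vocabulary."""
    return {mapping.get(key, key): value for key, value in record.items()}


def combine(connection, saas_records, mapping):
    """Join relational customers with SaaS records on the shared customer_id."""
    totals = {}
    for record in saas_records:
        row = normalize(record, mapping)
        cid = row["customer_id"]
        totals[cid] = totals.get(cid, 0.0) + row["deal_amount"]
    rows = connection.execute(
        "SELECT customer_id, name FROM customers ORDER BY customer_id"
    )
    return [
        {"customer_id": cid, "name": name, "total_deals": totals.get(cid, 0.0)}
        for cid, name in rows
    ]


combined = combine(conn, saas_opportunities, field_map)
for entry in combined:
    print(entry)
```

The point of the sketch is the shape of the work rather than the tooling: each source keeps its own field names, a small mapping translates them into shared terms, and the join happens on the shared key regardless of where each data set lives.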
In addition, they can ingest unstructured text files and structured XML and JSON documents, plus streaming data from sources such as social networks, website clickstream logs and stock market data feeds. And yes, they can connect to SaaS applications and cloud services to pull together data generated there and combine it with other information as needed.

The cross-platform tools possess three other key attributes. First, they're able to direct data to any selected platform, a big difference from traditional data integration tools that pull information from source systems into a single staging area. Second, they support easy access to data via end-user BI and data visualization tools, no matter where the required information resides. Third, they provide semantic cataloguing of available data sets; corresponding business metadata that provides details about data elements, definitions and structures; and associated business rules needed to enable data integration processes.

All this indicates that cross-platform data integration technologies are more than just souped-up extract, transform and load (ETL) software mapped to a mix of internal and external data sources, both on-premises and in the cloud. These tools blend a variety of features, including metadata management and data connectivity, federation and virtualization, to provide a uniform way to access, query and visualize data. Cloud environments with widely dispersed data sets may have met their data management match.

David Loshin

REAL WORLD

Startup Uses NoSQL Hybrid Cloud to Scale to Infinity

Looking to avoid data latency on its interactive Web platform and deal with growing processing demands, Spot.IM, a social media and online community startup based in Tel Aviv, Israel, turned to a hybrid system architecture that mixes on-premises and cloud deployments of the Redis NoSQL database.

Spot.IM's namesake service enables websites to add real-time commenting, interactive chat, newsfeeds and other community-oriented features in an attempt to better engage site visitors. The service must be able to nimbly scale up and down as usage levels fluctuate, according to Ishay Green, the company's CTO. But he said Spot.IM's technology platform often has to handle a very large number of user requests, and making people wait for it to respond isn't a viable option.

The startup, which counts Entertainment Weekly magazine's website among its customers, seeks to minimize latency in order to make the transition from webpage viewing to interactive dialogue and commenting appear seamless for users of sites that are running the service. "Everything is about latency today," Green said.

To meet the need for speed, Green and his colleagues at Spot.IM decided to field a combination of Redis Labs Enterprise Cluster (RLEC) and Redis Cloud, a pair of companion NoSQL products from software vendor Redis Labs Inc. RLEC, a clustered system that can support multiple Redis databases, is installed on-premises at Spot.IM. The Redis Cloud managed service runs on the Amazon Web Services cloud.

Green said Spot.IM is using the mixed NoSQL setup to serve 400,000 to 1 million user requests per day, with the expectation that it eventually will be able to handle more than 1 billion requests per month. "The basic idea is that our main database acts like a cache in real time," he said, adding that he has yet to encounter limits with the system's scalability. "It allows me to scale to infinity."

Although Redis is freely available as an open source technology, Green said Spot.IM opted for the commercial version of the database from Redis Labs because of its ease of configuration, high availability features and the scalability the software can provide.

One of a wide variety of built-for-purpose NoSQL database engines, Redis was designed to support high-speed processing operations involving key-value pairs of data. The database was created by Italian developer Salvatore Sanfilippo, who carried the work on Redis forward as an open source effort while working at IT vendors Pivotal Software and VMware. To continue with that, Sanfilippo joined Redis Labs in mid-2015 as the head of open source Redis development.

Redis is an in-memory database that provides advanced caching, a trait that makes it particularly useful for Web applications. Although the technology is often categorized as a key-value data store, it isn't a conventional key-value implementation in the sense that it's specially tuned to support common data structures, such as strings, lists and hashes.

Redis Labs has touted processing performance as high as 1.5 million operations per second, a level that is reachable partly because the database is written in C, according to Leena Joshi, the vendor's vice president of product marketing. Like other NoSQL players, Redis Labs has also forged a connector to the Apache Spark analytics engine, which could help enhance the database's utility in analytics applications.

Redis has increasingly found market traction, joining fellow NoSQL technologies such as MongoDB and Apache Cassandra in the top 10 of the DB-Engines ranking of database management system popularity.
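As a rough illustration of what a key-value store with typed values (strings, lists and hashes) looks like for a real-time commenting service, here is a toy in-memory sketch in Python. It is not the Redis client API and says nothing about how Spot.IM built its system: the `ToyKeyValueStore` class, its method names and the key names are all hypothetical, and plain Python dictionaries and deques stand in for the database.

```python
from collections import defaultdict, deque


class ToyKeyValueStore:
    """In-memory stand-in for a key-value store with typed values.

    Illustration only; this is not the Redis client API.
    """

    def __init__(self, max_list_length=100):
        self._strings = {}
        self._hashes = defaultdict(dict)
        # Bounded deques keep only the newest items, like a capped feed.
        self._lists = defaultdict(lambda: deque(maxlen=max_list_length))

    def set_string(self, key, value):
        """Store a single value under a key."""
        self._strings[key] = value

    def get_string(self, key, default=None):
        """Return the value for a key, or a default if it is missing."""
        return self._strings.get(key, default)

    def set_field(self, key, field, value):
        """Set one field of a hash (a small record stored under one key)."""
        self._hashes[key][field] = value

    def get_fields(self, key):
        """Return a copy of all fields stored in a hash."""
        return dict(self._hashes.get(key, {}))

    def push_front(self, key, item):
        """Add an item at the head of a list; the oldest item drops when full."""
        self._lists[key].appendleft(item)

    def newest(self, key, count):
        """Return up to `count` items, newest first."""
        if key not in self._lists:
            return []
        return list(self._lists[key])[:count]


store = ToyKeyValueStore(max_list_length=3)
store.set_string("article-1:title", "Hello")
store.set_field("user:42", "name", "Dana")
for text in ["first", "second", "third", "fourth"]:
    store.push_front("comments:article-1", text)

print(store.newest("comments:article-1", 2))
print(store.get_fields("user:42"))
```

Because every read is a dictionary or deque lookup in memory, the store answers without touching a disk, which is the property that makes this style of database attractive when latency matters.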
And in October 2015, consulting and market research company Gartner put Redis Labs in the leaders section of a Magic Quadrant ranking of operational database management system vendors. It was the first time the company made it into Gartner's annual database Magic Quadrant report; Redis Labs was joined in the top quadrant by 10 other vendors, including relational database market leaders Oracle, Microsoft and IBM and several other NoSQL software developers.

Spot.IM's Green cited the commercial Redis software's ability to scale automatically as an advantageous feature versus the open source version of the database, which he said is limited to a single-threaded model. Getting the scalability support as a built-in part of Redis Cloud and RLEC frees up time for development team members at Spot.IM to address other tasks, he added.

Jack Vaughan

SERVICES

Cloud Boosts Business Intelligence Process

Business intelligence vendors have a history of telling prospective business users that self-service BI software is a panacea that will get IT out of their hair and let them analyze data on their own easily. And I have a bridge to sell you.

But the problem business users have isn't really with IT; it's with the time it takes to think of an analysis and then apply that idea to data. Because of previous technology limitations, that often required a heavy IT presence in creating and running analytical queries.

That's changing, though, in a way that not only benefits end users, but also helps IT to better handle the massive amounts of data now being generated and the increased demands for data analytics from line-of-business managers. The two key overlapping parts of that change are the expanding capabilities of BI and analytics tools and the growth of cloud computing, a confluence that is making business intelligence in the cloud more feasible.

Much of the discussion surrounding the growth of cloud applications centers on large data volumes, fast processing, relatively low price points and simplified implementation. But there are also more indirect reasons to migrate to the cloud.

One benefit that too often is presented as an "Oh, also" feature is improved control over application consistency. If you need to manage servers in multiple regions or PCs and mobile devices scattered everywhere, a cloud deployment, especially when combined with HTML5, makes it much easier for IT to see and keep track of who is using what software. It also makes upgrades and patches a lot easier. In addition, cloud installations can help organizations maintain suitable controls for data accuracy and both regulatory and contract compliance as conditions on the ground change.

A corollary to the ability to control change is the ability to more rapidly provide for it. Compared to on-premises applications, that can mean smaller, more frequent updates being rolled out to business users. Many cloud computing vendors let their clients choose when to migrate to new versions of cloud services. It also becomes much faster to provide new functionality to users in various locations: IT just has to flick a switch, and it's available to everyone.

A NATURAL WAY TO GO

BI is a process that adapts well to cloud computing. From data aggregation to distribution of analytical findings, doing business intelligence in the cloud can help simplify and streamline the BI process, particularly if the data being analyzed is in the cloud to begin with.

Because of technical advances, BI and analytics systems now can look at larger data sets and also run far more complex analytics applications in less time than they could in the past. That means a lot more can be understood about business strategies and operations, both historically and predictively. And the cloud is clearly part of the advancing technology making that possible.

As mentioned, one of the benefits of cloud systems is their support for inexpensively storing large amounts of data. That could include social media data and third-party market information, plus the exponentially growing volumes of sensor data coming from the Internet of Things. Much of that data is already in the cloud, so why spend the time and resources needed to move it on-premises for analysis?

On the other end, having reports, charts and other data visualizations centrally located in cloud servers makes it much easier to certify them for accuracy and then to distribute them inside an organization. The distribution process can be as basic as changing permission levels on the information to make it more widely available.
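The certify-then-distribute step described above can be sketched in a few lines of Python. This is a hypothetical model, not the behavior of any particular BI product: the `Report` class, its `certified` flag and the role names are assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Report:
    """A centrally stored report with a set of roles allowed to view it."""

    title: str
    certified: bool = False
    allowed_roles: set = field(default_factory=set)

    def can_view(self, role):
        """Return True if the given role may open the report."""
        return role in self.allowed_roles

    def share_with(self, *roles):
        """Widen distribution; refuse until the report is certified."""
        if not self.certified:
            raise PermissionError("certify the report before wider distribution")
        self.allowed_roles.update(roles)


report = Report("Quarterly sales", allowed_roles={"analyst"})
report.certified = True  # accuracy check completed
report.share_with("sales_manager", "executive")
print(report.can_view("executive"))
```

Because the report lives in one place, widening access is a change to its permission set rather than a new round of copying files to each audience.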
NOT JUST DIY

Even so, all of that work must still be supervised by IT to ensure that data sources are properly accessed and utilized and that analyses that work OK in tests are still accurate and run efficiently in production applications. And of course, IT also needs to provide the overarching security that links cloud BI tools into the full organizational security structure.

The self-service BI marketing message became bifurcated in recent years. Tech-savvy business analysts were addressed with powerful tools that let them run complex BI and analytics applications, while knowledge workers were the focus of self-service offerings that didn't amount to much more than spreadsheets with some drag-and-drop tools for creating simple reports.

But maybe we should just stop talking about self-service BI. The business user will never be able to manage, access and investigate corporate data alone; ultimately, it's not a do-it-yourself thing. Companies need some structure around information and how it's accessed and used.

Combining BI tools and techniques with cloud computing technologies is enabling IT and its business clients to work together in a way that wastes less time on both sides while letting each focus on its own area of expertise. Synergistic, organic and empowered are some of the adjectives that come to mind when thinking about the changing relationship between IT and the business on BI initiatives. But one thing I know for sure is that BI is changing for the better thanks to the cloud.

David A. Teich

ABOUT THE AUTHORS

DAVID LOSHIN is the president of Knowledge Integrity Inc., a consulting, training and development company that works with clients on business intelligence, big data, data quality, data governance and master data management initiatives. Loshin writes for many industry publications, including several TechTarget websites, and is the author of numerous books. Visit his website or email him at loshin@knowledge-integrity.com.

DAVID A. TEICH is principal consultant at Teich Communications, a technology consulting and marketing services company. Teich has more than three decades of experience in the business of technology. Email him at davidt@teich-communications.com.

JACK VAUGHAN is senior news writer for SearchDataManagement. Previously, he was editor in chief for SearchSOA. Before joining TechTarget in 2004, he was editor at large at Application Development Trends and ADTmag.com. Email him at jvaughan@techtarget.com.

Stay Grounded With Cloud Data Management is a SearchDataManagement.com e-publication.

Bridget Botelho, Editorial Director
Ron Karjian, Managing Editor
Moriah Sargent, Associate Managing Editor
Craig Stedman, Executive Editor
Jacqui Biscobing, Site Managing Editor
Linda Koury, Director of Online Design
Martha Moore, Senior Production Editor
Doug Olender, Publisher, dolender@techtarget.com
Annie Matthews, Director of Sales, amatthews@techtarget.com

TechTarget, 275 Grove Street, Newton, MA

TechTarget Inc. No part of this publication may be transmitted or reproduced in any form or by any means without written permission from the publisher. TechTarget reprints are available through The YGS Group.

About TechTarget: TechTarget publishes media for information technology professionals. More than 100 focused websites enable quick access to a deep store of news, advice and analysis about the technologies, products and processes crucial to your job. Our live and virtual events give you direct access to independent expert commentary and advice. At IT Knowledge Exchange, our social community, you can get advice and share solutions with peers and experts.

Cover: Fotolia