GUIDE TO DATA WAREHOUSING

OVERVIEW

Implementing a data warehouse is a strong step toward managing data at the enterprise level, whether for management purposes or for business intelligence efforts. Some believe that popular Hadoop cluster systems will supersede data warehouses altogether, but, as explained in the Hadoop Integrations section of this guide, Hadoop's applications are distinct from, and potentially even aided by, data warehouses. When planning such a system, it's important to work from an up-to-date list of considerations. This guide covers the factors to weigh when selecting and implementing a data warehousing solution.

Image credits:
1. TextureX Motherboard Circut green copper stock Tech Texture by TextureX, used under CC BY / desaturated from original
2. CGRB server Room by Shawn O'Neil, used under CC BY / desaturated from original
3. datebfkyss by Willi Heidelbach, used under CC BY / desaturated from original
4. Stairs by thomas Leuthard, used under CC BY / desaturated from original
5. Power Mast Framework by Markus Grossalber, used under CC BY / desaturated from original
6. 60/365 by Brenda Gottsbend, used under CC BY / desaturated from original

USES FOR A DATA WAREHOUSE

The primary application of a data warehouse is consolidating data from many sources into one location for analysis and business intelligence (BI) purposes. For example, you might have point-of-sale (POS) data, enterprise resource planning (ERP) data, and accounting data stored in separate systems, all of which you want to analyze together. Doing so could help you report on the state of your business or find correlations between data in the different systems.

Another scenario is when a central hub is needed to transfer data efficiently across dozens, or potentially hundreds, of enterprise systems. For example, patient records from an electronic health record (EHR) system may need to be fed into billing software. A data warehouse can collect and standardize the data from the EHR, then transfer it to the billing software. This is better than transferring data directly between the EHR and the billing software: if the billing software were replaced with a new system, the transfer between the EHR and the billing software would have to be reprogrammed, and potentially reformatted. In the data warehouse scenario, replacing the billing software would leave the transfer from the EHR to the data warehouse unaffected; only the transfer from the data warehouse to the billing software would have to be redone. If data from several sources all had to be fed into and out of the billing software, the benefit of the data warehouse would be even greater.
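To make the hub argument concrete, here is a small Python sketch that counts the connectors each approach needs. It assumes the worst case of a full mesh, where every system exchanges data directly with every other system; real environments usually have fewer direct links, so treat the mesh numbers as an upper bound. The function names and the "warehouse" hub label are hypothetical.

```python
from itertools import combinations


def point_to_point_links(systems):
    """Full mesh (an assumption): every pair of systems gets its own direct connector."""
    return {frozenset(pair) for pair in combinations(systems, 2)}


def hub_links(systems, hub="warehouse"):
    """Hub and spoke: each system connects only to the central data warehouse."""
    return {frozenset((system, hub)) for system in systems}


def links_to_rebuild(links, replaced):
    """Connectors that touch the replaced system and would have to be reprogrammed."""
    return {link for link in links if replaced in link}


if __name__ == "__main__":
    systems = ["EHR", "billing", "ERP", "POS", "accounting"]
    mesh = point_to_point_links(systems)
    hub = hub_links(systems)
    print(len(mesh), len(hub))                     # 10 5
    print(len(links_to_rebuild(mesh, "billing")))  # 4
    print(len(links_to_rebuild(hub, "billing")))   # 1
```

With five systems, replacing the billing software touches four connectors in the mesh but only one in the hub model. The gap widens as systems are added, because a full mesh grows as n(n-1)/2 connectors while the hub grows as n.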

WHEN YOU NEED TO UPGRADE TO A DATA WAREHOUSE

Whether you need a data warehouse depends less on the sheer volume of data, in gigabytes, terabytes, or petabytes, than on the number of sources that need to be integrated. As described above, a data warehouse primarily brings data from multiple sources together under one roof, either to be analyzed in that centralized location or to be transferred more efficiently between systems.

If only a handful of systems need to exchange or aggregate data, and together they hold only tens of gigabytes, a simple database with the data from each system stored in its own tables may be more appropriate. That volume of data is small enough to be analyzed in a single database, and transfers between the systems can be done manually or with automated processes developed by analysts. Such a solution requires far less capital and fewer resources than a data warehouse implementation.

Beyond situational descriptions of typical business needs, quantifying a minimum data requirement is difficult, because each data warehouse implementation is unique to its scenario. Reviewing the following considerations and case study can help you judge whether a data warehouse is the right solution for your organization.
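For the small-scale case, the "simple database" can be as modest as one table per source system. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the table and column names (pos_sales, accounting_invoices, order_id) are hypothetical stand-ins for a POS export and an accounting export, and the query finds orders whose amounts disagree between the two systems.

```python
import sqlite3


def build_simple_store(pos_rows, accounting_rows):
    """Load each source system's export into its own table in one small database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pos_sales (order_id TEXT PRIMARY KEY, amount REAL)")
    conn.execute("CREATE TABLE accounting_invoices (order_id TEXT PRIMARY KEY, invoiced REAL)")
    conn.executemany("INSERT INTO pos_sales VALUES (?, ?)", pos_rows)
    conn.executemany("INSERT INTO accounting_invoices VALUES (?, ?)", accounting_rows)
    conn.commit()
    return conn


def unreconciled_orders(conn):
    """Orders never invoiced, or invoiced for a different amount than was sold."""
    query = """
        SELECT p.order_id, p.amount, a.invoiced
        FROM pos_sales AS p
        LEFT JOIN accounting_invoices AS a ON a.order_id = p.order_id
        WHERE a.invoiced IS NULL OR a.invoiced <> p.amount
        ORDER BY p.order_id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = build_simple_store(
        [("A1", 100.0), ("A2", 50.0), ("A3", 20.0)],
        [("A1", 100.0), ("A2", 45.0)],
    )
    print(unreconciled_orders(conn))  # [('A2', 50.0, 45.0), ('A3', 20.0, None)]
```

A cross-system question like this is exactly what a warehouse answers at scale; at tens of gigabytes and a few sources, a single database can often answer it just as well.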

FEATURES AND BUYING CONSIDERATIONS

SCALABILITY

Data warehousing solutions are largely defined by their ability to expand in volume, incorporate new data sources, and add new capabilities as an enterprise undergoes planned and unplanned growth.

ACCESS

Solutions differ in where they can be accessed from: only from the data warehouse itself, from your internal network, or even from the web. Most support online analytical processing (OLAP). Consider whether data is available immediately after it is loaded into the warehouse or only after significant latency, since this affects your ability to work with real-time data. Likewise, performance monitoring is important for determining whether you can run an extract, transform, load (ETL) job at the same time as a data mining procedure, or whether you'll need to schedule your ETL loads out of the way (such as 3 a.m. on a Sunday) to avoid degrading the performance of your data analysis. Lastly, some data warehouses limit the number of simultaneous users, and the ability to set permission levels that restrict user access may be an important consideration for your organization.

INTEGRATIONS

In data warehouse terms, an integration is a pre-built system for transferring data from, and sometimes to, a particular data source, such as Salesforce or QuickBooks. The available integrations have a large impact on the speed of deployment and ease of expansion. If the data warehouse provider already has an integration for a new data source you'd like to feed into your warehouse (such as a new CRM program), adding it can be as simple as turning the integration on. Without a pre-built integration, developers will have to create the schema for the ETL process, which takes time and quickly becomes expensive.
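Where no pre-built integration exists, the custom work amounts to extracting records, mapping them onto the warehouse schema, and loading them. The Python sketch below shows that shape for a hypothetical CRM export. The source field names, the FIELD_MAP dictionary, and the crm_contacts table are illustrative assumptions, not any vendor's integration API, and SQLite stands in for the warehouse.

```python
import sqlite3

# Hypothetical mapping from CRM export field names to warehouse column names.
FIELD_MAP = {"AccountName": "account_name", "Email": "email", "AnnualRevenue": "annual_revenue"}


def extract(raw_records):
    """Extract: a real integration would call the source system; here records are passed in."""
    return list(raw_records)


def transform(record):
    """Transform: rename fields per FIELD_MAP (missing fields become None) and normalize values."""
    row = {column: record.get(source) for source, column in FIELD_MAP.items()}
    email = (row["email"] or "").strip().lower()
    row["email"] = email or None
    revenue = row["annual_revenue"]
    row["annual_revenue"] = float(revenue) if revenue not in (None, "") else None
    return row


def load(conn, rows):
    """Load: create the target table if needed and insert the mapped rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS crm_contacts "
        "(account_name TEXT, email TEXT, annual_revenue REAL)"
    )
    conn.executemany(
        "INSERT INTO crm_contacts VALUES (:account_name, :email, :annual_revenue)", rows
    )
    conn.commit()


def run_etl(conn, raw_records):
    load(conn, [transform(record) for record in extract(raw_records)])


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_etl(conn, [
        {"AccountName": "Acme", "Email": " Sales@Acme.example ", "AnnualRevenue": "1200000"},
        {"AccountName": "Globex", "Email": ""},
    ])
    print(conn.execute("SELECT * FROM crm_contacts ORDER BY account_name").fetchall())
    # [('Acme', 'sales@acme.example', 1200000.0), ('Globex', None, None)]
```

Even this toy version needs decisions about missing fields and value normalization for every source, which is why hand-built integrations add up in time and cost.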

In cases where custom integrations are necessary, it's important to consider whether the provider's developers do all custom integrations, or whether APIs or even GUI tools are available so that your own developers can build custom integration schemas and mappings. Some data warehousing providers have tools and interfaces designed so that even non-IT personnel can create a schema for a simple database.

TEXT ANALYTICS

Text analytics, the analysis of unstructured data, is common among data warehousing solutions. Naturally, some solutions are more capable than others at generating structured data from unstructured data (by identifying metadata and patterns). Likewise, some systems analyze unstructured data more quickly, which can make a significant difference in performance.

DATA MINING

Basic data mining functionality is often standard in a data warehousing solution, since the warehouse is where you will draw data from for company-wide business intelligence. Some vendors, however, also provide features for more advanced data modeling, scoring, standard reporting from templates, and even visualization at the source of the data. How much of this capability you need depends on which external BI solution you will use to analyze your data, and whether you will extract smaller subsets of the data for analysis or query the entire data set.
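As a toy illustration of generating structured data from unstructured text, the sketch below pulls a few fields out of free-text service notes with regular expressions. Commercial text analytics engines use far more sophisticated techniques; the note format, the field names, and the patterns (including a simplified pattern for OBD-II-style fault codes) are all assumptions made for this example.

```python
import re

# Hypothetical patterns for fields that might appear in free-text service notes.
PATTERNS = {
    "date": re.compile(r"\b(\d{4}-\d{2}-\d{2})\b"),
    # Simplified: a letter P, B, C, or U followed by exactly four digits.
    "fault_code": re.compile(r"\b([PBCU]\d{4})\b"),
    # A number such as 42150 or 42,150 followed by "km".
    "mileage_km": re.compile(r"\b(\d{1,3}(?:,\d{3})*|\d+)\s*km\b", re.IGNORECASE),
}


def extract_fields(note):
    """Return one structured record per note; fields that are not found are None."""
    record = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(note)
        record[field] = match.group(1) if match else None
    if record["mileage_km"] is not None:
        record["mileage_km"] = int(record["mileage_km"].replace(",", ""))
    return record


if __name__ == "__main__":
    note = "2023-05-14: customer reports rough idle; code P0301 stored at 42,150 km."
    print(extract_fields(note))
    # {'date': '2023-05-14', 'fault_code': 'P0301', 'mileage_km': 42150}
```

Once notes are reduced to records like these, they can be loaded into ordinary warehouse tables and joined with other structured data.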

HADOOP INTEGRATIONS

Hadoop integration is important when the volume of data your enterprise collects and analyzes is so large that feeding it all into one central location becomes infeasible or too costly. Hadoop uses parallel processing to analyze data at the source, across multiple servers. One of the sources Hadoop processes in parallel can be a data warehouse, which is why Hadoop does not eclipse data warehousing technology but rather complements it.

SUPPORT

If the data warehouse will store and back up data from your entire company, and if certain data transfers must run at all times, you will want 24/7 phone and technical support. If your data warehouse is particularly critical, review the provider's service level agreement (SLA) for uptime guarantees, as discussed below.

CLOUD VS. ON-PREMISES

The main considerations when deciding between a cloud and an on-premises data warehouse are speed of deployment and uptime/maintenance. After integrations, the biggest factor in deployment speed is physical installation. Cloud data warehouses require fewer technical resources, such as servers, and less IT maintenance. A cloud data warehouse with pre-built integrations for all your data sources can be set up quite rapidly. However, with a cloud-based data warehouse you are at the mercy of the provider's uptime and maintenance schedules, not to mention your enterprise's Internet connection. On-premises solutions make you fully responsible for your own uptime and outages. Depending on the priority of the data being handled and the transactions the data warehouse will manage between systems, you may want to prioritize control over convenience, or vice versa.
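The "process at the source, then combine" idea behind Hadoop can be sketched in plain Python. This is not Hadoop's API: each inner list stands in for data held on a separate server, a thread pool stands in for the cluster, and the event field is a hypothetical record attribute. Each partition is summarized locally (the map step), and only the small per-partition summaries are merged (the reduce step).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_partition(records):
    """Map step: summarize one partition where it lives, as counts per event type."""
    return Counter(record["event"] for record in records)


def combine(left, right):
    """Reduce step: merge two partial summaries."""
    return left + right


def count_events(partitions, workers=4):
    # Threads only simulate separate servers; a real cluster runs map tasks on many machines.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(map_partition, partitions))
    return reduce(combine, partial_counts, Counter())


if __name__ == "__main__":
    partitions = [
        [{"event": "fault"}, {"event": "warning"}],
        [{"event": "fault"}],
        [{"event": "service"}, {"event": "fault"}],
    ]
    print(dict(count_events(partitions)))
```

The design point is data movement: only the compact per-partition counts travel to the reducer, not the raw records, which is what makes this approach attractive when data is too large to centralize.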

DATA WAREHOUSING IN USE

For years, Volvo Car Corporation collected a large amount of data from the numerous digital sensors in each of its cars. Alongside the devices that trigger signals such as the Service Engine Soon light are hundreds of meters and data loggers. All of these data points are collected from the car's central computer when it is brought in for servicing and diagnostics.

Volvo recognized that combining this data with hardware, software, and functional specifications, vehicle diagnostics, and warranty claims would allow it to reduce costs, improve the accuracy of warranty reimbursement, and improve the quality of its products in response to real-time data.

To perform the intended analysis, Volvo first had to consolidate all its data sources into one location, so it teamed up with Teradata to implement a data warehouse. As a result, the volume of actionable data at the disposal of Volvo analysts increased from 364 gigabytes to 1.7 terabytes, and the data was made accessible to over 300 individuals across the product design, manufacturing, quality assurance, and warranty departments. Volvo saw immediate improvements in query performance and user access, achieved targeted cost reductions, and gained accuracy through warranty claim analysis. Broader diagnostic analysis also improved the quality of current and future production lines.

Figure: Increase in Volume of Actionable Data (before and after)

Visit CTOAdvice.co to learn more about data warehousing, read more professional guides, or subscribe to our newsletter.