Bringing Nagios IT Monitoring From Good to Great


Contents

Introduction
Overcoming the Multiple Instance Nightmare
Avoiding Swivel Chair Management
Get Your Weekends Back
Avoid the Speeding Bus!
Securing Your Monitoring System
Conclusion

Introduction

How is your IT monitoring solution working? Be honest now. Is your monitoring keeping up with the growth and scale of your organization? Does it have the capabilities and security that make sense not only to you, but to your entire team?

If you happen to be monitoring with Nagios, you may have started out with basic capabilities that were good enough. For example, you can monitor the basic status of devices in your IT infrastructure. "Is it up or down? Is it broken? Are people across the company about to knock down my door?" are some of the questions you might be able to answer when monitoring with Nagios. But are those really the only questions you would like your monitoring solution to answer?

With IT becoming increasingly important to the business, IT departments must do more to ensure maximum availability of critical business applications, and do so with great efficiency. Unfortunately, trying to do more with Nagios has created challenges for many IT teams. Reporting, configuration, security, and the user interface have just not kept up with the new challenges. When trying to monitor more, many organizations find themselves having to implement multiple instances of Nagios to cover different groups of infrastructure, causing inefficiency and making problem identification much more difficult.

If this sounds like you, what can you do? When you are strapped for budget, resources, and hours, replacing a monitoring system isn't the easiest of tasks. This is especially true for any highly customized Nagios implementation. With plugins, custom scripts, and integrations, migrating away from Nagios can be quite a project.

So what's the middle ground? Can you keep the good work you have done with plugins and integrations, but still overcome the challenges? What capabilities does it take to bring your monitoring system to an enterprise-class level?

In this guide, we'll discuss some of the gaps that you might be experiencing in your Nagios monitoring system and some solutions that might be right for you.

Overcoming the Multiple Instance Nightmare

You have multiple instances of Nagios

Versatility and extensibility in what you can monitor is one thing. Being able to monitor efficiently at scale is a whole different story. Expansion of your IT infrastructure comes with a large number of new hosts and devices that need to be monitored. Scaling your monitoring with Nagios inevitably means spinning up multiple disconnected instances. Instead of viewing your IT infrastructure from one master location, you're spending too much time and energy correlating data from each instance.

Every time a new disconnected instance is spun up, there is another thing that has to be checked and managed. Every new instance makes it harder to maintain the uptime of your IT infrastructure.

Stuart Hodgson of the University of Northampton explains his team's experience with Nagios: "We looked at Nagios because members of my team were familiar with it. However we quickly realized that it wasn't supportable and wouldn't scale with our environment."

Why use a monitoring system that doesn't scale to cover your growing IT infrastructure?

Solution: Migrate and Scale Your Monitoring Solution to Grow as You Grow

Opsview Monitor has a proven and fully supported master-slave architecture, allowing it to scale to support up to 20,000 devices. Each slave system monitors a section of the IT infrastructure, with all the information coming together in a master view, making it much easier to visualize and monitor your entire IT environment.

Migration is straightforward. Opsview is 100% compatible with all 3,000+ Nagios plugins. On top of that, Opsview provides you with over 100 preconfigured, supported Opspacks, which let you monitor a myriad of technologies straight out of the box. That means whatever technology you might encounter during your growth is not only covered with plugins, but also covered with an Opspack that can be added and monitored within minutes, saving you both time and effort.

What You Get: A Nagios plugin gives you the plugin. An Opsview Opspack gives you the plugin plus service checks, service groups, variables, and host templates.
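
To picture how a master view differs from several disconnected instances, here is a minimal Python sketch. It is not Opsview's implementation: the zone names, host names, states, and the functions `build_master_view` and `summarize` are all hypothetical, chosen only to show per-section results being merged into one list that can be checked in one place.

```python
from collections import Counter

# Hypothetical check results, as each slave system might report them.
# Zone names, host names, and states are made up for illustration.
collector_results = {
    "zone-a": {"web01": "OK", "db01": "CRITICAL"},
    "zone-b": {"router01": "OK", "switch01": "WARNING"},
}


def build_master_view(results_by_collector):
    """Merge per-collector results into one list of (collector, host, state)."""
    view = []
    for collector, hosts in sorted(results_by_collector.items()):
        for host, state in sorted(hosts.items()):
            view.append((collector, host, state))
    return view


def summarize(view):
    """Count hosts per state across the whole environment."""
    return Counter(state for _, _, state in view)


master_view = build_master_view(collector_results)
summary = summarize(master_view)

# One place to look for problems, instead of one login per zone.
for collector, host, state in master_view:
    if state != "OK":
        print(f"{collector}/{host}: {state}")
```

With the data in one structure, a question such as "what is not OK anywhere?" becomes a single loop rather than a tour of separate consoles.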

Avoiding Swivel Chair Management

As we said before, working within multiple instances of Nagios costs you both time and effort. How familiar is this scenario? You're working off of one instance to monitor a certain area of your environment. Now turn to the monitor beside that one to bring up Zone B of your IT infrastructure. Different login, different password, different machine, and potentially a completely different monitoring solution! Now repeat the process until you're all the way down to Zone Z of your IT infrastructure.

How many monitoring solutions do you use right now? It's not surprising to see IT departments using five, six, seven, or more to monitor their IT infrastructure. Seven solutions means seven different dashboards, seven sets of data, seven different alert strategies, and seven different reports, all across the organization.

Collecting data from across your IT infrastructure over a multitude of different monitoring solutions can create confusion, data loss, and even discrepancies about what is actually working in your environment. Running and checking each monitoring instance means more time spent in the monitoring system, and less time and attention dedicated to maintaining service uptime.

According to the Ponemon Institute, the average cost of just one minute of downtime is $7,900, which means nearly $480,000 per hour. That is an increase of more than 40% in just three years. [1]
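
The hourly figure follows directly from the per-minute cost. A quick Python check, using only the $7,900 per minute quoted above, gives $474,000 for 60 minutes, which the guide rounds up to nearly $480,000. The function name `downtime_cost` is just an illustrative helper.

```python
COST_PER_MINUTE_USD = 7_900  # per-minute figure quoted from the Ponemon Institute


def downtime_cost(minutes, cost_per_minute=COST_PER_MINUTE_USD):
    """Estimated cost of an outage lasting `minutes` minutes."""
    return minutes * cost_per_minute


hourly_cost = downtime_cost(60)
print(f"One hour of downtime: ${hourly_cost:,}")
```

The same helper can be used to price shorter incidents, for example `downtime_cost(15)` for a quarter-hour outage.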

Solution: Creating a Single Pane of Glass

Within a single master view, you can easily create a single pane of glass to see your entire IT infrastructure and critical business services. Any host, server, router, or switch can be monitored. Your entire IT infrastructure can not only be corralled into one instance, but its data can also be displayed in a way that makes sense to your team and your business.

With comprehensive graphs and drag-and-drop dashboards, you can easily visualize your entire IT infrastructure in a single pane of glass. With Hashtags, you can group all your service checks based on what makes sense to you and your organization. You can then use those Hashtags to create a meaningful dashboard that quickly identifies any issues.

You can take that idea a step further with Business Service Monitoring (BSM). Create business services that contain components and thresholds. If one of the components exceeds its threshold, your IT operations team is alerted to the potential service impact. You can then investigate that business service to quickly pinpoint the root cause of the issue, allowing the team to resolve it before users are impacted. You can also use BSM to provide dashboards and companion reports to IT and business leaders, showing how well IT is doing in maintaining the uptime of critical business services.
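
The idea of a business service made of components with thresholds can be sketched in a few lines of Python. This is an illustration only, not how Opsview's BSM is implemented: the `Component` class, the `max_down_fraction` threshold, and the example email service are all assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    hosts_total: int          # assumed to be greater than zero
    hosts_down: int
    max_down_fraction: float  # hypothetical threshold: tolerated share of failed hosts

    def breached(self):
        """True when more hosts are down than the threshold tolerates."""
        return self.hosts_down / self.hosts_total > self.max_down_fraction


def breached_components(components):
    """Names of the components whose threshold is exceeded."""
    return [c.name for c in components if c.breached()]


# A made-up business service with two components.
email_service = [
    Component("mail-frontends", hosts_total=4, hosts_down=1, max_down_fraction=0.5),
    Component("mail-database", hosts_total=2, hosts_down=2, max_down_fraction=0.5),
]

at_risk = breached_components(email_service)
if at_risk:
    print("Email service at risk, check:", ", ".join(at_risk))
```

Here one failed frontend out of four stays under the threshold, while losing both database hosts does not, so only the database component is flagged for investigation.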

Get Your Weekends Back

Nagios is hard to set up and difficult to configure, with a hard-to-navigate user interface

Setting up a Nagios instance, configuring devices, and getting everything finalized for monitoring can be a daunting task. Most Nagios configuration, setup, and installation takes place on the command line. Configuration changes need to be made individually. No bulk configuration editing means that provisioning new infrastructure can cost you an entire weekend.

You typically need a senior IT person with extensive Nagios knowledge and experience to configure and maintain Nagios. Onboarding new team members with Nagios can take months. All of this adds up to inefficiencies in both time and effort, and new IT projects needed to drive the business go unstaffed. With the pressure to do more with existing resources, putting talented IT people to work inefficiently can be very costly.

Over 70% of staff time is spent supporting existing services, usually because of incomplete information, repetitive manual work, and management through spreadsheets. [2]

Solution: Drive IT Efficiency with the Right Monitoring Capabilities

Dramatically improve efficiency with Opsview Monitor. Navigate easily through Opsview Monitor with keyboard shortcuts and an intuitive menu structure. With context-sensitive navigation, every host group can be investigated in just two clicks. For any host group you can see all of the device info, actions being performed, notes, notifications, status history, and events that occurred during a given time frame. You can even graph and troubleshoot a problem in one single area without having to navigate away from what you were working on.

Instead of taking inventory of every host within your infrastructure yourself, let Opsview find them for you with autodiscovery. Combined with bulk configuration changes, this means you can edit entire areas of your infrastructure all at once whenever changes are needed. This allows you to spend less time maintaining the monitoring system, and more time maintaining critical business service components.
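
To make the difference between per-host edits and a bulk change concrete, here is a small Python sketch. It does not use any Opsview or Nagios API: the inventory format, the `check_interval_min` setting, and the `bulk_update` function are hypothetical and exist only to show one change being applied to every host in a group.

```python
# Hypothetical in-memory inventory; names, groups, and settings are made up.
hosts = [
    {"name": "web01", "group": "web", "check_interval_min": 5},
    {"name": "web02", "group": "web", "check_interval_min": 5},
    {"name": "db01", "group": "database", "check_interval_min": 5},
]


def bulk_update(inventory, group, **changes):
    """Return a new inventory with `changes` applied to every host in `group`."""
    return [
        {**host, **changes} if host["group"] == group else dict(host)
        for host in inventory
    ]


# One call changes the whole "web" group instead of editing each host by hand.
updated = bulk_update(hosts, "web", check_interval_min=1)
changed = [h["name"] for h in updated if h["check_interval_min"] == 1]
print("Updated hosts:", changed)
```

Returning a new list rather than editing in place keeps the original inventory available for comparison or rollback, which is a design choice of this sketch rather than a claim about any product.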

Combine these features with easy-to-create visuals and a full library of ready-made report templates. Create reports and analyze trends to evaluate the state of your IT infrastructure. Opsview Monitor keeps you better informed with consistent and thorough data. You spend less time on monitoring maintenance and make more informed decisions about your IT infrastructure.

Avoid the Speeding Bus

The Nagios Knowledge Gap and Reliance on Limited Expertise

Nagios requires substantial work to set up, to use plugins, and to configure, all of which needs to be done by an experienced sysadmin with extensive Nagios knowledge. If they are out sick for a day, it's painful; if they leave the company, it is devastating.

To say the least, working with Nagios requires a certain amount of expertise. On top of that, the rarity of this expertise is compounded by the custom nature of the system. Most Nagios implementations are like snowflakes: every single one of them is completely different. There are some extremely smart and talented people on your team whose time might be taken up by formulating ways to overcome Nagios's limitations.

So who knows what at your organization? Is the knowledge centralized in one team? Or even worse, does all the knowledge regarding your monitoring system lie with one single person who happens to be your lone Nagios expert? Consider the worst-case scenarios. What's the backup plan if this person leaves, or worse, gets hit by a speeding bus? (Metaphorically speaking, of course!)

Solution: Avoid the Speeding Bus!

A monitoring system that is easier to understand and work with is a lot closer than you might imagine. With Opsview, you keep your plugin and integration work, and you can continue to use community-provided plugins to expand coverage. However, Opsview takes this to the next level with Opspacks.

Opsview Opspacks are plugins on steroids. They provide not only the plugin, but also the service checks, host template, service groups, and even the variables to monitor. With these capabilities, they are much easier to implement. Opspacks cover the latest technologies that you may need: Amazon Web Services, Docker, and a full range of Microsoft technologies, just to name a few. Opspacks are completely certified and documented, so you don't need a degree from Nagios University to use the product.

On top of that, there is a whole team of experts to rely on for help or guidance who can be contacted any time of day for any reason. Opsview's Customer Success Team maintains a greater than 90% customer satisfaction rating. They will even call you to check in and make sure everything is being monitored exactly how your organization needs.

If you happen to be new to IT monitoring, Opsview offers services and instructional classes that your entire IT team can benefit from. Instead of being at the whim of one person, make your entire team monitoring experts with Opsview Monitor.

Securing Your Monitoring System

Lack of Enterprise Level Security

According to IBM's 2015 Cost of Data Breach Study, conducted by the Ponemon Institute, the average cost of a single data breach in the United States is $6.5 million. [3]

More than half of these breaches come from negligent employees, mismanagement of passwords, and system glitches. Some of these stem from simple security flaws in the system, including things like clear text passwords, which are prevalent in Nagios. To compound this problem, running multiple instances most likely means that departments rely heavily on single login credentials that everybody in the department shares! This can create a security nightmare! With multiple instances, lack of access governance, and clear text passwords, getting an open look into a Nagios system can be as easy as doing a little shoulder surfing.

Although on the surface a monitoring system might not seem like the number one entry point into an IT infrastructure, more and more hackers are targeting them. A monitoring system tends to provide a blueprint of the entire IT infrastructure. Although hacking a monitoring system might not give an attacker credit card data, having a full blueprint of your IT infrastructure is like having a map to the treasure.

Solution: Securing Your Monitoring System

Securing your passwords in a set location under encryption decreases the probability of a data breach or hack by 70%. With Opsview's Security Wallet, all passwords are secured using AES-256 encryption. Opsview also secures all communication and data with TLS and HTTPS to ensure that any communication between the Opsview master, collector, or any host is completely encrypted. This ensures that if a data breach were ever to occur, it would not be due to your monitoring system.

With role-based access, accounts can be provisioned and controlled easily. Onboarding and offboarding are done seamlessly in the admin interface to proactively stop any potential data breach due to forgotten passwords, or worse, a single login that happens to be shared across entire departments.

Using your monitoring system as a security advantage instead of leaving it as a potential target should be the goal of a great monitoring infrastructure.

Securing Your Monitoring System Can Decrease the Probability of a Data Breach by 70%
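
As a general illustration of what encrypted, verified communication looks like from a client's point of view, the following Python sketch builds a TLS context with the standard `ssl` module. It says nothing about how Opsview configures TLS internally; the function name `make_client_context` and the choice of TLS 1.2 as a minimum version are assumptions made for this example.

```python
import ssl


def make_client_context():
    """Build a client-side TLS context that verifies the server it talks to."""
    # create_default_context() enables certificate and hostname verification.
    ctx = ssl.create_default_context()
    # Assumption for this sketch: refuse anything older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


context = make_client_context()
print("Certificates required:", context.verify_mode == ssl.CERT_REQUIRED)
print("Hostname checked:", context.check_hostname)
```

A context like this can be passed to standard-library clients such as `http.client.HTTPSConnection` or `urllib.request.urlopen` so that every request is both encrypted and checked against the server's certificate.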

Conclusion

Many enterprises have evolved to multiple instances of Nagios to monitor their growing IT infrastructure. Maintaining and using the Nagios solution has become very costly in terms of people and time. Nagios monitoring challenges can impact service availability, while the cost of downtime continues to grow. Nagios may seem like an inexpensive monitoring solution, but inefficiencies and service downtime drive up the true cost.

It is important to take into account all aspects of your monitoring solution to ensure that you drive IT efficiency and highly available services. Opsview can help you do this. Opsview Monitor can scale to 20,000 hosts. It includes an enterprise-class feature set, including dashboards, a comprehensive graph center, enterprise security, and even the added peace of mind of an entire Customer Success Team. With 100% compatibility with any Nagios plugin or configuration, bringing your Nagios instance from good to great is a lot easier than starting from scratch.

Your IT monitoring goal is a lot closer than you might think.

Sources

[1] Ponemon Institute: 2013 Cost of Data Center Outages
[2] Computerworld: How to balance maintenance and IT innovation
[3] Ponemon Institute: 2015 Cost of Data Breach Study: Global Analysis

All trademarks, trade names, service marks, and logos referenced herein belong to their respective companies.