4 ways to improve app monitoring with DevOps
Author: Alois Reitbauer
Once a company begins using a DevOps mindset to run and deliver its applications, process inefficiencies can become obvious.
Companies begin to realize that while product innovations ship to production faster and value reaches customers sooner, operational problems become more frequent and eventually slow down innovation.
It’s at this point that organizations begin to rethink how to eliminate inefficiencies by moving from an “is everything up and running?” monitoring approach to one that actively supports and improves their DevOps processes.
We surveyed a large number of DevOps practitioners to learn how they turned their monitoring practices into valuable components of their fast-paced product delivery cycles. Following are the top best practices that enable this change.
1. Consolidate your tool set
Most organizations today rely on four or more tools to collect all the monitoring data they need. This didn’t happen intentionally. Innovation in the monitoring space is moving fast and new ways of monitoring surface every year.
This has forced many companies to use a wide range of tools to satisfy all their monitoring needs. While this has given them insight into all aspects of application monitoring, it’s also created an operational problem. These tools aren’t aware of each other, and considerable work must be invested in aggregating their data and producing actionable insights.
These shortcomings have given rise to so-called full-stack, or all-in-one, application performance monitoring tools. The key to the success of these tools is that they not only provide information regarding all aspects of monitoring, they link the information together so it can be interpreted in a consistent way.
Full-stack monitoring tools provide a single, consistent view into what could previously only be seen using four or more tools.
2. Enrich data with contextual information
Most monitoring tools today focus on exposing individual data streams and performing analytics, such as anomaly detection, on those streams. This makes data interpretation a time-consuming manual process: to arrive at actionable conclusions, data must first be enriched with contextual information. Unfortunately, manual enrichment doesn’t scale to large systems because of the limits of human operators.
Having contextual information automatically available is a prerequisite for effectively prioritizing issues. There is a substantial difference between infrastructure problems that affect users and those that don’t. Companies like Ruxit have made the collection of such contextual information a native part of their monitoring solutions.
And when you can enrich data monitoring with contextual information, you can make more informed decisions about application health.
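As a rough sketch of what such enrichment can enable, the snippet below attaches hypothetical context tags (service name, host, and whether the component is user-facing) to metric events so that user-impacting issues can be prioritized automatically. All names and fields here are illustrative assumptions, not the data model of any particular monitoring product.

```python
from dataclasses import dataclass

@dataclass
class MetricEvent:
    """A metric reading enriched with contextual tags (illustrative only)."""
    name: str
    value: float
    service: str        # which service emitted the metric
    host: str           # where it was measured
    user_facing: bool   # does this component serve end users directly?

def prioritize(events):
    """Sort anomalous events so user-facing problems surface first."""
    return sorted(events, key=lambda e: (not e.user_facing, e.name))

events = [
    MetricEvent("disk_usage", 0.97, "batch-jobs", "worker-7", user_facing=False),
    MetricEvent("response_time_ms", 2400.0, "checkout", "web-3", user_facing=True),
]

for e in prioritize(events):
    label = "USER-FACING" if e.user_facing else "internal"
    print(f"{label:11s} {e.service}: {e.name}={e.value}")
```

With the context attached to each data point, the prioritization step becomes a trivial sort rather than a manual triage exercise.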
3. Rethink the reporting of operational problems
Simple metrics-based alerts no longer do the trick. Even the best alerting practices can only minimize false alerts; operators are still confronted with a large number of operational alerts simply because systems are built from numerous smaller components.
When something breaks, many individual components tend to break along with it. This is a natural side effect of fine-grained systems.
The best way to cope with this challenge is to use contextual information to intelligently group related alerts into higher-level problems. It makes far more sense to report a single combined backend/frontend problem than to report one problem for the frontend and a second for the backend.
Visualizing complex application problems as dependencies of individual components increases problem resolution speed.
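One minimal way to sketch this grouping, assuming a known dependency map between components, is to treat simultaneous alerts on connected components as a single problem. The dependency map and component names below are invented for illustration; real tools derive these relationships automatically.

```python
from collections import defaultdict

# Illustrative dependency map: component -> components it depends on.
DEPENDS_ON = {
    "frontend": {"backend"},
    "backend": {"database"},
    "database": set(),
}

def group_alerts(alerting):
    """Merge alerts on components linked in the dependency graph into
    single problems (connected components of the alerting subgraph)."""
    # Build an undirected adjacency restricted to alerting components.
    adj = defaultdict(set)
    for comp, deps in DEPENDS_ON.items():
        for dep in deps:
            if comp in alerting and dep in alerting:
                adj[comp].add(dep)
                adj[dep].add(comp)
    seen, problems = set(), []
    for comp in alerting:
        if comp in seen:
            continue
        # Depth-first walk to collect all alerts belonging to one problem.
        stack, group = [comp], set()
        while stack:
            c = stack.pop()
            if c in group:
                continue
            group.add(c)
            stack.extend(adj[c] - group)
        seen |= group
        problems.append(sorted(group))
    return problems

# A frontend alert caused by a backend failure collapses into one problem.
print(group_alerts({"frontend", "backend"}))  # [['backend', 'frontend']]
```

Instead of two independent tickets, the operator sees one problem spanning both components, which mirrors how the failure actually propagated.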