Valuing Observability

Published on August 19, 2024

You will have Observability. Although you might call it “monitoring.” There are so many gimmicks to what Observability is and does and it is easy to get lost in the noise. (Like the Three Pillars concept.) Which can easily translate into an Observability setup for your teams that doesn’t add value. Or worse, drains resources from the business. Observability is a practice that produces value for your customers and your business.

The Observability Value Spectrum — Where are you on the Observability Value Spectrum?

Working with teams over the past decade, most teams are usually found at one extreme or the other. I’ve been at both sides in the past! But rarely are teams in the middle, and that’s where the value is! That’s where successful teams are.

On the left, what does this look like? This is under-provisioning of your Observability needs where most time is spent removing data from telemetry.

Strict allow listed only telemetry
Vendor costs seem very high and you can’t stay within your budget
Developers don’t use Observability because of too many restrictions
SREs and DevOps cannot or have not built a set of standard dashboards to guage over all systems health
No Service Level Objectives
Uncaught or unknown outages
MTTR is high
Observability is “hard” for folks to grasp and understand
Seems like every team needs a statistician

Or, the right side is equally frustrating. There are definitely some cases where Observability spend just isn’t a blocker.

Cannot filter or drop any data – it might all be important
Leadership cannot or will not help you develop policies for Observability data
Unstructured data, unable to adopt any schema across teams
Poor or no standardized namespacing attributes or resources in data
Unorganized, broken, and many, many dashboards with no standard examples
Building a systems health dashboard is difficult due to namespacing issues
Volume of incoming data just “uncontrolled”
The “high cardinality problem” shows up everywhere
Vendors or internal solutions cannot keep up with incoming data

When I work with teams, I try to move them toward the center of this spectrum of Observability. Measuring success is measuring the value that any and all teams get from the Observability service, along with some telltale signs of good SRE practices.

Today’s global scale and distributed software platforms are hugely complex with many moving parts. These parts are not likely to stay the same day to day as your cloud service provider switches out failed hardware, new deploys happen, or shifts in response to customer demand. This environment cannot be managed without real time basic data analysis. Your business needs these skills, and this practice in action to succeed.

Valuing Observability

Categories