Valuing Observability

Valuing Observability

August 19, 2024

You will have Observability. Although you might call it “monitoring.” There are so many gimmicks to what Observability is and does and it is easy to get lost in the noise. (Like the Three Pillars concept.) Which can easily translate into an Observability setup for your teams that doesn’t add value. Or worse, drains resources from the business. Observability is a practice that produces value for your customers and your business.

The Observability Value Spectrum

Where are you on the Observability Value Spectrum?

Working with teams over the past decade, most teams are usually found at one extreme or the other. I’ve been at both sides in the past! But rarely are teams in the middle, and that’s where the value is! That’s where successful teams are.

On the left, what does this look like? This is under-provisioning of your Observability needs where most time is spent removing data from telemetry.

  • Strict allow listed only telemetry
  • Vendor costs seem very high and you can’t stay within your budget
  • Developers don’t use Observability because of too many restrictions
  • SREs and DevOps cannot or have not built a set of standard dashboards to guage over all systems health
  • No Service Level Objectives
  • Uncaught or unknown outages
  • MTTR is high
  • Observability is “hard” for folks to grasp and understand
  • Seems like every team needs a statistician

Or, the right side is equally frustrating. There are definitely some cases where Observability spend just isn’t a blocker.

  • Cannot filter or drop any data – it might all be important
  • Leadership cannot or will not help you develop policies for Observability data
  • Unstructured data, unable to adopt any schema across teams
  • Poor or no standardized namespacing attributes or resources in data
  • Unorganized, broken, and many, many dashboards with no standard examples
  • Building a systems health dashboard is difficult due to namespacing issues
  • Volume of incoming data just “uncontrolled”
  • The “high cardinality problem” shows up everywhere
  • Vendors or internal solutions cannot keep up with incoming data

When I work with teams, I try to move them toward the center of this spectrum of Observability. Measuring success is measuring the value that any and all teams get from the Observability service, along with some telltale signs of good SRE practices.

Today’s global scale and distributed software platforms are hugely complex with many moving parts. These parts are not likely to stay the same day to day as your cloud service provider switches out failed hardware, new deploys happen, or shifts in response to customer demand. This environment cannot be managed without real time basic data analysis. Your business needs these skills, and this practice in action to succeed.