Observability Isn't Tools, It's How You Think
When an incident hits, most teams don’t lack data. They lack observability. They lack clarity.
Observability isn’t tooling. It’s not about vendors. It’s not dashboards, metrics, or traces. Observability is a practice. The practice of knowing what we understand about production systems. The practice of reasoning from evidence under uncertainty.
What I’m about to walk through isn’t a framework for observability. It IS observability. This is how we practice knowing what we understand about production systems.
The Five Steps ARE Observability
These five steps define observability as a practice:
- Ask falsifiable questions
- Understand your measurements
- Design valid comparisons
- Interpret evidence
- Build knowledge
This is epistemics applied to production systems. How we know what we understand when systems are failing. How we move from confusion to knowledge.
We’ve all seen this: dashboards everywhere, alerts firing, and the same questions keep coming up. What’s going on? Is it the database? Did the deploy break something?
Step 1: Ask Falsifiable Questions
The first failure in most incidents isn’t technical. It’s epistemic.
We ask questions that feel urgent but aren’t answerable:
- “Is the database slow?”
- “Is the network acting up?”
- “Could it be Kubernetes?”
These questions cannot be proven wrong. Which means they can’t be answered, only argued.
What Makes a Question Falsifiable?
A falsifiable question sounds like this: For checkout requests in us-east-1, did the schema migration at 14:05 increase p95 database latency compared to the previous deploy?
Now we have:
- Scope: checkout requests in us-east-1
- Comparison: before and after the schema migration
- Measurable outcome: p95 database latency
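That question translates almost directly into something you can run. Here is a minimal sketch against the Prometheus HTTP API, assuming a server at localhost:9090 and a hypothetical db_query_duration_seconds histogram labeled by service and region; the metric name, labels, and timestamps are illustrative, not anything your system necessarily exposes.

```python
# A minimal before/after check for the falsifiable question above.
# Metric and label names are hypothetical examples, not a prescription.
import requests

PROM = "http://localhost:9090/api/v1/query"
P95 = (
    'histogram_quantile(0.95, sum by (le) ('
    'rate(db_query_duration_seconds_bucket{service="checkout",region="us-east-1"}[10m])))'
)

def p95_at(unix_ts: float) -> float:
    """Evaluate the p95 expression at one point in time."""
    resp = requests.get(PROM, params={"query": P95, "time": unix_ts}, timeout=10)
    resp.raise_for_status()
    result = resp.json()["data"]["result"]
    return float(result[0]["value"][1]) if result else float("nan")

migration_ts = 1_700_000_000          # hypothetical: the 14:05 schema migration
before = p95_at(migration_ts - 600)   # ten minutes before the migration
after = p95_at(migration_ts + 600)    # ten minutes after
print(f"p95 before: {before:.3f}s, after: {after:.3f}s")
# Either the p95 moved after the migration or it didn't.
```

Either the number moved or it didn't. That is what makes the question falsifiable.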
Tools That Help
Deploy markers, version labels, trace annotations: these don't answer questions. They make better questions possible.
Deploy markers on your dashboards, version labels in your Prometheus metrics (like a build_info metric), Kubernetes labels and annotations, and distributed tracing all enable this practice.
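As one concrete sketch of the versioning piece, assuming the Python prometheus_client library: a build_info-style gauge whose value is always 1 and whose labels carry the version, so any other series can be compared or joined against it. The metric name and labels follow a common convention but are not required by anything.

```python
import time
from prometheus_client import Gauge, start_http_server

# Value is always 1; the labels do the work.
BUILD_INFO = Gauge(
    "myapp_build_info",
    "Build metadata for this process: the value is always 1, the labels carry the version.",
    ["version", "git_sha"],
)

if __name__ == "__main__":
    # Hypothetical identifiers; in practice these come from your build pipeline.
    BUILD_INFO.labels(version="1.4.7", git_sha="0abc123").set(1)
    start_http_server(8000)   # expose /metrics for Prometheus to scrape
    while True:
        time.sleep(60)
```

The point is that a version label turns "did the deploy break it?" into a comparison you can actually run.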
Common Mistake
If you can’t say what evidence would prove your question wrong, the question isn’t ready yet. It won’t make a valid alert, and it won’t support a valid comparison.
Step 2: Understand Your Measurements
Once we have a question, the next mistake is assuming our data is telling us the truth.
A metric is not reality. It’s a lossy measurement with assumptions baked in.
Before you use a signal to answer a question, you need to understand three things:
- What it measures
- What it hides
- When it lies
The Latency Example
Take latency. From where to where? Retries included? Client or server side? Sampled or complete?
Latency from the load balancer may well include retries; latency measured inside your application likely doesn’t. Is the signal sampled, or a full distribution? Or worse, is it an average that’s actively lying to you?
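A toy example of how an average lies, using nothing beyond the Python standard library: ninety-five fast requests and five very slow ones. The numbers are made up.

```python
from statistics import mean, quantiles

# Toy data: 95 fast requests and 5 very slow ones.
latencies_ms = [20] * 95 + [2_000] * 5

avg = mean(latencies_ms)                   # 119 ms: looks only "a bit slow"
p95 = quantiles(latencies_ms, n=100)[94]   # ~1901 ms: one in twenty users is suffering

print(f"average: {avg:.0f} ms")
print(f"p95:     {p95:.0f} ms")
```

Same data, very different stories. The average says "a bit slow"; the p95 says a twentieth of your users are having a terrible time.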
Tools That Help
Metric definitions (the HELP text on your Prometheus metrics), histogram awareness, sampling configs. This is boring stuff, but it’s where most incidents go sideways.
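Here is what that looks like in practice, as a sketch with the Python prometheus_client library; the metric name, wording, and bucket boundaries are illustrative.

```python
from prometheus_client import Histogram

# The HELP string records what this measurement includes and excludes, so the
# people reading the dashboard three months from now don't have to guess.
REQUEST_LATENCY = Histogram(
    "checkout_request_duration_seconds",
    "Server-side checkout request duration, measured inside the application "
    "handler. Excludes client retries and load-balancer queueing.",
    buckets=(0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10),
)

@REQUEST_LATENCY.time()   # observes each handler invocation into the histogram
def handle_checkout(order_id: str) -> None:
    ...  # hypothetical handler body
```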
Common Mistake
Green dashboards during outages are often a warning sign, not reassurance.
Absence of evidence is not evidence of absence.
Step 3: Design Valid Comparisons
Numbers by themselves don’t explain systems. Differences do.
Observability data is almost always relative. The question is: Compared to what?
Good comparisons isolate one variable and hold everything else as constant as possible.
Examples of Good Comparisons
- v1.4.7 vs v1.4.6
- Canary vs baseline
- One availability zone vs another
Tools That Help
Canary deploys, version tags, high-cardinality dimensions: these enable clean comparisons. It’s why we run canaries in the first place. Version tags and high-cardinality labels on your logs and traces let us slice the data and ask specific questions that isolate specific variables.
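To make the canary comparison concrete, here is a sketch against the Prometheus HTTP API, assuming the same hypothetical checkout latency histogram now carries a version label; the names and versions are illustrative.

```python
# A canary-vs-baseline check: same query, same window, sliced by version.
import requests

PROM = "http://localhost:9090/api/v1/query"
QUERY = (
    'histogram_quantile(0.95, sum by (le, version) ('
    'rate(checkout_request_duration_seconds_bucket{version=~"1.4.6|1.4.7"}[10m])))'
)

resp = requests.get(PROM, params={"query": QUERY}, timeout=10)
resp.raise_for_status()
for series in resp.json()["data"]["result"]:
    version = series["metric"].get("version", "unknown")
    p95_seconds = float(series["value"][1])
    print(f"version {version}: p95 = {p95_seconds:.3f}s")
# Same traffic, same time window, same region: the version is the one
# variable left to explain any difference.
```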
Common Mistake
Comparing “now” to “yesterday” without accounting for traffic differences, intervening deploys, or seasonality.
If everything is different, nothing is explanatory.
Step 4: Interpret Evidence
Once you’ve designed a comparison, the temptation is to jump to conclusions.
But evidence doesn’t eliminate uncertainty. It changes it.
Interpreting evidence means asking: Does this increase confidence? Decrease it? Refute the hypothesis? Or tell us nothing?
Four Ways Evidence Affects Certainty
- Consistent: increases confidence in the hypothesis
- Competing: decreases confidence (suggests alternative without refuting)
- Refutes: contradicts the hypothesis directly
- Inconclusive: no change to uncertainty
Worked Example
Say we hypothesize the schema migration caused the latency spike. The metrics show p95 increased at 14:05, but the issue started at 14:00.
This doesn’t refute the migration, but it’s competing evidence that decreases our confidence.
If read latency stayed flat while only writes spiked, that would refute the idea that the migration degraded both paths.
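One way to stay honest about which of the four buckets each observation lands in is to write it down explicitly. A minimal bookkeeping sketch, with the worked example above recorded as hypothetical entries:

```python
# Tag each observation with how it affects confidence in the current
# hypothesis, instead of letting it all blur together mid-incident.
from dataclasses import dataclass
from enum import Enum

class Effect(Enum):
    CONSISTENT = "increases confidence"
    COMPETING = "decreases confidence without refuting"
    REFUTES = "contradicts the hypothesis"
    INCONCLUSIVE = "changes nothing"

@dataclass
class Observation:
    hypothesis: str
    evidence: str
    effect: Effect

log = [
    Observation(
        hypothesis="The 14:05 schema migration caused the latency spike",
        evidence="p95 rose at 14:05, but the incident started at 14:00",
        effect=Effect.COMPETING,
    ),
    Observation(
        hypothesis="The migration degraded both the read and write paths",
        evidence="read latency stayed flat; only writes spiked",
        effect=Effect.REFUTES,
    ),
]

for obs in log:
    print(f"[{obs.effect.name}] {obs.evidence}")
```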
Tools That Help
Multiple signal types (metrics, logs, traces) help cross-check reality. Being able to correlate them, including logs from different systems, is what lets one signal keep another honest.
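Correlation across signals usually hinges on a shared identifier. Here is a small sketch of trace-aware structured logging in Python; current_trace_id() is a stand-in for whatever your tracing library actually exposes, not a real API.

```python
# Attach the active trace ID to every log line so logs and traces can be
# joined during an incident. current_trace_id() is a placeholder here.
import json
import logging

def current_trace_id() -> str:
    return "4bf92f3577b34da6a3ce929d0e0e4736"  # placeholder; normally from the active span

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            "trace_id": getattr(record, "trace_id", None),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("checkout")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("schema migration applied", extra={"trace_id": current_trace_id()})
```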
Common Mistakes
Correlation is not causation. Just because two events happened at the same time doesn’t mean they caused each other.
Narrative lock-in. If you have evidence pointing in one direction, and all your SRE buddies are converging on a different conclusion, stick to your evidence. Don’t just conform.
Premature closure. Finding evidence that looks reasonable, shipping a fix, watching the graphs improve, and walking away. That doesn’t teach us anything, and it probably didn’t solve the problem either.
If your data can’t surprise you, you’re not really interpreting it.
Step 5: Build Knowledge
Most incident response frameworks stop here. Fix the issue. Write the postmortem. Move on.
Here’s why that fails: postmortems document what happened, but not how we knew.
Knowledge isn’t timelines or dashboards. It’s what remains when the graphs are gone.
What to Capture
Capture these things in your postmortems (a minimal record sketch follows the list):
- Which hypotheses were wrong and which were correct
- Which signals misled us and which confirmed a hypothesis
- Which comparisons worked and which ones surfaced that confirming signal
- Why decisions were made under uncertainty (perhaps a time crunch, but record why we acted with less than full certainty)
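Here is that minimal sketch, just to make the shape concrete; the fields mirror the list above and every value shown is hypothetical.

```python
# A minimal knowledge record to accompany (not replace) the postmortem.
# All field values here are hypothetical examples.
from dataclasses import dataclass, field

@dataclass
class IncidentKnowledge:
    incident: str
    hypotheses_wrong: list[str] = field(default_factory=list)
    hypotheses_confirmed: list[str] = field(default_factory=list)
    misleading_signals: list[str] = field(default_factory=list)
    useful_comparisons: list[str] = field(default_factory=list)
    decisions_under_uncertainty: list[str] = field(default_factory=list)

record = IncidentKnowledge(
    incident="checkout latency spike",
    hypotheses_wrong=["Schema migration caused the spike"],
    hypotheses_confirmed=["Connection-pool exhaustion on the writer"],
    misleading_signals=["Load-balancer latency (includes retries)"],
    useful_comparisons=["Canary v1.4.7 vs baseline v1.4.6"],
    decisions_under_uncertainty=["Rolled back at 14:30 before traces confirmed the cause"],
)
```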
Common Mistake
If learning doesn’t compound, observability becomes just an expense.
Without building knowledge, you’re not practicing observability.
The Practice of Observability
Let’s recap.
Observability isn’t about more data. It’s about knowing what we understand.
These five steps ARE observability:
- Ask falsifiable questions that can be proven wrong
- Understand your measurements (what they measure, hide, and when they lie)
- Design valid comparisons that isolate variables
- Interpret evidence carefully without jumping to conclusions
- Build knowledge that makes the next incident easier
This is how we practice reasoning from evidence under uncertainty. This is how observability becomes leverage, not a tax.
Need help training your team on these practices? Want to get in front of your CFO before that six-figure observability bill hits? Contact me today.