Independent Observability Architect

Observability isn't a tax. It's leverage, when it's designed right. But too many teams pay too much for noise, drown in alert fatigue, and build dashboards nobody trusts. I've spent 25 years fixing this. Whether you're an engineer looking to level up or a team ready for hands-on help, I've got you covered.

$2.5M saved • 25 years experience • Fortune 500 • 150TiB logs/day • 8M+ samples/sec at scale
Get the Book Preview

Available June 1st — The SRE On-Call Review Practice

The SRE On-Call Review Practice

OBSERVABILITY PRACTITIONER SERIES • BOOK 1

A practical framework for combating alert fatigue and rebuilding on-call trust. Nobody had written the book I kept wishing existed, so I finally wrote it.

Free Preview

Print and PDF editions available June 1st

More about the book and the series

Observability Architecture and Cost Optimization

Free tools and books from 25 years of hands-on experience building observability systems at scale.

FREE & OPEN SOURCE prometheus-alert-generator.com

Prometheus Alert Generator

Production-ready SLO alerting and error budget rules in minutes, not hours. No account. No install. Paste-ready YAML.

  • Multi-window burn rate alerts — fast burn (critical) and slow burn (warning)
  • Error budget tracking over 7, 30, or 90-day windows
  • Riemann Sum technique for high-cardinality environments — battle-tested at Fortune 500 scale
  • Liveness and availability monitoring included
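The arithmetic behind multi-window burn rate alerts can be sketched in a few lines of Python. This is an illustration of the general technique, not the generator's own implementation: the function names, the 30-day SLO period, and the window sizes and budget fractions (2% of the budget in 1 hour, 5% in 6 hours) are assumptions borrowed from commonly used SRE defaults, and the tool's actual values may differ.

```python
from dataclasses import dataclass

HOURS_PER_DAY = 24


@dataclass(frozen=True)
class BurnWindow:
    """One alerting window. All values here are illustrative assumptions."""
    name: str
    severity: str
    window_hours: float
    budget_fraction: float  # share of the total error budget spent in this window


def burn_rate(budget_fraction: float, window_hours: float,
              slo_period_days: int = 30) -> float:
    """Burn rate that consumes `budget_fraction` of the budget in `window_hours`.

    A burn rate of 1.0 spends exactly the whole budget over the SLO period.
    """
    return budget_fraction * slo_period_days * HOURS_PER_DAY / window_hours


def error_ratio_threshold(slo_target: float, rate: float) -> float:
    """Error ratio at which the alert should fire for a given burn rate."""
    return rate * (1.0 - slo_target)


def budget_remaining(slo_target: float, total_requests: int,
                     failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative once overspent)."""
    allowed_failures = (1.0 - slo_target) * total_requests
    if allowed_failures == 0:
        return 0.0
    return 1.0 - failed_requests / allowed_failures


# Hypothetical fast-burn and slow-burn windows for a 30-day SLO.
WINDOWS = [
    BurnWindow("fast burn", "critical", window_hours=1, budget_fraction=0.02),
    BurnWindow("slow burn", "warning", window_hours=6, budget_fraction=0.05),
]

if __name__ == "__main__":
    slo = 0.999  # assumed 99.9% availability target
    for w in WINDOWS:
        rate = burn_rate(w.budget_fraction, w.window_hours)
        threshold = error_ratio_threshold(slo, rate)
        print(f"{w.name} ({w.severity}): burn rate {rate:.1f}, "
              f"alert when error ratio > {threshold:.4%} over {w.window_hours}h")
```

In a Prometheus rule, the computed threshold would be the right-hand side of a comparison against the measured error ratio for the same window. Pairing a fast window with a slow one is what lets a critical page fire quickly on sharp outages while a warning catches slower budget drain.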

How it works and the math behind it

FREE PREVIEW Series #1

The SRE On-Call Review Practice

A practical framework for combating alert fatigue and rebuilding on-call trust

Preview includes full table of contents:

  • Three responses to any alert
  • Alert standards and hygiene
  • When to silence an alert
  • The hidden cost of alert fatigue
  • Weekly alert review meetings
  • Distributed on-call practices
  • Measuring progress

Part of the Observability Practitioner Series

Your feedback shapes the final book. Get the preview, see what's covered, and let me know what resonates.

Independent Observability Architect

Prometheus, Grafana, ClickHouse, OpenTelemetry, OpenSearch, Datadog & Splunk

25 years building observability systems that don't break - and don't break the bank. Battle-tested at Fortune 500 scale.

Led Observability at Fortune 500 Companies:

  • Successfully migrated away from Splunk, saving $2.5 million annually
  • Supported 300+ engineers
  • 1 Billion+ active time series in Prometheus
  • 8M+ samples/second, 150TiB logs/day at scale
  • Implemented Thanos and Mimir for Prometheus clustering

Built systems that actually work:

  • HIPAA-, GDPR-, and FedRAMP-compliant observability platforms
  • Architected Thanos/Grafana cluster: 1B+ unique time series
  • Open source contributor: Graphite, Prometheus, Thanos
  • Built StatsRelay (handles millions of UDP packets/second for StatsD)

Technology-agnostic expertise:

Prometheus • Grafana • Thanos • Mimir • Loki • Tempo • Datadog • ClickHouse • OpenSearch • Splunk • OpenTelemetry • Graphite • StatsD • InfluxDB • Honeycomb • Coralogix

Office Hours

I keep a few slots open each month for conversations worth having: architecture questions, career decisions, tool choices, on-call problems you can't quite name yet. Not a sales call. Just two engineers talking through a hard problem.

Book a Slot

Prefer a different way to connect?

Email: jjneely@cardinality.cloud

Connect on LinkedIn

YouTube: @cardinalitycloud

Podcast: operations.fm