Independent Observability Architect

Observability isn't a tax. It's leverage, when it's designed right. But too many teams overpay for noise, drown in alert fatigue, and build dashboards nobody trusts. I've spent 25 years fixing this. Whether you're an engineer looking to level up or a team ready for hands-on help, I've got you covered.

$2.5M saved • 25 years experience • Fortune 500 • 150TiB logs/day • 8M+ samples/sec at scale

New Book • The SRE On-Call Review Practice

Get the Book

The SRE On-Call Review Practice

OBSERVABILITY PRACTITIONER SERIES • BOOK 1

AVAILABLE NOW

A practical framework for combating alert fatigue and rebuilding on-call trust. Nobody had written the book I kept wishing existed. I finally wrote it down.

Get the Free Preview • Buy the PDF • Buy the Paperback

Why I wrote it, what's in it, and what's coming next

Learn Observability & SRE

Level up on your own. Free resources from 25 years of hands-on experience. No signup required.

FREE

YouTube Channel

Deep dives into Prometheus, Grafana, OpenTelemetry, and observability architecture. Real-world examples from production systems.

FREE

Blog & Guides

Technical articles on SLOs, alerting strategies, cardinality management, and cost optimization patterns.

FREE

Prometheus Alert Generator

Free tool to build Prometheus alerting and SLO rules with proper syntax. No signup required.

Observability Architecture and Cost Optimization

Free tools and books from 25 years of hands-on experience building observability systems at scale.

FREE & OPEN SOURCE • prometheus-alert-generator.com

Prometheus Alert Generator

Production-ready SLO alerting and error budget rules in minutes, not hours. No account. No install. Paste-ready YAML.

Multi-window burn rate alerts: fast burn (critical) and slow burn (warning)
Error budget tracking over 7, 30, or 90-day windows
Riemann Sum technique for high-cardinality environments — battle-tested at Fortune 500 scale
Liveness and availability monitoring included
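
For readers new to burn-rate alerting, here is a minimal sketch of the arithmetic behind multi-window alerts and error budgets. It is not the generator's own code. The window sizes and budget fractions below (2% of the budget in 1 hour, 5% in 6 hours, 10% in 3 days) are assumptions borrowed from commonly cited SRE practice, and every name in the block is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BurnWindow:
    """One alerting window (hypothetical structure, for illustration only)."""
    name: str
    severity: str
    window_hours: float
    budget_fraction: float  # share of the error budget spent within the window


def burn_rate(budget_fraction: float, window_hours: float, period_days: int) -> float:
    """Burn rate that spends `budget_fraction` of the budget in `window_hours`."""
    return budget_fraction * (period_days * 24) / window_hours


def error_ratio_threshold(slo: float, rate: float) -> float:
    """Error ratio at which an alert with the given burn rate would fire."""
    return rate * (1.0 - slo)


def budget_minutes(slo: float, period_days: int) -> float:
    """Total error budget for the period, expressed as minutes of full outage."""
    return (1.0 - slo) * period_days * 24 * 60


# Assumed windows; tune these to your own SLO policy.
WINDOWS = [
    BurnWindow("fast burn", "critical", window_hours=1, budget_fraction=0.02),
    BurnWindow("fast burn (long)", "critical", window_hours=6, budget_fraction=0.05),
    BurnWindow("slow burn", "warning", window_hours=72, budget_fraction=0.10),
]

if __name__ == "__main__":
    slo, period_days = 0.999, 30
    print(f"budget: {budget_minutes(slo, period_days):.1f} minutes per {period_days} days")
    for w in WINDOWS:
        rate = burn_rate(w.budget_fraction, w.window_hours, period_days)
        threshold = error_ratio_threshold(slo, rate)
        print(f"{w.name} ({w.severity}): burn rate {rate:.1f}, error ratio > {threshold:.4%}")
```

With a 99.9% SLO over 30 days, the budget works out to about 43.2 minutes, and the 1-hour fast-burn window fires at a burn rate of 14.4, which is an error ratio of roughly 1.44%. Changing the period to 7 or 90 days rescales both numbers.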

How it works and the math behind it

Open Tool

Send feedback

BOOK 1 • Series #1

The SRE On-Call Review Practice

A practical framework for combating alert fatigue and rebuilding on-call trust

Preview includes full table of contents:

Three responses to any alert
Alert standards and hygiene
When to silence an alert
The hidden cost of alert fatigue
Weekly alert review meetings
Distributed on-call practices
Measuring progress

Part of the Observability Practitioner Series

Paperback on Amazon. PDF edition from store.cardinality.cloud. Free preview includes the full table of contents and first chapter.

Get the Free Preview

Buy the PDF • Buy the Paperback

Independent Observability Architect

Prometheus, Grafana, ClickHouse, OpenTelemetry, OpenSearch, Datadog & Splunk

25 years building observability systems that don't break, and don't break the bank. Battle-tested at Fortune 500 scale.

Led Observability at Fortune 500 Companies:

Successfully migrated away from Splunk, saving $2.5 million annually
Supported 300+ engineers
1 Billion+ active time series in Prometheus
8M+ samples/second, 150TiB logs/day at scale
Implemented Thanos and Mimir for Prometheus clustering

Built systems that actually work:

HIPAA-, GDPR-, and FedRAMP-compliant observability platforms
Architected Thanos/Grafana cluster: 1B+ unique time series
Open source contributor: Graphite, Prometheus, Thanos
Built StatsRelay (multi-million UDP packets/second capacity for StatsD)

Recognition:

Gertrude Cox Award recipient for innovative teaching with technology
Host of Cardinality Cloud YouTube channel
Host of operations.fm podcast
Conference speaker: Monitorama PDX 2023, Monitorama PDX 2019, All Things Open 2020
Industry thought leader

"In 2 paragraphs, Jack gave us more data than we had learned in the 3 years we were doing this. It wasn't what we were doing or needing to cut to save money. It was how we were doing it and what we deemed to be important versus what was actually important. After working with Cardinality: We were getting MORE valuable data and spending 37% less money to do it."

- Brandon Peskin, Turtle Systems, LLC

"Jack has been instrumental in helping bring our vision for KindHabitLabs to life. His expert skills have guided us through complex business challenges and transformed our ideas into real, functioning solutions."

- Trevor Shick, CEO, KindHabitLabs Inc

"Jack has proven his technical expertise through data migrations, including transitioning our data from Splunk to OpenSearch. His attention to detail and organizational skills made these migrations smooth and successful, minimizing disruptions and maintaining data integrity. His combination of analytical prowess, teamwork, and management skills is remarkable."

- Beth DeHart, Palo Alto Networks

Technology-agnostic expertise:

Prometheus • Grafana • Thanos • Mimir • Loki • Tempo • Datadog • ClickHouse • OpenSearch • Splunk • OpenTelemetry • Graphite • StatsD • InfluxDB • Honeycomb • Coralogix

Office Hours

I keep a few slots open each month for conversations worth having: architecture questions, career decisions, tool choices, on-call problems you can't quite name yet. Not a sales call. Just two engineers talking through a hard problem.

Book a Slot

Prefer a different way to connect?

Email: jjneely@cardinality.cloud

Connect on LinkedIn

YouTube: @cardinalitycloud

Podcast: operations.fm