Independent Observability Architect
Observability isn't a tax. It's leverage, when it's designed right. But too many teams overpay for noise, drown in alert fatigue, and build dashboards nobody trusts. I've spent 25 years fixing this. Whether you're an engineer looking to level up or a team ready for hands-on help, I've got you covered.
Free consultation to 10x your Observability
Calculate Your Observability ROI vs. a Full-Time Hire
Should you level up your existing team or hunt for that rare SRE who actually understands observability at scale? Run the numbers.
Learn Observability & SRE
Level up on your own. Free resources from 25 years of hands-on experience. No signup required.
Observability Architecture, Cost Optimization, and Consulting Services
You're not alone. I audit observability architectures to find where costs are exploding and reliability is suffering. You get a concrete roadmap with high-leverage optimization opportunities for Datadog, Splunk, Prometheus, Grafana, ClickHouse, OpenTelemetry, and OpenSearch stacks. You keep the roadmap whether we continue working together or not.
Observability Cardinality Audit
Get an Observability Roadmap that gives you confidence to make decisions
Your roadmap includes:
- Current state mapping: where your money goes
- Gaps in your signals and alerts
- Cost-to-insight misalignment
- High-leverage optimization options (3-5 specific opportunities)
- Clear decision thresholds and tradeoffs
Start with a free discovery call. No pitch, no pressure.
You keep this roadmap whether or not we continue working together.
The SRE On-Call Review Practice
A practical framework for combating alert fatigue and rebuilding on-call trust
The preview includes the full table of contents:
- Three responses to any alert
- Alert standards and hygiene
- When to silence an alert
- The hidden cost of alert fatigue
- Weekly alert review meetings
- Distributed on-call practices
- Measuring progress
Part of the Observability Practitioner Series
Your feedback shapes the final book. Get the preview, see what's covered, and let me know what resonates.
Independent Observability Architect
Prometheus, Grafana, ClickHouse, OpenTelemetry, OpenSearch, Datadog & Splunk Consulting
25 years building observability systems that don't break, and don't break the bank. Battle-tested at Fortune 500 scale.
Led Observability at Fortune 500 Companies:
- Successfully migrated away from Splunk, saving $2.5 million annually
- Supported 300+ engineers
- 1 billion+ active time series in Prometheus
- 8M+ samples/second and 150 TiB of logs/day at scale
- Implemented Thanos and Mimir for Prometheus clustering
Built systems that actually work:
- HIPAA-, GDPR-, and FedRAMP-compliant observability platforms
- Architected Thanos/Grafana cluster: 1B+ unique time series
- Open source contributor: Graphite, Prometheus, Thanos
- Built StatsRelay (a StatsD relay with capacity for millions of UDP packets/second)
Recognition:
- Gertrude Cox Award recipient for innovative teaching with technology
- Host of Cardinality Cloud YouTube channel
- Host of operations.fm podcast
- Conference speaker: Monitorama PDX 2023, Monitorama PDX 2019, All Things Open 2020
- Industry thought leader
Technology-agnostic expertise:
Prometheus • Grafana • Thanos • Mimir • Loki • Tempo • Datadog • ClickHouse • OpenSearch • Splunk • OpenTelemetry • Graphite • StatsD • InfluxDB • Honeycomb • Coralogix
Ready to Take the Next Step?
Dive into the free resources above, or book a discovery call to discuss your observability challenges.
No pitch, no pressure. Just a conversation to see if working together makes sense.
Prefer a different way to connect?
Email: jjneely@cardinality.cloud
YouTube: @cardinalitycloud
Podcast: operations.fm