About

Observability Strategist | 25+ Years | Fortune 500 Experience

Jack Neely

Jack Neely, Observability Strategist, Cardinality Cloud, LLC

You have the tools. You’re missing the strategy.

Most SaaS companies treat observability as overhead, a kind of insurance against outages. I help you see it differently: as a decision-making system that cuts costs, accelerates incident response, and improves reliability. You don’t need more dashboards. You need better decisions about what to measure, when to alert, and where to invest.

Why Vendor-Neutral Matters

Observability vendors make more money when you send them more data. More logs, metrics, and traces mean bigger bills for you and bigger revenue for them. I’m incentivized to do the opposite: help you collect smarter data, reduce waste, and make better decisions with what you have.

Typical engagements deliver 10-20% cost savings plus faster incident response.

What I’ve Built

From pre-seed startups to Fortune 500 enterprises, I’ve architected observability systems at scale:

  • 150+ TiB/day in logs (OpenSearch, Loki)
  • 8M+ samples/second (Prometheus, Thanos, Mimir)
  • 400M+ active time series (Grafana)
  • $2.5M annual savings from Splunk migration
  • 300+ engineers supported
  • 80+ Kubernetes clusters globally

I’ve solved high-cardinality challenges, built platform teams from scratch, and trained engineering organizations on SRE best practices. My focus: Prometheus/Grafana ecosystems, vendor-neutral architecture, and building teams that ship reliable software at speed.

Open Source & Community

I contribute to the observability ecosystem through open source and education:

  • Open source contributor to the Graphite, Prometheus, and Thanos projects
  • prometheus-alert-generator.com - Free tool for creating SLO alerting rules (100+ users in first week)
  • operations.fm podcast host
  • Conference speaker and industry thought leader
  • Gertrude Cox Award recipient for innovative teaching with technology

The common thread? Making better decisions with your data.

Experience Highlights

Fractional CTO, KindHabitLabs, Inc

Pre-seed healthcare SaaS startup building Roo Mi, a HIPAA-compliant behavioral health platform for treatment facilities. Designed production AWS infrastructure (ECS, RDS PostgreSQL, CloudFront), migrated from prototype to production-grade architecture, established CI/CD pipelines, and mentored development team on cloud architecture and security practices.

Sr. Principal DevOps Observability Architect, Palo Alto Networks

Led global observability team for Prisma Cloud’s multi-cloud security platform. Architected unified observability across 80+ Kubernetes clusters in all regions including China and FedRAMP High. Managed 50+ TiB/day in OpenSearch and Loki, 150M+ metrics in Prometheus/Thanos/Mimir, and orchestrated migration from Splunk saving $2.5 million annually. Solved high-cardinality business intelligence challenges using AWS Kinesis and Apache Flink streaming pipelines.

Systems Architect, Fitbit, Inc

Built Fitbit’s Visibility Engineering team and implemented a Prometheus and Thanos observability platform ingesting 8 million data points per second. Migrated entire monitoring stack to Google Cloud Platform. Conducted time series forecasting for capacity planning during peak events. Led global migration from StatsD/Graphite to Prometheus, mentoring engineering teams across Python, Go, and Java. Contributed upstream fixes to the Thanos and Prometheus projects.

Consulting, 42 Lines, Inc

Systems Architect for multiple SaaS products. Built scalable AWS load balancing solutions with Network Load Balancers and HAProxy. Introduced Prometheus and Grafana with Four Golden Signals dashboards. Created conference presentations and webinars on SRE best practices and observability-driven business decisions.

Operations and Systems Specialist, North Carolina State University

Practiced Site Reliability Engineering before the term existed. Built and maintained infrastructure for 100,000+ active users including email, Kerberos, and file storage. Led the “Realm Linux” project, a fully automated installation system deployed across thousands of workstations and servers. Implemented load balancing, configuration management (Bcfg2/Puppet), and trained system administrators campus-wide.