Implementation & Architecture

Turn your observability roadmap into reality. Build sustainable processes, design meaningful alerts, and train your team to own the stack. Start with a free discovery call.


You Have a Plan. Now You Need to Execute It.

Your audit revealed the problems. You know where the costs are, where the gaps are, and what needs to change. But knowing isn’t the same as doing.

The challenge isn’t technical complexity. It’s building something sustainable that your team can own and evolve without constant vendor dependency.

Most implementations fail because they optimize for the tool instead of the outcome. They pile on dashboards without defining what “healthy” actually means. They set arbitrary alert thresholds that fire constantly or not at all. They create more noise, not more clarity.

I work differently. Implementation isn’t about deploying vendor defaults. It’s about building knowledge that compounds, processes that scale, and systems your team can own going forward.

Start With a Free Discovery Call

Let’s talk about your roadmap, your team’s capabilities, and your constraints. No pitch, no pressure. Just a conversation to see if implementation support makes sense for you.

What Implementation Includes

Implementation is where we act on the roadmap. I work alongside your engineers to build sustainable processes and help them own the observability stack going forward.

Core Deliverables

Architecture Reviews

  • Evaluate current stack against your actual usage patterns
  • Design systems that scale economically, not just technically
  • Plan migrations or coexistence strategies that avoid disruption
  • Build guardrails that allow team autonomy without runaway costs

SLO and Alerting Design

  • Define what “healthy” means from the customer’s viewpoint, not from arbitrary thresholds
  • Redesign on-call around valid comparisons and evidence that leads to clear decisions
  • Eliminate alerts that don’t change what you do next (they’re noise, not signals)
  • Build alerting patterns that make the next incident easier than the last

Tool Optimization

  • Right-size retention policies based on actual query patterns
  • Identify high-cardinality metrics burning budget with no value (see the sketch after this list)
  • Implement sampling strategies that preserve signal while cutting noise
  • Configure tooling for your business logic, not vendor defaults
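
To make the high-cardinality point concrete (a minimal sketch with made-up job, metric, and label names, not a prescription for your stack), this is the kind of Prometheus metric_relabel_configs change that stops wasteful series before they ever reach storage:

    scrape_configs:
      - job_name: "checkout-service"          # hypothetical job name
        static_configs:
          - targets: ["checkout:9090"]
        metric_relabel_configs:
          # Drop a per-request-ID label that multiplies series count but is
          # never used in any dashboard or alert (example label name).
          - regex: "request_id"
            action: labeldrop
          # Drop an entire debug histogram nobody queries (example metric name).
          - source_labels: [__name__]
            regex: "http_debug_latency_seconds_bucket"
            action: drop

Which metrics and labels are actually safe to drop comes out of the query-pattern analysis, not guesswork.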

Team Training and Knowledge Transfer

  • Framework-based approach to asking better questions about telemetry
  • Build falsifiable questions with scope, comparison, and measurable outcomes
  • Train your team to own and evolve the architecture independently
  • Create processes where learning sticks and compounds over time

Platform Replacement (When Needed)

  • When the audit reveals a replacement platform makes sense, we can build it together
  • Goal: More efficient and powerful than vendor defaults
  • Your team holds the keys and owns the telemetry
  • Hands-on implementation where it matters; your team does the learning

The Core Method: Better Questions

Most observability implementations fail because they don’t ask the right questions. They collect everything “just in case” and alert on arbitrary thresholds.

I use a framework built around falsifiable questions. Questions need three things:

  1. Scope: What exactly are we measuring?
  2. Comparison: Compared to what baseline or threshold?
  3. Measurable outcomes: What evidence would prove this wrong?

If you can’t say what evidence would prove a question wrong, it’s not ready yet. This framework filters what telemetry is worth keeping and what’s noise.
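
For example (a hypothetical sketch; the service name and objective are placeholders), the vague question “is checkout slow?” becomes falsifiable once it carries all three pieces:

    # Scope: p99 latency of the checkout API over the last 30 minutes.
    # Comparison: the 500ms objective agreed with the business.
    # Outcome: if this returns nothing, the claim "checkout is slow" is disproven.
    histogram_quantile(0.99,
      sum by (le) (rate(http_request_duration_seconds_bucket{service="checkout"}[30m]))
    ) > 0.5

A question phrased this way tells you exactly which telemetry you need to keep and which you can stop paying for.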

The same approach applies to alerting. Most alerts fire constantly or not at all because they’re based on arbitrary thresholds, not meaningful signals. An alert that doesn’t change what you do next isn’t an alert. It’s noise.

We redesign on-call around valid comparisons and evidence. When alerts map to customer impact and clear next actions, incident response becomes faster and knowledge compounds.
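
As a sketch of what that looks like in practice (assuming a Prometheus-style setup; the SLO, metric names, and thresholds are illustrative), an alert tied to an error-budget burn rate states the customer impact and the next action instead of tripping on an arbitrary threshold:

    groups:
      - name: checkout-slo                # illustrative SLO for a checkout API
        rules:
          - alert: CheckoutErrorBudgetBurnHigh
            # Customer impact: checkout requests are failing fast enough to
            # exhaust a 99.9% monthly availability SLO within a few days.
            expr: |
              (
                sum(rate(http_requests_total{service="checkout", code=~"5.."}[1h]))
                /
                sum(rate(http_requests_total{service="checkout"}[1h]))
              ) > (14.4 * 0.001)
            for: 5m
            labels:
              severity: page
            annotations:
              summary: "Checkout error budget burning at >14x the sustainable rate"
              runbook: "Check recent deploys and the payment dependency first"

The specific windows and burn-rate multipliers depend on your SLO; the point is that the alert only fires when a customer-facing objective is genuinely at risk.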

This Isn’t Just Incident Response

This is how teams build knowledge that compounds. When learning sticks, the next incident gets easier. The team gets faster. The system gets more reliable.

That’s when observability stops being an expense and starts being leverage.

You’re not outsourcing observability to me. You’re building internal capability. I stay close to the work through regular calls and async access, but your team does the learning and owns the outcome.

How It Works

  1. Free discovery call to understand your roadmap and team capabilities
  2. Scope the engagement based on your priorities (architecture, alerting, migration, training)
  3. Regular working sessions with your team (calls + async access)
  4. Hands-on implementation where needed, with knowledge transfer built in
  5. Your team owns it when we’re done

Engagement length depends on scope; typical engagements run 4-12 weeks.

Who This Is For

Ideal if you:

  • Have an IT, Software Engineering, DevOps, or SRE organization (50-1,500 employees)
  • Completed an observability audit and have a roadmap to execute
  • Want to redesign alerting, implement SLOs, or migrate platforms
  • Need hands-on architecture work with team training built in
  • Prefer building internal capability over vendor dependency

Who I Am

Jack Neely – Independent Observability Architect, Cardinality Cloud

  • 25 years in systems architecture and SRE
  • Led observability teams at Palo Alto Networks and Fitbit
  • Saved $2.5M annually migrating away from Splunk
  • Implemented Thanos at enterprise scale for Prometheus clustering (8M+ samples/sec, 150TiB logs/day, 300+ engineers)
  • Open source contributor (Graphite, Prometheus, Thanos, StatsRelay)
  • Compliance expertise: HIPAA, GDPR, FedRAMP
  • Host: Cardinality Cloud YouTube channel

I’m a practitioner who’s built observability systems that don’t break and don’t break the bank.

What Happens After?

Three outcomes:

  1. We scope an engagement (architecture, alerting, migration, training)
  2. Not a priority right now (fine, maybe we work together later)
  3. You implement yourself (great, the conversation gave you what you need)

No pressure. Just clarity.

Book Your Free Discovery Call

Let’s talk about your roadmap and see if implementation support makes sense.


FAQ

Q: Do I need to have completed an audit first? A: Not necessarily. If you have a clear roadmap and know what needs to be built, we can start there. The discovery call will help us figure out if we’re aligned.

Q: How hands-on is the implementation work? A: As hands-on as needed. Some engagements are mostly architecture reviews and training. Others involve building migration plans or redesigning alert systems together. The goal is always knowledge transfer—your team owns it when we’re done.

Q: What if we want to replace our observability platform? A: We can do that. When the audit reveals that replacement makes sense, we build a system that’s more efficient than vendor defaults. Your team holds the keys and owns the telemetry.

Q: How long does a typical engagement last? A: Depends on scope. Alerting redesign might be 4-6 weeks. Platform migration with training might be 8-12 weeks. We’ll scope it during the discovery call.

Q: Do you work remotely or on-site? A: Remote-first with regular calls and async access. On-site visits can be arranged if beneficial, but most work happens through structured working sessions with your team.

Q: What if our team doesn’t have strong observability experience? A: That’s common and completely fine. Training and knowledge transfer are built into every engagement. The goal is to build your team’s capability, not create dependency on me.