AI Log Analytics with ClickHouse: Tracing Requests and Extracting JSON
This is part 2 of building cost-effective log analytics with ClickHouse Cloud. In part 1, we built CloudFront analytics for $3,300/year. Now let’s add Lambda logs with request tracing and JSON extraction.
If you haven’t read part 1, go read it first. The IAM setup and architecture concepts apply here too.
Why Lambda Logs Matter
You’re running serverless. Lambda functions power your APIs, process events, handle webhooks. But CloudWatch Logs Insights? That’s $0.005 per GB scanned. Query your logs a few times and you’ve spent more than this entire solution costs.
ClickHouse gives you:
- Request tracing across multiple log lines
- JSON structure extraction from application logs
- Performance percentiles (cold starts, duration, memory)
- Sub-second queries on millions of log lines
- Fixed costs instead of per-query charges
SMART Goals: Lambda Log Analytics
Building on the CloudFront setup:
- Specific: Parse Lambda CloudWatch logs exported to S3, extract JSON structured data from application logs, and enable request-level tracing.
- Measurable: Costs. Adding Lambda logs shouldn’t double the total monthly spend from part 1.
- Achievable: Reuse ClickHouse infrastructure, IAM roles, and S3 bucket from CloudFront setup.
- Relevant: Enable performance monitoring and debugging for serverless applications.
- Time Bound: Setup should take about a day.
Architecture
Building on the CloudFront setup from part 1, we’re adding a second S3Queue pipeline:
Data Flow:
- Lambda executes, logs go to CloudWatch Logs
- CloudWatch Logs export to S3 (hourly or daily)
- S3Queue polls the `s3://bucket/lambda/**` prefix
- ClickHouse parses logs and extracts structured data
- Query with Grafana, CLI, or ClickHouse console
Same infrastructure:
- Same S3 bucket (different prefix)
- Same IAM roles (already configured)
- Same ClickHouse Cloud instance
Costs
We’re reusing infrastructure from the CloudFront setup. The main cost variable is how you get logs to S3.
Ingestion approaches:
| Approach | Ingestion Cost | Notes |
|---|---|---|
| CloudWatch export to S3 | $0.50/GB | Simple, mature Terraform support |
| Lambda direct to S3 | $0.25/GB | Native AWS feature (2025), Terraform support still maturing |
| Kinesis + logging library | $0.12/GB | Best for high volume, requires app changes |
Lambda’s direct S3 streaming is a first-party AWS feature released in 2025. It halves ingestion costs but Terraform support is still buggy. For very high volumes, using AWS MSK or Kinesis with a logging library that writes directly to the bus gives you more control and potentially lower costs.
This article uses CloudWatch export for simplicity and reliability.
Incremental costs (CloudWatch export approach):
- CloudWatch Logs ingestion: $0.50/GB uncompressed
- Storage: S3 at $0.023/GB/month + ClickHouse at $0.025/GB/month
- ClickHouse compute: Already running the 8GiB instance ($220/month)
- Data transfer: Zero (same region, S3 API reads)
- S3 LIST requests: ~$0.04/month (5-10 minute polling)
Assuming moderate Lambda logging volume (~250GB compressed/month, 0.15 compression ratio = ~1,667GB uncompressed):
- CloudWatch ingestion: 1,667GB × $0.50 = $833/month
- S3 storage: 250GB × $0.023 = $5.75/month
- ClickHouse storage: 250GB × $0.025 = $6.25/month
- Total incremental: ~$845/month
Combined solution costs:
- CloudFront logs: $275/month (from part 1)
- Lambda logs: +$845/month
- Total: ~$1,120/month or ~$13,440/year
CloudWatch ingestion dominates. With direct S3 streaming, you’d cut that roughly in half (~$417/month). Either way, the value is making logs queryable without per-query costs from CloudWatch Logs Insights ($0.005/GB scanned).
Understanding Lambda Log Format
Lambda logs exported from CloudWatch have this format:
TIMESTAMP MESSAGE
Example log types:
2026-01-19T20:50:10.108Z INIT_START Runtime Version: provided:al2023.v117
2026-01-19T20:50:10.216Z START RequestId: f6499439-8583-43e7-bdcb-b43eba4ed50a Version: $LATEST
2026-01-19T20:50:10.217Z 2026/01/19 20:50:10 192.168.1.1 POST /api/process Took: 1.879µs
2026-01-19T20:50:10.219Z END RequestId: f6499439-8583-43e7-bdcb-b43eba4ed50a
2026-01-19T20:50:10.219Z REPORT RequestId: f6499439-8583-43e7-bdcb-b43eba4ed50a Duration: 2.12 ms Billed Duration: 106 ms Memory Size: 512 MB Max Memory Used: 26 MB Init Duration: 103.69 ms
Key patterns to extract:
- Timestamps: ISO 8601 format (`2026-01-19T20:50:10.108Z`)
- Log types: `INIT_START`, `START`, `END`, `REPORT`, or application logs
- RequestId: Appears in START/END/REPORT lines (enables request tracing)
- REPORT metrics: Duration, memory usage, cold start time
- Application logs: Custom format with HTTP method, path, duration
- Embedded JSON: Some logs contain structured JSON data
Implementation
Enable CloudWatch Logs Export to S3
Configure your Lambda functions to export logs to the same S3 bucket using the lambda/ prefix:
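One way to produce these exports, sketched with the AWS CLI (the log group name, bucket, and epoch-millisecond time range below are placeholders; in practice you’d drive this from Terraform or EventBridge):

```shell
# Export one hour of a function's logs to s3://my-logs-bucket/lambda/my-function/
# Adjust --log-group-name, --destination, and the --from/--to millisecond timestamps.
aws logs create-export-task \
  --log-group-name /aws/lambda/my-function \
  --from 1737316800000 \
  --to 1737320400000 \
  --destination my-logs-bucket \
  --destination-prefix lambda/my-function
```

The export task runs asynchronously; `aws logs describe-export-tasks` shows its status.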
Or use CloudWatch Logs subscription filters for real-time export, or schedule exports with EventBridge.
Reuse IAM Roles
If you followed part 1, your IAM role already has access to the S3 bucket. No changes needed. The ClickHouse instance can already read from s3://bucket/lambda/**.
Create Lambda Log Schema
The schema handles multiple complexities:
- Function name extraction: Auto-detects Lambda function name from S3 path
- Stream ID extraction: Identifies execution environment (container) from S3 path
- Log type classification: Detect INIT_START, START, END, REPORT, or custom logs
- RequestId extraction: Pull from START/END/REPORT lines for tracing
- REPORT parsing: Extract duration, memory, cold start metrics
- JSON extraction: Find and extract `{...}` patterns, parse as JSON type
- HTTP pattern matching: Parse custom app logs like `IP METHOD /path Took: Xµs`
The complete schema: lambda-clickhouse-schema.sql
Key features:
- Multi-function support: Logs from multiple Lambda functions in one table
- Stream ID extraction: Track container reuse and concurrency patterns
- Daily partitioning for efficient querying
- JSON type for dynamic schema and simple queries
- Materialized columns for common filters (`is_cold_start`, `is_error`, `has_json`)
- Extracted fields for REPORT metrics
- Polymorphic JSON support (different event types, different fields)
- Efficient filtering by function name (indexed in ORDER BY)
- Request-level tracing via `request_id`, container-level via `stream_id`
- MCP-ready for AI-powered exploration
Apply the schema:
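For example, with `clickhouse-client` (hostname and password are placeholders for your ClickHouse Cloud credentials):

```shell
# Run the DDL file against the Cloud instance over TLS.
clickhouse-client \
  --host your-instance.clickhouse.cloud \
  --secure \
  --password "$CLICKHOUSE_PASSWORD" \
  --queries-file lambda-clickhouse-schema.sql
```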
Testing and Debugging
Verify S3 access before the queue starts consuming:
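A spot-check along these lines reads a few raw lines directly with the `s3()` table function (the bucket name is a placeholder; on ClickHouse Cloud the instance’s IAM role handles authentication):

```sql
-- Read a handful of raw log lines straight from S3, bypassing the queue.
SELECT raw_line
FROM s3(
    'https://my-logs-bucket.s3.amazonaws.com/lambda/**.gz',
    'LineAsString',
    'raw_line String'
)
LIMIT 5;
```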
Important: The `'raw_line String'` structure parameter is required with the LineAsString format - it defines the single-column structure.
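A quick count confirms the glob actually matches files:

```sql
-- Should return a non-zero count if the path pattern and permissions are right.
SELECT count() AS total_lines
FROM s3(
    'https://my-logs-bucket.s3.amazonaws.com/lambda/**.gz',
    'LineAsString',
    'raw_line String'
);
```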
If you get 0 rows, check:
- Bucket name: Verify S3 bucket exists and has files
- Path pattern: Ensure `lambda/**.gz` matches your actual structure
- IAM permissions: Role must have `s3:GetObject` and `s3:ListBucket`
- Region: Bucket and ClickHouse instance should be in the same region
Check queue processing status:
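The `system.s3queue_log` table (populated when the s3queue log is enabled on your instance) shows per-file outcomes:

```sql
-- Which files the S3Queue processed recently, and whether any failed.
SELECT
    file_name,
    status,
    exception,
    processing_end_time
FROM system.s3queue_log
ORDER BY processing_end_time DESC
LIMIT 20;
```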
Querying Lambda Logs
Request Tracing
The killer feature: trace a single request across all log lines.
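A query along these lines, using the schema’s columns, pulls every line for a single invocation:

```sql
SELECT
    timestamp,
    log_type,
    message,
    duration_ms,
    max_memory_used_mb,
    init_duration_ms
FROM lambda_logs
WHERE request_id = 'f6499439-8583-43e7-bdcb-b43eba4ed50a'
ORDER BY timestamp;
```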
Results:
┌───────────────timestamp─┬─log_type─┬─message────────────────────────┬─duration_ms─┬─max_memory_used_mb─┬─init_duration_ms─┐
│ 2026-01-19 20:50:10.216 │ START │ START RequestId: f6499... │ 0 │ 0 │ 0 │
│ 2026-01-19 20:50:10.217 │ INFO │ 2603:6080... OPTIONS /api/... │ 0 │ 0 │ 0 │
│ 2026-01-19 20:50:10.219 │ END │ END RequestId: f6499... │ 0 │ 0 │ 0 │
│ 2026-01-19 20:50:10.219 │ REPORT │ REPORT RequestId: f6499... │ 2.12 │ 26 │ 103.69 │
└─────────────────────────┴──────────┴────────────────────────────────┴─────────────┴────────────────────┴──────────────────┘
See everything: when it started, what it did, how long it took, memory used, whether it was a cold start.
Multi-Function Analysis
The schema automatically extracts the Lambda function name from the S3 path. This lets you analyze multiple functions in one table.
List all functions:
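For example, counting total lines and `START` markers (one per invocation) per function:

```sql
SELECT
    function_name,
    count() AS log_lines,
    countIf(log_type = 'START') AS invocations
FROM lambda_logs
GROUP BY function_name
ORDER BY log_lines DESC;
```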
Results:
┌─function_name────┬─log_lines─┬─invocations─┐
│ roi-calculator │ 45,823 │ 12,450 │
│ webhook-handler │ 23,102 │ 5,234 │
│ image-processor │ 15,891 │ 3,876 │
└──────────────────┴───────────┴─────────────┘
Compare performance across functions:
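A sketch comparing duration percentiles, peak memory, and cold start rate per function, using only REPORT lines (the one line per invocation that carries metrics):

```sql
SELECT
    function_name,
    round(quantile(0.5)(duration_ms), 2) AS p50_ms,
    round(quantile(0.99)(duration_ms), 2) AS p99_ms,
    max(max_memory_used_mb) AS peak_memory_mb,
    round(countIf(is_cold_start) / count() * 100, 2) AS cold_start_pct
FROM lambda_logs
WHERE log_type = 'REPORT'
GROUP BY function_name
ORDER BY p99_ms DESC;
```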
This shows which functions are slow, memory-hungry, or experiencing cold starts.
Filter queries by function:
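For example:

```sql
SELECT timestamp, log_type, message
FROM lambda_logs
WHERE function_name = 'roi-calculator'
  AND date >= today() - 7
ORDER BY timestamp DESC
LIMIT 100;
```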
The function_name is first in the ORDER BY clause, making these queries efficient.
Performance Monitoring
Lambda duration percentiles:
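Using `quantiles` over REPORT lines, for example:

```sql
-- p50 / p90 / p99 duration over the last day.
SELECT quantiles(0.5, 0.9, 0.99)(duration_ms) AS duration_percentiles_ms
FROM lambda_logs
WHERE log_type = 'REPORT'
  AND date >= today() - 1;
```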
Cold Start Analysis
Track cold start frequency and performance:
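A sketch using the `is_cold_start` materialized column:

```sql
SELECT
    toStartOfHour(timestamp) AS hour,
    countIf(is_cold_start) AS cold_starts,
    count() AS invocations,
    round(cold_starts / invocations * 100, 2) AS cold_start_pct,
    round(avgIf(init_duration_ms, is_cold_start), 2) AS avg_init_ms
FROM lambda_logs
WHERE log_type = 'REPORT'
GROUP BY hour
ORDER BY hour;
```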
Container Reuse and Concurrency Patterns
Understanding Log Streams:
CloudWatch organizes Lambda logs into streams, where each stream represents a single execution environment (container). When Lambda scales, it creates new containers with new streams. In CloudWatch, this prevents log interweaving.
What happens when we consolidate into ClickHouse?
- We preserve per-request tracing via `request_id`
- We add container tracking via `stream_id` (extracted from the S3 path)
- We lose the strict stream-level isolation CloudWatch provides
But we gain powerful analysis capabilities:
Container reuse patterns:
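A query along these lines produces the table below:

```sql
SELECT
    function_name,
    uniqExact(stream_id) AS unique_containers,
    countIf(log_type = 'START') AS total_invocations,
    round(total_invocations / unique_containers, 2) AS avg_invocations_per_container
FROM lambda_logs
GROUP BY function_name
ORDER BY avg_invocations_per_container DESC;
```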
Results show how efficiently Lambda reuses containers:
┌─function_name────┬─unique_containers─┬─total_invocations─┬─avg_invocations_per_container─┐
│ roi-calculator │ 142 │ 12,450 │ 87.68 │
│ webhook-handler │ 523 │ 5,234 │ 10.01 │
└──────────────────┴───────────────────┴───────────────────┴───────────────────────────────┘
High reuse (roi-calculator) means warm containers and low cold start rate. Low reuse (webhook-handler) suggests frequent cold starts or high concurrency.
Concurrent execution analysis:
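For example, counting distinct containers active per minute:

```sql
SELECT
    toStartOfMinute(timestamp) AS minute,
    uniqExact(stream_id) AS concurrent_containers
FROM lambda_logs
WHERE log_type = 'START'
GROUP BY minute
ORDER BY concurrent_containers DESC
LIMIT 20;
```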
This identifies peak concurrency periods where multiple containers run simultaneously.
Trace a specific container’s lifetime:
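For example (the stream id below is a placeholder; copy a real one from a previous query):

```sql
SELECT timestamp, log_type, request_id, message
FROM lambda_logs
WHERE stream_id = '2026/01/19/[$LATEST]abcdef1234567890'
ORDER BY timestamp;
```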
See every request handled by a single container, including the initial cold start.
Practical implications:
- Request tracing still works perfectly: Use `request_id` to follow a single invocation
- Container insights are new: CloudWatch doesn’t make this analysis easy
- Log interweaving: Only matters if querying by timestamp alone (use `request_id` or `stream_id` for grouping)
- Same as most observability platforms: Datadog, New Relic, etc. also consolidate streams
JSON Data Extraction
This schema uses ClickHouse’s JSON type for dynamic schema inference and simpler querying. This is perfect for polymorphic data where different event types have different fields.
Extract structured data from application logs:
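A sketch using the JSON dot syntax (the field names `event`, `observabilitySpend`, and `savings3Year` appear elsewhere in this article; `savingsPct` is an assumption - substitute your own fields):

```sql
SELECT
    timestamp,
    json_data.event AS event_type,
    json_data.observabilitySpend AS observability_spend,
    json_data.savings3Year AS savings_3year,
    json_data.savingsPct AS savings_pct
FROM lambda_logs
WHERE has_json
  AND json_data.event = 'calculation_tracked'
ORDER BY timestamp DESC
LIMIT 10;
```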
Notice the simple json_data.fieldName syntax instead of JSONExtractString() functions.
Results:
┌───────────────timestamp─┬─event_type──────────┬─observability_spend─┬─savings_3year─┬─savings_pct─┐
│ 2026-01-19 20:50:10.272 │ calculation_tracked │ 500000 │ 2115000 │ 50 │
└─────────────────────────┴─────────────────────┴─────────────────────┴───────────────┴─────────────┘
Discover all JSON fields:
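One option is the `JSONAllPaths` introspection function (available on recent ClickHouse releases alongside the JSON type):

```sql
-- List every JSON path seen so far, with how many rows contain it.
SELECT
    arrayJoin(JSONAllPaths(json_data)) AS path,
    count() AS rows_with_path
FROM lambda_logs
WHERE has_json
GROUP BY path
ORDER BY rows_with_path DESC;
```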
This shows all the dynamically created columns from your JSON data. As new event types arrive with different fields, ClickHouse automatically adds columns.
Why JSON type matters:
- Simple SQL syntax (no need to remember which JSONExtract function to use)
- Schema discovery (run DESCRIBE to see all fields)
- Type inference (ClickHouse detects types automatically)
- Perfect for MCP integration (Claude can explore the schema dynamically)
- Handles polymorphic data (different event types with different fields)
Now you can analyze business metrics embedded in logs.
HTTP Endpoint Analysis
If your application logs include HTTP patterns:
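Assuming the schema’s HTTP pattern matching populates columns like `http_method` and `http_path` (names may differ in your schema):

```sql
SELECT
    http_method,
    http_path,
    count() AS requests
FROM lambda_logs
WHERE http_method != ''
GROUP BY http_method, http_path
ORDER BY requests DESC
LIMIT 20;
```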
Error Rate Monitoring
Track errors over time:
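A sketch using the `is_error` materialized column:

```sql
SELECT
    toStartOfHour(timestamp) AS hour,
    countIf(is_error) AS errors,
    count() AS total,
    round(errors / total * 100, 2) AS error_rate_pct
FROM lambda_logs
GROUP BY hour
ORDER BY hour;
```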
Visualization
Grafana Dashboards:
- Invocation count over time
- Duration percentiles (p50, p90, p99)
- Cold start rate
- Memory utilization
- Error rate
- Per-endpoint latency
ClickHouse Console: Create dashboards for at-a-glance monitoring.
Advanced: Combining CloudFront and Lambda Logs
Join CloudFront requests with Lambda executions:
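Since CloudFront and Lambda logs share no request id, one approach is to correlate by time bucket. This is a sketch only - the `cloudfront_logs` table and its `time`/`time_taken` columns from part 1 are assumptions; adjust to your actual schema:

```sql
SELECT minute, edge_requests, avg_edge_seconds, avg_lambda_ms
FROM
(
    -- Per-minute edge traffic from part 1's CloudFront table.
    SELECT
        toStartOfMinute(time) AS minute,
        count() AS edge_requests,
        avg(time_taken) AS avg_edge_seconds
    FROM cloudfront_logs
    GROUP BY minute
) AS cf
LEFT JOIN
(
    -- Per-minute backend duration from Lambda REPORT lines.
    SELECT
        toStartOfMinute(timestamp) AS minute,
        avg(duration_ms) AS avg_lambda_ms
    FROM lambda_logs
    WHERE log_type = 'REPORT'
    GROUP BY minute
) AS lam USING (minute)
ORDER BY minute;
```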
This correlates backend performance with frontend experience.
MCP Integration: AI-Powered Log Analysis
The JSON type makes this setup perfect for MCP (Model Context Protocol) integration. The official ClickHouse MCP server lets Claude query your logs naturally.
Why JSON type is ideal for MCP:
- Schema discovery: Claude can run `DESCRIBE TABLE` to see what fields exist
- Simple SQL: Claude generates cleaner queries with `json_data.field` syntax
- Dynamic exploration: No hardcoded field names needed
- Type safety: ClickHouse handles type inference automatically
Example MCP interaction:
User: "Show me ROI calculations from the last hour with savings over $1M"
Claude (via MCP):
SELECT
timestamp,
json_data.event,
json_data.observabilitySpend,
json_data.savings3Year,
json_data.ipAddress
FROM lambda_logs
WHERE date = today()
AND timestamp > now() - INTERVAL 1 HOUR
AND json_data.event = 'calculation_tracked'
AND json_data.savings3Year > 1000000
ORDER BY timestamp DESC;
Claude can explore your logs interactively without you writing SQL. The JSON type makes this seamless.
Setting Up MCP with Claude Code
The official ClickHouse MCP server makes your logs conversationally accessible. Here’s how to set it up:
1. Configure the MCP Server
Add the ClickHouse MCP server to your Claude Code configuration file (~/.claude.json). This example uses uv (the fast Python package manager) to run the MCP server:
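A sketch of the configuration (check the mcp-clickhouse README for the current argument and environment variable names; the host, password, and database values below are placeholders):

```json
{
  "mcpServers": {
    "clickhouse": {
      "command": "uv",
      "args": ["run", "--with", "mcp-clickhouse", "--python", "3.13", "mcp-clickhouse"],
      "env": {
        "CLICKHOUSE_HOST": "abc123.us-east-2.aws.clickhouse.cloud",
        "CLICKHOUSE_PORT": "8443",
        "CLICKHOUSE_USER": "default",
        "CLICKHOUSE_PASSWORD": "your-password",
        "CLICKHOUSE_SECURE": "true",
        "CLICKHOUSE_DATABASE": "o11y"
      }
    }
  }
}
```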
Why UV? The uv package manager handles Python dependencies cleanly without polluting your global environment. It downloads and caches the mcp-clickhouse package automatically on first run.
2. Get Your ClickHouse Connection Details
From your ClickHouse Cloud console:
- Host: Your instance hostname (e.g., `abc123.us-east-2.aws.clickhouse.cloud`)
- Port: `8443` for HTTPS (important: this is the HTTP interface port, not the native protocol port 9440)
- User: Typically `default`
- Password: Your ClickHouse password
- Database: The database containing your logs (e.g., `o11y`)
Important: Use port 8443 (HTTPS), not 9440 (native protocol). The MCP server uses ClickHouse’s HTTP interface, which requires the HTTPS port on ClickHouse Cloud.
3. Restart Claude Code
The MCP server loads on startup. After updating ~/.claude.json, restart Claude Code to pick up the new configuration.
Tip: Run /mcp in Claude Code to see the status of configured MCP servers. This command helps debug connection issues by showing which servers are active, any error messages, and available tools.
4. Test the Connection
Try asking Claude:
“Show me the schema for the lambda_logs table”
or
“What Lambda functions do I have logs for in the last 24 hours?”
Claude will now have tools like:
- `clickhouse_query` - Execute SELECT queries
- `clickhouse_execute` - Run DDL/DML statements
- `clickhouse_ping` - Test connection
Benefits of MCP Integration:
- Natural language queries: Ask questions without writing SQL
- Schema discovery: Claude can explore what fields exist in your JSON data
- Contextual analysis: Claude can suggest queries based on your data structure
- Multi-step analysis: Ask follow-up questions and refine queries
- Cost insights: Ask “What’s costing me the most?” and get actionable answers
Example interactions:
You: "Show me high-value ROI calculations that didn't convert to leads"
Claude: [Queries lambda_logs for calculation_tracked events with high savings
that don't have matching roi_generated events from the same IP]
You: "What's the cold start rate for roi-calculator this week?"
Claude: [Calculates percentage of invocations with init_duration_ms > 0]
You: "Find the slowest Lambda invocations today"
Claude: [Queries REPORT lines ordered by duration_ms DESC]
This transforms your logs into a conversational analytics platform. The JSON type makes it seamless - Claude can discover fields dynamically and generate clean queries without hardcoded field names.
Real Example: Cold Start Analysis
Here’s an actual Claude Code session querying Lambda cold start metrics:

This question, “Show me the cold start rate,” isn’t something AWS CloudWatch answers directly. CloudWatch gives you invocation counts, duration, and errors as built-in metrics, but cold start rate requires querying the logs, finding REPORT lines with Init Duration present, and calculating the percentage yourself.
With MCP, you just ask. Claude generates the SQL, runs it against ClickHouse, and formats the results. The P90 init duration (50.48ms in this case) tells you what your users actually experience during cold starts, not just that cold starts happened.
Teaching Claude Your Schema
MCP gives Claude access to ClickHouse, but Claude doesn’t automatically know your schema or application-specific JSON structure. For better results, create a Claude skill file that documents:
- Database and table names: Which database to query (`o11y.lambda_logs`)
- Column definitions: What each column contains and its type
- JSON field schemas: Your application’s event types and their fields
- Common query patterns: Examples for performance metrics, cold starts, business analytics
See the complete example: lambda-analytics.skill.md
Place this Claude skill file in ~/.claude/skills/<skill-name>/SKILL.md or reference it in your project’s CLAUDE.md. Claude Code will use it to generate accurate queries for your specific data model.
Tradeoffs and Considerations
Claude Isn’t Magic:
- Claude can discover your schema but doesn’t inherently know what your data means
- It won’t understand that `json_data.observabilitySpend` represents annual cost without context
- The Claude skill file becomes a shared library for your SRE team to ensure consistent interpretation
- Without it, each person asking questions gets different (possibly wrong) assumptions
Export Latency:
- CloudWatch Logs export to S3 runs on schedule (hourly/daily)
- Not real-time, but sufficient for analytics and debugging
S3Queue Configuration:
- Unordered mode: Handles late arrivals and complex path structures
- CloudWatch can delay log delivery up to 24 hours
- Multiple functions create overlapping prefixes (`lambda/func-a/...`, `lambda/func-b/...`)
- Unordered mode tracks each file individually, won’t skip late arrivals
- Polling interval: 5-10 minutes for hourly exports
- Balances latency with S3 LIST API costs (~$0.04/month)
- Adjust based on your export frequency (more frequent exports → faster polling)
- Ordered mode pitfall: Don’t use with multi-function setup
- Processes files lexicographically, can skip late-arriving files
- Only safe for single-function, predictable export schedules
Storage Costs:
- Lambda logs are verbose (every log line is a row)
- Use the 90-day TTL to control costs
- Use Lambda’s new ability to stream directly to S3.
Query Performance:
- Daily partitioning enables fast date range queries
- `request_id` in ORDER BY enables efficient request tracing
- Use materialized columns for common filters
JSON Type Considerations:
- Requires a recent ClickHouse release: the new JSON type shipped as beta in 24.8 and was marked production-ready in 25.3 (ClickHouse Cloud stays current automatically)
- Dynamic schema grows as new JSON fields appear
- Default limit of 1024 unique paths per JSON column (`max_dynamic_paths`); beyond this, paths go into less efficient shared storage
- Field names are case-sensitive (`observabilitySpend`, not `observabilityspend`)
- NULL values for fields that don’t exist in specific rows (expected behavior)
- Consider adding indexes on frequently queried JSON fields if needed
Wrapping Up
Did we meet our goal of not doubling costs? Not exactly.
CloudFront analytics from part 1 cost $275/month. Adding Lambda logs brings us to ~$1,120/month. That’s 4x, not 2x. But the cost breakdown tells a different story:
- CloudWatch ingestion: $833/month (unavoidable with CloudWatch export approach)
- Incremental analytics cost: ~$12/month (S3 + ClickHouse storage)
The analytics infrastructure barely registers. CloudWatch ingestion dominates. This is why we discussed alternatives: Lambda’s direct S3 streaming ($0.25/GB) or Kinesis ($0.12/GB) can cut that $833 down to $417 or $200 respectively. The logging path you choose matters far more than the analytics backend.
What you get for that cost:
- Request-level tracing across all log lines
- Performance percentiles (duration, memory, cold starts)
- Cold start monitoring and analysis
- JSON type for dynamic schema and simple queries
- MCP integration for conversational queries
- Sub-second queries on millions of log lines
The power here is SQL and the JSON type. You can ask any question of your data. The JSON type makes it simple to query polymorphic event data. And with MCP integration, Claude can explore your logs conversationally without you writing SQL.
One observation: AI-assisted observability isn’t a magical fix. Claude can correlate latency spikes and parse REPORT lines all day. That’s the commodity layer we all share. But it won’t understand that json_data.observabilitySpend represents your customer’s annual budget, or that a calculation_tracked event without a following roi_generated event means a lost lead. That’s your business domain, your architectural decisions, your telemetry schema.
This problem isn’t new. Most teams didn’t rigorously document their runbooks and telemetry before AI either. Everyone is hoping AI will solve this, or at least paper over it. But AI isn’t a substitute for good engineering practices. It will amplify what you have, good or bad.
The Claude skill file isn’t optional documentation. It’s the foundation that makes AI tooling useful instead of just impressive demos on standardized data.
Need help building your observability stack? Contact me today!
Resources: