Claude Code Monitoring

Claude Code exports telemetry over OTLP (the OpenTelemetry protocol): metrics go to Prometheus and logs go to Loki, and both are visualized in a dedicated Grafana dashboard.

Architecture

graph LR
  CC[Claude Code CLI]
  CC -->|http/protobuf OTLP metrics| Prom[Prometheus\nOTLP receiver]
  CC -->|http/protobuf OTLP logs| Loki[Loki]
  Prom --> Grafana[Grafana\nClaude Code dashboard]
  Loki --> Grafana

Configuration

Telemetry is configured in ~/.claude/settings.json (managed in the dotfiles repo):

~/.claude/settings.json
{
  "env": {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
    "OTEL_METRICS_EXPORTER": "otlp",
    "OTEL_LOGS_EXPORTER": "otlp",
    "OTEL_EXPORTER_OTLP_METRICS_PROTOCOL": "http/protobuf",
    "OTEL_EXPORTER_OTLP_METRICS_ENDPOINT": "https://prometheus.hdhomelab.com/api/v1/otlp/v1/metrics",
    "OTEL_EXPORTER_OTLP_LOGS_PROTOCOL": "http/protobuf",
    "OTEL_EXPORTER_OTLP_LOGS_ENDPOINT": "https://loki.hdhomelab.com/otlp/v1/logs",
    "OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE": "cumulative"
  }
}

Signal-specific endpoint path

OTEL_EXPORTER_OTLP_METRICS_ENDPOINT is used as-is: the SDK does not append /v1/metrics, so the full path must be specified. The same applies to OTEL_EXPORTER_OTLP_LOGS_ENDPOINT and /v1/logs. Only the general OTEL_EXPORTER_OTLP_ENDPOINT auto-appends signal paths; signal-specific variables do not.
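The resolution rule can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not the SDK's actual code: the function name resolve_otlp_http_endpoint and the SIGNAL_PATHS table are hypothetical, and the fallback to http://localhost:4318 assumes the OTLP/HTTP default.

```python
# Default per-signal paths that get appended to the *general* endpoint only.
SIGNAL_PATHS = {"metrics": "v1/metrics", "logs": "v1/logs", "traces": "v1/traces"}


def resolve_otlp_http_endpoint(env, signal):
    """Return the URL an OTLP/HTTP exporter would post to for `signal`.

    Hypothetical helper: mirrors the rule that signal-specific variables are
    used verbatim, while the general endpoint gets the signal path appended.
    """
    specific = env.get(f"OTEL_EXPORTER_OTLP_{signal.upper()}_ENDPOINT")
    if specific:
        # Signal-specific variable: used as-is, nothing is appended.
        return specific
    # General variable: the signal path is appended to the base URL.
    base = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4318")
    return base.rstrip("/") + "/" + SIGNAL_PATHS[signal]


# A signal-specific endpoint that omits the path stays incomplete.
incomplete = resolve_otlp_http_endpoint(
    {"OTEL_EXPORTER_OTLP_METRICS_ENDPOINT": "https://prometheus.hdhomelab.com/api/v1/otlp"},
    "metrics",
)
```

Here incomplete ends in /api/v1/otlp with no /v1/metrics suffix, which is why the configuration above spells out the full path for both metrics and logs.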

Key settings

| Variable | Value | Notes |
| --- | --- | --- |
| OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE | cumulative | Required for Prometheus _total counters |
| OTEL_METRIC_EXPORT_INTERVAL | 60000 (milliseconds, default) | Set to 10000 temporarily when debugging |

Prometheus OTLP receiver

The native OTLP receiver is enabled in flux/monitoring/noah/kube-prometheus-stack/helmrelease.yaml:

prometheus:
  prometheusSpec:
    additionalArgs:
      - name: web.enable-otlp-receiver
        value: ""

Metrics are received at https://prometheus.hdhomelab.com/api/v1/otlp/v1/metrics.

Metrics

| Metric | Description |
| --- | --- |
| claude_code_cost_usage_USD_total | API cost in USD, labeled by model |
| claude_code_token_usage_tokens_total | Tokens used, labeled by type (input/output/cacheRead/cacheCreation) |
| claude_code_session_count_total | CLI sessions started |
| claude_code_active_time_seconds_total | Active time, labeled by type (user/cli) |
| claude_code_lines_of_code_count_total | Lines of code, labeled by type (added/removed) |
| claude_code_commit_count_total | Git commits created via Claude |
| claude_code_pull_request_count_total | Pull requests opened via Claude |
| claude_code_code_edit_tool_decision_total | Edit tool decisions, labeled by decision and tool_name |

Note

commit_count and pull_request_count only emit when Claude Code actually runs git commit or opens a PR via the Bash tool. They will not appear in Prometheus until that occurs.
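To check which claude_code_* series have actually arrived, one option is Prometheus's instant-query HTTP API (/api/v1/query). The sketch below only builds the request URL. It assumes the query API is reachable on the same host as the OTLP receiver and needs no authentication; the helper name arrived_metrics_query_url is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the query API lives on the same host as the OTLP receiver.
PROMETHEUS_BASE = "https://prometheus.hdhomelab.com"


def arrived_metrics_query_url(base=PROMETHEUS_BASE, prefix="claude_code_"):
    """Build an instant-query URL listing every metric name with `prefix`."""
    # One result row per metric name that currently has at least one series.
    promql = f'count by (__name__) ({{__name__=~"{prefix}.+"}})'
    return f"{base}/api/v1/query?" + urlencode({"query": promql})


url = arrived_metrics_query_url()
```

Fetching url with any HTTP client returns JSON whose result list should omit claude_code_commit_count_total and claude_code_pull_request_count_total until the first commit or PR has been made.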

Grafana dashboard

The dashboard is stored at flux/monitoring/noah/grafana-dashboards/claude-code.json and served from the claude-code ConfigMap with the grafana_dashboard: "1" label.

PromQL notes

Count panels (sessions, commits, PRs, lines of code) use max_over_time - min_over_time rather than increase():

sum(max_over_time(claude_code_commit_count_total[$__range]))
- sum(min_over_time(claude_code_commit_count_total[$__range]))

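increase() extrapolates at time range boundaries and returns fractional values even for integer counters. The max - min approach reads exact counter values and avoids extrapolation entirely. The trade-off is that, unlike increase(), it does not compensate for counter resets inside the range.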
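The difference is easy to see with a handful of samples. The sketch below is a simplified model of the extrapolation that increase() performs, written for illustration only: it ignores counter resets and is not Prometheus's actual code. The function names and sample values are made up.

```python
def extrapolated_increase(samples, range_start, range_end):
    """Simplified model of increase() over (timestamp_s, value) samples.

    Assumes sorted samples inside the range and no counter resets.
    """
    if len(samples) < 2:
        return None
    (first_t, first_v), (last_t, last_v) = samples[0], samples[-1]
    delta = last_v - first_v
    sampled = last_t - first_t
    avg_interval = sampled / (len(samples) - 1)
    threshold = avg_interval * 1.1

    # Gap between the range boundaries and the first/last sample.
    to_start = first_t - range_start
    to_end = range_end - last_t
    # Large gaps are only extrapolated by half an average interval.
    if to_start >= threshold:
        to_start = avg_interval / 2
    if to_end >= threshold:
        to_end = avg_interval / 2
    # A counter is never extrapolated backwards below zero.
    if delta > 0 and first_v >= 0:
        to_start = min(to_start, sampled * (first_v / delta))

    return delta * (sampled + to_start + to_end) / sampled


def max_minus_min(samples):
    """Exact difference between the largest and smallest sampled value."""
    values = [v for _, v in samples]
    return max(values) - min(values)


# Five 60s exports inside a 300s range; the counter goes from 3 to 5.
samples = [(30, 3.0), (90, 3.0), (150, 4.0), (210, 5.0), (270, 5.0)]
extrapolated = extrapolated_increase(samples, 0, 300)  # 2.5
exact = max_minus_min(samples)                         # 2.0
```

With these samples the extrapolated value is 2.5 commits while max - min reports exactly 2, which is the behavior the count panels rely on.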

The token rate panel uses a fixed 5-minute window:

sum by (type)(rate(claude_code_token_usage_tokens_total[5m]))

5m is 5× the 60s export interval, which leaves enough headroom for rate() to always have at least two data points, even if an export is delayed or dropped. $__rate_interval is not used because Grafana cannot determine a scrape interval for OTLP push metrics.
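The window arithmetic can be checked with a small sketch. It assumes exports arrive at a perfectly regular interval, apart from an optional number of consecutive missed exports; the function names are hypothetical.

```python
import math


def min_samples_in_window(window_s, export_interval_s, missed_exports=0):
    """Worst-case number of samples that fall inside a rate() window."""
    # A window of length W over samples spaced I apart holds at least floor(W / I).
    worst_case = math.floor(window_s / export_interval_s)
    return max(worst_case - missed_exports, 0)


def rate_has_enough_samples(window_s, export_interval_s, missed_exports=0):
    """rate() needs at least two samples in the window to return a value."""
    return min_samples_in_window(window_s, export_interval_s, missed_exports) >= 2


five_minute_window = min_samples_in_window(300, 60)    # 5
ninety_second_window = min_samples_in_window(90, 60)   # 1
```

Under these assumptions a 5m window still holds two samples after three missed exports, whereas a 90s window can drop to a single sample and leave gaps in the panel.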