
Overview

The monitoring stack runs in the mon namespace and provides metrics, logs, and alerting for both the Kubernetes cluster and the Synology NAS.

Components

| Component | Purpose | Ingress |
|---|---|---|
| Prometheus | Metrics collection and short-term storage | prometheus.hdhomelab.com |
| Thanos | Long-term metrics storage, query federation | thanos.hdhomelab.com |
| SNMP Exporter | Synology NAS metrics via SNMPv3 | |
| Loki | Log aggregation | |
| Fluent-bit | Log shipping from K8s nodes (DaemonSet) | |
| Promtail | Log shipping from NAS Docker containers | |
| Grafana | Dashboards and alerting | grafana.hdhomelab.com |
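The three components with an ingress can be smoke-tested from outside the cluster. The following is a minimal sketch, assuming each hostname is served over HTTPS and that the thanos.hdhomelab.com ingress fronts Thanos Query. It also assumes the standard health endpoints: `/-/healthy` on Prometheus and Thanos, `/api/health` on Grafana. The `HEALTH_PATHS` mapping and the `health_urls` helper are hypothetical names used for illustration.

```python
from urllib.parse import urlunsplit

# Ingress hostnames taken from the Components table.
INGRESS_HOSTS = {
    "prometheus": "prometheus.hdhomelab.com",
    "thanos": "thanos.hdhomelab.com",
    "grafana": "grafana.hdhomelab.com",
}

# Hypothetical mapping of assumed health endpoints: Prometheus and
# Thanos Query serve /-/healthy, Grafana serves /api/health.
HEALTH_PATHS = {
    "prometheus": "/-/healthy",
    "thanos": "/-/healthy",
    "grafana": "/api/health",
}


def health_urls(scheme: str = "https") -> dict:
    """Return one health-check URL per component that has an ingress."""
    return {
        name: urlunsplit((scheme, host, HEALTH_PATHS[name], "", ""))
        for name, host in INGRESS_HOSTS.items()
    }


if __name__ == "__main__":
    for name, url in health_urls().items():
        print(f"{name}: {url}")
```

Each URL can then be fetched with `urllib.request.urlopen`. An HTTP 200 response suggests that the component is reachable through its ingress.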

Architecture

```mermaid
graph LR
  subgraph Sources
    K8s[K8s targets\nServiceMonitors]
    NAS[NAS\nnode-exporter · cAdvisor\nTraefik · Watchtower]
    SNMP[Synology\nSNMP]
    NASLogs[NAS Docker\ncontainers]
    K8sLogs[K8s containers\n+ Talos system logs]
  end

  subgraph Metrics
    Prom[Prometheus]
    Thanos[Thanos]
    Minio[(MinIO\nS3)]
  end

  subgraph Logs
    Fluent[Fluent-bit]
    Promtail[Promtail]
    Loki[Loki]
    LokiMinio[(MinIO\nS3)]
  end

  Grafana[Grafana]

  K8s --> Prom
  NAS --> Prom
  SNMP --> Prom
  Prom --> Thanos
  Thanos <--> Minio
  Thanos --> Grafana

  K8sLogs -->|tail + TCP| Fluent
  NASLogs --> Promtail
  Fluent --> Loki
  Promtail --> Loki
  Loki <--> LokiMinio
  Loki --> Grafana
```
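On the logs path, Fluent-bit and Promtail both end by delivering log lines to Loki's push API. The following minimal sketch shows the JSON form of a push request body, in which every entry is a nanosecond timestamp string paired with a log line under one label set. The `build_push_body` helper and the `job`/`host` labels are illustrative assumptions. They are not the labels either shipper actually attaches in this cluster.

```python
import json
import time
from typing import Dict, List, Optional


def build_push_body(labels: Dict[str, str], lines: List[str],
                    ts_ns: Optional[int] = None) -> str:
    """Build a JSON body for Loki's /loki/api/v1/push endpoint (one stream)."""
    if ts_ns is None:
        ts_ns = time.time_ns()
    # Each entry is [unix timestamp in nanoseconds as a string, log line].
    # Adding i nanoseconds keeps the lines ordered within the stream.
    values = [[str(ts_ns + i), line] for i, line in enumerate(lines)]
    return json.dumps({"streams": [{"stream": labels, "values": values}]})


if __name__ == "__main__":
    # Hypothetical labels; the real shippers attach their own label sets.
    print(build_push_body({"job": "nas-docker", "host": "nas"}, ["container started"]))
```

A shipper's output stage boils down to sending this body as an HTTP POST with `Content-Type: application/json` to `/loki/api/v1/push` on the Loki service. Loki then groups the entries into streams by label set.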

Retention

| Data | Store | Retention |
|---|---|---|
| Prometheus metrics | local-path PVC | 3 days |
| Thanos raw | MinIO (S3) | 10 days |
| Thanos 5m downsampled | MinIO (S3) | 90 days |
| Thanos 1h downsampled | MinIO (S3) | 10 years |
| Loki logs | MinIO (S3) | 28 days |
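The metric retention windows nest, so the finest resolution available for a point in time depends on how old it is. The following minimal sketch treats the metric rows of the table as data and reports which tiers still cover a given age. The `Tier` type and `tiers_covering` helper are hypothetical, and "10 years" is approximated as 3650 days. The sketch also ignores compaction timing, so very recent data may not yet have downsampled copies.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Tier:
    name: str
    store: str
    retention_days: int


# Metric rows copied from the Retention table; "10 years" approximated as 3650 days.
METRIC_TIERS = [
    Tier("Prometheus metrics", "local-path PVC", 3),
    Tier("Thanos raw", "MinIO (S3)", 10),
    Tier("Thanos 5m downsampled", "MinIO (S3)", 90),
    Tier("Thanos 1h downsampled", "MinIO (S3)", 3650),
]


def tiers_covering(age_days: float) -> List[str]:
    """Return the metric tiers whose retention still includes data this old."""
    return [t.name for t in METRIC_TIERS if age_days <= t.retention_days]


if __name__ == "__main__":
    for age in (1, 5, 30, 365):
        print(age, tiers_covering(age))
```

Read this way, a panel that looks back one month can show at best 5-minute resolution. A panel that looks back one year can show at best 1-hour resolution.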

Long-range queries

For queries beyond 3 days, use the Thanos datasource in Grafana (thanos-001) rather than Prometheus. Thanos Query automatically merges recent data from the Prometheus sidecar with historical blocks from MinIO.
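Outside Grafana, the same rule applies when calling the Thanos Query HTTP API directly. The following minimal sketch assumes the thanos.hdhomelab.com ingress serves Thanos Query over HTTPS. The `pick_datasource` and `thanos_range_url` helpers are hypothetical, and so is the `"prometheus"` return value, because this page names only the `thanos-001` datasource. The `max_source_resolution` and `dedup` parameters are Thanos additions on top of the standard Prometheus `query_range` API. The URL is only built, never sent.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

PROMETHEUS_RETENTION = timedelta(days=3)        # from the Retention table
THANOS_URL = "https://thanos.hdhomelab.com"     # Thanos ingress; HTTPS assumed


def pick_datasource(start: datetime, now: datetime) -> str:
    """Prometheus can answer only if the range starts within local retention."""
    # "prometheus" is a placeholder; only thanos-001 is named in this page.
    return "prometheus" if now - start <= PROMETHEUS_RETENTION else "thanos-001"


def thanos_range_url(query: str, start: datetime, end: datetime,
                     step: str = "1h", max_source_resolution: str = "1h") -> str:
    """Build a Thanos Query /api/v1/query_range URL without sending it."""
    params = {
        "query": query,
        "start": int(start.timestamp()),
        "end": int(end.timestamp()),
        "step": step,
        # Thanos-specific: coarsest block resolution the querier may read;
        # "0s" would restrict it to raw blocks.
        "max_source_resolution": max_source_resolution,
        # Thanos-specific: deduplicate series that differ only by replica labels.
        "dedup": "true",
    }
    return f"{THANOS_URL}/api/v1/query_range?{urlencode(params)}"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    start = now - timedelta(days=30)
    print(pick_datasource(start, now))
    print(thanos_range_url("up", start, now))
```

Allowing up to 1h source resolution matters here because raw Thanos blocks are kept for only 10 days. Beyond 90 days, only 1h-downsampled blocks remain.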