Cluster Status & Metadata
Path: /operate/cluster. Verb: cluster:read (granted by maintainer, operator, admin).
This is the operator’s single pane for “is the OAP backend wired correctly?”. It surfaces:
- Live health of the two OAP ports.
- Required-module state for the four admin selectors.
- Cluster member discovery (OAP nodes behind the configured URL).
- A planned strip for upcoming metadata views (per-node module activity, storage health, receiver activity, effective config, TTL grid).
The triage flow during a banner-heavy incident lives here. The full check sequence is documented in Compatibility → Cluster Status Check Sequence; this page focuses on what the operator sees and does.
Page anatomy
┌─────────────────────────────────────────────────────────────────┐
│ Cluster Status                                                  │
├─────────────────────────────────────────────────────────────────┤
│ Query (:12800)                 │ Admin (:17128)                 │
│ ─────────────────────          │ ─────────────────────          │
│ Reachable  ✓                   │ Reachable              ✓       │
│ Version    11.0.0              │ admin-server           ✓       │
│ Timezone   +0800               │ receiver-runtime-rule  ✓       │
│ Timestamp  2026-05-18 09:14:02 │ dsl-debugging          ✓       │
│ Health     0 (OK)              │ inspect                ✓       │
├─────────────────────────────────────────────────────────────────┤
│ Cluster members                                                 │
│ ────────────────                                                │
│ Host               Port   Role      Heartbeat                   │
│ oap-1.example.com  12800  receiver  2s ago                      │
│ oap-2.example.com  12800  receiver  1s ago                      │
│ oap-3.example.com  12800  receiver  3s ago                      │
├─────────────────────────────────────────────────────────────────┤
│ Coming soon (Phase 6 / 7):                                      │
│   • Per-node module activity matrix                             │
│   • Storage backend health (BanyanDB / ES / JDBC)               │
│   • Receiver activity (gRPC / HTTP / Kafka / OTLP)              │
│   • Effective-configuration tree (two-node diff)                │
│   • TTL & retention grid                                        │
└─────────────────────────────────────────────────────────────────┘
Live health (top row)
Two independent panes, refreshed in parallel.
Query pane
- Source: GET /api/oap/info → BFF GraphQL call to OAP.
- Refresh: 30 s.
- Fields: reachable, version, server timezone (UTC offset), server timestamp, health score.
- Failure modes: see Cluster Status Check Sequence.
Admin pane
- Source: GET /api/preflight → BFF call to OAP /debugging/config/dump.
- Refresh: 60 s.
- Per-module rows: admin-server, receiver-runtime-rule, dsl-debugging, inspect.
- Each row carries a tooltip with the env-var needed to enable it (e.g. SW_INSPECT=default).
- Failure modes: see Cluster Status Check Sequence.
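The two panes load independently and on different timers. A minimal sketch of that pattern follows. It assumes only the two endpoint paths and refresh intervals named above; the type names, the injected `Fetcher`, and the functions `loadPane`, `loadAll`, and `startPolling` are hypothetical and are not Horizon's actual code.

```typescript
// Hypothetical pane descriptors; only the paths and intervals come from this page.
interface PaneSpec {
  name: "query" | "admin";
  path: string;      // BFF endpoint backing the pane
  refreshMs: number; // refresh interval for the pane
}

const PANES: PaneSpec[] = [
  { name: "query", path: "/api/oap/info", refreshMs: 30_000 },
  { name: "admin", path: "/api/preflight", refreshMs: 60_000 },
];

type PaneResult =
  | { name: string; reachable: true; body: unknown }
  | { name: string; reachable: false; error: string };

// The HTTP call is injected so the sketch stays independent of any client library.
type Fetcher = (path: string) => Promise<unknown>;

// Load one pane. Failures are captured, not thrown, so one pane cannot block the other.
async function loadPane(spec: PaneSpec, fetcher: Fetcher): Promise<PaneResult> {
  try {
    return { name: spec.name, reachable: true, body: await fetcher(spec.path) };
  } catch (err) {
    return { name: spec.name, reachable: false, error: String(err) };
  }
}

// Initial render: request both panes in parallel.
function loadAll(fetcher: Fetcher): Promise<PaneResult[]> {
  return Promise.all(PANES.map((p) => loadPane(p, fetcher)));
}

// Afterwards each pane refreshes on its own timer. Returns a function that stops polling.
function startPolling(fetcher: Fetcher, onUpdate: (r: PaneResult) => void): () => void {
  const timers = PANES.map((p) =>
    setInterval(() => void loadPane(p, fetcher).then(onUpdate), p.refreshMs),
  );
  return () => timers.forEach((t) => clearInterval(t));
}
```

Because `loadPane` never rejects, a red Admin pane still lets the Query pane render, which matches the "Query green, Admin red" state described later on this page.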
Cluster members
- Source: GET <queryUrl>/status/cluster/nodes (status client, packages/api-client/src/status.ts).
- Returns: per-node host, port, role, last heartbeat.
- Refresh: same cadence as the Query pane.
This is informational only. A 3-node OAP cluster behind one DNS name should show three rows. One row means the DNS name resolves to a single IP (intentional in some deployments). Zero rows means OAP did not return cluster data, usually because it is a single-node deployment without the cluster module enabled, which is fine.
The cluster-members section is not required for Horizon to function; it is a sanity check that the operator’s expectation matches reality.
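The row-count reading above can be written as a small classifier. This is a sketch only: the `ClusterMember` field names are guessed from the "Returns" bullet, and `readMembers` and its labels are hypothetical, not part of the status client.

```typescript
// Field names are assumptions based on "per-node host, port, role, last heartbeat".
interface ClusterMember {
  host: string;
  port: number;
  role: string;
  lastHeartbeat: string;
}

type MemberReading =
  | "matches-expectation" // row count equals what the operator expects
  | "single-address"      // one row: DNS resolves to a single IP
  | "no-cluster-data"     // zero rows: typically single-node without the cluster module
  | "count-mismatch";     // anything else: check DNS / Service, or look for a down node

// Compare the returned rows against the node count the operator expects to see.
function readMembers(members: ClusterMember[], expectedNodes: number): MemberReading {
  if (members.length === 0) return "no-cluster-data";
  if (members.length === expectedNodes) return "matches-expectation";
  if (members.length === 1) return "single-address";
  return "count-mismatch";
}
```

The expected node count is supplied by the caller because, as noted above, the page itself only reports what OAP returns.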
Coming soon strip
The page lists upcoming additions inline. None of these are implemented today; the strip exists so operators can see what is on the roadmap and what the page does not yet surface:
- Per-node module activity matrix — module × provider × node grid. Requires per-node admin calls (currently the dump is consumed cluster-wide).
- Storage backend health — BanyanDB / Elasticsearch / JDBC: connection pool, index lag, throughput.
- Receiver activity — gRPC / HTTP / Kafka / OTLP: throughput, queue depth.
- Effective-configuration tree — two-node diff of merged config (advanced troubleshooting).
- TTL & retention grid — hot / warm / cold storage timeline per metric scope.
When these land, this page is where they will surface; the data flow will follow the same pattern as today’s panes (BFF preflight call → cached → polled).
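The "call → cached → polled" pattern can be illustrated with a small time-to-live cache in front of a loader. This is a sketch under assumptions: the `cached` helper, its parameters, and the injectable clock are hypothetical, and the real BFF cache may behave differently.

```typescript
// Wrap a loader so repeated polls within ttlMs reuse the last result.
// `now` is injectable so the behavior can be checked without waiting.
function cached<T>(
  load: () => Promise<T>,
  ttlMs: number,
  now: () => number = Date.now,
): () => Promise<T> {
  let entry: { value: T; at: number } | undefined;
  return async () => {
    // Serve from cache while the entry is younger than the TTL.
    if (entry && now() - entry.at < ttlMs) return entry.value;
    const value = await load();
    entry = { value, at: now() };
    return value;
  };
}
```

A poller would then call the returned function on its own interval. Note that this sketch does not deduplicate concurrent calls: two polls arriving while the cache is empty would both invoke `load`.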
Reading the page during an incident
- Both panes green? Backend is fine; the problem is elsewhere (network from browser, BFF process, OAP-side data ingestion).
- Query pane red? OAP itself is unreachable. Check the OAP process, the query port, network ACLs. Nothing in Horizon can proceed without this pane green.
- Query green, Admin red? OAP is up but the admin port is not reachable. Likely causes: admin port not exposed by your Kubernetes Service, firewall rule, OAP missing SW_ADMIN_SERVER=default. Operate-section features (Cluster, Inspect, DSL Management, Live Debugger) are unavailable until fixed.
- Admin pane mostly green but one yellow? That feature is degraded; e.g., dsl-debugging off means the Live Debugger doesn’t work. Set the corresponding env-var on OAP and restart.
- Query pane shows health > 0? OAP is up but degraded. The pane shows the score; details from checkHealth (also visible) names the degraded subsystem (storage lag, receiver backlog).
- Cluster members count off? Either DNS / Service is misconfigured, or an OAP node is down.
The Cluster Status page does not page or alert; it shows. Wire your own alerting on /api/preflight and /api/oap/info if you want notifications.
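The checklist above can be condensed into a decision function, for example as the core of a home-grown alert check. This is a sketch under assumptions: `PageState`, `Light`, and `triage` are hypothetical names, and the actual response shapes of /api/preflight and /api/oap/info are not specified on this page, so mapping them into `PageState` is left to the caller.

```typescript
type Light = "green" | "yellow" | "red";

// Hypothetical summary of what the page shows; not an actual API response shape.
interface PageState {
  query: Light;                   // Query pane reachability
  admin: Light;                   // Admin pane reachability
  modules: Record<string, Light>; // per-module rows, e.g. "dsl-debugging"
  health: number;                 // 0 = OK, > 0 = degraded
  memberCount: number;            // rows in the cluster-members table
  expectedMembers: number;        // what the operator expects to see
}

// Walk the checklist in the same order as the bullets above.
function triage(s: PageState): string[] {
  if (s.query === "red") {
    // Nothing else can proceed until the Query pane is green.
    return ["OAP unreachable: check the OAP process, the query port, network ACLs."];
  }
  const findings: string[] = [];
  if (s.admin === "red") {
    findings.push("Admin port unreachable: check Service exposure, firewall rules, SW_ADMIN_SERVER=default.");
  } else {
    for (const [name, light] of Object.entries(s.modules)) {
      if (light !== "green") findings.push(`Module ${name} is off: set its env-var on OAP and restart.`);
    }
  }
  if (s.health > 0) findings.push(`OAP degraded (health ${s.health}): read the checkHealth details.`);
  if (s.memberCount !== s.expectedMembers) {
    findings.push("Cluster member count off: check DNS / Service, or look for a down OAP node.");
  }
  if (findings.length === 0) {
    findings.push("Backend looks fine: check browser network, the BFF process, OAP-side ingestion.");
  }
  return findings;
}
```

An external check could build a `PageState` from the two endpoints on a schedule and notify whenever the first finding is anything other than "Backend looks fine".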
Related
- Compatibility → Cluster Status Check Sequence: per-pane behavior.
- Compatibility → Required OAP Modules: which modules gate which features.
- Operate → Inspect: the page that depends on the inspect module.