FODC Setup: Proxy APIs and CLI Flags

This guide documents the FODC proxy HTTP APIs and the CLI flags for the FODC agent and proxy binaries. It complements the high-level overview in docs/operation/fodc/overview.md.

Proxy HTTP APIs

The proxy exposes HTTP endpoints on --http-listen-addr (default :17913). The base URL is http://<proxy-host>:17913; the examples below use localhost as the proxy host.

GET /metrics

Aggregates the latest metrics from all connected agents and renders them in Prometheus text format.

Query parameters

  • role (optional): Returns only metrics whose node_role label matches the specified role.
  • pod_name (optional): Returns only metrics whose pod_name label matches the specified pod.

Response

  • Content-Type: text/plain; version=0.0.4; charset=utf-8
  • Prometheus exposition format.
  • Labels include agent identity labels such as node_role, pod_name, and container_name when provided by the agent.

Example

GET http://localhost:17913/metrics
GET http://localhost:17913/metrics?role=ROLE_DATA
GET http://localhost:17913/metrics?pod_name=banyandb-data-0
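
The requests above can also be issued from a script. The following is a minimal sketch using only the Python standard library; the helper names build_metrics_url and fetch_metrics are hypothetical and are not part of FODC, and the base URL assumes the default --http-listen-addr.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_metrics_url(base_url, role=None, pod_name=None):
    """Return the /metrics URL, adding only the filters that were supplied."""
    params = {}
    if role:
        params["role"] = role
    if pod_name:
        params["pod_name"] = pod_name
    url = base_url.rstrip("/") + "/metrics"
    query = urlencode(params)
    return url + "?" + query if query else url


def fetch_metrics(base_url, role=None, pod_name=None, timeout=10):
    """Fetch the Prometheus text exposition from the proxy as a string."""
    with urlopen(build_metrics_url(base_url, role, pod_name), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Assumes a proxy is reachable on localhost with the default HTTP port.
    print(fetch_metrics("http://localhost:17913", role="ROLE_DATA"))
```

Omitting both filters yields the plain /metrics URL, which matches the first example request.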

GET /metrics-windows

Returns metrics within a time window as JSON time series. If the time window is omitted, the proxy returns the latest metrics, similar to /metrics, but formatted as JSON.

Query parameters

  • start_time (optional): RFC3339 timestamp for the beginning of the time window.
  • end_time (optional): RFC3339 timestamp for the end of the time window.
  • role (optional): Returns only metrics whose node_role label matches the specified role.
  • pod_name (optional): Returns only metrics whose pod_name label matches the specified pod.

Response

  • Content-Type: application/json
  • Body shape (array of time series):
[
	{
		"name": "banyandb_query_latency_ms",
		"description": "Average query latency",
		"labels": {
			"node_role": "ROLE_DATA",
			"pod_name": "banyandb-data-cold-0",
			"container_name": "banyandb"
		},
		"agent_id": "9b3f44a8-acde-4f7c-a9f9-0b4b4581fd12",
		"pod_name": "banyandb-data-cold-0",
		"data": [
			{
				"timestamp": "2026-02-02T09:59:00Z",
				"value": 12.4
			},
			{
				"timestamp": "2026-02-02T10:00:00Z",
				"value": 10.1
			}
		]
	}
]

Example

GET http://localhost:17913/metrics-windows
GET http://localhost:17913/metrics-windows?start_time=2026-02-02T09:55:00Z&end_time=2026-02-02T10:00:00Z
GET http://localhost:17913/metrics-windows?role=ROLE_DATA&pod_name=banyandb-data-0
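
As an illustration of the time-window parameters and the response shape, the sketch below builds an RFC3339 window and extracts the newest sample from each returned series. The function names are hypothetical, and the parser assumes second-precision timestamps like those in the sample body above.

```python
import json
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def rfc3339(ts):
    """Format an aware datetime as an RFC3339 UTC timestamp (e.g. 2026-02-02T10:00:00Z)."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def window_query(end, minutes):
    """Build the start_time/end_time query string for a window ending at `end`."""
    start = end - timedelta(minutes=minutes)
    # urlencode percent-encodes ':' as %3A, which is a valid encoding of the same value.
    return urlencode({"start_time": rfc3339(start), "end_time": rfc3339(end)})


def parse_ts(value):
    """Parse a timestamp such as 2026-02-02T10:00:00Z into an aware datetime."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def latest_values(body):
    """Map (metric name, pod_name) to the newest sample value of each series."""
    result = {}
    for series in json.loads(body):
        points = series.get("data") or []
        if not points:
            continue  # a series without samples contributes nothing
        newest = max(points, key=lambda p: parse_ts(p["timestamp"]))
        result[(series["name"], series.get("pod_name"))] = newest["value"]
    return result
```

The query string from window_query can be appended to http://localhost:17913/metrics-windows? to request the window.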

GET /cluster/topology

Requests cluster topology snapshots from all agents and returns the aggregated topology as JSON. The proxy triggers a topology collection across all registered agents and merges the responses.

Response

  • Content-Type: application/json
  • Body shape:
{
	"nodes": [
		{
			"metadata": {
				"name": "demo-banyandb-data-hot-0.demo-banyandb-data-hot-headless.skywalking-showcase:17912"
			},
			"grpc_address": "demo-banyandb-data-hot-0.demo-banyandb-data-hot-headless.skywalking-showcase:17912",
			"created_at": {
				"seconds": 1769671048,
				"nanos": 362947026
			},
			"labels": {
				"pod_name": "demo-banyandb-data-hot-0",
				"type": "hot"
			},
			"property_repair_gossip_grpc_address": "demo-banyandb-data-hot-0.demo-banyandb-data-hot-headless.skywalking-showcase:17932",
			"status": "online",
			"last_heartbeat": "2026-02-02T17:11:35.027349+08:00",
			"roles": [
				"ROLE_META",
				"ROLE_DATA"
			]
		},
		{
			"metadata": {
				"name": "demo-banyandb-data-hot-1.demo-banyandb-data-hot-headless.skywalking-showcase:17912"
			},
			"grpc_address": "demo-banyandb-data-hot-1.demo-banyandb-data-hot-headless.skywalking-showcase:17912",
			"created_at": {
				"seconds": 1769671048,
				"nanos": 362947026
			},
			"labels": {
				"pod_name": "demo-banyandb-data-hot-1",
				"type": "hot"
			},
			"property_repair_gossip_grpc_address": "demo-banyandb-data-hot-1.demo-banyandb-data-hot-headless.skywalking-showcase:17932",
			"status": "online",
			"last_heartbeat": "2026-02-02T17:11:35.027349+08:00",
			"roles": [
				"ROLE_META",
				"ROLE_DATA"
			]
		}
	],
	"calls": [
		{
			"id": "demo-banyandb-data-hot-0.demo-banyandb-data-hot-headless.skywalking-showcase:17912-demo-banyandb-data-hot-1.demo-banyandb-data-hot-headless.skywalking-showcase:17912",
			"target": "demo-banyandb-data-hot-1.demo-banyandb-data-hot-headless.skywalking-showcase:17912",
			"source": "demo-banyandb-data-hot-0.demo-banyandb-data-hot-headless.skywalking-showcase:17912"
		}
	]
}

Field notes

  • nodes entries are derived from banyandb.database.v1.Node plus:
    • roles: role names converted to strings.
    • status / last_heartbeat: online status from the agent registry.
  • calls entries describe the node-to-node call graph reported by agents.
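
To show how the nodes and calls arrays relate, here is a small sketch that turns a topology response into an adjacency map and reports call endpoints that have no matching node entry. The function names are hypothetical; the field names are taken from the body shape above.

```python
import json
from datetime import datetime, timezone


def created_at_utc(node):
    """Convert a node's created_at {seconds, nanos} pair to an aware UTC datetime."""
    ts = node["created_at"]
    base = datetime.fromtimestamp(ts["seconds"], tz=timezone.utc)
    # datetime only keeps microseconds, so the nanosecond part is truncated.
    return base.replace(microsecond=ts.get("nanos", 0) // 1000)


def summarize_topology(body):
    """Return (edges, dangling): source -> [targets], and endpoints missing from nodes."""
    topo = json.loads(body)
    known = {node["metadata"]["name"] for node in topo.get("nodes", [])}
    edges = {}
    dangling = set()
    for call in topo.get("calls", []):
        edges.setdefault(call["source"], []).append(call["target"])
        for endpoint in (call["source"], call["target"]):
            if endpoint not in known:
                dangling.add(endpoint)
    return edges, sorted(dangling)
```

A non-empty dangling list means a call references a node name that does not appear under nodes in the same snapshot.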

GET /diagnostics

Requests crash diagnostic records from all connected agents and returns aggregated records as JSON. The proxy triggers a fresh collection from all matching agents, waits up to 2 seconds for records to arrive, then returns the current cached snapshot. Records are deduplicated by agent and artifact directory across requests.

Query parameters

  • role (optional): Returns only records whose agent role matches the specified value (case-insensitive).
  • pod_name (optional): Returns only records whose pod name matches the specified value (case-insensitive).

Response

  • Content-Type: application/json
  • Body shape (array of crash records):
[
	{
		"fetched_at": "2026-04-20T10:00:00Z",
		"panic_record": {
			"occurred_at": "2026-04-20T09:59:30Z",
			"component": "watchdog",
			"panic_value": "unexpected nil pointer",
			"recovered": true,
			"goroutine_stack": "goroutine 42 [running]:\nmain.foo(...)\n\t/src/main.go:17"
		},
		"agent_id": "9b3f44a8-acde-4f7c-a9f9-0b4b4581fd12",
		"pod_name": "banyandb-data-0",
		"role": "ROLE_DATA",
		"source_endpoint": "file:///crash",
		"artifact_dir": "20260420T095930.000000000Z-watchdog-1234",
		"files": ["panic.json", "deep-dump.json"]
	}
]

Field notes

  • panic_record is omitted when no structured panic record was captured. Filesystem-backed crash artifacts derive this record from panic.json; in-process reports can include richer fields.
  • goroutine_stack is the stack trace of the panicking goroutine at the time of capture.
  • source_endpoint is a file:/// URL for filesystem-watched artifacts (file:///crash in the example above) or fodc-agent for in-process captures.
  • artifact_dir is the name of the crash artifact directory relative to the watched crash source directory.
  • files lists the files present in the artifact directory (e.g., panic.json, deep-dump.json).

Example

GET http://localhost:17913/diagnostics
GET http://localhost:17913/diagnostics?role=ROLE_DATA
GET http://localhost:17913/diagnostics?pod_name=banyandb-data-0
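
Because panic_record may be omitted, a consumer of this endpoint should treat it as optional. The sketch below, with a hypothetical helper name, turns a response body into one summary line per crash record and tolerates a missing panic_record.

```python
import json


def summarize_crashes(body):
    """Return one human-readable line per crash record in a /diagnostics response."""
    lines = []
    for record in json.loads(body):
        panic = record.get("panic_record")  # absent when no structured record was captured
        if panic:
            state = "recovered" if panic.get("recovered") else "unrecovered"
            detail = "{} panic in {}: {}".format(
                state, panic.get("component", "unknown"), panic.get("panic_value", "")
            )
        else:
            detail = "no structured panic record"
        lines.append(
            "{} [{}] {}: {}".format(
                record.get("pod_name"), record.get("role"), record.get("artifact_dir"), detail
            )
        )
    return lines
```

Applied to the sample body above, this produces a single line that names the pod, role, artifact directory, and the recovered watchdog panic.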

GET /cluster/lifecycle

Requests lifecycle data from all connected agents and returns aggregated lifecycle reports and group information as JSON. The proxy triggers lifecycle data collection from all agents that support the lifecycle stream.

Response

  • Content-Type: application/json
  • Body shape:
{
	"groups": [
		{
			"name": "sw_metric",
			"catalog": "CATALOG_MEASURE",
			"resource_opts": {
				"shard_num": 2,
				"segment_interval": {"unit": "UNIT_DAY", "num": 1},
				"ttl": {"unit": "UNIT_DAY", "num": 7}
			},
			"data_info": []
		}
	],
	"lifecycle_statuses": [
		{
			"pod_name": "banyandb-data-0",
			"reports": [
				{
					"filename": "2026-03-26.json",
					"report_json": "{\"status\":\"ok\"}"
				}
			]
		}
	]
}

Field notes

  • groups: Group lifecycle information collected from the first agent that provides it (typically the liaison node). Each entry contains the group name, catalog type, resource options, and data info.
  • lifecycle_statuses: Per-pod lifecycle reports. Each entry contains the pod name and an array of report files (JSON files read from the agent’s lifecycle report directory).
  • Agents that do not support the lifecycle stream are silently skipped.

Example

GET http://localhost:17913/cluster/lifecycle
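
In the sample body, report_json carries the report file content as a JSON-encoded string, so it needs a second decoding step. A minimal sketch, assuming every report follows that shape (the function name decode_reports is hypothetical):

```python
import json


def decode_reports(body):
    """Map (pod_name, filename) to the decoded report content, or None if undecodable."""
    payload = json.loads(body)
    decoded = {}
    for status in payload.get("lifecycle_statuses", []):
        for report in status.get("reports", []):
            try:
                # report_json is itself a JSON document stored as a string.
                content = json.loads(report["report_json"])
            except (KeyError, TypeError, ValueError):
                content = None  # missing, non-string, or malformed report content
            decoded[(status.get("pod_name"), report.get("filename"))] = content
    return decoded
```

Reports that cannot be decoded are kept with a None value so that a missing or malformed file remains visible to the caller.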

FODC Agent CLI Flags

The agent binary is fodc (see fodc/agent/cmd/agent).

| Flag | Default | Description |
|------|---------|-------------|
| --poll-metrics-interval | 10s | Interval for scraping local BanyanDB metrics endpoints. |
| --poll-metrics-ports | 2121 | Ports to scrape for /metrics (repeatable or comma-separated). |
| --max-metrics-memory-usage-percentage | 10 | Maximum percentage of cgroup memory used for in-memory metric cache. |
| --prometheus-listen-addr | :9090 | Address for the agent’s Prometheus endpoint. |
| --proxy-addr | localhost:17900 | Proxy gRPC address for agent registration and streaming. |
| --pod-name | empty | Pod name used for agent identity; required for proxy registration. |
| --container-names | empty | Container names mapped one-to-one with --poll-metrics-ports. |
| --heartbeat-interval | 10s | Heartbeat interval to the proxy; proxy may override on registration. |
| --reconnect-interval | 5s | Backoff between reconnection attempts to the proxy. |
| --cluster-state-ports | empty | gRPC ports for BanyanDB cluster state polling; enables topology collection. |
| --cluster-state-poll-interval | 30s | Interval for polling cluster state from BanyanDB nodes. |
| --lifecycle-port | 18912 | gRPC port for lifecycle InspectAll service. Set to 0 to disable lifecycle collection. |
| --lifecycle-report-dir | /tmp/lifecycle-reports | Directory where lifecycle sidecar writes report files. |
| --lifecycle-cache-ttl | 10m | TTL for cached lifecycle data. After expiry, the next collection call refreshes the cache. |
| --diagnosis-listen-addr | :9091 | Address on which the agent exposes its local diagnosis collection HTTP endpoint. |
| --max-fodc-diagnosis-memory-usage-percentage | 5 | Maximum percentage of cgroup memory used for the agent crash-diagnosis collector ring buffer. |
| --crash-source-dir | empty | Shared volume directory watched for BanyanDB crash artifacts via filesystem notifications. |
| --panic-diagnostics-enabled | true | Enable structured panic diagnostics. |
| --panic-diagnostics-dir | crash | Directory used to store recovered panic artifacts. |
| --panic-diagnostics-max-artifacts | 10 | Maximum number of crash artifact directories to retain; oldest are removed first (0 disables pruning). |
| --max-diagnosis-memory-usage-percentage | 50 | Set GOMEMLIMIT to this percentage of the cgroup memory limit, reserving headroom for post-panic diagnostics (0 disables). |

Behavior notes

  • The proxy client starts only when --proxy-addr and --pod-name are set and a node role is available. The node role is derived from cluster state polling, so set --cluster-state-ports to enable registration.
  • The number of entries in --container-names must match the number of entries in --poll-metrics-ports.
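
The constraints in these notes can be checked before launching the agent. The sketch below is a hypothetical pre-flight check, not part of FODC. It assumes comma-separated flag values and assumes that an empty --container-names (the default) is allowed and skips the count check.

```python
def split_list(value):
    """Split a comma-separated flag value into non-empty, trimmed entries."""
    return [item.strip() for item in value.split(",") if item.strip()]


def check_agent_flags(poll_metrics_ports, container_names, proxy_addr, pod_name, cluster_state_ports):
    """Return a list of problems that would prevent full proxy registration."""
    problems = []
    ports = split_list(poll_metrics_ports)
    names = split_list(container_names)
    # Assumption: the one-to-one mapping only applies when names are provided.
    if names and len(names) != len(ports):
        problems.append(
            "--container-names has {} entries but --poll-metrics-ports has {}".format(
                len(names), len(ports)
            )
        )
    if proxy_addr and not pod_name:
        problems.append("--pod-name is empty; it is required for proxy registration")
    if proxy_addr and not split_list(cluster_state_ports):
        problems.append("--cluster-state-ports is empty; no node role can be derived")
    return problems


if __name__ == "__main__":
    for problem in check_agent_flags("2121,2122", "banyandb", "localhost:17900", "", ""):
        print(problem)
```

An empty result means the values satisfy the notes above; it does not guarantee that the proxy is reachable.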

FODC Proxy CLI Flags

The proxy binary is fodc-proxy (see fodc/proxy/cmd/proxy).

| Flag | Default | Description |
|------|---------|-------------|
| --grpc-listen-addr | :17912 | gRPC address for agent connections. |
| --http-listen-addr | :17913 | HTTP address for REST/Prometheus endpoints. |
| --agent-heartbeat-timeout | 30s | Mark agents offline if no heartbeat is received within this duration. |
| --agent-cleanup-timeout | 5m | Unregister agents that remain offline beyond this duration. |
| --max-agents | 1000 | Maximum number of agents that can register. |
| --grpc-max-msg-size | 4194304 | Maximum gRPC message size in bytes. |
| --http-read-timeout | 10s | HTTP server read timeout. |
| --http-write-timeout | 10s | HTTP server write timeout. |
| --heartbeat-interval | 10s | Default heartbeat interval communicated to agents on registration. |
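
To illustrate how --agent-heartbeat-timeout and --agent-cleanup-timeout interact, the sketch below classifies an agent by the time since its last heartbeat. It is one reading of the flag descriptions, not the proxy's implementation. In particular, it assumes the cleanup timeout starts counting once the agent has been marked offline; the proxy may measure it differently.

```python
from datetime import timedelta


def agent_state(
    since_last_heartbeat,
    heartbeat_timeout=timedelta(seconds=30),  # default of --agent-heartbeat-timeout
    cleanup_timeout=timedelta(minutes=5),     # default of --agent-cleanup-timeout
):
    """Classify an agent as online, offline, or unregistered (assumed semantics)."""
    if since_last_heartbeat <= heartbeat_timeout:
        return "online"
    # Assumption: the offline period begins when the heartbeat timeout elapses.
    if since_last_heartbeat <= heartbeat_timeout + cleanup_timeout:
        return "offline"
    return "unregistered"
```

With the default 10s heartbeat interval, an agent has to miss several consecutive heartbeats before the 30s timeout marks it offline.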