ClickHouse
The CLICKHOUSE layer monitors ClickHouse database clusters. SkyWalking collects ClickHouse’s internal metrics (queries, query latency, merges and mutations, data parts, replication, ZooKeeper / Keeper coordination, and per-node host stats) through OpenTelemetry, and Horizon renders them as a cluster-level dashboard and a node-level dashboard. This is a metrics-only layer: ClickHouse has no traces, logs, endpoints, or topology map here.
In Horizon’s sidebar this layer is grouped under Databases and named ClickHouse. Its services are listed as ClickHouse clusters and its instances as Nodes. Because the layer carries no endpoint, topology, trace, or log data, it enables only two sub-tabs: a Service (cluster) dashboard and an Instance (node) dashboard.
This page is the operator reference for the bundled CLICKHOUSE dashboard: what you see on each scope and what each widget means.
The widgets and metrics below are read from the bundled CLICKHOUSE template; if an operator has published a customized CLICKHOUSE template to OAP, the live dashboard reflects that copy instead. See Layer Dashboard Templates for how the bundled default, your local draft, and the OAP-published copy relate.
Service list
Before you open a cluster, the layer landing page lists every ClickHouse cluster in four sortable columns, sorted by select rate (Select / s) by default:
- Select / s: SELECT queries per second across the cluster (aggregate_labels(meter_clickhouse_query_select_rate,sum)).
- Insert / s: INSERT queries per second across the cluster (aggregate_labels(meter_clickhouse_query_insert_rate,sum)).
- Slow Reads: slow file reads across the cluster (aggregate_labels(meter_clickhouse_query_slow,sum)).
- Open Files: the latest count of open files across the cluster (latest(aggregate_labels(meter_clickhouse_file_open,sum))).
Service dashboard
The cluster-level drill-down for one selected ClickHouse cluster. Every widget aggregates across the cluster’s nodes with aggregate_labels(...,sum).
- Files Open: the latest number of open files in the cluster, as a single card (latest(aggregate_labels(meter_clickhouse_file_open,sum))).
- QPS: query rate per second, plotted as two series, select (aggregate_labels(meter_clickhouse_query_select_rate,sum)) and insert (aggregate_labels(meter_clickhouse_query_insert_rate,sum)).
- Queries: query counts split into total, select, and insert (aggregate_labels(meter_clickhouse_query,sum), aggregate_labels(meter_clickhouse_query_select,sum), aggregate_labels(meter_clickhouse_query_insert,sum)).
- Query Time (ms): average time per query in ms, as avg, select, and insert series, each computed as total query microseconds divided by query count and converted to ms (aggregate_labels(meter_clickhouse_querytime_microseconds,sum)/aggregate_labels(meter_clickhouse_query,sum)/1000 and the matching _select_ / _insert_ pair).
- Connections: open client connections by protocol, TCP (aggregate_labels(meter_clickhouse_tcp_connections,sum)) and HTTP (aggregate_labels(meter_clickhouse_http_connections,sum)).
- Slow Reads: slow file reads across the cluster (aggregate_labels(meter_clickhouse_query_slow,sum)).
- Merge / Mutations: background merge operations (aggregate_labels(meter_clickhouse_background_merge,sum)) and mutations (aggregate_labels(meter_clickhouse_mutations,sum)).
- Insert Throughput: insert volume on a dual axis, bytes/s on the left (aggregate_labels(meter_clickhouse_inserted_bytes,sum)) and rows/s on the right (aggregate_labels(meter_clickhouse_inserted_rows,sum)).
- Delayed Inserts (s): inserts that were throttled / delayed (aggregate_labels(meter_clickhouse_delayed_inserts,sum)).
- Active Data Parts: the number of active MergeTree data parts in the cluster (aggregate_labels(meter_clickhouse_parts_active,sum)).
- Replicated Fetch / Send: replication traffic between replicas, fetch (aggregate_labels(meter_clickhouse_replicated_fetch,sum)) and send (aggregate_labels(meter_clickhouse_replicated_send,sum)).
- Zookeeper Activity: the coordination layer’s health, with the latest sessions and watches (latest(aggregate_labels(meter_clickhouse_zookeeper_session,sum)), latest(aggregate_labels(meter_clickhouse_zookeeper_watch,sum))) plus bytes sent and bytes recv over time (aggregate_labels(meter_clickhouse_zookeeper_bytes_sent,sum), aggregate_labels(meter_clickhouse_zookeeper_bytes_received,sum)).
- Keeper Alive Conns: the latest count of alive ClickHouse Keeper connections, as a single card (latest(aggregate_labels(meter_clickhouse_keeper_connections_alive,sum))).
- Keeper Outstanding Requests: the latest count of outstanding ClickHouse Keeper requests, as a single card (latest(aggregate_labels(meter_clickhouse_keeper_outstanding_requests,sum))).
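The Query Time (ms) arithmetic can be checked by hand with a small sketch. This is a plain-Python model, not MQE and not OAP code: the helper names, node names, and sample numbers are all assumptions made for illustration, and aggregate_sum only stands in for the cross-node sum that aggregate_labels(...,sum) is described as performing.

```python
# Plain-Python model of the Query Time (ms) arithmetic.
# Helper names and per-node sample values are hypothetical.

def aggregate_sum(per_node):
    """Sum one metric across nodes (stand-in for the cross-node sum)."""
    return sum(per_node.values())


def query_time_ms(total_microseconds, query_count):
    """Average time per query in ms: total microseconds / count / 1000."""
    if query_count == 0:
        return None  # no queries in the window, so there is no average
    return total_microseconds / query_count / 1000


# Made-up per-node samples for one time bucket.
querytime_us = {"node-a": 1_200_000, "node-b": 800_000}
queries = {"node-a": 300, "node-b": 100}

cluster_ms = query_time_ms(aggregate_sum(querytime_us), aggregate_sum(queries))
print(cluster_ms)  # 5.0
```

Because the division happens after both sums, busy nodes weigh more: here the per-node averages are 4.0 ms and 8.0 ms, but the cluster value is 5.0 ms, not their simple mean of 6.0 ms.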
Instance dashboard
The node-level drill-down for one selected ClickHouse node. These widgets read the per-node meter_clickhouse_instance_* family directly, with no cross-node aggregation.
- Uptime (days): how long the node has been running, as a card, converted from seconds (latest(meter_clickhouse_instance_uptime)/3600/24).
- Version: the node’s ClickHouse version, as a card (latest(meter_clickhouse_instance_version)).
- CPU (cores): CPU consumption expressed in cores (meter_clickhouse_instance_cpu_usage/1000000).
- Memory (%): used vs available memory percentage (meter_clickhouse_instance_memory_usage, meter_clickhouse_instance_memory_available).
- Network (B): bytes receive vs send on the node (meter_clickhouse_instance_network_receive_bytes, meter_clickhouse_instance_network_send_bytes).
- Connections: open client connections by protocol, TCP (meter_clickhouse_instance_tcp_connections) and HTTP (meter_clickhouse_instance_http_connections).
- Queries: query counts split into total, select, and insert (meter_clickhouse_instance_query, meter_clickhouse_instance_query_select, meter_clickhouse_instance_query_insert).
- QPS: query rate per second, as select and insert series (meter_clickhouse_instance_query_select_rate, meter_clickhouse_instance_query_insert_rate).
- Query Time (ms): average time per query in ms, as avg, select, and insert series, each computed as total query microseconds divided by query count and converted to ms (meter_clickhouse_instance_querytime_microseconds/meter_clickhouse_instance_query/1000 and the matching _select_ / _insert_ pair).
- File Slow Read: slow file reads on the node (meter_clickhouse_instance_query_slow).
- Background Merge: background merge operations on the node (meter_clickhouse_instance_background_merge).
- Mutations: mutation operations on the node (meter_clickhouse_instance_mutations).
- Files Open: the latest number of open files on the node, as a card (latest(meter_clickhouse_instance_file_open)).
- Insert Throughput: insert volume on a dual axis, bytes/s on the left (meter_clickhouse_instance_inserted_bytes) and rows/s on the right (meter_clickhouse_instance_inserted_rows).
- Delayed Inserts (s): inserts that were throttled / delayed on the node (meter_clickhouse_instance_delayed_inserts).
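The two card conversions in this list are simple divisions, sketched below in plain Python. The function names and sample inputs are hypothetical; only the divisors (3600 and 24 for uptime, 1000000 for CPU) come from the widget expressions above.

```python
# Plain-Python restatement of two instance-widget conversions.
# Function names and sample inputs are hypothetical.

def uptime_days(uptime_seconds):
    """Seconds -> days, mirroring latest(...uptime)/3600/24."""
    return uptime_seconds / 3600 / 24


def cpu_cores(cpu_usage):
    """Raw CPU usage value -> cores, mirroring ...cpu_usage/1000000."""
    return cpu_usage / 1_000_000


print(uptime_days(259_200))  # 3.0
print(cpu_cores(2_500_000))  # 2.5
```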
Requirements
The CLICKHOUSE dashboard is a pure consumer of what OAP reports: it invents no data, and a widget with no backing data simply reads "no data". To populate it, OAP needs ClickHouse metrics delivered through the OpenTelemetry receiver, which OAP aggregates into the meter_clickhouse_* families:
- Cluster (service-scope) metrics: the meter_clickhouse_* family (queries, query rate, query time, connections, slow reads, open files, merges, mutations, data parts, insert throughput, delayed inserts, replication, ZooKeeper, and Keeper), which the cluster dashboard and the Service list aggregate across nodes.
- Node (instance-scope) metrics: the meter_clickhouse_instance_* family (uptime, version, CPU, memory, network, connections, queries, query rate, query time, slow reads, merges, mutations, open files, insert throughput, and delayed inserts) for the node dashboard.
Each metric is queried at its own OAP scope; OAP does not roll a metric up across scopes, so the node-scope widgets stay empty until per-node data is reported. Setting up the ClickHouse OpenTelemetry collection is described in the upstream ClickHouse monitoring guide.
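To see why service-scope data alone leaves the node dashboard empty, the following sketch checks a set of reported metric names against a few Instance dashboard widgets. It is a hypothetical helper, not part of OAP or Horizon: the widget-to-metric pairs are a subset copied from the Instance dashboard list, while the function name and the way the reported set is obtained are assumptions.

```python
# Hypothetical readiness check: which instance widgets lack backing metrics?
# Widget/metric pairs are a subset of the Instance dashboard list.

INSTANCE_WIDGETS = {
    "Uptime (days)": ["meter_clickhouse_instance_uptime"],
    "CPU (cores)": ["meter_clickhouse_instance_cpu_usage"],
    "Connections": [
        "meter_clickhouse_instance_tcp_connections",
        "meter_clickhouse_instance_http_connections",
    ],
}


def missing_metrics(widgets, reported):
    """Map each widget to the backing metrics that are not reported.

    Names are matched exactly, so a service-scope metric never
    satisfies an instance-scope widget (no roll-up across scopes).
    """
    result = {}
    for widget, metrics in widgets.items():
        missing = [m for m in metrics if m not in reported]
        if missing:
            result[widget] = missing
    return result


# Assumed input: service-scope connection metrics plus one instance metric.
reported = {
    "meter_clickhouse_tcp_connections",
    "meter_clickhouse_http_connections",
    "meter_clickhouse_instance_uptime",
}

for widget, missing in missing_metrics(INSTANCE_WIDGETS, reported).items():
    print(widget, "is missing", ", ".join(missing))
```

With this input, Uptime (days) is backed, while CPU (cores) and Connections are flagged even though the cluster-scope connection metrics are present.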