VMs monitoring

SkyWalking uses Prometheus node-exporter to collect metrics from the VMs, and uses OpenTelemetry Collector to transfer those metrics to the OpenTelemetry receiver and into the Meter System.
Each VM is defined as a Service in OAP, and its name carries the vm:: prefix to identify it.

Data flow

  1. Prometheus node-exporter collects metrics data from the VMs.
  2. OpenTelemetry Collector fetches metrics from node-exporter via the Prometheus Receiver and pushes them to the SkyWalking OAP Server via the OpenCensus gRPC Exporter.
  3. The SkyWalking OAP Server evaluates MAL expressions to filter, calculate, and aggregate the metrics, then stores the results.
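The flow above maps onto an OpenTelemetry Collector pipeline roughly as follows. This is a minimal sketch rather than the project's reference configuration: the job name, the scrape target `vm:9100`, the OAP address `oap:11800`, and the 10s interval are placeholder assumptions, and the TLS keys of the OpenCensus exporter differ between Collector versions.

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: "vm-node-exporter"      # assumed job name
          scrape_interval: 10s              # assumed interval
          static_configs:
            - targets: ["vm:9100"]          # assumed node-exporter address

processors:
  batch:

exporters:
  opencensus:
    endpoint: "oap:11800"                   # assumed OAP Server gRPC address
    tls:
      insecure: true                        # older Collector versions use a top-level `insecure: true`

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch]
      exporters: [opencensus]
```

Step 1 corresponds to the scrape target, step 2 to the `metrics` pipeline, and step 3 happens inside the OAP Server once the exporter delivers the data.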

Setup

  1. Set up Prometheus node-exporter.
  2. Set up OpenTelemetry Collector. This is an example of an OpenTelemetry Collector configuration: otel-collector-config.yaml.
  3. Configure the SkyWalking OpenTelemetry receiver.
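For step 3, the receiver is typically switched on in the OAP Server's `application.yml`. The fragment below is a sketch that assumes the 8.x layout of that file; the key names and environment variables may differ in your release, so check them against the `application.yml` shipped with your distribution.

```yaml
receiver-otel:
  # Assumption: "default" activates the receiver; it is disabled when the selector is "-".
  selector: ${SW_OTEL_RECEIVER:default}
  default:
    # "oc" selects the OpenCensus handler that the Collector's exporter talks to.
    enabledHandlers: ${SW_OTEL_RECEIVER_ENABLED_HANDLERS:"oc"}
    # Rule names match file names under otel-oc-rules (here: vm.yaml).
    enabledOcRules: ${SW_OTEL_RECEIVER_ENABLED_OC_RULES:"vm"}
```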

Supported Metrics

| Monitoring Panel | Unit | Metric Name | Description | Data Source |
|---|---|---|---|---|
| CPU Usage | % | cpu_total_percentage | The total used percentage across all CPU cores; with 2 cores the maximum is 200% | Prometheus node-exporter |
| Memory RAM Usage | MB | meter_vm_memory_used | The total RAM usage | Prometheus node-exporter |
| Memory Swap Usage | % | meter_vm_memory_swap_percentage | The percentage of swap memory used | Prometheus node-exporter |
| CPU Average Used | % | meter_vm_cpu_average_used | The percentage of CPU cores used in each mode | Prometheus node-exporter |
| CPU Load | | meter_vm_cpu_load1, meter_vm_cpu_load5, meter_vm_cpu_load15 | The 1m / 5m / 15m average CPU load | Prometheus node-exporter |
| Memory RAM | MB | meter_vm_memory_total, meter_vm_memory_available, meter_vm_memory_used | The RAM statistics, including Total / Available / Used | Prometheus node-exporter |
| Memory Swap | MB | meter_vm_memory_swap_free, meter_vm_memory_swap_total | The swap memory statistics, including Free / Total | Prometheus node-exporter |
| File System Mountpoint Usage | % | meter_vm_filesystem_percentage | The percentage of the file system used at each mount point | Prometheus node-exporter |
| Disk R/W | KB/s | meter_vm_disk_read, meter_vm_disk_written | The disk read and write throughput | Prometheus node-exporter |
| Network Bandwidth Usage | KB/s | meter_vm_network_receive, meter_vm_network_transmit | The network receive and transmit throughput | Prometheus node-exporter |
| Network Status | | meter_vm_tcp_curr_estab, meter_vm_tcp_tw, meter_vm_tcp_alloc, meter_vm_sockets_used, meter_vm_udp_inuse | The number of TCP established / TCP time wait / TCP allocated / sockets in use / UDP in use | Prometheus node-exporter |
| Filefd Allocated | | meter_vm_filefd_allocated | The number of file descriptors allocated | Prometheus node-exporter |

Customizing

You can customize your own metrics, expressions, and dashboard panels.
The metrics definition and expression rules are in /config/otel-oc-rules/vm.yaml.
The dashboard panel configurations are in /config/ui-initialized-templates/vm.yml.
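To give a feel for what a rule file contains, here is a hedged sketch of two rules in the general shape used under otel-oc-rules. It is illustrative, not a copy of the shipped vm.yaml: the tag name `node_identifier_host_name` and the exact expressions are assumptions, while `node_memory_MemTotal_bytes` and `node_memory_MemAvailable_bytes` are standard node-exporter metrics. Unit conversion to MB is left out.

```yaml
# Applied to every rule: prefix the host name with vm:: and bind the metric to a Service.
expSuffix: tag({tags -> tags.node_identifier_host_name = 'vm::' + tags.node_identifier_host_name}).service(['node_identifier_host_name'])
# Prepended to each rule name, e.g. memory_used becomes meter_vm_memory_used.
metricPrefix: meter_vm
metricsRules:
  - name: memory_total
    exp: node_memory_MemTotal_bytes
  - name: memory_used
    # Used RAM derived from two node-exporter gauges.
    exp: node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes
```

A new panel would then reference the resulting metric name (for example `meter_vm_memory_used`) in the dashboard template.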

Blog

For more background, see the related blog post: SkyWalking 8.4 provides infrastructure monitoring

K8s monitoring

SkyWalking uses K8s kube-state-metrics and cAdvisor to collect metrics from K8s, and uses OpenTelemetry Collector to transfer those metrics to the OpenTelemetry receiver and into the Meter System. This feature requires authorizing the OAP Server to access the K8s API Server.
The k8s-cluster is defined as a Service in OAP, and its name carries the k8s-cluster:: prefix to identify it.
The k8s-node is defined as an Instance in OAP; its name is the K8s node name.
The k8s-service is defined as an Endpoint in OAP; its name is $serviceName.$namespace.

Data flow

  1. K8s kube-state-metrics and cAdvisor collect metrics data from K8s.
  2. OpenTelemetry Collector fetches metrics from kube-state-metrics and cAdvisor via the Prometheus Receiver and pushes them to the SkyWalking OAP Server via the OpenCensus gRPC Exporter.
  3. The SkyWalking OAP Server accesses the K8s API Server to get metadata, evaluates MAL expressions to filter, calculate, and aggregate the metrics, then stores the results.
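Because step 3 has the OAP Server reading metadata from the K8s API Server, the OAP pod's service account needs read access. The manifest below is a sketch under stated assumptions: the names `skywalking-oap` and `skywalking` are hypothetical, and the resource list is a guess at what metadata lookups need, so trim or extend it for your deployment.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: skywalking-oap            # hypothetical name
rules:
  - apiGroups: [""]
    # Assumed set of resources read for metadata; adjust as needed.
    resources: ["pods", "services", "endpoints", "nodes"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: skywalking-oap            # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: skywalking-oap
subjects:
  - kind: ServiceAccount
    name: skywalking-oap          # hypothetical service account used by the OAP pod
    namespace: skywalking         # hypothetical namespace
```

Read-only verbs are enough here, since the OAP Server only looks up metadata and does not modify cluster objects.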

Setup

  1. Set up kube-state-metrics.
  2. cAdvisor is integrated into kubelet by default, so no separate setup is needed.
  3. Set up OpenTelemetry Collector. For the Prometheus Receiver configuration in OpenTelemetry Collector for K8s, see here. For a quick start, we provide a full example of an OpenTelemetry Collector configuration: otel-collector-config.yaml.
  4. Configure the SkyWalking OpenTelemetry receiver.
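For step 3, the Prometheus Receiver usually relies on Kubernetes service discovery to find cAdvisor and kube-state-metrics. The fragment below is a trimmed sketch, not the full example file: the cluster name `my-cluster` is a placeholder, the `app.kubernetes.io/name` service label is assumed to be present on the kube-state-metrics Service, and the in-cluster token and CA paths assume the Collector runs as a pod.

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: "kubernetes-cadvisor"
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          kubernetes_sd_configs:
            - role: node
          relabel_configs:
            - target_label: cluster
              replacement: my-cluster            # placeholder cluster name
            # Scrape cAdvisor through the API Server proxy.
            - target_label: __address__
              replacement: kubernetes.default.svc:443
            - source_labels: [__meta_kubernetes_node_name]
              regex: (.+)
              target_label: __metrics_path__
              # "$$" escapes "$" in Collector config files.
              replacement: /api/v1/nodes/$${1}/proxy/metrics/cadvisor
        - job_name: "kube-state-metrics"
          metrics_path: /metrics
          kubernetes_sd_configs:
            - role: endpoints
          relabel_configs:
            # Keep only endpoints that belong to the kube-state-metrics Service.
            - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name]
              regex: kube-state-metrics
              action: keep
            - target_label: cluster
              replacement: my-cluster            # placeholder cluster name
```

Both jobs attach the same `cluster` label so that metrics from the two sources can be grouped under one cluster entity.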

Supported Metrics

K8s can be monitored from three points of view, which gives three kinds of metrics: Cluster, Node, and Service.

Cluster

These metrics are related to the selected cluster (Current Service in the dashboard).

| Monitoring Panel | Unit | Metric Name | Description | Data Source |
|---|---|---|---|---|
| Node Total | | k8s_cluster_node_total | The number of nodes | K8s kube-state-metrics |
| Namespace Total | | k8s_cluster_namespace_total | The number of namespaces | K8s kube-state-metrics |
| Deployment Total | | k8s_cluster_deployment_total | The number of deployments | K8s kube-state-metrics |
| Service Total | | k8s_cluster_service_total | The number of services | K8s kube-state-metrics |
| Pod Total | | k8s_cluster_pod_total | The number of pods | K8s kube-state-metrics |
| Container Total | | k8s_cluster_container_total | The number of containers | K8s kube-state-metrics |
| CPU Resources | m | k8s_cluster_cpu_cores, k8s_cluster_cpu_cores_requests, k8s_cluster_cpu_cores_limits, k8s_cluster_cpu_cores_allocatable | The capacity and the Requests / Limits / Allocatable of the CPU | K8s kube-state-metrics |
| Memory Resources | GB | k8s_cluster_memory_total, k8s_cluster_memory_requests, k8s_cluster_memory_limits, k8s_cluster_memory_allocatable | The capacity and the Requests / Limits / Allocatable of the memory | K8s kube-state-metrics |
| Storage Resources | GB | k8s_cluster_storage_total, k8s_cluster_storage_allocatable | The capacity and the allocatable amount of the storage | K8s kube-state-metrics |
| Node Status | | k8s_cluster_node_status | The current status of the nodes | K8s kube-state-metrics |
| Deployment Status | | k8s_cluster_deployment_status | The current status of the deployments | K8s kube-state-metrics |
| Deployment Spec Replicas | | k8s_cluster_deployment_spec_replicas | The number of desired pods for a deployment | K8s kube-state-metrics |
| Service Status | | k8s_cluster_service_pod_status | The current status of the services, depending on the status of the related pods | K8s kube-state-metrics |
| Pod Status Not Running | | k8s_cluster_pod_status_not_running | The pods whose current phase is not running | K8s kube-state-metrics |
| Pod Status Waiting | | k8s_cluster_pod_status_waiting | The pods and containers currently in the waiting status, with the reason | K8s kube-state-metrics |
| Pod Status Terminated | | k8s_cluster_container_status_terminated | The pods and containers currently in the terminated status, with the reason | K8s kube-state-metrics |

Node

These metrics are related to the selected node (Current Instance in the dashboard).

| Monitoring Panel | Unit | Metric Name | Description | Data Source |
|---|---|---|---|---|
| Pod Total | | k8s_node_pod_total | The number of pods on this node | K8s kube-state-metrics |
| Node Status | | k8s_node_node_status | The current status of this node | K8s kube-state-metrics |
| CPU Resources | m | k8s_node_cpu_cores, k8s_node_cpu_cores_allocatable, k8s_node_cpu_cores_requests, k8s_node_cpu_cores_limits | The capacity and the Requests / Limits / Allocatable of the CPU | K8s kube-state-metrics |
| Memory Resources | GB | k8s_node_memory_total, k8s_node_memory_allocatable, k8s_node_memory_requests, k8s_node_memory_limits | The capacity and the Requests / Limits / Allocatable of the memory | K8s kube-state-metrics |
| Storage Resources | GB | k8s_node_storage_total, k8s_node_storage_allocatable | The capacity and the allocatable amount of the storage | K8s kube-state-metrics |
| CPU Usage | m | k8s_node_cpu_usage | The total usage across all CPU cores; with 2 cores the maximum is 2000m | cAdvisor |
| Memory Usage | GB | k8s_node_memory_usage | The total memory usage | cAdvisor |
| Network I/O | KB/s | k8s_node_network_receive, k8s_node_network_transmit | The network receive and transmit throughput | cAdvisor |

Service

In these metrics, the pods are related to the selected service (Current Endpoint in the dashboard).

| Monitoring Panel | Unit | Metric Name | Description | Data Source |
|---|---|---|---|---|
| Service Pod Total | | k8s_service_pod_total | The number of pods | K8s kube-state-metrics |
| Service Pod Status | | k8s_service_pod_status | The current status of the pods | K8s kube-state-metrics |
| Service CPU Resources | m | k8s_service_cpu_cores_requests, k8s_service_cpu_cores_limits | The CPU resources Requests / Limits of this service | K8s kube-state-metrics |
| Service Memory Resources | MB | k8s_service_memory_requests, k8s_service_memory_limits | The memory resources Requests / Limits of this service | K8s kube-state-metrics |
| Pod CPU Usage | m | k8s_service_pod_cpu_usage | The total CPU usage of the pods | cAdvisor |
| Pod Memory Usage | MB | k8s_service_pod_memory_usage | The total memory usage of the pods | cAdvisor |
| Pod Waiting | | k8s_service_pod_status_waiting | The pods and containers currently in the waiting status, with the reason | K8s kube-state-metrics |
| Pod Terminated | | k8s_service_pod_status_terminated | The pods and containers currently in the terminated status, with the reason | K8s kube-state-metrics |
| Pod Restarts | | k8s_service_pod_status_restarts_total | The number of restarts per container in the pod | K8s kube-state-metrics |
| Pod Network Receive | KB/s | k8s_service_pod_network_receive | The network receive throughput of the pods | cAdvisor |
| Pod Network Transmit | KB/s | k8s_service_pod_network_transmit | The network transmit throughput of the pods | cAdvisor |
| Pod Storage Usage | MB | k8s_service_pod_fs_usage | The total storage usage of the pods related to this service | cAdvisor |

Customizing

You can customize your own metrics, expressions, and dashboard panels.
The metrics definition and expression rules are in /config/otel-oc-rules/k8s-cluster.yaml, /config/otel-oc-rules/k8s-node.yaml, and /config/otel-oc-rules/k8s-service.yaml.
The dashboard panel configurations are in /config/ui-initialized-templates/k8s.yml.
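As a rough illustration of how a cluster-level rule ties a metric to the k8s-cluster Service entity, consider the sketch below. It is not a copy of the shipped k8s-cluster.yaml: it assumes every scraped sample carries a `cluster` label, and the expressions are illustrative, although `kube_node_info` and `kube_pod_info` are standard kube-state-metrics series.

```yaml
# Prefix the cluster label with k8s-cluster:: and bind each metric to a Service.
expSuffix: tag({tags -> tags.cluster = 'k8s-cluster::' + tags.cluster}).service(['cluster'])
# Prepended to each rule name, e.g. node_total becomes k8s_cluster_node_total.
metricPrefix: k8s_cluster
metricsRules:
  - name: node_total
    # kube_node_info emits one sample per node, so the sum is the node count.
    exp: kube_node_info.sum(['cluster'])
  - name: pod_total
    exp: kube_pod_info.sum(['cluster'])
```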