VMs monitoring

SkyWalking leverages the Prometheus node-exporter to collect metrics data from VMs, and leverages the OpenTelemetry Collector to transfer the metrics to the OpenTelemetry receiver and into the Meter System.
We define the VM entity as a Service in OAP, and use vm:: as a prefix to identify it.

Data flow

  1. The Prometheus node-exporter collects metrics data from the VMs.
  2. The OpenTelemetry Collector fetches metrics from the node-exporter via the Prometheus Receiver and pushes metrics to the SkyWalking OAP Server via the OpenCensus gRPC Exporter.
  3. The SkyWalking OAP Server parses the expression with MAL to filter/calculate/aggregate and store the results.

Setup

  1. Set up the Prometheus node-exporter.
  2. Set up the OpenTelemetry Collector. An example OpenTelemetry Collector configuration is provided in otel-collector-config.yaml.
  3. Configure the SkyWalking OpenTelemetry receiver.
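
The wiring above is a Prometheus receiver feeding an OpenCensus exporter. Below is an illustrative sketch of such a configuration, not the shipped file: the node-exporter target (node-exporter:9100) and OAP address (oap:11800) are placeholders for your environment, and newer Collector releases nest `insecure` under a `tls:` block.

```yaml
# Illustrative otel-collector-config.yaml sketch; hosts and ports are placeholders.
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: "vm-monitoring"
          scrape_interval: 10s
          static_configs:
            - targets: ["node-exporter:9100"]

exporters:
  opencensus:
    endpoint: "oap:11800"  # SkyWalking OAP gRPC port
    insecure: true         # newer Collector versions nest this under `tls:`

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [opencensus]
```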

Supported Metrics

| Monitoring Panel | Unit | Metric Name | Description | Data Source |
|-----|-----|-----|-----|-----|
| CPU Usage | % | cpu_total_percentage | The total percentage usage of the CPU core. If there are 2 cores, the maximum usage is 200%. | Prometheus node-exporter |
| Memory RAM Usage | MB | meter_vm_memory_used | The total RAM usage | Prometheus node-exporter |
| Memory Swap Usage | % | meter_vm_memory_swap_percentage | The percentage usage of swap memory | Prometheus node-exporter |
| CPU Average Used | % | meter_vm_cpu_average_used | The percentage usage of the CPU core in each mode | Prometheus node-exporter |
| CPU Load | | meter_vm_cpu_load1<br/>meter_vm_cpu_load5<br/>meter_vm_cpu_load15 | The CPU 1m / 5m / 15m average load | Prometheus node-exporter |
| Memory RAM | MB | meter_vm_memory_total<br/>meter_vm_memory_available<br/>meter_vm_memory_used | The RAM statistics, including Total / Available / Used | Prometheus node-exporter |
| Memory Swap | MB | meter_vm_memory_swap_free<br/>meter_vm_memory_swap_total | The swap memory statistics, including Free / Total | Prometheus node-exporter |
| File System Mountpoint Usage | % | meter_vm_filesystem_percentage | The percentage usage of the file system at each mount point | Prometheus node-exporter |
| Disk R/W | KB/s | meter_vm_disk_read<br/>meter_vm_disk_written | The disk read and written | Prometheus node-exporter |
| Network Bandwidth Usage | KB/s | meter_vm_network_receive<br/>meter_vm_network_transmit | The network receive and transmit | Prometheus node-exporter |
| Network Status | | meter_vm_tcp_curr_estab<br/>meter_vm_tcp_tw<br/>meter_vm_tcp_alloc<br/>meter_vm_sockets_used<br/>meter_vm_udp_inuse | The number of TCPs established / TCP time wait / TCPs allocated / sockets in use / UDPs in use | Prometheus node-exporter |
| Filefd | | meter_vm_filefd_allocated | The number of file descriptors allocated | Prometheus node-exporter |

Customizing

You can customize your own metrics/expression/dashboard panel.
The metrics definition and expression rules are found in /config/otel-oc-rules/vm.yaml.
The dashboard panel configurations are found in /config/ui-initialized-templates/vm.yml.
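
To give a feel for what such a rule file contains, here is a hypothetical fragment in the style of /config/otel-oc-rules/vm.yaml; the rule names and expressions below are illustrative, and the shipped file should be consulted for the exact schema and expressions.

```yaml
# Hypothetical MAL rule fragment. `metricPrefix` yields the meter_vm_*
# names used on the dashboard; each `exp` is a MAL expression over raw
# node-exporter metrics.
metricPrefix: meter_vm
metricsRules:
  - name: cpu_load1    # exposed as meter_vm_cpu_load1
    exp: node_load1
  - name: memory_used  # exposed as meter_vm_memory_used
    exp: node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes
```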

Blog

For more details, see the blog article "SkyWalking 8.4 provides infrastructure monitoring".

K8s monitoring

SkyWalking leverages K8s kube-state-metrics and cAdvisor to collect metrics data from K8s, and leverages the OpenTelemetry Collector to transfer the metrics to the OpenTelemetry receiver and into the Meter System. This feature requires authorizing the OAP Server to access K8s's API Server.
We define the k8s-cluster as a Service in the OAP, and use k8s-cluster:: as a prefix to identify it.
We define the k8s-node as an Instance in the OAP, and set its name as the K8s node name.
We define the k8s-service as an Endpoint in the OAP, and set its name as $serviceName.$namespace.

Data flow

  1. K8s kube-state-metrics and cAdvisor collect metrics data from K8s.
  2. The OpenTelemetry Collector fetches metrics from kube-state-metrics and cAdvisor via the Prometheus Receiver and pushes metrics to the SkyWalking OAP Server via the OpenCensus gRPC Exporter.
  3. The SkyWalking OAP Server accesses K8s's API Server to fetch metadata, and parses the expression with MAL to filter/calculate/aggregate and store the results.

Setup

  1. Set up kube-state-metrics.
  2. cAdvisor is integrated into kubelet by default.
  3. Set up the OpenTelemetry Collector. For details on the Prometheus Receiver in the OpenTelemetry Collector for K8s, refer to here. For a quick start, we have provided a full example OpenTelemetry Collector configuration, otel-collector-config.yaml.
  4. Configure the SkyWalking OpenTelemetry receiver.
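
In the K8s case, the Prometheus receiver discovers its scrape targets through Kubernetes service discovery instead of static targets. The sketch below is illustrative only: the relabeling rule assumes kube-state-metrics carries the `app.kubernetes.io/name` label, and the OAP address (oap:11800) is a placeholder; the shipped example file is the authoritative reference.

```yaml
# Illustrative fragment; label selectors and the OAP address depend on your cluster.
receivers:
  prometheus:
    config:
      scrape_configs:
        # Keep only endpoints belonging to kube-state-metrics.
        - job_name: "kube-state-metrics"
          kubernetes_sd_configs:
            - role: endpoints
          relabel_configs:
            - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name]
              regex: kube-state-metrics
              action: keep
        # Scrape cAdvisor through the kubelet's secured endpoint.
        - job_name: "cadvisor"
          scheme: https
          metrics_path: /metrics/cadvisor
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          kubernetes_sd_configs:
            - role: node

exporters:
  opencensus:
    endpoint: "oap:11800"
    insecure: true

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [opencensus]
```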

Supported Metrics

To monitor K8s from different points of view, there are three kinds of metrics: Cluster / Node / Service.

Cluster

These metrics are related to the selected cluster (Current Service in the dashboard).

| Monitoring Panel | Unit | Metric Name | Description | Data Source |
|-----|-----|-----|-----|-----|
| Node Total | | k8s_cluster_node_total | The number of nodes | K8s kube-state-metrics |
| Namespace Total | | k8s_cluster_namespace_total | The number of namespaces | K8s kube-state-metrics |
| Deployment Total | | k8s_cluster_deployment_total | The number of deployments | K8s kube-state-metrics |
| Service Total | | k8s_cluster_service_total | The number of services | K8s kube-state-metrics |
| Pod Total | | k8s_cluster_pod_total | The number of pods | K8s kube-state-metrics |
| Container Total | | k8s_cluster_container_total | The number of containers | K8s kube-state-metrics |
| CPU Resources | m | k8s_cluster_cpu_cores<br/>k8s_cluster_cpu_cores_requests<br/>k8s_cluster_cpu_cores_limits<br/>k8s_cluster_cpu_cores_allocatable | The capacity and the Requests / Limits / Allocatable of the CPU | K8s kube-state-metrics |
| Memory Resources | GB | k8s_cluster_memory_total<br/>k8s_cluster_memory_requests<br/>k8s_cluster_memory_limits<br/>k8s_cluster_memory_allocatable | The capacity and the Requests / Limits / Allocatable of the memory | K8s kube-state-metrics |
| Storage Resources | GB | k8s_cluster_storage_total<br/>k8s_cluster_storage_allocatable | The capacity and Allocatable of the storage | K8s kube-state-metrics |
| Node Status | | k8s_cluster_node_status | The current status of the nodes | K8s kube-state-metrics |
| Deployment Status | | k8s_cluster_deployment_status | The current status of the deployment | K8s kube-state-metrics |
| Deployment Spec Replicas | | k8s_cluster_deployment_spec_replicas | The number of desired pods for a deployment | K8s kube-state-metrics |
| Service Status | | k8s_cluster_service_pod_status | The service's current status, depending on the related pods' status | K8s kube-state-metrics |
| Pod Status Not Running | | k8s_cluster_pod_status_not_running | The pods which are not running in the current phase | K8s kube-state-metrics |
| Pod Status Waiting | | k8s_cluster_pod_status_waiting | The pods and containers which are currently in the waiting status, with reasons shown | K8s kube-state-metrics |
| Pod Status Terminated | | k8s_cluster_container_status_terminated | The pods and containers which are currently in the terminated status, with reasons shown | K8s kube-state-metrics |

Node

These metrics are related to the selected node (Current Instance in the dashboard).

| Monitoring Panel | Unit | Metric Name | Description | Data Source |
|-----|-----|-----|-----|-----|
| Pod Total | | k8s_node_pod_total | The number of pods in this node | K8s kube-state-metrics |
| Node Status | | k8s_node_node_status | The current status of this node | K8s kube-state-metrics |
| CPU Resources | m | k8s_node_cpu_cores<br/>k8s_node_cpu_cores_allocatable<br/>k8s_node_cpu_cores_requests<br/>k8s_node_cpu_cores_limits | The capacity and the Requests / Limits / Allocatable of the CPU | K8s kube-state-metrics |
| Memory Resources | GB | k8s_node_memory_total<br/>k8s_node_memory_allocatable<br/>k8s_node_memory_requests<br/>k8s_node_memory_limits | The capacity and the Requests / Limits / Allocatable of the memory | K8s kube-state-metrics |
| Storage Resources | GB | k8s_node_storage_total<br/>k8s_node_storage_allocatable | The capacity and Allocatable of the storage | K8s kube-state-metrics |
| CPU Usage | m | k8s_node_cpu_usage | The total usage of the CPU core. If there are 2 cores, the maximum usage is 2000m. | cAdvisor |
| Memory Usage | GB | k8s_node_memory_usage | The total memory usage | cAdvisor |
| Network I/O | KB/s | k8s_node_network_receive<br/>k8s_node_network_transmit | The network receive and transmit | cAdvisor |

Service

In these metrics, the pods are related to the selected service (Current Endpoint in the dashboard).

| Monitoring Panel | Unit | Metric Name | Description | Data Source |
|-----|-----|-----|-----|-----|
| Service Pod Total | | k8s_service_pod_total | The number of pods | K8s kube-state-metrics |
| Service Pod Status | | k8s_service_pod_status | The current status of pods | K8s kube-state-metrics |
| Service CPU Resources | m | k8s_service_cpu_cores_requests<br/>k8s_service_cpu_cores_limits | The CPU resources Requests / Limits of this service | K8s kube-state-metrics |
| Service Memory Resources | MB | k8s_service_memory_requests<br/>k8s_service_memory_limits | The memory resources Requests / Limits of this service | K8s kube-state-metrics |
| Pod CPU Usage | m | k8s_service_pod_cpu_usage | The total CPU usage of the pods | cAdvisor |
| Pod Memory Usage | MB | k8s_service_pod_memory_usage | The total memory usage of the pods | cAdvisor |
| Pod Waiting | | k8s_service_pod_status_waiting | The pods and containers which are currently in the waiting status, with reasons shown | K8s kube-state-metrics |
| Pod Terminated | | k8s_service_pod_status_terminated | The pods and containers which are currently in the terminated status, with reasons shown | K8s kube-state-metrics |
| Pod Restarts | | k8s_service_pod_status_restarts_total | The number of per-container restarts related to the pods | K8s kube-state-metrics |
| Pod Network Receive | KB/s | k8s_service_pod_network_receive | The network receive of the pods | cAdvisor |
| Pod Network Transmit | KB/s | k8s_service_pod_network_transmit | The network transmit of the pods | cAdvisor |
| Pod Storage Usage | MB | k8s_service_pod_fs_usage | The total storage usage of pods related to this service | cAdvisor |

Customizing

You can customize your own metrics/expression/dashboard panel.
The metrics definition and expression rules are found in /config/otel-oc-rules/k8s-cluster.yaml, /config/otel-oc-rules/k8s-node.yaml, and /config/otel-oc-rules/k8s-service.yaml.
The dashboard panel configurations are found in /config/ui-initialized-templates/k8s.yml.