AWS EKS

The AWS_EKS layer monitors Amazon Elastic Kubernetes Service (EKS) clusters. SkyWalking ingests EKS observability data through OpenTelemetry — Container Insights / CloudWatch metrics scraped into OAP — and reshapes it into cluster, node, and pod metrics. It groups under AWS in the sidebar.

In Horizon’s sidebar this layer’s entities are renamed to fit the EKS model: services are listed as Clusters, instances as Nodes, and endpoints as EKS services (the Kubernetes services running inside the cluster). The AWS_EKS layer enables the Service (Cluster), Instance (Node), and Endpoint (EKS service) scopes; it does not enable a topology, traces, or logs tab — EKS reports metric data only.

This page is the operator reference for the bundled AWS_EKS dashboard: what you see on each scope and what each widget means.

The widgets and metrics below are read from the bundled AWS_EKS template; if an operator has published a customized AWS_EKS template to OAP, the live dashboard reflects that copy instead. See Layer Dashboard Templates for how the bundled default, your local draft, and the OAP-published copy relate.

Cluster list

Before opening a cluster, the layer landing page lists every EKS cluster with four sortable columns, sorted by Nodes by default. Each column shows the latest reading averaged across the window:

  • Nodes — number of nodes in the cluster (latest(eks_cluster_node_count)).

  • Failed Nodes — nodes currently in a failed state (latest(eks_cluster_failed_node_count)).

  • Namespaces — Kubernetes namespaces in the cluster (latest(eks_cluster_namespace_count)).

  • Services — Kubernetes services in the cluster (latest(eks_cluster_service_count)).

Cluster dashboard

The primary drill-down for one selected cluster (a Service in OAP terms).

  • Node Count — number of nodes over time (eks_cluster_node_count).

  • Failed Nodes — nodes in a failed state over time (eks_cluster_failed_node_count).

  • Namespace Count — Kubernetes namespaces in the cluster (eks_cluster_namespace_count).

  • EKS Service Count — Kubernetes services in the cluster (eks_cluster_service_count).

  • Cluster Network Errors — cluster-wide receive and transmit error counts, plotted as two series, rx (eks_cluster_net_rx_error) and tx (eks_cluster_net_tx_error).

  • Cluster Network Drops — cluster-wide dropped packets on receive and transmit, rx (eks_cluster_net_rx_dropped) and tx (eks_cluster_net_tx_dropped).

Node dashboard

For one selected node (an Instance in OAP terms).

  • Pod Count — pods scheduled on the node (eks_cluster_node_pod_number).

  • CPU Utilization (%) — node CPU utilization (eks_cluster_node_cpu_utilization).

  • Memory Utilization (%) — node memory utilization (eks_cluster_node_memory_utilization).

  • FS Utilization (%) — node filesystem utilization (eks_cluster_node_fs_utilization).

  • Network RX (KB/s) — node receive throughput in KB/s (eks_cluster_node_net_rx_bytes/1024) on the left axis, with receive errors (eks_cluster_node_net_rx_error) on a second axis so the error count doesn’t get lost against the byte scale.

  • Network TX (KB/s) — node transmit throughput in KB/s (eks_cluster_node_net_tx_bytes/1024) on the left axis, with transmit errors (eks_cluster_node_net_tx_error) on a second axis.

  • Disk IO (B/s) — node disk read and write throughput in bytes/s, plotted as read (eks_cluster_node_disk_io_read) and write (eks_cluster_node_disk_io_write).

  • Pod CPU on Node — aggregate CPU utilization of the pods running on this node (eks_cluster_node_pod_cpu_utilization).

  • Pod Memory on Node — aggregate memory utilization of the pods running on this node (eks_cluster_node_pod_memory_utilization).

EKS service dashboard

For one selected EKS service (an Endpoint in OAP terms) — a Kubernetes service running inside the cluster, with its pod-level resource and network metrics.

  • Pod CPU Utilization (%) — CPU utilization across the service’s pods (eks_cluster_service_pod_cpu_utilization).

  • Pod Memory Utilization (%) — memory utilization across the service’s pods (eks_cluster_service_pod_memory_utilization).

  • Pod Network RX (KB/s) — pod receive throughput in KB/s (eks_cluster_service_pod_net_rx_bytes/1024).

  • Pod RX Errors / s — pod receive error rate (eks_cluster_service_pod_net_rx_error).

  • Pod Network TX (KB/s) — pod transmit throughput in KB/s (eks_cluster_service_pod_net_tx_bytes/1024).

  • Pod TX Errors / s — pod transmit error rate (eks_cluster_service_pod_net_tx_error).

Requirements

The AWS_EKS dashboard is a pure consumer of what OAP reports — it invents no data, and a widget with no backing data simply reads no data. To populate it, OAP needs the EKS observability metric families, fed in through the OpenTelemetry receiver from Amazon CloudWatch / Container Insights:

  • Cluster metrics — the eks_cluster_* family at cluster scope: node / failed-node / namespace / service counts and cluster-wide network error and drop counters.

  • Node metrics — the eks_cluster_node_* family at node scope: pod count, CPU / memory / filesystem utilization, network receive / transmit bytes and errors, disk read / write IO, and the per-node aggregate pod CPU / memory utilization.

  • EKS service metrics — the eks_cluster_service_pod_* family at EKS-service scope: per-service pod CPU / memory utilization and pod network receive / transmit bytes and errors.

Each metric is queried at its own OAP scope (Cluster / Node / EKS service); OAP does not roll a metric up across scopes, so a node- or service-scope metric stays empty until that level of data is reported. For how to stand up the EKS-to-OAP pipeline, see the layer’s setup documentation.