AWS Cloud EKS monitoring
SkyWalking leverages OpenTelemetry Collector with AWS Container Insights Receiver to transfer the metrics to OpenTelemetry receiver and into the Meter System.
Data flow
- OpenTelemetry Collector fetches metrics from EKS via AWS Container Insights Receiver and pushes metrics to SkyWalking OAP Server via OpenTelemetry gRPC exporter.
- The SkyWalking OAP Server parses the expression with MAL to filter/calculate/aggregate and store the results.
Set up
- Deploy amazon/aws-otel-collector with AWS Container Insights Receiver to EKS
- Config SkyWalking OpenTelemetry receiver.
Read Monitoring AWS EKS and S3 with SkyWalking for more details
EKS Monitoring
AWS Container Insights Receiver provides multiple dimensions metrics for EKS cluster, node, service, etc.
Accordingly, SkyWalking observes the status, and payload of the EKS cluster, which is cataloged as a LAYER: AWS_EKS
Service
in the OAP. Meanwhile, the k8s nodes would be recognized as LAYER: AWS_EKS
instance
s. The k8s service would be recognized as endpoint
s.
Specify Job Name
SkyWalking distinguishes AWS Cloud EKS metrics by attributes job_name
, which value is aws-cloud-eks-monitoring
.
You could leverage OTEL Collector processor to add the attribute as follows:
processors:
resource/job-name:
attributes:
- key: job_name
value: aws-cloud-eks-monitoring
action: insert
Notice, if you don’t specify job_name
attribute, SkyWalking OAP will ignore the metrics
Supported Metrics
Monitoring Panel | Unit | Metric Name | Catalog | Description | Data Source |
---|---|---|---|---|---|
Node Count | eks_cluster_node_count | Service | The node count of the EKS cluster | AWS Container Insights Receiver | |
Failed Node Count | eks_cluster_failed_node_count | Service | The failed node count of the EKS cluster | AWS Container Insights Receiver | |
Pod Count (namespace dimension) | eks_cluster_namespace_count | Service | The count of pod in the EKS cluster(namespace dimension) | AWS Container Insights Receiver | |
Pod Count (service dimension) | eks_cluster_service_count | Service | The count of pod in the EKS cluster(service dimension) | AWS Container Insights Receiver | |
Network RX Dropped Count (per second) | count/s | eks_cluster_net_rx_dropped | Service | Network RX dropped count | AWS Container Insights Receiver |
Network RX Error Count (per second) | count/s | eks_cluster_net_rx_error | Service | Network RX error count | AWS Container Insights Receiver |
Network TX Dropped Count (per second) | count/s | eks_cluster_net_rx_dropped | Service | Network TX dropped count | AWS Container Insights Receiver |
Network TX Error Count (per second) | count/s | eks_cluster_net_rx_error | Service | Network TX error count | AWS Container Insights Receiver |
Pod Count | eks_cluster_node_pod_number | Instance | The count of pod running on the node | AWS Container Insights Receiver | |
CPU Utilization | percent | eks_cluster_node_cpu_utilization | Instance | The CPU Utilization of the node | AWS Container Insights Receiver |
Memory Utilization | percent | eks_cluster_node_memory_utilization | Instance | The Memory Utilization of the node | AWS Container Insights Receiver |
Network RX | bytes/s | eks_cluster_node_net_rx_bytes | Instance | Network RX bytes of the node | AWS Container Insights Receiver |
Network RX Error Count | count/s | eks_cluster_node_net_rx_bytes | Instance | Network RX error count of the node | AWS Container Insights Receiver |
Network TX | bytes/s | eks_cluster_node_net_rx_bytes | Instance | Network TX bytes of the node | AWS Container Insights Receiver |
Network TX Error Count | count/s | eks_cluster_node_net_rx_bytes | Instance | Network TX error count of the node | AWS Container Insights Receiver |
Disk IO Write | bytes/s | eks_cluster_node_net_rx_bytes | Instance | The IO write bytes of the node | AWS Container Insights Receiver |
Disk IO Read | bytes/s | eks_cluster_node_net_rx_bytes | Instance | The IO read bytes of the node | AWS Container Insights Receiver |
FS Utilization | percent | eks_cluster_node_net_rx_bytes | Instance | The filesystem utilization of the node | AWS Container Insights Receiver |
CPU Utilization | percent | eks_cluster_node_pod_cpu_utilization | Instance | The CPU Utilization of the pod running on the node | AWS Container Insights Receiver |
Memory Utilization | percent | eks_cluster_node_pod_memory_utilization | Instance | The Memory Utilization of the pod running on the node | AWS Container Insights Receiver |
Network RX | bytes/s | eks_cluster_node_pod_net_rx_bytes | Instance | Network RX bytes of the pod running on the node | AWS Container Insights Receiver |
Network RX Error Count | count/s | eks_cluster_node_pod_net_rx_error | Instance | Network RX error count of the pod running on the node | AWS Container Insights Receiver |
Network TX | bytes/s | eks_cluster_node_pod_net_tx_bytes | Instance | Network RX bytes of the pod running on the node | AWS Container Insights Receiver |
Network TX Error Count | count/s | eks_cluster_node_pod_net_tx_error | Instance | Network RX error count of the pod running on the node | AWS Container Insights Receiver |
CPU Utilization | percent | eks_cluster_service_pod_cpu_utilization | Endpoint | The CPU Utilization of pod that belong to the service | AWS Container Insights Receiver |
Memory Utilization | percent | eks_cluster_service_pod_memory_utilization | Endpoint | The Memory Utilization of pod that belong to the service | AWS Container Insights Receiver |
Network RX | bytes/s | eks_cluster_service_pod_net_rx_bytes | Endpoint | Network RX bytes of the pod that belong to the service | AWS Container Insights Receiver |
Network RX Error Count | count/s | eks_cluster_service_pod_net_rx_error | Endpoint | Network TX error count of the pod that belongs to the service | AWS Container Insights Receiver |
Network TX | bytes/s | eks_cluster_service_pod_net_tx_bytes | Endpoint | Network TX bytes of the pod that belong to the service | AWS Container Insights Receiver |
Network TX Error Count | count/s | eks_cluster_node_pod_net_tx_error | Endpoint | Network TX error count of the pod that belongs to the service | AWS Container Insights Receiver |
Customizations
You can customize your own metrics/expression/dashboard panel.
The metrics definition and expression rules are found in /config/otel-rules/aws-eks/
.
The AWS Cloud EKS dashboard panel configurations are found in /config/ui-initialized-templates/aws_eks
.
OTEL Configuration Sample With AWS Container Insights Receiver
extensions:
health_check:
receivers:
awscontainerinsightreceiver:
processors:
resource/job-name:
attributes:
- key: job_name
value: aws-cloud-eks-monitoring
action: insert
exporters:
otlp:
endpoint: oap-service:11800
tls:
insecure: true
logging:
loglevel: debug
service:
pipelines:
metrics:
receivers: [awscontainerinsightreceiver]
processors: [resource/job-name]
exporters: [otlp,logging]
extensions: [health_check]
Refer to AWS Container Insights Receiver for more information