Kubernetes (K8s) monitoring from Cilium Hubble
SkyWalking uses the Cilium Fetcher to gather traffic data between services from Cilium Hubble via the Observe API. It then leverages the OAL System for metrics and entity analysis.
Data flow
SkyWalking fetches Cilium node and observability data from the gRPC API, analyzes them to generate entities, and uses OAL to generate metrics.
API Requirements
- Peers API: Lists the Hubble nodes in the cluster. OAP communicates with each Hubble node to obtain the Observe data.
- Observe API: Fetches the flow data from the Hubble nodes.
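The peerHost and peerPort settings shown in the Setup section below point at the Hubble peer Service that the Cilium installation creates (by default named hubble-peer in the kube-system namespace, fronting the Hubble server's gRPC port). The manifest below is only an illustrative sketch of that Service's typical shape; the actual Service is created and owned by your Cilium installation, and its namespace, ports, and TLS settings may differ in your cluster.

```yaml
# Illustrative sketch only: this Service is created by the Cilium installation,
# not by SkyWalking. It is shown to clarify what peerHost/peerPort refer to.
apiVersion: v1
kind: Service
metadata:
  name: hubble-peer
  namespace: kube-system
spec:
  selector:
    k8s-app: cilium            # the Hubble server runs inside the Cilium agent pods
  ports:
    - name: peer-service
      port: 80                 # matches the default SW_CILIUM_FETCHER_PEER_PORT
      targetPort: 4244         # Hubble server port serving the Peers/Observe gRPC APIs
```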
Setup
- Please follow the Setup Hubble Observability documentation to set up Hubble so that it provides the required APIs.
- To activate the Cilium Fetcher module, set `selector=default` in the YAML, or set `SW_CILIUM_FETCHER=default` through the system environment variable.
```yaml
cilium-fetcher:
  selector: ${SW_CILIUM_FETCHER:default}
  default:
    # Host name and port of the Hubble peer component
    peerHost: ${SW_CILIUM_FETCHER_PEER_HOST:hubble-peer.kube-system.svc.cluster.local}
    peerPort: ${SW_CILIUM_FETCHER_PEER_PORT:80}
    fetchFailureRetrySecond: ${SW_CILIUM_FETCHER_FETCH_FAILURE_RETRY_SECOND:10}
    sslConnection: ${SW_CILIUM_FETCHER_SSL_CONNECTION:false}
    sslPrivateKeyFile: ${SW_CILIUM_FETCHER_PRIVATE_KEY_FILE_PATH:}
    sslCertChainFile: ${SW_CILIUM_FETCHER_CERT_CHAIN_FILE_PATH:}
    sslCaFile: ${SW_CILIUM_FETCHER_CA_FILE_PATH:}
    convertClientAsServerTraffic: ${SW_CILIUM_FETCHER_CONVERT_CLIENT_AS_SERVER_TRAFFIC:true}
```
- If TLS is enabled for Hubble, please update the following configurations accordingly (see the TLS example after this list).
  - `peerPort`: usually should be updated to `443`.
  - `sslConnection`: should be set to `true`.
  - `sslPrivateKeyFile`: the path of the private key file.
  - `sslCertChainFile`: the path of the certificate chain file.
  - `sslCaFile`: the path of the CA file.
- To configure the Cilium rules, adjust the following files:
  - `cilium-rules/exclude.yaml`: configures which endpoints should be excluded from monitoring. Please read the Exclude Rules section below for more detail.
  - `cilium-rules/metadata-service-mapping.yaml`: configures the mapping between service names and endpoints.
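As referenced above, a minimal sketch of the cilium-fetcher configuration with TLS enabled could look like the following. The certificate file paths are placeholders; they depend on how the Hubble TLS materials are mounted into the OAP container.

```yaml
# Sketch of a TLS-enabled cilium-fetcher configuration.
# The /var/run/hubble-tls/* paths are placeholders for wherever the
# Hubble client certificate, key, and CA are mounted in the OAP container.
cilium-fetcher:
  selector: ${SW_CILIUM_FETCHER:default}
  default:
    peerHost: ${SW_CILIUM_FETCHER_PEER_HOST:hubble-peer.kube-system.svc.cluster.local}
    peerPort: ${SW_CILIUM_FETCHER_PEER_PORT:443}
    fetchFailureRetrySecond: ${SW_CILIUM_FETCHER_FETCH_FAILURE_RETRY_SECOND:10}
    sslConnection: ${SW_CILIUM_FETCHER_SSL_CONNECTION:true}
    sslPrivateKeyFile: ${SW_CILIUM_FETCHER_PRIVATE_KEY_FILE_PATH:/var/run/hubble-tls/client.key}
    sslCertChainFile: ${SW_CILIUM_FETCHER_CERT_CHAIN_FILE_PATH:/var/run/hubble-tls/client.crt}
    sslCaFile: ${SW_CILIUM_FETCHER_CA_FILE_PATH:/var/run/hubble-tls/ca.crt}
    convertClientAsServerTraffic: ${SW_CILIUM_FETCHER_CONVERT_CLIENT_AS_SERVER_TRAFFIC:true}
```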
Exclude Rules
The exclude configuration in Cilium rules is used to specify which Cilium Endpoints would be excluded from being added to the topology map or from the generation of metrics and other data.
```yaml
namespaces: # define which namespaces' traffic should be excluded
  - kube-system
labels: # define which endpoint labels' traffic should be excluded; if any labels match, the traffic is excluded
  - k8s:io.cilium.k8s.namespace.labels.istio-injection: "enabled" # each label is a key-value pair: the key is the label key, the value is the label value
    k8s:security.istio.io/tlsMode: istio
```
By default, all traffic from the `kube-system` namespace and traffic managed by the Istio mesh is excluded.
NOTE: Traffic is excluded only when both the source and the destination endpoints match the exclude rules. Otherwise, the traffic is still included.
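For example, to additionally exclude traffic from a hypothetical `monitoring` namespace and from endpoints carrying a hypothetical `workload-type: batch` label, the default rules could be extended as follows (the extra namespace and label are illustrative, not defaults):

```yaml
namespaces:
  - kube-system
  - monitoring                                 # illustrative: exclude another namespace
labels:
  - k8s:io.cilium.k8s.namespace.labels.istio-injection: "enabled"
    k8s:security.istio.io/tlsMode: istio
  - k8s:workload-type: batch                   # illustrative: exclude endpoints with this label
```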
Generated Entities
SkyWalking fetches the flows from Cilium and analyzes the source and destination endpoints to generate the following entities:
- Service
- Service Instance
- Service Endpoint
- Service Relation
- Service Instance Relation
- Service Endpoint Relation
Generate Metrics
For each of the above-mentioned entities, metrics can be analyzed at both the L4 and L7 protocol levels.
L4 Metrics
Records the relevant metrics for the packages each service reads from and writes to other services.
| Name | Unit | Description |
|---|---|---|
| Read Package CPM | Count | Total packages read from other services, per minute. |
| Write Package CPM | Count | Total packages written to other services, per minute. |
| Drop Package CPM | Count | Total packages dropped in traffic with other services, per minute. |
| Drop Package Reason Count | Labeled Count | Total dropped packages per minute, labeled by drop reason. |
Protocol
Based on the analysis of each data transfer, the information of the layer-7 network protocol is extracted.
NOTE: By default, Cilium only reports L4 metrics. If you need L7 metrics, they must be explicitly specified in each service’s CiliumNetworkPolicy. For details please refer to this document.
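As an illustration, the sketch below shows a CiliumNetworkPolicy that enables L7 HTTP visibility for a hypothetical workload labeled `app: my-service` on port 8080 in the `default` namespace. The selectors, namespace, and port are assumptions for your own services; follow the Cilium documentation referenced above for the authoritative policy format, and remember that applying a policy also enforces default-deny for traffic the policy does not allow.

```yaml
# Sketch only: enables L7 HTTP parsing for ingress traffic to a hypothetical
# workload labeled app=my-service on port 8080. Adjust the selectors, namespace,
# and port for your own services. Note that any CiliumNetworkPolicy enables
# default-deny for the selected endpoints in the covered direction, so make sure
# the rules allow all traffic you still need.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: my-service-l7-visibility
  namespace: default
spec:
  endpointSelector:
    matchLabels:
      app: my-service
  ingress:
    - fromEndpoints:
        - matchLabels:
            "k8s:io.kubernetes.pod.namespace": default   # allow traffic from the same namespace
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http: [{}]                                   # empty HTTP rule: match all requests, enabling L7 parsing
```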
HTTP
| Name | Unit | Description |
|---|---|---|
| CPM | Count | HTTP request calls per minute. |
| Duration | Nanoseconds | Total duration of HTTP responses. |
| Success CPM | Count | Successful HTTP responses (status < 500) per minute. |
| Status 1/2/3/4/5xx | Count | HTTP responses grouped by status code class (1xx/2xx/3xx/4xx/5xx). |
DNS
| Name | Unit | Description |
|---|---|---|
| CPM | Count | DNS request calls per minute. |
| Duration | Nanoseconds | Total duration of DNS responses. |
| Success CPM | Count | Successful DNS responses (code == 0) per minute. |
| Error Count | Labeled Count | DNS response error count, labeled with the error description. |
Kafka
| Name | Unit | Description |
|---|---|---|
| CPM | Count | Kafka request calls per minute. |
| Duration | Nanoseconds | Total duration of Kafka responses. |
| Success CPM | Count | Successful Kafka responses (errorCode == 0) per minute. |
| Error Count | Labeled Count | Kafka response error count, labeled with the error description. |
Load Balance for Cilium Fetcher with OAP cluster
The Cilium Fetcher module relies on the Cluster module. When the Cilium Fetcher module starts up, each OAP node obtains information about all Cilium nodes through the Peers API, as well as the node information of the OAP cluster.
The collected Cilium nodes are then distributed evenly across the OAP nodes, ensuring that a single Cilium node is not monitored by multiple OAP nodes.
Customizations
You can customize your own metrics/dashboard panel.
The metrics definitions and expression rules are found in `/config/oal/cilium.oal`; please refer to the Scope Declaration Documentation.
The Cilium dashboard panel configurations are found in `/config/ui-initialized-templates/cilium_service`.