SWIP-10 Support Envoy AI Gateway Observability

Motivation

Envoy AI Gateway is a gateway/proxy for AI/LLM API traffic (OpenAI, Anthropic, AWS Bedrock, Azure OpenAI, Google Gemini, etc.) built on top of Envoy Proxy. It provides GenAI-specific observability following OpenTelemetry GenAI Semantic Conventions, including token usage tracking, request latency, time-to-first-token (TTFT), and inter-token latency.

SkyWalking should support monitoring Envoy AI Gateway as a first-class integration, providing:

  1. Metrics monitoring via OTLP push for GenAI metrics.
  2. Access log collection via OTLP log sink for per-request AI metadata analysis.

This is complementary to PR #13745 (agent-based Virtual GenAI monitoring). The agent-based approach monitors LLM calls from the client application side, while this SWIP monitors from the gateway (infrastructure) side. Both can coexist — the AI Gateway provides infrastructure-level visibility regardless of whether the calling application is instrumented.

Architecture Graph

Metrics Path (OTLP Push)

┌─────────────────┐      OTLP gRPC       ┌─────────────────┐
│  Envoy AI       │  ──────────────────>  │  SkyWalking OAP │
│  Gateway        │   (push, port 11800)   │  (otel-receiver) │
│                 │                       │                 │
│  4 GenAI metrics│                       │  MAL rules      │
│  + labels       │                       │  → aggregation  │
└─────────────────┘                       └─────────────────┘

Access Log Path (OTLP Push)

┌─────────────────┐      OTLP gRPC       ┌─────────────────┐
│  Envoy AI       │  ──────────────────>  │  SkyWalking OAP │
│  Gateway        │   (push, port 11800)   │  (otel-receiver) │
│                 │                       │                 │
│  access logs    │                       │  LAL rules      │
│  with AI meta   │                       │  → analysis     │
└─────────────────┘                       └─────────────────┘

The AI Gateway natively supports an OTLP access log sink (via Envoy Gateway’s OpenTelemetry sink), pushing structured access logs directly to the OAP’s OTLP receiver. No FluentBit or external log collector is needed.

Proposed Changes

1. New Layer: ENVOY_AI_GATEWAY

Add a new layer in Layer.java:

/**
 * Envoy AI Gateway is an AI/LLM traffic gateway built on Envoy Proxy,
 * providing observability for GenAI API traffic.
 */
ENVOY_AI_GATEWAY(46, true),

This is a normal layer (isNormal=true) because the AI Gateway is a real, instrumented infrastructure component (similar to KONG, APISIX, NGINX), not a virtual/conjectured service.

2. Entity Model

job_name — Routing Tag for MAL/LAL Rules

The job_name resource attribute is set explicitly in OTEL_RESOURCE_ATTRIBUTES to a fixed value for all AI Gateway deployments. MAL rule filters use it to route metrics to the correct rule set:

filter: "{ tags -> tags.job_name == 'envoy-ai-gateway' }"

job_name is NOT the SkyWalking service name — it is only used for metric/log routing. The SkyWalking service name comes from OTEL_SERVICE_NAME (standard OTel env var), which is set per deployment.

Service and Instance Mapping

SkyWalking Entity Source Example
Service OTEL_SERVICE_NAME / service.name (per-deployment gateway name) my-ai-gateway
Service Instance service.instance.id resource attribute (pod name, set via Downward API) aigw-pod-7b9f4d8c5

Each Kubernetes Gateway deployment sets its own OTEL_SERVICE_NAME (the standard OTel env var) as the SkyWalking service name. Each pod is a service instance identified by service.instance.id.

The job_name resource attribute is set explicitly to the fixed value envoy-ai-gateway for MAL/LAL rule routing. This is separate from service.name — all AI Gateway deployments share the same job_name for routing, but each has its own service.name for entity identity.

The layer (ENVOY_AI_GATEWAY) is set via service.layer resource attribute and used by LAL for log routing. MAL rules use job_name for metric routing.

Provider and model are metric-level labels, not separate entities in this layer. They are used for fine-grained metric breakdowns within the gateway service dashboards rather than being modeled as separate services (unlike the agent-based VIRTUAL_GENAI layer where provider=service, model=instance).

The MAL expSuffix uses the service_name tag as the SkyWalking service name and service_instance_id as the instance name:

expSuffix: service(['service_name'], Layer.ENVOY_AI_GATEWAY).instance(['service_name', 'service_instance_id'])

Complete Kubernetes Setup Example

The following example shows a complete Envoy AI Gateway deployment configured for SkyWalking observability via OTLP metrics and access logs.

# 1. GatewayClass — standard Envoy Gateway controller
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: envoy-ai-gateway
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
---
# 2. GatewayConfig — OTLP configuration for SkyWalking
#    One GatewayConfig per gateway. Sets job_name, service name, instance ID,
#    and enables OTLP push for both metrics and access logs.
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: GatewayConfig
metadata:
  name: my-gateway-config
  namespace: default
spec:
  extProc:
    kubernetes:
      env:
        # SkyWalking service name = Gateway CRD name (auto-resolved from pod label)
        # OTEL_SERVICE_NAME is the standard OTel env var for service.name
        - name: GATEWAY_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.labels['gateway.envoyproxy.io/owning-gateway-name']
        - name: OTEL_SERVICE_NAME
          value: "$(GATEWAY_NAME)"
        # OTLP endpoint — SkyWalking OAP gRPC receiver
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: "http://skywalking-oap.skywalking:11800"
        - name: OTEL_EXPORTER_OTLP_PROTOCOL
          value: "grpc"
        - name: OTEL_METRICS_EXPORTER
          value: "otlp"
        - name: OTEL_LOGS_EXPORTER
          value: "otlp"
        # Pod name for instance identity
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        # job_name — fixed routing tag for MAL/LAL rules (same for ALL AI Gateway deployments)
        # service.instance.id — SkyWalking instance name (= pod name)
        # service.layer — routes logs to ENVOY_AI_GATEWAY LAL rules
        - name: OTEL_RESOURCE_ATTRIBUTES
          value: "job_name=envoy-ai-gateway,service.instance.id=$(POD_NAME),service.layer=ENVOY_AI_GATEWAY"
---
# 3. Gateway — references the GatewayConfig via annotation
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: my-ai-gateway
  namespace: default
  annotations:
    aigateway.envoyproxy.io/gateway-config: my-gateway-config
spec:
  gatewayClassName: envoy-ai-gateway
  listeners:
    - name: http
      protocol: HTTP
      port: 80
---
# 4. AIGatewayRoute — routing rules + token metadata for access logs
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: my-ai-gateway-route
  namespace: default
spec:
  parentRefs:
    - name: my-ai-gateway
      kind: Gateway
      group: gateway.networking.k8s.io
  # Enable token counts in access logs
  llmRequestCosts:
    - metadataKey: llm_input_token
      type: InputToken
    - metadataKey: llm_output_token
      type: OutputToken
    - metadataKey: llm_total_token
      type: TotalToken
  # Route all models to the backend
  rules:
    - backendRefs:
        - name: openai-backend
---
# 5. AIServiceBackend + Backend — LLM provider
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIServiceBackend
metadata:
  name: openai-backend
  namespace: default
spec:
  schema:
    name: OpenAI
  backendRef:
    name: openai-backend
    kind: Backend
    group: gateway.envoyproxy.io
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: Backend
metadata:
  name: openai-backend
  namespace: default
spec:
  endpoints:
    - fqdn:
        hostname: api.openai.com
        port: 443

Key env var mapping:

Env Var / Resource Attribute SkyWalking Concept Example Value
OTEL_SERVICE_NAME Service name my-ai-gateway (auto-resolved from Gateway CRD name)
job_name (in OTEL_RESOURCE_ATTRIBUTES) MAL/LAL rule routing envoy-ai-gateway (fixed for all deployments)
service.instance.id (in OTEL_RESOURCE_ATTRIBUTES) Instance name envoy-default-my-ai-gateway-... (auto-resolved from pod name)
service.layer (in OTEL_RESOURCE_ATTRIBUTES) LAL log routing ENVOY_AI_GATEWAY (fixed)

No manual per-gateway configuration needed for service and instance names:

  • GATEWAY_NAME is auto-resolved from the pod label gateway.envoyproxy.io/owning-gateway-name, which is set automatically by the Envoy Gateway controller on every envoy pod.
  • OTEL_SERVICE_NAME uses $(GATEWAY_NAME) substitution to set the per-deployment service name.
  • POD_NAME is auto-resolved from the pod name via the Downward API.

The GatewayConfig.spec.extProc.kubernetes.env field accepts full corev1.EnvVar objects (including valueFrom), merged into the ext_proc container by the gateway mutator webhook. Verified on Kind cluster — the gateway label resolves correctly (e.g., my-ai-gateway).

Important: The resource.WithFromEnv() code path in the AI Gateway (internal/metrics/metrics.go) is conditional — it only executes when OTEL_EXPORTER_OTLP_ENDPOINT is set (or OTEL_METRICS_EXPORTER=console). The ext_proc runs in-process (not as a subprocess), so there is no env var propagation issue.

3. MAL Rules for OTLP Metrics

Create oap-server/server-starter/src/main/resources/otel-rules/envoy-ai-gateway/ with 2 MAL rule files consuming the 4 GenAI metrics from Envoy AI Gateway. Since expSuffix is file-level, service and instance scopes need separate files. Provider and model breakdowns share the same expSuffix as their parent scope, so they are included in the same file.

File expSuffix Contains
gateway-service.yaml service(['service_name'], Layer.ENVOY_AI_GATEWAY) Service aggregates + per-provider breakdown + per-model breakdown
gateway-instance.yaml instance(['service_name'], ['service_instance_id'], Layer.ENVOY_AI_GATEWAY) Instance aggregates + per-provider breakdown + per-model breakdown

All MAL rule files use the job_name filter to match only AI Gateway traffic:

filter: "{ tags -> tags.job_name == 'envoy-ai-gateway' }"

Source Metrics from AI Gateway

Metric Type Labels
gen_ai_client_token_usage Histogram (Delta) gen_ai.token.type (input/output), gen_ai.provider.name, gen_ai.response.model, gen_ai.operation.name
gen_ai_server_request_duration Histogram gen_ai.provider.name, gen_ai.response.model, gen_ai.operation.name
gen_ai_server_time_to_first_token Histogram gen_ai.provider.name, gen_ai.response.model, gen_ai.operation.name
gen_ai_server_time_per_output_token Histogram gen_ai.provider.name, gen_ai.response.model, gen_ai.operation.name

Proposed SkyWalking Metrics

Gateway-level (Service) metrics:

Monitoring Panel Unit Metric Name Description
Request CPM count/min meter_envoy_ai_gw_request_cpm Requests per minute
Request Latency Avg ms meter_envoy_ai_gw_request_latency_avg Average request duration
Request Latency Percentile ms meter_envoy_ai_gw_request_latency_percentile P50/P75/P90/P95/P99 request duration
Input Tokens Rate tokens/min meter_envoy_ai_gw_input_token_rate Input tokens per minute (total across all models)
Output Tokens Rate tokens/min meter_envoy_ai_gw_output_token_rate Output tokens per minute (total across all models)
Total Tokens Rate tokens/min meter_envoy_ai_gw_total_token_rate Total tokens per minute
TTFT Avg ms meter_envoy_ai_gw_ttft_avg Average time to first token
TTFT Percentile ms meter_envoy_ai_gw_ttft_percentile P50/P75/P90/P95/P99 time to first token
Time Per Output Token Avg ms meter_envoy_ai_gw_tpot_avg Average inter-token latency
Time Per Output Token Percentile ms meter_envoy_ai_gw_tpot_percentile P50/P75/P90/P95/P99 inter-token latency
Estimated Cost cost/min meter_envoy_ai_gw_estimated_cost Estimated cost per minute (from token counts × config pricing)

Per-provider breakdown metrics (service scope):

Monitoring Panel Unit Metric Name Description
Provider Request CPM count/min meter_envoy_ai_gw_provider_request_cpm Requests per minute by provider
Provider Token Usage tokens/min meter_envoy_ai_gw_provider_token_rate Token rate by provider and token type
Provider Latency Avg ms meter_envoy_ai_gw_provider_latency_avg Average latency by provider

Per-model breakdown metrics (service scope):

Monitoring Panel Unit Metric Name Description
Model Request CPM count/min meter_envoy_ai_gw_model_request_cpm Requests per minute by model
Model Token Usage tokens/min meter_envoy_ai_gw_model_token_rate Token rate by model and token type
Model Latency Avg ms meter_envoy_ai_gw_model_latency_avg Average latency by model
Model TTFT Avg ms meter_envoy_ai_gw_model_ttft_avg Average TTFT by model
Model TPOT Avg ms meter_envoy_ai_gw_model_tpot_avg Average inter-token latency by model

Instance-level (per-pod) aggregate metrics:

Same metrics as service-level but scoped to individual pods via expSuffix: service([...]).instance([...]).

Monitoring Panel Unit Metric Name Description
Request CPM count/min meter_envoy_ai_gw_instance_request_cpm Requests per minute per pod
Request Latency Avg ms meter_envoy_ai_gw_instance_request_latency_avg Average request duration per pod
Request Latency Percentile ms meter_envoy_ai_gw_instance_request_latency_percentile P50/P75/P90/P95/P99 per pod
Input Tokens Rate tokens/min meter_envoy_ai_gw_instance_input_token_rate Input tokens per minute per pod
Output Tokens Rate tokens/min meter_envoy_ai_gw_instance_output_token_rate Output tokens per minute per pod
Total Tokens Rate tokens/min meter_envoy_ai_gw_instance_total_token_rate Total tokens per minute per pod
TTFT Avg ms meter_envoy_ai_gw_instance_ttft_avg Average TTFT per pod
TTFT Percentile ms meter_envoy_ai_gw_instance_ttft_percentile P50/P75/P90/P95/P99 TTFT per pod
TPOT Avg ms meter_envoy_ai_gw_instance_tpot_avg Average inter-token latency per pod
TPOT Percentile ms meter_envoy_ai_gw_instance_tpot_percentile P50/P75/P90/P95/P99 TPOT per pod
Estimated Cost cost/min meter_envoy_ai_gw_instance_estimated_cost Estimated cost per minute per pod

Per-provider breakdown metrics (instance scope):

Monitoring Panel Unit Metric Name Description
Provider Request CPM count/min meter_envoy_ai_gw_instance_provider_request_cpm Requests per minute by provider per pod
Provider Token Usage tokens/min meter_envoy_ai_gw_instance_provider_token_rate Token rate by provider per pod
Provider Latency Avg ms meter_envoy_ai_gw_instance_provider_latency_avg Average latency by provider per pod

Per-model breakdown metrics (instance scope):

Monitoring Panel Unit Metric Name Description
Model Request CPM count/min meter_envoy_ai_gw_instance_model_request_cpm Requests per minute by model per pod
Model Token Usage tokens/min meter_envoy_ai_gw_instance_model_token_rate Token rate by model per pod
Model Latency Avg ms meter_envoy_ai_gw_instance_model_latency_avg Average latency by model per pod
Model TTFT Avg ms meter_envoy_ai_gw_instance_model_ttft_avg Average TTFT by model per pod
Model TPOT Avg ms meter_envoy_ai_gw_instance_model_tpot_avg Average inter-token latency by model per pod

Cost Estimation

Reuse the same gen-ai-config.yml pricing configuration from PR #13745. The MAL rules will:

  1. Keep total token counts (input + output) per model from gen_ai_client_token_usage.
  2. Look up per-million-token pricing from config.
  3. Compute estimated_cost = input_tokens × input_cost_per_m / 1_000_000 + output_tokens × output_cost_per_m / 1_000_000.
  4. Amplify by 10^6 (same as PR #13745) to avoid floating point precision issues.

No new MAL function is needed — standard arithmetic operations on counters/gauges are sufficient.

Metrics vs Access Logs for Token Cost

Both data sources provide token counts, but serve different cost analysis purposes:

Aspect OTLP Metrics (MAL) Access Logs (LAL)
Granularity Aggregated counters — token sums over time windows Per-request — exact token count for each individual call
Cost output Cost rate (e.g., $X/minute) — good for trends and capacity planning Cost per request (e.g., this call cost $0.03) — good for attribution and audit
Precision Approximate (counter deltas over scrape intervals) Exact (individual request values)
Use case Dashboard trends, billing estimates, provider comparison Detect expensive individual requests, cost anomaly alerting, per-user/per-session attribution

The metrics path provides aggregated cost trends. The access log path enables per-request cost analysis — for example, alerting on a single request that consumed an unusually large number of tokens (e.g., a runaway prompt). Both paths reuse the same gen-ai-config.yml pricing data.

4. Access Log Collection via OTLP

The AI Gateway natively supports an OTLP access log sink. When OTEL_LOGS_EXPORTER=otlp (or defaulting to OTLP when OTEL_EXPORTER_OTLP_ENDPOINT is set), Envoy pushes structured access logs directly via OTLP gRPC to the same endpoint as metrics. No FluentBit or external log collector is needed.

AI Gateway Configuration

The OTLP log sink shares the same GatewayConfig CRD env vars as metrics (see Section 2). OTEL_LOGS_EXPORTER=otlp and OTEL_EXPORTER_OTLP_ENDPOINT enable the log sink. The OTEL_RESOURCE_ATTRIBUTES (including job_name, service.instance.id, and service.layer) are injected as resource attributes on each OTLP log record, ensuring consistency between metrics and access logs.

Additionally, enable token metadata population in AIGatewayRoute so token counts appear in access logs:

apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
spec:
  llmRequestCosts:
    - metadataKey: llm_input_token
      type: InputToken
    - metadataKey: llm_output_token
      type: OutputToken
    - metadataKey: llm_total_token
      type: TotalToken

OTLP Log Record Structure (Verified)

Each access log record is pushed as an OTLP LogRecord with the following structure:

Resource attributes (from OTEL_RESOURCE_ATTRIBUTES + Envoy metadata):

Attribute Example Notes
job_name envoy-ai-gateway From OTEL_RESOURCE_ATTRIBUTES — MAL/LAL routing tag
service.instance.id aigw-pod-7b9f4d8c5 From OTEL_RESOURCE_ATTRIBUTES — SkyWalking instance name
service.name envoy-ai-gateway From OTEL_SERVICE_NAME — SkyWalking service name for logs
node_name default-aigw-run-85f8cf28 Envoy node identifier
cluster_name default/aigw-run Envoy cluster name

Log record attributes (per-request, LLM traffic):

Attribute Example Description
gen_ai.request.model llama3.2:latest Original requested model
gen_ai.response.model llama3.2:latest Actual model from response
gen_ai.provider.name openai Backend provider name
gen_ai.usage.input_tokens 31 Input token count
gen_ai.usage.output_tokens 4 Output token count
session.id sess-abc123 Session identifier (if set via header mapping)
response_code 200 HTTP status code
duration 1835 Request duration (ms)
request.path /v1/chat/completions API path
connection_termination_details - Envoy connection termination reason
upstream_transport_failure_reason - Upstream failure reason

Note: total_tokens is not a separate field in the OTLP log — it equals input_tokens + output_tokens and can be computed in LAL rules. connection_termination_details and upstream_transport_failure_reason serve as error/timeout indicators (replacing response_flags from the file-based log format).

Log record attributes (per-request, MCP traffic):

Attribute Example Description
mcp.method.name tools/call MCP method name
mcp.provider.name kiwi MCP provider identifier
jsonrpc.request.id 1 JSON-RPC request ID
mcp.session.id sess-xyz MCP session ID

LAL Rules — Sampling Policy

Create oap-server/server-starter/src/main/resources/lal/envoy-ai-gateway.yaml to process the OTLP access logs.

Sampling strategy: Not all access logs need to be stored — only those that indicate abnormal or expensive requests. The LAL rules apply the following sampling policy:

  1. High token cost — persist logs where input_tokens + output_tokens >= threshold (default 10,000).
  2. Error responses — always persist logs with response_code >= 400.
  3. Slow/timeout requests — always persist logs where duration exceeds a configurable timeout threshold, or where connection_termination_details / upstream_transport_failure_reason indicate upstream failures. LLM requests are inherently slow (especially streaming), so timeout sampling is important for diagnosing provider availability issues.

All other access logs are dropped to avoid storage bloat.

Industry token usage reference (from OpenRouter State of AI 2025, 100 trillion token study):

Use Case Avg Input Tokens Avg Output Tokens Avg Total
Simple chat/Q&A 500–1,000 200–400 ~1,000
Customer support 500–3,000 300–400 ~2,500
RAG applications 3,000–4,000 300–500 ~3,500
Programming/code 6,000–20,000+ 400–1,500 ~10,000+
Overall average (2025) ~6,000 ~400 ~6,400

Note: The overall average is heavily skewed by programming workloads. Non-programming use cases (chat, RAG, support) typically fall in the 1,000–3,500 total token range.

Default sampling threshold: 10,000 total tokens (configurable). This is approximately 3× the non-programming median (~3,000), which captures genuinely expensive or abnormal requests without logging every routine call. The threshold is configurable to accommodate different workload profiles:

  • Lower (e.g., 5,000) for chat-heavy deployments where most requests are short.
  • Higher (e.g., 30,000) for code-generation-heavy deployments where large prompts are normal.

The LAL rules would:

  1. Extract AI metadata (gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, gen_ai.request.model, gen_ai.provider.name) from OTLP log record attributes.
  2. Compute total_tokens = input_tokens + output_tokens.
  3. Associate logs with the gateway service and instance using resource attributes (service.name, service.instance.id) in the ENVOY_AI_GATEWAY layer.
  4. Apply sampling: persist only logs matching at least one of:
    • total_tokens >= 10,000 (configurable threshold)
    • response_code >= 400
    • duration >= timeout_threshold or non-empty upstream_transport_failure_reason

5. UI Dashboard

OAP side — Create dashboard JSON templates under oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/:

  • envoy-ai-gateway-root.json — Root list view of all AI Gateway services.
  • envoy-ai-gateway-service.json — Service dashboard: Request CPM, latency, token rates, TTFT, TPOT, estimated cost, with provider and model breakdown panels.
  • envoy-ai-gateway-instance.json — Instance (pod) level dashboard: Same aggregate metrics as service dashboard but scoped to a single pod, plus per-provider and per-model breakdown panels for that pod.

UI side — A separate PR in skywalking-booster-ui is needed for i18n menu entries (similar to skywalking-booster-ui#534 for Virtual GenAI). The menu entry should be added under the infrastructure/gateway category.

Imported Dependencies libs and their licenses.

No new dependency. The AI Gateway pushes both metrics and access logs via OTLP to SkyWalking’s existing otel-receiver.

Compatibility

  • New layer ENVOY_AI_GATEWAY — no breaking change, additive only.
  • New MAL rules — opt-in via configuration.
  • New LAL rules for OTLP access logs — opt-in via configuration.
  • Reuses existing gen-ai-config.yml for cost estimation (shared with agent-based GenAI from PR #13745).
  • No changes to query protocol or storage structure — uses existing meter and log storage.
  • No external log collector (FluentBit, etc.) required — access logs are pushed via OTLP.

General usage docs

Prerequisites

  • Envoy AI Gateway deployed with the GatewayConfig CRD configured (see Section 2 for the full env var setup including OTEL_SERVICE_NAME, OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_RESOURCE_ATTRIBUTES).

Step 1: Configure Envoy AI Gateway

Apply the GatewayConfig CRD from Section 2 to your AI Gateway deployment. Key env vars:

Env Var Value Purpose
OTEL_SERVICE_NAME $(GATEWAY_NAME) SkyWalking service name (per-deployment, auto-resolved from Gateway CRD name)
OTEL_EXPORTER_OTLP_ENDPOINT http://skywalking-oap:11800 SkyWalking OAP OTLP receiver
OTEL_EXPORTER_OTLP_PROTOCOL grpc OTLP transport
OTEL_METRICS_EXPORTER otlp Enable OTLP metrics push
OTEL_LOGS_EXPORTER otlp Enable OTLP access log push
GATEWAY_NAME (auto from label) Auto-resolved from pod label gateway.envoyproxy.io/owning-gateway-name
POD_NAME (auto from Downward API) Auto-resolved from pod name
OTEL_RESOURCE_ATTRIBUTES job_name=envoy-ai-gateway,service.instance.id=$(POD_NAME),service.layer=ENVOY_AI_GATEWAY Routing tag (fixed) + instance ID (auto) + layer for LAL routing

Step 2: Configure SkyWalking OAP

Enable the OTel receiver, MAL rules, and LAL rules in application.yml:

receiver-otel:
  selector: ${SW_OTEL_RECEIVER:default}
  default:
    enabledHandlers: ${SW_OTEL_RECEIVER_ENABLED_HANDLERS:"otlp-metrics,otlp-logs"}
    enabledOtelMetricsRules: ${SW_OTEL_RECEIVER_ENABLED_OTEL_METRICS_RULES:"envoy-ai-gateway"}

log-analyzer:
  selector: ${SW_LOG_ANALYZER:default}
  default:
    lalFiles: ${SW_LOG_LAL_FILES:"envoy-ai-gateway"}

Cost Estimation

Update gen-ai-config.yml with pricing for the models served through the AI Gateway. The same config file is shared with agent-based GenAI monitoring.

Appendix A: OTLP Payload Verification

The following data was verified by capturing raw OTLP payloads from the AI Gateway (envoyproxy/ai-gateway-cli:latest Docker image) via an OTel Collector debug exporter.

Resource Attributes

With OTEL_RESOURCE_ATTRIBUTES=service.instance.id=test-instance-456 and OTEL_SERVICE_NAME=aigw-test-service:

Attribute Value Notes
service.instance.id test-instance-456 Set via OTEL_RESOURCE_ATTRIBUTESconfirmed working
service.name aigw-test-service Set via OTEL_SERVICE_NAME env var
telemetry.sdk.language go SDK metadata
telemetry.sdk.name opentelemetry SDK metadata
telemetry.sdk.version 1.40.0 SDK metadata

Not present by default (without explicit env config): service.instance.id, job_name, service.layer, host.name. These must be explicitly set via OTEL_RESOURCE_ATTRIBUTES in the GatewayConfig CRD (see Section 2).

resource.WithFromEnv() (source: internal/metrics/metrics.go:35-94) is called inside a conditional block that requires OTEL_EXPORTER_OTLP_ENDPOINT to be set. When configured, OTEL_RESOURCE_ATTRIBUTES is fully honored.

Metric-Level Attributes (Labels)

All 4 metrics carry:

Label Example Value Notes
gen_ai.operation.name chat Operation type
gen_ai.original.model llama3.2:latest Original model from request
gen_ai.provider.name openai Backend provider name. In K8s mode with explicit backend routing, this is the configured backend name.
gen_ai.request.model llama3.2:latest Requested model
gen_ai.response.model llama3.2:latest Model from response
gen_ai.token.type input / output / cached_input / cache_creation_input Only on gen_ai.client.token.usage. No total value — total must be computed. cached_input and cache_creation_input are for Anthropic-style prompt caching.

Metric Names and Types

OTLP Metric Name Type Unit Temporality
gen_ai.client.token.usage Histogram (not Counter!) token Delta
gen_ai.server.request.duration Histogram s (seconds, not ms!) Delta
gen_ai.server.time_to_first_token Histogram s Delta (streaming only)
gen_ai.server.time_per_output_token Histogram s Delta (streaming only)

Key findings:

  1. Token usage is a Histogram, not a Counter — Sum/Count/Min/Max available per bucket.
  2. Duration is in seconds — MAL rules must multiply by 1000 for ms display.
  3. Temporality is Delta — MAL needs increase() semantics, not rate().
  4. TTFT and TPOT only appear for streaming requests — non-streaming produces only token.usage + request.duration.
  5. Dots in metric names — OTLP uses dots (gen_ai.client.token.usage), Prometheus converts to underscores.

Histogram Bucket Boundaries (verified from source: internal/metrics/genai.go)

Token usage (14 boundaries, power-of-4): 1, 4, 16, 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216, 67108864

Request duration (14 boundaries, power-of-2 seconds): 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24, 20.48, 40.96, 81.92

TTFT (21 boundaries, finer granularity for streaming): 0.001, 0.005, 0.01, 0.02, 0.04, 0.06, 0.08, 0.1, 0.25, 0.5, 0.75, 1.0, 2.5, 5.0, 7.5, 10.0, 15.0, 20.0, 30.0, 45.0, 60.0

TPOT (13 boundaries, finest granularity): 0.01, 0.025, 0.05, 0.075, 0.1, 0.15, 0.2, 0.3, 0.4, 0.5, 0.75, 1.0, 2.5

Impact on Implementation

Finding Impact
No service.instance.id by default OTEL_RESOURCE_ATTRIBUTES=service.instance.id=<value> works when OTLP exporter is configured (verified). MAL rules should treat instance as optional and document OTEL_RESOURCE_ATTRIBUTES configuration.
gen_ai.provider.name = backend name In K8s mode with explicit backend config, this is the configured backend name.
Token usage is Histogram MAL uses histogram sum/count, not counter value.
Delta temporality SkyWalking OTel receiver must handle delta-to-cumulative conversion.
Duration in seconds MAL rules multiply by 1000 for ms-based metrics.
TTFT/TPOT streaming-only Dashboard should note these metrics may be absent for non-streaming workloads.

Bonus: Traces Also Pushed

The AI Gateway also pushes OpenInference traces via OTLP, including full request/response content in span attributes (llm.input_messages, llm.output_messages, llm.token_count.*). This is a potential future integration point but out of scope for this SWIP.

Appendix B: Raw OTLP Metric Data (Verified)

Captured from OTel Collector debug exporter. This is the actual OTLP payload from envoyproxy/ai-gateway-cli:latest.

Resource Attributes

Resource SchemaURL: https://opentelemetry.io/schemas/1.39.0
Resource attributes:
     -> service.instance.id: Str(test-instance-456)
     -> service.name: Str(aigw-test-service)
     -> telemetry.sdk.language: Str(go)
     -> telemetry.sdk.name: Str(opentelemetry)
     -> telemetry.sdk.version: Str(1.40.0)

OTEL_RESOURCE_ATTRIBUTES=service.instance.id=<value> is honored when an OTLP exporter is configured (i.e., OTEL_EXPORTER_OTLP_ENDPOINT is set). Without an OTLP endpoint, the resource block is skipped and only the Prometheus reader is used (which does not carry resource attributes per-metric).

InstrumentationScope

ScopeMetrics SchemaURL:
InstrumentationScope envoyproxy/ai-gateway

Metric 1: gen_ai.client.token.usage (input tokens)

Name: gen_ai.client.token.usage
Description: Number of tokens processed.
Unit: token
DataType: Histogram
AggregationTemporality: Delta

Data point attributes:
     -> gen_ai.operation.name: Str(chat)
     -> gen_ai.original.model: Str(llama3.2:latest)
     -> gen_ai.provider.name: Str(openai)
     -> gen_ai.request.model: Str(llama3.2:latest)
     -> gen_ai.response.model: Str(llama3.2:latest)
     -> gen_ai.token.type: Str(input)
Count: 1
Sum: 31.000000
Min: 31.000000
Max: 31.000000
ExplicitBounds: [1, 4, 16, 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216, 67108864]

Metric 1b: gen_ai.client.token.usage (output tokens)

Data point attributes:
     -> gen_ai.token.type: Str(output)
     (other attributes same as above)
Count: 1
Sum: 3.000000

Metric 2: gen_ai.server.request.duration

Name: gen_ai.server.request.duration
Description: Generative AI server request duration such as time-to-last byte or last output token.
Unit: s
DataType: Histogram
AggregationTemporality: Delta

Data point attributes:
     -> gen_ai.operation.name: Str(chat)
     -> gen_ai.original.model: Str(llama3.2:latest)
     -> gen_ai.provider.name: Str(openai)
     -> gen_ai.request.model: Str(llama3.2:latest)
     -> gen_ai.response.model: Str(llama3.2:latest)
Count: 1
Sum: 10.432428
ExplicitBounds: [0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24, 20.48, 40.96, 81.92]

Metric 3: gen_ai.server.time_to_first_token (streaming only)

Name: gen_ai.server.time_to_first_token
Description: Time to receive first token in streaming responses.
Unit: s
DataType: Histogram
AggregationTemporality: Delta
(Same attributes as request.duration, excluding gen_ai.token.type)
ExplicitBounds (from source code): [0.001, 0.005, 0.01, 0.02, 0.04, 0.06, 0.08, 0.1, 0.25, 0.5,
                                     0.75, 1.0, 2.5, 5.0, 7.5, 10.0, 15.0, 20.0, 30.0, 45.0, 60.0]

Metric 4: gen_ai.server.time_per_output_token (streaming only)

Name: gen_ai.server.time_per_output_token
Description: Time per output token generated after the first token for successful responses.
Unit: s
DataType: Histogram
AggregationTemporality: Delta
(Same attributes as request.duration, excluding gen_ai.token.type)
ExplicitBounds (from source code): [0.01, 0.025, 0.05, 0.075, 0.1, 0.15, 0.2, 0.3, 0.4, 0.5,
                                     0.75, 1.0, 2.5]

Appendix C: Access Log Format (from Envoy Config Dump)

The AI Gateway auto-configures two access log entries on the listener (one for LLM, one for MCP). Verified from config_dump of the AI Gateway.

LLM Access Log Format (JSON)

Filter: request.headers['x-ai-eg-model'] != '' (only logs requests processed by the AI Gateway ext_proc)

{
  "start_time": "%START_TIME%",
  "method": "%REQ(:METHOD)%",
  "request.path": "%REQ(:PATH)%",
  "x-envoy-origin-path": "%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%",
  "response_code": "%RESPONSE_CODE%",
  "duration": "%DURATION%",
  "bytes_received": "%BYTES_RECEIVED%",
  "bytes_sent": "%BYTES_SENT%",
  "user-agent": "%REQ(USER-AGENT)%",
  "x-request-id": "%REQ(X-REQUEST-ID)%",
  "x-forwarded-for": "%REQ(X-FORWARDED-FOR)%",
  "x-envoy-upstream-service-time": "%RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)%",
  "upstream_host": "%UPSTREAM_HOST%",
  "upstream_cluster": "%UPSTREAM_CLUSTER%",
  "upstream_local_address": "%UPSTREAM_LOCAL_ADDRESS%",
  "upstream_transport_failure_reason": "%UPSTREAM_TRANSPORT_FAILURE_REASON%",
  "downstream_remote_address": "%DOWNSTREAM_REMOTE_ADDRESS%",
  "downstream_local_address": "%DOWNSTREAM_LOCAL_ADDRESS%",
  "connection_termination_details": "%CONNECTION_TERMINATION_DETAILS%",
  "gen_ai.request.model": "%REQ(X-AI-EG-MODEL)%",
  "gen_ai.response.model": "%DYNAMIC_METADATA(io.envoy.ai_gateway:model_name_override)%",
  "gen_ai.provider.name": "%DYNAMIC_METADATA(io.envoy.ai_gateway:backend_name)%",
  "gen_ai.usage.input_tokens": "%DYNAMIC_METADATA(io.envoy.ai_gateway:llm_input_token)%",
  "gen_ai.usage.output_tokens": "%DYNAMIC_METADATA(io.envoy.ai_gateway:llm_output_token)%",
  "session.id": "%DYNAMIC_METADATA(io.envoy.ai_gateway:session.id)%"
}

Code review corrections (source: internal/metrics/genai.go, examples/access-log/basic.yaml, site/docs/capabilities/observability/accesslogs.md):

  • response_flags (%RESPONSE_FLAGS%) IS documented in AI Gateway access log docs and used in tests, but not in the default config. Can be added via EnvoyProxy resource if needed.
  • gen_ai.usage.total_tokens IS supported via %DYNAMIC_METADATA(io.envoy.ai_gateway:llm_total_token)% when AIGatewayRoute.spec.llmRequestCosts includes type: TotalToken.
  • Access log format is user-configurable via EnvoyProxy resource, not hardcoded by the AI Gateway. The AI Gateway only populates dynamic metadata; users define which fields appear in logs.
  • Additional token cost types beyond input/output/total: CachedInputToken and CacheCreationInputToken (for Anthropic-style prompt caching, stored as llm_cached_input_token and llm_cache_creation_input_token in dynamic metadata).

MCP Access Log Format (JSON)

Filter: request.headers['x-ai-eg-mcp-backend'] != ''

{
  "start_time": "%START_TIME%",
  "method": "%REQ(:METHOD)%",
  "request.path": "%REQ(:PATH)%",
  "response_code": "%RESPONSE_CODE%",
  "duration": "%DURATION%",
  "mcp.method.name": "%DYNAMIC_METADATA(io.envoy.ai_gateway:mcp_method)%",
  "mcp.provider.name": "%DYNAMIC_METADATA(io.envoy.ai_gateway:mcp_backend)%",
  "mcp.session.id": "%REQ(MCP-SESSION-ID)%",
  "jsonrpc.request.id": "%DYNAMIC_METADATA(io.envoy.ai_gateway:mcp_request_id)%",
  "session.id": "%DYNAMIC_METADATA(io.envoy.ai_gateway:session.id)%"
}