Virtual GenAI
Virtual GenAI represents the Generative AI service nodes detected by server agents’ plugins. The performance metrics of the GenAI operations are from the GenAI client-side perspective.
For example, a Spring AI plugin in the Java agent could detect the latency of a chat completion request. As a result, SkyWalking would show traffic, latency, success rate, token usage (input/output), and estimated cost in the GenAI dashboard.
Span Contract
The GenAI operation span should have the following properties:
- It is an Exit span
- Span’s layer == GENAI
- Tag key =
gen_ai.provider.name, value = The Generative AI provider, e.g. openai, anthropic, ollama - Tag key =
gen_ai.response.model, value = The name of the GenAI model, e.g. gpt-4o, claude-3-5-sonnet - Tag key =
gen_ai.usage.input_tokens, value = The number of tokens used in the GenAI input (prompt) - Tag key =
gen_ai.usage.output_tokens, value = The number of tokens used in the GenAI response (completion) - Tag key =
gen_ai.server.time_to_first_token, value = The duration in milliseconds until the first token is received (streaming requests only) - If the GenAI service is a remote API (e.g. OpenAI), the span’s peer should be the network address (IP or domain) of the GenAI server.
Provider Configuration
SkyWalking uses gen-ai-config.yml to map model names to providers and configure cost estimation.
When the gen_ai.provider.name tag is present in the span, it is used directly. Otherwise, SkyWalking matches the model name
against prefix-match rules to identify the provider. For example, a model name starting with gpt is mapped to openai.
To configure cost estimation, add models with pricing under the provider:
providers:
- provider: openai
prefix-match:
- gpt
models:
- name: gpt-4o
input-estimated-cost-per-m: 2.5 # estimated cost per 1,000,000 input tokens
output-estimated-cost-per-m: 10 # estimated cost per 1,000,000 output tokens
Metrics
The following metrics are available at the provider (service) level:
gen_ai_provider_cpm- Calls per minutegen_ai_provider_sla- Success rategen_ai_provider_resp_time- Average response timegen_ai_provider_latency_percentile- Latency percentilesgen_ai_provider_input_tokens_sum / avg- Input token usagegen_ai_provider_output_tokens_sum / avg- Output token usagegen_ai_provider_total_estimated_cost / avg_estimated_cost- Estimated cost
The following metrics are available at the model (service instance) level:
gen_ai_model_call_cpm- Calls per minutegen_ai_model_sla- Success rategen_ai_model_latency_avg / percentile- Latencygen_ai_model_ttft_avg / percentile- Time to first token (streaming only)gen_ai_model_input_tokens_sum / avg- Input token usagegen_ai_model_output_tokens_sum / avg- Output token usagegen_ai_model_total_estimated_cost / avg_estimated_cost- Estimated cost
Requirement
SkyWalking Java Agent version >= 9.7