DSL Debug API — OAL
Status: shipped. Operator reference for the OAL slice of the DSL Debug API. Design: SWIP-13. Index of related pages: DSL Debug API overview.
What it captures
OAL’s gate is per-metric: each generated metric (e.g.
service_relation_server_cpm) has its own gate holder, and a debug
session attaches to one metric. Sibling rules under the same source
dispatcher stay silent.
A session against (catalog=oal, name=<file>, ruleName=<metric>)
captures every source event walked through that one metric’s pipeline.
Every event produces one record; within a record, each probe stage
appends one sample:
- nodes[]
  - records[]
    - startedAtMs — record boundary timestamp (ms)
    - dsl — verbatim per-metric `.oal` line
    - rule — { ruleName, sourceLine }
    - samples[]
      - type — input | filter | aggregation | output
      - sourceText — verbatim ANTLR slice from the `.oal` file
      - continueOn — true = pipeline continued past this step
      - payload — { type, scope, fields | timeBucket | … }
      - sourceLine — 1-based line in the source `.oal` file
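A quick way to gauge capture volume is to count records per node in a collected payload. A minimal sketch, assuming the poll response is saved to session.json and jq is available (the file name and jq usage are conventions of this example, not part of the API):

# per-node record counts from a collected session payload
jq '.nodes[] | {nodeId, records: (.records | length)}' session.json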
Sample types and the probes that emit them:
| type | Probe | Fired when |
|---|---|---|
| input | captureSource | A source event arrives at the metric’s pipeline. Payload = ISource.toJson() with all source columns. |
| filter | captureFilter | An OAL .filter(...) clause runs. Both kept (continueOn=true) and rejected (continueOn=false) branches are captured. |
| aggregation | captureAggregation | The metric’s aggregation function runs (cpm(), percentile2(10), …). Carries the post-aggregation source view. |
| output | captureEmit | The metric is emitted to the persistence pipeline (terminal). Payload = Metrics.toJson() with count / total / value / timeBucket etc. |
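To check which probes actually fired during a capture, the sample types can be tallied from the same payload; a sketch under the same session.json assumption as above:

# tally captured samples by probe type
jq '[.nodes[].records[].samples[].type] | group_by(.) | map({type: .[0], count: length})' session.json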
sourceText is the verbatim ANTLR slice
Pulled at .oal parse time via
ctx.getStart().getInputStream().getText(Interval.of(...)). Whitespace
and identifier spelling are byte-identical to the source file:
- `input.sourceText` — the source clause (e.g. `from(ServiceRelation.*)`).
- `filter.sourceText` — the filter clause including the leading `.` (the dot is part of the slice, e.g. `.filter(detectPoint == DetectPoint.SERVER)`).
- `aggregation.sourceText` — the aggregate function call (e.g. `cpm()` / `percentile2(10)`).
- `output.sourceText` — the metric name.
Operators can grep the captured sourceText against the original .oal
file directly.
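For example, a captured filter slice can be located in the rule file with a fixed-string grep; the config/oal/core.oal path below reflects the usual distribution layout and may differ in your deployment:

# -F treats the slice as a literal string, so the leading dot and parentheses are not regex metacharacters
grep -n -F '.filter(detectPoint == DetectPoint.SERVER)' config/oal/core.oal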
Both kept and rejected filter branches
Unlike MAL (where rejected executions are dropped to avoid tag-cardinality
noise), OAL captures both filter branches. OAL filters are deterministic
discriminators — CLIENT-vs-SERVER, layer matchers, status / latency
predicates — and the rejected source samples (continueOn=false) show the filter doing its job in plain view, which is useful for verifying partition logic.
When no session is bound, the codegen-emitted probe call sites are single volatile-bool reads — idle cost is effectively free.
Enabling
Two selectors must be enabled — the shared admin HTTP host (admin-server)
and the DSL-debug feature on top of it:
SW_ADMIN_SERVER=default
SW_DSL_DEBUGGING=default
injectionEnabled is a boot-time codegen switch; it defaults to true once the dsl-debugging module is enabled. With it on, the OAL dispatcher template emits per-metric GateHolder fields and probe call sites, so debug sessions can capture samples. Set it to false only when the REST surface is wanted but no codegen-side probe overhead is acceptable; with false the generated OAL bytecode is byte-identical to a build without SWIP-13. Flipping the flag requires an OAP restart:
SW_DSL_DEBUGGING_INJECTION_ENABLED=false # default is true; set false to disable probes
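Putting the three settings together for a distribution-style start (a sketch; exporting the variables is just one way to pass them, and bin/oapService.sh assumes the standard startup script):

# enable the admin host and the DSL-debug feature; injection is left at its default
export SW_ADMIN_SERVER=default
export SW_DSL_DEBUGGING=default
export SW_DSL_DEBUGGING_INJECTION_ENABLED=true   # default, shown for completeness
bin/oapService.sh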
SECURITY: capture payloads include source-event contents (service names, endpoint names, span attributes). Treat the admin port as authenticated infrastructure — see Admin API readme — Security Notice.
Picking the rule key
A session targets one OAL metric. The key tuple is
(catalog=oal, name=<file>, ruleName=<metric>):
| Field | Source |
|---|---|
| catalog | oal |
| name | The .oal file the metric is declared in (e.g. core.oal). |
| ruleName | The metric name on the LHS of the = (e.g. service_relation_server_cpm). |
To list the metrics loaded on a node, query
GET /runtime/oal/files — each file’s response lists its metrics. The
same endpoint also exposes the rules registered against each source —
useful when picking a metric whose filter clauses you want to inspect.
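A sketch of the lookup (the endpoint is the one named above; the response shape is not reproduced here):

# list loaded .oal files, their metrics, and the rules registered per source
curl -s 'http://OAP:17128/runtime/oal/files'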
End-to-end example
The shipped core.oal declares service_relation_server_cpm against
ServiceRelation with a SERVER detect-point filter:
service_relation_server_cpm = from(ServiceRelation.*)
.filter(detectPoint == DetectPoint.SERVER).cpm();
1. Open a debug session
curl -s -X POST \
'http://OAP:17128/dsl-debugging/session?catalog=oal&name=core.oal&ruleName=service_relation_server_cpm&clientId=alice'
2. Drive ingest
Send any agent traffic that produces inter-service spans (HTTP between
two SkyWalking-instrumented services, gRPC, etc.). The dispatcher fires
the metric’s pipeline on every ServiceRelation source event.
3. Poll
curl -s 'http://OAP:17128/dsl-debugging/session/SESSION_ID'
A trimmed slice (one record = one source event):
{
"ruleKey": { "catalog": "oal", "name": "core.oal",
"ruleName": "service_relation_server_cpm" },
"nodes": [{
"nodeId": "0.0.0.0_11800",
"status": "ok",
"records": [{
"startedAtMs": 1778115085149,
"dsl": "service_relation_server_cpm = from(ServiceRelation.*).filter(detectPoint == DetectPoint.SERVER).cpm();",
"rule": { "ruleName": "service_relation_server_cpm", "sourceLine": "30" },
"samples": [
{ "type": "input",
"sourceText": "from(ServiceRelation.*)",
"continueOn": true,
"payload": {
"type": "ServiceRelation", "scope": 4,
"fields": {
"sourceServiceName": "e2e-service-consumer",
"destServiceName": "e2e-service-provider",
"detectPoint": "SERVER",
"endpoint": "POST:/users",
"componentId": 1, "latency": 962, "status": true,
"httpResponseStatusCode": 200,
"timeBucket": 202605070051
}
},
"sourceLine": 30 },
{ "type": "filter",
"sourceText": ".filter(detectPoint == DetectPoint.SERVER)",
"continueOn": true,
"payload": { "type": "ServiceRelation", "fields": { /* same row */ } },
"sourceLine": 30 },
{ "type": "aggregation",
"sourceText": "cpm()",
"continueOn": true,
"payload": { "type": "ServiceRelationServerCpmMetrics",
"timeBucket": 202605070051,
"count": 1, "total": 1, "value": 1 },
"sourceLine": 30 }
]
}]
}]
}
The same source event, run through a sibling rule’s pipeline in its own session (e.g. the CLIENT-detect-point sibling), commonly produces a rejected filter sample: it shows continueOn=false and no aggregation/output sample follows — that metric’s pipeline stopped at the filter.
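To pull the rejected branches out of a collected payload, filter on the sample fields shown above; a sketch, assuming the poll response was saved to session.json and jq is available:

# rejected filter samples (continueOn=false) together with the slice that rejected them
jq '.nodes[].records[].samples[] | select(.type == "filter" and .continueOn == false) | {sourceText, payload}' session.json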
4. Stop
curl -s -X POST 'http://OAP:17128/dsl-debugging/session/SESSION_ID/stop'
Cluster behaviour
- Install broadcasts to every reachable peer; each peer attaches its own recorder to its dispatcher.
- Collect broadcasts and concatenates per-node slices.
- Stop broadcasts; missed acks fall out via retention timeout.
No cross-node merge — each peer’s slice is self-contained.
Failure modes
| Response | Meaning |
|---|---|
| 400 invalid_catalog | Catalog must be oal. |
| 400 missing_param | name or ruleName is missing. |
| 404 rule_not_found | No metric for (name, ruleName) on this node — typo, or no .oal rule loaded. |
| 503 injection_disabled | injectionEnabled=false. Restart with the flag on to debug. |
Limits
| Field | Default | Purpose |
|---|---|---|
| recordCap | 1000 | Max records before the recorder refuses appends. |
| retentionMillis | 300000 (5 min) | Wall-clock retention. |
Override per-session in the install body:
{ "recordCap": 200, "retentionMillis": 600000 }
See also
- DSL Debug API — MAL
- DSL Debug API — LAL
- SWIP-13 — full design.