Security Notice

The SkyWalking OAP server, UI, and agent deployments should run in a secure environment, such as only inside your data center. In a default deployment, the OAP server, UI, and agents should be reachable only by the operations team.

OAP treats all telemetry data as trusted. To avoid extra load, the OAP server does not validate any field of the telemetry data. Every field of every telemetry category should be validated by the operator before it reaches OAP; none are inherently safer than the others.

Examples of surfaces that routinely carry attacker-controllable strings (non-exhaustive):

  • Metrics: metric names, label keys, label values.
  • Traces: span operation names, span tags (keys and values), span logs / events, endpoint and peer identifiers.
  • Logs: log body, structured fields.
  • Profiling: profiling results (eBPF / async-profiler / JFR samples), captured stack frames, symbol names.
  • HTTP capture: HTTP request and response bodies, headers, query strings, and dumps collected by agent-side body-capture profiling plugins.

A request URI, a header value, an exception message from poisoned input, or any other free-form string an instrumented application happens to attach to any of the above will reach OAP and the UI verbatim. The list grows with every new feature; the operator contract is “validate everything,” not “validate this enumerated set.”

It is up to the operator (OPS team) whether to expose the OAP server, UI, or some agent deployments to an unsecured environment. Consider the following security policies to secure your SkyWalking deployment.

  1. HTTPS and gRPC+TLS should be used between agents and OAP servers, as well as for the UI.
  2. Set up TOKEN or username/password based authentication for the OAP server and UI through your Gateway.
  3. Validate all fields of the traceable RPC (including HTTP 1/2 and MQ) headers (header names are sw8, sw8-x, and sw8-correlation) when requests come from outside the trusted zone. Alternatively, simply block or remove those headers unless you are using the client-js agent.
  4. All fields of telemetry data should be validated and rejected when malicious — in both HTTP raw-text and encoded Protobuf transports. The scope is every category an agent can emit (metrics, traces, logs, profiling results, HTTP capture / debugging dumps, and any future telemetry surface), and every field within each category. Treat the list above as examples; the rule is “validate every field,” not “validate the ones we enumerated.” None of these surfaces are inherently safer than the others.
  5. Build a validation layer between agents and OAP as a security enhancement. The recommended deployment shape is an operator-controlled gateway / sidecar / service mesh that authenticates the source, enforces rate limits, and validates / sanitises every telemetry category before forwarding to OAP. Several security vendors offer commercial implementations of this layer; the OAP itself does not perform that validation.
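
As a rough illustration of policy 3, the sketch below drops the propagation headers from requests that arrive from outside the trusted zone. The header names come from the list above; the function name and the `from_trusted_zone` flag are hypothetical, and the sketch does not cover the client-js exception, where these headers are expected.

```python
# Header names are taken from policy 3; everything else here is illustrative.
TRACE_HEADERS = frozenset({"sw8", "sw8-x", "sw8-correlation"})


def strip_trace_headers(headers, from_trusted_zone):
    """Return a copy of `headers` without SkyWalking propagation headers.

    `from_trusted_zone` is a hypothetical flag that a gateway would derive
    from its own knowledge of where the request entered the network.
    """
    if from_trusted_zone:
        return dict(headers)
    # HTTP header names are case-insensitive, so compare in lower case.
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in TRACE_HEADERS
    }
```

In a real deployment this logic would usually live in the gateway or service mesh configuration rather than in application code.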

Without these protections, an attacker could embed executable JavaScript code in any of those fields, causing XSS or even Remote Code Execution (RCE) issues.
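
A minimal sketch of the kind of per-field check a validation layer (policies 4 and 5) might run before forwarding telemetry to OAP. The length limit, the patterns, and the function names are all assumptions chosen for illustration. A deny-list like this is easy to bypass and can reject legitimate values, so an allow-list of expected characters per field is generally the stronger design.

```python
import re

MAX_FIELD_LENGTH = 1024  # assumed limit; tune per deployment

# Control characters other than tab, newline, and carriage return.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")
# Illustrative patterns for markup a browser could execute (not exhaustive).
_SUSPICIOUS = re.compile(r"<\s*script|javascript:|\bon\w+\s*=", re.IGNORECASE)


def is_acceptable_field(value):
    """Return True if a single string passes the illustrative checks."""
    if not isinstance(value, str):
        return False
    if len(value) > MAX_FIELD_LENGTH:
        return False
    if _CONTROL_CHARS.search(value):
        return False
    return _SUSPICIOUS.search(value) is None


def validate_record(fields):
    """Return the keys of rejected entries; keys and values are both checked."""
    rejected = []
    for key, value in fields.items():
        if not (is_acceptable_field(key) and is_acceptable_field(value)):
            rejected.append(key)
    return rejected
```

A caller would reject or quarantine any record for which `validate_record` returns a non-empty list, applying the same check to metric labels, span tags, log fields, and every other category.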

For sensitive environments, consider limiting the telemetry report frequency to mitigate DoS/DDoS attacks against exposed OAP and UI services.
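
One common way to limit report frequency is a token bucket placed in front of the receiver. The sketch below is a generic single-process version, not something OAP provides; the class name, rate, and capacity are assumptions, and the injectable `clock` exists only to make the behaviour easy to test.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate_per_sec` tokens per second."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = float(rate_per_sec)
        self.capacity = float(capacity)
        self.tokens = float(capacity)  # start with a full bucket
        self.clock = clock
        self.last = clock()

    def allow(self, cost=1.0):
        """Return True and consume `cost` tokens if the request may proceed."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill for the elapsed time, never exceeding the capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A gateway would typically keep one bucket per source (for example per agent token or client address) and answer with a throttling response when `allow()` returns False.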

Admin API Surface (ports 17128 + 17129)

The admin-server module hosts a shared HTTP port (default 17128) for admin / on-demand write APIs. Today this includes the Runtime Rule Hot-Update API (add, override, inactivate, delete MAL/LAL rule files at runtime — rules are compiled and loaded into the OAP JVM on the fly) and the DSL Debug API (sampling debugger that captures live MAL/LAL/OAL rule executions, including the raw payloads each rule processed). This surface is far more powerful than the telemetry receiver ports — a request can register new Javassist-compiled bytecode, mutate MeterSystem state, drop backend schema (BanyanDB measures), or inspect operationally-sensitive payloads (raw log bodies, parsed maps, metric source events).

admin-server also opens an admin-internal gRPC bus (default 17129) for peer-to-peer cluster RPCs between OAP nodes — runtime-rule Suspend / Resume / Forward, DSL debug install / collect / stop. This is intentionally a separate transport from the public agent / cluster gRPC port (core.gRPCPort, default 11800). An attacker who reaches 11800 to submit telemetry MUST NOT be able to invoke privileged admin RPCs.

admin-server is enabled by default so the bundled status and inspect feature modules are reachable out of the box. Both ports open with no built-in authentication — the design goal is a simple admin socket that a gateway / service mesh wraps with the operator’s existing auth story. Set SW_ADMIN_SERVER= (empty) to keep the host closed if no admin-side API is needed.

Required operator actions:

  1. Never expose port 17128 (HTTP) to the public internet. Bind to a private network interface or localhost and reach it through an operator-controlled gateway.
  2. Never expose port 17129 (admin gRPC) to operators or the agent network. Bind gRPCHost to a private peer-to-peer interface reachable ONLY between OAP nodes. This port has the same blast radius as port 17128; do not gateway-publish it.
  3. Gateway-protect 17128 with IP allow-list + authentication. Only the operator team should be able to reach the HTTP endpoint.
  4. Audit every request. Rule content is arbitrary YAML that compiles into the OAP JVM — a malicious rule could exfiltrate data, spike resource use, or create metric-name collisions. Captured DSL-debug payloads include sensitive operational state. Treat POST /runtime/rule/* and POST /dsl-debugging/* as equivalent to shell access on the OAP host.
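
The audit requirement in action 4 could be met by having the gateway emit one structured record per admin request. The sketch below is an assumption about what such a record might contain; the field names and the function are hypothetical, and only the two path prefixes are taken from the text above. It stores a hash of the body rather than the body itself, because rule content and debug payloads may be sensitive.

```python
import hashlib
import json
from datetime import datetime, timezone

# Path prefixes named in action 4; requests under them are flagged as privileged.
PRIVILEGED_PREFIXES = ("/runtime/rule/", "/dsl-debugging/")


def audit_record(remote_addr, method, path, body):
    """Build a JSON audit line for one admin request (`body` is bytes)."""
    record = {
        "time": datetime.now(timezone.utc).isoformat(),
        "remote_addr": remote_addr,
        "method": method,
        "path": path,
        "privileged": path.startswith(PRIVILEGED_PREFIXES),
        "body_bytes": len(body),
        # The hash lets reviewers match a request to stored rule content.
        "body_sha256": hashlib.sha256(body).hexdigest(),
    }
    return json.dumps(record, sort_keys=True)
```

The resulting lines would be shipped to an append-only log store that the OAP host itself cannot modify.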

Without these protections an attacker with network reach to port 17128 or 17129 can execute arbitrary code inside the OAP JVM. See docs/en/setup/backend/admin-api/readme.md for the full security notice and per-feature API surface.
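
The IP allow-list from action 3 can be sketched with Python's standard `ipaddress` module. The networks below are placeholders, not recommended values, and the function name is hypothetical. An allow-list complements authentication and does not replace it.

```python
import ipaddress

# Hypothetical operator networks; replace with the ranges your team actually uses.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.20.0.0/24"),
    ipaddress.ip_network("127.0.0.1/32"),
]


def is_admin_client_allowed(remote_addr, allowed=ALLOWED_NETWORKS):
    """Return True only if `remote_addr` parses and falls inside an allowed network."""
    try:
        ip = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Unparseable addresses are rejected rather than passed through.
        return False
    return any(ip in network for network in allowed)
```

The check fails closed: anything that is not a valid address inside a listed network is denied, including IPv6 clients when only IPv4 networks are listed.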

Client-Side Monitoring

Client-side applications — iOS/iPadOS apps (via OpenTelemetry Swift SDK), browser web apps (via client-js), and WeChat/Alipay mini-programs (via mini-program-monitor) — send telemetry data from the public internet to OAP endpoints including OTLP/HTTP (/v1/traces, /v1/logs, /v1/metrics), SkyWalking native (/v3/segments), and browser reporting endpoints.

These endpoints accept data from any client without authentication by default. Apply the security policies listed above, especially rate limiting, to prevent abuse from untrusted client-side sources.