SWIP-11 Support iOS App Monitoring via OpenTelemetry
Motivation
iOS (including iPadOS) is one of the most important client-side platforms. Monitoring iOS app performance — HTTP request latency, crash rates, app launch time — is as important as browser monitoring, which SkyWalking has supported since v8.x.
The OpenTelemetry Swift SDK (v2.3.0, tracing stable) provides auto-instrumentation for iOS apps including HTTP request tracing (URLSession), device/OS resource attributes, and Apple MetricKit integration. All data is exported via standard OTLP.
Unlike browser monitoring which requires a custom SkyWalking protocol (BrowserPerf.proto) and a dedicated
receiver plugin, the OTel Swift SDK speaks standard OTLP. SkyWalking already has an OTLP receiver, so this
feature primarily requires layer detection, a MetricKit span analyzer, LAL rules for crash diagnostics, and
UI dashboards.
This SWIP also establishes a Mobile menu group in the UI, preparing for future Android monitoring (via opentelemetry-android).
Architecture Graph
┌──────────────────────┐                       ┌─────────────────────────────────────────────┐
│  iOS App             │       OTLP/HTTP       │  SkyWalking OAP                             │
│  + OTel Swift SDK    │ ────────────────────> │                                             │
│                      │      (port 4318)      │  ┌───────────────────────────────────────┐  │
│  Instrumentation:    │                       │  │ otel-receiver                         │  │
│  • URLSession (auto) │                       │  │                                       │  │
│  • MetricKit (auto)  │                       │  │ Trace Handler                         │  │
│                      │                       │  │ ├─ detect os.name=iOS → Layer.IOS     │  │
│  Signals:            │                       │  │ ├─ HTTP spans → SpanForward → OAL     │  │
│  1. HTTP trace spans │                       │  │ └─ MetricKit spans                    │  │
│  2. MetricKit spans  │                       │  │    → iOSMetricKitAnalyzer (new)       │  │
│  3. Diagnostic logs  │                       │  │    → extract as metrics               │  │
│                      │                       │  │                                       │  │
│  Resource attrs:     │                       │  │ Log Handler (modified)                │  │
│  os.name=iOS         │                       │  │ ├─ pass resource attrs as             │  │
│  device.model.id=... │                       │  │ │  sourceAttributes on LogMetadata    │  │
│  service.name=MyApp  │                       │  │ └─ LAL script determines layer        │  │
│  (NO service.layer)  │                       │  │    from sourceAttribute("os.name")    │  │
└──────────────────────┘                       │  └───────────────────────────────────────┘  │
                                               └─────────────────────────────────────────────┘
No OTel Collector is required, though one can be used for buffering.
Key challenge: The OTel Swift SDK does not set service.layer or service.instance.id — and
this is common for most OTLP sources. Rather than hardcoding layer inference in the handler, this
SWIP introduces a general-purpose mechanism: sourceAttributes on LogMetadata + LAL script-level
layer assignment.
Proposed Changes
1. New Layer: IOS
Add in Layer.java:
/**
* iOS/iPadOS app monitoring via OpenTelemetry Swift SDK
*/
IOS(47, true),
Normal layer (isNormal=true) because the iOS app is directly instrumented.
2. Source Attributes on LogMetadata (General Enhancement)
OTLP resource attributes (e.g., os.name, device.model.identifier) are currently extracted by
OpenTelemetryLogHandler to read service.name, service.layer, service.instance.id, then
discarded. They are not passed into LogData tags and not available to LAL scripts.
This is a problem not only for iOS but for any OTLP source where service.layer is absent — the
LAL script has no information to determine the layer.
Solution: sourceAttributes on LogMetadata
Add a non-persistent sourceAttributes field to LogMetadata (Java bean, not proto):
@Data
@Builder
public class LogMetadata {
    private String service;
    private String serviceInstance;
    private String endpoint;
    private String layer;
    private long timestamp;

    @Builder.Default
    private TraceContext traceContext = TraceContext.EMPTY;

    /**
     * Non-persistent attributes from the log source (e.g., OTLP resource attributes,
     * ALS node context). Available to LAL scripts via sourceAttribute() but NOT stored
     * in tagsRawData.
     */
    @Builder.Default
    private Map<String, String> sourceAttributes = Collections.emptyMap();
}
Why sourceAttributes not resourceAttributes: Different receivers have different source
contexts — OTLP has resource attributes, Envoy ALS has node info, etc. sourceAttributes is
generic.
Why on LogMetadata not LogData: LogData is a proto object (from Logging.proto). Its
tags field gets serialized into tagsRawData and persisted to storage. LogMetadata is a Java
bean used only as a transient carrier during LAL processing — adding fields here has no storage
impact.
Handler Change: OpenTelemetryLogHandler
Pass all resource attributes into LogMetadata.sourceAttributes:
// Existing: extract specific fields from resource attributes
final var service = attributes.get("service.name");
final var layer = attributes.getOrDefault("service.layer", "");
final var serviceInstance = attributes.getOrDefault("service.instance.id", "");

// New: pass ALL resource attributes as sourceAttributes
final var metadata = LogMetadata.builder()
    .service(service)
    .serviceInstance(serviceInstance)
    .layer(layer)
    .timestamp(logRecord.getTimeUnixNano() / 1_000_000)
    .sourceAttributes(attributes) // <-- all resource attrs, non-persistent
    .build();
logAnalyzerService().doAnalysis(metadata, logDataBuilder);
LAL DSL: sourceAttribute() Function
Add a new function to the LAL DSL that reads from LogMetadata.sourceAttributes:
sourceAttribute("os.name") → "iOS"
sourceAttribute("os.version") → "17.4.1"
sourceAttribute("device.model.identifier") → "iPhone15,2"
This is similar to tag() but reads from the non-persistent source context instead of LogData tags.
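As a rough illustration of the lookup semantics, the sketch below models sourceAttribute() as a plain map read on a simplified metadata holder. LogMetadataSketch is a hypothetical stand-in for LogMetadata, and returning null for a missing key is an assumption; the proposal does not specify the behavior for absent attributes.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical, simplified stand-in for LogMetadata (illustration only).
final class LogMetadataSketch {
    final String service;
    final Map<String, String> sourceAttributes;

    LogMetadataSketch(String service, Map<String, String> sourceAttributes) {
        this.service = service;
        // Receivers that supply nothing behave like an empty map.
        this.sourceAttributes = sourceAttributes == null
            ? Collections.emptyMap()
            : Map.copyOf(sourceAttributes);
    }

    // Assumed semantics of sourceAttribute(key): null when the key is absent.
    String sourceAttribute(String key) {
        return sourceAttributes.get(key);
    }
}

final class SourceAttributeDemo {
    public static void main(String[] args) {
        LogMetadataSketch md = new LogMetadataSketch(
            "MyApp", Map.of("os.name", "iOS", "os.version", "17.4.1"));
        System.out.println(md.sourceAttribute("os.name"));
        System.out.println(md.sourceAttribute("service.layer"));
    }
}
```

Because the map lives only on the transient metadata object, nothing read this way reaches storage unless a rule copies it into a tag explicitly.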
3. LAL Script-Level Layer Assignment (layer: auto)
Currently, layer in a LAL rule YAML serves as both a routing key (only rules matching the log’s
layer are evaluated) and output metadata. This creates a chicken-and-egg problem: a rule that wants
to SET the layer cannot be reached if the layer is absent.
Solution: layer: auto mode
A new layer: auto declaration indicates the layer is determined by the script. Rules with
layer: auto match logs where service.layer is absent (empty/unset). The script is expected to
set the layer in the extractor:
rules:
  - name: ios-metrickit-diagnostics
    layer: auto   # layer determined by script; dropped if not set
    dsl: |
      filter {
        // Determine if this is an iOS log
        if (sourceAttribute("os.name") != "iOS" && sourceAttribute("os.name") != "iPadOS") {
          abort {}
        }
        extractor {
          layer IOS   // LAL script sets the layer
          // ...
        }
        sink { }
      }
Drop policy: In auto mode, if the script does not set the layer (either because the script
aborted or because the extractor omitted layer), OAP logs a warning and drops the log at persistence.
layer: auto means “I take responsibility for setting the layer”: if no layer is set, the log is either
a non-matching log (abort) or the result of a script bug (warn).
This enforces that every OTLP log source either:
- Sets service.layer explicitly (like Envoy AI Gateway), OR
- Has a matching layer: auto LAL rule that determines the layer from source attributes
Backward compatibility: Existing OTLP log sources that set service.layer are unaffected —
their logs have a concrete layer and are routed to layer-specific rules as before. layer: auto
rules only see logs with absent layer. The existing default.yaml rule (layer: GENERAL) continues
to catch logs that have layer = GENERAL.
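The routing and drop rules above can be sketched as two small predicates. This is a minimal model under stated assumptions, not the OAP implementation: AutoLayerRouting, ruleApplies and resolveLayer are hypothetical names, and layers are represented as plain strings.

```java
import java.util.Optional;

// Hypothetical model of layer-based rule routing with an "auto" mode.
final class AutoLayerRouting {
    static final String AUTO = "auto";

    // A rule is evaluated when its layer equals the log's layer, or when
    // the rule is "auto" and the log arrived without a layer.
    static boolean ruleApplies(String ruleLayer, String logLayer) {
        boolean logHasLayer = logLayer != null && !logLayer.isEmpty();
        if (AUTO.equals(ruleLayer)) {
            return !logHasLayer;
        }
        return logHasLayer && ruleLayer.equals(logLayer);
    }

    // After an auto rule ran: empty means the caller should warn and drop.
    static Optional<String> resolveLayer(String layerSetByScript) {
        if (layerSetByScript == null || layerSetByScript.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(layerSetByScript);
    }

    public static void main(String[] args) {
        System.out.println(ruleApplies(AUTO, ""));          // layer absent
        System.out.println(ruleApplies(AUTO, "GENERAL"));   // layer present
        System.out.println(resolveLayer("IOS").isPresent());
    }
}
```

Keeping the "is this rule reachable" check separate from the "did the script set a layer" check mirrors the two roles that layer plays in a rule: routing key and output metadata.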
4. Resource Attributes Available to LAL (via sourceAttribute())
The OTel Swift SDK sets the following resource attributes, all available via sourceAttribute():
| Resource Attribute | Example Value | Source |
|---|---|---|
| os.name | iOS, iPadOS, macOS | UIDevice.current.systemName |
| os.type | darwin | Hardcoded |
| os.version | 17.4.1 | ProcessInfo.operatingSystemVersion |
| device.model.identifier | iPhone15,2 | sysctl(HW_MACHINE) |
| service.name | MyApp | CFBundleName |
| service.version | 2.1.0 (45) | CFBundleShortVersionString + build |
| telemetry.sdk.language | swift | Hardcoded |
5. OTLP Span Listener Mechanism (General Enhancement)
Currently, SpanForward hardcodes GenAI-specific logic (processGenAILogic()) inline. Adding iOS
MetricKit handling as another hardcoded case would be unmaintainable. This SWIP introduces a general
span listener mechanism to support extensible span-based metric extraction and trace persistence
control.
Current Problem
// OpenTelemetryTraceHandler.java — converts ALL spans to Zipkin first
Span zipkinSpan = convertSpan(span, serviceName, resourceTags);
result.add(zipkinSpan);
// ...
// SpanForward.java — hardcoded GenAI logic after Zipkin conversion
processGenAILogic(zipkinSpan); // GenAI metric extraction (hardcoded)
getReceiver().receive(zipkinSpan); // always persists trace
Problems:
- GenAI logic is hardcoded — adding iOS/Android/etc. would keep growing
- Listeners can’t see original OTLP structure (InstrumentationScope name, resource attributes as separate fields) — everything is already flattened into Zipkin tags
- Spans that should NOT be persisted (e.g., 24-hour MetricKit) still get converted to Zipkin format
Solution: OTLPSpanListener Interface — Before Zipkin Conversion
Listeners operate on the raw OTLP span + resource attributes, before Zipkin conversion. This gives listeners access to:
- InstrumentationScope name and version (lost in Zipkin conversion)
- Resource attributes as a separate map (not flattened with span attributes)
- Original OTLP span structure
/**
 * Listener for OTLP spans. Called BEFORE Zipkin conversion.
 * Implementations can:
 * 1. Extract metrics or other data from spans
 * 2. Modify resource/span attributes before Zipkin conversion
 * 3. Control whether the span should be converted and persisted as a trace
 */
public interface OTLPSpanListener {
    /**
     * Process an OTLP span.
     *
     * @param span the raw OTLP span
     * @param resourceAttributes resource-level attributes (service.name, os.name, etc.)
     * @param scopeName InstrumentationScope name (e.g., "NSURLSession", "MetricKit")
     * @param scopeVersion InstrumentationScope version
     * @return result controlling persistence and tag modifications
     */
    OTLPSpanListenerResult onSpan(
        io.opentelemetry.proto.trace.v1.Span span,
        Map<String, String> resourceAttributes,
        String scopeName,
        String scopeVersion
    );
}

public class OTLPSpanListenerResult {
    /** Whether this span should be converted to Zipkin and persisted. Default: true */
    private boolean persistTrace = true;
    /** Additional tags to inject before Zipkin conversion (e.g., estimated_cost) */
    private Map<String, String> additionalTags = Collections.emptyMap();
    /** Layer override: if set, the service is assigned this layer */
    private Layer layer = null;
}
Revised Flow in OpenTelemetryTraceHandler
// OpenTelemetryTraceHandler.java: listeners run BEFORE Zipkin conversion
for (io.opentelemetry.proto.trace.v1.Span span : scopeSpans.getSpansList()) {
    boolean shouldPersist = true;
    Map<String, String> extraTags = new HashMap<>();
    Layer layerOverride = null;
    for (OTLPSpanListener listener : spanListeners) {
        // Named listenerResult so it does not clash with the outer `result` list
        OTLPSpanListenerResult listenerResult = listener.onSpan(
            span, resourceTags, scopeName, scopeVersion);
        if (!listenerResult.isPersistTrace()) {
            shouldPersist = false;
        }
        extraTags.putAll(listenerResult.getAdditionalTags());
        if (listenerResult.getLayer() != null) {
            layerOverride = listenerResult.getLayer();
        }
    }
    // layerOverride, when non-null, is the layer assigned to the service (handling omitted here)
    if (shouldPersist) {
        // Merge extraTags into a per-span copy so injected tags do not leak into later spans
        Map<String, String> mergedTags = new HashMap<>(resourceTags);
        mergedTags.putAll(extraTags);
        Span zipkinSpan = convertSpan(span, serviceName, mergedTags);
        result.add(zipkinSpan);
    }
}
Registered Listeners
| Listener | Detects | Extracts | Persists? | Modifies? |
|---|---|---|---|---|
| GenAISpanListener | gen_ai.system or gen_ai.provider.name attribute | Token metrics, cost → Sources | Yes | Yes (adds estimated_cost tag) |
| IOSMetricKitSpanListener | scopeName == "MetricKit" + span.name == "MXMetricPayload" | Device stats → SampleFamily → shared MAL pipeline | No | No |
Listeners are registered via SPI (META-INF/services/) and loaded at handler initialization.
The existing processGenAILogic() is refactored into GenAISpanListener — no behavior change,
just better structure.
Key design points:
- Listeners see raw OTLP data — InstrumentationScope name, resource attributes as separate map
- Any listener can veto trace persistence — prevents Zipkin conversion entirely (no wasted work)
- Any listener can inject tags — merged before Zipkin conversion
- Multiple listeners can process the same span (e.g., a GenAI span on iOS triggers both)
- If ANY listener vetoes persistence, the span is not converted or stored
Note: No IOSLayerSpanListener is needed. The IOS layer is registered automatically
when the MAL expSuffix with Layer.IOS processes MetricKit metrics. The OTLP→Zipkin trace
pipeline (SpanForward) emits Zipkin-specific sources (not OAL sources), so there are no
OAL traffic metrics for OTLP traces.
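The SPI registration and the "any listener can veto" rule can be sketched with the JDK's java.util.ServiceLoader. The interface below is a deliberately reduced, hypothetical contract (SpanListenerSketch) that returns only the persistence decision; the real OTLPSpanListener takes the raw OTLP span and returns a richer result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical, simplified listener contract: true means "persist the trace".
interface SpanListenerSketch {
    boolean onSpan(String scopeName, String spanName);
}

final class SpanListenerRegistry {
    // Discovers implementations listed under META-INF/services/SpanListenerSketch.
    // With no provider file on the classpath this returns an empty list.
    static List<SpanListenerSketch> load() {
        List<SpanListenerSketch> listeners = new ArrayList<>();
        for (SpanListenerSketch listener : ServiceLoader.load(SpanListenerSketch.class)) {
            listeners.add(listener);
        }
        return listeners;
    }

    // Every listener is invoked; a single veto is enough to skip persistence.
    static boolean shouldPersist(List<SpanListenerSketch> listeners,
                                 String scopeName, String spanName) {
        boolean persist = true;
        for (SpanListenerSketch listener : listeners) {
            persist &= listener.onSpan(scopeName, spanName); // non-short-circuit on purpose
        }
        return persist;
    }

    public static void main(String[] args) {
        System.out.println("listeners discovered: " + load().size());
    }
}
```

The non-short-circuit `&=` matters: a listener that extracts metrics must still see the span even if an earlier listener already vetoed persistence.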
6. Entity Model
| SkyWalking Entity | Source | Example |
|---|---|---|
| Service | service_name label in MAL expSuffix | MyApp |
| Service Instance | service_instance_id label in MAL expSuffix | 2.1.0 |
No endpoint entity — MetricKit metrics are service/instance scoped only.
7. HTTP Span Processing (Trace Path)
HTTP spans from InstrumentationScope NSURLSession flow through the existing OTLP → Zipkin → SpanForward
trace pipeline. They are stored as Zipkin spans and queryable via the Zipkin query API.
Note: The OTLP→Zipkin trace pipeline (SpanForward) emits Zipkin-specific sources
(ZipkinService, ZipkinServiceSpan, ZipkinServiceRelation), not OAL sources. There are
no OAL traffic metrics (e.g., service_cpm, service_resp_time) generated from OTLP traces.
HTTP trace metrics for iOS may be added in the future via MAL extraction in a SpanListener.
OTLP Export Feedback Loop
The URLSession auto-instrumentation captures all HTTP calls including the OTLP export calls themselves. This creates an exponential feedback loop — validated in our POC: 4 real HTTP requests generated 41,213 spurious export spans.
Recommended mitigation (documented in user guide): Use the SDK’s shouldInstrument callback to
exclude the collector URL:
URLSessionInstrumentationConfiguration(
    shouldInstrument: { request in
        return request.url?.host != "<collector-host>"
    }
)
8. Metrics Overview
iOS monitoring metrics come from MetricKit — daily aggregated device statistics delivered once per day per device via the OTel Swift SDK’s MetricKit instrumentation.
9. MetricKit Span Listener (IOSMetricKitSpanListener)
Apple’s MetricKit delivers pre-aggregated app statistics once per day. The OTel Swift SDK encodes
this as a single span with startTime = 24h ago, endTime = now, with all statistics as span
attributes. These are not trace spans — they must be intercepted and converted to metrics.
IOSMetricKitSpanListener implements the OTLPSpanListener SPI (Section 5):
- Detection: scopeName == "MetricKit" AND span.name == "MXMetricPayload". This uses the raw OTLP InstrumentationScope name, which is available because listeners run before Zipkin conversion.
- Action: Extract span attributes as SampleFamily samples with 4 labels (service_name, service_instance_id, device_model, os_version) and push them into the shared MAL pipeline via OpenTelemetryMetricRequestProcessor.toMeter(), so no rules are loaded twice.
- Persistence: Returns persistTrace = false, because a 24-hour span must not be stored as a trace.
- Required module: receiver-otel. The listener uses the otel-receiver’s MAL converters configured via enabledOtelMetricsRules.
MetricKit Source Attributes
| Span Attribute | Type | Unit | Description |
|---|---|---|---|
| metrickit.app_launch.time_to_first_draw_average | Double | seconds | Average time to first draw |
| metrickit.app_responsiveness.hang_time_average | Double | seconds | Average hang duration |
| metrickit.cpu.cpu_time | Double | seconds | Cumulative CPU time (24h) |
| metrickit.memory.peak_memory_usage | Double | bytes | Peak memory usage |
| metrickit.network_transfer.wifi_download | Double | bytes | WiFi download (24h) |
| metrickit.network_transfer.wifi_upload | Double | bytes | WiFi upload (24h) |
| metrickit.network_transfer.cellular_download | Double | bytes | Cellular download (24h) |
| metrickit.network_transfer.cellular_upload | Double | bytes | Cellular upload (24h) |
| metrickit.app_exit.foreground.abnormal_exit_count | Int | count | Abnormal exits (foreground crashes); aggregated |
| metrickit.app_exit.foreground.normal_app_exit_count | Int | count | Normal foreground exits; not aggregated |
| metrickit.app_exit.background.abnormal_exit_count | Int | count | Background abnormal exits; aggregated |
| metrickit.app_exit.background.normal_app_exit_count | Int | count | Normal background exits; not aggregated |
| metrickit.app_exit.background.memory_pressure_exit_count | Int | count | OOM kills; aggregated |
| metrickit.animation.scroll_hitch_time_ratio | Double | ratio | Scroll jank ratio |
| metrickit.gpu.time | Double | seconds | Cumulative GPU time |
| metrickit.diskio.logical_write_count | Double | bytes | Disk writes (24h) |
| metrickit.metadata.device_type | String | - | Device model in MetricKit payload |
| metrickit.metadata.os_version | String | - | OS version in MetricKit payload |
Aggregation Strategy
MetricKit data is inherently daily — each device reports once per day. Multiple devices running the
same app produce multiple data points per day. The analyzer uses the span’s end time as the data
point timestamp with day-level time bucket (TimeBucket.getDayTimeBucket()).
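A minimal sketch of day-level bucketing, assuming the bucket is the span end time rendered as a yyyyMMdd number. It uses UTC so the result is deterministic; the time zone that TimeBucket.getDayTimeBucket() actually applies is not stated here, and DayBucketSketch is a hypothetical name.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: maps an OTLP end timestamp to a day bucket.
final class DayBucketSketch {
    // Assumption: UTC; the real implementation may use the server's zone.
    private static final DateTimeFormatter DAY =
        DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);

    static long dayBucket(long endTimeUnixNano) {
        long millis = endTimeUnixNano / 1_000_000L; // OTLP timestamps are in nanoseconds
        return Long.parseLong(DAY.format(Instant.ofEpochMilli(millis)));
    }

    public static void main(String[] args) {
        long nanos = Instant.parse("2024-01-15T10:00:00Z").getEpochSecond() * 1_000_000_000L;
        System.out.println(dayBucket(nanos));
    }
}
```

Two devices that report at different hours of the same calendar day land in the same bucket, which is what allows the cross-device aggregation described next.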
Different metrics require different cross-device aggregation:
| Metric Category | Aggregation | Reasoning |
|---|---|---|
| Pre-averaged values (launch time, hang time) | longAvg | Apple already averaged per-device; average across fleet |
| Peak values (memory) | max | Want the worst-case device |
| Counts (crash count, exit count) | sum | Total events across fleet |
| Cumulative volumes (network bytes, disk writes, CPU time) | sum | Total fleet resource usage |
| Ratios (scroll hitch) | doubleAvg | Fleet-wide average jank |
Span-to-Sample Conversion
The listener converts each MXMetricPayload span into labeled SampleFamily samples:
metrickit_app_launch_time{service_name="MyApp", service_instance_id="2.1.0", device_model="iPhone15,2", os_version="17.4.1"} 850
metrickit_hang_time{service_name="MyApp", service_instance_id="2.1.0", device_model="iPhone15,2", os_version="17.4.1"} 120
metrickit_peak_memory{service_name="MyApp", service_instance_id="2.1.0", device_model="iPhone15,2", os_version="17.4.1"} 157286400
metrickit_foreground_abnormal_exit_count{service_name="MyApp", service_instance_id="2.1.0", device_model="iPhone15,2", os_version="18.0"} 2
metrickit_wifi_download{service_name="MyApp", service_instance_id="2.1.0", device_model="iPhone15,2", os_version="17.4.1"} 52428800
Labels are extracted from:
- service_name → resource attribute service.name
- service_instance_id → resource attribute service.version (instance fallback)
- device_model → span attribute metrickit.metadata.device_type or resource attribute device.model.identifier
- os_version → span attribute metrickit.metadata.os_version or resource attribute os.version
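The label mapping can be sketched as a small function over the two attribute maps. MetricKitLabels and firstNonEmpty are hypothetical names, and preferring the span attribute over the resource attribute is an assumption drawn from the order in which the two sources are listed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that derives the four sample labels.
final class MetricKitLabels {
    // Returns the first value that is neither null nor empty, else "".
    static String firstNonEmpty(String preferred, String fallback) {
        if (preferred != null && !preferred.isEmpty()) {
            return preferred;
        }
        return fallback == null ? "" : fallback;
    }

    static Map<String, String> labels(Map<String, String> resourceAttrs,
                                      Map<String, String> spanAttrs) {
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("service_name", resourceAttrs.getOrDefault("service.name", ""));
        // The SDK does not set service.instance.id, so service.version stands in.
        labels.put("service_instance_id", resourceAttrs.getOrDefault("service.version", ""));
        labels.put("device_model", firstNonEmpty(
            spanAttrs.get("metrickit.metadata.device_type"),
            resourceAttrs.get("device.model.identifier")));
        labels.put("os_version", firstNonEmpty(
            spanAttrs.get("metrickit.metadata.os_version"),
            resourceAttrs.get("os.version")));
        return labels;
    }

    public static void main(String[] args) {
        Map<String, String> resource = Map.of(
            "service.name", "MyApp", "service.version", "2.1.0",
            "device.model.identifier", "iPhone15,2", "os.version", "17.4.1");
        System.out.println(labels(resource, Map.of()));
    }
}
```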
MAL Rules
Create oap-server/server-starter/src/main/resources/otel-rules/ios/ios-metrickit.yaml:
expSuffix: service(['service_name'], Layer.IOS)
metricPrefix: meter_ios
metricsRules:
  # App responsiveness: percentile across devices (P50 = median)
  - name: app_launch_time_percentile
    exp: metrickit_app_launch_time_histogram.sum(['service_name', 'service_instance_id', 'le']).histogram().histogram_percentile([50,75,90,95,99])
  - name: hang_time_percentile
    exp: metrickit_hang_time_histogram.sum(['service_name', 'service_instance_id', 'le']).histogram().histogram_percentile([50,75,90,95,99])
  - name: hang_time_sum
    exp: metrickit_hang_time.sum(['service_name', 'service_instance_id'])
  # Stability: sum across devices. Foreground/background are reported separately so
  # watchdog kills and background-task crashes don't hide behind total crash counts.
  # Normal graceful exits are not aggregated (no diagnostic signal).
  - name: foreground_abnormal_exit_count
    exp: metrickit_foreground_abnormal_exit_count.sum(['service_name', 'service_instance_id'])
  - name: background_abnormal_exit_count
    exp: metrickit_background_abnormal_exit_count.sum(['service_name', 'service_instance_id'])
  - name: background_oom_kill_count
    exp: metrickit_background_oom_kill_count.sum(['service_name', 'service_instance_id'])
  # Resource usage
  - name: peak_memory
    exp: metrickit_peak_memory.max(['service_name', 'service_instance_id'])
  - name: cpu_time
    exp: metrickit_cpu_time.sum(['service_name', 'service_instance_id'])
  - name: gpu_time
    exp: metrickit_gpu_time.sum(['service_name', 'service_instance_id'])
  - name: disk_write
    exp: metrickit_disk_write.sum(['service_name', 'service_instance_id'])
  # Network transfer: average per device
  - name: wifi_download
    exp: metrickit_wifi_download.avg(['service_name', 'service_instance_id'])
  - name: wifi_upload
    exp: metrickit_wifi_upload.avg(['service_name', 'service_instance_id'])
  - name: cellular_download
    exp: metrickit_cellular_download.avg(['service_name', 'service_instance_id'])
  - name: cellular_upload
    exp: metrickit_cellular_upload.avg(['service_name', 'service_instance_id'])
  # UI quality
  - name: scroll_hitch_ratio
    exp: metrickit_scroll_hitch_ratio.avg(['service_name', 'service_instance_id'])
The listener emits histogram-bucketed samples (with le labels) for app launch time and
hang time, enabling histogram_percentile to compute P50/P75/P90/P95/P99 across the device fleet.
Bucket ceiling: both histograms top out at a finite 30 s bucket rather than +Inf. MAL
parses le="Infinity" to (long) Double.POSITIVE_INFINITY = Long.MAX_VALUE and surfaces it
verbatim in percentile queries; on a dashboard that renders as ~9.2×10¹⁸, which is worse than
a visibly alarming but human-readable cap. Values above 30 s are vanishingly rare for iOS app
launch / hang observations (MetricKit itself hard-caps hangs near 30 s), so the finite sentinel
preserves percentile accuracy without breaking the UI.
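The overflow behind this choice is easy to reproduce in plain Java: narrowing positive infinity to long saturates at Long.MAX_VALUE. The sketch below shows that cast next to a finite-ceiling bucket lookup. The bucket bounds are hypothetical, expressed in milliseconds; only the 30 s ceiling comes from the text.

```java
// Hypothetical illustration of the le="Infinity" cast and a finite bucket ceiling.
final class BucketCeilingSketch {
    // Assumed bounds in milliseconds; the last one is the 30 s ceiling.
    static final long[] BOUNDS_MS = {250, 500, 1_000, 2_000, 5_000, 10_000, 30_000};

    // Parses an le label as a double and narrows it to long.
    static long leAsLong(String le) {
        return (long) Double.parseDouble(le);
    }

    // Smallest bound containing the value; anything above the ceiling is
    // counted in the last (30 s) bucket instead of a +Inf bucket.
    static long bucketFor(long valueMs) {
        for (long bound : BOUNDS_MS) {
            if (valueMs <= bound) {
                return bound;
            }
        }
        return BOUNDS_MS[BOUNDS_MS.length - 1];
    }

    public static void main(String[] args) {
        System.out.println(leAsLong("Infinity") == Long.MAX_VALUE);
        System.out.println(bucketFor(850));
        System.out.println(bucketFor(45_000));
    }
}
```

With a finite top bucket, the largest value a percentile query can return is 30000 rather than 9223372036854775807.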
Aggregation Example
Given 3 devices reporting on the same day for service “MyApp”:
Device A: appLaunchTime=850ms, peakMemory=150MB, abnormalExitCount=2, wifiDownload=50MB
Device B: appLaunchTime=1200ms, peakMemory=200MB, abnormalExitCount=0, wifiDownload=80MB
Device C: appLaunchTime=900ms, peakMemory=140MB, abnormalExitCount=1, wifiDownload=30MB
Resulting daily metrics:
| Metric | Aggregation | Result |
|---|---|---|
| meter_ios_app_launch_time_percentile P50 | histogram_percentile | 900 ms |
| meter_ios_app_launch_time_percentile P90 | histogram_percentile | 1200 ms |
| meter_ios_peak_memory | max(150, 200, 140) | 200 MB |
| meter_ios_foreground_abnormal_exit_count | sum(2, 0, 1) | 3 crashes |
| meter_ios_wifi_download | avg(50, 80, 30) | 53 MB |
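The max, sum and avg rows can be checked with a few lines of arithmetic. This sketch reproduces only those three aggregations for devices A, B and C; it does not model histogram_percentile, and FleetAggregationSketch is a hypothetical name.

```java
import java.util.stream.DoubleStream;

// Hypothetical re-computation of the cross-device aggregations in the example.
final class FleetAggregationSketch {
    static double max(double... values) {
        return DoubleStream.of(values).max().orElse(Double.NaN);
    }

    static double sum(double... values) {
        return DoubleStream.of(values).sum();
    }

    static double avg(double... values) {
        return DoubleStream.of(values).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        System.out.println("peak memory (MB): " + max(150, 200, 140));
        System.out.println("abnormal exits:   " + sum(2, 0, 1));
        System.out.println("wifi download (MB, rounded): " + Math.round(avg(50, 80, 30)));
    }
}
```

The average is 160 / 3, about 53.3 MB, which the table rounds to 53 MB.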
10. MetricKit Diagnostic Log Processing (LAL)
MetricKit diagnostic payloads arrive as OTLP log records with InstrumentationScope: MetricKit.
The diagnostic type is identified by the name log record attribute.
LogData Input to LAL
After the changes in Sections 2–3, the LogData seen by LAL for a crash diagnostic:
LogMetadata {
  service: "MyApp"
  serviceInstance: ""      ← empty (SDK doesn't set service.instance.id)
  layer: ""                ← empty (SDK doesn't set service.layer)
  sourceAttributes: {      ← NEW: non-persistent, from OTLP resource
    "os.name": "iOS",
    "os.version": "17.4.1",
    "device.model.identifier": "iPhone15,2",
    "service.name": "MyApp",
    "service.version": "2.1.0 (45)",
    "telemetry.sdk.language": "swift",
    "telemetry.sdk.name": "opentelemetry",
    "os.type": "darwin"
  }
}
LogData (proto) {
  tags: [                  ← only log record attributes, persisted in tagsRawData
    {key: "name", value: "metrickit.diagnostic.crash"},
    {key: "exception.type", value: "EXC_BAD_ACCESS"},
    {key: "exception.message", value: "Could not access memory"},
    {key: "exception.stacktrace", value: "0 CoreFoundation ..."},
    {key: "metrickit.diagnostic.crash.exception.signal", value: "11"},
    {key: "metrickit.diagnostic.crash.exception.signal.name", value: "SIGSEGV"}
  ]
  body.text: ""            ← MetricKit logs have empty body
}
Key distinction:
- sourceAttributes → readable via sourceAttribute() in LAL, NOT persisted
- tags → readable via tag() in LAL, persisted in tagsRawData
- LAL extractor tag 'key': value → adds to both persistent tags and searchable tags
Diagnostic Types
| Diagnostic Type | name Attribute | Key Attributes |
|---|---|---|
| Crash | metrickit.diagnostic.crash | exception.type, exception.message, exception.stacktrace, metrickit.diagnostic.crash.exception.signal.name |
| Hang | metrickit.diagnostic.hang | exception.stacktrace, metrickit.diagnostic.hang.hang_duration |
| CPU Exception | metrickit.diagnostic.cpu_exception | metrickit.diagnostic.cpu_exception.total_cpu_time |
| Disk Write Exception | metrickit.diagnostic.disk_write_exception | metrickit.diagnostic.disk_write_exception.total_writes_caused |
| App Launch (iOS 16+) | metrickit.diagnostic.app_launch | metrickit.diagnostic.app_launch.launch_duration |
LAL Rules
Create oap-server/server-starter/src/main/resources/lal/ios-metrickit.yaml:
rules:
  - name: ios-metrickit-diagnostics
    layer: auto   # layer determined by script; dropped if not set
    dsl: |
      filter {
        // Only match iOS/iPadOS logs
        if (sourceAttribute("os.name") != "iOS" && sourceAttribute("os.name") != "iPadOS") {
          abort {}
        }
        // Only match MetricKit diagnostic logs
        if (tag("name") == null || !tag("name").startsWith("metrickit.diagnostic.")) {
          abort {}
        }
        extractor {
          layer IOS
          // Set instance from service.version (SDK doesn't set service.instance.id)
          instance sourceAttribute("service.version")
          // Selectively copy useful source attributes into persistent tags
          tag 'device.model': sourceAttribute("device.model.identifier")
          tag 'os.version': sourceAttribute("os.version")
          // Copy diagnostic details from log record tags
          tag 'diagnosticType': tag("name")
          tag 'exception.type': tag("exception.type")
          tag 'exception.message': tag("exception.message")
          tag 'exception.stacktrace': tag("exception.stacktrace")
          tag 'signal.name': tag("metrickit.diagnostic.crash.exception.signal.name")
          tag 'hang.duration': tag("metrickit.diagnostic.hang.hang_duration")
        }
        sink {
          // Store all diagnostics; they are already rare (once/day batches from real devices)
        }
      }
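For readers more comfortable with Java than with the LAL DSL, the two abort conditions in the filter correspond to the predicate below. It is an illustrative restatement only: IosDiagnosticFilter is a hypothetical class, and the maps stand in for sourceAttribute() and tag() lookups.

```java
import java.util.Map;

// Hypothetical Java restatement of the filter's two abort conditions.
final class IosDiagnosticFilter {
    static final String DIAGNOSTIC_PREFIX = "metrickit.diagnostic.";

    // True when the rule would proceed to the extractor; false when it aborts.
    static boolean accepts(Map<String, String> sourceAttributes, Map<String, String> tags) {
        String os = sourceAttributes.get("os.name");
        if (!"iOS".equals(os) && !"iPadOS".equals(os)) {
            return false; // not an iOS/iPadOS source
        }
        String name = tags.get("name");
        return name != null && name.startsWith(DIAGNOSTIC_PREFIX);
    }

    public static void main(String[] args) {
        System.out.println(accepts(
            Map.of("os.name", "iOS"),
            Map.of("name", "metrickit.diagnostic.crash")));
        System.out.println(accepts(
            Map.of("os.name", "macOS"),
            Map.of("name", "metrickit.diagnostic.crash")));
    }
}
```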
11. UI Menu and Dashboards
Menu Configuration
Add to oap-server/server-starter/src/main/resources/ui-initialized-templates/menu.yaml:
- title: Mobile
  icon: mobile
  description: Mobile application monitoring via OpenTelemetry SDKs.
  i18nKey: mobile
  menus:
    - title: iOS
      layer: IOS
      description: iOS/iPadOS app monitoring via OpenTelemetry Swift SDK.
      documentLink: https://skywalking.apache.org/docs/main/next/en/setup/service-agent/ios-monitoring/
      i18nKey: ios
Dashboard Templates
Create dashboards under ui-initialized-templates/ios/:
ios-root.json — Root list view of all iOS app services.
ios-service.json — Per-app dashboard:
| Panel Group | Metrics | Source |
|---|---|---|
| HTTP Traffic | service_cpm, service_resp_time, service_sla, service_percentile | OAL (from IOSHTTPSpanListener) |
| App Launch | meter_ios_app_launch_time_percentile | MetricKit MAL |
| Stability | meter_ios_foreground_abnormal_exit_count, meter_ios_background_oom_kill_count | MetricKit MAL |
| Memory | meter_ios_peak_memory | MetricKit MAL |
| Network Transfer | meter_ios_wifi_download, meter_ios_cellular_download, etc. | MetricKit MAL |
| Responsiveness | meter_ios_hang_time_percentile, meter_ios_hang_time_sum | MetricKit MAL |
ios-instance.json — Per-version dashboard (instance = app version):
- HTTP traffic metrics scoped to instance (service_instance_cpm, service_instance_resp_time, service_instance_sla)
- MetricKit metrics per version
ios-endpoint.json — Per-domain dashboard (endpoint = net.peer.name domain):
- endpoint_cpm, endpoint_resp_time, endpoint_sla, endpoint_percentile (from OAL)
UI Side
A separate PR in skywalking-booster-ui is needed for i18n menu entries for the “Mobile” group and “iOS” sub-item.
Imported Dependencies libs and their licenses.
No new dependencies. All processing uses existing OTLP receiver, OAL, LAL, and meter infrastructure.
Compatibility
- Configuration: New layer IOS and menu entry. Additive, no breaking change.
- Storage: No new storage structures. Uses existing trace, metrics, and log storage.
- Protocols: No protocol changes. Uses existing OTLP receiver.
- LogMetadata: New sourceAttributes field, backward compatible. Existing receivers that don’t populate it get an empty map. Existing LAL rules that don’t call sourceAttribute() are unaffected.
- LAL layer: auto mode: Additive. Existing rules with specific layers (GENERAL, MESH, etc.) are unaffected. Only new rules can opt into auto mode to match logs with absent layer.
- Drop policy for auto rules: In auto mode, logs where the script does not set a layer trigger a warning and are dropped. This only affects logs routed to auto rules; logs with explicit layers are unaffected.
General usage docs
Prerequisites
- iOS app instrumented with OpenTelemetry Swift SDK v1.12+
- OTLP/HTTP exporter pointing to SkyWalking OAP (port 4318) or an OTel Collector
iOS App Setup
import Foundation
import MetricKit
import OpenTelemetryApi
import OpenTelemetrySdk
import OpenTelemetryProtocolExporterHTTP
import ResourceExtension
import URLSessionInstrumentation

// Resource attributes (device, OS, app info; auto-collected)
let resources = DefaultResources().get()

// OTLP exporters
let traceExporter = OtlpHttpTraceExporter(
    endpoint: URL(string: "http://<oap-host>:4318/v1/traces")!
)
let logExporter = OtlpHttpLogExporter(
    endpoint: URL(string: "http://<oap-host>:4318/v1/logs")!
)

// TracerProvider
let tracerProvider = TracerProviderBuilder()
    .add(spanProcessor: BatchSpanProcessor(spanExporter: traceExporter))
    .with(resource: resources)
    .build()
OpenTelemetry.registerTracerProvider(tracerProvider: tracerProvider)

// LoggerProvider (for MetricKit diagnostics)
let loggerProvider = LoggerProviderBuilder()
    .with(resource: resources)
    .with(processors: [SimpleLogRecordProcessor(logRecordExporter: logExporter)])
    .build()
OpenTelemetry.registerLoggerProvider(loggerProvider: loggerProvider)

// Auto-instrument URLSession (exclude collector URL to avoid feedback loop)
let config = URLSessionInstrumentationConfiguration(
    shouldInstrument: { request in
        return request.url?.host != "<oap-host>"
    }
)
let _ = URLSessionInstrumentation(configuration: config)

// MetricKit (real device only, not Simulator)
let metricKit = MetricKitInstrumentation()
MXMetricManager.shared.add(metricKit)
SkyWalking OAP Configuration
Enable the OTLP receiver and LAL rules in application.yml:
receiver-otel:
  selector: ${SW_OTEL_RECEIVER:default}
  default:
    enabledHandlers: ${SW_OTEL_RECEIVER_ENABLED_HANDLERS:"otlp-traces,otlp-logs"}
log-analyzer:
  selector: ${SW_LOG_ANALYZER:default}
  default:
    lalFiles: ${SW_LOG_LAL_FILES:"ios-metrickit"}
What You’ll See
- Mobile > iOS menu appears when iOS services are detected
- Trace view shows individual HTTP requests with latency, status, URL, device model, carrier
- Service dashboard shows HTTP performance (real-time) + MetricKit stats (daily)
- Log view shows crash/hang diagnostic events with native stack traces, tagged with device model and OS version
Limitations
- MetricKit data (daily stats, crash diagnostics) requires a real iOS device — not available in Simulator
- MetricKit stats are delivered approximately once per day — not real-time
- Screen/view transition tracking is not automatic — the OTel Swift SDK does not instrument UIViewController or SwiftUI lifecycle
- Carrier info is iOS-only (not available on macOS)