Profiling

The profiling module profiles the processes found by Service Discovery and sends the snapshots to the backend server.

Configuration

Name	Default	Environment Key	Description
profiling.active	true	ROVER_PROFILING_ACTIVE	Whether process profiling is active.
profiling.check_interval	10s	ROVER_PROFILING_CHECK_INTERVAL	The interval at which profiling tasks are checked.
profiling.flush_interval	5s	ROVER_PROFILING_FLUSH_INTERVAL	The interval at which existing profiling data is combined and reported to the backend.
profiling.task.on_cpu.dump_period	9ms	ROVER_PROFILING_TASK_ON_CPU_DUMP_PERIOD	The period between profiling stack dumps.
profiling.task.network.report_interval	2s	ROVER_PROFILING_TASK_NETWORK_TOPOLOGY_REPORT_INTERVAL	The interval at which metrics are sent to the backend.
profiling.task.network.meter_prefix	rover_net_p	ROVER_PROFILING_TASK_NETWORK_TOPOLOGY_METER_PREFIX	The prefix of network profiling metric names.
profiling.task.network.protocol_analyze.per_cpu_buffer	400KB	ROVER_PROFILING_TASK_NETWORK_PROTOCOL_ANALYZE_PER_CPU_BUFFER	The size of the socket data buffer on each CPU.
profiling.task.network.protocol_analyze.parallels	2	ROVER_PROFILING_TASK_NETWORK_PROTOCOL_ANALYZE_PARALLELS	The number of parallel protocol analyzers.
profiling.task.network.protocol_analyze.queue_size	5000	ROVER_PROFILING_TASK_NETWORK_PROTOCOL_ANALYZE_QUEUE_SIZE	The queue size of each parallel analyzer.
profiling.task.network.protocol_analyze.sampling.http.default_request_encoding	UTF-8	ROVER_PROFILING_TASK_NETWORK_PROTOCOL_ANALYZE_SAMPLING_HTTP_DEFAULT_REQUEST_ENCODING	The default body encoding when sampling the request.
profiling.task.network.protocol_analyze.sampling.http.default_response_encoding	UTF-8	ROVER_PROFILING_TASK_NETWORK_PROTOCOL_ANALYZE_SAMPLING_HTTP_DEFAULT_RESPONSE_ENCODING	The default body encoding when sampling the response.
profiling.continuous.meter_prefix	rover_con_p	ROVER_PROFILING_CONTINUOUS_METER_PREFIX	The prefix of continuous profiling meter names.
profiling.continuous.fetch_interval	1s	ROVER_PROFILING_CONTINUOUS_FETCH_INTERVAL	The interval at which metrics are fetched from the system, such as Process CPU, System Load, etc.
profiling.continuous.check_interval	5s	ROVER_PROFILING_CONTINUOUS_CHECK_INTERVAL	The interval at which metrics are checked against the thresholds.
profiling.continuous.trigger.execute_duration	10m	ROVER_PROFILING_CONTINUOUS_TRIGGER_EXECUTE_DURATION	The duration of the profiling task.
profiling.continuous.trigger.silence_duration	20m	ROVER_PROFILING_CONTINUOUS_TRIGGER_SILENCE_DURATION	The minimum duration between executions of the same profiling task.
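
The relationship between a setting's default and its environment key can be illustrated with a small sketch. It assumes that a value set through the environment key takes precedence over the default, which the table implies but does not state. The helper names (`resolve`, `parse_duration_ms`) are hypothetical and are not part of Rover.

```python
import os
import re

# A few rows copied from the table above: name -> (environment key, default).
DEFAULTS = {
    "profiling.check_interval": ("ROVER_PROFILING_CHECK_INTERVAL", "10s"),
    "profiling.flush_interval": ("ROVER_PROFILING_FLUSH_INTERVAL", "5s"),
    "profiling.task.on_cpu.dump_period": (
        "ROVER_PROFILING_TASK_ON_CPU_DUMP_PERIOD", "9ms"),
    "profiling.continuous.trigger.silence_duration": (
        "ROVER_PROFILING_CONTINUOUS_TRIGGER_SILENCE_DURATION", "20m"),
}

# Only the units that appear in the table are handled here.
_UNIT_MS = {"ms": 1, "s": 1000, "m": 60_000}


def parse_duration_ms(text):
    """Convert a duration such as '9ms', '10s' or '20m' to milliseconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m)", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_MS[match.group(2)]


def resolve(name, environ=None):
    """Return the environment value if set, otherwise the table default.

    Assumption: the environment key overrides the default.
    """
    environ = os.environ if environ is None else environ
    env_key, default = DEFAULTS[name]
    return environ.get(env_key, default)
```

For example, `parse_duration_ms(resolve("profiling.task.on_cpu.dump_period", {}))` yields the default dump period in milliseconds, and passing a mapping that contains the environment key yields the overridden value instead.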

Prepare service

Before profiling your service, make sure its binary file contains symbol data so that stack symbols can be resolved. You can check this in either of the following ways:

objdump: Run objdump --syms path/to/service.
readelf: Run readelf --syms path/to/service.
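
The readelf check can also be scripted. The following is a minimal sketch, assuming the usual GNU readelf output layout ("Symbol table '.symtab' contains N entries:" followed by Num/Value/Size/Type/Bind/Vis/Ndx/Name rows). It counts defined function symbols in `.symtab`. Whether Rover needs exactly this table is an assumption, and both function names are hypothetical.

```python
import subprocess


def count_symtab_functions(readelf_output):
    """Count defined FUNC symbols listed under the '.symtab' table."""
    in_symtab = False
    count = 0
    for line in readelf_output.splitlines():
        if line.startswith("Symbol table '"):
            # Track which table the following rows belong to.
            in_symtab = line.startswith("Symbol table '.symtab'")
            continue
        fields = line.split()
        # Rows: Num: Value Size Type Bind Vis Ndx Name
        if (in_symtab and len(fields) >= 8 and fields[0].endswith(":")
                and fields[3] == "FUNC" and fields[6] != "UND"):
            count += 1
    return count


def has_symbol_data(path):
    """Run `readelf --syms` on the binary and check for function symbols."""
    result = subprocess.run(
        ["readelf", "--syms", path],
        capture_output=True, text=True, check=True,
    )
    return count_symtab_functions(result.stdout) > 0
```

A stripped binary typically keeps only `.dynsym`, so `count_symtab_functions` returns 0 for it.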

Profiling Type

All profiling tasks use official Linux functions and kprobe or uprobe to open perf events, and attach eBPF programs to dump stacks.

On CPU

The On CPU profiling task uses PERF_COUNT_SW_CPU_CLOCK to profile the process based on the CPU clock.
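
To give a feel for how clock-based samples are usually read, the sketch below converts dumped stacks into estimated CPU time. It assumes that each dumped stack stands for one dump period of CPU time (9ms by default, per `profiling.task.on_cpu.dump_period`), which is the common interpretation of sampling profilers rather than a statement about Rover's internals. The stack strings and the function name are hypothetical.

```python
from collections import Counter

DUMP_PERIOD_MS = 9  # default of profiling.task.on_cpu.dump_period


def estimate_cpu_time_ms(sampled_stacks, period_ms=DUMP_PERIOD_MS):
    """Estimate CPU time per stack as (number of samples) * (dump period)."""
    hits_per_stack = Counter(sampled_stacks)
    return {stack: hits * period_ms for stack, hits in hits_per_stack.items()}
```

With this reading, a stack that appears in two dumps accounts for roughly 18ms of CPU time at the default period.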

Off CPU

The Off CPU profiling task attaches to finish_task_switch through kprobe to profile the process.

Network

The Network profiling task intercepts IO-related syscalls and uprobes in the process to identify network traffic and generate metrics. The following protocols can be analyzed when the traffic uses the OpenSSL library, the BoringSSL library, GoTLS, NodeTLS, or plaintext:

HTTP/1.x
HTTP/2
MySQL
CQL (The Cassandra Query Language)
MongoDB
Kafka
DNS

Collecting data

Network profiling sends its data to the backend service as metrics and logs.

Data Type

The network profiling has customized the following two types of metrics to represent the network data:

Counter: Records the total amount of data in a certain period of time. Each counter contains the following data:
1. Count: The count of the execution.
2. Bytes: The package size of the execution.
3. Exe Time: The time consumed (in nanoseconds) by the execution.
Histogram: Records the distribution of the data in the bucket.
TopN: Records the highest-latency data in a certain period of time.
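
The data types above can be sketched as small in-memory structures. This is an illustration only: the class and field names, the bucket boundaries, and the heap-based TopN are assumptions made for the example and do not describe Rover's implementation.

```python
import bisect
import heapq
from dataclasses import dataclass, field


@dataclass
class Counter:
    """Accumulates count, bytes and execution time over a period."""
    count: int = 0
    total_bytes: int = 0
    exe_time_ns: int = 0

    def record(self, size, duration_ns):
        self.count += 1
        self.total_bytes += size
        self.exe_time_ns += duration_ns


@dataclass
class Histogram:
    """Counts values per bucket; bucket i covers [bounds[i], bounds[i+1])."""
    bounds: list  # ascending lower bounds (hypothetical values)
    counts: list = field(default_factory=list)

    def __post_init__(self):
        self.counts = [0] * len(self.bounds)

    def record(self, value):
        index = bisect.bisect_right(self.bounds, value) - 1
        # Values below the first bound fall into the first bucket.
        self.counts[max(index, 0)] += 1


@dataclass
class TopN:
    """Keeps the `size` entries with the highest latency."""
    size: int
    _heap: list = field(default_factory=list)

    def record(self, latency, item):
        entry = (latency, item)
        if len(self._heap) < self.size:
            heapq.heappush(self._heap, entry)
        else:
            # Push the new entry, then drop the smallest one.
            heapq.heappushpop(self._heap, entry)

    def items(self):
        return sorted(self._heap, reverse=True)
```

A min-heap keeps the TopN update cheap: only the current smallest entry has to be compared against each new record.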

Labels

Each metric contains the following labels to identify the process relationship:

Name	Type	Description
client_process_id or server_process_id	string	The ID of the current process, which is determined by the role of the current process in the connection as server or client.
client_local or server_local	boolean	Whether the remote process is a local process.
client_address or server_address	string	The remote process address, e.g. `IP:port`.
side	enum	The current process is either “client” or “server” in this connection.
protocol	string	The protocol identified from the package data content.
is_ssl	bool	Whether the current connection uses SSL.

Layer-4 Data

Based on the above two data types, the following metrics are provided.

Name	Type	Unit	Description
write	Counter	nanosecond	The socket write counter
read	Counter	nanosecond	The socket read counter
write RTT	Counter	microsecond	The socket write RTT counter
connect	Counter	nanosecond	The socket connect/accept with other server/client counter
close	Counter	nanosecond	The socket close counter
retransmit	Counter	nanosecond	The socket retransmit package counter
drop	Counter	nanosecond	The socket drop package counter
write RTT	Histogram	microsecond	The socket write RTT execute time histogram
write execute time	Histogram	nanosecond	The socket write data execute time histogram
read execute time	Histogram	nanosecond	The socket read data execute time histogram
connect execute time	Histogram	nanosecond	The socket connect/accept with other server/client execute time histogram
close execute time	Histogram	nanosecond	The socket close execute time histogram

HTTP/1.x Data

Metrics

Name	Type	Unit	Description
http1_request_cpm	Counter	count	The HTTP request counter
http1_response_status_cpm	Counter	count	The count per HTTP response code
http1_request_package_size	Histogram	Byte size	The request package size
http1_response_package_size	Histogram	Byte size	The response package size
http1_client_duration	Histogram	millisecond	The duration of a single HTTP response on the client side
http1_server_duration	Histogram	millisecond	The duration of a single HTTP response on the server side

Logs

Name	Type	Unit	Description
slow_traces	TopN	millisecond	The Top N slow trace IDs
status_4xx	TopN	millisecond	The Top N trace IDs with a response status in 400-499
status_5xx	TopN	millisecond	The Top N trace IDs with a response status in 500-599
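
The way a single response maps onto these log names can be sketched as follows. The status ranges come from the table above. The `slow_threshold_ms` parameter is hypothetical, since this page does not say how "slow" is decided, and the function itself is only an illustration.

```python
def classify_response(status, duration_ms, slow_threshold_ms):
    """Return the log names a response would qualify for.

    `slow_threshold_ms` is a hypothetical cut-off used only in this sketch.
    """
    names = []
    if duration_ms >= slow_threshold_ms:
        names.append("slow_traces")
    if 400 <= status <= 499:
        names.append("status_4xx")
    elif 500 <= status <= 599:
        names.append("status_5xx")
    return names
```

A response can qualify for more than one name, for example a slow request that also ends with a 404.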

Span Attached Event

Name	Description
HTTP Request Sampling	Complete information about the HTTP request. It is only reported when it matches slow/4xx/5xx traces.
HTTP Response Sampling	Complete information about the HTTP response. It is only reported when it matches slow/4xx/5xx traces.
Syscall xxx	The network-related syscall methods invoked by the process. It is only reported when it matches slow/4xx/5xx traces.

Continuous Profiling

The continuous profiling feature monitors target process information at low overhead, including process CPU usage and network requests, based on configuration passed from the backend. When a threshold is met, it automatically starts a profiling task (On CPU, Off CPU, or Network) to provide more detailed analysis.
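
The interplay between a threshold and `profiling.continuous.trigger.silence_duration` can be sketched as below. The sketch assumes that a threshold is "met" when the value is greater than or equal to it, and that the silence period is measured from the start of the previous execution. Neither detail is specified on this page, and the `Trigger` class is hypothetical.

```python
from dataclasses import dataclass, field

SILENCE_DURATION_S = 20 * 60  # default of trigger.silence_duration (20m)


@dataclass
class Trigger:
    threshold: float
    silence_s: int = SILENCE_DURATION_S
    # Task name -> timestamp (seconds) of the last started execution.
    last_started: dict = field(default_factory=dict)

    def should_start(self, task, value, now_s):
        """Start a task when the threshold is met and the task is not silenced."""
        if value < self.threshold:
            return False
        previous = self.last_started.get(task)
        if previous is not None and now_s - previous < self.silence_s:
            return False  # still inside the silence period
        self.last_started[task] = now_s
        return True
```

Passing the timestamp in explicitly keeps the sketch deterministic and easy to test.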

Monitor Type

System Load

Monitor the average system load for the last minute, which is equivalent to using the first value of the load average in the uptime command.
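
For reference, the same one-minute value can be read from Python's standard library. This is only a way to inspect the number on a Unix-like host, not a description of how Rover collects it, and the function name is hypothetical.

```python
import os


def system_load_last_minute():
    """Return the first (one-minute) value of the system load average."""
    one_minute, _five_minutes, _fifteen_minutes = os.getloadavg()
    return one_minute
```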

Process CPU

Monitor the percentage of the current host's CPU that the target process uses.

Process Thread Count

The real-time number of threads in the target process.
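
On Linux, one way to observe this number is the `Threads:` field of `/proc/<pid>/status`. The sketch below assumes that source and is not necessarily how Rover reads the value. Both function names are hypothetical.

```python
def parse_thread_count(status_text):
    """Extract the thread count from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split(":", 1)[1])
    raise ValueError("no Threads field found")


def thread_count(pid):
    """Read the current thread count of a process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_thread_count(handle.read())
```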

Network

Network monitoring uses eBPF technology to collect real-time performance data of the current process responding to requests. Requests sent upstream are not monitored by the system.

Currently, network monitoring supports parsing of the HTTP/1.x protocol and supports the following types of monitoring:

Error Rate: The percentage of network requests that result in errors. For example, HTTP status codes in the range [500, 600) are considered errors.
Avg Response Time: The average response time (ms) for the specified URI.
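
Both monitoring types reduce to simple arithmetic over the observed requests. The sketch below uses the [500, 600) status range given above. Returning 0.0 when no requests were observed is an assumption of this example, and the function names are hypothetical.

```python
def error_rate_percent(statuses):
    """Percentage of responses whose status code is in [500, 600)."""
    if not statuses:
        return 0.0  # assumption: no requests means no errors
    errors = sum(1 for status in statuses if 500 <= status < 600)
    return 100.0 * errors / len(statuses)


def avg_response_time_ms(durations_ms):
    """Average response time in milliseconds for one URI."""
    if not durations_ms:
        return 0.0  # assumption: no requests means a zero average
    return sum(durations_ms) / len(durations_ms)
```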

Metrics

Rover periodically sends the collected monitoring data to the backend using the Native Meter Protocol.

Name	Unit	Description
process_cpu	(0-100)%	The CPU usage percentage
process_thread_count	count	The thread count of the process
system_load	count	The average system load for the last minute; every process has the same value
http_error_rate	(0-100)%	The network request error rate percentage
http_avg_response_time	ms	The network average response duration
