Go App Profiling

Go App Profiling uses pprof for sampling.

pprof is bundled within the auto-instrument agent and corresponds to In-Process Profiling.

It is delivered to the agent in the form of a task, allowing it to be enabled or disabled dynamically. When a service encounters performance issues (high CPU usage, excessive memory allocation, etc.), a pprof task can be created. When the agent receives the task, it enables pprof for sampling. After sampling is completed, the results are analyzed by asking the server to render a flame graph, which pinpoints the specific lines of business code that cause the performance problem. Note that tracing profiling in the Go agent relies on the Go runtime’s global CPU sampling used by pprof. Since only one CPU profiler can run at a time within the same instance, tracing profiling and pprof CPU profiling cannot be enabled simultaneously; if both are activated on the same instance, one of the tasks may fail to start.
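
Only one CPU profiler can be active per process in the Go runtime, which is what causes the conflict described above. The following sketch demonstrates this with nothing but the standard runtime/pprof package; the file name is illustrative:

package main

import (
    "fmt"
    "os"
    "runtime/pprof"
)

func main() {
    f, err := os.Create("cpu.pprof") // illustrative output file
    if err != nil {
        panic(err)
    }
    defer f.Close()

    // The first CPU profile starts normally.
    if err := pprof.StartCPUProfile(f); err != nil {
        panic(err)
    }
    defer pprof.StopCPUProfile()

    // A second CPU profile in the same process is rejected, because the Go
    // runtime supports only one active CPU profiler at a time. This is why
    // tracing profiling and a pprof CPU task cannot run on the same
    // instance simultaneously.
    if err := pprof.StartCPUProfile(os.Stderr); err != nil {
        fmt.Println("second CPU profiler rejected:", err)
    }
}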

Activate pprof in the OAP

OAP and the agent use a brand-new protocol to exchange pprof data, so it is necessary to start OAP with the following configuration:

receiver-pprof:
  selector: ${SW_RECEIVER_PPROF:default}
  default:
    # Used to set the maximum size of the pprof file that can be received, in bytes. The default is 30MB (31457280).
    pprofMaxSize: ${SW_RECEIVER_PPROF_MAX_SIZE:31457280}
    # Used to determine whether to receive pprof data in memory file mode or physical file mode.
    #
    # The memory file mode has fewer local file system limitations and is therefore the default, but it uses more memory.
    #
    # The physical file mode uses less memory when parsing and is more friendly to parsing large files.
    # However, if the storage of the tmp directory in the container is insufficient, the OAP server instance may crash.
    # It is recommended to use physical file mode when volume mounting is used or the tmp directory has sufficient storage.
    memoryParserEnabled: ${SW_RECEIVER_PPROF_MEMORY_PARSER_ENABLED:true}

pprof Task with Analysis

To use the pprof feature, please follow these steps:

  1. Create pprof task: Use the UI or CLI tool to create a task.
  2. Wait for the agent to collect and upload data: Wait for pprof to collect the profiling data and report it.
  3. Query task progress: Query the progress of the task, including which instances succeeded or failed and the task logs.
  4. Analyze the data: Analyze the pprof data to determine where performance bottlenecks exist in the service.

Create a pprof task

Create a pprof task to notify some go-agent instances in the target service to start pprof for data collection.

When creating a task, the following configuration fields are required (an illustrative sketch of the task follows the list):

  1. serviceId: Define the service to execute the task.
  2. serviceInstanceIds: Define which instances need to execute tasks.
  3. duration: Define the duration of this task in minutes; required for CPU, BLOCK, and MUTEX events.
  4. events: Define which event types this task needs to collect.
  5. dumpPeriod: Define the period of the pprof dump; required for BLOCK and MUTEX events.
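
For orientation only, the fields above can be pictured as the following Go structure. The struct and the example values are hypothetical illustrations, not the agent's or OAP's actual API types:

package task

// PprofTask is a hypothetical representation of a pprof task; the field
// names follow the configuration fields listed above.
type PprofTask struct {
    ServiceID          string   // serviceId: the service that executes the task
    ServiceInstanceIDs []string // serviceInstanceIds: instances that need to execute the task
    Duration           int      // duration: task duration in minutes (CPU, BLOCK, MUTEX)
    Events             []string // events: event types to collect, e.g. "CPU", "HEAP", "BLOCK"
    DumpPeriod         int      // dumpPeriod: dump period for BLOCK and MUTEX events
}

// Example: profile CPU for 5 minutes on two instances (illustrative values).
var example = PprofTask{
    ServiceID:          "order-service",
    ServiceInstanceIDs: []string{"instance-1", "instance-2"},
    Duration:           5,
    Events:             []string{"CPU"},
}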

When the agent receives a pprof task from the OAP, it automatically generates a log to indicate that the task has been acknowledged. The log contains the following fields:

  1. Instance: The name of the instance where the Agent is located.
  2. Type: One of “NOTIFIED”, “EXECUTION_FINISHED”, “PPROF_UPLOAD_FILE_TOO_LARGE_ERROR”, and “EXECUTION_TASK_ERROR”; the current log displays “NOTIFIED”.
  3. Time: The time when the Agent received the task.

Wait for the agent to collect data and upload

At this point, pprof will collect the event types you selected when creating the task (a sketch of the corresponding runtime/pprof calls follows the list):

  1. CPU: samples CPU usage over time to show which functions consume the most processing time.
  2. ALLOC, HEAP:
    • HEAP: a sampling of memory allocations of live objects.
    • ALLOC: a sampling of all past memory allocations.
  3. BLOCK, MUTEX:
    • BLOCK: stack traces that led to blocking on synchronization primitives.
    • MUTEX: stack traces of holders of contended mutexes.
  4. GOROUTINE, THREADCREATE:
    • GOROUTINE: stack traces of all current goroutines.
    • THREADCREATE: stack traces that led to the creation of new OS threads.
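
As a rough sketch of how these event types map onto the Go standard library, the snippet below produces the corresponding profiles directly with runtime/pprof; the sampling window, rates, and file names are illustrative and are not the agent's actual settings:

package main

import (
    "os"
    "runtime"
    "runtime/pprof"
    "time"
)

func main() {
    // CPU: time-based sampling of where CPU time is spent.
    cpuFile, _ := os.Create("cpu.pprof")
    pprof.StartCPUProfile(cpuFile)
    time.Sleep(30 * time.Second) // illustrative sampling window
    pprof.StopCPUProfile()
    cpuFile.Close()

    // BLOCK and MUTEX only record data once a rate/fraction is set;
    // the values below are illustrative.
    runtime.SetBlockProfileRate(1)
    runtime.SetMutexProfileFraction(1)

    // The remaining event types are snapshot profiles looked up by name.
    for _, name := range []string{
        "heap",         // HEAP: allocations of live objects
        "allocs",       // ALLOC: all past allocations
        "block",        // BLOCK: blocking on synchronization primitives
        "mutex",        // MUTEX: holders of contended mutexes
        "goroutine",    // GOROUTINE: all current goroutines
        "threadcreate", // THREADCREATE: creation of new OS threads
    } {
        f, _ := os.Create(name + ".pprof")
        pprof.Lookup(name).WriteTo(f, 0) // 0 = compressed pprof format
        f.Close()
    }
}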

Finally, the agent uploads the profile file produced by pprof to the OAP server for online performance analysis.

Query the profiling task progress

After pprof completes data collection and uploads it successfully, we can query the execution logs and status of the pprof task, which include the following information (a hypothetical response shape is sketched after the list):

  1. successInstanceIds: The instances that have executed the task successfully.
  2. errorInstanceIds: The instances that failed to execute the task.
  3. logs: All task execution logs of the current task.
    1. id: The task id.
    2. instanceId: The id of the instance which reported this task log.
    3. instanceName: The name of the instance which reported this task log.
    4. operationType: One of “NOTIFIED”, “EXECUTION_FINISHED”, “PPROF_UPLOAD_FILE_TOO_LARGE_ERROR”, and “EXECUTION_TASK_ERROR”.
    5. operationTime: The time when the operation occurred.
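
A hypothetical Go view of this query result is sketched below; the field names mirror the list above, but the types themselves are illustrative rather than the actual OAP response schema:

package progress

import "time"

// PprofTaskLog describes one execution log entry of a pprof task.
type PprofTaskLog struct {
    ID            string    // the task id
    InstanceID    string    // id of the instance which reported this log
    InstanceName  string    // name of the instance which reported this log
    OperationType string    // NOTIFIED, EXECUTION_FINISHED, PPROF_UPLOAD_FILE_TOO_LARGE_ERROR or EXECUTION_TASK_ERROR
    OperationTime time.Time // when the operation occurred
}

// PprofTaskProgress describes the overall task status.
type PprofTaskProgress struct {
    SuccessInstanceIDs []string       // instances that executed the task successfully
    ErrorInstanceIDs   []string       // instances that failed to execute the task
    Logs               []PprofTaskLog // all execution logs of the current task
}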

Analyze the profiling data

Once some agents have completed the task, we can analyze the data through the following query:

  1. taskId: The task id.
  2. instanceIds: The instances to be included in the analysis.

After the query, the following data would be returned to render a flame graph (a small tree-reconstruction sketch follows the list):

  1. taskId: The task id.
  2. elements: The flame graph nodes; the hierarchy is determined by combining “id” and “parentId”.
    1. id: The identity of the stack element.
    2. parentId: The parent element id. The dependency relationship between elements can be determined using the element id and the parent element id.
    3. codeSignature: The method signature of the tree node.
    4. total: The total number of samples of the current tree node, including its child nodes.
    5. self: The number of samples of the current tree node itself, excluding samples of its children.
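
To illustrate how “id” and “parentId” define the hierarchy of a flame graph, the sketch below rebuilds the tree from a flat element list and prints each node with its self/total sample counts. The element type and the sample data are hypothetical:

package main

import "fmt"

// Element is a hypothetical shape of one returned flame graph node; the
// field names follow the list above.
type Element struct {
    ID            string
    ParentID      string
    CodeSignature string
    Total         int64
    Self          int64
}

// buildChildren groups elements by their parent id so the hierarchy can be
// walked from the root downwards.
func buildChildren(elements []Element) map[string][]Element {
    children := make(map[string][]Element)
    for _, e := range elements {
        children[e.ParentID] = append(children[e.ParentID], e)
    }
    return children
}

// printTree prints the subtree under parentID, indenting by depth.
func printTree(children map[string][]Element, parentID string, depth int) {
    for _, e := range children[parentID] {
        fmt.Printf("%*s%s (self=%d, total=%d)\n", depth*2, "", e.CodeSignature, e.Self, e.Total)
        printTree(children, e.ID, depth+1)
    }
}

func main() {
    // Illustrative data: "0" is used here as the parent id of root elements.
    elements := []Element{
        {ID: "1", ParentID: "0", CodeSignature: "main.main", Total: 100, Self: 10},
        {ID: "2", ParentID: "1", CodeSignature: "main.handler", Total: 90, Self: 40},
        {ID: "3", ParentID: "2", CodeSignature: "encoding/json.Marshal", Total: 50, Self: 50},
    }
    printTree(buildChildren(elements), "0", 0)
}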