ALS Load Balance
Using Satellite as a load balancer between Envoy and OAP can effectively prevent OAP instances from receiving an unbalanced share of messages.
In this case, we mainly use memory queues for intermediate data storage.
Different Envoy counts and OAP performance could affect the Satellite transmit performance.
Envoy Instance | Concurrent User | ALS OPS | Satellite CPU | Satellite Memory |
---|---|---|---|---|
150 | 100 | ~50K | 1.2C | 0.5-1.0G |
150 | 300 | ~80K | 1.8C | 1.0-1.5G |
300 | 100 | ~50K | 1.4C | 0.8-1.2G |
300 | 300 | ~100K | 2.2C | 1.3-2.0G |
800 | 100 | ~50K | 1.5C | 0.9-1.5G |
800 | 300 | ~100K | 2.6C | 1.7-2.7G |
1500 | 100 | ~50K | 1.7C | 1.4-2.4G |
1500 | 300 | ~100K | 2.7C | 2.3-3.0G |
2300 | 150 | ~50K | 1.8C | 1.9-3.1G |
2300 | 300 | ~90K | 2.5C | 2.3-4.0G |
2300 | 500 | ~110K | 3.2C | 2.8-4.7G |
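For rough capacity planning, the measurements above can be kept as data and queried for the smallest tested configuration that covers an expected load. The following is a minimal sketch only: the names `MEASUREMENTS` and `closest_measurement` are hypothetical, and the lookup simply returns a measured row from the table. It does not interpolate or predict values for untested loads.

```python
# Rows copied from the table above:
# (envoy instances, concurrent users, ALS OPS, Satellite CPU cores, (memory low G, memory high G))
MEASUREMENTS = [
    (150, 100, "~50K", 1.2, (0.5, 1.0)),
    (150, 300, "~80K", 1.8, (1.0, 1.5)),
    (300, 100, "~50K", 1.4, (0.8, 1.2)),
    (300, 300, "~100K", 2.2, (1.3, 2.0)),
    (800, 100, "~50K", 1.5, (0.9, 1.5)),
    (800, 300, "~100K", 2.6, (1.7, 2.7)),
    (1500, 100, "~50K", 1.7, (1.4, 2.4)),
    (1500, 300, "~100K", 2.7, (2.3, 3.0)),
    (2300, 150, "~50K", 1.8, (1.9, 3.1)),
    (2300, 300, "~90K", 2.5, (2.3, 4.0)),
    (2300, 500, "~110K", 3.2, (2.8, 4.7)),
]


def closest_measurement(envoys, users):
    """Return the smallest measured row covering the load, or None if untested."""
    candidates = [
        row for row in MEASUREMENTS if row[0] >= envoys and row[1] >= users
    ]
    if not candidates:
        return None  # the requested load is beyond anything measured
    # Prefer the fewest Envoy instances, then the fewest concurrent users.
    return min(candidates, key=lambda row: (row[0], row[1]))


if __name__ == "__main__":
    print(closest_measurement(500, 200))
```

A result of `None` means the requested load exceeds every tested combination, so the table offers no guidance for it.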
Detail
Environment
The cluster was built in a GKE environment using Helm.
Module | Version | Replica Count | CPU Limit | Memory Limit | Description |
---|---|---|---|---|---|
OAP | 8.9.0 | 6 | 12C | 32Gi | Using ElasticSearch as Storage |
Satellite | 0.4.0 | 1 | 8C | 16Gi | |
ElasticSearch | 7.5.1 | 3 | 8C | 16Gi | |
Setting
The following values were used with 800 Envoy instances and 100K QPS of ALS messages.
Module | Environment Config | Used Value | Default Value | Description | Recommended Value |
---|---|---|---|---|---|
Satellite | SATELLITE_QUEUE_PARTITION | 50 | 4 | Number of goroutines that consume the queue concurrently | Satellite CPU count * 4-6. A higher value can improve throughput, but the default value can also handle 800 Envoy instances and 100K QPS of ALS messages. |
Satellite | SATELLITE_QUEUE_EVENT_BUFFER_SIZE | 3000 | 1000 | The size of the queue for each concurrent consumer | Depends on the number of Envoy instances. If there are many, increasing the value is recommended. |
Satellite | SATELLITE_ENVOY_ALS_V3_PIPE_RECEIVER_FLUSH_TIME | 3000 | 1000 | How long (in milliseconds) the Satellite collects received ALS messages before merging them into one Event | If some delay is acceptable, the value can be increased, which effectively reduces CPU usage and makes the Satellite more stable. |
Satellite | SATELLITE_ENVOY_ALS_V3_PIPE_SENDER_FLUSH_TIME | 3000 | 1000 | How long (in milliseconds) each goroutine accumulates memory queue data before sending it to OAP | Depends on the amount of data in the queue. It can be kept consistent with SATELLITE_ENVOY_ALS_V3_PIPE_RECEIVER_FLUSH_TIME. |
OAP | SW_CORE_GRPC_MAX_CONCURRENT_CALL | 50 | 4 | How many parallel requests are supported on a connection between Satellite and OAP | Same as SATELLITE_QUEUE_PARTITION in Satellite |
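The recommendations in the table above can be expressed as a small sizing helper. This is a minimal sketch under stated assumptions: the function name `recommend_settings` is hypothetical, the default multiplier of 5 is simply the midpoint of the recommended 4-6 range, and the default flush time of 3000 ms is the value used in this test. It only derives values; it does not apply them to Satellite or OAP.

```python
def recommend_settings(satellite_cpus, multiplier=5, flush_ms=3000):
    """Derive environment values from the recommendations in the table above.

    satellite_cpus: CPU count available to the Satellite.
    multiplier: queue partitions per CPU, within the recommended 4-6 range.
    flush_ms: flush time in milliseconds, shared by receiver and sender.
    """
    if satellite_cpus < 1:
        raise ValueError("satellite_cpus must be at least 1")
    if not 4 <= multiplier <= 6:
        raise ValueError("multiplier should stay within the recommended 4-6 range")
    partitions = satellite_cpus * multiplier
    return {
        "SATELLITE_QUEUE_PARTITION": partitions,
        # The sender flush time is kept consistent with the receiver flush time.
        "SATELLITE_ENVOY_ALS_V3_PIPE_RECEIVER_FLUSH_TIME": flush_ms,
        "SATELLITE_ENVOY_ALS_V3_PIPE_SENDER_FLUSH_TIME": flush_ms,
        # The OAP side uses the same value as SATELLITE_QUEUE_PARTITION.
        "SW_CORE_GRPC_MAX_CONCURRENT_CALL": partitions,
    }


if __name__ == "__main__":
    for name, value in recommend_settings(8).items():
        print(f"{name}={value}")
```

For the 8C Satellite in this test, the recommended range works out to 32-48 partitions, which is close to the value of 50 that was used. SATELLITE_QUEUE_EVENT_BUFFER_SIZE is left out because the table gives no formula for it, only the advice to raise it as the number of Envoy instances grows.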