Data Lifecycle Stages (Hot/Warm/Cold)
Lifecycle stages provide a mechanism to optimize storage costs and query performance based on the time granularity of records/metrics, especially if you need to keep a large volume of data for a long time.
The data lifecycle includes hot, warm, and cold stages. Each stage has different TTL settings and Segment Creation Policies. Each group of records/metrics can be automatically migrated and stored in different stages according to the configuration.
Stages Definition
- hot: The default first stage of data storage. The data is the newest, can be updated (metrics), and is most frequently queried.
- warm: Optional, the second stage of data storage. The data is queried less frequently than in the hot stage, can’t be updated, and still performs well.
- cold: Optional, the third stage of data storage. The data is rarely queried and is stored for a long time. Query performance is significantly lower than for data in the hot/warm stages.
If necessary, you can also skip the warm stage and use only the hot and cold stages. The data will then be moved to the cold stage after the TTL of the hot stage.
Configuration Guidelines
The lifecycle stages are configured under each group's settings in the bydb.yml file. For example, the metricsMin group:
metricsMin:
  # The settings for the default `hot` stage.
  shardNum: ${SW_STORAGE_BANYANDB_GM_MINUTE_SHARD_NUM:2}
  segmentInterval: ${SW_STORAGE_BANYANDB_GM_MINUTE_SI_DAYS:1}
  ttl: ${SW_STORAGE_BANYANDB_GM_MINUTE_TTL_DAYS:7}
  enableWarmStage: ${SW_STORAGE_BANYANDB_GM_MINUTE_ENABLE_WARM_STAGE:false}
  enableColdStage: ${SW_STORAGE_BANYANDB_GM_MINUTE_ENABLE_COLD_STAGE:false}
  warm:
    shardNum: ${SW_STORAGE_BANYANDB_GM_MINUTE_WARM_SHARD_NUM:2}
    segmentInterval: ${SW_STORAGE_BANYANDB_GM_MINUTE_WARM_SI_DAYS:3}
    ttl: ${SW_STORAGE_BANYANDB_GM_MINUTE_WARM_TTL_DAYS:15}
    nodeSelector: ${SW_STORAGE_BANYANDB_GM_MINUTE_WARM_NODE_SELECTOR:"type=warm"}
  cold:
    shardNum: ${SW_STORAGE_BANYANDB_GM_MINUTE_COLD_SHARD_NUM:2}
    segmentInterval: ${SW_STORAGE_BANYANDB_GM_MINUTE_COLD_SI_DAYS:5}
    ttl: ${SW_STORAGE_BANYANDB_GM_MINUTE_COLD_TTL_DAYS:60}
    nodeSelector: ${SW_STORAGE_BANYANDB_GM_MINUTE_COLD_NODE_SELECTOR:"type=cold"}
- shardNum: The number of shards for the group.
- segmentInterval: The time interval in days for creating a new data segment.
  - According to the freshness of the data, the segmentInterval days should be ordered: hot < warm < cold.
- ttl: The time-to-live for data within the group, in days.
- enableWarmStage/enableColdStage: Enable the warm/cold stage for the group.
  - The hot stage is always enabled by default.
  - If the warm stage is enabled, the data will be moved to the warm stage after the TTL of the hot stage.
  - If the cold stage is enabled and the warm stage is disabled, the data will be moved to the cold stage after the TTL of the hot stage.
  - If both the warm and cold stages are enabled, the data will be moved to the warm stage after the TTL of the hot stage, and then to the cold stage after the TTL of the warm stage.
  - OAP will query the data from the hot and warm stages by default if the warm stage is enabled.
- nodeSelector: Specifies the target nodes for this stage.
For more details on configuring segmentIntervalDays and ttlDays, refer to the BanyanDB Rotation documentation.
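The stage flags and the segmentInterval ordering rule described above can be sketched in Python. This is only an illustration, not OAP or BanyanDB code: `stage_order` and `check_segment_intervals` are hypothetical helper names, and the interval values are the defaults from the metricsMin example.

```python
def stage_order(enable_warm: bool, enable_cold: bool) -> list[str]:
    """Return the stages that data passes through, in migration order."""
    stages = ["hot"]  # the hot stage is always enabled
    if enable_warm:
        stages.append("warm")
    if enable_cold:
        stages.append("cold")
    return stages


def check_segment_intervals(intervals: dict[str, int], stages: list[str]) -> bool:
    """Check that segmentInterval (in days) strictly increases across the enabled stages."""
    days = [intervals[stage] for stage in stages]
    return all(a < b for a, b in zip(days, days[1:]))


# Defaults from the metricsMin example: hot=1, warm=3, cold=5 days.
intervals = {"hot": 1, "warm": 3, "cold": 5}

print(stage_order(True, True))    # ['hot', 'warm', 'cold']
print(stage_order(False, True))   # ['hot', 'cold'] (warm stage skipped)
print(check_segment_intervals(intervals, stage_order(True, True)))  # True
```

With the warm stage disabled and the cold stage enabled, the sketch yields hot followed directly by cold, matching the rule that data moves to the cold stage after the TTL of the hot stage.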
Procedure and TTL for Stages
For details about the TTL, refer to Progressive TTL.
The following diagram illustrates the lifecycle stages, assuming the TTL settings for hot, warm and cold stages are TTL1, TTL2 and TTL3 days respectively:
sequenceDiagram
Data(T0) ->> Hot Data(TTL1): Input
Hot Data(TTL1) -->+ Hot Data(TTL1): TTL1
Hot Data(TTL1) ->>- Warm Data(TTL2): Migrate
Warm Data(TTL2) -->+ Warm Data(TTL2): TTL2
Warm Data(TTL2) ->>- Cold Data(TTL3): Migrate
Cold Data(TTL3) -->+ Cold Data(TTL3): TTL3
Cold Data(TTL3) ->>- Deleted: Delete
Data(T0) --> Hot Data(TTL1): Live TTL1 Days
Data(T0) --> Warm Data(TTL2): Live TTL1+TTL2 Days
Data(T0) --> Cold Data(TTL3): Live TTL1+TTL2+TTL3 Days
- When the data is input, it will be stored in the hot stage and live for TTL1 days.
- After TTL1 days, the data will be migrated to the warm stage and live for TTL2 days.
- After TTL2 days, the data will be migrated to the cold stage and live for TTL3 days.
- After TTL3 days, the data will be deleted.
- The data will live for TTL1+TTL2+TTL3 days in total.
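The TTL arithmetic above can be worked through with the default values from the metricsMin example (7, 15 and 60 days). The following Python sketch uses a hypothetical helper, `stage_for_age`, and assumes exact day boundaries; it ignores segment granularity and the timing of the actual migration, so real boundaries may differ.

```python
def stage_for_age(age_days: float, ttls: list[tuple[str, int]]) -> str:
    """Return the stage that holds data of the given age, or 'deleted'."""
    boundary = 0
    for stage, ttl in ttls:
        boundary += ttl  # cumulative age at which data leaves this stage
        if age_days < boundary:
            return stage
    return "deleted"


# Assumed values: the default TTLs (in days) from the metricsMin example.
ttls = [("hot", 7), ("warm", 15), ("cold", 60)]

total_days = sum(ttl for _, ttl in ttls)
print(total_days)               # 82, i.e. TTL1+TTL2+TTL3
print(stage_for_age(3, ttls))   # hot
print(stage_for_age(10, ttls))  # warm
print(stage_for_age(30, ttls))  # cold
print(stage_for_age(82, ttls))  # deleted
```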
Querying
- According to the lifecycle stages configuration, OAP will query the data from the hot and warm stages by default if the warm stage is enabled. Otherwise, OAP will query the data from the hot stage only.
- If the cold stage is enabled, for better query performance, you should specify the stage in the query, and OAP will limit the query time range.
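The default stage selection can be sketched as follows. This is a simplified model of the rules above, not the OAP query API: `resolve_query_stages` and its parameters are hypothetical, and rejecting a stage that is not enabled is an assumption made for the illustration.

```python
from typing import Optional


def resolve_query_stages(
    requested: Optional[list[str]],
    enable_warm: bool,
    enable_cold: bool,
) -> list[str]:
    """Pick the stages to query: the explicit request if given, else the defaults."""
    enabled = {"hot"}  # the hot stage is always enabled
    if enable_warm:
        enabled.add("warm")
    if enable_cold:
        enabled.add("cold")

    if requested:
        # Assumption: asking for a stage that is not enabled is an error.
        missing = [stage for stage in requested if stage not in enabled]
        if missing:
            raise ValueError(f"stage(s) not enabled: {missing}")
        return list(requested)

    # Defaults: hot and warm if the warm stage is enabled, otherwise hot only.
    return ["hot", "warm"] if enable_warm else ["hot"]


print(resolve_query_stages(None, enable_warm=True, enable_cold=True))      # ['hot', 'warm']
print(resolve_query_stages(None, enable_warm=False, enable_cold=True))     # ['hot']
print(resolve_query_stages(["cold"], enable_warm=True, enable_cold=True))  # ['cold']
```

In this model the cold stage is only read when it is named explicitly, which mirrors the recommendation to specify the stage in the query.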