Restore BanyanDB Data Using the Restore Tool
This document explains how to use the backup and restore command line tools for BanyanDB.
Overview
-
Backup Tool:
Backs up snapshot data from the BanyanDB data node to a remote storage destination. Files are organized under a time-based directory (e.g., daily or hourly) computed via a configurable time style. -
Restore Tool:
Reads the timedir files from local catalog directories and uses the timestamp within to determine which remote backup snapshot should be applied. Once restoration is successful, it removes all timedir files to avoid repeated restore operations, especially during unexpected scenarios. -
Timedir Utility:
Provides commands to create, list, read, and delete time-dir marker files for each catalog (e.g., stream, measure, and property).
Restore Workflow
In an on-prem deployment, the BanyanDB data node runs on a dedicated server (or virtual machine), and backup/restore operations are performed using the provided command line tool binaries. This mode is ideal for traditional environments where the tools are run manually or scheduled via system cron jobs.
Backup Tool
The backup
binary is used to create backups of BanyanDB snapshots. It connects to the gRPC endpoint of the data node to obtain snapshots and then uploads the backed-up data (organized by timestamp) to remote storage.
Example Backup Command
backup run \
--grpc-addr 127.0.0.1:17912 \
--stream-root-path /data \
--measure-root-path /data \
--property-root-path /data \
--dest file:///backups \
--time-style daily
Notes:
--grpc-addr
: gRPC address for the data node.--stream-root-path
,--measure-root-path
,--property-root-path
: Local directories for the respective catalogs.--dest
: Remote destination URL where backups will be stored (e.g., local filesystem path with thefile://
scheme).--time-style
: Defines the time directory style, such asdaily
orhourly
.
Timedir Utilities
The timedir
tool provides commands to create, read, list, and delete marker files (time directories) that indicate the backup snapshot’s timestamp. These marker files are used by the restore process to identify which snapshot to restore.
List Remote Timedir Files
Before creating a new local timedir file for restoration, use the timedir list command to inspect the remote backup destination. This lets you view all available timedir names and select the expected one.
restore timedir list \
--dest file:///backups
Example output:
Remote time directories:
2025-02-12
2025-02-13
2025-02-14
Examine the list and choose the appropriate timedir (for example, 2025-02-12).
Creating Timedir Files
Before undertaking a pod or service restart, an administrator may create timedir marker files to capture the current backup state:
restore timedir create 2025-02-12 \
--stream-root /data \
--measure-root /data \
--property-root /data
This command writes the specified timestamp (e.g., 2025-02-12) into files named /data/stream/time-dir
, /data/measure/time-dir
, and /data/property/time-dir
in the designated root directories.
Verifying Timedir Files
Ensure that the timedir files have been created with the correct content by reading them:
restore timedir read \
--stream-root /data \
--measure-root /data \
--property-root /data
The output should display the timedir content for each catalog.
Restore Tool
Important: Before triggering a restore, ensure the BanyanDB data node is completely stopped.
The restore
binary restores data from the remote backup storage onto the local machine. It examines the timedir marker files to determine which backup snapshot to apply, then synchronizes the local data directories accordingly. After a successful restore, the tool removes the timedir files to prevent unintended repeated restores.
Example Restore Command
restore run \
--source file:///backups \
--stream-root-path /data \
--measure-root-path /data \
--property-root-path /data
Key Points:
- The tool reads the timedir files (e.g.,
/data/stream/time-dir
) to fetch the appropriate timestamp. - Local data is compared with the remote backup snapshot; orphaned files in local directories are removed.
- Upon success, the timedir marker files are deleted to ensure a clean recovery state.
Kubernetes Deployment
For environments running BanyanDB in Kubernetes, the backup and restore tools can be integrated as sidecar containers. A common pattern is to use an init container for restoring data and a sidecar to manage backup and timedir operations.
Kubernetes Sidecar Deployment Model
-
Sidecar Container:
- The sidecar runs the backup and timedir command binaries alongside the main BanyanDB container.
- Administrators can attach to the sidecar (using
kubectl exec
) to execute timedir operations. - The sidecar is responsible for backing up snapshot data and maintaining timedir files.
-
Init Container for Restore:
- When a pod is (re)created, an init container runs before the main container starts.
- The init container uses the
restore
binary to read the timedir files from shared volumes (mounted from a persistent volume or hostPath). - It restores the data from the remote backup store by processing the timedir markers, and then deletes the marker files to avoid accidental re-triggering.
-
Shared Volumes:
- The timedir files and data directories are shared between the init container, sidecar, and main container (data node) via Kubernetes volumes.
- This shared setup ensures that the restore process can correctly synchronize the data state before the server starts.
By following the guidelines and examples provided in this document, administrators can choose the approach that best fits their deployment environment—whether on-prem or in Kubernetes—to ensure reliable data backup and restoration for BanyanDB.