Schema Barrier RPCs
SchemaBarrierService (proto package `banyandb.schema.v1`) lets a client block synchronously until the cluster’s schema state matches a target. All three RPCs share the same shape: a request carrying the target plus a timeout, and a response carrying a `bool applied` flag and a list of `NodeLaggard` entries when the call did not converge in time.
This page is a reference. For end-to-end workflows that combine barriers with the write and query gates, see Scenarios.
SchemaKey
Several requests reference schemas by SchemaKey:
```protobuf
message SchemaKey {
  string kind = 1;  // "measure", "stream", "trace", "property",
                    // "index_rule", "index_rule_binding", "group", "top_n_aggregation"
  string group = 2; // group name; empty for kind="group"
  string name = 3;  // resource name; holds the group's own name when kind="group"
}
```
For a Group key, set `kind = "group"`, `name = "<group-name>"`, and `group = ""`.
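To make the group-key convention concrete, here is a minimal sketch. The `SchemaKey` struct below is a local stand-in mirroring the generated `schemav1.SchemaKey` fields, and `measureKey`/`groupKey` are hypothetical helpers, not part of the generated API.

```go
package main

import "fmt"

// SchemaKey is a local stand-in with the same fields as the generated message.
type SchemaKey struct {
	Kind  string
	Group string
	Name  string
}

// measureKey builds a key for a measure inside a group (hypothetical helper).
func measureKey(group, name string) SchemaKey {
	return SchemaKey{Kind: "measure", Group: group, Name: name}
}

// groupKey builds a key for the group itself: the group's own name goes in
// Name, and Group stays empty (hypothetical helper).
func groupKey(name string) SchemaKey {
	return SchemaKey{Kind: "group", Name: name}
}

func main() {
	fmt.Println(measureKey("sw_metric", "service_cpm_minute").Name) // service_cpm_minute
	fmt.Println(groupKey("sw_metric").Kind)                         // group
}
```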
NodeLaggard
When `applied == false`, the response carries one entry per data node that has not converged:
```protobuf
message NodeLaggard {
  string node = 1;
  int64 current_mod_revision = 2;            // populated by AwaitRevisionApplied
  repeated SchemaKey missing_keys = 3;       // populated by AwaitSchemaApplied
  repeated SchemaKey still_present_keys = 4; // populated by AwaitSchemaDeleted
}
```
In standalone mode there is at most one `NodeLaggard` entry. In cluster mode (Phase 2) the list reports every laggard node.
AwaitRevisionApplied
Block until every data node’s cache observes `mod_revision >= min_revision`.
```protobuf
rpc AwaitRevisionApplied(AwaitRevisionAppliedRequest)
    returns (AwaitRevisionAppliedResponse);

message AwaitRevisionAppliedRequest {
  int64 min_revision = 1;
  google.protobuf.Duration timeout = 2;
}

message AwaitRevisionAppliedResponse {
  bool applied = 1;
  repeated NodeLaggard laggards = 2;
}
```
Semantics
- The server returns as soon as the watermark reaches `min_revision`. Polling is internal; the client gets one synchronous answer.
- On timeout the response has `applied = false` and lists every laggard node with its `current_mod_revision`. The call never blocks indefinitely.
- A `min_revision` of `0` returns `applied = true` immediately.
- A nil or zero `timeout` falls back to a server default (5 s in the standalone implementation).
Go example
```go
import (
	"context"
	"fmt"
	"log"
	"time"

	schemav1 "github.com/apache/skywalking-banyandb/api/proto/banyandb/schema/v1"
	"google.golang.org/protobuf/types/known/durationpb"
)

resp, err := barrier.AwaitRevisionApplied(ctx, &schemav1.AwaitRevisionAppliedRequest{
	MinRevision: targetRev,
	Timeout:     durationpb.New(10 * time.Second),
})
if err != nil {
	return err // gRPC-level error; not a timeout
}
if !resp.GetApplied() {
	for _, l := range resp.GetLaggards() {
		log.Printf("node %s at rev %d, target %d", l.GetNode(), l.GetCurrentModRevision(), targetRev)
	}
	return fmt.Errorf("schema did not propagate within timeout")
}
```
AwaitSchemaApplied
Block until each requested key is present at or above its target revision on every node.
```protobuf
rpc AwaitSchemaApplied(AwaitSchemaAppliedRequest)
    returns (AwaitSchemaAppliedResponse);

message AwaitSchemaAppliedRequest {
  repeated SchemaKey keys = 1;
  repeated int64 min_revisions = 2;
  google.protobuf.Duration timeout = 3;
}

message AwaitSchemaAppliedResponse {
  bool applied = 1;
  repeated NodeLaggard laggards = 2;
}
```
Semantics
- `keys` and `min_revisions` are parallel arrays: `min_revisions[i]` applies to `keys[i]`. If `min_revisions` is empty or shorter than `keys`, the missing entries default to `0` (meaning “any revision; just be present”).
- `keys` is capped at 10 000 server-side. Exceeding the cap returns `InvalidArgument`. Chunk larger sets across multiple calls; merge `applied` results with logical AND and union the laggards.
- On timeout, each laggard’s `missing_keys` enumerates exactly the keys not yet at the target on that node.
- `applied = true` is returned only when every key on every node is at or above its target.
Go example
```go
keys := []*schemav1.SchemaKey{
	{Kind: "measure", Group: "sw_metric", Name: "service_cpm_minute"},
	{Kind: "measure", Group: "sw_metric", Name: "service_resp_time"},
}
revs := []int64{firstRev, secondRev}

resp, err := barrier.AwaitSchemaApplied(ctx, &schemav1.AwaitSchemaAppliedRequest{
	Keys:         keys,
	MinRevisions: revs,
	Timeout:      durationpb.New(15 * time.Second),
})
if err != nil {
	return err
}
if !resp.GetApplied() {
	for _, l := range resp.GetLaggards() {
		for _, k := range l.GetMissingKeys() {
			log.Printf("node %s missing %s/%s/%s", l.GetNode(), k.GetKind(), k.GetGroup(), k.GetName())
		}
	}
}
```
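Because `keys` is capped at 10 000 server-side, larger sets have to be split client-side and the per-batch results merged: AND the `applied` flags, union the laggards. A minimal sketch of that pattern; `chunkPair` and `awaitInChunks` are hypothetical helpers, and the `call` parameter stands in for the actual `AwaitSchemaApplied` invocation (laggards reduced to node names here for brevity).

```go
package main

import "fmt"

// chunkPair splits the parallel keys/revs slices into batches of at most
// size elements, keeping keys[i] aligned with revs[i]. revs may be shorter
// than keys; the missing tail defaults to revision 0 server-side.
func chunkPair[K any](keys []K, revs []int64, size int) ([][]K, [][]int64) {
	var kb [][]K
	var rb [][]int64
	for start := 0; start < len(keys); start += size {
		end := start + size
		if end > len(keys) {
			end = len(keys)
		}
		kb = append(kb, keys[start:end])
		if start < len(revs) {
			re := end
			if re > len(revs) {
				re = len(revs)
			}
			rb = append(rb, revs[start:re])
		} else {
			rb = append(rb, nil)
		}
	}
	return kb, rb
}

// awaitInChunks issues one barrier call per batch and merges the results:
// overall applied is the AND of every batch's applied; laggards are unioned.
func awaitInChunks[K any](keys []K, revs []int64, size int,
	call func([]K, []int64) (bool, []string)) (bool, []string) {
	kb, rb := chunkPair(keys, revs, size)
	applied := true
	var laggards []string
	for i := range kb {
		ok, lag := call(kb[i], rb[i])
		applied = applied && ok
		laggards = append(laggards, lag...)
	}
	return applied, laggards
}

func main() {
	keys := []string{"a", "b", "c", "d", "e"}
	revs := []int64{1, 2, 3} // shorter than keys: the tail gates on revision 0
	ok, lag := awaitInChunks(keys, revs, 2, func(k []string, r []int64) (bool, []string) {
		return len(k) == 2, nil // pretend the last (short) batch timed out
	})
	fmt.Println(ok, lag) // false []
}
```

In real use, `call` wraps one `AwaitSchemaApplied` request per batch with the same timeout.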
AwaitSchemaDeleted
Block until each named key is absent from every node’s cache.
```protobuf
rpc AwaitSchemaDeleted(AwaitSchemaDeletedRequest)
    returns (AwaitSchemaDeletedResponse);

message AwaitSchemaDeletedRequest {
  repeated SchemaKey keys = 1;
  google.protobuf.Duration timeout = 2;
}

message AwaitSchemaDeletedResponse {
  bool applied = 1;
  repeated NodeLaggard laggards = 2;
}
```
Semantics
- Same 10 000-key cap as `AwaitSchemaApplied`.
- On timeout each laggard’s `still_present_keys` enumerates the keys the call was waiting to see disappear.
- This RPC reports tombstone propagation, not tombstone GC. A key that has been tombstoned counts as “absent” for the purposes of this call even before the tombstone is physically removed by the GC loop.
Go example
```go
resp, err := barrier.AwaitSchemaDeleted(ctx, &schemav1.AwaitSchemaDeletedRequest{
	Keys: []*schemav1.SchemaKey{
		{Kind: "measure", Group: "sw_metric", Name: "service_cpm_minute"},
	},
	Timeout: durationpb.New(10 * time.Second),
})
if err != nil {
	return err
}
if !resp.GetApplied() {
	// Inspect resp.Laggards[i].StillPresentKeys
}
```
Choosing the right barrier
| Situation | RPC |
|---|---|
| You just got a `mod_revision = R` back from a Create/Update and want every node to see `R` or later | `AwaitRevisionApplied(R)` |
| You created several schemas at once and need to confirm each one is visible | `AwaitSchemaApplied(keys, revs)` |
| You want to confirm a deleted schema is gone before re-creating it | `AwaitSchemaDeleted(keys)` |
| You don’t know the revisions but want presence | `AwaitSchemaApplied(keys, []int64{})` (zero-filled) |
Common pitfalls
- Setting an absurdly small timeout. Even on a healthy cluster, schema propagation has measurable latency. Set timeouts in the seconds range, not milliseconds. The barrier is a synchronous safety net; treat it like one.
- Forgetting that `min_revisions` is a parallel array. If you pass three keys but only two revisions, the third key is gated on revision `0` (any revision). That is rarely what the caller meant.
- Calling the barrier from the same context that issued the schema mutation, with a context deadline tighter than the barrier timeout. The context cancellation wins. Either propagate enough deadline budget or use a fresh context for the barrier.
- Treating a non-error response with `applied = false` as an error. It is a soft outcome; the client may want to act on the laggards (alert, fall back to the un-gated path) rather than abort.