SkyWalking BanyanDB 0.10.3 is released. Go to the downloads page to find the release tars.
Bug Fixes
- Persist segment end time in per-segment metadata so boundaries don’t shift across restarts or config changes.
- Fix flaky on-disk integration tests caused by Ginkgo v2 random container shuffling closing gRPC connections prematurely.
- ui: fix query editor refresh/reset behavior and BydbQL keyword highlighting.
- Fix flaky `file_snapshot` subtest in measure/stream/trace by waiting until every introduced mem part has been flushed to disk, instead of only checking the latest snapshot creator.
- Fix flaky `TestCollectWithPartialClosedSegments` by raising `SegmentIdleTimeout` so wall-clock variance on slow CI does not mark still-open segments as idle.
- Fix FODC lifecycle cache poisoning where transient `InspectAll` failures were cached for 10 minutes and masked liaison recovery; raise FODC agent and proxy timeouts from 10s to 40s.
- Fix FODC `/cluster/lifecycle` dropping zero-valued group fields (e.g. `replicas=0`, `close=false`) under `encoding/json` + `omitempty`; switch to `protojson` so all fields are emitted (nil nested messages serialize as `null`).
- Fix trace `block_writer` panic on out-of-order timestamps within the same traceID, which dropped one trace-write batch per panic in multi-agent SkyWalking deployments. Spans of a single trace originate from independently-clocked services, and trace storage is organized by traceID rather than timestamp, so per-traceID timestamp monotonicity is not a writer invariant.
- Fix nil-pointer panic on cold-tier data nodes when FODC `InspectAll` raced with idle-segment cleanup.
- Add `GroupLifecycleInfo.errors` to surface per-group collection failures from FODC `InspectAll` instead of silently dropping the affected node entry.
- Fix `CollectDataInfo` and `CollectLiaisonInfo` not handling `CATALOG_PROPERTY` groups.
- Close BanyanDB merge write-path durability gap that allowed torn parts to be created by a crash between data write and metadata commit. Metadata files (`metadata.json` for trace/measure/stream, `manifest.json` for sidx, plus `traceID.filter` and `tag.type`) now go through a new `WriteAtomic` (write-tmp + fsync + rename + fsync-dir) sequence; data writers (`seqWriter.Close`, `localFileSystem.Write`) now propagate fdatasync errors instead of silently dropping them. `mustOpenFilePart`/`mustOpenPart` in each engine cleans up safe post-rename `.tmp` leftovers on open. (#13862, root cause for #13861)
- Fix lifecycle migration where the receiving node could create segments shorter than the configured `SegmentInterval`.
- Fail fast on incompatible storage version at boot. Previously the server would start in a degraded `SERVING` state with affected groups un-loaded because the property schema-registry retry loop swallowed the version-incompatibility panic. Compatible versions are listed in `banyand/internal/storage/versions.yml`.
- Release bluge index writers on segment rotation so `analysisWorker` pools sized from `GOMAXPROCS` don’t accumulate across rotations. Two layered defects kept the existing idle-segment reclaim path from running: `segmentIdleTimeout` defaulted to `0` (which disabled the 10-minute reclaim ticker), and `incRef` refreshed `lastAccessed` on every rotation tick so `closeIdleSegments` never observed an idle segment. Defaults to `time.Hour`, moves the `lastAccessed` bump to real read/write call sites, and rewrites `closeIdleSegments` to take its own CAS-bumped snapshot so a concurrent reopen cannot have its only ref dropped under the reclaimer (apache/skywalking#13874).
- Fix incorrect counts and missing trace fields in the lifecycle migration report.
- Fix trace query identity-tag projection: when `trace_id`/`span_id` are explicitly projected, reconstruct them from span identity at response build time instead of requesting them as stored tags, and preserve tag order with null-filled per-span value alignment in the distributed trace result iterator.
- Fix FODC proxy corrupting Prometheus metric types. The agent dropped the `# TYPE` line while parsing banyandb `/metrics`, the `StreamMetrics` proto carried no type field, and the proxy guessed the type from a name-suffix heuristic (downgrading counters to gauge, mislabeling `_count`-suffixed counters as histograms, and splitting summaries into two conflicting `# TYPE` lines). Capture the type with the Prometheus `expfmt` parser, store it in the flight recorder, thread it through a new `Metric.type` enum over gRPC, and emit the real type from the proxy; pre-upgrade (untyped) samples fold into the matching typed family so a mixed-version rollout never emits two conflicting `# TYPE` lines for one metric.
- Trace storage metrics now expose the `storage` sub-scope, matching the `stream_storage_*` naming. The `StorageMetricsFactory` for trace switched from the root `trace` scope to `trace.storage`, so per-segment inverted-index metrics (`inverted_index_total_updates`, `inverted_index_total_doc_count`, `inverted_index_total_term_searchers_started`) are now emitted as `banyandb_trace_storage_*` instead of `banyandb_trace_*`, aligning the dashboard query names. Other trace metrics (`trace_tst_*`, `trace_scheduler_*`) are unchanged.
- Fix FODC agent labeling metrics with `node_role="ROLE_UNSPECIFIED"`. The agent resolved the node role exactly once at startup via a single `GetCurrentNode` poll whose endpoint retries spanned only ~1s; when the sibling lifecycle/banyandb gRPC server was not yet listening (`connect: cannot assign requested address`) the role fell back to `ROLE_UNSPECIFIED` permanently, so most nodes never reported their real `ROLE_DATA`/`ROLE_LIAISON`. Retry the initial node-role resolution with exponential backoff until a non-unspecified role is obtained or a 25s budget elapses.
Chores
- Regenerate expired TLS test certificate with 100-year validity.
- Set Ginkgo `--repeat` to 0 in the flaky-test workflow so the hourly run completes within the 50-minute timeout.