A pool-operator in XAPI-based hypervisors (XenServer, XCP-ng) can disable garbage collection and VHD coalescing on any Storage Repository by setting gc=false and/or coalesce=false in SR.other_config. The SM garbage collector reads these keys at cleanup.py:2052 and cleanup.py:2090 respectively. When disabled, orphan VDIs accumulate and consume storage space without reclamation, and VHD snapshot chains grow unbounded, degrading I/O performance and eventually causing chain-length errors. The SR.other_config field has no map_keys_roles entries for infrastructure keys.
SR.other_config is a Map(String, String) field defined at datamodel.ml:4930-4935 with _R_POOL_OP as the minimum write role.
The SM garbage collector checks these keys during its scan cycle:
cleanup.py:2052:
other_config.get(VDI.DB_GC) == "false"
# When this comparison evaluates to true, GC is disabled for this SR
cleanup.py:2090:
other_config.get(VDI.DB_COALESCE)
# When "false", coalesce is disabled for this SR
The gc key controls whether the garbage collector reclaims orphan VDIs (VDIs with no attached VBDs and no parent references). The coalesce key controls whether VHD chain coalescing runs - the process that merges child VHD images into their parent after snapshot deletion.
Both keys accept arbitrary string values with no validation; the only check is a string comparison against "false". Setting either key to "false" disables the corresponding operation silently: no alert is raised, no warning-level log message is written, and the setting never expires.
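The comparison semantics described above can be illustrated with a minimal Python sketch. It assumes the literal key names `gc` and `coalesce` used in this advisory, and `is_disabled` is a hypothetical helper. It models an exact-match comparison and is not the code in cleanup.py.

```python
# Illustrative only: models an exact string comparison against "false".
# The helper name is hypothetical; the key names follow the advisory text.

def is_disabled(other_config: dict, key: str) -> bool:
    """Return True only when the key is present with the exact value "false"."""
    return other_config.get(key) == "false"


samples = [
    {"gc": "false"},   # exact match: operation disabled
    {"gc": "False"},   # different case: not treated as disabled
    {"gc": "0"},       # arbitrary value: accepted, but has no effect
    {},                # key absent: operation stays enabled
]
results = [is_disabled(cfg, "gc") for cfg in samples]
print(results)  # [True, False, False, False]
```

Under this model any value other than the exact string "false" leaves the operation enabled, which is why the lack of validation on write goes unnoticed.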
The effects are progressive and silent:
GC disabled: Orphan VDIs accumulate. Each snapshot deletion creates an orphan that is never reclaimed. Storage consumption grows monotonically.
Coalesce disabled: VHD chains grow with each snapshot cycle. I/O latency increases as each read must traverse more chain links. Eventually the VHD chain length reaches the kernel limit (typically 20-30 levels) and VDI operations fail.
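The chain growth described above can be approximated with a toy model. This is a sketch under stated assumptions, not SM code: it assumes each snapshot cycle leaves exactly one extra VHD level behind when coalesce never runs, and `MAX_CHAIN` is an assumed value picked from the 20-30 range quoted above.

```python
# Toy model, not SM code: with coalesce disabled, each snapshot cycle is
# assumed to leave one extra VHD level in the chain.

MAX_CHAIN = 30  # assumption, taken from the 20-30 range quoted in the text


def cycles_until_failure(initial_depth: int = 1, max_chain: int = MAX_CHAIN) -> int:
    """Count snapshot cycles that succeed before the chain limit blocks the next one."""
    depth = initial_depth
    cycles = 0
    while depth + 1 <= max_chain:
        depth += 1    # the new level is never merged back into its parent
        cycles += 1
    return cycles


print(cycles_until_failure())               # 29
print(cycles_until_failure(max_chain=20))   # 19
```

In this model, a VDI on a daily snapshot schedule would reach the assumed limit in roughly three to four weeks.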
Missing RBAC protection. SR.other_config has map_keys_roles entries only for UI keys. The gc and coalesce keys are writable by any pool-operator.
Silent disablement. Disabling GC or coalesce produces no alert, no log message at warning level, and no XAPI event. The operator has no indication that storage maintenance has stopped.
No expiration or time-bound. Once set, the keys persist indefinitely. There is no mechanism to automatically re-enable GC/coalesce after a maintenance window.
set_other_config RBAC bypass. The set_other_config method replaces the entire map atomically and bypasses map_keys_roles per-key checks.
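Because disablement is silent, one way to surface it is to scan SR records for the two keys. The sketch below is a minimal example: `find_disabled_maintenance` and `WATCHED_KEYS` are hypothetical names, and the input is stand-in data shaped like the dictionary returned by `SR.get_all_records` in the XenAPI Python bindings.

```python
# Sketch of an audit pass over SR records. Helper and constant names are
# illustrative; the key names follow the advisory text.

WATCHED_KEYS = ("gc", "coalesce")


def find_disabled_maintenance(sr_records: dict) -> list:
    """Return sorted (sr_uuid, name_label, key) tuples for keys set to "false"."""
    findings = []
    for ref, rec in sr_records.items():
        other_config = rec.get("other_config", {})
        for key in WATCHED_KEYS:
            if other_config.get(key) == "false":
                findings.append((rec.get("uuid", ref), rec.get("name_label", ""), key))
    return sorted(findings)


# Stand-in data; against a live pool this would come from
#   session.xenapi.SR.get_all_records()
example_records = {
    "OpaqueRef:1": {"uuid": "sr-a", "name_label": "Local storage",
                    "other_config": {"gc": "false"}},
    "OpaqueRef:2": {"uuid": "sr-b", "name_label": "NFS",
                    "other_config": {"coalesce": "false", "gc": "true"}},
    "OpaqueRef:3": {"uuid": "sr-c", "name_label": "iSCSI", "other_config": {}},
}
findings = find_disabled_maintenance(example_records)
for sr_uuid, name, key in findings:
    print(f"{sr_uuid} ({name}): {key}=false")
```

Running such a scan on a schedule would give operators the indication that the platform itself does not provide.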
| Scenario | Impact | Pre-conditions | Status |
|---|---|---|---|
| Storage exhaustion | Orphan VDIs accumulate until SR is full | pool-operator, set gc=false | Source-traced |
| VHD chain length exceeded | VDI operations fail when chain exceeds kernel limit | pool-operator, set coalesce=false, active snapshot cycle | Source-traced |
| I/O performance degradation | Read latency increases linearly with chain depth | pool-operator, set coalesce=false | Source-traced |
| BOC-1 chain | vm-admin disables GC/coalesce across all SRs via RBAC collapse | vm-admin, BOC-1 | Source-traced |
SR.other_config for the gc and coalesce keys
gc=false or coalesce=false on any production SR
disclosure/vendor-detection-guidance.md
SR.other_config entries for gc=false or coalesce=false
Add map_keys_roles protection. Restrict gc and coalesce keys to _R_POOL_ADMIN in datamodel.ml.
Add alerting. Generate a XAPI alert when GC or coalesce is disabled, so operators are aware of the change.
Add time-bound disablement. If GC/coalesce disablement is a legitimate maintenance operation, add a timeout mechanism that automatically re-enables after a configurable duration.
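A time-bound check could look roughly like the sketch below. It is a design illustration, not an existing SM or XAPI feature: the companion key `<key>_disabled_until` (epoch seconds) and the helper `disabled_now` are hypothetical.

```python
import math
import time

# Hypothetical design sketch: a disablement only counts while a companion
# "<key>_disabled_until" entry holds a finite deadline that is still in the future.


def disabled_now(other_config: dict, key: str, now=None) -> bool:
    """Return True if `key` ("gc" or "coalesce") is disabled and not yet expired."""
    if other_config.get(key) != "false":
        return False
    now = time.time() if now is None else now
    raw = other_config.get(f"{key}_disabled_until")
    try:
        deadline = float(raw)
    except (TypeError, ValueError):
        return False  # missing or malformed deadline: fail safe, keep maintenance on
    if not math.isfinite(deadline):
        return False  # reject "inf"/"nan" so a disablement cannot be made permanent
    return now < deadline


cfg = {"gc": "false", "gc_disabled_until": "200"}
print(disabled_now(cfg, "gc", now=100))   # True: inside the maintenance window
print(disabled_now(cfg, "gc", now=300))   # False: window expired, GC runs again
```

The sketch fails safe: a disablement without a valid deadline is ignored, so forgetting to set the deadline re-enables maintenance instead of disabling it indefinitely.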
Upstream patches exist. They are held privately pending coordinated disclosure.
Disclosure:
datamodel.ml:4930-4935 (SR.other_config field definition), cleanup.py:2052 (gc key check), cleanup.py:2090 (coalesce key check)
disclosure/advisories/soc-security-advisory.md (SOC-5)
research/investigations/sr-other-config.md
Discovered and reported by Jakob Wolffhechel, Moksha.