A pool-operator in XAPI-based hypervisors (XenServer, XCP-ng) can activate VHD failure injection on production storage by setting SR.other_config:testmode to a recognized test mode string (e.g., vhd_fail_reparent_begin). The SM driver matches the value against the ENV_VAR_VHD_TEST dictionary in LVHDSR.py:109-133. Matching values cause environment variables to be set that instruct vhd-util to simulate failures during reparent, resize, and other structural VHD operations. The resulting failures leave VHD metadata in an inconsistent state - a targeted data corruption attack that appears as a legitimate internal error. The testmode key has no per-key RBAC protection.
SR.other_config is a Map(String, String) field writable by pool-operator. The testmode key is consumed by the LVHD SR driver to activate VHD utility failure simulation.
The code path:
1. pool-operator calls SR.add_to_other_config(sr, "testmode", "vhd_fail_reparent_begin")
2. LVHDSR.__init__() reads self.testMode = self.other_conf.get('testmode') (LVHDSR.py:196-198)
3. The value is matched against the ENV_VAR_VHD_TEST dictionary keys (LVHDSR.py:109-133)
4. On a match, the corresponding environment variable (e.g., VHD_UTIL_TEST_FAIL_REPARENT_BEGIN) is set to "yes"
5. vhd-util reads the environment variable and simulates failure at the specified operation point

Known test mode triggers in ENV_VAR_VHD_TEST:
| testmode value | Environment variable | Operation disrupted |
|---|---|---|
| vhd_fail_reparent_begin | VHD_UTIL_TEST_FAIL_REPARENT_BEGIN | VHD reparent start |
| vhd_fail_reparent_end | VHD_UTIL_TEST_FAIL_REPARENT_END | VHD reparent completion |
| vhd_fail_reparent_locator | VHD_UTIL_TEST_FAIL_REPARENT_LOCATOR | VHD parent locator update |
| vhd_fail_resize_begin | VHD_UTIL_TEST_FAIL_RESIZE_BEGIN | VHD resize start |
| vhd_fail_resize_end | VHD_UTIL_TEST_FAIL_RESIZE_END | VHD resize completion |
| vhd_fail_resize_metadata_begin | VHD_UTIL_TEST_FAIL_RESIZE_METADATA_BEGIN | VHD metadata resize start |
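
The lookup described in the code path can be modeled in a few lines. This is a minimal sketch, not the code from LVHDSR.py: the dictionary holds only the rows listed in the table above, and the helper name apply_testmode and its signature are hypothetical. It shows why any recognized string in other_config is enough to arm failure injection.

```python
# Rows copied from the table above. The real ENV_VAR_VHD_TEST dictionary
# lives in LVHDSR.py and may contain entries not listed here.
ENV_VAR_VHD_TEST = {
    "vhd_fail_reparent_begin": "VHD_UTIL_TEST_FAIL_REPARENT_BEGIN",
    "vhd_fail_reparent_end": "VHD_UTIL_TEST_FAIL_REPARENT_END",
    "vhd_fail_reparent_locator": "VHD_UTIL_TEST_FAIL_REPARENT_LOCATOR",
    "vhd_fail_resize_begin": "VHD_UTIL_TEST_FAIL_RESIZE_BEGIN",
    "vhd_fail_resize_end": "VHD_UTIL_TEST_FAIL_RESIZE_END",
    "vhd_fail_resize_metadata_begin": "VHD_UTIL_TEST_FAIL_RESIZE_METADATA_BEGIN",
}


def apply_testmode(other_config, environ):
    """Hypothetical helper: arm fault injection if testmode is recognized.

    other_config models SR.other_config (a string-to-string map).
    environ models the process environment that vhd-util would inherit.
    Returns the variable name that was set, or None if nothing matched.
    """
    testmode = other_config.get("testmode")
    env_var = ENV_VAR_VHD_TEST.get(testmode)
    if env_var is None:
        return None  # missing or unrecognized value: nothing is set
    environ[env_var] = "yes"
    return env_var


if __name__ == "__main__":
    env = {}  # a plain dict stands in for os.environ in this sketch
    print(apply_testmode({"testmode": "vhd_fail_reparent_begin"}, env))
    print(env)
```

Nothing in this flow asks whether the host is a test system; the only input is the API-writable map.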
Test infrastructure exposed in production. VHD failure injection test modes are accessible from a user-writable field in production deployments. Debug/test functionality is not isolated from production code paths.
Missing RBAC protection. SR.other_config has no map_keys_roles entry for testmode. The key inherits the class default _R_POOL_OP.
No environment separation. There is no build flag, runtime configuration, or environment check that disables test mode activation in production. The same code runs in test and production environments.
Corruption appears legitimate. VHD metadata corruption from test mode failures is indistinguishable from genuine I/O errors. The attack produces no indicators of intentional sabotage.
| Scenario | Impact | Pre-conditions | Status |
|---|---|---|---|
| VHD reparent failure | VHD chain becomes inconsistent after snapshot coalesce - data corruption | pool-operator, LVHD SR with active snapshots | Source-traced |
| VHD resize failure | VDI resize fails mid-operation, leaving VHD metadata inconsistent | pool-operator, LVHD SR with VDI resize operations | Source-traced |
| Silent data corruption | Corruption appears as legitimate internal error, no attacker indicators | pool-operator, any LVHD SR | Modeled |
| BOC-1 chain | vm-admin uses BOC-1 S3 to self-grant pool-operator, then activates test mode | vm-admin, BOC-1 + LVHD SR | Source-traced |
- SR.other_config for the testmode key - any non-empty value on a production SR is suspicious
- TEST_FAIL environment variables set
- VHD_UTIL_TEST_* variables
- disclosure/vendor-detection-guidance.md
- SR.other_config records for testmode keys
- Remove testmode values from production SRs immediately

Remove test mode from production. Gate the testmode functionality behind a compile-time flag or a root-only local file check. Do not expose test failure injection via a user-writable API field.
Add map_keys_roles. If the key must remain, protect testmode at _R_LOCAL_ROOT_ONLY in datamodel.ml so it cannot be set via the API.
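
One possible shape for the root-only local file check is sketched below. It is an illustration under stated assumptions, not the upstream patch: the marker path /etc/sm-testmode-enabled and the function name testmode_allowed are hypothetical, and the ownership and permission rules are one reasonable policy rather than a documented one.

```python
import os
import stat

# Hypothetical marker path; not a path the SM driver is known to use.
TESTMODE_MARKER = "/etc/sm-testmode-enabled"


def testmode_allowed(marker_path=TESTMODE_MARKER):
    """Return True only if a trusted local marker file exists.

    The marker must be a regular file (symlinks are not followed),
    owned by root, and not writable by group or others. Any other
    state, including a missing file, disables test mode.
    """
    try:
        st = os.lstat(marker_path)
    except OSError:
        return False
    if not stat.S_ISREG(st.st_mode):
        return False
    if st.st_uid != 0:
        return False
    return not (st.st_mode & (stat.S_IWGRP | stat.S_IWOTH))


if __name__ == "__main__":
    # On a production host without the marker this prints False.
    print(testmode_allowed())
```

With a gate of this kind, a testmode value written through the API would be ignored unless someone with local root access had also created the marker.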
Upstream patches exist. They are held privately pending coordinated disclosure.
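
The SR.other_config audit from the detection guidance above could be scripted roughly as follows. This is a sketch: find_testmode_srs is a hypothetical helper, and the input is assumed to be a mapping of SR references to records with uuid, name_label and other_config fields, for example as returned by an SR.get_all_records call in the XenAPI Python bindings. The sample records are made up.

```python
def find_testmode_srs(sr_records):
    """Return (uuid, name_label, testmode) for every SR with a non-empty testmode.

    Any non-empty value is reported, recognized or not, since the key
    has no legitimate use on a production SR.
    """
    findings = []
    for record in sr_records.values():
        value = record.get("other_config", {}).get("testmode", "")
        if value:
            findings.append((record.get("uuid"), record.get("name_label"), value))
    return findings


if __name__ == "__main__":
    # Made-up records for illustration only.
    sample = {
        "OpaqueRef:1": {
            "uuid": "sr-uuid-1",
            "name_label": "prod-lvm",
            "other_config": {"testmode": "vhd_fail_reparent_begin"},
        },
        "OpaqueRef:2": {
            "uuid": "sr-uuid-2",
            "name_label": "iso-library",
            "other_config": {},
        },
    }
    for finding in find_testmode_srs(sample):
        print(finding)
```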
Disclosure:
- datamodel.ml:4930-4935 (SR.other_config field definition), LVHDSR.py:109-133 (ENV_VAR_VHD_TEST dictionary), LVHDSR.py:196-198 (testmode read from other_config)
- disclosure/advisories/soc-security-advisory.md (SOC-3)
- research/investigations/sr-other-config.md

Discovered and reported by Jakob Wolffhechel, Moksha.