MOKSHA-2026-0037: VHD Test Mode and Failure Injection via SR.other_config testmode

Advisory ID: MOKSHA-2026-0037
Semantic ID: SOC-3
Published: 2026-04-24
CVSS 3.1: 6.5 Medium
CVSS 3.1 Vector: AV:N/AC:L/PR:H/UI:N/S:U/C:N/I:H/A:H
CVSS 4.0: 7.0 High
CVSS 4.0 Vector: AV:N/AC:L/AT:N/PR:H/UI:N/VC:N/VI:H/VA:L/SC:N/SI:L/SA:N
XAPI Object: SR
XAPI Field: other_config:testmode
Entry Role: pool-operator
Researcher: Jakob Wolffhechel, Moksha
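The CVSS 3.1 base score listed above can be reproduced from its vector. The Python sketch below implements the CVSS v3.1 base-score equations, using the metric weights and the Roundup function published by FIRST. The function names are illustrative. The sketch was not used to produce this advisory's scores, and it does not cover CVSS 4.0, whose scoring is defined differently.

```python
import math

# CVSS v3.1 base-metric weights, as published in the FIRST specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """Round up to one decimal place, per the CVSS v3.1 Roundup definition."""
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    # Parse "AV:N/AC:L/..." into {"AV": "N", "AC": "L", ...}.
    m = dict(part.split(":") for part in vector.split("/"))
    changed = m["S"] == "C"
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]
    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


def severity(score: float) -> str:
    # Qualitative severity bands from the CVSS v3.1 specification.
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


score = base_score("AV:N/AC:L/PR:H/UI:N/S:U/C:N/I:H/A:H")
print(score, severity(score))  # 6.5 Medium
```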

Affected Products

Vendor | Product | Versions
Citrix / Cloud Software Group | XenServer / Citrix Hypervisor | all versions (shared XAPI codebase)
Vates | XCP-ng | 8.3.0

Summary

A pool-operator in XAPI-based hypervisors (XenServer, XCP-ng) can activate VHD failure injection on production storage by setting SR.other_config:testmode to a recognized test mode string (e.g., vhd_fail_reparent_begin). The LVHD SM driver matches the value against the ENV_VAR_VHD_TEST dictionary in LVHDSR.py:109-133. A matching value causes the driver to set an environment variable that instructs vhd-util to simulate a failure during a reparent, resize, or other structural VHD operation. The resulting failures leave VHD metadata in an inconsistent state. The attack therefore amounts to targeted data corruption that appears as a legitimate internal error. Because the testmode key has no per-key RBAC protection, any role that can write SR.other_config can set it.

Vulnerability Description

SR.other_config is a Map(String, String) field writable by pool-operator. The LVHD SR driver reads the testmode key and uses its value to activate failure simulation in vhd-util.

The code path:

  1. pool-operator calls SR.add_to_other_config(sr, "testmode", "vhd_fail_reparent_begin")
  2. LVHDSR.__init__() reads self.testMode = self.other_conf.get('testmode') (LVHDSR.py:196-198)
  3. The value is matched against ENV_VAR_VHD_TEST dictionary keys (LVHDSR.py:109-133)
  4. If a match is found, the driver sets the corresponding environment variable (e.g., VHD_UTIL_TEST_FAIL_REPARENT_BEGIN) to "yes". The vhd-util processes it spawns inherit that variable
  5. vhd-util reads the environment variable and simulates a failure at the specified operation point
  6. The VHD structural operation fails partway through, leaving metadata inconsistent
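Step 6 can be illustrated with a deliberately simplified toy model. This is not vhd-util code. The two-field header, the order of the writes, and the placement of the failure point are assumptions. They are chosen only to show why aborting between two dependent metadata writes leaves a child VHD whose parent locator and parent UUID disagree.

```python
import os
from dataclasses import dataclass
from typing import Dict


class InjectedFailure(Exception):
    """Raised at a toy failure point whose environment variable is "yes"."""


def fail_point(env_var: str, env: Dict[str, str]) -> None:
    # Toy trigger: fail only when the variable is set to "yes".
    if env.get(env_var) == "yes":
        raise InjectedFailure(env_var)


@dataclass
class ToyVhdHeader:
    parent_uuid: str     # UUID the child believes its parent has
    parent_locator: str  # where the child will look for its parent


def toy_reparent(hdr: ToyVhdHeader, new_uuid: str, new_locator: str,
                 env: Dict[str, str]) -> None:
    # ASSUMED ordering (toy only): the locator is rewritten before the UUID.
    hdr.parent_locator = new_locator
    fail_point("VHD_UTIL_TEST_FAIL_REPARENT_LOCATOR", env)
    hdr.parent_uuid = new_uuid


def chain_consistent(hdr: ToyVhdHeader, parents: Dict[str, str]) -> bool:
    # parents maps a locator to the UUID of the VHD actually stored there.
    return parents.get(hdr.parent_locator) == hdr.parent_uuid


parents = {"parent-A": "uuid-A", "parent-B": "uuid-B"}
hdr = ToyVhdHeader(parent_uuid="uuid-A", parent_locator="parent-A")
env = dict(os.environ, VHD_UTIL_TEST_FAIL_REPARENT_LOCATOR="yes")
try:
    toy_reparent(hdr, "uuid-B", "parent-B", env)
except InjectedFailure as exc:
    print("injected failure:", exc)
print(chain_consistent(hdr, parents))  # False: new locator, old UUID
```

Without the variable set, the same call completes both writes and the toy chain stays consistent.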

Known test mode triggers in ENV_VAR_VHD_TEST:

testmode value | Environment variable | Operation disrupted
vhd_fail_reparent_begin | VHD_UTIL_TEST_FAIL_REPARENT_BEGIN | VHD reparent start
vhd_fail_reparent_end | VHD_UTIL_TEST_FAIL_REPARENT_END | VHD reparent completion
vhd_fail_reparent_locator | VHD_UTIL_TEST_FAIL_REPARENT_LOCATOR | VHD parent locator update
vhd_fail_resize_begin | VHD_UTIL_TEST_FAIL_RESIZE_BEGIN | VHD resize start
vhd_fail_resize_end | VHD_UTIL_TEST_FAIL_RESIZE_END | VHD resize completion
vhd_fail_resize_metadata_begin | VHD_UTIL_TEST_FAIL_RESIZE_METADATA_BEGIN | VHD metadata resize start
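The lookup described in steps 2 to 4 and in the table above can be summarized in a short Python sketch. The function name apply_testmode is hypothetical. The dictionary holds only the six entries listed in this advisory. The sketch writes into a copied environment, so running it does not alter the current process. A child process started with that environment then sees the flag, as vhd-util would.

```python
import os
import subprocess
import sys
from typing import Dict, Optional

# The six entries documented in this advisory; the real ENV_VAR_VHD_TEST
# in LVHDSR.py may contain more.
ENV_VAR_VHD_TEST = {
    "vhd_fail_reparent_begin": "VHD_UTIL_TEST_FAIL_REPARENT_BEGIN",
    "vhd_fail_reparent_end": "VHD_UTIL_TEST_FAIL_REPARENT_END",
    "vhd_fail_reparent_locator": "VHD_UTIL_TEST_FAIL_REPARENT_LOCATOR",
    "vhd_fail_resize_begin": "VHD_UTIL_TEST_FAIL_RESIZE_BEGIN",
    "vhd_fail_resize_end": "VHD_UTIL_TEST_FAIL_RESIZE_END",
    "vhd_fail_resize_metadata_begin": "VHD_UTIL_TEST_FAIL_RESIZE_METADATA_BEGIN",
}


def apply_testmode(other_config: Dict[str, str],
                   environ: Dict[str, str]) -> Optional[str]:
    """Hypothetical sketch: set the mapped variable to "yes", return its name."""
    var = ENV_VAR_VHD_TEST.get(other_config.get("testmode", ""))
    if var is not None:
        environ[var] = "yes"
    return var


# Work on a copy so this demo does not alter the current process environment.
env = dict(os.environ)
apply_testmode({"testmode": "vhd_fail_resize_begin"}, env)

# A child started with this environment inherits the flag.
child = subprocess.run(
    [sys.executable, "-c",
     "import os; print(os.environ.get('VHD_UTIL_TEST_FAIL_RESIZE_BEGIN'))"],
    env=env, capture_output=True, text=True, check=True,
)
print(child.stdout.strip())  # yes
```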

Root Causes

  1. Test infrastructure exposed in production. VHD failure injection test modes are accessible from a user-writable field in production deployments. Debug/test functionality is not isolated from production code paths.

  2. Missing RBAC protection. SR.other_config has no map_keys_roles entry for testmode. The key therefore inherits the class default write role, _R_POOL_OP, and pool-operator and above can write it.

  3. No environment separation. There is no build flag, runtime configuration, or environment check that disables test mode activation in production. The same code runs in test and production environments.

  4. Corruption appears legitimate. VHD metadata corruption from test mode failures is indistinguishable from genuine I/O errors. The attack produces no indicators of intentional sabotage.
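The RBAC fallback described in root cause 2 can be modelled in a few lines. This is a hypothetical Python model, not XAPI's OCaml RBAC code. It assumes only that a per-key entry in map_keys_roles overrides the field's default write roles. It also treats _R_POOL_OP as the set {pool-admin, pool-operator} and _R_LOCAL_ROOT_ONLY as granting no RBAC role at all. The "proposed" mapping mirrors the map_keys_roles fix described under Long-Term Fix.

```python
from typing import Dict, FrozenSet

# Simplified role sets (assumptions for this model, not XAPI definitions).
R_POOL_OP: FrozenSet[str] = frozenset({"pool-admin", "pool-operator"})
R_LOCAL_ROOT_ONLY: FrozenSet[str] = frozenset()  # no RBAC role qualifies


def can_write_key(role: str, key: str, class_default: FrozenSet[str],
                  map_keys_roles: Dict[str, FrozenSet[str]]) -> bool:
    """A per-key entry, when present, overrides the default write roles."""
    allowed = map_keys_roles.get(key, class_default)
    return role in allowed


# Current state: no map_keys_roles entry for testmode.
current: Dict[str, FrozenSet[str]] = {}
# Proposed fix: testmode pinned to _R_LOCAL_ROOT_ONLY.
proposed: Dict[str, FrozenSet[str]] = {"testmode": R_LOCAL_ROOT_ONLY}

print(can_write_key("pool-operator", "testmode", R_POOL_OP, current))   # True
print(can_write_key("pool-operator", "testmode", R_POOL_OP, proposed))  # False
print(can_write_key("pool-admin", "testmode", R_POOL_OP, proposed))     # False
```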

Affected Systems

Directly Affected

Indirectly Affected

Exploitation Scenarios

Scenario | Impact | Pre-conditions | Status
VHD reparent failure | VHD chain becomes inconsistent after snapshot coalesce (data corruption) | pool-operator, LVHD SR with active snapshots | Source-traced
VHD resize failure | VDI resize fails mid-operation, leaving VHD metadata inconsistent | pool-operator, LVHD SR with VDI resize operations | Source-traced
Silent data corruption | Corruption appears as a legitimate internal error, with no attacker indicators | pool-operator, any LVHD SR | Modeled
BOC-1 chain | vm-admin uses BOC-1 S3 to self-grant pool-operator, then activates test mode | vm-admin, BOC-1 + LVHD SR | Source-traced

Detection
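One way to look for this condition is to list every SR whose other_config contains a testmode key. The Python sketch below assumes the XenAPI Python bindings, where SR.get_all_records returns a map from SR reference to SR record. The function names are illustrative, and so is the choice to flag any testmode value, known or not. The sample records are fabricated for the offline demonstration. The sketch does not examine SM logs.

```python
from typing import Dict, List, Tuple

# testmode values listed in this advisory (ENV_VAR_VHD_TEST may hold more).
KNOWN_TESTMODES = {
    "vhd_fail_reparent_begin",
    "vhd_fail_reparent_end",
    "vhd_fail_reparent_locator",
    "vhd_fail_resize_begin",
    "vhd_fail_resize_end",
    "vhd_fail_resize_metadata_begin",
}


def find_testmode_srs(records: Dict[str, dict]) -> List[Tuple[str, str, bool]]:
    """Return (name_label, testmode value, known?) for each SR with a testmode key."""
    findings = []
    for ref, rec in records.items():
        value = rec.get("other_config", {}).get("testmode")
        if value is not None:
            findings.append((rec.get("name_label", ref), value,
                             value in KNOWN_TESTMODES))
    return sorted(findings)


def scan_pool(url: str, user: str, password: str) -> List[Tuple[str, str, bool]]:
    """Fetch all SR records over XenAPI and scan them (requires XenAPI)."""
    import XenAPI  # imported lazily so find_testmode_srs works without it

    session = XenAPI.Session(url)
    session.xenapi.login_with_password(user, password, "1.0", "testmode-scan")
    try:
        return find_testmode_srs(session.xenapi.SR.get_all_records())
    finally:
        session.xenapi.session.logout()


# Offline usage with fabricated records (not real pool output):
example = {
    "OpaqueRef:a": {"name_label": "Local storage",
                    "other_config": {"testmode": "vhd_fail_reparent_begin"}},
    "OpaqueRef:b": {"name_label": "ISO library", "other_config": {}},
}
print(find_testmode_srs(example))
# [('Local storage', 'vhd_fail_reparent_begin', True)]
```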

Remediation

Short-Term Mitigations
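As a stopgap before patches are available, the key can be removed from any SR that carries it. The sketch below assumes the XenAPI Python bindings and a session with pool-operator or higher. strip_testmode is a hypothetical helper, and the offline demonstration uses a stand-in for SR.remove_from_other_config. Removing the key does not stop a pool-operator from setting it again. This does not replace reviewing which subjects hold pool-operator or above.

```python
from typing import Callable, Dict, List


def strip_testmode(records: Dict[str, dict],
                   remove_key: Callable[[str, str], None]) -> List[str]:
    """Remove other_config:testmode from every SR that has it; return refs touched."""
    touched = []
    for ref, rec in records.items():
        if "testmode" in rec.get("other_config", {}):
            remove_key(ref, "testmode")
            touched.append(ref)
    return touched


# With a logged-in XenAPI session (assumed), this would be called as:
#   strip_testmode(session.xenapi.SR.get_all_records(),
#                  session.xenapi.SR.remove_from_other_config)
# Offline demonstration with a stand-in that records the calls:
calls = []
demo = {
    "OpaqueRef:a": {"other_config": {"testmode": "vhd_fail_resize_begin"}},
    "OpaqueRef:b": {"other_config": {}},
}
print(strip_testmode(demo, lambda ref, key: calls.append((ref, key))))
# ['OpaqueRef:a']
```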

Long-Term Fix

Remove test mode from production. Gate the testmode functionality behind a compile-time flag or a root-only local file check. Do not expose test failure injection via a user-writable API field.
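A root-only local file check could look like the sketch below. The file path and function names are hypothetical and not part of any SM release. The idea is that the driver would honour other_config:testmode only if a specific file exists on the host. That file must be a regular file, owned by root, and not writable by group or others. An API caller alone could then not enable failure injection.

```python
import os
import stat
from typing import Dict, Optional

# HYPOTHETICAL path; no such file is defined by current SM releases.
TESTMODE_ENABLE_FILE = "/etc/xensource/sm-testmode.enable"


def testmode_enabled(path: str = TESTMODE_ENABLE_FILE) -> bool:
    """True only for a regular, root-owned file not writable by group/others."""
    try:
        st = os.lstat(path)  # lstat: do not follow a symlink planted elsewhere
    except FileNotFoundError:
        return False
    if not stat.S_ISREG(st.st_mode):
        return False
    if st.st_uid != 0:
        return False
    if st.st_mode & (stat.S_IWGRP | stat.S_IWOTH):
        return False
    return True


def effective_testmode(other_config: Dict[str, str],
                       path: str = TESTMODE_ENABLE_FILE) -> Optional[str]:
    # Ignore the API-supplied value unless the local gate is open.
    if not testmode_enabled(path):
        return None
    return other_config.get("testmode")
```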

Add map_keys_roles. If the key must remain, protect testmode at _R_LOCAL_ROOT_ONLY in datamodel.ml. No RBAC role, including pool-admin, could then set it through the API; only a local root session could.

Upstream patches exist. They are held privately pending coordinated disclosure.

Disclosure

Disclosure:

Credits

Discovered and reported by Jakob Wolffhechel, Moksha.

Jakob Wolffhechel · Moksha · Copenhagen
jakob@wolffhechel.dk · +45 3170 7337
Published 2026-04-24 08:00 CEST · cna.moksha.dk · shittrix.moksha.dk