MOKSHA-2026-0018: HA Timeout Manipulation via Pool.other_config (Split-Brain/Blindness)

Advisory ID	MOKSHA-2026-0018
Semantic ID	PLOC-2
Published	2026-04-24
CVSS 3.1	7.6 High
CVSS 3.1 Vector	`AV:N/AC:L/PR:H/UI:N/S:C/C:N/I:N/A:H`
CVSS 4.0	8.2 High
CVSS 4.0 Vector	`AV:N/AC:L/AT:N/PR:H/UI:N/VC:N/VI:N/VA:H/SC:N/SI:N/SA:H`
XAPI Object	`Pool`
XAPI Field	`other_config:default_ha_timeout`
Entry Role	pool-operator
Researcher	Jakob Wolffhechel, Moksha

Affected Products

Vendor	Product	Versions
Citrix / Cloud Software Group	XenServer / Citrix Hypervisor	all versions (shared XAPI codebase)
Vates	XCP-ng	8.3.0

Summary

A pool-operator can manipulate the High Availability (HA) timeout by setting Pool.other_config:default_ha_timeout to an arbitrary integer. The value is read at xapi_ha.ml:278-279 via int_of_string with no range check. Setting the timeout to 1 second causes spurious HA fencing: live hosts are incorrectly marked as dead, triggering cascading false fencing across the pool (split-brain condition). Setting the timeout to 999999 seconds effectively disables HA: actual host failures go undetected for days, leaving HA-protected VMs without failover protection. Both outcomes affect every HA-protected VM in the pool.

Vulnerability Description

Pool.other_config is the highest-scope other_config field in the XAPI data model. The default_ha_timeout key overrides the default HA heartbeat timeout used to determine whether a host is alive or dead.

Data Flow

pool-operator calls Pool.add_to_other_config(pool, "default_ha_timeout", "1")
  -> xapi_ha.ml:278-279 reads default_ha_timeout via int_of_string
  -> No range validation performed
  -> HA subsystem uses 1-second timeout for heartbeat monitoring
  -> Ordinary heartbeat delays and network jitter can exceed 1 second -> live hosts marked dead
  -> Cascading fencing events: hosts reboot each other (split-brain)

pool-operator calls Pool.add_to_other_config(pool, "default_ha_timeout", "999999")
  -> HA subsystem uses ~11.5-day timeout
  -> Host failures not detected for days
  -> HA-protected VMs not restarted after actual host failure
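
The read path above can be modelled with a short Python sketch. This is not the XAPI source, which is OCaml; `read_ha_timeout` and `DEFAULT_TIMEOUT` are hypothetical names, and the fallback value of 60 seconds is an assumption made only so the example runs. It illustrates one point: an integer parse with no bounds check accepts both attack values unchanged.

```python
# Illustrative model only: the real read is OCaml code in xapi_ha.ml.
# read_ha_timeout and DEFAULT_TIMEOUT are hypothetical names.
DEFAULT_TIMEOUT = 60  # assumed fallback, not taken from the XAPI source


def read_ha_timeout(other_config: dict) -> int:
    """Parse default_ha_timeout the way an unchecked integer read would."""
    raw = other_config.get("default_ha_timeout")
    if raw is None:
        return DEFAULT_TIMEOUT
    # No range check: any string that parses as an integer is accepted.
    return int(raw)


for value in ("1", "999999"):
    timeout = read_ha_timeout({"default_ha_timeout": value})
    print(f"default_ha_timeout={value!r} -> {timeout} s accepted")
```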

Two Attack Modes

Mode 1 - Split-brain (timeout too low): Setting default_ha_timeout=1 causes the HA daemon to declare hosts dead after 1 second of missed heartbeats. Normal network jitter can exceed this threshold, triggering false fencing events. Multiple hosts simultaneously fence each other, causing a cascading reboot loop. All HA-protected VMs restart repeatedly.

Mode 2 - HA blindness (timeout too high): Setting default_ha_timeout=999999 makes HA unable to detect actual host failures for approximately 11.5 days. During this window, a failed host's HA-protected VMs are not restarted on surviving hosts.
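
As a quick check of the arithmetic behind the two modes, the following sketch converts each timeout value into a wall-clock detection window. `detection_window` is a hypothetical helper introduced for illustration.

```python
from datetime import timedelta


def detection_window(seconds: int) -> timedelta:
    """Return the failure-detection window implied by a timeout in seconds."""
    return timedelta(seconds=seconds)


for seconds in (1, 999_999):
    window = detection_window(seconds)
    print(f"{seconds} s = {window} ({seconds / 86_400:.2f} days)")
```

For 999999 seconds this prints a window of 11 days, 13:46:39, consistent with the approximate figure quoted above.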

Root Causes

Missing RBAC protection. Pool.other_config has no map_keys_roles entries for infrastructure keys, so the default_ha_timeout key is writable by any pool-operator.
No range validation. xapi_ha.ml uses int_of_string with no bounds check. Any integer value is accepted, including values that make HA non-functional.
Pool-wide blast radius. The HA timeout applies to the entire pool. A single key write affects the failure detection behavior for every host and HA-protected VM.
Immediate effect. The changed timeout is read on the next HA monitoring cycle, with no confirmation or cooldown period.

Affected Systems

Directly Affected

XenServer / Citrix Hypervisor - all versions with HA enabled
XCP-ng - all versions with HA enabled

Indirectly Affected

All HA-protected VMs - either spuriously restarted (split-brain) or not restarted after actual failure (blindness)
Storage subsystem - split-brain fencing can cause concurrent access violations on shared storage
Pool integrity - cascading fencing can leave the pool in an inconsistent state requiring manual recovery

Exploitation Scenarios

Scenario	Impact	Pre-conditions	Status
Split-brain (timeout=1)	Cascading fencing, all hosts reboot repeatedly	HA enabled	Modeled (code-traced: int_of_string with no range check at xapi_ha.ml:278-279)
HA blindness (timeout=999999)	Host failures undetected for days	HA enabled	Modeled (code-traced)
Storage corruption on split-brain	Concurrent access violations on shared storage during fencing	HA + shared storage	Modeled
BOC-1 chain	vm-admin escalates to pool-operator via BOC-1 S3, then manipulates HA timeout	BOC-1 available	Modeled (two-step chain)

Detection

Monitor Pool.other_config for writes to default_ha_timeout
Alert on HA timeout values outside the expected range (typically 30-120 seconds)
Monitor HA fencing events for unexpected frequency
See rule P-02 in disclosure/vendor-detection-guidance.md
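
The range alert could be sketched as follows. This is a minimal example, not the content of rule P-02: `check_ha_timeout` and `EXPECTED_RANGE` are hypothetical names, the bounds mirror the typical 30-120 second range mentioned above, and the dictionary is assumed to have been fetched by the monitoring system (for example from the pool's other_config getter).

```python
from typing import Optional

# Hypothetical names; bounds mirror the typical 30-120 s range noted above.
EXPECTED_RANGE = range(30, 121)  # 30..120 seconds inclusive


def check_ha_timeout(other_config: dict) -> Optional[str]:
    """Return an alert message if default_ha_timeout looks suspicious, else None."""
    raw = other_config.get("default_ha_timeout")
    if raw is None:
        return None  # key absent: nothing has overridden the timeout
    try:
        value = int(raw)
    except ValueError:
        return f"default_ha_timeout is not an integer: {raw!r}"
    if value not in EXPECTED_RANGE:
        return f"default_ha_timeout={value} s is outside 30-120 s"
    return None


for config in ({}, {"default_ha_timeout": "60"}, {"default_ha_timeout": "1"}):
    print(config, "->", check_ha_timeout(config) or "ok")
```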

Remediation

Short-Term Mitigations

Audit Pool.other_config for unexpected default_ha_timeout values
Monitor HA fencing event frequency for anomalies
Restrict pool-operator role to trusted administrators
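
Monitoring fencing frequency can be approximated with a sliding-window counter. The sketch below is an assumption-laden example: `FencingRateMonitor` is a hypothetical class, the thresholds (more than 2 events in 600 seconds) are placeholders to tune per pool, and the timestamps are assumed to be supplied by whatever log or event source the operator already collects.

```python
from collections import deque


class FencingRateMonitor:
    """Flag when more than max_events fencing events fall within window_s seconds."""

    def __init__(self, max_events: int = 2, window_s: float = 600.0):
        # Placeholder thresholds; tune them to the pool's normal behavior.
        self.max_events = max_events
        self.window_s = window_s
        self._events = deque()

    def record(self, timestamp: float) -> bool:
        """Record one fencing event; return True if the recent rate is anomalous."""
        self._events.append(timestamp)
        # Drop events that have aged out of the sliding window.
        while self._events and timestamp - self._events[0] > self.window_s:
            self._events.popleft()
        return len(self._events) > self.max_events


monitor = FencingRateMonitor()
for t in (0.0, 100.0, 200.0):
    print(f"fencing event at t={t:.0f} s, anomalous={monitor.record(t)}")
```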

Long-Term Fix

RBAC restriction. Add a map_keys_roles entry for default_ha_timeout in datamodel.ml requiring _R_POOL_ADMIN.

Range validation. Validate default_ha_timeout at write time. Enforce a reasonable range (e.g., 10-600 seconds) and reject values outside it.

Write-time type checking. Validate that the value is a valid integer at write time, not just at read time.
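
The range and type checks could look like the following sketch. It is written in Python for illustration only; an actual fix would live in the OCaml XAPI code, and the held-back upstream patches may differ. `validate_ha_timeout` is a hypothetical name, and the 10-600 second bounds are the example range given above. The RBAC change is not modelled here.

```python
MIN_TIMEOUT_S = 10   # example lower bound from the text above
MAX_TIMEOUT_S = 600  # example upper bound from the text above


def validate_ha_timeout(raw: str) -> int:
    """Reject non-integer or out-of-range values before they are stored."""
    # Strict type check: ASCII decimal digits only (no sign, spaces, or hex).
    if not (raw.isascii() and raw.isdigit()):
        raise ValueError(f"default_ha_timeout must be a decimal integer, got {raw!r}")
    value = int(raw)
    if not MIN_TIMEOUT_S <= value <= MAX_TIMEOUT_S:
        raise ValueError(
            f"default_ha_timeout must be {MIN_TIMEOUT_S}-{MAX_TIMEOUT_S} s, got {value}"
        )
    return value


for candidate in ("60", "1", "999999", "abc"):
    try:
        print(f"{candidate!r} accepted as {validate_ha_timeout(candidate)} s")
    except ValueError as err:
        print(f"{candidate!r} rejected: {err}")
```

Rejecting the write, rather than clamping the value, keeps the stored configuration and the effective timeout identical, which makes later audits simpler.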

Upstream patches exist. They are held privately pending coordinated disclosure.

Disclosure

Public release: 2026-04-24 08:00 CEST
CERT/CC notified: 2026-04-23
MITRE CVE reservation filed: 2026-04-09 (no response as of publication)
EU CVE registers notified: 2026-04-18 (GCVE/CIRCL, ENISA, DIVD - no response)
Vendor (Cloud Software Group / Citrix): not contacted pre-publication
Downstream maintainer (Vates/XCP-ng): contacted 2026-04-23 with conditional patch offer

References

Related MOKSHA IDs: MOKSHA-2026-0013 (PLOC-6 pool-wide OVS DoS), MOKSHA-2026-0001 (BOC-1 privilege escalation)
XAPI source: xapi_ha.ml:278-279 (default_ha_timeout read via int_of_string), datamodel.ml (Pool field definition)
Thematic advisory: disclosure/advisories/ploc-security-advisory.md (PLOC-2)
Investigation: research/investigations/pool-other-config.md

Credits

Discovered and reported by Jakob Wolffhechel, Moksha.

Machine-readable: MOKSHA-2026-0018.json (CVE JSON 5.1 schema)

Jakob Wolffhechel · Moksha · Copenhagen
jakob@wolffhechel.dk · +45 3170 7337
Published 2026-04-24 08:00 CEST · cna.moksha.dk · shittrix.moksha.dk