NX-G9: AHV Unexpectedly Hangs with Marvell NVMe Device

From: NX-G9: Unexpectedly Hangs with Host Marvell NVMe Device Aborting Reset

Description:

Nutanix has identified a critical issue affecting various NX-G9 platforms that may cause unexpected host hangs and trigger HA (High Availability) failover events. This results in user VMs being restarted on other nodes in the cluster, potentially disrupting workloads.

Affected Platforms

Multi-node: NX-3060-G9, NX-3035-G9, NX-1065-G9
Single-node: NX-3155-G9, NX-8155-G9, NX-8150-G9, NX-8170-G9

Symptoms

  • Host appears to fail with critical alerts in Prism
  • CVM and host are pingable, but SSH to host is unresponsive
  • Services still show as UP in the cluster status output
  • IPMI console login hangs and shows the error:
    systemd[1]: Failed to start Journal Service
    and/or
    nvme: nvme0: Device not ready: aborting reset, CSTS=0x1
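
If the IPMI console text has been captured to a file, the two messages above can be searched for in one pass. This is a minimal sketch, assuming a POSIX shell with grep; has_hang_signature is a hypothetical helper name and console-capture.txt is a hypothetical file, neither comes from the KB.

```shell
# has_hang_signature (hypothetical helper): reads console text on stdin and
# succeeds if either message from the Symptoms list is present.
has_hang_signature() {
    grep -E -q 'Failed to start Journal Service|nvme[0-9]+: Device not ready: aborting reset'
}

# Example usage against a saved console capture (hypothetical file name):
# has_hang_signature < console-capture.txt && echo "Matches the listed symptoms"
```

A match only indicates that the console text contains one of the listed messages; it does not by itself confirm the root cause.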

Solution

For detailed information or guidance, check the linked KB in the Nutanix Support Portal or open a ticket with Nutanix Support.

For Single-node Platforms (NX-3155-G9, NX-8155-G9, NX-8150-G9, NX-8170-G9)

Disable the NCC check nvme_raid_checks, which uses in-band commands that may trigger the RAID controller hang.
For how to disable health checks, see: Prism 7.0 – Configuring Health Checks

For Multi-node Platforms (NX-3060-G9, NX-3035-G9, NX-1065-G9)

Upgrade to BMC 1.01.10. As a workaround, the BMC has removed the Marvell query routine, and this change persists across any AC power cycle or BMC reset.

  1. Log in to the local CVM of a host running one of the affected platforms listed above
  2. Download the script – kb18663-MRV-sensor-disable.sh [md5 = 7db230e026f232824d3603db131ee970]
    nutanix@CVM:~$ wget https://download.nutanix.com/kbattachments/18663/kb18663-MRV-sensor-disable.sh
  3. Change the permissions to owner-executable (-rwx------)
    nutanix@CVM:~$ chmod 700 kb18663-MRV-sensor-disable.sh

    nutanix@CVM:~$ ls -lhrt kb18663-MRV-sensor-disable.sh
    -rwx------. 1 nutanix nutanix 3.6K May 16 23:51 kb18663-MRV-sensor-disable.sh

  4. Disable the BMC sensor polling by running the script – ./kb18663-MRV-sensor-disable.sh
    nutanix@CVM:~$ ./kb18663-MRV-sensor-disable.sh
    Logging execution (/home/nutanix/tmp/kb18663-MRV-sensor-disable.log)
    Verifying model …
    model_string: 'NX-1065-G9'
    Checking if sensor is disabled …
    Found default setting ' b4' (not disabled)
    Disabling sensor …
    Verifying disabled …
    After setting to 0 to (disable) - result is now ' 00' (Sat May 17 02:16:20 UTC 2025)

  5. Check the sensor polling status by running ./kb18663-MRV-sensor-disable.sh status
    nutanix@CVM:~$ ./kb18663-MRV-sensor-disable.sh status
    Logging execution (/home/nutanix/tmp/kb18663-MRV-sensor-disable.log)
    Verifying model …
    model_string: 'NX-1065-G9'
    Checking if sensor is disabled …
    Result ' 00' - 'NX-1065-G9' sensor is verified as disabled
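
Step 2 publishes an MD5 checksum for the script but does not show how to check it. The following is a minimal sketch of that check, assuming md5sum and awk are available on the CVM; verify_md5 is a hypothetical helper name and is not part of the KB script.

```shell
# verify_md5 (hypothetical helper): compare a file's MD5 digest with an
# expected value. Returns 0 on a match, 1 otherwise.
verify_md5() {
    file="$1"
    expected="$2"
    # md5sum prints "<digest>  <filename>"; keep only the digest.
    actual=$(md5sum "$file" | awk '{print $1}')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file matches the expected checksum"
    else
        echo "MISMATCH: $file has $actual, expected $expected" >&2
        return 1
    fi
}

# Example usage with the checksum published in step 2:
# verify_md5 kb18663-MRV-sensor-disable.sh 7db230e026f232824d3603db131ee970
```

Running the check after the download in step 2 and before step 4 means a mismatched or truncated script is never executed.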

Nutanix KB Article KB-18663

