Intel 25G XXV710 NIC not coming up after a maintenance activity

If you use Intel XXV710 25G NICs and reboot the connected switch(es), the link might randomly go down.
And it will not come back up easily.

Here’s why:

This is due to bad firmware on your NICs.
Intel NIC firmware versions between 8.00 and 8.40 are affected.

Which NICs do I use?

A: Check your LCM Inventory
B: From your CVM:
hostssh lspci | grep -i eth
ncc hardware_info show_hardware_info
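
If you are logged in to a hypervisor host directly, the interface names that belong to these NICs can also be narrowed down with a small loop. This is only a sketch under a few assumptions: the host is Linux-based and exposes /sys/class/net, ethtool is installed, and the NICs use the i40e driver (as in the sample output further down). The function name and the NET_DIR variable are made up for this example. Note that i40e also drives other 700-series cards, so compare the printed PCI address with the lspci output to confirm the model.

```shell
# List interfaces bound to the i40e driver together with their PCI address.
# NET_DIR is a hypothetical override, present only so the sketch can be tried
# against a scratch directory; by default it reads /sys/class/net.
list_i40e_nics() {
  net_dir=${NET_DIR:-/sys/class/net}
  for path in "$net_dir"/*; do
    iface=${path##*/}
    # Skip anything ethtool cannot query (for example the loopback device).
    info=$(ethtool -i "$iface" 2>/dev/null) || continue
    driver=$(printf '%s\n' "$info" | sed -n 's/^driver: *//p')
    [ "$driver" = "i40e" ] || continue
    bus=$(printf '%s\n' "$info" | sed -n 's/^bus-info: *//p')
    echo "$iface $bus"
  done
}

list_i40e_nics
```

Each output line is an interface name followed by its PCI address; the interface name is what you pass to ethtool -i in the next step.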

Which firmware version am I currently running?

Sadly, LCM only shows the build number, e.g. 0x8000b703.
To find the version number, you can use one of the following commands from your CVM (replace eth5 with the name of your interface):
hostssh ethtool -i eth5
or
ncc hardware_info show_hardware_info
You should get output similar to this:

| Location | eth2 |
| Device name | eth2 |
| Driver name | i40e |
| Firmware version | 8.40 0x8000b52f 1.2063.0 |
| Mac address | 00:00:00:00:00:00 |
| Manufacturer | Intel Corporation |
| Product name | Ethernet Controller XXV710 for 25GbE SFP28 |

The first value in the Firmware version field (8.40 in this example) is the version number, which is easier to read and compare than the build number.
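
The comparison can also be scripted. The following is a minimal sketch, not an official check: it assumes the ethtool -i output contains a "firmware-version:" line whose first field is the version number, and it treats everything from 8.00 up to (but not including) 8.50 as affected, which is my reading of the range above. The function names are invented for this example.

```shell
# Print the version field (e.g. "8.40") from `ethtool -i` output read on stdin.
fw_version() {
  awk -F': *' '$1 == "firmware-version" { split($2, f, " "); print f[1] }'
}

# Succeed (exit 0) if the given version is in the range treated as affected.
# Assumption: 8.00 inclusive up to, but not including, 8.50.
fw_is_affected() {
  awk -v v="$1" 'BEGIN { exit !(v + 0 >= 8.00 && v + 0 < 8.50) }'
}

# Sample input using the firmware string from the table above;
# replace it with the real output of `ethtool -i <interface>`.
sample='driver: i40e
firmware-version: 8.40 0x8000b52f 1.2063.0'

ver=$(printf '%s\n' "$sample" | fw_version)
if fw_is_affected "$ver"; then
  echo "$ver: affected, update to 8.50 or later"
else
  echo "$ver: outside the affected range"
fi
```

On a host you would pipe the real command into the first function, for example ethtool -i eth5 | fw_version, and pass the result to fw_is_affected.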

How to fix the issue:


A: Update your firmware to at least 8.50 via LCM
B: Via IPMI: Completely power off the node and then start it again. A reboot via IPMI did not help (at least for me).
This is only a temporary fix to get your nodes up and running again. You still need to update your firmware.

Verification:


To verify the version after the firmware update, run these commands again:
hostssh ethtool -i eth5
or
ncc hardware_info show_hardware_info

The fixed firmware versions shown in LCM differ between G7 and G8 nodes.
For me they were:
G7: 8.50 0x8000c445
G8: 8.50 0x8000b703

Nutanix also has a KB for this issue:
https://portal.nutanix.com/kb/13085
