If you use Intel XXV710 25G NICs and reboot the connected switch(es), your connection might randomly go down.
And it will not come back so easily.
Here’s why:
This is caused by bad firmware on your NICs.
Intel NIC firmware versions between 8.00 and 8.40 are affected.
Which NICs do I use?
A: Check your LCM Inventory
B: From your CVM:
– hostssh lspci | grep -i eth
– ncc hardware_info show_hardware_info
Which firmware version am I currently running?
Sadly, LCM only shows the build number, e.g. 0x8000b703.
To find the version number, use one of the following commands from your CVM (substitute your own interface name for eth5):
– hostssh ethtool -i eth5
– ncc hardware_info show_hardware_info
Here you should get something like this as output:
| Location         | eth2                                       |
| Device name      | eth2                                       |
| Driver name      | i40e                                       |
| Firmware version | 8.40 0x8000b52f 1.2063.0                   |
| Mac address      | 00:00:00:00:00:00                          |
| Manufacturer     | Intel Corporation                          |
| Product name     | Ethernet Controller XXV710 for 25GbE SFP28 |
The first field of the firmware version (8.40 in this example) is the version number, which is easier to read and compare than the build number.
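If you have many nodes, comparing these strings by eye gets tedious. Here is a minimal Python sketch that splits a firmware string into version and build number and classifies it. The function names and thresholds are my own illustration: I assume the affected range is 8.00 to 8.40 inclusive, that 8.50 and later is fixed, and that the minor version is always printed with two digits, as in the output above.

```python
import re

# Assumptions taken from this post, not from an official Intel or Nutanix list:
AFFECTED_MIN = (8, 0)   # 8.00
AFFECTED_MAX = (8, 40)  # 8.40
FIXED_MIN = (8, 50)     # 8.50


def parse_firmware(field):
    """Split e.g. '8.40 0x8000b52f 1.2063.0' into ((8, 40), '0x8000b52f')."""
    match = re.match(r"\s*(\d+)\.(\d+)\s+(0x[0-9a-fA-F]+)", field)
    if match is None:
        raise ValueError(f"unrecognised firmware string: {field!r}")
    major, minor, build = match.groups()
    return (int(major), int(minor)), build.lower()


def classify(field):
    """Return 'affected', 'fixed' or 'not listed' for a firmware string."""
    version, _build = parse_firmware(field)
    if AFFECTED_MIN <= version <= AFFECTED_MAX:
        return "affected"
    if version >= FIXED_MIN:
        return "fixed"
    return "not listed"


if __name__ == "__main__":
    # Both strings appear in this post.
    for sample in ("8.40 0x8000b52f 1.2063.0", "8.50 0x8000c445"):
        print(sample, "->", classify(sample))
```

Versions outside both ranges come back as "not listed" instead of being guessed at, since this post only covers 8.00 to 8.40 and 8.50.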
How to fix the issue:
A: Update your firmware to at least 8.50 via LCM
B: Via IPMI: Completely power off and then start up your node. A reboot via IPMI did not help (at least for me).
Option B is only a temporary workaround to get your nodes up and running again. You still need to update your firmware.
Verification:
To verify the version after the firmware update, use these commands again:
– hostssh ethtool -i eth5
– ncc hardware_info show_hardware_info
The fixed firmware version shown in LCM differs between G7 and G8 nodes.
For me it was:
G7: 8.50 0x8000c445
G8: 8.50 0x8000b703
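The verification step can also be scripted. Here is a rough Python sketch that takes the text printed by `ethtool -i` (which prints `key: value` lines, including `firmware-version`) and checks that every firmware version found is at least 8.50. The function names and the trimmed sample text are hypothetical, and I assume that any other lines in the output, such as per-host separators from hostssh, can simply be ignored.

```python
FIXED_MIN = (8, 50)  # 8.50, the minimum fixed version named in this post

# Trimmed, illustrative sample; real `ethtool -i` output has more fields.
SAMPLE_OUTPUT = """\
driver: i40e
firmware-version: 8.50 0x8000b703
"""


def firmware_versions(ethtool_output):
    """Return every non-empty firmware-version value in the given text."""
    values = []
    for line in ethtool_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "firmware-version" and value.strip():
            values.append(value.strip())
    return values


def is_fixed(firmware_value):
    """True if the leading 'major.minor' field is at least 8.50."""
    major, minor = firmware_value.split()[0].split(".")[:2]
    return (int(major), int(minor)) >= FIXED_MIN


def all_fixed(ethtool_output):
    """True only if at least one version was found and all are fixed."""
    values = firmware_versions(ethtool_output)
    return bool(values) and all(is_fixed(v) for v in values)


if __name__ == "__main__":
    print(firmware_versions(SAMPLE_OUTPUT))
    print("all fixed:", all_fixed(SAMPLE_OUTPUT))
```

Returning False when no firmware-version line is found is deliberate: empty or unexpected output should not pass as a successful update.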
Nutanix also has a KB for this issue:
https://portal.nutanix.com/kb/13085