Last month we began the process of upgrading all our UCS domains to the newer 4.1 code release trains to enable new functionality/hardware and resolve some minor bugs. After we completed our first domain we were treated to a new fault code on each Fabric Interconnect, approximately 45 minutes after it’s individual upgrade completed. The fault code thrown was “Partition bootflash on fabric interconnect X is clean but with errors.” As this domain happened to be the very first UCS domain we ever had we assumed it may be an issue with the actual NVRAM in the FI. We triaged the case with TAC and determined that this was just enhanced file system checking in the 4.1 code train and that the fix is to run a e2fsck against the bootflash. In previous versions of UCS this would require the debug utility and manually running some commands. However, in 4.1(2a) and 4.0(4k) Cisco added the ability to run a e2fsck from the UCS CLI on the Fabric Interconnect. We figured this would be a one-off case, isolated to this domain, and didn’t think much of this.
However, we just completed our second domain upgrade last night and low-and-behold one of the two FIs raised the same fault. So now that we’ve encountered this on two of our domains (the second of which is one of our newer domains, but only one of the FIs raised the fault), I’m documenting this for future reference!
Cisco has a public bug report (CCO account required, though) documenting the release of the enhancement: https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvq17291
The process to run a e2fsck is as follows:
- Log in to UCS CLI
- Connect to the local-mgmt shell for the FI that has the fault.
connect local-mgmt <a|b>
- Issue the reboot command with the e2fsck argument. This will trigger the FI to reload and run a e2fsck at bootup.
Note: This will obviously cause one of your Fabric Interconnects to be unavailable while it reloads. So ensure you have a maintenance window and verified your equipment is properly connected/setup for failover to survive the reboot.
Note 2: It may take some time still for the fault to clear after the FI reboots from it’s e2fsck. This is normal. If it hasn’t resolved within a few days you should open a TAC case as you may have faulty bootflash.