Cisco UCS

UCS FI bootflash clean but with errors

Last month we began upgrading all of our UCS domains to the newer 4.1 code release train to enable new functionality and hardware support, and to resolve some minor bugs. After we completed our first domain, each Fabric Interconnect raised a new fault approximately 45 minutes after its individual upgrade completed: “Partition bootflash on fabric interconnect X is clean but with errors.” As this happened to be the very first UCS domain we ever deployed, we assumed it might be an issue with the actual NVRAM in the FI. We triaged the case with TAC and determined that this was just enhanced file system checking in the 4.1 code train, and that the fix is to run an e2fsck against the bootflash. In previous versions of UCS this required the debug utility and manually running some commands. However, in 4.1(2a) and 4.0(4k) Cisco added the ability to run an e2fsck from the UCS CLI on the Fabric Interconnect. We figured this was a one-off isolated to this domain and didn’t think much of it.

However, we completed our second domain upgrade last night and, lo and behold, one of the two FIs raised the same fault. Now that we’ve encountered this on two of our domains (the second of which is one of our newer domains, and only one of its FIs raised the fault), I’m documenting the fix for future reference!

Cisco has a public bug report (CCO account required, though) documenting the release of the enhancement: https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvq17291

The process to run an e2fsck is as follows:

  1. Log in to the UCS CLI.
  2. Connect to the local-mgmt shell for the FI that has the fault.
    connect local-mgmt <a|b>
  3. Issue the reboot command with the e2fsck argument. This triggers the FI to reload and run an e2fsck at boot.
    reboot e2fsck
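Putting the steps above together, a full session looks roughly like the sketch below. The prompts are illustrative (your system name will differ), and the FI letter `a` is just an example; substitute the fabric interconnect that raised the fault. The CLI will also prompt you to confirm before rebooting.

    UCS-A# connect local-mgmt a
    UCS-A(local-mgmt)# reboot e2fsck

After confirming, the FI reloads, runs the file system check at boot, and rejoins the cluster.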

Note: This will obviously cause one of your Fabric Interconnects to be unavailable while it reloads, so make sure you have a maintenance window and have verified your equipment is properly cabled and configured for failover to survive the reboot.

Note 2: It may still take some time for the fault to clear after the FI reboots from its e2fsck. This is normal. If it hasn’t cleared within a few days, open a TAC case, as you may have faulty bootflash.
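To check whether the fault has cleared in the meantime, you can list active faults from the UCS CLI and look for the bootflash entry (the exact fault wording may vary slightly between releases):

    UCS-A# show fault

Once the “Partition bootflash … is clean but with errors” line no longer appears in the output, the fault has cleared.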

Steve Sumichrast

Comments

  • Thank you. Will try this.
    Even new FIs coming straight from the box show these errors after the upgrade.

    • We haven't had it hit one of our new FIs yet, but definitely seeing it across our fleet. Very frustrating!

    • Hope it helps! Just had to run the procedure on another domain we upgraded. Sheesh this is annoying. :)

      • How much time does it take to complete the file system check and reboot for one FI?

        • The check only takes a few minutes. It may take a day or two for the error to clear from UCS manager.

