Cisco UCS

UCS FI bootflash clean but with errors

Last month we began upgrading all of our UCS domains to the newer 4.1 code release train to enable new functionality and hardware support, and to resolve some minor bugs. After we completed our first domain, each Fabric Interconnect raised a new fault approximately 45 minutes after its individual upgrade completed: “Partition bootflash on fabric interconnect X is clean but with errors.” As this happened to be the very first UCS domain we ever deployed, we assumed it might be an issue with the actual NVRAM in the FI. We triaged the case with TAC and determined that this was just enhanced file system checking in the 4.1 code train, and that the fix is to run an e2fsck against the bootflash. In previous versions of UCS this required the debug utility and manually running some commands. However, in 4.1(2a) and 4.0(4k) Cisco added the ability to run an e2fsck from the UCS CLI on the Fabric Interconnect. We figured this was a one-off isolated to this domain and didn’t think much of it.

However, we completed our second domain upgrade last night and, lo and behold, one of the two FIs raised the same fault. Now that we’ve encountered this on two of our domains (the second of which is one of our newer domains, and only one of its FIs raised the fault), I’m documenting the fix for future reference!

Cisco has a public bug report (CCO account required, though) documenting the release of the enhancement: https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvq17291

The process to run an e2fsck is as follows:

  1. Log in to the UCS CLI.
  2. Connect to the local-mgmt shell for the FI that has the fault.
    connect local-mgmt <a|b>
  3. Issue the reboot command with the e2fsck argument. This triggers the FI to reload and run an e2fsck at boot.
    reboot e2fsck
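Putting the steps above together, a full session looks roughly like the sketch below. The prompts are illustrative (your system name will differ), and the FI letter `a` is just an example; substitute the fabric interconnect that raised the fault. The CLI will also prompt you to confirm before rebooting.

    UCS-A# connect local-mgmt a
    UCS-A(local-mgmt)# reboot e2fsck

After confirming, the FI reloads, runs the file system check at boot, and rejoins the cluster.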

Note: This will obviously cause one of your Fabric Interconnects to be unavailable while it reloads, so make sure you have a maintenance window and have verified your equipment is properly cabled and configured for failover to survive the reboot.

Note 2: It may still take some time for the fault to clear after the FI reboots from its e2fsck. This is normal. If it hasn’t cleared within a few days, open a TAC case, as you may have faulty bootflash.
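To check whether the fault has cleared in the meantime, you can list active faults from the UCS CLI and look for the bootflash entry (the exact fault wording may vary slightly between releases):

    UCS-A# show fault

Once the “Partition bootflash … is clean but with errors” line no longer appears in the output, the fault has cleared.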

Steve Sumichrast

Comments

  • Thank you. Will try this.
    Even new FIs coming straight from the box show these errors after the upgrade.

    • We haven't had it hit one of our new FIs yet, but definitely seeing it across our fleet. Very frustrating!

    • Hope it helps! Just had to run the procedure on another domain we upgraded. Sheesh this is annoying. :)

      • How much time does it take to complete the file system check and reboot for one FI?

        • The check only takes a few minutes. It may take a day or two for the error to clear from UCS manager.

