HPE Warns Of SSD Storage Systems Failure Coming Soon
A counter in the firmware of certain HPE SSD storage systems may overflow in the coming days, causing data loss and system failure. HPE warns of this scenario and asks administrators of affected devices to update the firmware.
The firmware of the affected HPE SSDs maintains a counter for the operating hours. With the SSD firmware version installed on the devices, the drive fails and its data is lost after 32,768 operating hours. The data then has to be restored from backup. Using the drives in a fault-tolerant RAID mode (e.g. RAID 1 or RAID 5) does not protect against data loss if more drives fail than the logical drive can tolerate in that RAID mode, and a non-fault-tolerant mode such as RAID 0 offers no protection at all. Because drives put into service at the same time accumulate operating hours at the same rate, the SSDs in a RAID array are likely to reach the limit and fail almost simultaneously.
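The text above only says that a counter overflows at 32,768 hours. Since 32,768 is 2^15, one plausible reading is that the operating hours are held in a signed 16-bit integer. That is an assumption, not something HPE confirms here. Under that assumption, a minimal Python sketch of the wrap-around looks like this (the function name `as_int16` is hypothetical):

```python
INT16_BITS = 16  # assumed counter width; not confirmed by the advisory


def as_int16(hours: int) -> int:
    """Interpret an hour count as a signed 16-bit two's-complement value."""
    half = 1 << (INT16_BITS - 1)   # 32,768
    full = 1 << INT16_BITS         # 65,536
    return (hours + half) % full - half


# The last representable value is still positive ...
print(as_int16(32_767))
# ... but one more hour wraps the counter to a negative number.
print(as_int16(32_768))
```

Firmware that does not expect a negative hour count could plausibly misbehave at exactly this point, which would match the failure threshold quoted above.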
HPE is again warning of a problem with SSDs used in its server and storage systems. A bug in the SSD firmware can cause the drives to fail after 40,000 hours of operation. A firmware update is available.
I had already pointed out a similar issue in November 2019 in the article HPE warns of SSD storage systems failure coming soon. At that time the issue was a firmware counter that caused data loss after 32,768 hours of operation. Although the current failure resembles that case, it is not related to the SSD problem described last year.
In a support document, HPE now warns of a new failure scenario. Certain models of HPE SAS Solid State Drives require a critical firmware update to prevent a drive failure at 40,000 hours of operation. HPE has been notified by a Solid State Drive (SSD) manufacturer of a firmware defect affecting certain SAS SSD models.
The SSD models were built into a range of HPE server and storage products (HPE ProLiant, Synergy, Apollo 4200, Synergy storage modules, D3000 storage enclosures, StoreEasy 1000 storage). The problem affects SSDs with an HPE firmware version prior to HPD7. These drives fail at 40,000 hours of operation (4 years, 206 days, 16 hours).
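The conversion of operating hours into years, days and hours can be checked with a few lines of Python. This is a minimal sketch that ignores leap days; the names `split_hours` and `hours_remaining` are hypothetical helpers, and the power-on hour value in the usage line is made up for illustration.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365       # simplification: leap days are ignored
FAILURE_HOURS = 40_000    # threshold quoted in the HPE advisory


def split_hours(hours: int) -> tuple[int, int, int]:
    """Break an hour count into (years, days, hours)."""
    days, rem_hours = divmod(hours, HOURS_PER_DAY)
    years, rem_days = divmod(days, DAYS_PER_YEAR)
    return years, rem_days, rem_hours


def hours_remaining(power_on_hours: int, limit: int = FAILURE_HOURS) -> int:
    """Hours left before a drive reaches the limit (0 if already past it)."""
    return max(limit - power_on_hours, 0)


print(split_hours(FAILURE_HOURS))   # (4, 206, 16)
print(hours_remaining(39_000))      # 1000
```

Applied to a drive's reported power-on hours, `hours_remaining` gives a rough idea of how urgently the firmware update is needed.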
We recently replaced a failed drive in a Smart Array P420i and after a day of work, the drive is showing a warning: Physical Drive State: Predictive failure. This physical drive is predicted to fail soon.
Unlike in HDDs, there are no physical moving platters in SSDs, so they're immune to old hard disk issues. However, while the storage component itself isn't susceptible to mechanical failure, other components are.
You may find that your system reports that a S.M.A.R.T. error has occurred on the hard drive. S.M.A.R.T. errors are a near-term prediction of drive failure. It is important to realize that the drive may appear to be functioning normally. Even some diagnostic tests could still have a PASS status. A S.M.A.R.T. error is a prediction that the diagnostic test will soon fail.
A solid-state storage device is not usually a component of HP 3000 configurations. However, with the onset of virtualized MPE servers, drives that have no moving parts but still store data have entered the picture, and some of them are heading for absolute failure. HP is warning customers.
For some, HPD7 firmware is a critical fix. HPE says that Western Digital told the vendor about failures in certain Serial Attached SCSI (SAS) models inside HPE server and storage products. Some SAS SSD drives can use external connections to HPE's VMS Itanium servers.
Knowing how to spot the signs of an imminent SSD failure, as well as understanding how to troubleshoot a malfunctioning SSD, can mark the difference between permanent data loss and a trouble-free recovery. Like any storage device, an NVMe SSD will eventually fail; the only variable is when. Unlike hard drives, SSDs can't send an audible warning that something may be going wrong. Yet, while the SSD may be dead, all is not necessarily lost.
SSD problems usually don't become apparent until they begin causing major trouble. The sooner you know there's a problem, the faster you can respond to the situation and minimize the impact. "Make sure you use hardware monitoring software to track ... components for I/O speed, bad blocks and other failure modes so you know as soon as possible [when] something is going south," Adato said.
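As a rough illustration of the kind of threshold check such monitoring software performs, the following Python sketch flags drives whose bad-block count or read throughput crosses a limit. Everything in it is hypothetical: the `DriveSample` record, the threshold values and the sample data are assumptions for illustration, not the output or API of any real monitoring tool.

```python
from dataclasses import dataclass


@dataclass
class DriveSample:
    """One hypothetical monitoring reading for a drive."""
    name: str
    bad_blocks: int
    read_mb_per_s: float


# Illustrative thresholds only; real limits depend on the drive model.
MAX_BAD_BLOCKS = 10
MIN_READ_MB_PER_S = 100.0


def find_alerts(samples: list[DriveSample]) -> list[str]:
    """Return one alert message per threshold violation."""
    alerts = []
    for s in samples:
        if s.bad_blocks > MAX_BAD_BLOCKS:
            alerts.append(f"{s.name}: {s.bad_blocks} bad blocks")
        if s.read_mb_per_s < MIN_READ_MB_PER_S:
            alerts.append(f"{s.name}: read speed {s.read_mb_per_s} MB/s")
    return alerts


samples = [
    DriveSample("ssd0", bad_blocks=2, read_mb_per_s=480.0),
    DriveSample("ssd1", bad_blocks=25, read_mb_per_s=60.0),
]
for message in find_alerts(samples):
    print(message)
```

Running such a check on a schedule, rather than only after a failure, is what gives the early warning the quote asks for.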
In the announcement for the Windows 10 Insider Preview Build 20226, Brandon LeBlanc, Senior Program Manager at Microsoft, says this new drive health monitoring feature "is designed to detect hardware abnormalities for NVMe SSDs." When such an abnormality is detected, a notification will appear on the desktop stating, "A storage device may be at risk of failure and requires your attention." There's also a clickable link to load up Windows 10's drive management and backup options, which also provides more detail on why Windows sent the notification.
The Detached operational status can occur if the dirty region tracking (DRT) log is full. Storage Spaces uses DRT for mirrored spaces so that, when a power failure occurs, any in-flight updates to metadata are logged. When power is restored and the system comes back up, the storage space can then redo or undo those operations to return to a resilient and consistent state. If the DRT log is full, the virtual disk can't be brought online until the DRT metadata is synchronized and flushed. This process requires running a full scan, which can take several hours to finish.
If you hear strange sounds emanating from your computer, hard drive failure may be imminent. Clicking or grinding noises coming from the drive are signals that the read/write head or another mechanical component is on its last legs. Your data may be accessible, for now. It is highly recommended to take immediate action to either back up the data or recover it to another hard drive.
When your computer says hard disk failure imminent, it means your hard drive right now is not yet dead, but will be sooner or later. It's just a matter of time. It could be in 1 hour, could be in a week. Who knows?