ASM Health Checker Found 1 New Failures Updated
When you see "ASM Health Checker found 1 new failures updated" in the ASM alert log, follow this systematic diagnostic procedure.
If you are an Oracle Database Administrator (DBA) managing an Oracle Real Application Clusters (RAC) environment, you have likely encountered a cryptic but critical message in your alert logs or monitoring console: "ASM Health Checker found 1 new failures updated."
At first glance, this message can induce panic. Does it mean data loss? Is your disk group about to crash? Will your production database go offline? Fortunately, in most cases, this alert is a proactive warning from Oracle’s Automatic Storage Management (ASM) diagnostics framework. However, ignoring it can lead to severe performance degradation or service interruption.
This guide explains what the ASM Health Checker is, why it raises this alert, how to diagnose the specific failure, and how to remediate it step by step.
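As a first diagnostic step, it helps to locate every occurrence of the message in the ASM alert log so it can be correlated with nearby entries. The following is a minimal sketch, not an Oracle-supplied tool: the function name `find_asm_hc_failures` is hypothetical, and the log path must be supplied by the caller because its location depends on your diagnostic destination.

```shell
# Sketch: list ASM Health Checker failure messages in an alert log.
# find_asm_hc_failures is a hypothetical helper name, not an Oracle utility.
# Usage: find_asm_hc_failures /path/to/alert_log

find_asm_hc_failures() {
    hc_logfile="$1"
    if [ ! -r "$hc_logfile" ]; then
        echo "cannot read: $hc_logfile" >&2
        return 2
    fi
    # -i: ignore case, -n: prefix each match with its line number,
    # [0-9]+ matches any failure count, not only 1.
    grep -inE 'ASM Health Checker found [0-9]+ new failures' "$hc_logfile"
}
```

The line numbers in the output make it easier to open the log at the right place and read the entries written just before and after the message, which usually name the affected disk or disk group.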
Introduction
The message "ASM health checker found 1 new failures updated" signals that a monitoring component (an ASM health checker) has detected and recorded a newly identified failure in a system. This brief notification reflects three operational realities: a problem was detected, the system's state changed, and a response is needed. It invites examination of its technical meaning, potential causes, implications, and recommended actions.
What the message means
Possible contexts and specific interpretations
Likely root causes (examples)
Operational impacts
Recommended immediate steps (triage checklist)
Longer-term remediation and prevention
Communicating about the incident
Conclusion
The single-line notice "ASM health checker found 1 new failures updated" is a prompt to investigate. While one new failure may be harmless in a fault-tolerant system, it can also be the first sign of worsening conditions. Rapid, evidence-based triage followed by durable fixes and improved monitoring reduces risk and operational burden.
In multipath environments (e.g., DM-Multipath on Linux, PowerPath on AIX), a loss of one path to a disk does not immediately offline the disk. However, the ASM Health Checker detects increased I/O latency or path errors and reports a new failure, even if the disk remains online.
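On Linux with DM-Multipath, a quick way to confirm a lost path is to count path lines reported as failed in the output of `multipath -ll`. The sketch below assumes the usual output format, in which each path line ends with state words such as `active ready running` or `failed faulty running`. The function name `count_failed_paths` is hypothetical, and other multipath products such as PowerPath use their own tools and output formats.

```shell
# Sketch: count failed paths in `multipath -ll` output read from stdin.
# count_failed_paths is a hypothetical helper name.
# Assumption: failed paths carry the word "failed" or "faulty" as a
# whitespace-separated state field on their line.

count_failed_paths() {
    # -c prints the number of matching lines; `|| true` keeps the exit
    # status at 0 when there are no failed paths (grep exits 1 on no match).
    grep -cE '[[:space:]](failed|faulty)([[:space:]]|$)' || true
}
```

A typical invocation, run as root on the database host, would be `multipath -ll | count_failed_paths`. A non-zero result while the ASM disk is still online is consistent with the partial path loss described above.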
asmcmd lsdg
Or from SQL:
SELECT name, state, type, total_mb, free_mb
FROM v$asm_diskgroup;
This is severe. Run a manual check:
ALTER DISKGROUP data CHECK ALL;
If errors are found, you may need to:
Note: Do not attempt ALTER DISKGROUP ... CHECK REPAIR unless you fully understand the implications.
Scenario A: Transient Failure
If the underlying issue was a temporary glitch (e.g., a loose fiber cable or a brief network blip), the disk may still be recoverable. Once the OS can see the disk again, you may be able to bring it back online:
ALTER DISKGROUP <diskgroup_name> ONLINE DISK <disk_name>;
This starts a resync that brings the disk up to date with the changes made while it was offline.
Scenario B: Permanent Hardware Failure
If the disk has physically failed, you must replace it at the hardware level.