Kernel Panic: What to Do When the System Hangs?
What is a Kernel Panic?
A Kernel Panic is one of the most serious errors that can occur in a Linux operating system. It’s a situation where the system’s kernel encounters a critical error from which it cannot recover. As a result, the system halts its operations to prevent further data corruption. Typically, a detailed error message is displayed on the screen, and the system becomes unresponsive.
Although a Kernel Panic may look intimidating, it is a defense mechanism. But what should the system do after a panic occurs? This is where the kernel.panic parameter comes in.
What is kernel.panic for?
kernel.panic is a kernel parameter, set through sysctl, that lets an administrator define how the operating system behaves after a Kernel Panic occurs. With the default value of 0, the system simply “freezes” and waits for a manual reboot. In server environments, especially those without direct physical access, this behavior is undesirable.
Thanks to kernel.panic, we can command the system to automatically reboot after a specified amount of time, which is crucial for ensuring high availability of services.
Possible Values and Their Meanings
The value of the kernel.panic parameter is an integer that specifies the number of seconds after which the system should automatically reboot following a panic.
You can check and set these values using the sysctl command:
# Check the current value
sysctl kernel.panic
# Set a new value (e.g., reboot after 10 seconds)
sudo sysctl -w kernel.panic=10
To make the change permanent, you should add the corresponding entry to the /etc/sysctl.conf file or a file in /etc/sysctl.d/:
kernel.panic = 10
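As a minimal sketch of the persistent setup described above, the helper below writes the entry into a drop-in file. The function name write_panic_conf and the file name 90-kernel-panic.conf are hypothetical choices made for this illustration; any file name ending in .conf under /etc/sysctl.d/ should work the same way.

```shell
#!/bin/sh
# Sketch: persist kernel.panic in a sysctl.d drop-in file.
# The function name and the file name 90-kernel-panic.conf are
# hypothetical, chosen only for this example.

# write_panic_conf DIR SECONDS
# Writes "kernel.panic = SECONDS" to DIR/90-kernel-panic.conf.
write_panic_conf() {
    dir=$1
    seconds=$2
    # Accept only integers, optionally with a leading minus sign.
    case $seconds in
        ''|-|*[!0-9-]*|?*-*)
            echo "invalid value: $seconds" >&2
            return 1
            ;;
    esac
    printf 'kernel.panic = %s\n' "$seconds" > "$dir/90-kernel-panic.conf"
}

# Typical use on a real system (run as root):
#   write_panic_conf /etc/sysctl.d 10 && sysctl --system
```

Running sysctl --system afterwards reloads all sysctl configuration files, so the new value takes effect without a reboot. Passing the target directory as an argument makes it possible to try the helper against a scratch directory first.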
Here is what the different values mean:
- kernel.panic = 0 (default value)
  - Behavior: The system does not reboot automatically. After a Kernel Panic, the system halts and waits for administrator intervention (a manual restart). This is useful for diagnostic purposes, as it allows for an examination of the error message on the screen or via a serial console.
- kernel.panic > 0 (a positive value, e.g., 10)
  - Behavior: The system will automatically reboot after the specified number of seconds. For example, if the value is 10, the system will wait 10 seconds and then restart. This is a common choice on production servers, where a quick return to operation is a priority.
- kernel.panic < 0 (a negative value, e.g., -1)
  - Behavior: The system will reboot immediately, without any delay. This option suits highly critical systems where every second of downtime matters and nobody needs time to read the panic message on the console before the reboot.
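The three cases above can be summarized in a small shell helper. This is only an illustrative sketch: the function name describe_panic_timeout is hypothetical, and it simply maps a value to the behavior described in the list rather than querying the kernel itself.

```shell
#!/bin/sh
# Sketch: translate a kernel.panic value into the behavior it selects.
# describe_panic_timeout is a hypothetical helper name.

# describe_panic_timeout VALUE
describe_panic_timeout() {
    value=$1
    if [ "$value" -eq 0 ]; then
        echo "halt and wait for a manual restart"
    elif [ "$value" -gt 0 ]; then
        echo "reboot after $value seconds"
    else
        echo "reboot immediately"
    fi
}

# To describe the running system, pass the current setting
# (/proc/sys/kernel/panic holds the same value as sysctl kernel.panic):
#   describe_panic_timeout "$(cat /proc/sys/kernel/panic)"
```

A check like this can be handy in a provisioning or audit script that verifies servers are configured to recover on their own.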
Summary
The kernel.panic parameter is a simple yet powerful tool in a Linux administrator’s arsenal. It automates the response to one of the most severe system errors, so a server can return to service without waiting for someone to restart it manually. On production servers, setting it to a small positive value (e.g., 5 or 10 seconds) is a common practice.