Kdump — How to Capture and Analyze Linux Kernel Crashes

Introduction to Kdump

In the previous article, we discussed how to configure the system to automatically reboot after a Kernel Panic using the kernel.panic parameter. But what if we want to understand why the panic occurred? Simply restarting the system solves the availability problem but doesn’t help diagnose the cause. This is where kdump comes in.

kdump is an advanced mechanism in the Linux kernel that allows capturing the contents of system memory (a memory dump or crash dump) at the moment a Kernel Panic occurs. This dump can then be analyzed using specialized tools, such as crash, to identify a faulty driver, a bug in the kernel code, or another cause of the failure.

How Does Kdump Work?

The operation of kdump is based on the concept of two kernels:

- System Kernel: This is the main kernel that your operating system runs on.
- Capture Kernel: This is a second, minimalistic kernel that is loaded into a specially reserved area of RAM during system startup. It remains inactive during normal operation.

When a Kernel Panic occurs in the system kernel, the kexec mechanism takes over. kexec is a kernel feature that allows loading and running a new kernel directly from memory, bypassing the BIOS/UEFI stage. In the case of kdump, kexec starts the capture kernel.

This second kernel has one job: to collect the contents of the RAM (which contains information about the system’s state at the time of the crash) and save it as a dump file (usually in /var/crash/). The capture kernel sees the crashed kernel's memory through the /proc/vmcore file. Since the system kernel has panicked and can no longer be trusted to write to disk, we need a second, independent environment to safely save the data.
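
The reserved area described above can be observed on a running system. The following is a minimal sketch, assuming the kernel exposes the reservation size in bytes through /sys/kernel/kexec_crash_size; the function name crash_reservation_mib is illustrative and not part of any kdump tool.

```shell
#!/bin/sh
# Print the size of the capture-kernel reservation in MiB.
# The sysfs path is a parameter so the function can be tried against a test file.
crash_reservation_mib() {
    size_file="${1:-/sys/kernel/kexec_crash_size}"
    if [ ! -r "$size_file" ]; then
        # File missing or unreadable: kexec support may not be built in.
        echo "unknown"
        return 0
    fi
    bytes=$(cat "$size_file")
    echo "$((bytes / 1024 / 1024))"
}

echo "Reserved for the capture kernel (MiB): $(crash_reservation_mib)"
```

A result of 0 would suggest that no memory has been reserved yet, which is expected before the crashkernel parameter is configured.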

Basic Configuration

The configuration of kdump varies depending on the distribution, but the general steps are similar.

Install the tools: On Red Hat-based systems (CentOS, Fedora, RHEL):
```
sudo dnf install kexec-tools
```
On Debian-based systems (Ubuntu):
```
sudo apt-get install kdump-tools
```
Reserve memory for the capture kernel: You need to reserve a fixed area of RAM that will be unavailable to the main kernel. This is done by adding the crashkernel parameter to the kernel command line in the bootloader configuration (e.g., GRUB).

Example in /etc/default/grub:
```
GRUB_CMDLINE_LINUX="... crashkernel=256M"
```
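The value 256M is a common starting point, but the right size depends on the amount of RAM in the system and on the drivers the capture kernel has to load. If the reservation is too small, the capture kernel may fail to boot or run out of memory before the dump is written. After the change, update the GRUB configuration using the appropriate command for your distribution: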
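- For Red Hat-based systems (Fedora, CentOS, RHEL). The path below is the common default; some UEFI installations keep grub.cfg elsewhere, and releases that manage boot entries with grubby may need the parameter added there instead: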
```
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
```
- For Debian-based systems (Ubuntu, Debian):
```
sudo update-grub
```
IMPORTANT: After updating GRUB, a system reboot is required for the changes to take effect.
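
After the reboot, it is worth confirming that the parameter actually reached the running kernel. The sketch below reads /proc/cmdline and extracts the crashkernel value; the helper name crashkernel_param is hypothetical. If the parameter appears more than once, the last occurrence is printed.

```shell
#!/bin/sh
# Extract the crashkernel= value from a kernel command line string.
# Prints nothing if the parameter is absent.
crashkernel_param() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^crashkernel=//p' | tail -n 1
}

value=$(crashkernel_param "$(cat /proc/cmdline 2>/dev/null)")
if [ -n "$value" ]; then
    echo "crashkernel=$value is active on the running kernel"
else
    echo "no crashkernel parameter found; check the bootloader configuration"
fi
```
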
Enable and start the kdump service:

The service name also differs between distributions.
- For Red Hat-based systems:
```
sudo systemctl enable kdump.service
sudo systemctl start kdump.service
```
- For Debian-based systems: The service is often called kdump-tools. After installing the package and rebooting, it should already be enabled. You can check its status with:
```
sudo systemctl status kdump-tools.service
```
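
An active service does not by itself prove that the capture kernel was loaded. As a rough check, assuming the kernel exposes /sys/kernel/kexec_crash_loaded (which reads 1 when a crash kernel is loaded), something like the following can be used; the function name capture_kernel_loaded is illustrative.

```shell
#!/bin/sh
# Succeed (exit status 0) only if a capture kernel is currently loaded.
# The flag file is a parameter so the check can be exercised on a test file.
capture_kernel_loaded() {
    flag_file="${1:-/sys/kernel/kexec_crash_loaded}"
    [ -r "$flag_file" ] && [ "$(cat "$flag_file")" = "1" ]
}

if capture_kernel_loaded; then
    echo "capture kernel is loaded; kdump is armed"
else
    echo "capture kernel is NOT loaded; check the kdump service logs"
fi
```
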
Test the configuration: To ensure kdump is working, you can intentionally trigger a Kernel Panic:
```
# WARNING: This command will immediately crash your system!
# Use it only in a test environment.
echo c | sudo tee /proc/sysrq-trigger
```
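After the system reboots, a new timestamped subdirectory containing the memory dump should appear under /var/crash/. On Red Hat-based systems the dump file is named vmcore; Debian-based systems typically name it dump.<timestamp> instead.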
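
To locate the newest dump after a test, a small helper such as the one below may be handy. It is a sketch that assumes one subdirectory per crash under /var/crash/ and a dump file named either vmcore or dump.<something>; the function name latest_vmcore is hypothetical.

```shell
#!/bin/sh
# Print the most recently modified dump file under a crash directory.
# Prints nothing if no dump is found.
latest_vmcore() {
    crash_dir="${1:-/var/crash}"
    # ls -t sorts by modification time, newest first; unmatched patterns
    # produce an error message that is discarded.
    ls -t "$crash_dir"/*/vmcore "$crash_dir"/*/dump.* 2>/dev/null | head -n 1
}

core=$(latest_vmcore)
if [ -n "$core" ]; then
    echo "newest dump: $core"
else
    echo "no dump found under /var/crash (root may be needed to read it)"
fi
```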

Analyzing the Dump

The crash utility is most commonly used to analyze the vmcore file. It also requires access to the kernel’s debugging symbols, which must be installed. The package names for these symbols differ between distributions.

For Red Hat-based systems (e.g., Fedora):

```
sudo dnf install crash kernel-debuginfo
```

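For Debian-based systems (e.g., Ubuntu): Install crash together with the debug symbols that match the kernel. On Ubuntu the symbols package is typically named linux-image-$(uname -r)-dbgsym and is published in the separate ddebs.ubuntu.com repository, which has to be enabled first; on Debian the equivalent is usually linux-image-$(uname -r)-dbg. These packages typically place the vmlinux file under /usr/lib/debug/boot/ rather than under the Red Hat path used in the example below.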

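After installing the appropriate packages, you can start the analysis by providing the path to the vmcore file and the vmlinux file with debugging symbols as arguments. The vmlinux file must come from the same kernel build that crashed; $(uname -r) in the example is only correct if the system rebooted into that same kernel version.

```
# Example of starting the analysis (the dump directory name is illustrative)
sudo crash /var/crash/127.0.0.1-2025-12-21-21:30:00/vmcore /usr/lib/debug/lib/modules/$(uname -r)/vmlinux
```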

The crash utility provides an interactive shell where you can execute commands like bt (backtrace, to see the call stack), log (to view kernel logs), or ps (to see processes running at the time of the crash).
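
The same commands can also be run non-interactively, which helps when a report has to be attached to a bug ticket. The sketch below writes bt, log, and ps to a command file and passes it to crash with -i (read commands from a file) and -s (silent startup). The VMLINUX and VMCORE variables and the crash-report.txt output name are assumptions for illustration; set them to the real paths on your system.

```shell
#!/bin/sh
# Write a small crash command file: backtrace, kernel log, process list,
# then exit so that crash terminates instead of waiting for input.
write_crash_commands() {
    printf '%s\n' bt log ps exit > "$1"
}

cmd_file=$(mktemp)
write_crash_commands "$cmd_file"

# Run crash only when both paths are provided and the tool is installed.
if [ -n "${VMLINUX:-}" ] && [ -n "${VMCORE:-}" ] && command -v crash >/dev/null 2>&1; then
    crash -s -i "$cmd_file" "$VMLINUX" "$VMCORE" > crash-report.txt
    echo "report written to crash-report.txt"
else
    echo "set VMLINUX and VMCORE and install crash to generate a report"
fi
rm -f "$cmd_file"
```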

Summary

kdump is an essential tool for advanced diagnostics of Linux kernel problems. Although its configuration requires some effort, the ability to analyze the system’s state after a crash is invaluable for solving complex and hard-to-reproduce issues. The two mechanisms complement each other: kernel.panic restores availability by rebooting the system, while kdump preserves the evidence needed to find the cause.