
OOM Killer for embedded Linux systems

What is the OOM killer?

It is the mechanism Linux 2.6 uses to recover memory when the system runs out of it. The kernel does this by killing a process and reclaiming its memory.

When no memory is available, the memory management system calls a routine, oom_kill(), to choose and kill a process.

A second routine, badness(), is called from select_bad_process(), itself called from out_of_memory() in oom_kill.c, to score how suitable each process is for killing.
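
The selection step of that call chain can be modelled in user space as follows. This is a simplified sketch and not the kernel code: struct task_model, model_badness() and model_select_bad_process() are hypothetical names, and the score used here is only the memory footprint, whereas the real badness() also weighs other factors such as run time.

```c
#include <stddef.h>

/* Hypothetical user-space stand-in for what the OOM killer inspects. */
struct task_model {
    int pid;
    unsigned long total_vm;   /* memory footprint, in pages */
};

/* Simplified score: a bigger footprint makes a better candidate. */
unsigned long model_badness(const struct task_model *t)
{
    return t->total_vm;
}

/* Return the highest-scoring task, or NULL if no task scores above zero. */
const struct task_model *
model_select_bad_process(const struct task_model *tasks, size_t n)
{
    const struct task_model *chosen = NULL;
    unsigned long max_points = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned long points = model_badness(&tasks[i]);

        if (points > max_points) {
            max_points = points;
            chosen = &tasks[i];
        }
    }
    return chosen;
}
```

The process returned by the selection step is the one that would then be killed.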

Problem with embedded systems

On embedded systems, as on real-time systems, we want to be deterministic and may want to implement some kind of degraded mode or reconfiguration.

The current implementation of the OOM killer lets you reduce the badness of a process by setting its oomkilladj value.
This is already how most servers and workstations protect some processes from being killed by the OOM killer.
But it is difficult to use deterministically, for example to implement groups of processes with predefined protection levels, because in the current implementation the badness of a process is a function of time and changes during the life of the system.
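
A small user-space model may help show why the adjustment alone is hard to reason about. It assumes the adjustment is applied as a bit shift on the score, which is how 2.6-era versions of mm/oom_kill.c appear to do it; model_adjust_points() is a hypothetical name and the raw score is supplied by the caller.

```c
/*
 * Scale a raw badness score by an oomkilladj-style value.
 * Assumption: the value lies in the range -16..15 and acts as a shift count.
 */
unsigned long model_adjust_points(unsigned long points, int oomkilladj)
{
    if (oomkilladj > 0) {
        if (points == 0)
            points = 1;          /* give the left shift something to raise */
        points <<= oomkilladj;
    } else if (oomkilladj < 0) {
        points >>= -oomkilladj;  /* each step halves the score */
    }
    return points;
}
```

Because the shift only scales a raw score that itself drifts with run time and memory use, the same oomkilladj value does not give a process a fixed position relative to the others.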

Proposed solution for embedded systems

The solution I propose is to add a new /proc/pid/oom_ranking value which can be modified by a reconfiguration process.
The select_bad_process() function tests whether the oom_ranking of a process is below a defined threshold and, if so, protects that process.
It is simple and deterministic. The OOM killer algorithm then chooses among the remaining unprotected processes.
Although a process can already be protected by setting its oomkilladj variable to OOM_DISABLE, the advantage of this implementation is that a reconfiguration process can easily assign different rankings to different groups of processes and adjust the threshold as the system evolves.
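
As an illustration of how a reconfiguration process might assign a ranking to a group of processes, here is a minimal sketch. It assumes that the oom_ranking file accepts a decimal integer written as text, which should be checked against the patch; the helper names and the proc_root parameter (normally "/proc") are hypothetical.

```c
#include <stdio.h>
#include <sys/types.h>

/* Build "<proc_root>/<pid>/oom_ranking" into buf; returns 0 on success. */
int oom_ranking_path(char *buf, size_t len, const char *proc_root, pid_t pid)
{
    int n = snprintf(buf, len, "%s/%ld/oom_ranking", proc_root, (long)pid);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write one ranking value; returns 0 on success, -1 on any error. */
int set_oom_ranking(const char *proc_root, pid_t pid, int ranking)
{
    char path[256];
    FILE *f;
    int ok;

    if (oom_ranking_path(path, sizeof path, proc_root, pid) != 0)
        return -1;
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    ok = fprintf(f, "%d\n", ranking) > 0;   /* assumed format: decimal text */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Give every process of a group the same ranking; returns the failure count. */
int set_group_ranking(const char *proc_root, const pid_t *pids, size_t n,
                      int ranking)
{
    int failures = 0;

    for (size_t i = 0; i < n; i++)
        if (set_oom_ranking(proc_root, pids[i], ranking) != 0)
            failures++;
    return failures;
}
```

A reconfiguration process would call set_group_ranking("/proc", pids, n, ranking) once per group, then move the threshold to decide which groups are protected.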

The implementation is enabled by CONFIG_OOM_EMBEDDED. The kernel hook is installed at the beginning of select_bad_process().

I implemented the change with an overall ranking threshold, oom_rank_threshold: the minimum oom_ranking value a process must have before it can be killed. The test that protects a process is:

if(p->oom_ranking < oom_rank_threshold)

The difference from using (p->oomkilladj == OOM_DISABLE) lies in degraded-mode management and reconfiguration.

One can define different kinds of processes:

  • Unkillable: oomkilladj == OOM_DISABLE
  • Protected: oom_ranking < oom_rank_threshold
  • Eligible: all others; the one ranked worst by badness() is killed
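
The three categories can be sketched as a user-space model. The enum, struct and function names below are hypothetical, oom_ranking is assumed to be an int, and the points field stands in for the result of badness(); OOM_DISABLE is given the value -17 used by 2.6 kernels.

```c
#include <stddef.h>

#define OOM_DISABLE (-17)   /* oomkilladj value that exempts a task */

enum oom_class { OOM_UNKILLABLE, OOM_PROTECTED, OOM_ELIGIBLE };

struct ranked_task {
    int pid;
    int oomkilladj;
    int oom_ranking;        /* per-process value added by the patch */
    unsigned long points;   /* stand-in for the result of badness() */
};

/* Sort a task into one of the three categories. */
enum oom_class classify_task(const struct ranked_task *t,
                             int oom_rank_threshold)
{
    if (t->oomkilladj == OOM_DISABLE)
        return OOM_UNKILLABLE;
    if (t->oom_ranking < oom_rank_threshold)
        return OOM_PROTECTED;
    return OOM_ELIGIBLE;
}

/* Pick the eligible task with the highest score, or NULL if there is none. */
const struct ranked_task *pick_victim(const struct ranked_task *tasks,
                                      size_t n, int oom_rank_threshold)
{
    const struct ranked_task *chosen = NULL;

    for (size_t i = 0; i < n; i++) {
        if (classify_task(&tasks[i], oom_rank_threshold) != OOM_ELIGIBLE)
            continue;
        if (chosen == NULL || tasks[i].points > chosen->points)
            chosen = &tasks[i];
    }
    return chosen;
}
```

Raising oom_rank_threshold moves more processes into the protected category without touching their individual rankings, which is what makes group-wise reconfiguration simple.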

Another variable, oom_reconfigure_wanted, is incremented each time the OOM killer is invoked.
This lets a reconfiguration management task take the appropriate decision to reconfigure the system into a degraded mode.
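
A reconfiguration management task could use the counter along these lines. How oom_reconfigure_wanted is exposed to user space is not described here, so this sketch assumes the task obtains it as decimal text from some interface; all function and enum names are hypothetical.

```c
#include <errno.h>
#include <stdlib.h>

enum reconf_action { RECONF_NONE, RECONF_DEGRADE };

/* Parse the counter from decimal text; returns 0 on success, -1 on error. */
int parse_oom_counter(const char *text, unsigned long *value)
{
    char *end;
    unsigned long v;

    errno = 0;
    v = strtoul(text, &end, 10);
    if (errno != 0 || end == text)
        return -1;
    *value = v;
    return 0;
}

/* OOM killer invocations since the last poll (unsigned, so wrap-safe). */
unsigned long oom_events_since(unsigned long last_seen, unsigned long current)
{
    return current - last_seen;
}

/* Switch to degraded mode as soon as the counter has moved. */
enum reconf_action decide_reconfiguration(unsigned long last_seen,
                                          unsigned long current)
{
    return oom_events_since(last_seen, current) > 0 ? RECONF_DEGRADE
                                                    : RECONF_NONE;
}
```

The task remembers the last value it saw and compares it with the current one at each poll; the policy shown (degrade on any new invocation) is only one possible choice.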

Another change, enabled by CONFIG_OOM_EMBEDDED_REBOOT, replaces the call to panic() made when no memory can be recovered with a call to reboot.

If you do not use this option, you can set panic_timeout to reboot the system after a panic (for example with the panic= boot parameter or /proc/sys/kernel/panic); this lets you reboot more cleanly (if possible) and analyse the crash using the on-screen information or a crash handler.
The problem is that it takes at least one second before rebooting and may also hang the system, which we cannot accept in embedded systems.

Where to find the patch?

You can find the patch here: /downloads/oomkiller_embeddedsys.patch

You can browse the modified code here:

©M.N.I.S