Run dpkg --configure -a before apt operations to recover from any previously interrupted package installations.
Runaway Process Killer
Automatic protection against runaway CPU and RAM processes on Ubuntu servers. Prevents hosting provider throttling by killing processes that pin CPU to 95%+ for extended periods or exhaust available memory.
Overview
| Protection | Tool | Trigger | Action |
|---|---|---|---|
| CPU | monit | 95%+ CPU for 5 minutes | Kill all processes matching top CPU consumer |
| RAM | earlyoom | <5% free memory | Kill highest memory consumer |
| Orphan Claude | monit | Every 15 minutes | Kill Claude processes with no TTY or PPID=1 |
Requirements
- Ubuntu 20.04+ (tested on 24.04 LTS)
- Root access
- ~6MB RAM overhead total
Quick Install
curl -fsSL https://git.upfrontops.cloud/upfrontops/runaway-process-killer/raw/branch/main/install.sh | sudo bash
Or clone and run:
git clone https://git.upfrontops.cloud/upfrontops/runaway-process-killer.git
cd runaway-process-killer
sudo ./install.sh
Manual Installation
1. Install earlyoom (RAM Protection)
sudo apt update && sudo apt install -y earlyoom
Edit /etc/default/earlyoom:
EARLYOOM_ARGS="-m 5 -s 5 --avoid '(^|/)(init|systemd|sshd)$' -r 60"
Enable and start:
sudo systemctl enable earlyoom && sudo systemctl restart earlyoom
2. Install monit (CPU Protection)
sudo apt install -y monit
Edit /etc/monit/monitrc, change daemon interval:
set daemon 60
Enable HTTP interface (required for monit status):
set httpd port 2812 and
use address localhost
allow localhost
allow admin:monit
Create /etc/monit/conf.d/cpu-killer:
check system $HOST
if cpu usage > 95% for 5 cycles then exec "/usr/local/bin/kill-top-cpu.sh"
Create /usr/local/bin/kill-top-cpu.sh:
#!/bin/bash
# Kill every process that shares a command name with the top CPU consumer
# (excluding protected and transient processes). -w matches whole words only,
# so a protected name such as "ps" does not also exclude e.g. "cupsd".
TOP_LINE=$(ps -eo pid,comm,%cpu --sort=-%cpu | grep -v -w -E '(PID|systemd|sshd|monit|earlyoom|bash|ps|awk|grep|head)' | head -1)
TARGET_PID=$(echo "$TOP_LINE" | awk '{print $1}')
COMM=$(echo "$TOP_LINE" | awk '{print $2}')
if [ -n "$TARGET_PID" ] && [ -n "$COMM" ]; then
logger "monit cpu-killer: Killing all '$COMM' processes (detected high CPU on PID $TARGET_PID)"
# Kill all processes with this command name
pkill -9 -x "$COMM"
fi
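For a dry run that shows which process the script would target without killing anything, the selection step can be sketched as a function. The function name `pick_top_cpu` and the `PROTECTED` variable are illustrative assumptions, not part of the repository; this variant compares command names exactly rather than with grep.

```bash
#!/bin/bash
# Hypothetical dry-run helper: reads "PID COMM %CPU" lines (already sorted by
# CPU, highest first) on stdin and prints the first unprotected "PID COMM".
PROTECTED="systemd sshd monit earlyoom bash ps awk grep head"

pick_top_cpu() {
    awk -v protected="$PROTECTED" '
        BEGIN { n = split(protected, names, " "); for (i = 1; i <= n; i++) skip[names[i]] = 1 }
        $1 == "PID" { next }                  # skip the ps header line, if present
        !($2 in skip) { print $1, $2; exit }  # first name that is not protected
    '
}

# Usage on a live system (prints the candidate, kills nothing):
#   ps -eo pid,comm,%cpu --sort=-%cpu | pick_top_cpu
```

Because the comparison is on the whole command name, a process such as cupsd is not skipped just because its name contains "ps".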
Make executable and enable:
sudo chmod +x /usr/local/bin/kill-top-cpu.sh
sudo systemctl enable monit && sudo systemctl restart monit
Configuration
CPU Threshold Timing
Edit /etc/monit/conf.d/cpu-killer to change timing:
| Cycles | Time (at 60s interval) |
|---|---|
| 2 | 2 minutes |
| 5 | 5 minutes (default) |
| 10 | 10 minutes |
| 30 | 30 minutes |
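The table follows from cycles multiplied by the daemon interval. As a small sketch (the helper name `cycles_for` is an assumption for illustration), the cycle count for a desired delay can be computed like this:

```bash
# Hypothetical helper: number of monit cycles needed to cover a delay.
# $1 = desired delay in seconds, $2 = daemon interval in seconds (set daemon N)
cycles_for() {
    local delay=$1 interval=$2
    # Round up so the delay is never shorter than requested.
    echo $(( (delay + interval - 1) / interval ))
}

cycles_for 300 60    # 5 minutes at a 60s interval, prints 5
cycles_for 600 60    # 10 minutes at a 60s interval, prints 10
```

The result is the number to put in the "for N cycles" clause of /etc/monit/conf.d/cpu-killer.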
RAM Threshold
Edit /etc/default/earlyoom:
| Setting | Meaning |
|---|---|
| -m 5 | Kill when free RAM < 5% |
| -m 10 | Kill when free RAM < 10% |
| -s 5 | Kill when free swap < 5% |
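To get a feel for what a percentage means on a particular server, it can be converted to an absolute size. This is a minimal sketch, assuming the -m and -s values are percentages of total RAM and total swap; the helper name `threshold_mib` is hypothetical.

```bash
# Hypothetical helper: convert an earlyoom percentage threshold to MiB.
# $1 = threshold percent (the -m or -s value), $2 = total size in KiB
threshold_mib() {
    echo $(( $2 * $1 / 100 / 1024 ))
}

# MemTotal is reported in KiB in /proc/meminfo (Linux only).
if [ -r /proc/meminfo ]; then
    mem_total_kib=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    echo "-m 5 triggers below roughly $(threshold_mib 5 "$mem_total_kib") MiB"
fi
```

On a server with 8 GiB of RAM (8388608 KiB), `threshold_mib 5 8388608` gives 409, so -m 5 leaves only about 400 MiB of headroom.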
Protected Processes
earlyoom protects (via --avoid):
- init, systemd, sshd
kill-top-cpu.sh protects (via grep exclusion):
- systemd, sshd, monit, earlyoom, bash, ps, awk, grep, head
To add more protected processes, edit the grep pattern in /usr/local/bin/kill-top-cpu.sh.
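Before changing the --avoid pattern, it can help to check which names it actually matches. The sketch below only exercises the regular expression with grep -E; how earlyoom applies the pattern internally is not covered here, and the helper name `matches_avoid` is an assumption.

```bash
# Illustration only: check names against the --avoid regex using grep -E.
AVOID_REGEX='(^|/)(init|systemd|sshd)$'

matches_avoid() {
    printf '%s\n' "$1" | grep -Eq "$AVOID_REGEX"
}

for name in sshd /usr/sbin/sshd systemd systemd-journald nginx; do
    if matches_avoid "$name"; then
        echo "$name: protected"
    else
        echo "$name: not protected"
    fi
done
```

Because the pattern is anchored with `$`, a name such as systemd-journald does not match; add such names to the alternation explicitly if they should be protected too.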
Monitoring
Check Status
# earlyoom status
sudo systemctl status earlyoom
# monit status
sudo monit status
# Combined check
sudo ./scripts/status.sh
View Logs
# earlyoom logs
journalctl -u earlyoom -f
# monit logs
tail -f /var/log/monit.log
# Kill events
journalctl | grep -i "cpu-killer\|earlyoom\|orphan-claude"
Testing
Test CPU Killer
# Install stress tool
sudo apt install -y stress
# For quick testing, temporarily set to 2 cycles in /etc/monit/conf.d/cpu-killer
# then reload: sudo monit reload
# Start CPU stress (will be killed after threshold)
stress --cpu 4 --timeout 300
Test RAM Killer
# This will be killed quickly by earlyoom
stress --vm 4 --vm-bytes 4G --vm-keep --timeout 120
Test Orphan Claude Killer
# Run the detection script manually to see what it would find
sudo /usr/local/bin/kill-orphan-claude.sh
# Check logs for any kills
journalctl | grep orphan-claude-killer
Uninstall
sudo ./uninstall.sh
Or manually:
sudo systemctl stop earlyoom monit
sudo systemctl disable earlyoom monit
sudo apt remove -y earlyoom monit
sudo rm -f /usr/local/bin/kill-top-cpu.sh
sudo rm -f /usr/local/bin/kill-orphan-claude.sh
sudo rm -f /etc/monit/conf.d/cpu-killer
sudo rm -f /etc/monit/conf.d/orphan-claude-killer
Resource Overhead
| Component | RAM | CPU | Disk |
|---|---|---|---|
| earlyoom | ~2MB | Negligible (adaptive polling) | 77KB |
| monit | ~3-4MB | ~28ms per 60s cycle | 1MB |
| kill script | 0 (runs only when triggered) | Milliseconds | <1KB |
Total: ~6MB RAM, essentially 0% CPU during normal operation
How It Works
Orphan Claude Detection (monit)
A Claude process is considered orphaned if its controlling TTY is ? (no terminal attached), OR its parent PID is 1 (adopted by init).
- monit runs the orphan detection script every 15 cycles (15 minutes with 60s daemon interval)
- Script finds all claude processes via pgrep -x claude
- For each process, checks TTY (ps -o tty=) and PPID (ps -o ppid=)
- If orphaned, kills the process tree (children first, then parent)
- Logs details to syslog including PID, reason, start time, CPU%, and memory%
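The detection steps above can be sketched as a dry run. This is not the repository's kill-orphan-claude.sh; the function names `is_orphan` and `list_orphans` are hypothetical, and the sketch only reports candidates instead of killing them.

```bash
# Hypothetical sketch of the orphan test described above.
# $1 = TTY as printed by ps ("?" means no terminal), $2 = parent PID
is_orphan() {
    local tty=$1 ppid=$2
    [ "$tty" = "?" ] || [ "$ppid" -eq 1 ]
}

# Dry run: list orphaned claude processes without killing anything.
list_orphans() {
    local pid tty ppid
    for pid in $(pgrep -x claude); do
        tty=$(ps -o tty= -p "$pid" | tr -d ' ')
        ppid=$(ps -o ppid= -p "$pid" | tr -d ' ')
        # The process may have exited between pgrep and ps.
        [ -n "$ppid" ] || continue
        if is_orphan "$tty" "$ppid"; then
            echo "orphan: PID $pid (tty=$tty ppid=$ppid)"
        fi
    done
}
```

Calling `list_orphans` as root prints one line per candidate, which is a quick way to review what would be affected before enabling the monit check.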
CPU Protection (monit)
- monit checks system CPU every 60 seconds
- If CPU > 95% for 5 consecutive checks (5 minutes), executes kill script
- Kill script identifies the process using most CPU
- Kills ALL processes with that command name (handles multi-worker processes)
- Logs the action to syslog
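The "5 consecutive checks" rule means a single cycle at or below the threshold resets the count. The following sketch illustrates that logic with integer CPU samples; it is not monit's implementation, and `would_trigger` is a hypothetical name.

```bash
# Illustration of "N consecutive cycles" logic.
# $1 = threshold percent, $2 = required consecutive cycles, rest = samples
would_trigger() {
    local threshold=$1 needed=$2 streak=0 sample
    shift 2
    for sample in "$@"; do
        if [ "$sample" -gt "$threshold" ]; then
            streak=$((streak + 1))
            [ "$streak" -ge "$needed" ] && return 0
        else
            streak=0    # any cycle at or below the threshold resets the count
        fi
    done
    return 1
}

would_trigger 95 5 99 98 97 99 100 && echo "triggers"
would_trigger 95 5 99 98 50 99 100 || echo "does not trigger"
```

A short dip in CPU usage therefore delays the kill by a full new run of cycles, which is worth keeping in mind when testing with bursty workloads.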
RAM Protection (earlyoom)
- earlyoom monitors available memory (adaptive polling - more frequent when memory is low)
- When available memory drops below 5% (and, if swap is configured, free swap is also below 5%), sends SIGTERM to highest memory consumer
- If process doesn't exit, sends SIGKILL at 2.5% threshold
- Protected processes (init, systemd, sshd) are never killed
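The two-stage escalation can be illustrated with a small function. This is a simplified sketch, assuming the SIGKILL threshold is half of the -m value (as the 5% and 2.5% figures above imply) and ignoring swap; it is not earlyoom's code, and `earlyoom_signal` is a hypothetical name.

```bash
# Illustration of the two memory thresholds described above.
# $1 = available memory in percent, $2 = SIGTERM threshold (the -m value)
earlyoom_signal() {
    awk -v avail="$1" -v term="$2" 'BEGIN {
        kill_at = term / 2                 # assumed SIGKILL threshold
        if (avail < kill_at)   print "SIGKILL"
        else if (avail < term) print "SIGTERM"
        else                   print "none"
    }'
}

earlyoom_signal 8 5      # prints none
earlyoom_signal 4 5      # prints SIGTERM
earlyoom_signal 2 5      # prints SIGKILL
```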
Troubleshooting
monit not triggering
# Check monit is running
sudo systemctl status monit
# Check config syntax
sudo monit -t
# Check monit log
tail -f /var/log/monit.log
# Verify CPU threshold is being detected
sudo monit status | grep cpu
earlyoom not killing
# Check earlyoom is running
sudo systemctl status earlyoom
# Check configuration
cat /etc/default/earlyoom
# Watch real-time
journalctl -u earlyoom -f
Kill script not working
# Test manually
sudo /usr/local/bin/kill-top-cpu.sh
# Check script is executable
ls -la /usr/local/bin/kill-top-cpu.sh
# Check for errors
bash -x /usr/local/bin/kill-top-cpu.sh
License
MIT License - Use freely, no warranty.
Author
Created for UpfrontOps infrastructure management.