Ryan T. Murphy f2bbf7c826 Fix interrupted dpkg state before apt-get
Run dpkg --configure -a before apt operations to recover from
any previously interrupted package installations.
2026-01-22 19:00:08 -05:00

Runaway Process Killer

Automatic protection against runaway CPU and RAM processes on Ubuntu servers. Prevents hosting provider throttling by killing processes that pin CPU to 95%+ for extended periods or exhaust available memory.

Overview

| Protection | Tool | Trigger | Action |
|---|---|---|---|
| CPU | monit | 95%+ CPU for 5 minutes | Kill all processes matching top CPU consumer |
| RAM | earlyoom | <5% free memory | Kill highest memory consumer |
| Orphan Claude | monit | Every 15 minutes | Kill Claude processes with no TTY or PPID=1 |

Requirements

  • Ubuntu 20.04+ (tested on 24.04 LTS)
  • Root access
  • ~6MB RAM overhead total

Quick Install

curl -fsSL https://git.upfrontops.cloud/upfrontops/runaway-process-killer/raw/branch/main/install.sh | sudo bash

Or clone and run:

git clone https://git.upfrontops.cloud/upfrontops/runaway-process-killer.git
cd runaway-process-killer
sudo ./install.sh

Manual Installation

1. Install earlyoom (RAM Protection)

sudo apt update && sudo apt install -y earlyoom

Edit /etc/default/earlyoom:

EARLYOOM_ARGS="-m 5 -s 5 --avoid '(^|/)(init|systemd|sshd)$' -r 60"

Enable and start:

sudo systemctl enable earlyoom && sudo systemctl restart earlyoom

2. Install monit (CPU Protection)

sudo apt install -y monit

Edit /etc/monit/monitrc, change daemon interval:

set daemon 60

Enable HTTP interface (required for monit status):

set httpd port 2812 and
    use address localhost
    allow localhost
    allow admin:monit

Create /etc/monit/conf.d/cpu-killer:

check system $HOST
  if cpu usage > 95% for 5 cycles then exec "/usr/local/bin/kill-top-cpu.sh"

Create /usr/local/bin/kill-top-cpu.sh:

#!/bin/bash
# Kill every process that shares a command name with the top CPU consumer
# (protected and transient processes are skipped)

# Find the top CPU consumer; grep -v drops the ps header line and any
# line containing a protected or transient process name
TOP_LINE=$(ps -eo pid,comm,%cpu --sort=-%cpu | grep -v -E '(PID|systemd|sshd|monit|earlyoom|bash|ps|awk|grep|head)' | head -1)
TARGET_PID=$(echo "$TOP_LINE" | awk '{print $1}')
COMM=$(echo "$TOP_LINE" | awk '{print $2}')

if [ -n "$TARGET_PID" ] && [ -n "$COMM" ]; then
    logger "monit cpu-killer: Killing all '$COMM' processes (detected high CPU on PID $TARGET_PID)"
    # Send SIGKILL to all processes whose command name exactly matches $COMM (-x)
    pkill -9 -x "$COMM"
fi

Make executable and enable:

sudo chmod +x /usr/local/bin/kill-top-cpu.sh
sudo systemctl enable monit && sudo systemctl restart monit

Configuration

CPU Threshold Timing

Edit /etc/monit/conf.d/cpu-killer to change timing:

| Cycles | Time (at 60s interval) |
|---|---|
| 2 | 2 minutes |
| 5 | 5 minutes (default) |
| 10 | 10 minutes |
| 30 | 30 minutes |
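
The table follows from cycles × daemon interval. As a minimal sketch, the helper below converts a desired trigger time into a cycle count for a given interval. The function name cycles_for_minutes is hypothetical and not part of this repository; the interval argument corresponds to the "set daemon" value in /etc/monit/monitrc.

```shell
#!/bin/bash
# Hypothetical helper (not shipped with this repo): convert a desired
# trigger time in minutes into the monit cycle count for cpu-killer.
# The second argument is the "set daemon" interval in seconds (default 60).
cycles_for_minutes() {
    local minutes=$1 interval_seconds=${2:-60}
    # Round up so the trigger never fires earlier than requested.
    echo $(( (minutes * 60 + interval_seconds - 1) / interval_seconds ))
}

cycles_for_minutes 5       # prints 5  (60s interval)
cycles_for_minutes 5 30    # prints 10 (30s interval)
```

The result is the number to use in "for N cycles" in /etc/monit/conf.d/cpu-killer.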

RAM Threshold

Edit /etc/default/earlyoom:

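| Setting | Meaning |
|---|---|
| -m 5 | Act when available RAM drops to 5% or below |
| -m 10 | Act when available RAM drops to 10% or below |
| -s 5 | Act when free swap drops to 5% or below |
| -r 60 | Log a memory report every 60 seconds |

earlyoom kills only when the RAM and swap conditions are both met.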
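
To see how close a server currently is to these thresholds, the two percentages can be read from /proc/meminfo. This is a rough sketch, not earlyoom's own code: the function name mem_swap_percent and its optional file argument are assumptions made for illustration, and a system without swap is reported here as 0% free swap.

```shell
#!/bin/bash
# Hypothetical helper: print available memory and free swap as percentages.
# The optional argument (a meminfo-style file) is an assumption that makes
# the function easy to try against sample data; default is /proc/meminfo.
mem_swap_percent() {
    awk '
        $1 == "MemTotal:"     { mt = $2 }
        $1 == "MemAvailable:" { ma = $2 }
        $1 == "SwapTotal:"    { st = $2 }
        $1 == "SwapFree:"     { sf = $2 }
        END {
            mem  = (mt > 0) ? 100 * ma / mt : 0
            swap = (st > 0) ? 100 * sf / st : 0   # no swap is shown as 0% free
            printf "mem_available=%.1f%% swap_free=%.1f%%\n", mem, swap
        }
    ' "${1:-/proc/meminfo}"
}

# Show the live values when /proc/meminfo is readable (Linux).
if [ -r /proc/meminfo ]; then
    mem_swap_percent
fi
```

Compare the printed values against the -m and -s settings in /etc/default/earlyoom.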

Protected Processes

earlyoom protects (via --avoid):

  • init, systemd, sshd

kill-top-cpu.sh protects (via grep exclusion):

  • systemd, sshd, monit, earlyoom, bash, ps, awk, grep, head

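To add more protected processes, edit the grep pattern in /usr/local/bin/kill-top-cpu.sh. The pattern is matched as a substring of each ps output line, so systemd also covers systemd-journald, while short entries such as ps also exempt unrelated names that contain them (for example cups-browsed).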
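
Before changing the pattern, it can help to check which process names it would skip. The sketch below reuses the exclusion pattern from kill-top-cpu.sh; the function name is_protected and the sample ps lines are hypothetical and exist only for illustration.

```shell
#!/bin/bash
# Same exclusion pattern as kill-top-cpu.sh.
PROTECTED_PATTERN='(PID|systemd|sshd|monit|earlyoom|bash|ps|awk|grep|head)'

# Hypothetical helper: succeeds if a "pid comm %cpu" line would be
# skipped by the kill script's grep -v filter.
is_protected() {
    echo "$1" | grep -q -E "$PROTECTED_PATTERN"
}

# Made-up sample lines in the format produced by: ps -eo pid,comm,%cpu
for line in "  812 nginx 97.0" "  401 systemd-journal 3.1" "  977 cups-browsed 96.2"; do
    if is_protected "$line"; then
        echo "skipped:  $line"
    else
        echo "eligible: $line"
    fi
done
```

Here nginx is eligible, while systemd-journal and cups-browsed are skipped because they contain "systemd" and "ps" respectively.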

Monitoring

Check Status

# earlyoom status
sudo systemctl status earlyoom

# monit status
sudo monit status

# Combined check (run from the cloned repository)
sudo ./scripts/status.sh

View Logs

# earlyoom logs
journalctl -u earlyoom -f

# monit logs
tail -f /var/log/monit.log

# Kill events
journalctl | grep -i "cpu-killer\|earlyoom\|orphan-claude"

Testing

Test CPU Killer

# Install stress tool
sudo apt install -y stress

# For quick testing, temporarily set to 2 cycles in /etc/monit/conf.d/cpu-killer
# then reload: sudo monit reload

# Start CPU stress with one worker per core so total CPU exceeds 95%
# (will be killed after threshold)
stress --cpu "$(nproc)" --timeout 300

Test RAM Killer

# This will be killed quickly by earlyoom
stress --vm 4 --vm-bytes 4G --vm-keep --timeout 120

Test Orphan Claude Killer

# Run the detection script manually to see what it would find
sudo /usr/local/bin/kill-orphan-claude.sh

# Check logs for any kills
journalctl | grep orphan-claude-killer

Uninstall

sudo ./uninstall.sh

Or manually:

sudo systemctl stop earlyoom monit
sudo systemctl disable earlyoom monit
sudo apt remove -y earlyoom monit
sudo rm -f /usr/local/bin/kill-top-cpu.sh
sudo rm -f /usr/local/bin/kill-orphan-claude.sh
sudo rm -f /etc/monit/conf.d/cpu-killer
sudo rm -f /etc/monit/conf.d/orphan-claude-killer

Resource Overhead

| Component | RAM | CPU | Disk |
|---|---|---|---|
| earlyoom | ~2MB | Negligible (adaptive polling) | 77KB |
| monit | ~3-4MB | ~28ms per 60s cycle | 1MB |
| kill script | 0 (runs only when triggered) | Milliseconds | <1KB |

Total: ~6MB RAM, essentially 0% CPU during normal operation

How It Works

Orphan Claude Detection (monit)

A Claude process is considered orphaned if:

  • Its controlling TTY is ? (no terminal attached), OR
  • Its parent PID is 1 (adopted by init)

  1. monit runs the orphan detection script every 15 cycles (15 minutes with 60s daemon interval)
  2. Script finds all claude processes via pgrep -x claude
  3. For each process, checks TTY (ps -o tty=) and PPID (ps -o ppid=)
  4. If orphaned, kills the process tree (children first, then parent)
  5. Logs details to syslog including PID, reason, start time, CPU%, and memory%
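
The orphan test described above can be sketched as a dry run that only lists candidates. This is an illustration under stated assumptions, not the contents of kill-orphan-claude.sh: the function names is_orphan and list_orphan_claude are hypothetical, and nothing is killed or logged.

```shell
#!/bin/bash
# Hypothetical predicate: succeeds if either orphan condition holds
# (no controlling TTY, or parent PID 1).
is_orphan() {
    local tty=$1 ppid=$2
    [ "$tty" = "?" ] || [ "$ppid" -eq 1 ]
}

# Dry run: print orphaned claude processes without killing anything.
list_orphan_claude() {
    local pid tty ppid
    for pid in $(pgrep -x claude); do
        tty=$(ps -o tty= -p "$pid" | tr -d ' ')
        ppid=$(ps -o ppid= -p "$pid" | tr -d ' ')
        [ -n "$ppid" ] || continue   # process exited between pgrep and ps
        if is_orphan "$tty" "$ppid"; then
            echo "orphan: pid=$pid tty=$tty ppid=$ppid"
        fi
    done
}

list_orphan_claude
```

Running this before enabling the monit check shows which processes would qualify on a given server.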

CPU Protection (monit)

  1. monit checks system CPU every 60 seconds
  2. If CPU > 95% for 5 consecutive checks (5 minutes), executes kill script
  3. Kill script identifies the process using most CPU
  4. Kills ALL processes with that command name (handles multi-worker processes)
  5. Logs the action to syslog
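
Because step 4 kills by command name, it is worth previewing what a kill would hit. The sketch below applies the same selection as kill-top-cpu.sh and then uses pgrep -x, which matches names the same way pkill -x does, to list the affected processes. The function name pick_target is hypothetical, and this dry run sends no signals.

```shell
#!/bin/bash
# Hypothetical helper: read "pid comm %cpu" lines (highest CPU first) on
# stdin and print the command name the kill script would target.
pick_target() {
    grep -v -E '(PID|systemd|sshd|monit|earlyoom|bash|ps|awk|grep|head)' | head -1 | awk '{print $2}'
}

# Dry run against the live process table: nothing is killed.
target=$(ps -eo pid,comm,%cpu --sort=-%cpu | pick_target)
if [ -n "$target" ]; then
    echo "pkill -9 -x '$target' would signal:"
    pgrep -x -l "$target" || echo "(no matching process is running now)"
fi
```

If the list includes processes that should survive, add their name to the exclusion pattern before relying on the killer.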

RAM Protection (earlyoom)

  1. earlyoom monitors available memory (adaptive polling - more frequent when memory is low)
  2. When available memory drops to 5% or below (with free swap also at or below its 5% threshold), sends SIGTERM to the highest memory consumer
  3. If process doesn't exit, sends SIGKILL at 2.5% threshold
  4. Protected processes (init, systemd, sshd) are never killed
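
The two-stage escalation can be modelled in a few lines. This is a simplified sketch of the decision, not earlyoom's source: the function name earlyoom_action is hypothetical, it assumes the same threshold for RAM and swap (as in the -m 5 -s 5 configuration above), and it takes the SIGKILL threshold to be half the SIGTERM threshold.

```shell
#!/bin/bash
# Simplified model of the escalation (an illustration, not earlyoom's code).
# Arguments: available memory %, free swap %, SIGTERM threshold % (default 5).
earlyoom_action() {
    awk -v mem="$1" -v swap="$2" -v term="${3:-5}" 'BEGIN {
        kill_at = term / 2   # SIGKILL threshold assumed to be half of SIGTERM
        if (mem <= kill_at && swap <= kill_at)   print "SIGKILL"
        else if (mem <= term && swap <= term)    print "SIGTERM"
        else                                     print "none"
    }'
}

earlyoom_action 4 0     # prints SIGTERM
earlyoom_action 2 0     # prints SIGKILL
earlyoom_action 4 50    # prints none (swap still has headroom)
```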

Troubleshooting

monit not triggering

# Check monit is running
sudo systemctl status monit

# Check config syntax
sudo monit -t

# Check monit log
tail -f /var/log/monit.log

# Verify CPU threshold is being detected
sudo monit status | grep cpu

earlyoom not killing

# Check earlyoom is running
sudo systemctl status earlyoom

# Check configuration
cat /etc/default/earlyoom

# Watch real-time
journalctl -u earlyoom -f

Kill script not working

# Test manually
sudo /usr/local/bin/kill-top-cpu.sh

# Check script is executable
ls -la /usr/local/bin/kill-top-cpu.sh

# Check for errors
bash -x /usr/local/bin/kill-top-cpu.sh

License

MIT License - Use freely, no warranty.

Author

Created for UpfrontOps infrastructure management.
