# Runaway Process Killer

Automatic protection against runaway CPU and RAM processes on Ubuntu servers. Prevents hosting provider throttling by killing processes that pin CPU to 95%+ for extended periods or exhaust available memory.

## Overview

| Protection | Tool | Trigger | Action |
|------------|------|---------|--------|
| CPU | monit | 95%+ CPU for 5 minutes | Kill all processes matching top CPU consumer |
| RAM | earlyoom | <5% free memory | Kill highest memory consumer |
| Orphan Claude | monit | Every 15 minutes | Kill Claude processes with no TTY or PPID=1 |

## Requirements

- Ubuntu 20.04+ (tested on 24.04 LTS)
- Root access
- ~6MB RAM overhead total

## Quick Install

```bash
curl -fsSL https://git.upfrontops.cloud/upfrontops/runaway-process-killer/raw/branch/main/install.sh | sudo bash
```

Or clone and run:

```bash
git clone https://git.upfrontops.cloud/upfrontops/runaway-process-killer.git
cd runaway-process-killer
sudo ./install.sh
```

## Manual Installation

### 1. Install earlyoom (RAM Protection)

```bash
sudo apt update && sudo apt install -y earlyoom
```

Edit `/etc/default/earlyoom`:

```bash
EARLYOOM_ARGS="-m 5 -s 5 --avoid '(^|/)(init|systemd|sshd)$' -r 60"
```

Enable and start:

```bash
sudo systemctl enable earlyoom && sudo systemctl restart earlyoom
```

### 2. Install monit (CPU Protection)

```bash
sudo apt install -y monit
```

Edit `/etc/monit/monitrc`, change daemon interval:

```
set daemon 60
```

Enable HTTP interface (required for `monit status`):

```
set httpd port 2812 and
    use address localhost
    allow localhost
    allow admin:monit
```

Create `/etc/monit/conf.d/cpu-killer`:

```
check system $HOST
    if cpu usage > 95% for 5 cycles then exec "/usr/local/bin/kill-top-cpu.sh"
```

Create `/usr/local/bin/kill-top-cpu.sh`:

```bash
#!/bin/bash
# Kill the process tree using the most CPU (excluding critical ones)

# Find the top CPU consumer (excluding protected and transient processes)
TOP_LINE=$(ps -eo pid,comm,%cpu --sort=-%cpu | grep -v -E '(PID|systemd|sshd|monit|earlyoom|bash|ps|awk|grep|head)' | head -1)
TARGET_PID=$(echo "$TOP_LINE" | awk '{print $1}')
COMM=$(echo "$TOP_LINE" | awk '{print $2}')

if [ -n "$TARGET_PID" ] && [ -n "$COMM" ]; then
    logger "monit cpu-killer: Killing all '$COMM' processes (detected high CPU on PID $TARGET_PID)"
    # Kill all processes with this command name
    pkill -9 -x "$COMM"
fi
```

Make executable and enable:

```bash
sudo chmod +x /usr/local/bin/kill-top-cpu.sh
sudo systemctl enable monit && sudo systemctl restart monit
```

## Configuration

### CPU Threshold Timing

Edit `/etc/monit/conf.d/cpu-killer` to change timing:

| Cycles | Time (at 60s interval) |
|--------|------------------------|
| 2 | 2 minutes |
| 5 | 5 minutes (default) |
| 10 | 10 minutes |
| 30 | 30 minutes |

### RAM Threshold

Edit `/etc/default/earlyoom`:

| Setting | Meaning |
|---------|---------|
| `-m 5` | Kill when free RAM < 5% |
| `-m 10` | Kill when free RAM < 10% |
| `-s 5` | Kill when free swap < 5% |

### Protected Processes

**earlyoom** protects (via `--avoid`):

- init, systemd, sshd

**kill-top-cpu.sh** protects (via grep exclusion):

- systemd, sshd, monit, earlyoom, bash, ps, awk, grep, head

To add more protected processes, edit the grep pattern in `/usr/local/bin/kill-top-cpu.sh`.
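Note that the grep exclusion matches substrings anywhere in the `ps` output line, so a command such as `cupsd` (which contains `ps`) is also treated as protected. If that is too broad for your server, the selection step could compare command names exactly instead. The following is a minimal sketch of that idea, not part of the shipped script: the `PROTECTED` variable, the `pick_target` function, and the `--report` flag are hypothetical names introduced here for illustration.

```shell
#!/bin/bash
# Sketch: choose the top CPU consumer using exact command-name matching.
# PROTECTED, pick_target and --report are illustrative names (assumptions),
# not part of the original kill-top-cpu.sh.
PROTECTED="systemd sshd monit earlyoom bash ps awk grep head"

# Reads "PID COMM %CPU" rows (highest CPU first) on stdin and prints
# "PID COMM" for the first row whose command name is not protected.
# Like the original script, only the first word of the command name is used.
pick_target() {
    awk -v protected="$PROTECTED" '
        BEGIN {
            n = split(protected, names, " ")
            for (i = 1; i <= n; i++) skip[names[i]] = 1
        }
        $1 == "PID" { next }                    # skip the ps header row
        !($2 in skip) { print $1, $2; exit }    # first unprotected row wins
    '
}

# Report only; nothing is killed by this sketch.
if [ "${1:-}" = "--report" ]; then
    ps -eo pid,comm,%cpu --sort=-%cpu | pick_target
fi
```

Running the sketch with `--report` prints the PID and command name that would be selected, which makes it possible to check a changed protection list before wiring it into the kill script.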
## Monitoring

### Check Status

```bash
# earlyoom status
sudo systemctl status earlyoom

# monit status
sudo monit status

# Combined check
sudo ./scripts/status.sh
```

### View Logs

```bash
# earlyoom logs
journalctl -u earlyoom -f

# monit logs
tail -f /var/log/monit.log

# Kill events
journalctl | grep -i "cpu-killer\|earlyoom\|orphan-claude"
```

## Testing

### Test CPU Killer

```bash
# Install stress tool
sudo apt install -y stress

# For quick testing, temporarily set to 2 cycles in /etc/monit/conf.d/cpu-killer
# then reload: sudo monit reload

# Start CPU stress (will be killed after threshold)
stress --cpu 4 --timeout 300
```

### Test RAM Killer

```bash
# This will be killed quickly by earlyoom
stress --vm 4 --vm-bytes 4G --vm-keep --timeout 120
```

### Test Orphan Claude Killer

```bash
# Run the detection script manually to see what it would find
sudo /usr/local/bin/kill-orphan-claude.sh

# Check logs for any kills
journalctl | grep orphan-claude-killer
```

## Uninstall

```bash
sudo ./uninstall.sh
```

Or manually:

```bash
sudo systemctl stop earlyoom monit
sudo systemctl disable earlyoom monit
sudo apt remove -y earlyoom monit
sudo rm -f /usr/local/bin/kill-top-cpu.sh
sudo rm -f /usr/local/bin/kill-orphan-claude.sh
sudo rm -f /etc/monit/conf.d/cpu-killer
sudo rm -f /etc/monit/conf.d/orphan-claude-killer
```

## Resource Overhead

| Component | RAM | CPU | Disk |
|-----------|-----|-----|------|
| earlyoom | ~2MB | Negligible (adaptive polling) | 77KB |
| monit | ~3-4MB | ~28ms per 60s cycle | 1MB |
| kill script | 0 (runs only when triggered) | Milliseconds | <1KB |

**Total: ~6MB RAM, essentially 0% CPU during normal operation**

## How It Works

### Orphan Claude Detection (monit)

A Claude process is considered orphaned if:

- Its controlling TTY is `?` (no terminal attached), OR
- Its parent PID is 1 (adopted by init)

1. monit runs the orphan detection script every 15 cycles (15 minutes with 60s daemon interval)
2. Script finds all `claude` processes via `pgrep -x claude`
3. For each process, checks TTY (`ps -o tty=`) and PPID (`ps -o ppid=`)
4. If orphaned, kills the process tree (children first, then parent)
5. Logs details to syslog including PID, reason, start time, CPU%, and memory%

### CPU Protection (monit)

1. monit checks system CPU every 60 seconds
2. If CPU > 95% for 5 consecutive checks (5 minutes), executes kill script
3. Kill script identifies the process using most CPU
4. Kills ALL processes with that command name (handles multi-worker processes)
5. Logs the action to syslog

### RAM Protection (earlyoom)

1. earlyoom monitors available memory (adaptive polling - more frequent when memory is low)
2. When free memory drops below 5%, sends SIGTERM to highest memory consumer
3. If process doesn't exit, sends SIGKILL at 2.5% threshold
4. Protected processes (init, systemd, sshd) are never killed

## Troubleshooting

### monit not triggering

```bash
# Check monit is running
sudo systemctl status monit

# Check config syntax
sudo monit -t

# Check monit log
tail -f /var/log/monit.log

# Verify CPU threshold is being detected
sudo monit status | grep cpu
```

### earlyoom not killing

```bash
# Check earlyoom is running
sudo systemctl status earlyoom

# Check configuration
cat /etc/default/earlyoom

# Watch real-time
journalctl -u earlyoom -f
```

### Kill script not working

```bash
# Test manually
sudo /usr/local/bin/kill-top-cpu.sh

# Check script is executable
ls -la /usr/local/bin/kill-top-cpu.sh

# Check for errors
bash -x /usr/local/bin/kill-top-cpu.sh
```

## License

MIT License - Use freely, no warranty.

## Author

Created for UpfrontOps infrastructure management.