Replace with B2 archive contents
4
.gitignore
vendored
Normal file
@@ -0,0 +1,4 @@
*.log
*.bak
*.swp
.DS_Store
21
LICENSE
Normal file
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 UpfrontOps

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
299
README.md
Normal file
@@ -0,0 +1,299 @@
# Runaway Process Killer

Automatic protection against runaway CPU and RAM processes on Ubuntu servers. Helps prevent hosting provider throttling by killing processes that pin CPU at 95%+ for extended periods or exhaust available memory.

## Overview

| Protection | Tool | Trigger | Action |
|------------|------|---------|--------|
| CPU | monit | 95%+ CPU for 5 minutes | Kill all processes sharing the top CPU consumer's command name |
| RAM | earlyoom | <5% available memory and <5% free swap | Kill the process with the highest `oom_score` (usually the largest memory consumer) |
| Orphan Claude | monit | Checked every 15 minutes | Kill Claude processes with no TTY or PPID=1 |

## Requirements

- Ubuntu 20.04+ (tested on 24.04 LTS)
- Root access
- ~6MB RAM overhead total

## Quick Install

```bash
curl -fsSL https://git.upfrontops.cloud/upfrontops/runaway-process-killer/raw/branch/main/install.sh | sudo bash
```

Or clone and run:

```bash
git clone https://git.upfrontops.cloud/upfrontops/runaway-process-killer.git
cd runaway-process-killer
sudo ./install.sh
```
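
Both install paths accept overrides: `install.sh` reads `CPU_CYCLES`, `RAM_THRESHOLD`, and `SWAP_THRESHOLD` from the environment and falls back to 5 for each. A minimal sketch of that defaulting pattern follows; `show_config` is a hypothetical helper used only for illustration and is not part of the repository.

```bash
#!/bin/bash
# Hypothetical helper mirroring the "${VAR:-default}" pattern at the top of install.sh.
# The function body runs in a subshell, so the assignments do not leak to the caller.
show_config() (
    CPU_CYCLES="${CPU_CYCLES:-5}"
    RAM_THRESHOLD="${RAM_THRESHOLD:-5}"
    SWAP_THRESHOLD="${SWAP_THRESHOLD:-5}"
    echo "CPU: ${CPU_CYCLES} cycles, RAM: <${RAM_THRESHOLD}%, swap: <${SWAP_THRESHOLD}%"
)

show_config                                   # uses whatever is set, else the defaults
CPU_CYCLES=10 RAM_THRESHOLD=10 show_config    # per-call override
```

At install time the same idea would look like `sudo CPU_CYCLES=10 RAM_THRESHOLD=10 ./install.sh`, assuming the local sudo policy passes variables given on the command line through to the script.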

## Manual Installation

### 1. Install earlyoom (RAM Protection)

```bash
sudo apt update && sudo apt install -y earlyoom
```

Edit `/etc/default/earlyoom`:
```bash
EARLYOOM_ARGS="-m 5 -s 5 --avoid '(^|/)(init|systemd|sshd)$' -r 60"
```

Enable and start:
```bash
sudo systemctl enable earlyoom && sudo systemctl restart earlyoom
```

### 2. Install monit (CPU Protection)

```bash
sudo apt install -y monit
```

Edit `/etc/monit/monitrc` and set the daemon interval to 60 seconds:
```
set daemon 60
```

Enable the HTTP interface (required for `monit status`):
```
set httpd port 2812 and
    use address localhost
    allow localhost
    allow admin:monit
```

Create `/etc/monit/conf.d/cpu-killer`:
```
check system $HOST
    if cpu usage > 95% for 5 cycles then exec "/usr/local/bin/kill-top-cpu.sh"
```

Create `/usr/local/bin/kill-top-cpu.sh`:
```bash
#!/bin/bash
# Kill every process sharing the top CPU consumer's command name (excluding critical ones)

# Find the top CPU consumer (excluding protected and transient processes)
TOP_LINE=$(ps -eo pid,comm,%cpu --sort=-%cpu | grep -v -E '(PID|systemd|sshd|monit|earlyoom|bash|ps|awk|grep|head)' | head -1)
TARGET_PID=$(echo "$TOP_LINE" | awk '{print $1}')
COMM=$(echo "$TOP_LINE" | awk '{print $2}')

if [ -n "$TARGET_PID" ] && [ -n "$COMM" ]; then
    logger "monit cpu-killer: Killing all '$COMM' processes (detected high CPU on PID $TARGET_PID)"
    # Kill all processes with this command name
    pkill -9 -x "$COMM"
fi
```

Make executable and enable:
```bash
sudo chmod +x /usr/local/bin/kill-top-cpu.sh
sudo systemctl enable monit && sudo systemctl restart monit
```

## Configuration

### CPU Threshold Timing

Edit `/etc/monit/conf.d/cpu-killer` to change timing:

| Cycles | Time (at 60s interval) |
|--------|------------------------|
| 2 | 2 minutes |
| 5 | 5 minutes (default) |
| 10 | 10 minutes |
| 30 | 30 minutes |
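
The table is just cycles multiplied by the daemon interval, so the delay changes whenever `set daemon` changes. A small sketch of that arithmetic; `trigger_delay_seconds` is a hypothetical name chosen for this example, not something monit or this repository provides.

```bash
#!/bin/bash
# Hypothetical helper: how long CPU must stay above the threshold before monit acts.
#   $1 = cycles   (the "for N cycles" value in cpu-killer)
#   $2 = interval (the "set daemon" value in seconds, default 60)
trigger_delay_seconds() {
    local cycles="$1" interval="${2:-60}"
    echo $(( cycles * interval ))
}

trigger_delay_seconds 5 60    # prints 300 (5 minutes, the default)
trigger_delay_seconds 10 30   # prints 300 (half the interval needs twice the cycles)
```

If the daemon interval is ever lowered for faster detection, the cycle count has to rise in proportion to keep the same grace period.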

### RAM Threshold

Edit `/etc/default/earlyoom`:

| Setting | Meaning |
|---------|---------|
| `-m 5` | Memory threshold: available RAM < 5% |
| `-m 10` | Memory threshold: available RAM < 10% |
| `-s 5` | Swap threshold: free swap < 5% (earlyoom acts only once both the memory and swap thresholds are crossed) |

### Protected Processes

**earlyoom** protects (via `--avoid`):
- init, systemd, sshd

**kill-top-cpu.sh** protects (via grep exclusion):
- systemd, sshd, monit, earlyoom, bash, ps, awk, grep, head

To add more protected processes, edit the grep pattern in `/usr/local/bin/kill-top-cpu.sh`. The pattern matches as a substring of each `ps` output line, so it also spares any command whose name merely contains one of these strings (for example `systemd-journald`, or `cups-browsed` because it contains `ps`).
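
If the substring behaviour is too broad, one alternative is to compare the command name exactly. The following is a sketch under that assumption, not the script shipped here: `pick_top_cpu` and `PROTECTED` are hypothetical names, and the function expects `pid comm %cpu` lines on stdin, already sorted with the highest CPU first.

```bash
#!/bin/bash
# Sketch: choose the top CPU consumer, skipping protected names by exact match only.
PROTECTED="systemd sshd monit earlyoom bash ps awk grep head"

pick_top_cpu() {
    awk -v protected="$PROTECTED" '
        BEGIN { n = split(protected, names, " "); for (i = 1; i <= n; i++) skip[names[i]] = 1 }
        $1 == "PID" { next }                  # header line printed by ps
        !($2 in skip) { print $1, $2; exit }  # first unprotected entry wins
    '
}

# Sample input instead of live data; with the substring grep, cups-browsed would be skipped.
printf '%s\n' 'PID COMMAND %CPU' '101 cups-browsed 97.0' '102 stress 3.0' | pick_top_cpu
# prints: 101 cups-browsed
```

On a live system the input would come from `ps -eo pid,comm,%cpu --sort=-%cpu`. Note the trade-off: exact matching no longer covers helpers such as `systemd-journald`, so each of those would need its own entry in the list.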

## Monitoring

### Check Status

```bash
# earlyoom status
sudo systemctl status earlyoom

# monit status
sudo monit status

# Combined check
sudo ./scripts/status.sh
```

### View Logs

```bash
# earlyoom logs
journalctl -u earlyoom -f

# monit logs
tail -f /var/log/monit.log

# Kill events
journalctl | grep -i "cpu-killer\|earlyoom\|orphan-claude"
```

## Testing

### Test CPU Killer

```bash
# Install stress tool
sudo apt install -y stress

# For quick testing, temporarily set to 2 cycles in /etc/monit/conf.d/cpu-killer
# then reload: sudo monit reload

# Start CPU stress (will be killed after threshold)
stress --cpu 4 --timeout 300
```

### Test RAM Killer

```bash
# Allocates 4 x 4G; earlyoom should kill it quickly on hosts with less memory than that
stress --vm 4 --vm-bytes 4G --vm-keep --timeout 120
```

### Test Orphan Claude Killer

```bash
# Run the script manually (note: it kills any orphaned Claude process it finds; there is no dry-run mode)
sudo /usr/local/bin/kill-orphan-claude.sh

# Check logs for any kills
journalctl | grep orphan-claude-killer
```

## Uninstall

```bash
sudo ./uninstall.sh
```

Or manually:

```bash
sudo systemctl stop earlyoom monit
sudo systemctl disable earlyoom monit
sudo apt remove -y earlyoom monit
sudo rm -f /usr/local/bin/kill-top-cpu.sh
sudo rm -f /usr/local/bin/kill-orphan-claude.sh
sudo rm -f /etc/monit/conf.d/cpu-killer
sudo rm -f /etc/monit/conf.d/orphan-claude-killer
```

## Resource Overhead

| Component | RAM | CPU | Disk |
|-----------|-----|-----|------|
| earlyoom | ~2MB | Negligible (adaptive polling) | 77KB |
| monit | ~3-4MB | ~28ms per 60s cycle | 1MB |
| kill script | 0 (runs only when triggered) | Milliseconds | <1KB |

**Total: ~6MB RAM, essentially 0% CPU during normal operation**

## How It Works

### Orphan Claude Detection (monit)

A Claude process is considered orphaned if:
- Its controlling TTY is `?` (no terminal attached), OR
- Its parent PID is 1 (adopted by init)

1. monit runs the orphan detection script every 15 cycles (15 minutes with 60s daemon interval)
2. Script finds all `claude` processes via `pgrep -x claude`
3. For each process, checks TTY (`ps -o tty=`) and PPID (`ps -o ppid=`)
4. If orphaned, kills the process's direct children first, then the process itself
5. Logs details to syslog including PID, reason, start time, CPU%, and memory%
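
The two orphan conditions can be isolated into a small decision function, which makes them easy to check without killing anything. This is a sketch only: `orphan_reason` is a hypothetical name, and like the installed script it reports the PPID reason when both conditions hold.

```bash
#!/bin/bash
# Sketch of the orphan test. Prints the reason when orphaned; returns 1 otherwise.
#   $1 = TTY as reported by `ps -o tty=`, $2 = parent PID as reported by `ps -o ppid=`
orphan_reason() {
    local tty="$1" ppid="$2"
    if [ "$ppid" = "1" ]; then
        echo "parent is init (PPID=1)"
    elif [ "$tty" = "?" ]; then
        echo "no TTY attached"
    else
        return 1
    fi
}

orphan_reason "?" 4242                            # prints: no TTY attached
orphan_reason "pts/0" 1                           # prints: parent is init (PPID=1)
orphan_reason "pts/0" 4242 || echo "not orphaned" # prints: not orphaned
```

One consequence worth keeping in mind: a Claude process started deliberately without a terminal (from cron, a systemd unit, or a CI job) also reports `?` as its TTY and therefore counts as orphaned under these rules.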

### CPU Protection (monit)

1. monit checks system CPU every 60 seconds
2. If CPU > 95% for 5 consecutive checks (5 minutes), executes kill script
3. Kill script identifies the process using the most CPU
4. Kills ALL processes with that command name (handles multi-worker processes)
5. Logs the action to syslog

### RAM Protection (earlyoom)

1. earlyoom monitors available memory and free swap (adaptive polling: more frequent when memory is low)
2. When both drop below 5%, sends SIGTERM to the process with the highest `oom_score` (usually the largest memory consumer)
3. If memory keeps falling below 2.5% (half the SIGTERM threshold by default), sends SIGKILL
4. Processes matching `--avoid` (init, systemd, sshd) are avoided as kill targets
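
To see how close a host is to the threshold, the percentage can be estimated by hand. The sketch below assumes that earlyoom's `-m` figure corresponds to `MemAvailable` relative to `MemTotal` in `/proc/meminfo`; `mem_available_percent` is a hypothetical helper, and it takes the file as an argument so it can be tried on sample data.

```bash
#!/bin/bash
# Sketch: integer percentage of available memory from a meminfo-format file.
# Assumption: MemAvailable / MemTotal approximates the value compared against -m.
mem_available_percent() {
    local file="${1:-/proc/meminfo}"
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%d\n", avail * 100 / total }' "$file"
}

# Usage on a live Linux host:
#   mem_available_percent            # reads /proc/meminfo
#   mem_available_percent ./sample   # reads a saved copy
```

With `-m 5`, a result below 5 (together with free swap below the `-s` value) is the point at which SIGTERM would be expected, and roughly half of that for SIGKILL.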

## Troubleshooting

### monit not triggering

```bash
# Check monit is running
sudo systemctl status monit

# Check config syntax
sudo monit -t

# Check monit log
tail -f /var/log/monit.log

# Verify CPU threshold is being detected
sudo monit status | grep cpu
```

### earlyoom not killing

```bash
# Check earlyoom is running
sudo systemctl status earlyoom

# Check configuration
cat /etc/default/earlyoom

# Watch real-time
journalctl -u earlyoom -f
```

### Kill script not working

```bash
# Test manually (warning: this kills the current top CPU consumer, even when load is low)
sudo /usr/local/bin/kill-top-cpu.sh

# Check script is executable
ls -la /usr/local/bin/kill-top-cpu.sh

# Check for errors (this also performs the kill)
bash -x /usr/local/bin/kill-top-cpu.sh
```

## License

MIT License - Use freely, no warranty.

## Author

Created for UpfrontOps infrastructure management.
2
config/cpu-killer.conf
Normal file
@@ -0,0 +1,2 @@
check system $HOST
    if cpu usage > 95% for 5 cycles then exec "/usr/local/bin/kill-top-cpu.sh"
21
config/earlyoom.conf
Normal file
@@ -0,0 +1,21 @@
# Default settings for earlyoom. This file is sourced by /bin/sh from
# /etc/init.d/earlyoom or by systemd from earlyoom.service.

# Options to pass to earlyoom
EARLYOOM_ARGS="-m 5 -s 5 --avoid '(^|/)(init|systemd|sshd)$' -r 60"

# Examples:

# Print memory report every second instead of every minute
# EARLYOOM_ARGS="-r 1"

# Available minimum memory 5%
# EARLYOOM_ARGS="-m 5"

# Available minimum memory 15% and free minimum swap 5%
# EARLYOOM_ARGS="-m 15 -s 5"

# Avoid killing processes whose name matches this regexp
# EARLYOOM_ARGS="--avoid '(^|/)(init|X|sshd|firefox)$'"

# See more at `earlyoom -h'
3
config/orphan-claude-killer.conf
Normal file
@@ -0,0 +1,3 @@
check program orphan-claude-killer with path "/usr/local/bin/kill-orphan-claude.sh"
    every 15 cycles
    if status != 0 then alert
182
install.sh
Executable file
@@ -0,0 +1,182 @@
#!/bin/bash
set -e

# Runaway Process Killer - Installation Script
# Protects against CPU and RAM runaway processes

echo "============================================"
echo "Runaway Process Killer - Installation"
echo "============================================"
echo ""

# Check root
if [ "$EUID" -ne 0 ]; then
    echo "Error: Please run as root (sudo ./install.sh)"
    exit 1
fi

# Detect script directory
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

# Configuration
CPU_CYCLES="${CPU_CYCLES:-5}" # 5 minutes default (5 cycles * 60s)
RAM_THRESHOLD="${RAM_THRESHOLD:-5}" # 5% free memory threshold
SWAP_THRESHOLD="${SWAP_THRESHOLD:-5}" # 5% free swap threshold

echo "Configuration:"
echo " CPU kill threshold: ${CPU_CYCLES} minutes of 95%+ CPU"
echo " RAM kill threshold: <${RAM_THRESHOLD}% free memory"
echo ""

# Install packages
echo "[1/8] Installing packages..."
apt-get update -qq
apt-get install -y -qq earlyoom monit

# Configure earlyoom
echo "[2/8] Configuring earlyoom (RAM protection)..."
cat > /etc/default/earlyoom << EOF
# earlyoom configuration - managed by runaway-process-killer
EARLYOOM_ARGS="-m ${RAM_THRESHOLD} -s ${SWAP_THRESHOLD} --avoid '(^|/)(init|systemd|sshd)\$' -r 60"
EOF

# Configure monit daemon interval
echo "[3/8] Configuring monit daemon..."
if grep -q "^set daemon" /etc/monit/monitrc; then
    sed -i 's/^set daemon.*/set daemon 60/' /etc/monit/monitrc
else
    sed -i '1i set daemon 60' /etc/monit/monitrc
fi

# Enable monit HTTP interface (required for monit status)
if ! grep -q "^set httpd port 2812" /etc/monit/monitrc; then
    cat >> /etc/monit/monitrc << 'EOF'

# HTTP interface for monit status command
set httpd port 2812 and
    use address localhost
    allow localhost
    allow admin:monit
EOF
fi

# Create CPU killer config
echo "[4/8] Creating CPU monitoring rule..."
cat > /etc/monit/conf.d/cpu-killer << EOF
check system \$HOST
    if cpu usage > 95% for ${CPU_CYCLES} cycles then exec "/usr/local/bin/kill-top-cpu.sh"
EOF

# Create kill script
echo "[5/8] Installing CPU kill script..."
cat > /usr/local/bin/kill-top-cpu.sh << 'EOF'
#!/bin/bash
# Kill every process sharing the top CPU consumer's command name (excluding critical ones)

# Find the top CPU consumer (excluding protected and transient processes)
TOP_LINE=$(ps -eo pid,comm,%cpu --sort=-%cpu | grep -v -E '(PID|systemd|sshd|monit|earlyoom|bash|ps|awk|grep|head)' | head -1)
TARGET_PID=$(echo "$TOP_LINE" | awk '{print $1}')
COMM=$(echo "$TOP_LINE" | awk '{print $2}')

if [ -n "$TARGET_PID" ] && [ -n "$COMM" ]; then
    logger "monit cpu-killer: Killing all '$COMM' processes (detected high CPU on PID $TARGET_PID)"
    # Kill all processes with this command name
    pkill -9 -x "$COMM"
fi
EOF
chmod +x /usr/local/bin/kill-top-cpu.sh

# Create orphan Claude killer script
echo "[6/8] Installing orphan Claude killer script..."
cat > /usr/local/bin/kill-orphan-claude.sh << 'EOF'
#!/bin/bash
# Kill orphaned Claude processes (no TTY or parent is init)
# An orphaned Claude process is one where:
# - TTY is "?" (no terminal attached), OR
# - Parent PID is 1 (adopted by init)

LOG_TAG="orphan-claude-killer"

# Find all claude processes
CLAUDE_PIDS=$(pgrep -x claude 2>/dev/null)

if [ -z "$CLAUDE_PIDS" ]; then
    exit 0
fi

for PID in $CLAUDE_PIDS; do
    # Get TTY and PPID for this process
    PROC_INFO=$(ps -o tty=,ppid= -p "$PID" 2>/dev/null)

    if [ -z "$PROC_INFO" ]; then
        # Process already gone
        continue
    fi

    PROC_TTY=$(echo "$PROC_INFO" | awk '{print $1}')
    PARENT_PID=$(echo "$PROC_INFO" | awk '{print $2}')

    ORPHANED=false
    REASON=""

    # Check if TTY is "?" (no terminal)
    if [ "$PROC_TTY" = "?" ]; then
        ORPHANED=true
        REASON="no TTY attached"
    fi

    # Check if parent PID is 1 (adopted by init)
    if [ "$PARENT_PID" = "1" ]; then
        ORPHANED=true
        REASON="parent is init (PPID=1)"
    fi

    if [ "$ORPHANED" = true ]; then
        # Get process start time for logging
        START_TIME=$(ps -o lstart= -p "$PID" 2>/dev/null)
        CPU=$(ps -o %cpu= -p "$PID" 2>/dev/null)
        MEM=$(ps -o %mem= -p "$PID" 2>/dev/null)

        logger "$LOG_TAG: Killing orphaned claude process PID=$PID ($REASON) started='$START_TIME' cpu=$CPU% mem=$MEM%"

        # Kill direct children first (claude may have child processes), then the process itself
        pkill -9 -P "$PID" 2>/dev/null
        kill -9 "$PID" 2>/dev/null
    fi
done
EOF
chmod +x /usr/local/bin/kill-orphan-claude.sh

# Create orphan Claude killer config
echo "[7/8] Creating orphan Claude monitoring rule..."
cat > /etc/monit/conf.d/orphan-claude-killer << 'EOF'
check program orphan-claude-killer with path "/usr/local/bin/kill-orphan-claude.sh"
    every 15 cycles
    if status != 0 then alert
EOF

# Enable and start services
echo "[8/8] Enabling and starting services..."
systemctl enable earlyoom monit
systemctl restart earlyoom monit

# Verify
echo ""
echo "============================================"
echo "Installation Complete!"
echo "============================================"
echo ""
echo "Status:"
systemctl is-active earlyoom && echo " earlyoom: running" || echo " earlyoom: FAILED"
systemctl is-active monit && echo " monit: running" || echo " monit: FAILED"
echo ""
echo "Configuration:"
echo " CPU: Kill after ${CPU_CYCLES} minutes of 95%+ usage"
echo " RAM: Kill when free memory < ${RAM_THRESHOLD}%"
echo " Orphan Claude: Check every 15 minutes"
echo ""
echo "Commands:"
echo " sudo monit status # Check monit status"
echo " journalctl -u earlyoom # View earlyoom logs"
echo " tail -f /var/log/monit.log # View monit logs"
echo ""
55
scripts/kill-orphan-claude.sh
Executable file
@@ -0,0 +1,55 @@
#!/bin/bash
# Kill orphaned Claude processes (no TTY or parent is init)
# An orphaned Claude process is one where:
# - TTY is "?" (no terminal attached), OR
# - Parent PID is 1 (adopted by init)

LOG_TAG="orphan-claude-killer"

# Find all claude processes
CLAUDE_PIDS=$(pgrep -x claude 2>/dev/null)

if [ -z "$CLAUDE_PIDS" ]; then
    exit 0
fi

for PID in $CLAUDE_PIDS; do
    # Get TTY and PPID for this process
    PROC_INFO=$(ps -o tty=,ppid= -p "$PID" 2>/dev/null)

    if [ -z "$PROC_INFO" ]; then
        # Process already gone
        continue
    fi

    PROC_TTY=$(echo "$PROC_INFO" | awk '{print $1}')
    PARENT_PID=$(echo "$PROC_INFO" | awk '{print $2}')

    ORPHANED=false
    REASON=""

    # Check if TTY is "?" (no terminal)
    if [ "$PROC_TTY" = "?" ]; then
        ORPHANED=true
        REASON="no TTY attached"
    fi

    # Check if parent PID is 1 (adopted by init)
    if [ "$PARENT_PID" = "1" ]; then
        ORPHANED=true
        REASON="parent is init (PPID=1)"
    fi

    if [ "$ORPHANED" = true ]; then
        # Get process start time for logging
        START_TIME=$(ps -o lstart= -p "$PID" 2>/dev/null)
        CPU=$(ps -o %cpu= -p "$PID" 2>/dev/null)
        MEM=$(ps -o %mem= -p "$PID" 2>/dev/null)

        logger "$LOG_TAG: Killing orphaned claude process PID=$PID ($REASON) started='$START_TIME' cpu=$CPU% mem=$MEM%"

        # Kill direct children first (claude may have child processes), then the process itself
        pkill -9 -P "$PID" 2>/dev/null
        kill -9 "$PID" 2>/dev/null
    fi
done
13
scripts/kill-top-cpu.sh
Executable file
@@ -0,0 +1,13 @@
#!/bin/bash
# Kill every process sharing the top CPU consumer's command name (excluding critical ones)

# Find the top CPU consumer (excluding protected and transient processes)
TOP_LINE=$(ps -eo pid,comm,%cpu --sort=-%cpu | grep -v -E '(PID|systemd|sshd|monit|earlyoom|bash|ps|awk|grep|head)' | head -1)
TARGET_PID=$(echo "$TOP_LINE" | awk '{print $1}')
COMM=$(echo "$TOP_LINE" | awk '{print $2}')

if [ -n "$TARGET_PID" ] && [ -n "$COMM" ]; then
    logger "monit cpu-killer: Killing all '$COMM' processes (detected high CPU on PID $TARGET_PID)"
    # Kill all processes with this command name
    pkill -9 -x "$COMM"
fi
52
scripts/status.sh
Executable file
@@ -0,0 +1,52 @@
#!/bin/bash
# Status check for runaway process killer

echo "============================================"
echo "Runaway Process Killer - Status"
echo "============================================"
echo ""

echo "=== Services ==="
echo -n "earlyoom: "
systemctl is-active earlyoom 2>/dev/null || echo "not running"

echo -n "monit: "
systemctl is-active monit 2>/dev/null || echo "not running"

echo ""
echo "=== Current System Load ==="
echo -n "CPU: "
top -bn1 | grep "Cpu(s)" | awk '{print $2 "% user, " $4 "% system"}'

echo -n "RAM: "
free -m | awk '/Mem:/ {printf "%s MiB used / %s MiB total (%.1f%% used)\n", $3, $2, $3/$2*100}'

echo -n "Swap: "
free -h | awk '/Swap:/ {if ($2 != "0B") printf "%s used / %s total\n", $3, $2; else print "disabled"}'

echo ""
echo "=== Configuration ==="
if [ -f /etc/monit/conf.d/cpu-killer ]; then
    CYCLES=$(grep "for.*cycles" /etc/monit/conf.d/cpu-killer | grep -oP '\d+(?= cycles)')
    echo "CPU threshold: ${CYCLES} minutes at 95%+"
else
    echo "CPU threshold: NOT CONFIGURED"
fi

if [ -f /etc/default/earlyoom ]; then
    RAM_PCT=$(grep -oP '(?<=-m )\d+' /etc/default/earlyoom)
    echo "RAM threshold: <${RAM_PCT}% free memory"
else
    echo "RAM threshold: NOT CONFIGURED"
fi

echo ""
echo "=== Recent Kill Events ==="
echo "CPU kills (last 24h):"
journalctl --since "24 hours ago" 2>/dev/null | grep "cpu-killer" | tail -3 | grep . || echo " None"

echo ""
echo "RAM kills (last 24h):"
journalctl -u earlyoom --since "24 hours ago" 2>/dev/null | grep -E "SIGTERM|SIGKILL" | tail -3 | grep . || echo " None"

echo ""
105
scripts/test.sh
Executable file
@@ -0,0 +1,105 @@
#!/bin/bash
# Test script for runaway process killer
# WARNING: This will stress your system!

set -e

echo "============================================"
echo "Runaway Process Killer - Test Suite"
echo "============================================"
echo ""
echo "WARNING: This will temporarily stress your CPU and RAM!"
echo ""

if [ "$EUID" -ne 0 ]; then
    echo "Error: Please run as root (sudo ./test.sh)"
    exit 1
fi

# Check stress is installed
if ! command -v stress &> /dev/null; then
    echo "Installing stress tool..."
    apt-get install -y -qq stress
fi

# Backup current config
ORIG_CYCLES=$(grep "for.*cycles" /etc/monit/conf.d/cpu-killer | grep -oP '\d+(?= cycles)')
echo "Current CPU threshold: ${ORIG_CYCLES} cycles"
echo "Temporarily setting to 2 cycles for testing..."

# Set to 2 cycles for testing
sed -i "s/for ${ORIG_CYCLES} cycles/for 2 cycles/" /etc/monit/conf.d/cpu-killer
monit reload
sleep 2

echo ""
echo "=== TEST 1: CPU Protection ==="
echo "Starting CPU stress (expect kill in ~2 minutes)..."
echo ""

stress --cpu 4 --timeout 300 &
STRESS_PID=$!
sleep 2

START=$(date +%s)
while kill -0 $STRESS_PID 2>/dev/null; do
    sleep 5
    NOW=$(date +%s)
    ELAPSED=$((NOW - START))

    if [ $ELAPSED -ge 180 ]; then
        echo "FAILED: CPU stress not killed after 3 minutes"
        kill $STRESS_PID 2>/dev/null || true
        break
    fi

    if ! pgrep -x stress > /dev/null 2>&1; then
        echo "PASSED: CPU stress killed after ${ELAPSED}s"
        break
    fi

    [ $((ELAPSED % 30)) -eq 0 ] && echo "[${ELAPSED}s] Stress still running..."
done

sleep 2

echo ""
echo "=== TEST 2: RAM Protection ==="
echo "Starting RAM stress (expect quick kill)..."
echo ""

stress --vm 4 --vm-bytes 4G --vm-keep --timeout 300 &
STRESS_PID=$!
sleep 2

START=$(date +%s)
while pgrep -x stress > /dev/null 2>&1; do
    sleep 5
    NOW=$(date +%s)
    ELAPSED=$((NOW - START))

    if [ $ELAPSED -ge 120 ]; then
        echo "FAILED: RAM stress not killed after 2 minutes"
        pkill stress 2>/dev/null || true
        break
    fi

    if ! pgrep -x stress > /dev/null 2>&1; then
        echo "PASSED: RAM stress killed after ${ELAPSED}s"
        break
    fi

    MEM_FREE=$(free -m | awk '/Mem:/ {print $7}')
    echo "[${ELAPSED}s] Stress running, ${MEM_FREE}MB free"
done

# Restore original config
echo ""
echo "Restoring original CPU threshold (${ORIG_CYCLES} cycles)..."
sed -i "s/for 2 cycles/for ${ORIG_CYCLES} cycles/" /etc/monit/conf.d/cpu-killer
monit reload

echo ""
echo "============================================"
echo "Tests Complete"
echo "============================================"
48
uninstall.sh
Executable file
@@ -0,0 +1,48 @@
#!/bin/bash
set -e

# Runaway Process Killer - Uninstallation Script

echo "============================================"
echo "Runaway Process Killer - Uninstallation"
echo "============================================"
echo ""

# Check root
if [ "$EUID" -ne 0 ]; then
    echo "Error: Please run as root (sudo ./uninstall.sh)"
    exit 1
fi

read -p "This will remove earlyoom and monit. Continue? [y/N] " -n 1 -r
echo ""
if [[ ! $REPLY =~ ^[Yy]$ ]]; then
    echo "Aborted."
    exit 0
fi

echo "[1/4] Stopping services..."
systemctl stop earlyoom monit 2>/dev/null || true

echo "[2/4] Disabling services..."
systemctl disable earlyoom monit 2>/dev/null || true

echo "[3/4] Removing packages..."
apt-get remove -y earlyoom monit

echo "[4/4] Cleaning up config files..."
rm -f /usr/local/bin/kill-top-cpu.sh
rm -f /usr/local/bin/kill-orphan-claude.sh
rm -f /etc/monit/conf.d/cpu-killer
rm -f /etc/monit/conf.d/orphan-claude-killer

echo ""
echo "============================================"
echo "Uninstallation Complete!"
echo "============================================"
echo ""
echo "Note: /etc/default/earlyoom and /etc/monit/monitrc may still exist."
echo "Remove manually if needed:"
echo " sudo rm -f /etc/default/earlyoom"
echo " sudo apt purge monit # removes /etc/monit/"
echo ""