System Health Check

Diagnose system health issues: disk space, memory pressure, CPU usage, and common developer environment problems. The “what’s wrong” before “how to fix.”

Platform: macOS (with Linux alternatives noted) Use Case: “Something’s slow” / “Builds are failing” / “Machine feels sluggish”

Mindset: Design Rules say “fail noisily and early” - surface system problems before they cascade.

Resource Hint: sonnet - System health diagnostics with accurate assessment.

When to Use

Machine feels slow or unresponsive during development
Builds or tests are failing unexpectedly
Before running storage cleanup or tool updates (baseline check)

Execution Flow

┌─────────────────────────────────────────────────────────────┐
│  1. DISK         Check available space, large consumers     │
│         ↓                                                   │
│  2. MEMORY       Check RAM usage, swap pressure             │
│         ↓                                                   │
│  3. CPU          Check load, runaway processes              │
│         ↓                                                   │
│  4. PROCESSES    Find resource hogs                         │
│         ↓                                                   │
│  5. DEV TOOLS    Check dev environment health               │
│         ↓                                                   │
│  6. REPORT       Summary with recommendations               │
└─────────────────────────────────────────────────────────────┘

Quick Health Check

Run this for a fast overview:

echo "=== Disk ===" && df -h / | tail -1
echo "=== Memory ===" && vm_stat | head -5
echo "=== CPU Load ===" && uptime
echo "=== Top Processes ===" && ps aux | sort -nrk 3,3 | head -6

Step 1: Disk Health

Check Available Space

# Overall disk usage
df -h /

# Check if approaching limits
USAGE=$(df -h / | tail -1 | awk '{print $5}' | tr -d '%')
if [ "$USAGE" -gt 80 ]; then
  echo "WARNING: Disk usage at ${USAGE}%"
fi

Find Large Directories

# Top 10 largest directories in home
du -sh ~/* 2>/dev/null | sort -hr | head -10

# Developer-specific large directories
du -sh ~/Library/Developer 2>/dev/null
du -sh ~/Library/Caches 2>/dev/null
du -sh ~/.docker 2>/dev/null
du -sh node_modules 2>/dev/null

Thresholds:

Usage	Status	Action
< 70%	Healthy	None needed
70-85%	Warning	Consider `/pb-storage`
> 85%	Critical	Run `/pb-storage` immediately

Step 2: Memory Health

Check Memory Pressure

# macOS memory stats
vm_stat

# Human-readable summary
vm_stat | awk '
  /Pages free/ {free=$3}
  /Pages active/ {active=$3}
  /Pages inactive/ {inactive=$3}
  /Pages wired/ {wired=$3}
  END {
    page=4096/1024/1024
    print "Free: " free*page " GB"
    print "Active: " active*page " GB"
    print "Wired: " wired*page " GB"
  }
'

# Check for memory pressure (macOS)
memory_pressure

Check Swap Usage

# Swap usage (high swap = memory pressure)
sysctl vm.swapusage

# If swap is being used heavily, memory is constrained

Find Memory Hogs

# Top 10 by memory usage
ps aux --sort=-%mem | head -11

# Or using top (snapshot)
top -l 1 -n 10 -o mem

Thresholds:

Indicator	Healthy	Warning	Critical
Memory Pressure	Normal	Warn	Critical (yellow/red in Activity Monitor)
Swap Used	< 1GB	1-4GB	> 4GB
Free + Inactive	> 2GB	1-2GB	< 1GB

Step 3: CPU Health

Check Load Average

# Current load
uptime

# Load interpretation:
# - Load < cores: healthy
# - Load = cores: fully utilized
# - Load > cores: overloaded
sysctl -n hw.ncpu  # Number of cores

Find CPU Hogs

# Top 10 by CPU
ps aux --sort=-%cpu | head -11

# Real-time view (quit with 'q')
top -o cpu

# Find processes using > 50% CPU
ps aux | awk '$3 > 50 {print $0}'

Check for Runaway Processes

# Processes running > 1 hour with high CPU
ps -eo pid,etime,pcpu,comm | awk '$3 > 50 && $2 ~ /-/ {print}'

Thresholds:

Cores	Healthy Load	Warning	Overloaded
8	< 6	6-10	> 10
10	< 8	8-12	> 12
12	< 10	10-15	> 15

Step 4: Process Analysis

Find Resource Hogs

# Combined CPU + Memory view
ps aux | awk 'NR==1 || $3 > 10 || $4 > 5' | head -20

Common Developer Culprits

# Check known resource hogs
for proc in "node" "webpack" "docker" "java" "Xcode" "Simulator" "Chrome"; do
  pgrep -f "$proc" > /dev/null && echo "$proc is running"
done

# Docker specifically
docker stats --no-stream 2>/dev/null | head -10

Zombie Processes

# Find zombie processes
ps aux | awk '$8 ~ /Z/ {print}'

Step 5: Developer Environment Health

Check Critical Tools

echo "=== Git ===" && git --version
echo "=== Node ===" && node --version 2>/dev/null || echo "Not installed"
echo "=== npm ===" && npm --version 2>/dev/null || echo "Not installed"
echo "=== Python ===" && python3 --version 2>/dev/null || echo "Not installed"
echo "=== Docker ===" && docker --version 2>/dev/null || echo "Not installed/running"
echo "=== Homebrew ===" && brew --version 2>/dev/null | head -1 || echo "Not installed"

Check for Outdated Tools

# Homebrew outdated
brew outdated 2>/dev/null | head -10

# npm outdated globals
npm outdated -g 2>/dev/null | head -10

Check Docker Health

# Docker disk usage
docker system df 2>/dev/null

# Docker running containers
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}" 2>/dev/null

Check Xcode (if installed)

# Xcode version and path
xcode-select -p 2>/dev/null && xcodebuild -version 2>/dev/null | head -2

# Xcode disk usage
du -sh ~/Library/Developer/Xcode 2>/dev/null

Step 6: Generate Report

After running diagnostics, summarize:

=== SYSTEM HEALTH REPORT ===

DISK:     [OK/WARNING/CRITICAL] - XX% used (XX GB free)
MEMORY:   [OK/WARNING/CRITICAL] - XX GB active, XX GB swap
CPU:      [OK/WARNING/CRITICAL] - Load: X.XX (X cores)
DOCKER:   [OK/WARNING/N/A] - XX GB used

TOP RESOURCE CONSUMERS:
1. Process A - XX% CPU, XX% MEM
2. Process B - XX% CPU, XX% MEM
3. Process C - XX% CPU, XX% MEM

RECOMMENDATIONS:
- [ ] Run /pb-storage to free disk space
- [ ] Kill process X (runaway)
- [ ] Restart Docker (high memory)

User Interaction Flow

When executing this playbook:

Run full diagnostic - All checks above
Present findings - Show health status per category
Prioritize issues - Critical first, then warnings
Offer remediation - Link to relevant playbooks

AskUserQuestion Structure

After Report:

Question: "What would you like to address first?"
Options:
  - Free disk space (/pb-storage)
  - Kill resource hogs (I'll show which)
  - Update outdated tools (/pb-update)
  - Just wanted the report, thanks

Automated Health Script

Save as ~/bin/doctor.sh:

#!/bin/bash

echo "=== DISK ==="
df -h / | tail -1

echo -e "\n=== MEMORY ==="
memory_pressure 2>/dev/null || vm_stat | head -5

echo -e "\n=== CPU LOAD ==="
uptime

echo -e "\n=== TOP PROCESSES (CPU) ==="
ps aux --sort=-%cpu | head -6

echo -e "\n=== TOP PROCESSES (MEM) ==="
ps aux --sort=-%mem | head -6

echo -e "\n=== DOCKER ==="
docker system df 2>/dev/null || echo "Not running"

echo -e "\n=== OUTDATED BREW ==="
brew outdated 2>/dev/null | head -5 || echo "N/A"

Troubleshooting

Symptom	Likely Cause	Solution
High CPU, nothing obvious	Background indexing (Spotlight, Time Machine)	Wait, or exclude dev dirs from Spotlight
High memory, no heavy apps	Memory leaks in long-running processes	Restart Docker, browsers, IDEs
Disk full suddenly	node_modules, Docker images, Xcode	Run `/pb-storage`
Everything slow	Multiple causes	Check all metrics, address worst first
Fan running constantly	High CPU process	Find and kill, or improve ventilation

/pb-storage - Free disk space
/pb-ports - Check port usage and conflicts
/pb-update - Update outdated tools
/pb-debug - Deep debugging methodology
/pb-git-hygiene - Git repository health audit (branches, large objects, secrets)

Run monthly or when machine feels slow. Good first step before any cleanup.

Keyboard shortcuts

Engineering Playbook