System Health Check

Diagnose system health issues: disk space, memory pressure, CPU usage, and common developer environment problems. The “what’s wrong” before “how to fix.”

Platform: macOS (with Linux alternatives noted)

Use Case: “Something’s slow” / “Builds are failing” / “Machine feels sluggish”

Mindset: Design Rules say “fail noisily and early” - surface system problems before they cascade.

Resource Hint: sonnet - System health diagnostics with accurate assessment.

When to Use

  • Machine feels slow or unresponsive during development
  • Builds or tests are failing unexpectedly
  • Before running storage cleanup or tool updates (baseline check)

Execution Flow

┌─────────────────────────────────────────────────────────────┐
│  1. DISK         Check available space, large consumers     │
│         ↓                                                   │
│  2. MEMORY       Check RAM usage, swap pressure             │
│         ↓                                                   │
│  3. CPU          Check load, runaway processes              │
│         ↓                                                   │
│  4. PROCESSES    Find resource hogs                         │
│         ↓                                                   │
│  5. DEV TOOLS    Check dev environment health               │
│         ↓                                                   │
│  6. REPORT       Summary with recommendations               │
└─────────────────────────────────────────────────────────────┘

Quick Health Check

Run this for a fast overview:

echo "=== Disk ===" && df -h / | tail -1
echo "=== Memory ===" && vm_stat | head -5
echo "=== CPU Load ===" && uptime
echo "=== Top Processes ===" && ps aux | sort -nrk 3,3 | head -6

Step 1: Disk Health

Check Available Space

# Overall disk usage
df -h /

# Check if approaching limits
USAGE=$(df -h / | tail -1 | awk '{print $5}' | tr -d '%')
if [ "$USAGE" -gt 80 ]; then
  echo "WARNING: Disk usage at ${USAGE}%"
fi

Find Large Directories

# Top 10 largest directories in home
du -sh ~/* 2>/dev/null | sort -hr | head -10

# Developer-specific large directories
du -sh ~/Library/Developer 2>/dev/null
du -sh ~/Library/Caches 2>/dev/null
du -sh ~/.docker 2>/dev/null
du -sh node_modules 2>/dev/null

Thresholds:

| Usage  | Status   | Action                      |
|--------|----------|-----------------------------|
| < 70%  | Healthy  | None needed                 |
| 70-85% | Warning  | Consider /pb-storage        |
| > 85%  | Critical | Run /pb-storage immediately |
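
The thresholds above can be sketched as a small helper. This is a minimal illustration, not part of the playbook itself: the function name `classify_disk_usage` is hypothetical, and the OK/WARNING/CRITICAL labels are borrowed from the report template in Step 6.

```shell
#!/bin/bash
# classify_disk_usage PERCENT -> prints OK, WARNING, or CRITICAL.
# Hypothetical helper; boundaries follow the table (70-85% is Warning).
classify_disk_usage() {
  local pct="$1"
  if [ "$pct" -gt 85 ]; then
    echo "CRITICAL"
  elif [ "$pct" -ge 70 ]; then
    echo "WARNING"
  else
    echo "OK"
  fi
}

# Usage: `df -P` gives the POSIX column layout, so field 5 is the
# percent-used figure on both macOS and Linux.
pct=$(df -P / | awk 'NR==2 {print $5}' | tr -d '%')
echo "Disk: $(classify_disk_usage "$pct") - ${pct}% used"
```

Keeping the classification in a function that takes a plain number makes it easy to reuse for other mount points.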

Step 2: Memory Health

Check Memory Pressure

# macOS memory stats
vm_stat

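# Human-readable summary (page size is read from the vm_stat header:
# 4096 bytes on Intel, 16384 on Apple Silicon)
vm_stat | awk '
  /page size of/   {pagesize=$8}
  /Pages free/     {free=$3}
  /Pages active/   {active=$3}
  /Pages inactive/ {inactive=$3}
  /Pages wired/    {wired=$4}   # line reads "Pages wired down: N."
  END {
    gb=pagesize/1024/1024/1024
    printf "Free: %.2f GB\n", free*gb
    printf "Active: %.2f GB\n", active*gb
    printf "Inactive: %.2f GB\n", inactive*gb
    printf "Wired: %.2f GB\n", wired*gb
  }
'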

# Check for memory pressure (macOS)
memory_pressure

Check Swap Usage

# Swap usage (high swap = memory pressure)
sysctl vm.swapusage

# If swap is being used heavily, memory is constrained

Find Memory Hogs

# Top 10 by memory usage (BSD ps on macOS: -m sorts by memory)
ps aux -m | head -11
# Linux (GNU ps): ps aux --sort=-%mem | head -11

# Or using top (snapshot)
top -l 1 -n 10 -o mem

Thresholds:

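| Indicator       | Healthy | Warning | Critical                                  |
|-----------------|---------|---------|-------------------------------------------|
| Memory Pressure | Normal  | Warn    | Critical (yellow/red in Activity Monitor) |
| Swap Used       | < 1GB   | 1-4GB   | > 4GB                                     |
| Free + Inactive | > 2GB   | 1-2GB   | < 1GB                                     |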
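
The swap row of the table can be checked automatically. The sketch below assumes that `sysctl vm.swapusage` prints a line of the form `vm.swapusage: total = 2048.00M  used = 1536.00M  free = 512.00M  (encrypted)`. The helper names `swap_used_mb` and `classify_swap` are hypothetical, and handling a `G` suffix is a precaution rather than documented behavior.

```shell
#!/bin/bash
# swap_used_mb LINE -> prints the "used" figure of a vm.swapusage line in MB.
# Assumes the layout "... used = 1536.00M ..." (hypothetical helper).
swap_used_mb() {
  echo "$1" | awk '{
    for (i = 1; i <= NF; i++) if ($i == "used") {
      v = $(i + 2)                    # skip the "=" token
      unit = substr(v, length(v))     # trailing unit letter
      n = v + 0                       # numeric prefix of the token
      if (unit == "G") n *= 1024
      printf "%.0f\n", n
    }
  }'
}

# classify_swap MB -> OK (< 1 GB), WARNING (1-4 GB), CRITICAL (> 4 GB).
classify_swap() {
  local mb="$1"
  if [ "$mb" -gt 4096 ]; then
    echo "CRITICAL"
  elif [ "$mb" -ge 1024 ]; then
    echo "WARNING"
  else
    echo "OK"
  fi
}

# Usage (macOS only; silently skipped where the sysctl key is missing)
if line=$(sysctl vm.swapusage 2>/dev/null); then
  echo "Swap: $(classify_swap "$(swap_used_mb "$line")")"
fi
```

Parsing the line in one place keeps the threshold logic independent of the exact `sysctl` output.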

Step 3: CPU Health

Check Load Average

# Current load
uptime

# Load interpretation:
# - Load < cores: healthy
# - Load = cores: fully utilized
# - Load > cores: overloaded
sysctl -n hw.ncpu  # Number of cores (Linux: nproc)

Find CPU Hogs

# Top 10 by CPU (BSD ps on macOS: -r sorts by CPU)
ps aux -r | head -11
# Linux (GNU ps): ps aux --sort=-%cpu | head -11

# Real-time view (quit with 'q')
top -o cpu

# Find processes using > 50% CPU
ps aux | awk '$3 > 50 {print $0}'

Check for Runaway Processes

# Processes running > 1 hour with high CPU
# etime is [[dd-]hh:]mm:ss, so a dash (days) or two colons (hours) means over an hour
ps -eo pid,etime,pcpu,comm | awk '$3 > 50 && $2 ~ /-|:.*:/ {print}'
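
If a different age cutoff is needed, the `etime` column can be converted to seconds instead of pattern-matched. This is a minimal sketch: `etime_to_seconds` and `is_runaway` are hypothetical helper names, and the one-hour and 50% cutoffs simply mirror the check above.

```shell
#!/bin/bash
# etime_to_seconds ETIME -> elapsed seconds.
# ETIME uses the ps format [[dd-]hh:]mm:ss (hypothetical helper).
etime_to_seconds() {
  echo "$1" | awk -F'[-:]' '{
    if (NF == 4)      s = $1 * 86400 + $2 * 3600 + $3 * 60 + $4
    else if (NF == 3) s = $1 * 3600 + $2 * 60 + $3
    else              s = $1 * 60 + $2
    print s
  }'
}

# is_runaway ETIME PCPU -> exit status 0 when older than 1 hour and above 50% CPU.
is_runaway() {
  local secs
  secs=$(etime_to_seconds "$1")
  # awk does the comparison because PCPU is fractional (e.g. 87.5)
  awk -v s="$secs" -v c="$2" 'BEGIN { exit !(s > 3600 && c > 50) }'
}

# Usage: print every process that qualifies as a runaway
ps -eo pid,etime,pcpu,comm 2>/dev/null | tail -n +2 |
while read -r pid etime pcpu comm; do
  if is_runaway "$etime" "$pcpu"; then
    echo "$pid $etime ${pcpu}% $comm"
  fi
done
```

The loop starts two `awk` processes per entry, which is acceptable for an occasional check but slower than the one-line filter.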

Thresholds:

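| Cores | Healthy Load | Warning | Overloaded |
|-------|--------------|---------|------------|
| 8     | < 6          | 6-10    | > 10       |
| 10    | < 8          | 8-12    | > 12       |
| 12    | < 10         | 10-15   | > 15       |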
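
For core counts not listed in the table, the rows can be approximated by a ratio of load to cores. The sketch below assumes cutoffs of 0.75x and 1.25x the core count. These match the 8-core row exactly and the other rows only roughly, so treat them as an approximation rather than the playbook's definition. `load_1min` and `classify_load` are hypothetical names.

```shell
#!/bin/bash
# load_1min TEXT -> 1-minute load average taken from an `uptime` line.
# Handles "load averages: 1.94 2.10 2.05" (macOS) and
# "load average: 0.52, 0.58, 0.59" (Linux).
load_1min() {
  echo "$1" | sed -E 's/.*load averages?: *([0-9]+[.,][0-9]+).*/\1/' | tr ',' '.'
}

# classify_load LOAD CORES -> OK, WARNING, or CRITICAL.
# Assumed cutoffs: below 0.75 x cores is OK, above 1.25 x cores is CRITICAL.
classify_load() {
  awk -v l="$1" -v n="$2" 'BEGIN {
    if (l > 1.25 * n)       print "CRITICAL"
    else if (l >= 0.75 * n) print "WARNING"
    else                    print "OK"
  }'
}

# Usage: hw.ncpu on macOS, nproc as the Linux fallback
cores=$(sysctl -n hw.ncpu 2>/dev/null || nproc)
load=$(load_1min "$(uptime)")
echo "CPU: $(classify_load "$load" "$cores") - Load: $load ($cores cores)"
```

The comparison runs in `awk` because shell `[ ]` tests cannot compare fractional load values.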

Step 4: Process Analysis

Find Resource Hogs

# Combined CPU + Memory view
ps aux | awk 'NR==1 || $3 > 10 || $4 > 5' | head -20

Common Developer Culprits

# Check known resource hogs
for proc in "node" "webpack" "docker" "java" "Xcode" "Simulator" "Chrome"; do
  pgrep -f "$proc" > /dev/null && echo "$proc is running"
done

# Docker specifically
docker stats --no-stream 2>/dev/null | head -10

Zombie Processes

# Find zombie processes
ps aux | awk '$8 ~ /Z/ {print}'

Step 5: Developer Environment Health

Check Critical Tools

echo "=== Git ===" && git --version
echo "=== Node ===" && node --version 2>/dev/null || echo "Not installed"
echo "=== npm ===" && npm --version 2>/dev/null || echo "Not installed"
echo "=== Python ===" && python3 --version 2>/dev/null || echo "Not installed"
echo "=== Docker ===" && docker --version 2>/dev/null || echo "Not installed/running"
echo "=== Homebrew ===" && { command -v brew >/dev/null 2>&1 && brew --version | head -1 || echo "Not installed"; }

Check for Outdated Tools

# Homebrew outdated
brew outdated 2>/dev/null | head -10

# npm outdated globals
npm outdated -g 2>/dev/null | head -10

Check Docker Health

# Docker disk usage
docker system df 2>/dev/null

# Docker running containers
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}" 2>/dev/null

Check Xcode (if installed)

# Xcode version and path
xcode-select -p 2>/dev/null && xcodebuild -version 2>/dev/null | head -2

# Xcode disk usage
du -sh ~/Library/Developer/Xcode 2>/dev/null

Step 6: Generate Report

After running diagnostics, summarize:

=== SYSTEM HEALTH REPORT ===

DISK:     [OK/WARNING/CRITICAL] - XX% used (XX GB free)
MEMORY:   [OK/WARNING/CRITICAL] - XX GB active, XX GB swap
CPU:      [OK/WARNING/CRITICAL] - Load: X.XX (X cores)
DOCKER:   [OK/WARNING/N/A] - XX GB used

TOP RESOURCE CONSUMERS:
1. Process A - XX% CPU, XX% MEM
2. Process B - XX% CPU, XX% MEM
3. Process C - XX% CPU, XX% MEM

RECOMMENDATIONS:
- [ ] Run /pb-storage to free disk space
- [ ] Kill process X (runaway)
- [ ] Restart Docker (high memory)

User Interaction Flow

When executing this playbook:

  1. Run full diagnostic - All checks above
  2. Present findings - Show health status per category
  3. Prioritize issues - Critical first, then warnings
  4. Offer remediation - Link to relevant playbooks
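
The "critical first, then warnings" ordering in step 3 can be sketched as a small filter. It assumes each finding is one tab-separated line of the form `STATUS<TAB>CATEGORY<TAB>DETAIL`. That line format and the name `prioritize_findings` are hypothetical and chosen only for illustration.

```shell
#!/bin/bash
# prioritize_findings: reads "STATUS<TAB>CATEGORY<TAB>DETAIL" lines on stdin
# and prints them with CRITICAL first, then WARNING, then everything else.
# Hypothetical helper; the input format is an assumption.
prioritize_findings() {
  awk -F'\t' '{
    rank = ($1 == "CRITICAL") ? 0 : (($1 == "WARNING") ? 1 : 2)
    print rank "\t" $0
  }' | sort -s -n -k1,1 | cut -f2-
}

# Usage with made-up sample findings
printf 'OK\tCPU\tload 2.10 on 8 cores\nCRITICAL\tDISK\t91%% used\nWARNING\tMEMORY\t2 GB swap\n' |
  prioritize_findings
```

The stable sort (`-s`) keeps findings of equal severity in the order the checks produced them.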

AskUserQuestion Structure

After Report:

Question: "What would you like to address first?"
Options:
  - Free disk space (/pb-storage)
  - Kill resource hogs (I'll show which)
  - Update outdated tools (/pb-update)
  - Just wanted the report, thanks

Automated Health Script

Save as ~/bin/doctor.sh:

#!/bin/bash

echo "=== DISK ==="
df -h / | tail -1

echo -e "\n=== MEMORY ==="
memory_pressure 2>/dev/null || vm_stat | head -5

echo -e "\n=== CPU LOAD ==="
uptime

echo -e "\n=== TOP PROCESSES (CPU) ==="
ps aux -r | head -6   # Linux: ps aux --sort=-%cpu

echo -e "\n=== TOP PROCESSES (MEM) ==="
ps aux -m | head -6   # Linux: ps aux --sort=-%mem

echo -e "\n=== DOCKER ==="
docker system df 2>/dev/null || echo "Not running"

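echo -e "\n=== OUTDATED BREW ==="
# A pipeline returns head's exit status, so test for brew before piping
if command -v brew >/dev/null 2>&1; then
  brew outdated | head -5
else
  echo "N/A"
fi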

Troubleshooting

| Symptom                   | Likely Cause                                  | Solution                                 |
|---------------------------|-----------------------------------------------|------------------------------------------|
| High CPU, nothing obvious | Background indexing (Spotlight, Time Machine) | Wait, or exclude dev dirs from Spotlight |
| High memory, no heavy apps| Memory leaks in long-running processes        | Restart Docker, browsers, IDEs           |
| Disk full suddenly        | node_modules, Docker images, Xcode            | Run /pb-storage                          |
| Everything slow           | Multiple causes                               | Check all metrics, address worst first   |
| Fan running constantly    | High CPU process                              | Find and kill, or improve ventilation    |

Related Commands

  • /pb-storage - Free disk space
  • /pb-ports - Check port usage and conflicts
  • /pb-update - Update outdated tools
  • /pb-debug - Deep debugging methodology
  • /pb-git-hygiene - Git repository health audit (branches, large objects, secrets)

Run this check monthly, or whenever the machine feels slow. It is a good first step before any cleanup.