Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Performance Optimization & Scalability

Make systems faster without breaking them. Measure, optimize the right thing, verify improvements.


Purpose

Performance matters:

  • Users leave sites that are slow (every 100ms delay = 1% users gone)
  • Slow systems cost money (more servers, more bandwidth)
  • Performance bugs are production bugs (optimize before scaling)

Key principle: Measure first, optimize what matters, prove it works.

Mindset: Performance optimization requires /pb-preamble thinking (measure, challenge assumptions) and /pb-design-rules thinking (especially Optimization: prototype before polishing, measure before optimizing).

Question assumptions about slowness. Challenge whether optimization is worth the complexity cost. Measure before and after-don’t assume. Surface trade-offs explicitly (speed vs. maintainability, simplicity vs. performance).

Resource Hint: sonnet - Performance optimization follows structured measurement and analysis workflows.


When to Optimize

[NO] DON’T Optimize:

  • Too early: Before you have users / load
  • Without measurement: Guessing slows you down more
  • Working features: If it works fine for current users, leave it
  • Premature: “This might be slow someday”
  • Diminishing returns: Optimizing 1% of total time

[YES] DO Optimize:

  • When users complain: “Site is slow”
  • When metrics show problem: P99 latency > target
  • When load tests show bottleneck: Load test reveals breaking point
  • When cost is high: More servers than should be needed
  • Hot paths: Code that runs for every user request

Performance Profiling: Find the Problem

Rule 1: Measure First

Most developers guess wrong about what’s slow.

Without profiling (80% wrong):
  "The database must be slow"
  → Actually: JSON serialization is slow (60% of time)

With profiling (100% correct):
  "Database queries are 15% of time, JSON serialization is 60%"
  → Optimize JSON serialization first (biggest payoff)

Tools by Layer

Frontend Performance:

  • Chrome DevTools > Performance tab (record, identify slow frames)
  • Lighthouse (scores performance, provides fixes)
  • WebPageTest (waterfall chart of load time)
  • Bundle analyzer (webpack-bundle-analyzer shows package size)

Backend Performance:

  • Profilers: py-spy (Python), node –prof (Node), JProfiler (Java)
  • Benchmarking: timeit (Python), benchmark (Node), JMH (Java)
  • Database: EXPLAIN ANALYZE (query plan), slow query log
  • Tracing: See /pb-observability for OpenTelemetry

Load Testing:

  • ab (Apache Bench) - simple HTTP load
  • wrk - fast, scriptable load testing
  • k6 - load testing as code
  • Locust - Python-based, distributed load testing

Profiling Example: Python

# Quick profiling with cProfile
import cProfile
import pstats

cProfile.run('my_function()', 'output.prof')
stats = pstats.Stats('output.prof')
stats.sort_stats('cumulative').print_stats(10)  # Show top 10 by time

# Result:
#   ncalls  tottime  cumtime
#   100     0.050    2.340  <- Slow! 2.3 seconds per 100 calls
#   100000  1.500    1.800  <- Hot! 1.8 seconds across 100k calls

Profiling Example: Node.js

# Run with profiler
node --prof app.js

# Process output
node --prof-process isolate-*.log > profile.txt

# Shows:
# [Shared libraries]: 50ms
# app.js:123 handleRequest(): 450ms  <- HOT SPOT
# database.js:45 query(): 320ms      <- Second hottest

Common Performance Bottlenecks

Bottleneck 1: Database Queries (Often 60-80% of time)

Symptoms:

  • P99 latency high
  • Database CPU at 100%
  • Slow query log full

Root causes:

1. N+1 queries: Loop and query inside loop
   Bad:    for user in users:
             user.orders = db.query("SELECT * FROM orders WHERE user_id = ?")
   Good:   orders = db.query("SELECT * FROM orders WHERE user_id IN (?)", user_ids)

2. Missing index: Query scans whole table
   Bad:    SELECT * FROM users WHERE created_at > ?  (no index)
   Good:   CREATE INDEX idx_created_at ON users(created_at)

3. SELECT * with large tables
   Bad:    SELECT * FROM users  (returns 50 columns, but you use 5)
   Good:   SELECT id, name, email FROM users

4. Slow JOIN: Join large tables with poor keys
   Bad:    SELECT * FROM users JOIN orders ON users.id = orders.user_id WHERE status IN (...)
   Good:   Add index on orders(user_id, status)

Solutions:

# N+1 solution: Batch load
users = db.query("SELECT * FROM users LIMIT 100")
user_ids = [u.id for u in users]
orders = db.query("SELECT * FROM orders WHERE user_id IN ?", user_ids)
for user in users:
    user.orders = [o for o in orders if o.user_id == user.id]

# Missing index solution
db.execute("CREATE INDEX idx_email ON users(email)")
db.execute("ANALYZE TABLE users")  # Update stats

# SELECT * solution
cursor.execute("SELECT id, name, email FROM users")  # Only columns needed

Bottleneck 2: Serialization/Deserialization (Often 30-40% of time)

Symptoms:

  • CPU high but database responsive
  • Memory usage spiking
  • Frontend slow receiving responses

Root causes:

1. Serializing large objects
   Bad:    return User.objects.all()  (serializes 100k users)
   Good:   return User.objects.all()[:100]  (paginate)

2. JSON serialization inefficient
   Bad:    json.dumps(large_dict)  (Python's json is slow)
   Good:   import ujson; ujson.dumps(large_dict)  (3x faster)

3. Encoding/decoding mismatch
   Bad:    UTF-8 → Latin-1 → UTF-8 conversion
   Good:   Use UTF-8 consistently

4. Compression disabled
   Bad:    Response Content-Length: 5MB (no compression)
   Good:   Content-Encoding: gzip, Size: 500KB (100x smaller)

Solutions:

# Pagination solution
# Before: 10 seconds to serialize 100k users
users = User.objects.all()  # DON'T
users = User.objects.all()[:100]  # DO

# Fast JSON solution
import ujson  # or orjson, which is even faster
response = ujson.dumps(data)  # 3-5x faster

# Enable compression
from flask import Flask, compress
app = Flask(__name__)
compress = Compress(app)  # Automatic gzip on responses

# Selective serialization
# Bad: serialize everything
return User.to_dict()  # includes password, tokens, etc

# Good: serialize only needed fields
return {
    'id': user.id,
    'name': user.name,
    'email': user.email
}

Bottleneck 3: Caching Missing (40-60% speedup possible)

Symptoms:

  • Same queries running repeatedly
  • Same calculations done repeatedly
  • Database CPU high from repeated work

Solutions by layer:

1. HTTP Caching (Fastest, on client)

# Tell browsers to cache responses
@app.route('/api/products/<id>')
def get_product(id):
    resp = make_response(product_json)
    resp.cache_control.max_age = 3600  # Cache 1 hour
    resp.cache_control.public = True   # OK to cache in CDN
    return resp

# Result: 99% of requests served from browser cache, 0 DB queries

2. CDN Caching (Very fast, geographic distribution)

# Cloudflare, CloudFront, Fastly configure:
# - Cache static assets forever (add hash to filename for updates)
# - Cache API responses (5-60 minutes)
# - Gzip compression automatic

GET /api/products/123
# First request: 200ms (origin)
# Next 1000 requests: 5ms (CDN in user's region)

3. Application Caching (In-memory, very fast)

# Redis cache expensive queries
from flask_caching import Cache

cache = Cache(app, config={'CACHE_TYPE': 'redis'})

@app.route('/api/trending')
@cache.cached(timeout=300)  # Cache 5 minutes
def get_trending():
    # This query runs once every 5 minutes (not 1000x/minute)
    return db.query("SELECT * FROM products ORDER BY views DESC LIMIT 10")

# Result: 30 seconds → 30ms (1000x faster)

Cache invalidation: See /pb-adr for cache invalidation patterns (event-driven, TTL, manual, hybrid).

Bottleneck 4: Inefficient Algorithms (Often 10-20% of time)

Symptoms:

  • CPU high, database responsive
  • Scales poorly (10x users → 100x slower)
  • Memory usage high

Examples:

# BAD: O(n²) algorithm
def find_duplicates(items):
    result = []
    for i, item1 in enumerate(items):
        for j, item2 in enumerate(items):  # WRONG: Inner loop
            if item1 == item2 and i != j:
                result.append(item1)
    return result
# 10,000 items = 100M comparisons

# GOOD: O(n) algorithm
def find_duplicates(items):
    seen = set()
    duplicates = set()
    for item in items:
        if item in seen:
            duplicates.add(item)
        seen.add(item)
    return duplicates
# 10,000 items = 10k comparisons (10,000x faster!)

# BAD: String concatenation in loop
result = ""
for line in lines:
    result += line  # Creates new string each time, O(n²)

# GOOD: List join
result = "".join(lines)  # Single allocation, O(n)

Bottleneck 5: Synchronous I/O (Often 70-90% of time)

Symptoms:

  • Server CPU low (40% used)
  • But slow requests (P99 > 1s)
  • Can’t handle concurrent users

Root cause: Waiting for I/O (database, API calls, disk)

Solutions:

# BAD: Synchronous, blocks everything
@app.route('/checkout')
def checkout():
    validate_cart()        # 50ms
    charge_card()          # 500ms (blocked, waiting for payment processor)
    send_email()           # 200ms (blocked, waiting for mail server)
    return "Done"          # 750ms total

# GOOD: Async, parallelizes I/O
import asyncio

@app.route('/checkout')
async def checkout():
    await asyncio.gather(
        validate_cart(),   # 50ms
        charge_card(),     # 500ms (parallel)
        send_email()       # 200ms (parallel)
    )
    return "Done"          # 500ms total (payment time, email parallel)

# GOOD: Queue for non-blocking
@app.route('/checkout')
def checkout():
    validate_cart()        # 50ms
    charge_card()          # 500ms
    queue_email_job.delay(user_id)  # 5ms (async task queue)
    return "Done"          # 555ms (email sent in background)

Load Testing: Find Breaking Point

Before Optimizing

Run load test to find what breaks under load.

# Simple load test: 1000 requests, 10 concurrent
wrk -t 10 -c 10 -d 10s http://localhost:8000/

# Results:
Requests/sec:   150.5  (good, or slow?)
Latency avg:    66ms
Latency max:    250ms
99th percentile: 195ms

# Question: Is this good?
# Answer: Depends on target
#   If target is 1000 req/sec: FAIL (150 vs 1000)
#   If target is 500 users: FAIL (need to handle 500x more)
#   If current is 50 req/sec: PASS (3x improvement)

Load Test Your Bottleneck

# Test specific endpoint known to be slow
wrk -t 20 -c 100 -d 60s -s optimize.lua http://localhost:8000/api/search

# Results before optimization: 150 req/sec, P99 = 800ms
# Run optimization...
# Results after optimization: 500 req/sec, P99 = 150ms
# Improvement: 3.3x throughput, 5.3x latency (GOOD)

Optimization by Layer

Layer 1: Frontend (Browsers, 30-50% of load time)

Don’t optimize if:

  • Server latency is 500ms, frontend is 100ms (server is bigger problem)
  • Users complain about features, not speed (add features first)

Do optimize if:

  • Frontend is > 40% of total time
  • Users complain “site feels slow” (even if server fast)
  • Lighthouse score is red (< 50)

Quick wins:

1. Lazy load images (Intersection Observer)
   Before: Load 50 images on page load
   After: Load only visible images, rest on scroll
   Impact: 50% faster initial load

2. Code splitting (load JS only for pages needed)
   Before: app.js (5MB) - load everything
   After: app.js (500KB) + pages/*js (500KB each)
   Impact: 90% faster initial page load

3. Defer non-critical CSS
   Before: <link rel="stylesheet" href="style.css">
   After: <link rel="stylesheet" href="critical.css"> (in head)
          <link rel="stylesheet" href="non-critical.css"> (defer loading)
   Impact: 30% faster first paint

4. Remove unused dependencies
   Before: moment.js (67KB) for date formatting
   After: date-fns (5KB) or native Date
   Impact: 90% smaller bundle

Layer 2: API Server (30-50% of load time)

Quick wins:

1. Add caching (HTTP, CDN, Redis)
   Before: Every request hits database
   After: 95% served from cache
   Impact: 10-100x faster

2. Add compression (gzip)
   Before: 5MB response
   After: 500KB (gzipped)
   Impact: 10x smaller, 100x faster on slow networks

3. Batch API calls (N+1 → N/10)
   Before: 100 requests to load 100 users' orders
   After: 10 batch requests
   Impact: 90% fewer connections

4. Increase parallelization (async/await)
   Before: Chain calls (call A, then B, then C = A+B+C time)
   After: Parallel calls (call A, B, C together = MAX(A,B,C) time)
   Impact: 50-70% faster if A=B=C

Layer 3: Database (40-70% of load time)

Quick wins:

1. Add indexes
   Before: Full table scan 50,000 rows
   After: Index lookup 1 row
   Impact: 100-1000x faster

2. Fix N+1 queries
   Before: 100 separate queries for 100 items
   After: 1 query with batch load
   Impact: 100x fewer DB connections

3. Denormalize data
   Before: JOIN 5 tables to get one row of data
   After: Precompute and cache joined result
   Impact: 10-50x faster queries

4. Shard data
   Before: All 100M users in one table
   After: 100 shards (1M users each)
   Impact: Parallel queries, better scalability

Layer 4: Infrastructure (Rare, only if other layers maxed)

Quick wins:

1. Increase instance size (vertical scaling)
   Before: t2.small (1 CPU, 1GB RAM)
   After: t3.xlarge (4 CPU, 16GB RAM)
   Impact: 3-4x more throughput (diminishing)

2. Add more instances (horizontal scaling)
   Before: 1 server serving 1000 users
   After: 10 servers serving 1000 users each
   Impact: Linear scaling (10x throughput)

3. Use better algorithm for infrastructure
   Before: Single database with replicas
   After: Sharded database (parallel queries)
   Impact: 10-100x more throughput

Optimization Checklist

Before Optimizing

  • Measure current performance (baseline)
  • Define target (P99 < 200ms? Throughput > 10k req/sec?)
  • Profile to find bottleneck
  • Run load test to see breaking point

While Optimizing

  • Change one thing at a time (measure impact of each)
  • Run load test after each change
  • Keep track of improvements
  • Don’t over-optimize (diminishing returns)

After Optimizing

  • Verify improvement with load test
  • Set up monitoring for metric (so it doesn’t regress)
  • Document changes (what changed, why, what improved)
  • Check side effects (did you break something else?)

Common Optimization Mistakes

[NO] Mistake 1: Optimize Wrong Layer

Problem: "Website slow"
Blind optimization: Spend 2 weeks optimizing frontend
Measure first: Actually, frontend 100ms, API 800ms
Right fix: Optimize API (80% of problem)
Lesson: Measure first, optimize biggest impact

[NO] Mistake 2: Optimize Before Growth

Situation: Brand new startup, 10 users
Blind: Spend 3 months optimizing for 10k users
Reality: Spend time on features instead
Lesson: Optimize when you need to (when traffic grows or metrics slip)

[NO] Mistake 3: Premature Microservices

Problem: App slow
Blind: "Let's use microservices!"
Reality: Microservices slower (network latency between services)
Lesson: Monolith fast, microservices slow (use when you need independent scaling)

[NO] Mistake 4: Cache Everything

Problem: "Cache will make it faster"
Blind: Cache expensive query (updates hourly)
Reality: Cache becomes stale, users see wrong data
Lesson: Cache read-heavy data, not mutable data

Integration with Playbook

Part of design and deployment:

  • /pb-guide - Section 4.4 covers performance requirements
  • /pb-observability - Set up monitoring to catch performance regressions
  • /pb-adr - Architecture decisions affect performance
  • /pb-release - Load test before releasing at scale

Related Commands:

  • /pb-observability - Monitor P99 latency and throughput
  • /pb-guide - Performance requirements during design phase
  • /pb-incident - Performance degradation is incident (if sudden)

Performance Optimization Checklist

Planning Phase

  • Define performance targets (P99, throughput, user experience)
  • Benchmark current state (baseline)
  • Profile to identify bottleneck
  • Run load test to see current breaking point

Optimization Phase

  • Optimize Layer 1 (if 40%+ of time): Frontend, bundle size
  • Optimize Layer 2 (if 40%+ of time): API caching, compression, batching
  • Optimize Layer 3 (if 40%+ of time): Database indexes, N+1 fixes
  • Optimize Layer 4 (if other layers maxed): Infrastructure scaling
  • Measure impact after each change
  • Don’t over-optimize (diminishing returns)

Verification Phase

  • Load test reaches target throughput
  • P99 latency < target
  • No side effects (features still work)
  • Set up monitoring to track metric
  • Document changes (what and why)

  • /pb-observability - Set up monitoring to track performance metrics
  • /pb-review-hygiene - Code review for performance regressions
  • /pb-patterns-core - Architectural patterns that affect performance

Created: 2026-01-11 | Category: Planning | Tier: M/L