Performance Optimization & Scalability
Make systems faster without breaking them. Measure, optimize the right thing, verify improvements.
Purpose
Performance matters:
- Users leave slow sites (rule of thumb: every 100ms of delay costs ~1% of users)
- Slow systems cost money (more servers, more bandwidth)
- Performance bugs are production bugs (optimize before scaling)
Key principle: Measure first, optimize what matters, prove it works.
Mindset: Performance optimization requires /pb-preamble thinking (measure, challenge assumptions) and /pb-design-rules thinking (especially Optimization: prototype before polishing, measure before optimizing).
Question assumptions about slowness. Challenge whether optimization is worth the complexity cost. Measure before and after; don't assume. Surface trade-offs explicitly (speed vs. maintainability, simplicity vs. performance).
Resource Hint: sonnet - Performance optimization follows structured measurement and analysis workflows.
When to Optimize
[NO] DON’T Optimize:
- Too early: Before you have users / load
- Without measurement: Guessing slows you down more
- Working features: If it works fine for current users, leave it
- Premature: “This might be slow someday”
- Diminishing returns: Optimizing 1% of total time
[YES] DO Optimize:
- When users complain: “Site is slow”
- When metrics show problem: P99 latency > target
- When load tests show bottleneck: Load test reveals breaking point
- When cost is high: More servers than should be needed
- Hot paths: Code that runs for every user request
Performance Profiling: Find the Problem
Rule 1: Measure First
Most developers guess wrong about what’s slow.
Without profiling (80% wrong):
"The database must be slow"
→ Actually: JSON serialization is slow (60% of time)
With profiling (100% correct):
"Database queries are 15% of time, JSON serialization is 60%"
→ Optimize JSON serialization first (biggest payoff)
Tools by Layer
Frontend Performance:
- Chrome DevTools > Performance tab (record, identify slow frames)
- Lighthouse (scores performance, provides fixes)
- WebPageTest (waterfall chart of load time)
- Bundle analyzer (webpack-bundle-analyzer shows package size)
Backend Performance:
- Profilers: py-spy (Python), node --prof (Node), JProfiler (Java)
- Benchmarking: timeit (Python), benchmark (Node), JMH (Java)
- Database: EXPLAIN ANALYZE (query plan), slow query log
- Tracing: See /pb-observability for OpenTelemetry
Load Testing:
- ab (Apache Bench) - simple HTTP load
- wrk - fast, scriptable load testing
- k6 - load testing as code
- Locust - Python-based, distributed load testing
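The tools above are the right choice in practice, but the mechanics are simple enough to sketch with the standard library: fire concurrent requests, collect latencies, report percentiles. Everything here (including the throwaway local server) is illustrative only.

```python
import http.server
import statistics
import threading
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Throwaway local server to hammer (stands in for the real service)
class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # silence per-request logging

server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/"

def timed_request(_):
    start = time.perf_counter()
    urllib.request.urlopen(url).read()
    return time.perf_counter() - start

# 200 requests across 10 concurrent workers (same shape as wrk -c 10)
with ThreadPoolExecutor(max_workers=10) as pool:
    latencies = sorted(pool.map(timed_request, range(200)))
server.shutdown()

p99 = latencies[int(len(latencies) * 0.99) - 1]
print(f"avg={statistics.mean(latencies) * 1000:.1f}ms  p99={p99 * 1000:.1f}ms")
```

Real tools add ramp-up, sustained duration, and error tracking; use wrk or Locust for anything beyond a sanity check.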
Profiling Example: Python
# Quick profiling with cProfile
import cProfile
import pstats
cProfile.run('my_function()', 'output.prof')
stats = pstats.Stats('output.prof')
stats.sort_stats('cumulative').print_stats(10) # Show top 10 by cumulative time
# Result:
# ncalls tottime cumtime
# 100 0.050 2.340 <- Slow! 2.3 seconds per 100 calls
# 100000 1.500 1.800 <- Hot! 1.8 seconds across 100k calls
Profiling Example: Node.js
# Run with profiler
node --prof app.js
# Process output
node --prof-process isolate-*.log > profile.txt
# Shows:
# [Shared libraries]: 50ms
# app.js:123 handleRequest(): 450ms <- HOT SPOT
# database.js:45 query(): 320ms <- Second hottest
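Once a profiler has pointed at a hot spot, a micro-benchmark confirms whether a fix actually helps. A minimal sketch with Python's timeit (from the benchmarking tools listed above), using the string-concatenation case covered later in this section:

```python
import timeit

lines = ["line\n"] * 1000

def concat():
    s = ""
    for line in lines:
        s += line  # repeated concatenation
    return s

def join():
    return "".join(lines)  # single allocation

# Run each approach 200 times and compare wall-clock time
t_concat = timeit.timeit(concat, number=200)
t_join = timeit.timeit(join, number=200)
print(f"concat: {t_concat:.4f}s  join: {t_join:.4f}s")
```

Benchmark the real workload where possible; toy inputs can hide or exaggerate differences.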
Common Performance Bottlenecks
Bottleneck 1: Database Queries (Often 60-80% of time)
Symptoms:
- P99 latency high
- Database CPU at 100%
- Slow query log full
Root causes:
1. N+1 queries: Loop and query inside loop
Bad: for user in users: user.orders = db.query("SELECT * FROM orders WHERE user_id = ?", user.id)
Good: orders = db.query("SELECT * FROM orders WHERE user_id IN (?)", user_ids)
2. Missing index: Query scans whole table
Bad: SELECT * FROM users WHERE created_at > ? (no index)
Good: CREATE INDEX idx_created_at ON users(created_at)
3. SELECT * with large tables
Bad: SELECT * FROM users (returns 50 columns, but you use 5)
Good: SELECT id, name, email FROM users
4. Slow JOIN: Join large tables with poor keys
Bad: SELECT * FROM users JOIN orders ON users.id = orders.user_id WHERE status IN (...)
Good: Add index on orders(user_id, status)
Solutions:
# N+1 solution: Batch load
users = db.query("SELECT * FROM users LIMIT 100")
user_ids = [u.id for u in users]
orders = db.query("SELECT * FROM orders WHERE user_id IN (?)", user_ids)
for user in users:
    user.orders = [o for o in orders if o.user_id == user.id]
# Missing index solution
db.execute("CREATE INDEX idx_email ON users(email)")
db.execute("ANALYZE TABLE users")  # Update stats
# SELECT * solution
cursor.execute("SELECT id, name, email FROM users")  # Only the columns needed
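To confirm an index is actually used, read the query plan. A self-contained sketch with stdlib sqlite3, where EXPLAIN QUERY PLAN plays the role of Postgres's EXPLAIN ANALYZE (the table and index names here are illustrative):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
db.executemany("INSERT INTO users (email) VALUES (?)",
               [(f"u{i}@example.com",) for i in range(1000)])

query = "SELECT id FROM users WHERE email = ?"
params = ("u500@example.com",)

# Before the index: the plan is a full table scan
before = db.execute(f"EXPLAIN QUERY PLAN {query}", params).fetchall()
db.execute("CREATE INDEX idx_email ON users(email)")
# After the index: the plan searches via idx_email
after = db.execute(f"EXPLAIN QUERY PLAN {query}", params).fetchall()

print("before:", before[0][3])  # e.g. SCAN users
print("after:", after[0][3])    # e.g. SEARCH users USING ... idx_email
```

The same habit applies to Postgres and MySQL: run the plan before and after, and keep the index only if the plan actually changes.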
Bottleneck 2: Serialization/Deserialization (Often 30-40% of time)
Symptoms:
- CPU high but database responsive
- Memory usage spiking
- Frontend slow receiving responses
Root causes:
1. Serializing large objects
Bad: return User.objects.all() (serializes 100k users)
Good: return User.objects.all()[:100] (paginate)
2. JSON serialization inefficient
Bad: json.dumps(large_dict) (Python's json is slow)
Good: import ujson; ujson.dumps(large_dict) (3x faster)
3. Encoding/decoding mismatch
Bad: UTF-8 → Latin-1 → UTF-8 conversion
Good: Use UTF-8 consistently
4. Compression disabled
Bad: Response Content-Length: 5MB (no compression)
Good: Content-Encoding: gzip, Size: 500KB (10x smaller)
Solutions:
# Pagination solution
# Before: 10 seconds to serialize 100k users
users = User.objects.all() # DON'T
users = User.objects.all()[:100] # DO
# Fast JSON solution
import ujson # or orjson, which is even faster
response = ujson.dumps(data) # 3-5x faster
# Enable compression
from flask import Flask
from flask_compress import Compress
app = Flask(__name__)
Compress(app)  # Automatic gzip on responses
# Selective serialization
# Bad: serialize everything
return user.to_dict()  # includes password, tokens, etc.
# Good: serialize only needed fields
return {
    'id': user.id,
    'name': user.name,
    'email': user.email,
}
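The compression numbers above are easy to verify: repetitive JSON compresses extremely well. A stdlib sketch (the exact ratio depends on the payload):

```python
import gzip
import json

# A repetitive JSON payload, typical of API list responses
data = [{"id": i, "name": f"user{i}", "status": "active"} for i in range(5000)]
raw = json.dumps(data).encode("utf-8")
compressed = gzip.compress(raw)

ratio = len(raw) / len(compressed)
print(f"raw={len(raw) // 1024}KB  gzip={len(compressed) // 1024}KB  "
      f"({ratio:.0f}x smaller)")
```

In production you enable this at the web server or middleware layer (as in the Flask-Compress example above) rather than compressing by hand.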
Bottleneck 3: Caching Missing (40-60% speedup possible)
Symptoms:
- Same queries running repeatedly
- Same calculations done repeatedly
- Database CPU high from repeated work
Solutions by layer:
1. HTTP Caching (Fastest, on client)
# Tell browsers to cache responses
@app.route('/api/products/<id>')
def get_product(id):
    resp = make_response(product_json)
    resp.cache_control.max_age = 3600  # Cache 1 hour
    resp.cache_control.public = True   # OK to cache in CDN
    return resp
# Result: 99% of requests served from browser cache, 0 DB queries
2. CDN Caching (Very fast, geographic distribution)
# Cloudflare, CloudFront, Fastly configure:
# - Cache static assets forever (add hash to filename for updates)
# - Cache API responses (5-60 minutes)
# - Gzip compression automatic
GET /api/products/123
# First request: 200ms (origin)
# Next 1000 requests: 5ms (CDN in user's region)
3. Application Caching (In-memory, very fast)
# Redis cache expensive queries
from flask_caching import Cache
cache = Cache(app, config={'CACHE_TYPE': 'redis'})
@app.route('/api/trending')
@cache.cached(timeout=300)  # Cache 5 minutes
def get_trending():
    # This query runs once every 5 minutes (not 1000x/minute)
    return db.query("SELECT * FROM products ORDER BY views DESC LIMIT 10")
# Result: 30 seconds → 30ms (1000x faster)
Cache invalidation:
See /pb-adr for cache invalidation patterns (event-driven, TTL, manual, hybrid).
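As a rough illustration of two of those patterns (TTL expiry plus manual invalidation on write), here is a minimal in-process sketch; a real deployment would use Redis, and this class is purely illustrative:

```python
import time

class TTLCache:
    """Minimal in-process cache: TTL expiry plus manual invalidation."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() > expires_at:
            del self.store[key]  # TTL pattern: stale entries expire on read
            return None
        return value

    def set(self, key, value):
        self.store[key] = (value, time.monotonic() + self.ttl)

    def invalidate(self, key):
        # Manual pattern: call on every write so readers never see stale data
        self.store.pop(key, None)

cache = TTLCache(ttl_seconds=300)
cache.set("trending", ["p1", "p2"])
assert cache.get("trending") == ["p1", "p2"]
cache.invalidate("trending")  # e.g. after a product update
assert cache.get("trending") is None
```

TTL alone bounds staleness; invalidation on write eliminates it for the keys you control. Most systems combine both.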
Bottleneck 4: Inefficient Algorithms (Often 10-20% of time)
Symptoms:
- CPU high, database responsive
- Scales poorly (10x users → 100x slower)
- Memory usage high
Examples:
# BAD: O(n²) algorithm
def find_duplicates(items):
    result = []
    for i, item1 in enumerate(items):
        for j, item2 in enumerate(items):  # WRONG: inner loop over all items
            if item1 == item2 and i != j:
                result.append(item1)
    return result
# 10,000 items = 100M comparisons
# GOOD: O(n) algorithm
def find_duplicates(items):
    seen = set()
    duplicates = set()
    for item in items:
        if item in seen:
            duplicates.add(item)
        seen.add(item)
    return duplicates
# 10,000 items = 10k comparisons (10,000x faster!)
# BAD: String concatenation in loop
result = ""
for line in lines:
    result += line  # Creates new string each time, O(n²)
# GOOD: List join
result = "".join(lines) # Single allocation, O(n)
Bottleneck 5: Synchronous I/O (Often 70-90% of time)
Symptoms:
- Server CPU low (40% used)
- But slow requests (P99 > 1s)
- Can’t handle concurrent users
Root cause: Waiting for I/O (database, API calls, disk)
Solutions:
# BAD: Synchronous, blocks everything
@app.route('/checkout')
def checkout():
    validate_cart()  # 50ms
    charge_card()    # 500ms (blocked, waiting for payment processor)
    send_email()     # 200ms (blocked, waiting for mail server)
    return "Done"    # 750ms total
# GOOD: Async, parallelizes I/O
import asyncio
@app.route('/checkout')
async def checkout():
    await asyncio.gather(
        validate_cart(),  # 50ms
        charge_card(),    # 500ms (parallel)
        send_email(),     # 200ms (parallel)
    )
    return "Done"  # 500ms total (payment time; email in parallel)
# GOOD: Queue for non-blocking
@app.route('/checkout')
def checkout():
    validate_cart()                 # 50ms
    charge_card()                   # 500ms
    queue_email_job.delay(user_id)  # 5ms (async task queue)
    return "Done"                   # 555ms (email sent in background)
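The queue_email_job.delay(...) call above assumes a task queue such as Celery. The hand-off idea can be sketched with a stdlib queue and a worker thread (illustrative only; the short sleep stands in for the slow SMTP call):

```python
import queue
import threading
import time

email_jobs = queue.Queue()

def email_worker():
    # Drains the queue in the background; None is the shutdown signal
    while True:
        user_id = email_jobs.get()
        if user_id is None:
            break
        time.sleep(0.01)  # stands in for the ~200ms SMTP call
        email_jobs.task_done()

threading.Thread(target=email_worker, daemon=True).start()

def checkout(user_id):
    # validate_cart() and charge_card() would run synchronously here (elided)
    email_jobs.put(user_id)  # hand-off returns immediately, not after 200ms
    return "Done"

result = checkout(42)
print(result)
```

A real task queue adds what this sketch lacks: persistence, retries, and visibility into failed jobs.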
Load Testing: Find Breaking Point
Before Optimizing
Run load test to find what breaks under load.
# Simple load test: 10 threads, 10 connections, for 10 seconds
wrk -t 10 -c 10 -d 10s http://localhost:8000/
# Results:
Requests/sec: 150.5 (good, or slow?)
Latency avg: 66ms
Latency max: 250ms
99th percentile: 195ms
# Question: Is this good?
# Answer: Depends on the target
# If target is 1000 req/sec: FAIL (150 vs 1000)
# If target is 500 concurrent users: FAIL (tested with only 10 connections)
# If you started at 50 req/sec: PASS (3x improvement)
Load Test Your Bottleneck
# Test specific endpoint known to be slow
wrk -t 20 -c 100 -d 60s -s optimize.lua http://localhost:8000/api/search
# Results before optimization: 150 req/sec, P99 = 800ms
# Run optimization...
# Results after optimization: 500 req/sec, P99 = 150ms
# Improvement: 3.3x throughput, 5.3x lower P99 latency (GOOD)
Optimization by Layer
Layer 1: Frontend (Browsers, 30-50% of load time)
Don’t optimize if:
- Server latency is 500ms, frontend is 100ms (server is bigger problem)
- Users complain about features, not speed (add features first)
Do optimize if:
- Frontend is > 40% of total time
- Users complain “site feels slow” (even if server fast)
- Lighthouse score is red (< 50)
Quick wins:
1. Lazy load images (Intersection Observer)
Before: Load 50 images on page load
After: Load only visible images, rest on scroll
Impact: 50% faster initial load
2. Code splitting (load JS only for pages needed)
Before: app.js (5MB) - everything loaded up front
After: app.js (500KB) + pages/*.js (loaded on demand)
Impact: 90% faster initial page load
3. Defer non-critical CSS
Before: <link rel="stylesheet" href="style.css">
After: <link rel="stylesheet" href="critical.css"> (in head)
<link rel="stylesheet" href="non-critical.css" media="print" onload="this.media='all'"> (deferred)
Impact: 30% faster first paint
4. Remove unused dependencies
Before: moment.js (67KB) for date formatting
After: date-fns (5KB) or native Date
Impact: 90% smaller bundle
Layer 2: API Server (30-50% of load time)
Quick wins:
1. Add caching (HTTP, CDN, Redis)
Before: Every request hits database
After: 95% served from cache
Impact: 10-100x faster
2. Add compression (gzip)
Before: 5MB response
After: 500KB (gzipped)
Impact: 10x smaller, dramatically faster transfer on slow networks
3. Batch API calls (N+1 → N/10)
Before: 100 requests to load 100 users' orders
After: 10 batch requests
Impact: 90% fewer connections
4. Increase parallelization (async/await)
Before: Chain calls (call A, then B, then C = A+B+C time)
After: Parallel calls (call A, B, C together = MAX(A,B,C) time)
Impact: 50-70% faster if A=B=C
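Quick win 4 is easy to demonstrate: with asyncio, three independent awaits take MAX(A,B,C) instead of A+B+C. A self-contained sketch using asyncio.sleep as a stand-in for I/O-bound calls:

```python
import asyncio
import time

async def call(delay):
    await asyncio.sleep(delay)  # stands in for an I/O-bound API call

async def sequential():
    await call(0.1)  # A
    await call(0.1)  # B
    await call(0.1)  # C

async def parallel():
    await asyncio.gather(call(0.1), call(0.1), call(0.1))  # A, B, C together

start = time.perf_counter()
asyncio.run(sequential())
t_seq = time.perf_counter() - start

start = time.perf_counter()
asyncio.run(parallel())
t_par = time.perf_counter() - start

print(f"sequential={t_seq:.2f}s  parallel={t_par:.2f}s")  # ~0.3s vs ~0.1s
```

This only helps when the calls are independent; if B needs A's result, they must stay sequential.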
Layer 3: Database (40-70% of load time)
Quick wins:
1. Add indexes
Before: Full table scan 50,000 rows
After: Index lookup 1 row
Impact: 100-1000x faster
2. Fix N+1 queries
Before: 100 separate queries for 100 items
After: 1 query with batch load
Impact: 100x fewer DB connections
3. Denormalize data
Before: JOIN 5 tables to get one row of data
After: Precompute and cache joined result
Impact: 10-50x faster queries
4. Shard data
Before: All 100M users in one table
After: 100 shards (1M users each)
Impact: Parallel queries, better scalability
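Sharding (quick win 4) needs a stable routing function so a given user always maps to the same shard. A hash-based sketch, where the shard count and DSN format are hypothetical:

```python
import hashlib

NUM_SHARDS = 100  # matches the "100 shards" example above

def shard_for(user_id: int) -> int:
    """Stable assignment: the same user always lands on the same shard."""
    digest = hashlib.sha256(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def dsn_for(user_id: int) -> str:
    # Hypothetical DSN naming; substitute your real connection strings
    return f"postgres://db-shard-{shard_for(user_id)}/users"

assert shard_for(42) == shard_for(42)  # deterministic
assert 0 <= shard_for(12345) < NUM_SHARDS
```

Simple modulo hashing works until you resize; changing NUM_SHARDS remaps most keys, which is why production systems often use consistent hashing instead.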
Layer 4: Infrastructure (Rare, only if other layers maxed)
Quick wins:
1. Increase instance size (vertical scaling)
Before: t2.small (1 CPU, 1GB RAM)
After: t3.xlarge (4 CPU, 16GB RAM)
Impact: 3-4x more throughput (diminishing)
2. Add more instances (horizontal scaling)
Before: 1 server serving 1000 users
After: 10 servers serving 1000 users each
Impact: Linear scaling (10x throughput)
3. Use better algorithm for infrastructure
Before: Single database with replicas
After: Sharded database (parallel queries)
Impact: 10-100x more throughput
Optimization Checklist
Before Optimizing
- Measure current performance (baseline)
- Define target (P99 < 200ms? Throughput > 10k req/sec?)
- Profile to find bottleneck
- Run load test to see breaking point
While Optimizing
- Change one thing at a time (measure impact of each)
- Run load test after each change
- Keep track of improvements
- Don’t over-optimize (diminishing returns)
After Optimizing
- Verify improvement with load test
- Set up monitoring for metric (so it doesn’t regress)
- Document changes (what changed, why, what improved)
- Check side effects (did you break something else?)
Common Optimization Mistakes
[NO] Mistake 1: Optimize Wrong Layer
Problem: "Website slow"
Blind optimization: Spend 2 weeks optimizing frontend
Measure first: Actually, frontend 100ms, API 800ms
Right fix: Optimize API (80% of problem)
Lesson: Measure first, optimize biggest impact
[NO] Mistake 2: Optimize Before Growth
Situation: Brand new startup, 10 users
Blind: Spend 3 months optimizing for 10k users
Reality: Spend time on features instead
Lesson: Optimize when you need to (when traffic grows or metrics slip)
[NO] Mistake 3: Premature Microservices
Problem: App slow
Blind: "Let's use microservices!"
Reality: Microservices are often slower (network latency between services)
Lesson: Start with a monolith; adopt microservices when you need independent scaling
[NO] Mistake 4: Cache Everything
Problem: "Cache will make it faster"
Blind: Cache expensive query (updates hourly)
Reality: Cache becomes stale, users see wrong data
Lesson: Cache read-heavy data, not mutable data
Integration with Playbook
Part of design and deployment:
- /pb-guide - Section 4.4 covers performance requirements
- /pb-observability - Set up monitoring to catch performance regressions
- /pb-adr - Architecture decisions affect performance
- /pb-release - Load test before releasing at scale
Related Commands:
- /pb-observability - Monitor P99 latency and throughput
- /pb-guide - Performance requirements during design phase
- /pb-incident - Performance degradation is an incident (if sudden)
Performance Optimization Checklist
Planning Phase
- Define performance targets (P99, throughput, user experience)
- Benchmark current state (baseline)
- Profile to identify bottleneck
- Run load test to see current breaking point
Optimization Phase
- Optimize Layer 1 (if 40%+ of time): Frontend, bundle size
- Optimize Layer 2 (if 40%+ of time): API caching, compression, batching
- Optimize Layer 3 (if 40%+ of time): Database indexes, N+1 fixes
- Optimize Layer 4 (if other layers maxed): Infrastructure scaling
- Measure impact after each change
- Don’t over-optimize (diminishing returns)
Verification Phase
- Load test reaches target throughput
- P99 latency < target
- No side effects (features still work)
- Set up monitoring to track metric
- Document changes (what and why)
Related Commands
- /pb-observability - Set up monitoring to track performance metrics
- /pb-review-hygiene - Code review for performance regressions
- /pb-patterns-core - Architectural patterns that affect performance
Created: 2026-01-11 | Category: Planning | Tier: M/L