Architecture Decision Record (ADR)

Document significant architectural decisions to capture the context, alternatives considered, and rationale for future reference.

Why this matters: ADRs enforce /pb-preamble thinking (peer challenges, transparent reasoning) and apply /pb-design-rules (correct system design).

When you write an ADR:

Preamble: You must consider alternatives, document trade-offs explicitly, and explain reasoning so decisions can be challenged
Design Rules: Your architecture is guided by Clarity, Simplicity, Modularity, and Extensibility, not arbitrary choices
Together: Better decisions that survive challenge and stand the test of time

Good ADRs show both: sound reasoning (preamble) and sound design (design rules).

Resource Hint: opus - Architectural decisions require deep trade-off analysis and long-term reasoning.

When to Write an ADR

Write an ADR when:

Choosing between multiple valid technical approaches
Adopting a new technology, library, or pattern
Making decisions that affect system architecture
Changing existing architectural patterns
Making decisions that will be hard to reverse

Don’t write an ADR for:

Obvious implementation choices
Temporary workarounds (document differently)
Decisions that can easily be changed later

ADR Template

Create ADR files at: docs/adr/NNNN-title-with-dashes.md

# ADR-NNNN: [Title]

**Date:** YYYY-MM-DD
**Status:** [Proposed | Accepted | Deprecated | Superseded by ADR-XXXX]
**Deciders:** [Names/roles involved]

## Context

[What is the issue we're addressing? What forces are at play?
Include technical constraints, business requirements, and team context.
Be specific about the problem, not the solution.]

## Decision

[What is the change we're proposing and/or doing?
State the decision clearly and directly.]

## Alternatives Considered

### Option A: [Name]
[Brief description]

**Pros:**
- [Pro 1]
- [Pro 2]

**Cons:**
- [Con 1]
- [Con 2]

### Option B: [Name]
[Brief description]

**Pros:**
- [Pro 1]

**Cons:**
- [Con 1]

### Option C: [Name] (Selected)
[Brief description]

**Pros:**
- [Pro 1]
- [Pro 2]

**Cons:**
- [Con 1]

## Rationale

[Why did we choose this option over the others?
What were the deciding factors?
What trade-offs are we accepting?]

## Consequences

**Positive:**
- [Benefit 1]
- [Benefit 2]

**Negative:**
- [Drawback 1]
- [Drawback 2]

**Neutral:**
- [Side effect that's neither good nor bad]

## What's Intentionally Not Here

[Document what you deliberately chose NOT to build, support, or include - and why.
This prevents future engineers from re-proposing rejected ideas without context.
Each exclusion should have a reason.]

- [Excluded approach/feature]: [Why it was rejected]
- [Excluded approach/feature]: [Why it was rejected]

## Implementation Notes

[Any specific implementation guidance.
Things to watch out for.
Migration steps if applicable.]

## References

- [Link to relevant docs, issues, or discussions]
- [Related ADRs]

ADR Numbering

Use sequential 4-digit numbers:

0001-initial-architecture.md
0002-database-selection.md
0003-authentication-strategy.md
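
The numbering scheme above can be automated. The following is a minimal sketch, assuming ADR files follow the `NNNN-title-with-dashes.md` pattern; the helper names `slugify` and `next_adr_filename` are hypothetical and not part of any playbook tooling.

```python
import re
from typing import Iterable

# Matches names such as 0002-database-selection.md (assumed naming pattern).
ADR_NAME = re.compile(r"^(\d{4})-[a-z0-9-]+\.md$")


def slugify(title: str) -> str:
    """Lowercase the title and join its alphanumeric runs with dashes."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def next_adr_filename(existing: Iterable[str], title: str) -> str:
    """Return the next sequential 4-digit ADR filename for the given title."""
    numbers = [int(m.group(1)) for name in existing if (m := ADR_NAME.match(name))]
    next_number = max(numbers, default=0) + 1
    return f"{next_number:04d}-{slugify(title)}.md"


files = ["0001-initial-architecture.md", "0002-database-selection.md", "README.md"]
print(next_adr_filename(files, "Authentication Strategy"))
```

Files that do not match the pattern (such as `README.md`) are ignored, so the index file can live in the same directory.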

Example ADR

# ADR-0015: Self-Hosted Fonts Instead of Google Fonts

**Date:** 2026-01-04
**Status:** Accepted
**Deciders:** Engineering team

## Context

The application uses multiple custom fonts for different themes. Currently loading
from Google Fonts CDN, which introduces:
- External dependency and privacy concerns
- Render-blocking requests
- FOUT (Flash of Unstyled Text) on slow connections

Performance audits show font loading accounts for 400ms+ of blocking time.

## Decision

Self-host all fonts using @fontsource packages. Implement lazy loading for
theme-specific fonts.

## Alternatives Considered

### Option A: Keep Google Fonts
**Pros:** Zero maintenance, CDN caching
**Cons:** Privacy, render-blocking, external dependency

### Option B: Self-host with preload all
**Pros:** No external dependency, control over loading
**Cons:** Large initial payload, wasted bandwidth for unused themes

### Option C: Self-host with lazy loading (Selected)
**Pros:** Control over loading, minimal initial payload, load only what's needed
**Cons:** Slight complexity in implementation

## Rationale

Option C provides the best balance: eliminates external dependency while
minimizing payload through lazy loading of theme-specific fonts.

## Consequences

**Positive:**
- 87% reduction in render-blocking time
- No external dependencies
- Privacy-friendly (no Google tracking)

**Negative:**
- Slightly larger bundle (fonts in assets)
- Need to update fonts manually

## Implementation Notes

- Critical fonts (Inter, Noto Serif Devanagari) preloaded
- Theme fonts loaded on theme selection
- Font files in `/public/fonts/`

Example ADRs (Additional)

Example 2: Database Selection (PostgreSQL vs MongoDB)

# ADR-0001: PostgreSQL for Primary Database

**Date:** 2026-01-05
**Status:** Accepted
**Deciders:** Engineering team, Tech lead

## Context

Building a new SaaS application. Need to select primary data store for user accounts, billing,
and product data. Team has experience with both SQL and NoSQL. Requirements:
- Strong consistency (financial transactions)
- Complex queries across related data
- ACID transactions required
- Expected growth: 100M+ records over 5 years

## Decision

Use PostgreSQL as primary database. Use Redis for caching and sessions.

## Alternatives Considered

### Option A: PostgreSQL (Selected)
**Pros:**
- ACID guarantees for transactions
- Complex queries with JOINs
- Strong consistency
- Mature tooling and libraries
- Battle-tested at scale

**Cons:**
- Requires schema design upfront
- Vertical scaling limitations (horizontal scaling complex)
- Not ideal for unstructured data

### Option B: MongoDB
**Pros:**
- Flexible schema (iterate quickly)
- Built-in horizontal scaling
- Good for unstructured data
- Document-oriented (natural data model for some use cases)

**Cons:**
- Eventual consistency when reading from secondaries (problematic for financial data)
- Multi-document transactions only available from v4.0 onward
- Higher memory footprint
- Harder to query across documents

### Option C: Multi-database (PostgreSQL + MongoDB)
**Pros:**
- Best of both worlds
- Flexibility by data type

**Cons:**
- Operational complexity
- Data sync challenges
- Increased maintenance burden

## Rationale

Financial data (billing, subscriptions, payments) demands ACID guarantees. Complex reporting
queries (user analytics, revenue reports) benefit from SQL. PostgreSQL's maturity and
proven scaling strategies at large companies such as Instagram make it the best fit.

## Consequences

**Positive:**
- Data integrity guaranteed
- Complex queries fast and efficient
- Excellent ecosystem (ORMs, migration tools, monitoring)
- Smaller operational footprint than MongoDB

**Negative:**
- Schema migrations required when data model changes
- Developers must think about schema design upfront
- Scaling read load requires replication setup

**Neutral:**
- Network latency same as MongoDB for single-node setup

## Implementation Notes

- Use connection pooling (PgBouncer) from day 1
- Set up read replicas before launch for analytics queries
- Configure backup strategy (WAL archiving, pg_basebackup)
- Monitor table bloat and run VACUUM regularly
- Use indexes strategically (query plans matter)
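
To illustrate why the rationale above leans on ACID guarantees for billing data, here is a minimal sketch of an atomic transfer. It uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL, purely so the example runs without a server; the `accounts` table, the `make_db` and `transfer` helpers, and the balances are all hypothetical.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with two illustrative accounts (cents)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO accounts (id, balance) VALUES (?, ?)", [(1, 100), (2, 50)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> None:
    """Move `amount` from src to dst; either both updates apply or neither does."""
    with conn:  # commits on success, rolls back if an exception escapes the block
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        (balance,) = conn.execute("SELECT balance FROM accounts WHERE id = ?", (src,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")  # triggers the rollback
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))


conn = make_db()
transfer(conn, 1, 2, 30)
print(dict(conn.execute("SELECT id, balance FROM accounts")))
```

A failed transfer leaves both balances untouched, which is the property the ADR relies on for financial data.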

Example 3: Authentication Strategy (JWT vs OAuth2 vs Session-based)

# ADR-0002: JWT with Refresh Tokens for Authentication

**Date:** 2026-01-07
**Status:** Accepted
**Deciders:** Engineering team, Security lead

## Context

Building SPA (React) + mobile app (iOS/Android) + backend. Need stateless authentication
that works across multiple clients. Requirements:
- Support web, iOS, Android clients
- Stateless backend (can scale horizontally)
- Secure token revocation (logout)
- Standard industry practice

## Decision

Use JWT (JSON Web Tokens) with refresh token rotation. Short-lived access tokens (15 min),
longer-lived refresh tokens (7 days) with rotation on each refresh.

## Alternatives Considered

### Option A: Session-based (traditional)
**Pros:**
- Simple to understand
- Easy token revocation
- CSRF protection is well supported by web frameworks (required when using cookies)
- Server controls session lifetime

**Cons:**
- Requires server-side session storage
- Doesn't scale well horizontally (session affinity needed or shared store)
- Poor mobile experience (cookies not ideal)
- Logout requires server cleanup

### Option B: JWT without refresh tokens
**Pros:**
- Stateless, scales horizontally
- Works great for mobile/SPA

**Cons:**
- Long token lifetime = security risk if token stolen
- Can't revoke tokens (except via blacklist, defeating statelessness)
- Logout doesn't actually log you out (token still valid)

### Option C: JWT with refresh tokens (Selected)
**Pros:**
- Stateless backend (scales horizontally)
- Secure: access token short-lived, refresh token rotated
- Logout works (invalidate refresh token)
- Works for web, mobile, SPA
- Standard industry practice

**Cons:**
- More complex than simple sessions
- Requires client-side refresh token storage (secure HttpOnly cookie recommended)
- Extra network call when token expires

## Rationale

Refresh token rotation provides security benefits of short-lived tokens without
logout UX issues. Industry standard used by Auth0, Firebase, AWS Cognito.

## Consequences

**Positive:**
- Horizontal scaling without session store
- Logout is instant (revoke refresh token)
- Security: token theft has limited window
- Mobile-friendly

**Negative:**
- Slightly more implementation complexity
- Requires secure refresh token storage
- Extra API call on token refresh

**Neutral:**
- Network latency barely noticeable (typical 20-50ms refresh call)

## Implementation Notes

- Access token lifetime: 15 minutes (tradeoff between security and UX)
- Refresh token lifetime: 7 days
- Rotate refresh token on each use (new refresh token returned)
- Store refresh token in httpOnly, secure cookie (not localStorage)
- Include token fingerprint to prevent token reuse attacks
- Implement refresh token revocation list for logout
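
The rotation and logout rules above can be sketched as follows. This is a simplified illustration, not a production design: `RefreshTokenStore` is a hypothetical in-memory class, it tracks active tokens rather than keeping a revocation list, and it omits access-token (JWT) signing and the token fingerprint entirely.

```python
import secrets
import time
from typing import Dict, Optional, Tuple

REFRESH_TTL_SECONDS = 7 * 24 * 3600  # 7 days, matching the notes above


class RefreshTokenStore:
    """Hypothetical in-memory store; a real service would use a shared database."""

    def __init__(self) -> None:
        # token -> (user_id, expires_at)
        self._active: Dict[str, Tuple[str, float]] = {}

    def issue(self, user_id: str, now: Optional[float] = None) -> str:
        """Create a new random refresh token for the user."""
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(32)
        self._active[token] = (user_id, now + REFRESH_TTL_SECONDS)
        return token

    def rotate(self, token: str, now: Optional[float] = None) -> Optional[str]:
        """Consume a refresh token and return its replacement, or None if invalid."""
        now = time.time() if now is None else now
        entry = self._active.pop(token, None)  # single use: the old token is removed
        if entry is None:
            return None  # unknown, already used, or revoked
        user_id, expires_at = entry
        if now >= expires_at:
            return None  # expired
        return self.issue(user_id, now)

    def revoke(self, token: str) -> None:
        """Logout: drop the token so it can no longer be rotated."""
        self._active.pop(token, None)


store = RefreshTokenStore()
first = store.issue("user-1")
second = store.rotate(first)
print(second is not None, store.rotate(first) is None)
```

Because each token is removed when used, replaying an old refresh token fails, which is the behavior rotation is meant to provide.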

Example 4: Caching Strategy (Redis vs In-memory vs CDN)

# ADR-0003: Tiered Caching Strategy (CDN + Redis + In-memory)

**Date:** 2026-01-08
**Status:** Accepted
**Deciders:** Engineering team, Infrastructure team

## Context

Application serves millions of requests daily with 30% cacheable content (product data,
user profiles, configurations). Current approach (no caching) causes N+1 queries and
slow response times. Need to balance cost, complexity, and performance.

Requirements:
- <100ms p99 latency
- 50M+ requests/day
- Global users (US + EU)
- Cache invalidation must be reliable

## Decision

Implement three-tier caching:
1. CDN (CloudFront) for static assets and API responses
2. Redis for session data and frequently accessed objects
3. In-memory application cache for hot data

## Alternatives Considered

### Option A: Redis only
**Pros:**
- Simple to understand
- Works globally (with replication)

**Cons:**
- Extra network hop (vs in-memory)
- Database load on cache misses
- Single point of failure (high availability needed)
- Expensive at scale

### Option B: In-memory only
**Pros:**
- Fastest possible (no network)
- No operational overhead

**Cons:**
- Data lost on restart
- Doesn't work for distributed systems
- Cache invalidation complexity across instances
- Can't share session data across servers

### Option C: Tiered caching (Selected)
**Pros:**
- Best performance (hit CDN first, in-memory second, Redis third)
- Cost-effective (CDN is cheap for static content)
- Resilient (fallback if one layer fails)
- Scales to billions of requests

**Cons:**
- More complex (three systems to manage)
- Cache invalidation across layers
- Potential stale data issues

## Rationale

Real-world performance requires multiple cache layers. Netflix, Uber, Airbnb use similar
patterns. Each layer serves different purposes: CDN for geographic distribution, Redis
for shared state, in-memory for hot data.

## Consequences

**Positive:**
- P99 latency drops from 500ms to 50ms
- Reduced database load (70% hit rate)
- Global performance (CDN)
- Cost-effective at scale

**Negative:**
- Operational complexity (managing 3 systems)
- Cache invalidation harder to reason about
- Potential stale data (eventual consistency)

**Neutral:**
- Need to monitor cache hit rates separately

## Implementation Notes

### TTL Strategy
- CDN cache TTL: 1 hour for product data, 5 min for user data
- Redis TTL: 15 minutes
- In-memory TTL: 5 minutes

### Cache Invalidation Patterns

**Event-Driven Invalidation** (Recommended)
- On data change (create/update/delete), emit event
- Webhook or event stream triggers cache purge
- Pros: Immediate consistency, minimal stale data
- Cons: Requires event infrastructure
- Example: User updates profile → publish event → invalidate user cache in all layers

**Time-Based TTL** (Default Fallback)
- Cache expires naturally based on TTL
- Appropriate for data that can tolerate being slightly stale
- No invalidation infrastructure needed
- Cons: Must tolerate eventual consistency

**Manual Invalidation** (For Emergencies)
- Admin API to force cache purge
- Used for critical fixes (security patches, data corrections)
- Explicit purge endpoints for sensitive data
- Never the sole invalidation strategy

**Hybrid Approach** (Best Practice)
- Short TTL on frequently-changing data (5-15 min)
- Longer TTL on stable data (1 hour)
- Event-driven invalidation for critical changes
- Manual purge capability for emergencies

### Monitoring
- Cache hit rates (track per layer)
- Eviction rates (sign of undersized cache)
- Memory usage (Redis and in-memory)
- Invalidation latency (how quickly purges propagate)
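
The application-side tiers and TTLs above can be sketched as a read-through lookup. This is a minimal illustration under stated assumptions: `TTLCache` and `TieredCache` are hypothetical classes, a second `TTLCache` stands in for Redis, the CDN layer is not modeled, and `load` represents the database query.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny TTL cache used to stand in for each tier in this sketch."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._data: Dict[str, Tuple[Any, float]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._data[key] = (value, self._clock() + self._ttl)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


class TieredCache:
    """Check the in-process tier, then the shared tier, then the data source."""

    def __init__(self, local: TTLCache, shared: TTLCache, load: Callable[[str], Any]) -> None:
        self.local, self.shared, self.load = local, shared, load

    def get(self, key: str) -> Any:
        value = self.local.get(key)
        if value is not None:
            return value
        value = self.shared.get(key)
        if value is None:
            value = self.load(key)  # miss in both tiers (None values are not cached)
            self.shared.set(key, value)
        self.local.set(key, value)
        return value

    def invalidate(self, key: str) -> None:
        """Event-driven purge of both tiers on this instance."""
        self.local.delete(key)
        self.shared.delete(key)


cache = TieredCache(TTLCache(5 * 60), TTLCache(15 * 60), load=lambda key: f"row:{key}")
print(cache.get("user:42"))
```

In a multi-instance deployment, `invalidate` would also need to reach the in-memory tier of every other instance, for example through the event stream described under Event-Driven Invalidation.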

Example 5: API Versioning Strategy (URL Path vs Header vs Media Type)

# ADR-0004: URL Path Versioning for Public APIs

**Date:** 2026-01-10
**Status:** Accepted
**Deciders:** Engineering team, Platform team

## Context

Public API used by 50+ third-party integrations and mobile apps. Need long-term
backwards compatibility (3-5 year minimum). Currently tracking 3 legacy API versions
in production. Team needs clear strategy for introducing breaking changes without
disrupting existing clients.

Requirements:
- Support 2-3 API versions simultaneously
- Clear client migration path
- Trackable version adoption
- Minimize API server complexity

## Decision

Use URL path versioning (/v1/, /v2/, /v3/). Maintain 2 major versions in production
at any time, deprecate oldest version 6 months after new version launch.

## Alternatives Considered

### Option A: URL Path Versioning (Selected)
**Pros:**
- Most explicit (version visible in URL)
- Easy to track usage (via logs/metrics)
- Clear separation of code paths per version
- Browser-friendly (can test with URL bar)

**Cons:**
- URL pollution (endpoints duplicated across versions)
- Code duplication for compatibility
- Routing complexity in API framework

### Option B: Header-Based Versioning
**Pros:**
- Cleaner URLs
- Backward compatible (same URL serves multiple versions)

**Cons:**
- Version not visible in logs/monitoring by default
- Harder to test (requires setting headers)
- Client confusion (which version am I using?)

### Option C: Media Type Versioning
**Pros:**
- RESTful (follows HTTP semantics)
- Single URL for resource

**Cons:**
- Complex (custom media types like `application/vnd.myapi.v2+json`)
- Not widely used (client confusion)
- Requires Accept header understanding

## Rationale

URL path versioning is the most transparent for third-party integrations. Mobile and
web clients can easily see their API version in request logs. Team can deprecate versions
explicitly with clear migration timelines published 6 months in advance.

## Consequences

**Positive:**
- Clear version tracking (metrics, logs, monitoring)
- Explicit deprecation path (v1 → v2 → v3)
- Easy client communication (migrate by Jan 1, 2027)
- Different teams can own version-specific logic

**Negative:**
- Code duplication (mitigated by extracting shared logic into internal modules)
- More endpoints to maintain and document
- Larger API surface area

**Neutral:**
- Routing slightly more complex (but manageable with versioned routers)

## Implementation Notes

- Use URL pattern: `/api/v1/users`, `/api/v2/users`
- Share business logic via internal modules (v1, v2 handlers call shared UserService)
- Version deprecation timeline: Support for 18 months after new version launch
- Announce deprecation 6 months in advance
- Provide automated migration guide (v1 → v2 breaking changes)
- Feature flags for gradual rollout of v2 endpoints
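
The routing and shared-logic notes above can be sketched without any web framework. This is a minimal illustration: `UserService` is named in the notes, but its method, the sample data, the response shapes, and the `dispatch` helper are all assumptions made for the example.

```python
from typing import Callable, Dict, Tuple


class UserService:
    """Shared business logic called by every API version (sample data is made up)."""

    def get_user(self, user_id: int) -> Dict[str, str]:
        return {"id": str(user_id), "first_name": "Ada", "last_name": "Lovelace"}


def users_v1(service: UserService, user_id: int) -> Dict[str, str]:
    """v1 response shape: a single combined name field."""
    user = service.get_user(user_id)
    return {"id": user["id"], "name": f'{user["first_name"]} {user["last_name"]}'}


def users_v2(service: UserService, user_id: int) -> Dict[str, str]:
    """v2 response shape: a breaking change that splits the name."""
    user = service.get_user(user_id)
    return {"id": user["id"], "firstName": user["first_name"], "lastName": user["last_name"]}


Handler = Callable[[UserService, int], Dict[str, str]]
ROUTES: Dict[Tuple[str, str], Handler] = {
    ("v1", "users"): users_v1,
    ("v2", "users"): users_v2,
}


def dispatch(path: str, service: UserService) -> Dict[str, str]:
    """Route paths shaped like /api/<version>/<resource>/<id>."""
    parts = path.strip("/").split("/")
    if len(parts) != 4 or parts[0] != "api":
        raise ValueError(f"unsupported path: {path}")
    _, version, resource, raw_id = parts
    handler = ROUTES.get((version, resource))
    if handler is None:
        raise LookupError(f"no handler for {version}/{resource}")
    return handler(service, int(raw_id))


print(dispatch("/api/v1/users/7", UserService()))
print(dispatch("/api/v2/users/7", UserService()))
```

Each version owns only its response shape; retiring v1 then means deleting one handler and one route entry while `UserService` stays untouched.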

ADR Lifecycle

Proposed → Accepted → [Active]
                   ↓
              Deprecated (no longer applies)
                   or
              Superseded (replaced by new ADR)

When superseding:

Create new ADR with updated decision
Update old ADR status to “Superseded by ADR-XXXX”
Reference old ADR in new ADR’s context
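
The lifecycle above can be expressed as a small state machine, for example to lint status changes in review. This is a sketch only: the `Status` enum, the `ALLOWED` table, and both helpers are hypothetical, and treating Deprecated and Superseded as terminal states is an assumption drawn from the diagram.

```python
from enum import Enum
from typing import Dict, Set


class Status(Enum):
    PROPOSED = "Proposed"
    ACCEPTED = "Accepted"
    DEPRECATED = "Deprecated"
    SUPERSEDED = "Superseded"


# Allowed transitions, read off the lifecycle diagram (terminal states assumed).
ALLOWED: Dict[Status, Set[Status]] = {
    Status.PROPOSED: {Status.ACCEPTED},
    Status.ACCEPTED: {Status.DEPRECATED, Status.SUPERSEDED},
    Status.DEPRECATED: set(),
    Status.SUPERSEDED: set(),
}


def transition(current: Status, target: Status) -> Status:
    """Return the new status, or raise if the lifecycle does not allow the move."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


def superseded_label(new_adr_number: int) -> str:
    """Status text to write into the old ADR when a new one replaces it."""
    return f"Superseded by ADR-{new_adr_number:04d}"


print(transition(Status.PROPOSED, Status.ACCEPTED).value)
print(superseded_label(15))
```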

Directory Structure

docs/
└── adr/
    ├── 0001-initial-architecture.md
    ├── 0002-database-selection.md
    ├── 0003-authentication-strategy.md
    ├── ...
    └── README.md  # Index of all ADRs

ADR Index Template

# Architecture Decision Records

| ADR | Title | Status | Date |
|-----|-------|--------|------|
| [0001](0001-initial-architecture.md) | Initial Architecture | Accepted | 2025-01-01 |
| [0002](0002-database-selection.md) | PostgreSQL for Primary Database | Accepted | 2025-01-05 |
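
An index in this shape can be generated from the ADR files rather than maintained by hand. The sketch below assumes each ADR starts with the `# ADR-NNNN: [Title]` heading and the `**Date:**` and `**Status:**` lines from the template; `index_row` and `build_index` are hypothetical helper names, and the ADR text is passed in as strings so the example does not touch the filesystem.

```python
import re
from typing import Dict

HEADER = re.compile(r"^# ADR-(\d{4}): (.+)$", re.MULTILINE)
FIELD = re.compile(r"^\*\*(Date|Status):\*\* (.+)$", re.MULTILINE)


def index_row(filename: str, text: str) -> str:
    """Build one table row from an ADR's heading, status, and date lines."""
    header = HEADER.search(text)
    if header is None:
        raise ValueError(f"{filename}: missing '# ADR-NNNN: Title' heading")
    fields = {name: value.strip() for name, value in FIELD.findall(text)}
    number, title = header.group(1), header.group(2).strip()
    status, date = fields.get("Status", "?"), fields.get("Date", "?")
    return f"| [{number}]({filename}) | {title} | {status} | {date} |"


def build_index(adrs: Dict[str, str]) -> str:
    """Render the full index; `adrs` maps filename to file contents."""
    lines = [
        "# Architecture Decision Records",
        "",
        "| ADR | Title | Status | Date |",
        "|-----|-------|--------|------|",
    ]
    lines += [index_row(name, adrs[name]) for name in sorted(adrs)]
    return "\n".join(lines)


sample = "# ADR-0002: PostgreSQL for Primary Database\n\n**Date:** 2025-01-05\n**Status:** Accepted\n"
print(build_index({"0002-database-selection.md": sample}))
```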

Tips for Good ADRs

Write in present tense - “We decide” not “We decided”
Be specific - Vague context leads to vague decisions
Include alternatives - Shows you considered options
State trade-offs - No decision is perfect, acknowledge downsides
Keep it concise - 1-2 pages max
Link to context - Reference issues, PRs, discussions

/pb-plan - Planning workflow that may generate ADRs
/pb-sketch - Decision forks often become ADRs once resolved
/pb-think - Deep analysis for complex architectural decisions
/pb-design-rules - Design principles that inform ADR decisions
/pb-patterns-core - Reference patterns when documenting alternatives

Decisions as code. Future you will thank present you.
