Core Architecture & Design Patterns

Proven solutions to recurring problems. Patterns speed up design and prevent mistakes.

Purpose

Patterns:

Accelerate design: Don’t solve the same problem twice
Share knowledge: Common vocabulary for discussion
Prevent mistakes: Known gotchas are already documented
Improve quality: Use proven solutions, not experimental ones
Enable communication: “Let’s use the retry pattern” means something

Mindset: Every pattern has trade-offs. Use /pb-preamble thinking (challenge assumptions, surface costs) and /pb-design-rules thinking (does this pattern serve Clarity, Simplicity, Modularity?).

Challenge whether this pattern is the right fit for your constraints. Surface the actual costs. Understand the alternatives. A pattern is a starting point, not a law.

Resource Hint: sonnet - Pattern reference and application; implementation-level design decisions.

When to Use Patterns

Use patterns when:

Problem is common (many projects have this issue)
Solution is proven (multiple implementations work well)
Trade-offs are understood (know pros/cons)
Context fits (pattern matches your system)

Don’t use patterns when:

Problem is unique (no precedent)
Pattern seems forced (doesn’t fit naturally)
Simple solution exists (YAGNI - You Aren’t Gonna Need It)
System is too small (overkill)

Architectural Patterns

Pattern: Service-Oriented Architecture (SOA)

Problem: Monolithic system is too big, scales badly, hard to test.

Solution: Break into independent services, each handling one thing.

Structure:

Monolith:
  [All code - Orders, Payments, Users, Inventory in one codebase]

SOA:
  [Order Service] ←→ [Payment Service]
       ↓ API calls
  [User Service] ←→ [Inventory Service]

How it works:

1. Each service owns its data and has its own database (no shared database)
2. Services communicate via API (HTTP, gRPC, etc.)
3. Each service is deployed independently

Example: E-commerce

- Order Service: Creates orders, tracks status
- Payment Service: Processes payments, refunds
- Inventory Service: Tracks stock, decrements
- User Service: Manages users, profiles
- Notification Service: Sends emails, SMS

Each service:
  - Has own database
  - Exposed via REST API
  - Deployed separately
  - Developed by own team
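
The ownership rule above can be sketched in-process. This is a minimal illustration, not a deployment model: `InventoryService` and `OrderService` are hypothetical classes, each private dictionary stands in for that service's own database, and the direct method call stands in for what would be an HTTP or gRPC request between separately deployed services.

```python
class InventoryService:
    """Owns stock data; nothing outside this class touches _stock."""

    def __init__(self, stock):
        self._stock = dict(stock)  # private store, stands in for the service's own database

    def reserve(self, product_id, quantity):
        """Public API: reserve stock, returning True on success."""
        available = self._stock.get(product_id, 0)
        if quantity <= 0 or available < quantity:
            return False
        self._stock[product_id] = available - quantity
        return True

    def available(self, product_id):
        return self._stock.get(product_id, 0)


class OrderService:
    """Owns order data; reaches inventory only through its public API."""

    def __init__(self, inventory_api):
        self._inventory = inventory_api
        self._orders = {}  # private store
        self._next_id = 1

    def create_order(self, product_id, quantity):
        if not self._inventory.reserve(product_id, quantity):
            return None
        order_id = f"order_{self._next_id}"
        self._next_id += 1
        self._orders[order_id] = {"product_id": product_id, "quantity": quantity, "status": "created"}
        return order_id

    def get_order(self, order_id):
        return self._orders.get(order_id)


inventory = InventoryService({"prod_1": 5})
orders = OrderService(inventory)
first = orders.create_order("prod_1", 2)      # succeeds, stock drops to 3
rejected = orders.create_order("prod_1", 10)  # not enough stock, returns None
```

Because `OrderService` depends only on the `reserve` call, the inventory side could change its storage without the order side noticing.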

Pros:

Independent scaling (payment service under load? Scale just that)
Independent deployment (order service update doesn’t affect payments)
Technology flexibility (use Node for one, Python for another)
Clear boundaries (easy to understand what each does)

Cons:

Operational complexity (many services to manage)
Network latency (services talking over network)
Data consistency harder (each has own database)
Debugging harder (request spans multiple services)

When to use:

More than ~10 engineers, split into teams that can each own a service
Different parts scale differently (payments need more resources)
Different parts use different tech stacks
System is too large for one team

Gotchas:

1. "Too fine-grained services" - 20 services, one per endpoint
   Bad: Too much operational overhead
   Good: 3-5 services, one per business domain

2. "Synchronous everywhere" - Service A calls B calls C
   Bad: Slow, cascading failures
   Good: Async messaging (service A publishes event, B listens)

3. "Sharing databases" - All services use same DB
   Bad: Defeats purpose (tightly coupled)
   Good: Each service owns its data

Pattern: Event-Driven Architecture

Problem: Systems are tightly coupled (Order service must know about Payment service).

Solution: Services publish events, others listen. No direct coupling.

How it works:

Traditional (Tightly coupled):
  1. User submits order
  2. Order Service calls Payment Service
  3. Payment Service calls Inventory Service
  4. Inventory Service calls Notification Service

Problem: If Payment Service is slow, Order Service blocks

Event-Driven (Loosely coupled):
  1. User submits order
  2. Order Service creates order → publishes "order.created" event
  3. Payment Service listens, charges payment
  4. Inventory Service listens, decrements stock
  5. Notification Service listens, sends email

Benefit: Services don't know about each other
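
The publish/subscribe flow above can be sketched with a tiny in-memory bus. This is an illustration only: `EventBus` is a hypothetical class, and it delivers synchronously, whereas a real broker would deliver asynchronously and durably. The point is that the publisher never names its listeners.

```python
from collections import defaultdict


class EventBus:
    """Synchronous in-memory stand-in for a message broker."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        # The publisher does not know who (if anyone) is listening.
        for handler in list(self._handlers[event_type]):
            handler(payload)


bus = EventBus()
log = []

# Three independent listeners; adding a fourth needs no change to the publisher.
bus.subscribe("order.created", lambda e: log.append(("payment", e["order_id"])))
bus.subscribe("order.created", lambda e: log.append(("inventory", e["order_id"])))
bus.subscribe("order.created", lambda e: log.append(("notification", e["order_id"])))

bus.publish("order.created", {"order_id": "order_123"})
```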

Technology:

Event bus: RabbitMQ, Kafka, AWS SNS/SQS, Google Pub/Sub
Event format: JSON events with type and data

Example event:

{
  "type": "order.created",
  "timestamp": "2026-01-11T14:30:00Z",
  "order_id": "order_123",
  "customer_id": "cust_456",
  "items": [
    {"product_id": "prod_1", "quantity": 2}
  ],
  "total": 99.99,
  "version": 1
}

Note: Include version field for event versioning (critical for schema evolution)
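
One way a consumer might use the version field is to upgrade old events to the newest shape before processing them. The sketch below assumes a hypothetical version 2 that replaces the float `total` with an integer `total_cents`; only version 1 is defined in this document, so the v2 shape and the function name `upcast_order_created` are illustrative.

```python
def upcast_order_created(event):
    """Return the event in the latest (hypothetical v2) shape.

    Assumption for illustration: v2 replaces the float `total` with an
    integer `total_cents`. Only version 1 appears in the example event.
    """
    version = event.get("version", 1)
    if version == 1:
        # Copy everything except the field being replaced; leave the input untouched.
        upgraded = {k: v for k, v in event.items() if k != "total"}
        upgraded["total_cents"] = round(event["total"] * 100)
        upgraded["version"] = 2
        return upgraded
    if version == 2:
        return event
    raise ValueError(f"unsupported order.created version: {version}")


v1_event = {"type": "order.created", "order_id": "order_123", "total": 99.99, "version": 1}
v2_event = upcast_order_created(v1_event)
```

Handlers written against the latest shape then never need to branch on version themselves.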

Service subscribing:

eventBus.subscribe('order.created', async (event) => {
  console.log(`Processing order ${event.order_id}`);

  // Decrement inventory
  await inventoryService.decrementStock(event.items);

  // Publish event for others
  await eventBus.publish('inventory.updated', {
    order_id: event.order_id,
    status: 'decremented'
  });
});

Pros:

Loose coupling (services don’t know about each other)
Scalable (can add listeners without changing publisher)
Resilient (if one service is slow, doesn’t block others)
Auditable (event history is an audit trail)

Cons:

Harder to debug (request spans multiple services asynchronously)
Eventual consistency (order created, payment might fail later)
Operational complexity (need event broker)
Ordering challenges (events might arrive out of order)

Gotchas:

1. "Event published but nobody listening"
   Bad: Event disappears, nobody processes it
   Good: Monitor for unprocessed events, alert if missing listeners

2. "Event processed twice"
   Bad: Payment processed twice, customer charged twice
   Good: Idempotent processing (processing same event twice = safe)

3. "No ordering guarantees"
   Bad: "order.confirmed" arrives before "order.created"
   Good: Listeners handle events arriving in any order
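
Gotcha 2 can be sketched as a consumer that remembers which events it has already handled. This assumes each event carries a unique `event_id`, which is a hypothetical field (the example event earlier does not include one), and it uses an in-memory set where a production system would need a durable store.

```python
class PaymentHandler:
    """Charges at most once per event, even if the broker redelivers."""

    def __init__(self):
        self._seen = set()  # in production this would be a durable store
        self.charges = []   # stands in for calls to a payment provider

    def handle(self, event):
        event_id = event["event_id"]  # hypothetical unique ID per event
        if event_id in self._seen:
            return False  # duplicate delivery: skip the side effect
        self.charges.append((event["order_id"], event["total"]))
        self._seen.add(event_id)
        return True


handler = PaymentHandler()
event = {"event_id": "evt_1", "order_id": "order_123", "total": 99.99}
first = handler.handle(event)
second = handler.handle(event)  # simulated redelivery
```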

Resilience Patterns

See /pb-patterns-resilience for Retry, Circuit Breaker, Rate Limiting, Cache-Aside, and Bulkhead patterns – defensive patterns for making systems reliable under failure.

Data Access Patterns

Pattern: Repository Pattern

Problem: Data access code scattered everywhere. Hard to test. Hard to change database.

Solution: Central place for data access. All queries go through repository.

Structure:

Without Repository:
  User Service → SQL queries directly → Database
  Order Service → SQL queries directly → Database
  (Duplication, hard to test)

With Repository:
  User Service → User Repository → Database
  Order Service → Order Repository → Database
  (Centralized, easy to test)

Example:

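import sqlite3

class UserRepository:
    """Single entry point for user queries. Assumes db is a sqlite3.Connection."""

    def __init__(self, db):
        self.db = db

    def get_by_id(self, user_id):
        """Get user by ID (returns one row, or None if not found)."""
        cursor = self.db.execute("SELECT * FROM users WHERE id = ?", (user_id,))
        return cursor.fetchone()

    def create(self, email, name):
        """Create new user and return its ID."""
        cursor = self.db.execute(
            "INSERT INTO users (email, name) VALUES (?, ?)",
            (email, name),
        )
        self.db.commit()
        return cursor.lastrowid

    def update(self, user_id, email=None, name=None):
        """Update only the fields that were passed."""
        # Compare with None so an empty string is still a valid new value
        if email is not None:
            self.db.execute("UPDATE users SET email = ? WHERE id = ?", (email, user_id))
        if name is not None:
            self.db.execute("UPDATE users SET name = ? WHERE id = ?", (name, user_id))
        self.db.commit()

    def delete(self, user_id):
        """Delete user."""
        self.db.execute("DELETE FROM users WHERE id = ?", (user_id,))
        self.db.commit()

# Usage (db is an open sqlite3.Connection with a users table)
repo = UserRepository(db)
user = repo.get_by_id(123)
repo.update(123, name="New Name")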

Benefits:

Centralized data access (one place to change queries)
Easy to test (mock repository for unit tests)
Easy to swap databases (change repository, not whole app)
Consistency (same query patterns everywhere)
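
The testing benefit can be sketched with an in-memory stand-in that exposes the same `get_by_id` and `create` methods as `UserRepository`. Both `InMemoryUserRepository` and the `register_user` function are hypothetical names used for illustration; the idea is that business logic depends on the repository's methods, not on a database.

```python
class InMemoryUserRepository:
    """Test double exposing the same methods as UserRepository."""

    def __init__(self):
        self._rows = {}
        self._next_id = 1

    def get_by_id(self, user_id):
        return self._rows.get(user_id)

    def create(self, email, name):
        user_id = self._next_id
        self._next_id += 1
        self._rows[user_id] = {"id": user_id, "email": email, "name": name}
        return user_id


def register_user(repo, email, name):
    """Hypothetical service function; depends only on the repository interface."""
    if "@" not in email:
        raise ValueError("invalid email")
    return repo.create(email.lower(), name)


# A unit test can run this without any database connection.
repo = InMemoryUserRepository()
new_id = register_user(repo, "Ada@Example.com", "Ada")
```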

Pattern: DTO (Data Transfer Object)

Problem: Returning database objects directly couples the API to the schema. If the database schema changes, the API breaks.

Solution: Create separate object for API responses. API only returns DTOs.

How it works:

Without DTO (Tight coupling):
  Database: user {id, email, name, password_hash, created_at, updated_at}
  API returns entire user object
  Client sees password_hash (security issue!)
  Schema change breaks API

With DTO (Loose coupling):
  Database: user {id, email, name, password_hash, created_at, updated_at}
  API: class UserDTO {id, email, name}
  API returns only DTO fields
  Schema changes, API unchanged

Example:

from dataclasses import dataclass
from datetime import datetime

# Database model (has extra fields)
class User:
    id: int
    email: str
    name: str
    password_hash: str  # Don't expose!
    created_at: datetime
    updated_at: datetime
    last_login: datetime

# API DTO (only expose necessary fields)
@dataclass
class UserDTO:
    id: int
    email: str
    name: str

# API endpoint (app is assumed to be a FastAPI-style application,
# db a SQLAlchemy-style session)
@app.get("/users/{user_id}")
def get_user(user_id: int):
    # user is None when no row matches; real code should return a 404 here
    user = db.query(User).filter(User.id == user_id).first()

    # Convert to DTO
    dto = UserDTO(
        id=user.id,
        email=user.email,
        name=user.name
    )

    return dto  # Only return DTO, not User object

Benefits:

Security (don’t expose internal fields)
Flexibility (database schema ≠ API contract)
Clarity (API shows exactly what’s available)

API Design Patterns

See /pb-patterns-api for API design patterns including Pagination, Versioning, REST, GraphQL, and gRPC.

Integration Patterns

Pattern: Strangler Fig Pattern

Problem: An old system needs replacing, but it can’t be rewritten all at once.

Solution: New system gradually takes over. Old and new run together.

How it works:

Phase 1: Build new system alongside old
  Requests → Old System (still handling everything)
            → New System (not used yet)

Phase 2: Migrate one thing at a time
  Requests → Router → New System (for payments)
                   → Old System (for everything else)

Phase 3: Keep migrating
  Requests → Router → New System (for payments, orders)
                   → Old System (for legacy parts)

Phase 4: Remove old system when everything migrated
  Requests → New System (complete replacement)
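
The router in phases 2 and 3 can be sketched as a prefix table. `StranglerRouter` and its method names are hypothetical, and the two handlers are plain functions standing in for the old and new systems; a real router would usually be a reverse proxy or API gateway rule set.

```python
class StranglerRouter:
    """Sends migrated path prefixes to the new system, everything else to the old one."""

    def __init__(self, old_handler, new_handler):
        self._old = old_handler
        self._new = new_handler
        self._migrated = set()

    def migrate(self, prefix):
        self._migrated.add(prefix)

    def rollback(self, prefix):
        # The old system is still running, so rollback is just a routing change.
        self._migrated.discard(prefix)

    def handle(self, path):
        for prefix in self._migrated:
            if path == prefix or path.startswith(prefix + "/"):
                return self._new(path)
        return self._old(path)


router = StranglerRouter(
    old_handler=lambda path: f"old:{path}",
    new_handler=lambda path: f"new:{path}",
)
phase1 = router.handle("/payments/42")          # nothing migrated yet
router.migrate("/payments")
phase2 = router.handle("/payments/42")          # payments now served by the new system
still_old = router.handle("/orders/7")          # everything else unchanged
router.rollback("/payments")
after_rollback = router.handle("/payments/42")  # back on the old system
```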

Benefits:

No downtime (systems run in parallel)
Gradual migration (low risk)
Ability to rollback (old system still there)
Real traffic testing (new system handles real requests)

Antipatterns: When Patterns Fail

Patterns are powerful but can backfire. Learn from failures.

SOA Gone Wrong: Too Many Services

What happened: A team split a growing system into microservices too aggressively

Decision: "Decompose everything into services"
Result: 200+ services, too fine-grained

Problems:
- Service discovery nightmare (which service talks to which?)
- Testing hell (integration tests spanning 200 services)
- Deployment chaos (coordinating 200 deploys)
- Latency spikes (request spans 15 services)
- Ops complexity (200 services to monitor)

Lesson:
  Services should map to business domains, not functions
  Keep manageable: 3-10 services per team
  Not every function deserves its own service

Event-Driven Gone Wrong: Ordering Problems

What happened: Payment system with async events

Expected:
  1. order.created
  2. payment.processed
  3. order.confirmed

What actually happened:
  1. payment.processed ← arrived first!
  2. order.created
  3. order.confirmed

Why:
  Different services publish events asynchronously
  Network jitter (payment response faster)
  Message broker delays

Problem:
  Processing payment for order that doesn't exist
  Orphaned payments (no matching order)
  Data inconsistency

Lesson:
  Design events to handle out-of-order arrival
  Use idempotent processing (same event twice = safe)
  Add timestamp/sequence numbers to events
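
One possible way to tolerate the out-of-order arrival described above is to park a payment event until its order shows up. The sketch below is an assumption-laden illustration: `OrderProjection` and its method names are hypothetical, state lives in memory, and a real system would also need a timeout for payments whose order never arrives.

```python
class OrderProjection:
    """Tracks order state and parks payments that arrive before their order."""

    def __init__(self):
        self.orders = {}    # order_id -> status
        self._parked = {}   # order_id -> payment event waiting for its order

    def on_order_created(self, event):
        order_id = event["order_id"]
        # setdefault keeps a redelivered order.created from overwriting "paid"
        self.orders.setdefault(order_id, "created")
        parked = self._parked.pop(order_id, None)
        if parked is not None:
            self._apply_payment(parked)

    def on_payment_processed(self, event):
        if event["order_id"] not in self.orders:
            self._parked[event["order_id"]] = event  # too early: wait for order.created
            return
        self._apply_payment(event)

    def _apply_payment(self, event):
        self.orders[event["order_id"]] = "paid"

    def parked_count(self):
        return len(self._parked)


projection = OrderProjection()
projection.on_payment_processed({"order_id": "order_123"})  # arrives first
parked_before = projection.parked_count()
projection.on_order_created({"order_id": "order_123"})      # arrives second
```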

Repository Pattern Gone Wrong: Over-Abstraction

What happened: Repository for every entity

Result: 50+ Repository classes, all similar
  class UserRepository { ... }
  class AddressRepository { ... }
  class PaymentRepository { ... }
  ... 47 more ...

Problems:
- Boilerplate explosion
- Hides details under abstraction
- Over-generalized
- Slow to change (modify 50 files)

Lesson:
  Use Repository for complex entities
  Simple queries? Direct database calls are fine
  Patterns are tools, not dogma
  Sometimes simple > abstract

Pattern Interactions: How Patterns Work Together

Real systems combine multiple patterns. Understanding how they interact prevents conflicts.

Example: E-Commerce Order Processing

Architectural Level:

SOA: Separate Order, Payment, Inventory services
Event-Driven: Services communicate via events (not direct calls)

Service Internal Level:

Repository Pattern: Data access layer in each service
Cache-Aside: Redis cache in front of database
Connection Pooling: Database connection reuse

Communication Level:

Retry with Backoff: Retry failed calls to other services
Circuit Breaker: Stop calling failed service for a time
Bulkhead: Thread pool per service prevents resource starvation

Data Level:

DTO: API returns only public fields
Pagination: List endpoints return pages, not all records

System Design:

User Request
  ↓
API Gateway (Rate limiting, auth)
  ↓
[Order Service]
  • Repository for data access
  • Cache-Aside for product cache
  • Connection pool for DB
  ↓
[Event: order.created]
  ↓
Payment Service (Circuit Breaker)
  • Retry with backoff on failure
  • Bulkhead prevents thread exhaustion
  ↓
[Event: payment.processed] OR [Event: payment.failed]
  ↓
Inventory Service
  • Same pattern repetition
  ↓
[Event: order.completed]
  ↓
Notification Service
  • Job queue for emails (don't block response)

For resilience pattern interactions (Circuit Breaker + Retry, Cache-Aside + Bulkhead), see /pb-patterns-resilience.

SOA + Event-Driven + Saga Pattern

Real-World Scenario: Payment Processing

Service A (Order Service):
  Receives order
  Publishes: "payment_required"
  State: AWAITING_PAYMENT

Service B (Payment Service):
  Listens: "payment_required"
  Attempts payment with Retry + Circuit Breaker
  If success: Publishes "payment_received"
  If failure after retries: Publishes "payment_failed"

Service A (compensation):
  Listens: "payment_failed"
  Performs compensating action: Cancel order

Service C (Inventory):
  Listens: "payment_received"
  Decrements stock with Repository pattern
  Publishes: "stock_decremented"
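
The flow above, including the compensating action, can be sketched with plain functions and an in-memory handler table. Everything here is illustrative: the function names are hypothetical, the `card_ok` flag stands in for the outcome of a real charge wrapped in Retry and Circuit Breaker, and delivery is synchronous rather than brokered.

```python
from collections import defaultdict

handlers = defaultdict(list)
orders = {}               # owned by the Order Service
stock = {"prod_1": 3}     # owned by the Inventory Service
published = []            # record of event types, for inspection


def publish(event_type, payload):
    published.append(event_type)
    for handler in list(handlers[event_type]):
        handler(payload)


def place_order(order_id, product_id, card_ok):
    """Service A: record the order and ask for payment."""
    orders[order_id] = "AWAITING_PAYMENT"
    publish("payment_required", {"order_id": order_id, "product_id": product_id, "card_ok": card_ok})


def charge(event):
    """Service B: stand-in for a real charge with retries."""
    outcome = "payment_received" if event["card_ok"] else "payment_failed"
    publish(outcome, event)


def cancel_order(event):
    """Service A: compensating action when payment fails."""
    orders[event["order_id"]] = "CANCELLED"


def decrement_stock(event):
    """Service C: react to a successful payment."""
    stock[event["product_id"]] -= 1
    publish("stock_decremented", event)


handlers["payment_required"].append(charge)
handlers["payment_failed"].append(cancel_order)
handlers["payment_received"].append(decrement_stock)

place_order("order_1", "prod_1", card_ok=True)   # happy path
place_order("order_2", "prod_1", card_ok=False)  # triggers compensation
```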

DTO + Pagination + API Versioning

For Pagination and Versioning details, see /pb-patterns-api.

Real-World API Response

Old API (v1):
GET /users?page=1&per_page=20
{
  "users": [{id, email, password_hash, created_at, ...}],
  "page": 1,
  "per_page": 20,
  "total": 523
}

New API (v2, with DTO):
GET /v2/users?page=1&per_page=20
{
  "data": [{id, email, name}],  // DTO, no password_hash
  "pagination": {
    "page": 1,
    "per_page": 20,
    "total": 523,
    "has_next": true
  }
}

Benefits:
- DTO: Security (password_hash not exposed)
- Pagination: Prevents huge responses
- Versioning: Can change API without breaking v1 clients
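
A v2-style response like the one above could be assembled as follows. This is a minimal sketch: `list_users_v2` is a hypothetical helper, the rows are plain dictionaries generated for the example, and slicing an in-memory list stands in for a database query with LIMIT and OFFSET.

```python
from dataclasses import dataclass, asdict


@dataclass
class UserDTO:
    id: int
    email: str
    name: str


def list_users_v2(rows, page=1, per_page=20):
    """Build a v2-style response from raw user rows (dicts)."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    start = (page - 1) * per_page
    window = rows[start:start + per_page]
    # Mapping through the DTO drops every field it does not declare.
    data = [asdict(UserDTO(id=r["id"], email=r["email"], name=r["name"])) for r in window]
    return {
        "data": data,
        "pagination": {
            "page": page,
            "per_page": per_page,
            "total": len(rows),
            "has_next": start + per_page < len(rows),
        },
    }


# 45 made-up rows, each carrying a password_hash that must not leak
rows = [
    {"id": i, "email": f"user{i}@example.com", "name": f"User {i}", "password_hash": "x"}
    for i in range(1, 46)
]
response = list_users_v2(rows, page=3, per_page=20)
```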

When to Apply Patterns

Too many patterns:

[NO] Every new problem → find a pattern
[NO] Stacking Strangler Fig, Event-Driven, Microservices, Circuit Breaker, etc. without a concrete need for each
[NO] System becomes too complex to understand

Right amount of patterns:

[YES] Use patterns for recurring problems
[YES] Only when simpler solution doesn't work
[YES] Understand pattern before using it
[YES] Document why pattern was chosen

Pattern checklist:

☐ Problem is common (not unique to this system)
☐ Pattern is proven (multiple successful implementations)
☐ Context fits (system matches pattern requirements)
☐ Trade-offs understood (know pros and cons)
☐ Simpler solution tried (patterns are last resort)
☐ Team understands (can maintain, debug, extend)

Integration with Playbook

Pattern Family: This is the core patterns command. It covers foundational architectural, data access, and integration patterns; API patterns live in /pb-patterns-api.

Related Pattern Commands (Pattern Family):

/pb-patterns-async - Async patterns (callbacks, promises, async/await, reactive, workers, job queues)
/pb-patterns-db - Database patterns (connection pooling, optimization, replication, sharding)
/pb-patterns-distributed - Distributed patterns (saga, CQRS, eventual consistency, 2PC)

How They Work Together:

pb-patterns-core → Foundation (SOA, Event-Driven, Repository, DTO, Strangler Fig)
    ↓
pb-patterns-async → Async operations (implement Event-Driven, job queues)
    ↓
pb-patterns-db → Database implementation (pooling for performance)
    ↓
pb-patterns-distributed → Multi-service coordination (saga, CQRS)

Architecture & Design Decision:

/pb-adr - Document why specific patterns chosen
/pb-guide - System design and pattern selection
/pb-deployment - How patterns affect deployment strategy

Testing & Operations:

/pb-security - Security patterns and secure code
/pb-performance - Performance optimization using patterns
/pb-testing - Testing pattern implementations
/pb-incident - Handling pattern failures

Related Commands:

/pb-patterns-resilience - Resilience patterns (Retry, Circuit Breaker, Rate Limiting, Cache-Aside, Bulkhead)
/pb-patterns-async - Async patterns for non-blocking operations
/pb-patterns-db - Database patterns for data access
/pb-patterns-distributed - Distributed patterns for multi-service coordination
/pb-adr - Document pattern selection decisions

Created: 2026-01-11 | Category: Architecture | Tier: L
