Backend Review: Infrastructure & Reliability Focus

Multi-perspective code review combining the expertise of Alex Chen (Infrastructure & Resilience) and Jordan Okonkwo (Testing & Reliability).

When to use: Backend features, API endpoints, services, database operations, infrastructure changes.

Resource Hint: opus - Systems thinking + gap detection. Parallel execution of both agents recommended.

How This Works

Two expert perspectives review in parallel, then synthesize:

Alex’s Review - Infrastructure lens
- What could fail? How do we recover?
- Graceful degradation. Systems thinking. Observability.
- Does this scale? Can we deploy it safely?
Jordan’s Review - Reliability lens
- What gaps exist in testing? What could go wrong?
- Error cases. Edge cases. Concurrency. Data integrity.
- Would tests catch production bugs?
Synthesize - Combined perspective
- Identify trade-offs (resilience vs complexity?)
- Surface disagreements (if any)
- Recommend approval or revisions

Alex’s Infrastructure Review

See /pb-alex-infra for the comprehensive infrastructure review framework and checklist.

For backend-specific review, focus on:

Failure Modes: What database/service failures could cascade? How quickly detected?
Graceful Degradation: If DB is slow, does API hang or return cached data?
Deployment Safety: Is rollout gradual? Can rollback happen in < 5 minutes?
Observability: Do logs include request context? Are metrics collected?
Capacity Planning: Are database connection limits set? Load tested?
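The graceful degradation question above can be made concrete with a small sketch. It assumes a hypothetical in-process cache (`_cache`), a stand-in query function (`slow_db_lookup`), and an arbitrary time budget; none of these names come from the playbook, and a real service would likely use its own cache and database client.

```python
import concurrent.futures
import time

# Hypothetical in-process cache; a real service might use an external cache.
_cache = {"user:1": {"name": "cached-name"}}

_executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def slow_db_lookup(key, delay=0.6):
    """Stand-in for a database query that has become slow."""
    time.sleep(delay)
    return {"name": "fresh-name"}


def get_with_fallback(key, query=slow_db_lookup, timeout_s=0.2):
    """Return fresh data if the query finishes in time, else cached data.

    Returns a (value, source) pair where source is "fresh" or "stale".
    Re-raises the timeout when there is nothing cached to fall back on.
    """
    future = _executor.submit(query, key)
    try:
        value = future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # The query keeps running in its worker thread; we only stop waiting.
        if key in _cache:
            return _cache[key], "stale"
        raise
    _cache[key] = value
    return value, "fresh"
```

Returning the source alongside the value lets the caller log or count stale responses, which ties the degradation path back to the observability question.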

Alex’s Red Flags for Backend:

No health checks on database connections
Single point of failure in service architecture
Manual recovery process (can’t auto-rollback)
No monitoring of critical database queries
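As one illustration of the first red flag, a database health check can be as small as a trivial query with timing. This is a minimal sketch using the standard-library `sqlite3` module as a stand-in for whatever database the service actually uses; the function name `check_database` and the shape of the returned dict are assumptions.

```python
import sqlite3
import time


def check_database(conn):
    """Run a trivial query and report status plus latency in milliseconds."""
    started = time.monotonic()
    try:
        conn.execute("SELECT 1").fetchone()
        status, error = "ok", None
    except sqlite3.Error as exc:
        # Any driver-level failure marks the dependency as unavailable.
        status, error = "unavailable", str(exc)
    elapsed_ms = (time.monotonic() - started) * 1000
    return {"status": status, "latency_ms": round(elapsed_ms, 2), "error": error}
```

Recording latency as well as status gives a metric that can be tracked over time, not only a pass/fail signal.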

Jordan’s Testing Review

See /pb-jordan-testing for the comprehensive testing review framework and checklist.

For backend-specific review, focus on:

Error Path Testing: Are timeouts, connection failures, and database errors tested?
Concurrency & Race Conditions: Are async handlers tested under load? Shared state mutations safe?
Data Invariants: Are database constraints enforced? Could data corruption happen?
Integration Testing: Are real database queries tested (not just mocks)? Connection pooling validated?
Gap Detection: What edge cases could cause production bugs? What’s untested?
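For the concurrency item above, one possible shape for a race-condition test is to fire many concurrent requests at a handler that mutates shared state and then assert on the invariant. The `Inventory` class and `run_load` helper below are hypothetical; the sketch assumes an `asyncio`-based handler.

```python
import asyncio


class Inventory:
    """Hypothetical shared state mutated by concurrent async handlers."""

    def __init__(self, stock):
        self.stock = stock
        self._lock = asyncio.Lock()

    async def reserve(self):
        # The lock makes check-then-decrement atomic across the await below.
        async with self._lock:
            if self.stock <= 0:
                return False
            await asyncio.sleep(0)  # yield point, standing in for a database round trip
            self.stock -= 1
            return True


async def run_load(requests, stock):
    """Issue `requests` concurrent reservations; return (granted, remaining)."""
    inventory = Inventory(stock)
    results = await asyncio.gather(*(inventory.reserve() for _ in range(requests)))
    return sum(results), inventory.stock
```

Removing the `async with self._lock` line lets every coroutine pass the stock check before any of them decrements, so the same test would then observe overselling. That is the kind of bug a happy-path test with a single request never reaches.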

Jordan’s Red Flags for Backend:

Only happy path tested; error cases ignored
All database calls mocked; real queries never executed
No concurrency testing for async handlers
Data invariants undocumented or untested
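To illustrate the difference between mocked calls and real queries, the sketch below runs actual SQL against an in-memory SQLite database and relies on a `CHECK` constraint to enforce an invariant. The `accounts` schema and the helper names are assumptions made for the example; a production suite would more likely target the same database engine the service uses.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a non-negative balance invariant."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " id INTEGER PRIMARY KEY,"
        " balance_cents INTEGER NOT NULL CHECK (balance_cents >= 0))"
    )
    conn.execute("INSERT INTO accounts (id, balance_cents) VALUES (1, 500)")
    conn.commit()
    return conn


def withdraw(conn, account_id, amount_cents):
    """Debit an account; the CHECK constraint rejects overdrafts."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "UPDATE accounts SET balance_cents = balance_cents - ? WHERE id = ?",
            (amount_cents, account_id),
        )


def balance(conn, account_id):
    """Read the current balance with a real query."""
    row = conn.execute(
        "SELECT balance_cents FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()
    return row[0]
```

A test can then attempt an overdraft, expect `sqlite3.IntegrityError`, and confirm the balance is unchanged. A mocked database call would accept the overdraft silently.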

Combined Perspective: Backend Review Synthesis

When Alex & Jordan Agree:

✅ Infrastructure is sound AND tests are comprehensive
✅ Approve for merging

When They Disagree:

Common disagreement: “Should this be async or sync?”

Alex says: “Async is more resilient (decouples services)”
Jordan says: “Async is harder to test (race conditions)”
Resolution: Design for testability first; if tests can’t verify it, don’t do it.

Trade-offs to Surface:

Complexity vs Resilience
- More resilient = more complex
- More complex = more to test
- Find the sweet spot
Speed of Recovery vs Prevention
- Prevent all failures = expensive
- Recover quickly from failures = cost-effective
- Alex leans toward recovery; Jordan toward prevention
Coverage vs Diminishing Returns
- Perfect test coverage costs time
- As a rule of thumb (not a measured figure), 80% coverage catches roughly 90% of bugs
- Know your stopping point

Review Checklist

Before Review Starts

Self-review already completed (author did /pb-cycle steps 1-2)
Quality gates passed (lint, type check, tests all pass)
PR description explains what and why

During Alex’s Review

Failure modes identified
Observability sufficient
Deployment plan is safe
Graceful degradation considered

During Jordan’s Review

Tests cover critical paths
Error handling is tested
Edge cases considered
No race conditions

After Both Reviews

Feedback synthesized
Trade-offs explained
Blockers identified or cleared
Approval given (or revisions requested)

Review Decision Tree

1. Does infrastructure design pass Alex's review?
   NO → Ask for infrastructure changes before testing review
   YES → Continue

2. Does testing pass Jordan's review?
   NO → Ask for test changes (or architecture changes if tests can't isolate)
   YES → Continue

3. Are there trade-off disagreements?
   YES → Discuss (often both perspectives are right)
   NO → Continue

4. Is code ready to merge?
   YES → Approve
   NO → Request specific revisions
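The decision tree can also be read as a function from review state to the next action. The sketch below is one possible encoding; `ReviewState`, its field names, and the returned strings are hypothetical and only mirror the four questions above.

```python
from dataclasses import dataclass


@dataclass
class ReviewState:
    infra_passed: bool                  # step 1: Alex's review
    tests_passed: bool                  # step 2: Jordan's review
    open_tradeoff_disagreements: bool   # step 3
    ready_to_merge: bool                # step 4


def next_action(state: ReviewState) -> str:
    """Walk the four questions in order and return the first action that applies."""
    if not state.infra_passed:
        return "request infrastructure changes"
    if not state.tests_passed:
        return "request test changes"
    if state.open_tradeoff_disagreements:
        return "discuss trade-offs"
    if state.ready_to_merge:
        return "approve"
    return "request specific revisions"
```

The ordering matters: infrastructure problems are raised before the testing review, so a failing first step short-circuits the rest.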

Example: Payment Service Review

Code Being Reviewed: New payment processing API

Alex’s Review:

Infrastructure Check:

❌ Problem: No retry logic for payment processor failures
❌ Problem: No health check for payment service
✅ Good: Database transactions are atomic
✅ Good: Deployment is gradual

Alex’s Recommendation: Add retry logic with exponential backoff. Add health check.
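A minimal sketch of what Alex's recommendation could look like, assuming a hypothetical `TransientProcessorError` for failures that are safe to retry and arbitrary defaults for attempt count and delays. It also assumes the wrapped operation is safe to repeat, which for payments usually means the processor call carries an idempotency key.

```python
import random
import time


class TransientProcessorError(Exception):
    """Hypothetical error type for failures that are safe to retry."""


def call_with_retry(operation, max_attempts=4, base_delay_s=0.2,
                    max_delay_s=5.0, sleep=time.sleep):
    """Call `operation`, retrying transient failures with exponential backoff.

    The delay ceiling doubles each attempt (0.2s, 0.4s, 0.8s, ...) up to
    `max_delay_s`, and the actual wait is drawn uniformly from [0, ceiling]
    so that many clients do not retry in lockstep.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientProcessorError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            ceiling = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))
```

Passing `sleep` as a parameter is a testability choice: tests can substitute a recorder and verify the backoff schedule without waiting in real time.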

Jordan’s Review:

Testing Check:

❌ Problem: Only tests success case
❌ Problem: No test for network timeout
✅ Good: Concurrency is tested
✅ Good: Data invariants verified

Jordan’s Recommendation: Add tests for payment processor down, network timeout, invalid card response.
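The three cases Jordan names could be covered along these lines. The `charge` function and its status values are hypothetical stand-ins for the handler under review, and the processor is replaced with `unittest.mock.Mock` so each failure can be injected deterministically.

```python
import unittest
from unittest import mock


def charge(processor, amount_cents):
    """Hypothetical handler under test; maps processor outcomes to a status dict."""
    try:
        response = processor.charge(amount_cents)
    except TimeoutError:
        return {"status": "retry_later", "reason": "timeout"}
    except ConnectionError:
        return {"status": "retry_later", "reason": "processor_down"}
    valid = isinstance(response, dict) and response.get("result") in ("approved", "declined")
    if not valid:
        return {"status": "error", "reason": "invalid_response"}
    return {"status": response["result"]}


class ChargeFailureModes(unittest.TestCase):
    def test_processor_down(self):
        processor = mock.Mock()
        processor.charge.side_effect = ConnectionError("connection refused")
        self.assertEqual(charge(processor, 100)["reason"], "processor_down")

    def test_network_timeout(self):
        processor = mock.Mock()
        processor.charge.side_effect = TimeoutError("timed out")
        self.assertEqual(charge(processor, 100)["reason"], "timeout")

    def test_invalid_card_response(self):
        processor = mock.Mock()
        processor.charge.return_value = {"unexpected": "payload"}
        self.assertEqual(charge(processor, 100)["reason"], "invalid_response")

    def test_approved(self):
        processor = mock.Mock()
        processor.charge.return_value = {"result": "approved"}
        self.assertEqual(charge(processor, 100), {"status": "approved"})
```

Mocks fit here because the goal is to force specific failures from an external processor; they complement, rather than replace, integration tests against real database queries.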

Synthesis:

Trade-off Identified: Retry logic adds complexity. Do tests verify it correctly?

If yes: Implement with tests
If no: Simplify retry logic until tests can verify it

Approval: Conditional on both sets of changes: Alex’s retry logic and health check, and Jordan’s failure-case tests.

Comment Register

Findings posted as PR/issue comments follow ~/.claude/CLAUDE.md § GitHub Artifact Register: one load-bearing observation per comment, one sentence per finding, no narration or severity adjectives.

Related Commands

/pb-review-code – General code review framework both agents apply
/pb-review-infrastructure – Infrastructure-focused review for backend dependencies
/pb-alex-infra – Alex’s deep dive: systems thinking, failure modes, resilience
/pb-jordan-testing – Jordan’s deep dive: gap detection, test coverage, reliability

When to Escalate

Escalate to Linus (Security) if:

Code handles payment, authentication, PII, or secrets
Protocol/cryptography choices need validation
Authorization boundaries need review

Escalate to Maya (Product) if:

API design affects user experience
Feature scope is unclear or growing
Product implications uncertain

Escalate to Sam (Documentation) if:

API needs clear documentation
Complex system needs architecture explanation
Knowledge transfer is important

Backend review: Infrastructure that doesn’t fail + tests that prove it
