Logging Strategy & Standards
Comprehensive guidance for designing effective logging that aids troubleshooting without creating noise.
Principle: Good logging embodies /pb-preamble thinking (reveal assumptions, surface problems) and /pb-design-rules thinking (especially Transparency and Silence: systems should be observable when important, quiet otherwise).
Logs must invite scrutiny. They should reveal assumptions and make failures obvious, not hide them with verbosity or silence.
Resource Hint: sonnet - logging standards review is structured and pattern-based
When to Use
- Setting up logging for a new service or module
- Reviewing logging practices during code review
- Investigating noisy or insufficient logs in production
- Standardizing logging across a codebase
Purpose
Logging is critical for observability in production. This guide helps you:
- Determine appropriate log levels for different events
- Eliminate redundant and noisy logs
- Ensure logs are actionable and context-rich
- Standardize logging across your codebase
- Verify security and compliance requirements
Log Levels: When to Use Each
DEBUG
Use for: Detailed troubleshooting information
logger.debug("Entered function process_order()", extra={"user_id": 123})
logger.debug("Query took 45ms", extra={"query": "SELECT ...", "rows": 50})
logger.debug("Cache hit for key: user_profile_123")
Characteristics:
- Enabled only during development or when investigating specific issues
- Includes variable values, loop iterations, internal state
- Should not be logged to production by default (configure via log level)
Pitfalls:
- Not useful in production (logging is disabled anyway)
- Creates noise if left at DEBUG level unnecessarily
INFO
Use for: Important business events and state changes
logger.info("User registered", extra={"user_id": 456, "email": "user@example.com"})
logger.info("Order created", extra={"order_id": "ORD-789", "customer_id": 456, "total": 99.99})
logger.info("Payment processed successfully", extra={"payment_id": "PAY-123", "amount": 99.99})
logger.info("Job completed", extra={"job_id": 999, "duration_ms": 5000, "status": "success"})
Characteristics:
- Visible in production
- Tracks user-visible actions and business events
- Includes IDs and relevant context
- Follows “Verb + noun + context” pattern
Pitfalls:
- “Processing user” - too vague
- “Got here” - non-actionable
- “User registration initiated” - clear and actionable
WARNING
Use for: Recoverable problems and unexpected but handled situations
logger.warning("Slow database query detected", extra={
"query_ms": 2500,
"threshold_ms": 1000,
"query": "SELECT ... FROM orders WHERE customer_id = ?"
})
logger.warning("External service degraded, retrying", extra={
"service": "payment_provider",
"retry_count": 2,
"timeout_ms": 5000
})
logger.warning("Cache miss spike detected", extra={
"miss_rate": 0.45,
"threshold": 0.20,
"duration_sec": 60
})
Characteristics:
- Indicates something unexpected happened but the system recovered
- Usually indicates fallback behavior
- Includes metrics or context for investigation
Pitfalls:
- Warning for every retried request (too noisy)
- Warning for expected rate limit responses (should be INFO if handled)
- Warning for unusual patterns: slow queries, high error rates
ERROR
Use for: Genuine error conditions that need attention
logger.error("Failed to charge payment", extra={
"payment_id": "PAY-456",
"reason": "Card declined",
"error_code": "card_declined",
"stack_trace": "..." # Include only if helpful for root cause
})
logger.error("Database connection failed", extra={
"host": "db.prod.example.com",
"error": "connection timeout",
"timeout_ms": 5000,
"attempt": 3
})
Characteristics:
- Operation failed; action is required
- Include enough context to investigate without access to customer data
- Include error codes, error messages, and relevant context
- Stack traces helpful only for unexpected errors
Critical:
- Never log passwords, API keys, PII, or sensitive data
- Log error codes and codes that help identify the issue
CRITICAL
Use for: System-wide failures requiring immediate action
logger.critical("Database unavailable - all requests failing", extra={
"service": "primary_database",
"status": "connection_refused",
"impact": "total_outage"
})
logger.critical("Authentication service down", extra={
"service": "auth_service",
"response_code": 503,
"health_check": "failed"
})
Characteristics:
- System is down or severely degraded
- Triggers page/alert to on-call
- Should be rare (aim for < 1 per month)
Pitfalls:
- Using CRITICAL for issues that only affect one user
- Using CRITICAL only for platform-wide outages
Common Logging Patterns
Authentication & Authorization
# [YES] Good: Log security events without exposing credentials
logger.info("User login successful", extra={
"user_id": 789,
"login_method": "email_password",
"ip_address": "203.0.113.42"
})
logger.warning("Failed login attempt", extra={
"email": "user@example.com", # OK to log email, not password
"attempt": 3,
"reason": "invalid_password"
})
logger.error("Account locked after failed attempts", extra={
"user_id": 789,
"failed_attempts": 5,
"lockout_duration_min": 30
})
# [NO] Bad: Logging credentials
logger.debug("Login attempt", extra={"username": "user@example.com", "password": "secret123"})
External Service Calls
# [YES] Good: Log request, response, and timing
logger.info("Payment service called", extra={
"service": "stripe",
"method": "charge",
"amount": 99.99,
"request_id": "req_123abc"
})
logger.warning("Payment service slow", extra={
"service": "stripe",
"latency_ms": 3500,
"timeout_ms": 5000
})
logger.error("Payment service error", extra={
"service": "stripe",
"status_code": 500,
"error_message": "Internal Server Error",
"request_id": "req_123abc"
})
Database Operations
# [YES] Good: Log queries that matter
logger.info("Order created in database", extra={
"order_id": "ORD-999",
"customer_id": 456,
"items_count": 3
})
logger.warning("Slow database query", extra={
"query": "SELECT * FROM orders ...",
"duration_ms": 2000,
"rows_returned": 50000
})
# [NO] Bad: Logging every SELECT (creates noise)
logger.debug("SELECT user WHERE id = 123")
logger.debug("SELECT orders WHERE customer_id = 456")
Job/Task Processing
# [YES] Good: Log job lifecycle
logger.info("Background job started", extra={
"job_id": 999,
"job_type": "send_email",
"user_id": 456
})
logger.info("Background job completed", extra={
"job_id": 999,
"duration_ms": 5000,
"status": "success"
})
logger.error("Background job failed", extra={
"job_id": 999,
"error": "SMTP connection timeout",
"retries_remaining": 2,
"retry_after_sec": 60
})
Structured Logging Best Practices
Consistent Format
# [YES] Good: JSON structured logging
import json
import logging
class JSONFormatter(logging.Formatter):
def format(self, record):
return json.dumps({
"timestamp": self.formatTime(record),
"level": record.levelname,
"message": record.getMessage(),
"logger": record.name,
**record.extra if hasattr(record, 'extra') else {}
})
logger = logging.getLogger(__name__)
handler = logging.StreamHandler()
handler.setFormatter(JSONFormatter())
logger.addHandler(handler)
# Usage:
logger.info("User registered", extra={
"user_id": 123,
"email": "user@example.com"
})
Include Correlation IDs (Microservices)
import uuid
from contextvars import ContextVar
correlation_id_var: ContextVar[str] = ContextVar('correlation_id', default='')
def log_with_correlation(message, level, **context):
"""Log with automatic correlation ID for request tracing."""
context['correlation_id'] = correlation_id_var.get()
logger.log(level, message, extra=context)
# Middleware to set correlation ID
def correlation_id_middleware(request):
correlation_id = request.headers.get('X-Correlation-ID', str(uuid.uuid4()))
correlation_id_var.set(correlation_id)
return request
Context and Exception Handling
# [YES] Good: Include exception context
try:
process_payment(order)
except PaymentError as e:
logger.error("Payment processing failed", extra={
"order_id": order.id,
"error_code": e.code,
"error_message": str(e),
"exception": type(e).__name__
})
raise
Log Level Configuration by Environment
Development
DEBUG: All levels enabled (catch all issues early)
Staging
INFO: Business events only (monitor production-like behavior)
WARNING: Unusual patterns
ERROR: Failed operations
CRITICAL: System failures
Production
INFO: Business events (user actions, transactions)
WARNING: Unexpected conditions (slow requests, retries)
ERROR: Failed operations (requires investigation)
CRITICAL: System outages (page on-call)
DEBUG: Disabled (logs to /dev/null)
Configuration Example (Python):
import os
import logging
log_level = os.getenv('LOG_LEVEL', 'INFO')
logging.basicConfig(level=getattr(logging, log_level))
# Specific module log levels
logging.getLogger('vendor_library').setLevel(logging.WARNING) # Less verbose for 3rd party
logging.getLogger('myapp.payment').setLevel(logging.DEBUG) # More verbose for critical
Common Issues & Fixes
Problem: “Log Bombing” - Too Many Logs
[NO] Example:
for user_id in user_ids:
logger.info(f"Processing user {user_id}") # Logs 1000 times!
logger.info(f"Fetched data for user {user_id}")
logger.info(f"Updated database for user {user_id}")
[YES] Fix:
logger.info("Starting bulk user processing", extra={"total_users": len(user_ids)})
for user_id in user_ids:
# Only log errors, not normal flow
try:
process_user(user_id)
except Exception as e:
logger.error("Failed to process user", extra={
"user_id": user_id,
"error": str(e)
})
logger.info("Bulk user processing completed", extra={
"total_users": len(user_ids),
"duration_sec": elapsed_time
})
Problem: Missing Context
[NO] Bad:
logger.error("Connection failed") # Which connection? Which service?
logger.warning("Request timed out") # Which request? What timeout?
[YES] Good:
logger.error("Database connection failed", extra={
"host": "db.prod.example.com",
"port": 5432,
"error": "connection refused",
"timeout_ms": 5000
})
logger.warning("API request timed out", extra={
"service": "payment_provider",
"endpoint": "/api/charges",
"timeout_ms": 5000,
"attempt": 2
})
Problem: Logging Sensitive Data
[NO] Bad:
logger.info("User login", extra={
"email": user.email,
"password": user.password, # NEVER log this!
"ssn": user.ssn # NEVER log this!
})
[YES] Good:
logger.info("User login successful", extra={
"user_id": user.id,
"email_hash": hash(user.email), # Hash for verification
"ip_address": request.remote_addr
})
Logging Checklist
Before deploying, verify:
- No sensitive data: No passwords, API keys, PII in logs
- Appropriate levels: DEBUG/INFO in right places
- Unique identifiers: Include IDs (user_id, order_id, request_id)
- Correlation IDs: All related requests traceable (microservices)
- Error context: Errors include error codes and context
- Not redundant: Same information not logged twice
- Not noisy: Not logging every normal operation
- Parsing-friendly: JSON structured logging (not raw strings)
- Performance impact: Logging overhead acceptable in hot paths
Related Commands
/pb-security- Logging sensitive data safely/pb-observability- Logging as part of observability/pb-incident- Using logs during incident investigation/pb-guide- Implementing logging in development/pb-testing- Testing logging behavior
Tools Reference
Tools to consider:
- Local: Python
logging, Node.jswinston, Gozap - Cloud: AWS CloudWatch, GCP Cloud Logging, Azure Monitor
- Aggregation: ELK Stack, Splunk, Datadog, New Relic
Created: 2026-01-11 | Category: Code Review | Tier: M