Worker Design Philosophy

Core Principle: Database as Single Source of Truth

Workers are designed as stateless executors that rely entirely on the database for state management, workflow decisions, and coordination. This approach ensures reliability, observability, and recovery capabilities in our distributed processing system.

What is a Worker?

All workers in our system follow the same foundational principles:

Stateless Execution

  • No memory between operations - Workers don't retain context from previous executions
  • No shared state - Workers don't communicate with each other directly
  • Database dependency - All state and decisions come from database queries
  • Crash resilient - Workers can be killed and restarted without losing work

Database-Driven Architecture

  • Single source of truth - Database contains all processing state and business rules
  • Atomic operations - All database interactions are transactional
  • Complete audit trail - Every action is recorded for debugging and monitoring
  • Decision delegation - Workers ask the database "what should I do?" rather than deciding for themselves

Core Worker Pattern

Every worker operation follows this pattern (the claim step is sketched in code after the list):

  1. Query database for work or state information
  2. Execute business logic based on database-provided context
  3. Update database with results and new state
  4. Create next job if part of multi-step workflow
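
In code, the claim in step 1 can be a single atomic statement, which is what makes concurrent polling safe. A minimal sketch, assuming PostgreSQL with the psycopg2 driver and hypothetical column names (payload, started_at, and created_at are illustrative, not prescribed here); FOR UPDATE SKIP LOCKED prevents two workers from claiming the same row:

    import psycopg2  # assumed driver; any SQL client with transactions works

    def claim_next_job(conn, job_type):
        """Atomically claim one queued job of the given type, or return None."""
        with conn:  # commit on success, roll back on error
            with conn.cursor() as cur:
                cur.execute(
                    """UPDATE processing_jobs
                          SET status = 'running', started_at = now()
                        WHERE id = (SELECT id FROM processing_jobs
                                     WHERE type = %s AND status = 'queued'
                                     ORDER BY created_at
                                       FOR UPDATE SKIP LOCKED
                                     LIMIT 1)
                    RETURNING id, payload""",
                    (job_type,),
                )
                return cur.fetchone()  # None when no work is available

    job = claim_next_job(psycopg2.connect("dbname=jobs"), "basic_discovery")  # hypothetical DSN

Claiming the job and marking it running in one statement means a crashed worker strands at most the single job it had claimed, and that job stays visible in the table for re-queuing.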

Unified Worker Architecture

All workers in our system follow the same foundational pattern - they are stateless job processors that use the database as their single source of truth. Workers are differentiated by the job type they process, not by their execution pattern.

Core Worker Principles

  • Stateless execution - No memory between job executions
  • Database-driven - All state and decisions come from database queries
  • Job queue based - All work flows through the processing_jobs table
  • Crash resilient - Workers can be killed and restarted without losing work
  • Type-specific - Each worker only processes jobs of its designated type

Universal Job Processing Pattern

Every worker follows the same five-step pattern for every job; an end-to-end sketch in code follows the list:

  1. "Give me a job to do" (poll database)

    • Query database for available work: SELECT * FROM processing_jobs WHERE type = 'my_job_type' AND status = 'queued'
    • Use atomic operations to claim jobs and prevent race conditions
    • Natural load balancing - idle workers automatically pick up more work
  2. "I'm starting this job" (update status to running)

    • Immediately mark job as running with timestamp
    • Record worker ID for tracking and debugging
    • This establishes clear ownership and prevents duplicate processing
  3. Do the actual work (business logic execution)

    • Execute the business logic for this specific job type
    • Handle all necessary external API calls, file operations, etc.
    • Maintain focus on the single task at hand
  4. "I finished, here's the result" (update status and result)

    • Update job status to completed or failed
    • Store all results, error messages, and metadata in database
    • Provide complete audit trail of what happened
  5. "Create next job if needed" (multi-step workflows)

    • For multi-step workflows, create the next job in the sequence
    • Example: the document_collection worker creates an ocr job when it completes
    • Single-step jobs (like basic_discovery) don't create follow-up jobs
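
Putting the five steps together: a minimal end-to-end sketch, again assuming PostgreSQL with psycopg2 and hypothetical schema details (the worker_id, payload, and result columns and the NEXT_JOB_TYPE chaining map are illustrative, not prescribed here). do_work stands in for the type-specific business logic of step 3:

    import time

    import psycopg2

    POLL_INTERVAL = 1.0     # assumed tuning value
    WORKER_ID = "worker-1"  # hypothetical identifier, recorded for debugging

    # Hypothetical step-5 chaining table: which job type follows which.
    NEXT_JOB_TYPE = {"document_collection": "ocr", "ocr": "chunking"}

    def claim(conn, job_type):
        # Steps 1-2: claim a queued job and mark it running in one atomic statement.
        with conn, conn.cursor() as cur:
            cur.execute(
                """UPDATE processing_jobs
                      SET status = 'running', started_at = now(), worker_id = %s
                    WHERE id = (SELECT id FROM processing_jobs
                                 WHERE type = %s AND status = 'queued'
                                 ORDER BY created_at
                                   FOR UPDATE SKIP LOCKED
                                 LIMIT 1)
                RETURNING id, payload""",
                (WORKER_ID, job_type),
            )
            return cur.fetchone()

    def do_work(job_type, payload):
        # Step 3: placeholder for the type-specific business logic.
        return f"processed {payload}"

    def finish(conn, job_id, status, result, next_type):
        # Steps 4-5 share one transaction: the result and any follow-up job are
        # recorded atomically, so a crash cannot persist one without the other.
        with conn, conn.cursor() as cur:
            cur.execute(
                """UPDATE processing_jobs
                      SET status = %s, result = %s, finished_at = now()
                    WHERE id = %s""",
                (status, result, job_id),
            )
            if status == "completed" and next_type is not None:
                # Hand the result forward as the next job's payload (illustrative).
                cur.execute(
                    """INSERT INTO processing_jobs (type, status, payload)
                       VALUES (%s, 'queued', %s)""",
                    (next_type, result),
                )

    def run(conn, job_type):
        while True:
            job = claim(conn, job_type)
            if job is None:  # queue empty: idle briefly, then poll again
                time.sleep(POLL_INTERVAL)
                continue
            job_id, payload = job
            try:
                result = do_work(job_type, payload)
                finish(conn, job_id, "completed", result, NEXT_JOB_TYPE.get(job_type))
            except Exception as exc:
                # Record the error; the failed job stays visible for retry.
                finish(conn, job_id, "failed", str(exc), None)

    if __name__ == "__main__":
        run(psycopg2.connect("dbname=jobs"), "document_collection")  # hypothetical DSN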

User Experience vs Worker Implementation

From the worker's perspective: All jobs are processed identically through the job queue system.

User experience differences are handled by the frontend:

  • Wait for results: Frontend polls job status until completion (basic discovery)
  • Background processing: Frontend shows "processing in background" and notifies when done (document collection, OCR, chunking)

Workers are completely unaware of how the frontend presents the job to users.
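
The "wait for results" path, for example, reduces to polling the job row until it reaches a terminal state. A sketch in Python for consistency with the other examples here; a real frontend would typically poll an HTTP status endpoint that runs this same query, and the column names remain the hypothetical ones used above:

    import time

    def wait_for_job(conn, job_id, timeout=60.0, interval=0.5):
        """Poll a job's status until it reaches a terminal state (sketch)."""
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            with conn, conn.cursor() as cur:
                cur.execute(
                    "SELECT status, result FROM processing_jobs WHERE id = %s",
                    (job_id,),
                )
                status, result = cur.fetchone()  # assumes the job exists
            if status in ("completed", "failed"):
                return status, result
            time.sleep(interval)
        raise TimeoutError(f"job {job_id} not finished after {timeout}s")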

Implementation Benefits

Observability

  • Complete visibility into system state through database queries
  • Real-time monitoring of all job processing
  • Historical analysis of performance and error patterns across all worker types

Recovery and Debugging

  • Granular recovery - restart individual failed jobs
  • Complete audit trail of all processing steps
  • Manual intervention capabilities through database updates (see the sketch after this list)
  • Failed jobs remain in the queue for retry
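
Re-queuing a failed job, for instance, is a single status update; a sketch using the same hypothetical schema as the examples above:

    def requeue_failed_job(conn, job_id):
        """Reset a failed job so the next idle worker of its type retries it."""
        with conn, conn.cursor() as cur:
            cur.execute(
                """UPDATE processing_jobs
                      SET status = 'queued', worker_id = NULL, result = NULL
                    WHERE id = %s AND status = 'failed'""",
                (job_id,),
            )
            return cur.rowcount == 1  # False if the job was not in a failed state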

Scalability

  • Linear scaling - add workers without coordination overhead
  • Natural load balancing through database polling
  • No inter-worker communication or state synchronization needed
  • Each worker type can be scaled independently

Reliability

  • Single source of truth eliminates consistency problems
  • Atomic database operations ensure data integrity
  • Worker failures don't corrupt system state
  • Failed jobs remain in the queue rather than being lost

Trade-offs and Considerations

Performance Overhead

  • Additional database round-trips between job steps
  • Serialization/deserialization of intermediate results
  • Acceptable cost for operational benefits

Database Load

  • Constant polling creates steady database traffic
  • Requires proper indexing and connection pooling (an example index follows this list)
  • Database becomes critical system component
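
Because every worker filters on the same columns, the polling query is the first thing to index. One plausible shape, assuming PostgreSQL: a partial index over queued rows stays small no matter how large the job history grows:

    def create_polling_index(conn):
        """Index backing SELECT ... WHERE type = %s AND status = 'queued' (sketch)."""
        with conn, conn.cursor() as cur:
            cur.execute(
                """CREATE INDEX IF NOT EXISTS processing_jobs_queued_idx
                       ON processing_jobs (type, created_at)
                    WHERE status = 'queued'"""
            )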

Complexity

  • Requires building job scheduling and coordination infrastructure
  • More complex than simple in-memory processing
  • Justified by reliability and observability requirements

This philosophy prioritizes reliability, observability, and operational simplicity over raw performance, providing a consistent pattern for all types of processing workflows.