Worker Design Philosophy
Core Principle: Database as Single Source of Truth
Workers are designed as stateless executors that rely entirely on the database for state management, workflow decisions, and coordination. This approach ensures reliability, observability, and recovery capabilities in our distributed processing system.
What is a Worker?
All workers in our system follow the same foundational principles:
Stateless Execution
- No memory between operations - Workers don't retain context from previous executions
- No shared state - Workers don't communicate with each other directly
- Database dependency - All state and decisions come from database queries
- Crash resilient - Workers can be killed and restarted without losing anything
Database-Driven Architecture
- Single source of truth - Database contains all processing state and business rules
- Atomic operations - All database interactions are transactional
- Complete audit trail - Every action is recorded for debugging and monitoring
- Decision delegation - Workers ask the database "what should I do?" rather than deciding for themselves
Core Worker Pattern
Every worker operation follows this pattern:
- Query database for work or state information
- Execute business logic based on database-provided context
- Update database with results and new state
- Create next job if part of multi-step workflow
Unified Worker Architecture
All workers in our system follow the same foundational pattern - they are stateless job processors that use the database as their single source of truth. Workers are differentiated by the job type they process, not by their execution pattern.
Core Worker Principles
- Stateless execution - No memory between job executions
- Database-driven - All state and decisions come from database queries
- Job queue based - All work flows through the `processing_jobs` table
- Crash resilient - Workers can be killed and restarted without losing anything
- Type-specific - Each worker only processes jobs of its designated type
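For concreteness in the examples below, here is a minimal sketch of what the `processing_jobs` table could look like. The document doesn't specify a schema, so every column beyond `type` and `status` is an illustrative assumption (PostgreSQL syntax):

```sql
-- Minimal sketch of the processing_jobs table; the exact column set is
-- an assumption for illustration, not a documented schema.
CREATE TABLE processing_jobs (
    id           bigserial PRIMARY KEY,
    type         text NOT NULL,                 -- e.g. 'ocr', 'basic_discovery'
    status       text NOT NULL DEFAULT 'queued',-- queued | running | completed | failed
    payload      jsonb,                         -- input for the job
    result       jsonb,                         -- output written on completion
    error        text,                          -- failure message, if any
    worker_id    text,                          -- which worker claimed the job
    created_at   timestamptz NOT NULL DEFAULT now(),
    started_at   timestamptz,
    completed_at timestamptz
);
```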
Universal Job Processing Pattern
Every worker follows this identical 5-step pattern for every job:
"Give me a job to do" (poll database)
- Query database for available work:
SELECT * FROM processing_jobs WHERE type = 'my_job_type' AND status = 'queued' - Use atomic operations to claim jobs and prevent race conditions
- Natural load balancing - idle workers automatically pick up more work
- Query database for available work:
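One way to make the claim atomic in a single statement, assuming PostgreSQL and the sketch schema above; note that this folds the next step's status update into the claim itself:

```sql
-- Single-statement atomic claim (PostgreSQL): take the oldest queued
-- job of this type and mark it running in one step. SKIP LOCKED lets
-- concurrent workers pass over a row another worker is claiming.
UPDATE processing_jobs
SET status = 'running',
    worker_id = 'worker-42',   -- hypothetical worker identifier
    started_at = now()
WHERE id = (
    SELECT id
    FROM processing_jobs
    WHERE type = 'my_job_type'
      AND status = 'queued'
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING *;
```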
"I'm starting this job" (update status to running)
- Immediately mark job as
runningwith timestamp - Record worker ID for tracking and debugging
- Establishes clear ownership and prevents duplicate processing
- Immediately mark job as
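If polling and claiming are done as separate statements instead, the transition can still be made race-free with a guarded update; `worker_id` and `started_at` are illustrative columns from the sketch schema:

```sql
-- Portable alternative: a guarded status transition. If another worker
-- claimed the job first, this updates zero rows and the worker polls
-- again; checking the affected-row count prevents duplicate processing.
UPDATE processing_jobs
SET status = 'running',
    worker_id = 'worker-42',   -- hypothetical worker identifier
    started_at = now()
WHERE id = 123                 -- id found by the step-1 poll
  AND status = 'queued';
```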
3. Do the actual work (business logic execution)
- Execute the business logic for this specific job type
- Handle all necessary external API calls, file operations, etc.
- Maintain focus on the single task at hand
"I finished, here's the result" (update status and result)
- Update job status to
completedorfailed - Store all results, error messages, and metadata in database
- Provide complete audit trail of what happened
- Update job status to
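A sketch of the completion update, again using the illustrative columns from the schema above:

```sql
-- Record the outcome and result metadata; a 'failed' job would set the
-- error column instead. The result payload shown is hypothetical.
UPDATE processing_jobs
SET status = 'completed',
    result = '{"pages": 12}'::jsonb,
    completed_at = now()
WHERE id = 123;
```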
"Create next job if needed" (multi-step workflows)
- For multi-step workflows, create the next job in the sequence
- Example:
document_collectionworker createsocrjob when complete - Single-step jobs (like
basic_discovery) don't create follow-up jobs
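The follow-up job from the example above would be enqueued with a plain insert; the payload shape is hypothetical:

```sql
-- Enqueue the follow-up OCR job as the final act of the current one.
INSERT INTO processing_jobs (type, status, payload)
VALUES ('ocr', 'queued', '{"document_id": 987}'::jsonb);
```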
User Experience vs Worker Implementation
From the worker's perspective: all jobs are processed identically through the job queue system.
User experience differences are handled by the frontend:
- Wait for results: Frontend polls job status until completion (basic discovery)
- Background processing: Frontend shows "processing in background" and notifies when done (document collection, OCR, chunking)
Workers are completely unaware of how the frontend presents the job to users.
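Whatever API layer the frontend talks to, the status poll ultimately reduces to a single lookup; the job id here is hypothetical:

```sql
-- What a frontend status poll boils down to at the database, regardless
-- of the presentation mode chosen above.
SELECT status, result, error
FROM processing_jobs
WHERE id = 123;
```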
Implementation Benefits
Observability
- Complete visibility into system state through database queries
- Real-time monitoring of all job processing
- Historical analysis of performance and error patterns across all worker types
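As a sketch of what that visibility looks like in practice, a single query can give live counts and queue age per worker type and status (column names as assumed in the sketch schema above):

```sql
-- Live job counts and queue age per worker type and status
-- (an illustrative monitoring query).
SELECT type,
       status,
       count(*) AS jobs,
       now() - min(created_at) AS oldest_age
FROM processing_jobs
GROUP BY type, status
ORDER BY type, status;
```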
Recovery and Debugging
- Granular recovery - restart individual failed jobs
- Complete audit trail of all processing steps
- Manual intervention capabilities through database updates
- Failed jobs remain in queue for retry
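A manual-intervention sketch: requeuing one failed job is a single guarded update (illustrative columns as above):

```sql
-- Requeue one failed job for retry by resetting its status; the guard
-- on status = 'failed' avoids clobbering a job in another state.
UPDATE processing_jobs
SET status = 'queued',
    worker_id = NULL,
    started_at = NULL,
    error = NULL
WHERE id = 123
  AND status = 'failed';
```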
Scalability
- Linear scaling - add workers without coordination overhead
- Natural load balancing through database polling
- No inter-worker communication or state synchronization needed
- Each worker type can be scaled independently
Reliability
- Single source of truth eliminates consistency problems
- Atomic database operations ensure data integrity
- Worker failures don't corrupt system state
- Failed jobs remain in queue for retry
Trade-offs and Considerations
Performance Overhead
- Additional database round-trips between job steps
- Serialization/deserialization of intermediate results
- Acceptable cost for operational benefits
Database Load
- Constant polling creates steady database traffic
- Requires proper indexing and connection pooling
- Database becomes critical system component
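As a sketch of the kind of indexing this implies, a partial index on queued rows keeps the hot polling query cheap even as completed jobs accumulate (PostgreSQL):

```sql
-- Partial index covering only queued rows: the polling query stays
-- fast no matter how many completed/failed rows pile up.
CREATE INDEX processing_jobs_queued_idx
ON processing_jobs (type, created_at)
WHERE status = 'queued';
```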
Complexity
- Requires building job scheduling and coordination infrastructure
- More complex than simple in-memory processing
- Justified by reliability and observability requirements
This philosophy prioritizes reliability, observability, and operational simplicity over raw performance, providing a consistent pattern for all types of processing workflows.