
Technical Flow: Basic Research Process

This document provides a high-level overview of how data flows through the system during the basic research process. It serves as a navigation guide to help you find the relevant detailed documentation for each step.

Process Overview

The basic research process follows this flow:

User Search → Database Trigger → Worker Processing → Results Update → Frontend Display
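The flow above implies a small set of job-status transitions. The following sketch captures them in Python; the status names "queued", "running", and "failed" appear later in this document, while "completed" as the terminal success status is an assumption.

```python
# Hypothetical sketch of the job-status lifecycle implied by the flow above.
JOB_TRANSITIONS = {
    "queued": {"running"},               # a worker claims the job
    "running": {"completed", "failed"},  # the worker finishes or errors
    "failed": {"queued"},                # the retry trigger may requeue it
}

def is_valid_transition(old: str, new: str) -> bool:
    """Return True if a job may move from status `old` to status `new`."""
    return new in JOB_TRANSITIONS.get(old, set())
```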

Detailed Flow

1. User Search Submission

In the public.searches table:

| Field | Value | Description |
| --- | --- | --- |
| id | uuid | Auto-generated search ID |
| query | "Company Name GmbH" | User's search term |
| user_id | uuid | Current user's ID |
| status | "running" | Initial search status |
| created_at | timestamp | Search creation time |
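The field names and the initial status come from the table above; this client-side helper itself is hypothetical, and in practice `id` and `created_at` would be generated by the database.

```python
import uuid
from datetime import datetime, timezone

def new_search_row(query: str, user_id: str) -> dict:
    """Build a row for public.searches with the fields listed above."""
    return {
        "id": str(uuid.uuid4()),     # normally generated by the database
        "query": query,              # user's search term
        "user_id": user_id,
        "status": "running",         # initial search status
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
```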

2. Trigger Creates Processing Job

In the processing.processing_jobs table:

| Field | Value | Description |
| --- | --- | --- |
| id | uuid | Auto-generated job ID |
| type | "basic_discovery" | Job type identifier |
| status | "queued" | Initial job status |
| search_id | searches.id | Links to source search |
| search_term | searches.query | User query |
| data | "{user.id}" | Additional job metadata |
| created_at | timestamp | Job creation time |
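The trigger's effect can be sketched as a function from a search row to a job row. The column names follow the table above; the exact shape of the `data` payload is an assumption.

```python
def job_from_search(search: dict) -> dict:
    """Sketch of the row the database trigger inserts into
    processing.processing_jobs for a new search."""
    return {
        "type": "basic_discovery",       # job type identifier
        "status": "queued",              # initial job status
        "search_id": search["id"],       # links back to public.searches
        "search_term": search["query"],  # user query
        "data": {"user_id": search["user_id"]},  # assumed payload shape
    }
```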

3. Worker Picks Up The Job

Updates in processing.processing_jobs:

| Field | Value | Description |
| --- | --- | --- |
| status | "running" | Worker starts processing |
| worker_id | worker_id | ID of the worker that claimed the job |
| started_at | timestamp | Job start time |
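As an in-memory illustration of the claim step, the sketch below picks the oldest queued job and applies the field updates from the table above. In production this would be a single atomic UPDATE (for example, using PostgreSQL's FOR UPDATE SKIP LOCKED pattern); the oldest-first claim strategy is an assumption.

```python
from datetime import datetime, timezone

def claim_job(jobs: list, worker_id: str):
    """Simulate a worker claiming the oldest queued job."""
    for job in sorted(jobs, key=lambda j: j.get("created_at", "")):
        if job["status"] == "queued":
            job["status"] = "running"   # worker starts processing
            job["worker_id"] = worker_id
            job["started_at"] = datetime.now(timezone.utc).isoformat()
            return job
    return None  # no queued jobs available
```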

4. Worker Completes Processing & Stores Results

After the worker queries the Handelsregister API and processes the results, it calls the store_discovery_results RPC function to handle all database updates.

Worker RPC Call Parameters by Path:

| Path | Trigger | p_result_count | p_results Structure | Outcome |
| --- | --- | --- | --- | --- |
| A: Auto-Download | ≤3 results | 1-3 | [{"name": "...", "register_court": "...", "register_type": "...", "register_number": "...", "location": "...", "registration_status": "...", "si_document_xml": "..."}] | Entities created with status xml_ready, automatically triggering the next phase |
| B: User Selection | 4-20 results | 4-20 | [{"name": "...", "register_court": "...", "register_type": "...", "register_number": "...", "location": "...", "registration_status": "..."}] | Entities created with status discovered; they await user selection |
| C: Too Many | >20 results | >20 | [] | Search marked as error; no entities created |
| D: No Results | 0 results | 0 | [] | Search completed with no results |

Common RPC Parameters:

  • p_job_id: The processing job UUID that the worker picked up
  • p_search_id: The original search UUID from the processing job
  • p_result_count: Total number of results found by the API
  • p_results: JSON array of results (structure varies by path)

The RPC function handles all database table updates atomically. For detailed information about what tables are affected, see the RPC function documentation.
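The path selection and the resulting RPC parameters can be sketched as follows. The thresholds and parameter names come from the tables above; the helper functions themselves are hypothetical.

```python
def discovery_path(result_count: int) -> str:
    """Map the API result count to the RPC path (A-D) described above."""
    if result_count == 0:
        return "D"   # no results: search completed empty
    if result_count <= 3:
        return "A"   # auto-download: full results incl. si_document_xml
    if result_count <= 20:
        return "B"   # user selection: metadata only
    return "C"       # too many: search marked as error

def rpc_params(job_id: str, search_id: str, results: list) -> dict:
    """Assemble store_discovery_results parameters; p_results is
    emptied on paths C and D, matching the table above."""
    path = discovery_path(len(results))
    return {
        "p_job_id": job_id,
        "p_search_id": search_id,
        "p_result_count": len(results),
        "p_results": results if path in ("A", "B") else [],
    }
```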

Path B: User Selection Follow-up

For Path B results, users can later select specific entities to process:

  • User selects specific entities from the results list
  • User triggers start_entity_basic_discovery RPC function for selected entities
  • Database Trigger automatically creates basic_discovery jobs for each selected entity
  • Worker 1 picks up and processes the basic discovery jobs
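The fan-out performed by the trigger after start_entity_basic_discovery can be sketched as one queued basic_discovery job per selected entity. The `entity_id` field name in this sketch is an assumption.

```python
def jobs_for_selection(entity_ids: list) -> list:
    """Sketch of the per-entity job fan-out described above."""
    return [
        {"type": "basic_discovery", "status": "queued", "entity_id": eid}
        for eid in entity_ids
    ]
```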

5. XML Parsing Phase

In the content.entities table:

| Field | Value | Source |
| --- | --- | --- |
| processing_status | "basic_discovery_complete" | Job completion status |
| si_document_json | Parsed result as JSON | LLM output |
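The update written back to content.entities can be sketched as below. The column names come from the table above; the helper itself is hypothetical.

```python
import json

def xml_parsing_update(parsed_result: dict) -> dict:
    """Fields written to content.entities once the LLM has parsed the XML."""
    return {
        "processing_status": "basic_discovery_complete",  # job completion status
        "si_document_json": json.dumps(parsed_result),    # LLM output as JSON
    }
```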


6. Handling Worker Errors and Retries

The system is designed to be resilient to transient errors during job processing. The retry logic is centralized in the database to keep worker implementations simple.

  • Worker Action on Error: If a worker fails to complete a job, its only responsibility is to update the job's status to failed and record the error message in the result field.

Updates in processing.processing_jobs on failure:

| Field | Value | Description |
| --- | --- | --- |
| status | "failed" | Worker signals a processing error |
| result | {"error": {"source": "api", "type": "timeout", "code": "request_timeout", "message": "API timeout after 30 seconds"}} | A structured JSONB object with detailed error information |

Error Structure Details: The error object contains four standardized fields for comprehensive error tracking:

  • source: Where the error originated (e.g., "api", "database", "rpc_function", "worker")
  • type: Category of error (e.g., "timeout", "validation", "connection", "uncaught_exception")
  • code: Specific error code for programmatic handling (e.g., "request_timeout", "invalid_input", "generic_sql_error")
  • message: Human-readable error description for debugging and logging

This structured format enables better error monitoring, automated retry decisions, and debugging workflows compared to simple string messages.
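A worker can build the structured error object with a small helper like the one below; the four field names follow the list above, while the helper itself is a hypothetical convenience.

```python
def job_error(source: str, type_: str, code: str, message: str) -> dict:
    """Build the structured error object stored in the job's result field."""
    return {
        "error": {
            "source": source,    # e.g. "api", "database", "rpc_function", "worker"
            "type": type_,       # e.g. "timeout", "validation", "connection"
            "code": code,        # e.g. "request_timeout", "invalid_input"
            "message": message,  # human-readable description
        }
    }
```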

  • Automated Retry by Trigger: This status update automatically fires the on_job_failure database trigger, which compares the job's retry_count against its max_retries.
      • If retries remain: The trigger resets the job's status to queued, increments the retry count, and clears the worker ID, putting the job back in the pool for another attempt.
      • If no retries remain: The trigger marks the job as permanently failed and updates the status of the associated search or entity record to a specific terminal state. For example, it might set an entity's status to xml_download_failed or xml_parsing_failed to provide clear insight into the failure and prevent reprocessing loops.
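The retry decision made by the trigger can be sketched in Python as follows. The retry_count and max_retries field names come from the text; the terminal-state update of the linked search or entity record is omitted here.

```python
def on_job_failure(job: dict) -> str:
    """Sketch of the on_job_failure trigger logic described above."""
    if job["retry_count"] < job["max_retries"]:
        job["status"] = "queued"     # back in the pool for another attempt
        job["retry_count"] += 1
        job["worker_id"] = None
        return "requeued"
    # No retries left: the job stays failed, and the associated search or
    # entity record is moved to a terminal state (e.g. xml_parsing_failed).
    return "terminal"
```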