
Technical Flow: Basic Research Process

This document provides a high-level overview of how data flows through the system during the basic research process. It serves as a navigation guide to help you find the relevant detailed documentation for each step.

Process Overview

The basic research process follows this flow:

User Search → Database Trigger → Worker Processing → Results Update → Frontend Display
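The flow above implies a small set of job-status transitions. The following sketch captures them in Python; the status names "queued", "running", and "failed" appear later in this document, while "completed" as the terminal success status is an assumption.

```python
# Hypothetical sketch of the job-status lifecycle implied by the flow above.
JOB_TRANSITIONS = {
    "queued": {"running"},               # a worker claims the job
    "running": {"completed", "failed"},  # the worker finishes or errors
    "failed": {"queued"},                # the retry trigger may requeue it
}

def is_valid_transition(old: str, new: str) -> bool:
    """Return True if a job may move from status `old` to status `new`."""
    return new in JOB_TRANSITIONS.get(old, set())
```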

Detailed Flow

1. User Search Submission

In the public.searches table:

| Field | Value | Description |
| --- | --- | --- |
| id | uuid | Auto-generated search ID |
| query | "Company Name GmbH" | User's search term |
| user_id | uuid | Current user's ID |
| status | "running" | Initial search status |
| created_at | timestamp | Search creation time |
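The field names and the initial status come from the table above; this client-side helper itself is hypothetical, and in practice `id` and `created_at` would be generated by the database.

```python
import uuid
from datetime import datetime, timezone

def new_search_row(query: str, user_id: str) -> dict:
    """Build a row for public.searches with the fields listed above."""
    return {
        "id": str(uuid.uuid4()),     # normally generated by the database
        "query": query,              # user's search term
        "user_id": user_id,
        "status": "running",         # initial search status
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
```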

2. Trigger Creates Processing Job

In the processing.processing_jobs table:

| Field | Value | Description |
| --- | --- | --- |
| id | uuid | Auto-generated job ID |
| type | "basic_discovery" | Job type identifier |
| status | "queued" | Initial job status |
| search_id | searches.id | Links to source search |
| search_term | searches.query | User query |
| data | "{user.id}" | Additional job metadata |
| created_at | timestamp | Job creation time |
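The trigger's effect can be sketched as a function from a search row to a job row. The column names follow the table above; the exact shape of the `data` payload is an assumption.

```python
def job_from_search(search: dict) -> dict:
    """Sketch of the row the database trigger inserts into
    processing.processing_jobs for a new search."""
    return {
        "type": "basic_discovery",       # job type identifier
        "status": "queued",              # initial job status
        "search_id": search["id"],       # links back to public.searches
        "search_term": search["query"],  # user query
        "data": {"user_id": search["user_id"]},  # assumed payload shape
    }
```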

3. Worker Picks Up The Job

Updates in processing.processing_jobs:

| Field | Value | Description |
| --- | --- | --- |
| status | "running" | Worker starts processing |
| worker_id | worker_id | ID of the worker that claimed the job |
| started_at | timestamp | Job start time |
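As an in-memory illustration of the claim step, the sketch below picks the oldest queued job and applies the field updates from the table above. In production this would be a single atomic UPDATE (for example, using PostgreSQL's FOR UPDATE SKIP LOCKED pattern); the oldest-first claim strategy is an assumption.

```python
from datetime import datetime, timezone

def claim_job(jobs: list, worker_id: str):
    """Simulate a worker claiming the oldest queued job."""
    for job in sorted(jobs, key=lambda j: j.get("created_at", "")):
        if job["status"] == "queued":
            job["status"] = "running"   # worker starts processing
            job["worker_id"] = worker_id
            job["started_at"] = datetime.now(timezone.utc).isoformat()
            return job
    return None  # no queued jobs available
```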

4. Worker Completes Processing & Stores Results

After the worker queries the Handelsregister API and processes the results, it calls the store_discovery_results RPC function to handle all database updates.

Worker RPC Call Parameters by Path:

| Path | Trigger | p_result_count | p_results Structure | Outcome |
| --- | --- | --- | --- | --- |
| A: Auto-Download | ≤3 results | 1-3 | [{"name": "...", "register_court": "...", "register_type": "...", "register_number": "...", "location": "...", "registration_status": "...", "si_document_xml": "..."}] | Entities created with status xml_ready, automatically triggering the next phase |
| B: User Selection | 4-20 results | 4-20 | [{"name": "...", "register_court": "...", "register_type": "...", "register_number": "...", "location": "...", "registration_status": "..."}] | Entities created with status discovered; they await user selection |
| C: Too Many | >20 results | >20 | [] | Search marked as error; no entities created |
| D: No Results | 0 results | 0 | [] | Search completed with no results |

Common RPC Parameters:

  • p_job_id: The processing job UUID that the worker picked up
  • p_search_id: The original search UUID from the processing job
  • p_result_count: Total number of results found by the API
  • p_results: JSON array of results (structure varies by path)

The RPC function handles all database table updates atomically. For detailed information about what tables are affected, see the RPC function documentation.
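The path selection and the resulting RPC parameters can be sketched as follows. The thresholds and parameter names come from the tables above; the helper functions themselves are hypothetical.

```python
def discovery_path(result_count: int) -> str:
    """Map the API result count to the RPC path (A-D) described above."""
    if result_count == 0:
        return "D"   # no results: search completed empty
    if result_count <= 3:
        return "A"   # auto-download: full results incl. si_document_xml
    if result_count <= 20:
        return "B"   # user selection: metadata only
    return "C"       # too many: search marked as error

def rpc_params(job_id: str, search_id: str, results: list) -> dict:
    """Assemble store_discovery_results parameters; p_results is
    emptied on paths C and D, matching the table above."""
    path = discovery_path(len(results))
    return {
        "p_job_id": job_id,
        "p_search_id": search_id,
        "p_result_count": len(results),
        "p_results": results if path in ("A", "B") else [],
    }
```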

Path B: User Selection Follow-up

For Path B results, users can later select specific entities to process:

  • User selects specific entities from the results list
  • User triggers start_entity_basic_discovery RPC function for selected entities
  • Database Trigger automatically creates basic_discovery jobs for each selected entity
  • Worker 1 picks up and processes the basic discovery jobs
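The fan-out performed by the trigger after start_entity_basic_discovery can be sketched as one queued basic_discovery job per selected entity. The `entity_id` field name in this sketch is an assumption.

```python
def jobs_for_selection(entity_ids: list) -> list:
    """Sketch of the per-entity job fan-out described above."""
    return [
        {"type": "basic_discovery", "status": "queued", "entity_id": eid}
        for eid in entity_ids
    ]
```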

5. XML Parsing Phase

In the content.entities table:

| Field | Value | Source |
| --- | --- | --- |
| processing_status | "basic_discovery_complete" | Job completion status |
| si_document_json | Parsed result as JSON | LLM output |
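The update written back to content.entities can be sketched as below. The column names come from the table above; the helper itself is hypothetical.

```python
import json

def xml_parsing_update(parsed_result: dict) -> dict:
    """Fields written to content.entities once the LLM has parsed the XML."""
    return {
        "processing_status": "basic_discovery_complete",  # job completion status
        "si_document_json": json.dumps(parsed_result),    # LLM output as JSON
    }
```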


6. Handling Worker Errors and Retries

The system is designed to be resilient to transient errors during job processing. The retry logic is centralized in the database to keep worker implementations simple.

  • Worker Action on Error: If a worker fails to complete a job, its only responsibility is to update the job's status to failed and record the error message in the result field.

Updates in processing.processing_jobs on failure:

| Field | Value | Description |
| --- | --- | --- |
| status | "failed" | Worker signals a processing error |
| result | {"error": {"source": "api", "type": "timeout", "code": "request_timeout", "message": "API timeout after 30 seconds"}} | A structured JSONB object with detailed error information |

Error Structure Details: The error object contains four standardized fields for comprehensive error tracking:

  • source: Where the error originated (e.g., "api", "database", "rpc_function", "worker")
  • type: Category of error (e.g., "timeout", "validation", "connection", "uncaught_exception")
  • code: Specific error code for programmatic handling (e.g., "request_timeout", "invalid_input", "generic_sql_error")
  • message: Human-readable error description for debugging and logging

This structured format enables better error monitoring, automated retry decisions, and debugging workflows compared to simple string messages.
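A worker can build the structured error object with a small helper like the one below; the four field names follow the list above, while the helper itself is a hypothetical convenience.

```python
def job_error(source: str, type_: str, code: str, message: str) -> dict:
    """Build the structured error object stored in the job's result field."""
    return {
        "error": {
            "source": source,    # e.g. "api", "database", "rpc_function", "worker"
            "type": type_,       # e.g. "timeout", "validation", "connection"
            "code": code,        # e.g. "request_timeout", "invalid_input"
            "message": message,  # human-readable description
        }
    }
```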

  • Automated Retry by Trigger: This status update automatically fires the on_job_failure database trigger, which compares the job's retry_count against its max_retries.
      • If retries remain: The trigger resets the job's status to queued, increments the retry count, and clears the worker ID, putting the job back in the pool for another attempt.
      • If no retries remain: The trigger marks the job as permanently failed and updates the status of the associated search or entity record to a specific terminal state. For example, it might set an entity's status to xml_download_failed or xml_parsing_failed to provide clear insight into the failure and prevent reprocessing loops.
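The retry decision made by the trigger can be sketched in Python as follows. The retry_count and max_retries field names come from the text; the terminal-state update of the linked search or entity record is omitted here.

```python
def on_job_failure(job: dict) -> str:
    """Sketch of the on_job_failure trigger logic described above."""
    if job["retry_count"] < job["max_retries"]:
        job["status"] = "queued"     # back in the pool for another attempt
        job["retry_count"] += 1
        job["worker_id"] = None
        return "requeued"
    # No retries left: the job stays failed, and the associated search or
    # entity record is moved to a terminal state (e.g. xml_parsing_failed).
    return "terminal"
```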