Technical Flow: Basic Research Process
This document provides a high-level overview of how data flows through the system during the basic research process. It serves as a navigation guide to help you find the relevant detailed documentation for each step.
Process Overview
The basic research process follows this flow:
User Search → Database Trigger → Worker Processing → Results Update → Frontend Display
Detailed Flow
1. User Search Submission
- User submits a company search query
- A new row is inserted into public.searches
In public.searches table:
| Field | Value | Description |
|---|---|---|
| id | uuid | Auto-generated search ID |
| query | "Company Name GmbH" | User's search term |
| user_id | uuid | Current user's ID |
| status | "running" | Initial search status |
| created_at | timestamp | Search creation time |
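The shape of the search row above can be sketched in Python. The `build_search_row` helper is hypothetical (not part of the system); in practice `id` and `created_at` are generated by the database, and they are filled in here only for illustration.

```python
import uuid
from datetime import datetime, timezone

def build_search_row(query: str, user_id: str) -> dict:
    """Build a row matching the public.searches fields listed above (sketch only)."""
    return {
        "id": str(uuid.uuid4()),        # auto-generated by the database in reality
        "query": query,
        "user_id": user_id,
        "status": "running",            # initial search status
        "created_at": datetime.now(timezone.utc).isoformat(),
    }

row = build_search_row("Company Name GmbH", "11111111-1111-1111-1111-111111111111")
```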
2. Trigger Creates Processing Job
- Database Trigger automatically creates new entry into processing.processing_jobs
In processing.processing_jobs table:
| Field | Value | Description |
|---|---|---|
| id | uuid | Auto-generated job ID |
| type | "basic_discovery" | Job type identifier |
| status | "queued" | Initial job status |
| search_id | searches.id | Links to source search |
| search_term | searches.query | User query |
| data | "{user.id}" | Additional job metadata |
| created_at | timestamp | Job creation time |
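The mapping the trigger performs can be sketched as a pure function from a search row to a job row. This is a minimal illustration of the field mapping in the table above, not the actual trigger implementation (which runs as SQL inside the database).

```python
import uuid

def job_from_search(search: dict) -> dict:
    """Sketch of the row the trigger inserts into processing.processing_jobs."""
    return {
        "id": str(uuid.uuid4()),           # auto-generated job ID
        "type": "basic_discovery",         # job type identifier
        "status": "queued",                # initial job status
        "search_id": search["id"],         # links back to the source search
        "search_term": search["query"],    # the user's query
        "data": {"user_id": search["user_id"]},  # additional job metadata
    }

job = job_from_search({"id": "s-1", "query": "Company Name GmbH", "user_id": "u-1"})
```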
3. Worker Picks Up The Job
- Worker 1 picks up the job and starts to process it
- Worker 1 updates processing.processing_jobs
Updates in processing.processing_jobs:
| Field | Value | Description |
|---|---|---|
| status | "running" | Worker starts processing |
| worker_id | uuid | ID of the worker that claimed the job |
| started_at | timestamp | Job starting time |
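The claim transition above can be sketched as a small state update. `claim_job` is a hypothetical helper showing the fields a worker sets when it takes a queued job; the real system performs this as a database update.

```python
from datetime import datetime, timezone

def claim_job(job: dict, worker_id: str) -> dict:
    """Apply the status transition a worker performs when picking up a queued job."""
    if job["status"] != "queued":
        raise ValueError("only queued jobs can be claimed")
    job.update(
        status="running",
        worker_id=worker_id,
        started_at=datetime.now(timezone.utc).isoformat(),
    )
    return job

job = claim_job({"id": "j-1", "status": "queued"}, worker_id="worker-1")
```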
4. Worker Completes Processing & Stores Results
After the worker queries the Handelsregister API and processes the results, it calls the store_discovery_results RPC function to handle all database updates.
Worker RPC Call Parameters by Path:
| Path | Trigger | p_result_count | p_results Structure | Outcome |
|---|---|---|---|---|
| A: Auto-Download | ≤3 results | 1-3 | [{"name": "...", "register_court": "...", "register_type": "...", "register_number": "...", "location": "...", "registration_status": "...", "si_document_xml": "..."}] | Entities created with status xml_ready, triggering the next phase automatically. |
| B: User Selection | 4-20 results | 4-20 | [{"name": "...", "register_court": "...", "register_type": "...", "register_number": "...", "location": "...", "registration_status": "..."}] | Entities created as "discovered", await user selection |
| C: Too Many | >20 results | >20 | [] | Search marked as error, no entities created |
| D: No Results | 0 results | 0 | [] | Search completed with no results |
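The path selection in the table above depends only on the result count, so it can be expressed as a small function. This is a sketch of the decision logic, not the worker's actual code.

```python
def classify_discovery_result(result_count: int) -> str:
    """Map the API result count onto the four RPC paths from the table above."""
    if result_count == 0:
        return "D"  # no results: search completed with no results
    if result_count <= 3:
        return "A"  # auto-download: entities created as xml_ready
    if result_count <= 20:
        return "B"  # user selection: entities created as "discovered"
    return "C"      # too many: search marked as error, no entities created
```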
Common RPC Parameters:
- p_job_id: The processing job UUID that the worker picked up
- p_search_id: The original search UUID from the processing job
- p_result_count: Total number of results found by the API
- p_results: JSON array of results (structure varies by path)
The RPC function handles all database table updates atomically. For detailed information about what tables are affected, see the RPC function documentation.
Path B: User Selection Follow-up
For Path B results, users can later select specific entities to process:
- User selects specific entities from the results list
- User triggers start_entity_basic_discovery RPC function for selected entities
- Database Trigger automatically creates basic_discovery jobs for each selected entity
- Worker 1 picks up and processes the basic discovery jobs
5. XML Parsing Phase
- When an entity's status is set to xml_ready, the create_xml_parsing_job database trigger automatically creates a new xml_parsing job.
- Worker 2 picks up the xml_parsing job.
- After processing, the worker calls the store_xml_parsing_results function to store the extracted data.
- Database tables updated: content.entities
In content.entities table:
| Field | Value | Source |
|---|---|---|
| processing_status | "basic_discovery_complete" | Job completion status |
| si_document_json | "Result in JSON" | LLM Result |
6. Handling Worker Errors and Retries
The system is designed to be resilient to transient errors during job processing. The retry logic is centralized in the database to keep worker implementations simple.
- Worker Action on Error: If a worker fails to complete a job, its only responsibility is to update the job's status to failed and record the error message in the result field.
Updates in processing.processing_jobs on failure:
| Field | Value | Description |
|---|---|---|
| status | "failed" | Worker signals a processing error |
| result | {"error": {"source": "api", "type": "timeout", "code": "request_timeout", "message": "API timeout after 30 seconds"}} | A structured JSONB object with detailed error information |
Error Structure Details: The error object contains four standardized fields for comprehensive error tracking:
- source: Where the error originated (e.g., "api", "database", "rpc_function", "worker")
- type: Category of error (e.g., "timeout", "validation", "connection", "uncaught_exception")
- code: Specific error code for programmatic handling (e.g., "request_timeout", "invalid_input", "generic_sql_error")
- message: Human-readable error description for debugging and logging
This structured format enables better error monitoring, automated retry decisions, and debugging workflows compared to simple string messages.
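A worker can assemble this structured payload with a small helper. `build_job_error` is a hypothetical convenience function, shown only to make the four-field contract concrete.

```python
def build_job_error(source: str, error_type: str, code: str, message: str) -> dict:
    """Assemble the structured error object stored in the job's result field."""
    return {
        "error": {
            "source": source,     # where the error originated, e.g. "api"
            "type": error_type,   # error category, e.g. "timeout"
            "code": code,         # machine-readable code, e.g. "request_timeout"
            "message": message,   # human-readable description
        }
    }

err = build_job_error("api", "timeout", "request_timeout", "API timeout after 30 seconds")
```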
- Automated Retry by Trigger: This status update automatically fires the on_job_failure database trigger.
- The trigger inspects the job's retry_count against its max_retries.
- If retries remain: The trigger resets the job's status to queued, increments the retry count, and clears the worker ID, putting it back in the pool for another attempt.
- If no retries remain: The trigger marks the job as permanently failed and updates the status of the associated search or entity record to a specific terminal state. For example, it might set an entity's status to xml_download_failed or xml_parsing_failed to provide clear insight into the failure and prevent reprocessing loops.