Tier 1: Basic Discovery Process
Trigger: User submits search query
Completion: Search results updated with structured company data
Step 1: Search Execution & XML Collection
Job Type: processing_jobs.type = 'basic_discovery'
Process Flow
Search Execution & HTML Parsing
- Submit company name search to Handelsregister
- Handle disambiguation → Get complete results list
- Parse HTML result table to extract basic discovery data:
  - Company Name: From <span class="marginLeft20"> element
  - Register Info: From <span class="fontWeightBold"> (e.g., "District court Stuttgart VR 720092")
    - Extract register_court: "District court Stuttgart"
    - Extract register_type: "VR" (or HRB, HRA, etc.)
    - Extract register_number: "720092"
  - Location: From <span class="verticalText"> in location column
  - Registration Status: From <span class="verticalText"> in status column (e.g., "currently registered")
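The register info split described above can be sketched as follows. This is a minimal illustration, not the project's parser: the function name parse_register_info, the RegisterInfo type, and the list of accepted register types are assumptions, and the pattern only covers strings shaped like the example "District court Stuttgart VR 720092".

```python
import re
from typing import NamedTuple, Optional

# Assumed set of register types; the source only names VR, HRB and HRA explicitly.
REGISTER_TYPES = ("HRA", "HRB", "GnR", "PR", "VR")

# court (lazy) + whitespace + register type + whitespace + numeric register number
_REGISTER_RE = re.compile(
    r"^(?P<court>.+?)\s+(?P<type>" + "|".join(REGISTER_TYPES) + r")\s+(?P<number>\d+)$"
)


class RegisterInfo(NamedTuple):
    register_court: str
    register_type: str
    register_number: str


def parse_register_info(text: str) -> Optional[RegisterInfo]:
    """Split a register info string into court, type and number.

    Returns None when the text does not match the expected shape, so the
    caller can decide how to handle unexpected formats.
    """
    normalized = " ".join(text.split())  # collapse runs of whitespace from the HTML
    match = _REGISTER_RE.match(normalized)
    if match is None:
        return None
    return RegisterInfo(match["court"], match["type"], match["number"])
```

For the example string, parse_register_info("District court Stuttgart VR 720092") yields register_court "District court Stuttgart", register_type "VR" and register_number "720092". Returning None rather than raising keeps a single malformed row from aborting the whole result table.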
Path Decision & Entity Creation
- Call store_discovery_results RPC with parsed data
- Path A (≤3 results): Download SI XML for each entity, store with basic data
- Path B (4-20 results): Store basic data only, mark as discovered for user selection
- Path C (>20 results): Mark search as result_count_exceeds_limit
Next Job: Creates processing_jobs.type = 'xml_parsing' for entities with new or updated XML
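The path decision in Step 1 comes down to two thresholds on the result count. The sketch below shows that branching only; DiscoveryPath and choose_path are hypothetical names, and the handling of zero results (treated as Path A with nothing to download) is an assumption, since the thresholds above do not address it.

```python
from enum import Enum


class DiscoveryPath(Enum):
    DOWNLOAD_XML = "A"      # <=3 results: download SI XML for each entity
    USER_SELECTION = "B"    # 4-20 results: store basic data, mark as discovered
    LIMIT_EXCEEDED = "C"    # >20 results: mark search as result_count_exceeds_limit


# Thresholds taken from the path descriptions above.
MAX_AUTO_DOWNLOAD = 3
MAX_STORED_RESULTS = 20


def choose_path(result_count: int) -> DiscoveryPath:
    """Map the number of search results to Path A, B or C."""
    if result_count < 0:
        raise ValueError("result_count must not be negative")
    if result_count <= MAX_AUTO_DOWNLOAD:
        # Assumption: zero results also land here, with nothing to download.
        return DiscoveryPath.DOWNLOAD_XML
    if result_count <= MAX_STORED_RESULTS:
        return DiscoveryPath.USER_SELECTION
    return DiscoveryPath.LIMIT_EXCEEDED
```

Only Path A produces stored XML, so only that path would lead to an xml_parsing job for the affected entities.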
Step 2: LLM-Based Data Extraction
Job Type: processing_jobs.type = 'xml_parsing'
Process Flow
XML Retrieval
- Read stored XML from entities.si_document_xml
- Validate XML structure and content availability
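A minimal sketch of the validation step, using Python's standard xml.etree.ElementTree. The function name validate_si_xml is hypothetical, and "content availability" is interpreted here as "the root element has child elements or non-blank text"; the real check may be stricter.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def validate_si_xml(xml_text: Optional[str]) -> Optional[ET.Element]:
    """Return the parsed root element, or None if the XML is unusable.

    xml_text stands in for the value read from entities.si_document_xml.
    """
    # Missing or blank column value: nothing to parse.
    if not xml_text or not xml_text.strip():
        return None
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Structurally invalid XML (unclosed tags, stray characters, ...).
        return None
    # Assumed availability check: reject documents with an empty root element.
    has_children = len(root) > 0
    has_text = bool((root.text or "").strip())
    if not (has_children or has_text):
        return None
    return root
```

A caller would skip LLM parsing, and likely record an error status, whenever this returns None.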
LLM-Based Parsing
- Extract structured data using LLM to handle varying XML formats
- Parse board members, addresses, capital, business purpose
- Handle format variations across different German courts
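Because LLM output is free-form, one common safeguard is to ask for JSON and check it before anything is written to the database. The sketch below assumes that approach; the JSON keys, the ExtractedEntity type and parse_llm_response are all hypothetical and only mirror the fields listed above (board members, addresses, capital, business purpose). It does not show the LLM call itself.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ExtractedEntity:
    # Hypothetical target shape for the fields named in this step.
    board_members: List[str] = field(default_factory=list)
    addresses: List[str] = field(default_factory=list)
    capital: Optional[str] = None
    business_purpose: Optional[str] = None


def _string_list(data: dict, key: str) -> List[str]:
    """Read an optional list field and coerce its items to strings."""
    value = data.get(key) or []
    if not isinstance(value, list):
        raise ValueError(f"{key} must be a list")
    return [str(item) for item in value]


def parse_llm_response(raw: str) -> ExtractedEntity:
    """Parse the LLM's JSON reply; raise ValueError if it is not usable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"LLM reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("LLM reply must be a JSON object")
    return ExtractedEntity(
        board_members=_string_list(data, "board_members"),
        addresses=_string_list(data, "addresses"),
        capital=data.get("capital"),
        business_purpose=data.get("business_purpose"),
    )
```

Missing fields fall back to empty values rather than failing, which suits court formats that omit some sections; a malformed reply raises ValueError so the job can be retried or flagged.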
Database Update
- Store extracted structured data in entity fields
- Update entity processing status
- Mark parsing completion timestamp
Search Update
- Update search results with newly parsed entity data
- Trigger frontend refresh for live results display
Completion: Update entities.processing_status = 'basic_discovery_complete'