
Tier 1: Basic Discovery Process

Trigger: User submits search query

Completion: Search results updated with structured company data

Step 1: Search Execution & XML Collection

Job Type: processing_jobs.type = 'basic_discovery'

Process Flow

  1. Search Execution & HTML Parsing

    • Submit company name search to Handelsregister
    • Handle the disambiguation step to obtain the complete results list
    • Parse HTML result table to extract basic discovery data:
      • Company Name: From <span class="marginLeft20"> element
      • Register Info: From <span class="fontWeightBold"> (e.g., "District court Stuttgart VR 720092")
        • Extract register_court: "District court Stuttgart"
        • Extract register_type: "VR" (or HRB, HRA, etc.)
        • Extract register_number: "720092"
      • Location: From <span class="verticalText"> in location column
      • Registration Status: From <span class="verticalText"> in status column (e.g., "currently registered")
  2. Path Decision & Entity Creation

    • Call store_discovery_results RPC with parsed data
    • Path A (≤3 results): Download SI XML for each entity, store with basic data
    • Path B (4-20 results): Store basic data only, mark as discovered for user selection
    • Path C (>20 results): Mark search as result_count_exceeds_limit
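
The register-string split and the result-count thresholds above can be sketched as follows. This is a minimal illustration, not the production parser: the function names `parse_register_info` and `choose_path` are hypothetical, the list of register types beyond VR, HRB and HRA is an assumption, and the regular expression assumes the register type and number appear at the end of the string as in the example.

```python
import re
from typing import NamedTuple, Optional

# Register types assumed for this sketch; the text above names VR, HRB and HRA
# and leaves the rest as "etc.".
REGISTER_TYPES = ("HRA", "HRB", "GnR", "PR", "VR")

# Court name (lazy), then a known register type, then the numeric register number.
REGISTER_RE = re.compile(
    r"^(?P<court>.+?)\s+(?P<type>" + "|".join(REGISTER_TYPES) + r")\s+(?P<number>\d+)\b"
)


class RegisterInfo(NamedTuple):
    register_court: str
    register_type: str
    register_number: str


def parse_register_info(text: str) -> Optional[RegisterInfo]:
    """Split a register string into court, type and number; None if it does not match."""
    match = REGISTER_RE.match(text.strip())
    if match is None:
        return None
    return RegisterInfo(match["court"], match["type"], match["number"])


def choose_path(result_count: int) -> str:
    """Map a result count to Path A, B or C using the thresholds listed above."""
    if result_count <= 3:
        return "A"  # download SI XML for each entity
    if result_count <= 20:
        return "B"  # store basic data only, await user selection
    return "C"      # result_count_exceeds_limit


print(parse_register_info("District court Stuttgart VR 720092"))
print(choose_path(7))
```

Returning `None` for an unmatched string lets the caller decide whether to skip the row or store it without register fields; which of those the real pipeline does is not stated here.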

Next Job: Creates a job with processing_jobs.type = 'xml_parsing' for each entity with new or updated XML

Step 2: LLM-Based Data Extraction

Job Type: processing_jobs.type = 'xml_parsing'

Process Flow

  1. XML Retrieval

    • Read stored XML from entities.si_document_xml
    • Validate XML structure and content availability
  2. LLM-Based Parsing

    • Extract structured data using an LLM to handle varying XML formats
    • Parse board members, addresses, capital, business purpose
    • Handle format variations across different German courts
  3. Database Update

    • Store extracted structured data in entity fields
    • Update entity processing status
    • Record the parsing completion timestamp
  4. Search Update

    • Update search results with newly parsed entity data
    • Trigger frontend refresh for live results display
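
The XML validation in the retrieval step could look roughly like the sketch below, run before any LLM call so that empty or malformed documents are rejected cheaply. The function name `validate_si_xml` is hypothetical, and "content availability" is interpreted here as "the root element has child elements or text", which is an assumption rather than the documented rule.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def validate_si_xml(xml_text: Optional[str]) -> bool:
    """Return True if the stored XML is present, well-formed and non-empty."""
    # Missing or whitespace-only column value: nothing to parse.
    if not xml_text or not xml_text.strip():
        return False
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Malformed XML, e.g. a truncated download.
        return False
    # Assumed meaning of "content availability": the root carries children or text.
    has_children = len(root) > 0
    has_text = bool((root.text or "").strip())
    return has_children or has_text


print(validate_si_xml("<doc><name>ACME</name></doc>"))
print(validate_si_xml("<doc><name>"))
```

A check like this only confirms that the document parses; whether the expected fields are present is still left to the LLM-based extraction.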

Completion: Set entities.processing_status = 'basic_discovery_complete'