Content Schema Tables

Access: Read-only for authenticated users, full access for workers

entities Table

Stores company records with extracted data and processing status.

Field Definitions

| Field | Type | Description | Enum Values |
| --- | --- | --- | --- |
| id | uuid | Primary key | - |
| name | text | Company name | - |
| legal_form | text | Legal form of the company | - |
| register_court | text | Court where company is registered | - |
| register_type | text | Register type (e.g., HRB, HRA, VR) | - |
| register_number | text | Official registration number | - |
| seat | text | Company location/seat | - |
| state | text | German state where the company is located (e.g., "Berlin", "Bayern") | - |
| registration_status | text | Current registration status (e.g., "currently registered") | - |
| address | text | Company address | - |
| raw | text | Raw data dump from the source system | - |
| si_document_xml | xml | Raw XML document from Handelsregister | - |
| si_document_json | jsonb | Parsed XML document | - |
| si_document_retrieved_at | timestamp | When XML was last retrieved | - |
| processing_status | processing_status_enum | Current basic discovery processing stage | discovered, basic_discovery_running, xml_ready, xml_parsing_running, basic_discovery_complete, xml_download_failed, xml_parsing_failed |
| shareholder_research_status | shareholder_research_status_enum | Status of optional shareholder research | not_started, ready, downloading, download_failed, downloaded, parsing, parsing_failed, complete |
| deep_research_status | deep_research_status_enum | Status of optional deep research | not_started, ready, running, failed, complete |
| created_at | timestamp | Record creation time | - |
| updated_at | timestamp | Last update time | - |
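
Client code that reads processing_status may want to validate the value instead of comparing raw strings. The following is a minimal sketch, assuming a Python client; the class name ProcessingStatus and the helper is_failed are hypothetical, and only the enum values themselves come from the table above.

```python
from enum import Enum


class ProcessingStatus(str, Enum):
    """Mirror of processing_status_enum (values copied from the field table)."""

    DISCOVERED = "discovered"
    BASIC_DISCOVERY_RUNNING = "basic_discovery_running"
    XML_READY = "xml_ready"
    XML_PARSING_RUNNING = "xml_parsing_running"
    BASIC_DISCOVERY_COMPLETE = "basic_discovery_complete"
    XML_DOWNLOAD_FAILED = "xml_download_failed"
    XML_PARSING_FAILED = "xml_parsing_failed"


def is_failed(status: ProcessingStatus) -> bool:
    """Hypothetical helper: treat every value ending in '_failed' as a failure state."""
    return status.value.endswith("_failed")


# Looking up by value raises ValueError for strings that are not in the enum.
status = ProcessingStatus("xml_download_failed")
print(status.name, is_failed(status))
```

The same pattern would apply to shareholder_research_status and deep_research_status, with the caveat that deep_research_status uses the bare value failed, which the suffix check above would not catch.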

files Table

Stores metadata for each unique physical file downloaded from the Handelsregister. This table uses a content-based hash (file_hash) as a unique key to ensure that the same file is never stored more than once, saving significant storage space.

Field Definitions

| Field | Type | Description |
| --- | --- | --- |
| id | uuid | Primary key |
| file_hash | text | Unique SHA256 hash of the file's content for deduplication. |
| storage_path | text | The full path to the file in the entity_documents storage bucket. |
| file_size_bytes | bigint | The size of the file in bytes. |
| mime_type | text | The MIME type of the file (e.g., application/pdf, application/zip). |
| created_at | timestamp | Timestamp of when the file record was first created. |
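
The deduplication described above can be illustrated with a small in-memory sketch. It is not the actual worker code: the FileStore class and its register method are hypothetical, and a dictionary stands in for the unique key on file_hash.

```python
import hashlib
from typing import Dict, Tuple


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the file content as a hex string."""
    return hashlib.sha256(data).hexdigest()


class FileStore:
    """Hypothetical in-memory stand-in for content.files, keyed by file_hash."""

    def __init__(self) -> None:
        self._by_hash: Dict[str, dict] = {}

    def register(self, data: bytes, storage_path: str, mime_type: str) -> Tuple[dict, bool]:
        """Return (record, created). Identical content reuses the existing record."""
        file_hash = sha256_hex(data)
        existing = self._by_hash.get(file_hash)
        if existing is not None:
            return existing, False
        record = {
            "file_hash": file_hash,
            "storage_path": storage_path,
            "file_size_bytes": len(data),
            "mime_type": mime_type,
        }
        self._by_hash[file_hash] = record
        return record, True


store = FileStore()
first, created_first = store.register(b"%PDF-1.7 example", "a/list.pdf", "application/pdf")
second, created_second = store.register(b"%PDF-1.7 example", "b/copy.pdf", "application/pdf")
print(created_first, created_second, first is second)
```

In the sketch, the second upload keeps the first storage_path, because the content is already known and nothing new needs to be stored.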

documents Table

This table serves as a bridge, linking a physical file (content.files) to a specific company (content.entities) and enriching it with essential business context and metadata from the Handelsregister. It also supports hierarchical relationships, allowing us to track files that were extracted from an archive (like a ZIP).

Field Definitions

| Field | Type | Description |
| --- | --- | --- |
| id | uuid | Primary key |
| entity_id | uuid | Foreign key linking the document to a specific company in content.entities. |
| file_id | uuid | Foreign key linking to the actual physical file in content.files. |
| parent_document_id | uuid | If extracted from an archive, this links to the parent archive's record in this table. |
| original_filename | text | The original, often cryptic, filename as downloaded from the source. |
| display_name | text | A clean, human-readable name for the document, potentially generated by AI. |
| hr_document_path | text | The navigational path from the Handelsregister portal (e.g., VÖ/1/2). |
| document_date | date | The date printed on the document itself (e.g., date of signing). |
| received_on | date | The date the document was received by the register. |
| published_on | date | The date the document was officially published. |
| created_by | text | The source named in the Handelsregister. |
| type_of_document | text | The specific type of the document (e.g., 'Gesellschafterliste', 'Jahresabschluss'). |
| language_identifier | text | The language of the document in the Handelsregister. |
| created_at | timestamp | Record creation time. |
| updated_at | timestamp | Last update time. |
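
To illustrate the hierarchy that parent_document_id expresses, here is a rough sketch that walks document rows depth-first, so that files extracted from an archive appear under their parent. The function names and the sample rows are invented for illustration; only the id and parent_document_id fields come from the table above.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Optional, Tuple


def children_by_parent(documents: List[dict]) -> Dict[Optional[str], List[dict]]:
    """Group document rows by parent_document_id (None for top-level documents)."""
    index: Dict[Optional[str], List[dict]] = defaultdict(list)
    for doc in documents:
        index[doc.get("parent_document_id")].append(doc)
    return index


def walk_tree(documents: List[dict]) -> Iterator[Tuple[int, dict]]:
    """Yield (depth, document) pairs depth-first, starting at top-level documents."""
    index = children_by_parent(documents)

    def visit(parent_id: Optional[str], depth: int) -> Iterator[Tuple[int, dict]]:
        for doc in index.get(parent_id, []):
            yield depth, doc
            yield from visit(doc["id"], depth + 1)

    yield from visit(None, 0)


# Hypothetical rows: one ZIP archive with two extracted files, plus a standalone PDF.
rows = [
    {"id": "a", "parent_document_id": None, "display_name": "archive.zip"},
    {"id": "b", "parent_document_id": "a", "display_name": "Gesellschafterliste.pdf"},
    {"id": "c", "parent_document_id": "a", "display_name": "Jahresabschluss.pdf"},
    {"id": "d", "parent_document_id": None, "display_name": "standalone.pdf"},
]
for depth, doc in walk_tree(rows):
    print("  " * depth + doc["display_name"])
```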

document_pages Table

  • Stores OCR results page by page, each with a content hash
  • Uses those hashes to deduplicate, so identical pages are not reprocessed
  • Caches the output of OCR, the most expensive operation
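
The caching idea can be sketched as a lookup keyed by a hash of the page content. This is an assumption-laden illustration: the PageOcrCache class is hypothetical, SHA-256 is assumed as the page hash (the source does not name the algorithm), and the OCR step is a stand-in function.

```python
import hashlib
from typing import Callable, Dict


class PageOcrCache:
    """Hypothetical cache: runs OCR once per distinct page content."""

    def __init__(self, ocr: Callable[[bytes], str]) -> None:
        self._ocr = ocr
        self._text_by_hash: Dict[str, str] = {}
        self.ocr_calls = 0  # counts how often the expensive step actually ran

    def text_for(self, page_bytes: bytes) -> str:
        # Assumption: pages are identified by the SHA-256 of their raw bytes.
        page_hash = hashlib.sha256(page_bytes).hexdigest()
        if page_hash not in self._text_by_hash:
            self.ocr_calls += 1
            self._text_by_hash[page_hash] = self._ocr(page_bytes)
        return self._text_by_hash[page_hash]


def fake_ocr(page_bytes: bytes) -> str:
    """Stand-in for a real OCR engine; simply decodes the bytes."""
    return page_bytes.decode("utf-8")


cache = PageOcrCache(fake_ocr)
for page in [b"page one", b"page two", b"page one"]:
    cache.text_for(page)
print(cache.ocr_calls)
```

Three pages are requested, but the stand-in OCR function runs only for the two distinct contents.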

document_chunks Table

  • Stores semantic chunks created by LLM processing
  • Contains embeddings and metadata for RAG-based chat
  • Prepares data for efficient question answering
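
As a rough illustration of how stored embeddings can support question answering, the sketch below ranks chunks by cosine similarity to a query embedding. The field names content and embedding, the function names, and the two-dimensional vectors are all assumptions made for the example; the source does not specify the column layout, the embedding model, or the retrieval method.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_embedding: Sequence[float], chunks: List[dict], k: int = 3) -> List[dict]:
    """Return the k chunks whose embeddings are most similar to the query."""
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_embedding, chunk["embedding"]),
        reverse=True,
    )
    return ranked[:k]


# Toy chunks with made-up 2-dimensional embeddings.
chunks = [
    {"content": "shareholder list", "embedding": [1.0, 0.0]},
    {"content": "annual accounts", "embedding": [0.0, 1.0]},
    {"content": "list of partners", "embedding": [0.9, 0.1]},
]
for chunk in top_k_chunks([1.0, 0.0], chunks, k=2):
    print(chunk["content"])
```

A production setup would more likely push the similarity search into the database than sort rows in application code.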

entity_shareholders Table

Stores shareholder information for companies, extracted from shareholder list documents via LLM processing.

Field Definitions

| Field | Type | Description |
| --- | --- | --- |
| id | uuid | Primary key |
| entity_id | uuid | Foreign key to the company in content.entities |
| shareholder_type | text | Type of shareholder: natural_person or organization |
| first_name | text | First name (for natural persons) |
| last_name | text | Last name (for natural persons) |
| date_of_birth | date | Date of birth (for natural persons) |
| residence | text | Residence/location (wohnort, for natural persons) |
| company_name | text | Company name (for organizations) |
| register_court | text | Register court (registergericht, for organizations) |
| register_type | text | Register type (e.g., HRB, HRA, VR, for organizations) |
| register_number | text | Register number (registernummer, for organizations) |
| seat | text | Company seat/location (for organizations) |
| foreign_entity | boolean | If true, indicates a foreign entity that should not trigger automatic discovery |
| resolved_entity_id | uuid | Foreign key to content.entities if shareholder organization has been resolved to an existing entity |
| share_nominal_amount | numeric | Nominal amount of shares held (nennbetrag_anteil) |
| share_percentage | numeric | Percentage of ownership (anteil_prozent_einzel) |
| sequence_number | integer | Order/sequence number in the shareholder list (lfd_nummer) |
| source_document_id | uuid | Foreign key to the source document in content.documents |
| created_at | timestamp | Record creation time |
| updated_at | timestamp | Last update time |
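
How resolved_entity_id and foreign_entity might interact can be sketched as follows. The matching rule is an assumption: the source does not say how shareholders are resolved, so this example matches organizations on the combination of register_court, register_type, and register_number. All function names are hypothetical.

```python
from typing import List, Optional, Tuple


def register_key(record: dict) -> Optional[Tuple[str, str, str]]:
    """Normalized (court, type, number) key, or None if any part is missing."""
    parts = (
        record.get("register_court"),
        record.get("register_type"),
        record.get("register_number"),
    )
    if any(part is None or not str(part).strip() for part in parts):
        return None
    court, reg_type, number = (str(part).strip().casefold() for part in parts)
    return court, reg_type, number


def resolve_shareholder(shareholder: dict, entities: List[dict]) -> Optional[str]:
    """Return the id of the matching entity for an organization, else None."""
    if shareholder.get("shareholder_type") != "organization":
        return None
    key = register_key(shareholder)
    if key is None:
        return None
    for entity in entities:
        if register_key(entity) == key:
            return entity["id"]
    return None


def should_trigger_discovery(shareholder: dict, resolved_entity_id: Optional[str]) -> bool:
    """Unresolved, non-foreign organizations are candidates for automatic discovery."""
    return (
        shareholder.get("shareholder_type") == "organization"
        and not shareholder.get("foreign_entity", False)
        and resolved_entity_id is None
    )


entities = [{"id": "e1", "register_court": "Berlin", "register_type": "HRB", "register_number": "12345"}]
holder = {"shareholder_type": "organization", "register_court": " berlin ",
          "register_type": "hrb", "register_number": "12345", "foreign_entity": False}
resolved = resolve_shareholder(holder, entities)
print(resolved, should_trigger_discovery(holder, resolved))
```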