Skip to content

Database Schema Organization

Architecture Overview

Our Handelsregister application uses a functional domain schema organization with four distinct schemas in Supabase. This structure supports our stateless worker architecture and separates concerns across development teams.

handelsregister_app/
├── public/           # User-facing data and interactions
├── processing/       # Pipeline management and job orchestration  
├── content/         # Documents, OCR results, and extracted data
└── system/          # Configuration, monitoring, and AI management

Schema Definitions

Public Schema - User Experience Layer

Owner: Frontend Team
Access: All authenticated users (with RLS), auto-generated Supabase APIs

public/
├── users                    # User profiles (extends Supabase Auth)
├── search_history          # "My Searches" dashboard data
├── chat_sessions           # Conversation containers per entity
└── chat_messages           # Individual Q&A interactions

Rules:

  • All tables MUST have Row Level Security (RLS) enabled
  • Foreign keys to other schemas are allowed and encouraged
  • This is the only schema exposed to auto-generated Supabase client APIs

Processing Schema - Pipeline Orchestration

Owner: Backend Team
Access: Service role only (workers and admin), RPC functions for frontend triggers

processing/
├── processing_jobs         # Core job queue (collection → OCR → chunking)
├── workflow_steps          # Pipeline configuration and rules
└── job_step_history        # Complete audit trail of job execution

Rules:

  • NO direct table access from frontend code
  • Frontend triggers processing via RPC functions only
  • Workers poll processing_jobs for available work
  • All workflow decisions MUST be database-driven, not hardcoded
  • Complete audit trail required for all job state changes

Content Schema - Document Data and Results

Owner: Backend Team
Access: Read-only for authenticated users, full access for workers

content/
├── entities               # Company records with cache management
├── documents              # Downloaded PDF/TIF metadata  
├── document_pages         # OCR results with hash-based deduplication
├── document_chunks        # Semantic chunks for RAG-based chat

Rules:

  • Users get SELECT permissions only
  • Workers get full CRUD permissions
  • Hash-based deduplication MUST be implemented for document_pages
  • All large data tables belong here

System Schema - Configuration and AI Management

Owner: Backend Team
Access: Admin and service roles only

system/
├── prompt_templates       # Reusable LLM prompts with versioning
├── extraction_schemas     # Expected JSON output structures
└── prompt_executions      # LLM interaction logs and analytics

Rules:

  • NO user access - admin/developer only
  • All LLM prompts MUST be stored here, not hardcoded
  • Prompt versioning is required for A/B testing
  • Complete logging of all LLM interactions for cost tracking