Changelog
Release notes for the Pauhu® platform. All infrastructure runs on Cloudflare Workers in EU jurisdiction (Helsinki).
2026-03-11 v5
CUA browser automation, 6 new documentation pages, staging consolidation, security hardening, and link integrity verification across all 5 domains.
Computer Use Agent (CUA)
- New CUA architecture — screenshot→reason→act→verify loop with deontic safety gates. 7 action types, 12 allowlisted EU portals, human-in-the-loop confirmation for all form submissions.
- New CUA API — 8 endpoints: /v1/cua/start, action, screenshot, stream (SSE), confirm, rollback, stop, stats. Full request/response schemas with curl examples.
- New CUA MCP tools — 8 Playwright actions exposed as MCP tools (click, type, scroll, navigate, select, submit, wait, screenshot) for programmatic browser automation.
- New CUA safety model — sandbox boundaries, M1 prohibited actions, audit trail (90-day retention), GDPR Art. 17 erasure support, EU-jurisdiction screenshot storage.
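The gate-then-act part of the CUA loop can be pictured roughly as follows. This is a minimal sketch, not the shipped implementation: the type names, the `gate` and `step` functions, and the single placeholder host are all hypothetical, and the real allowlist of 12 portals is not reproduced here.

```typescript
// Hypothetical action shape; the real CUA action schema is not given in this changelog.
interface Action {
  kind: "click" | "type" | "navigate" | "submit";
  host: string; // host the action would run against
}

type Verdict = "allow" | "needs_confirmation" | "deny";

// Placeholder allowlist (assumption): one host standing in for the 12 portals.
const ALLOWED_HOSTS: string[] = ["ted.europa.eu"];

// Safety gate: runs before any action is executed.
function gate(action: Action): Verdict {
  if (ALLOWED_HOSTS.indexOf(action.host) === -1) return "deny";
  if (action.kind === "submit") return "needs_confirmation";
  return "allow";
}

// One iteration of the loop, reduced to the gate and the confirmation check.
function step(
  action: Action,
  humanConfirms: (a: Action) => boolean
): "executed" | "blocked" {
  const verdict = gate(action);
  if (verdict === "deny") return "blocked";
  if (verdict === "needs_confirmation" && !humanConfirms(action)) return "blocked";
  // A real agent would act here, then take a fresh screenshot to verify the result.
  return "executed";
}
```

In this sketch a form submission never runs unless the `humanConfirms` callback returns true, which mirrors the human-in-the-loop rule described above.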
Documentation
- New 6 CUA documentation pages — architecture, API reference, safety model, quickstart (TED eProcurement walkthrough), MCP tools, FAQ (18 Q&As).
- New Getting Started pages for pauhu.eu (data feeds) and pauhu.com (search + translate). 3-step format with curl examples.
- Improved Documentation index updated with CUA section links. All 5 staging domains verified: 43 same-domain targets, 12 cross-domain targets, 0 broken links.
Staging consolidation (v4 → v5)
- Improved Direct index.html serving — removed _index.html indirection layer. Staging domains now serve content directly instead of “Building…” placeholder pages.
- Improved pauhu.eu index restored — ticker ribbons, all product sections, and prior improvements fused into a single index (commit 653df7c5d).
- Improved 52 files changed across staging domains: pricing pages, source pages, quality dashboards, welcome flows refreshed.
Security
- Security X-Pauhu-Domain injection fix — header always deleted and overwritten in gateway proxy, preventing cross-product ACL bypass.
- Security /v1/feature-health auth — endpoint now requires admin/premium auth with 10/min rate limit. Evidence field redacted to prevent binding name leakage.
- Security CORS + chat fix — Gateway CORS headers and chat endpoint corrected.
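The delete-then-overwrite pattern behind the X-Pauhu-Domain fix might look like the sketch below. It is an illustration under assumptions: headers are modeled as a plain object rather than the gateway's actual request type, and `sanitizeDomainHeader` and `trustedDomain` are hypothetical names.

```typescript
// Headers modeled as a plain object (assumption); a real gateway would use its own request type.
type HeaderMap = Record<string, string>;

function sanitizeDomainHeader(incoming: HeaderMap, trustedDomain: string): HeaderMap {
  const out: HeaderMap = {};
  Object.keys(incoming).forEach((name) => {
    const lower = name.toLowerCase(); // header names are case-insensitive
    if (lower === "x-pauhu-domain") return; // always drop the client-supplied value
    out[lower] = incoming[name];
  });
  // Set the header from the gateway's own routing decision, never from the client.
  out["x-pauhu-domain"] = trustedDomain;
  return out;
}
```

Because the client value is discarded unconditionally, a request cannot claim a different product domain to reach another product's ACL.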
Internationalization
- Fixed Hardcoded locale parameters removed from 3 staging files (brief.js, translate-monitor.js, quality/index.html).
- New 68 CUA i18n keys added for overlay labels, action types, error messages, and TED demo flow in 24 EU languages.
2026-03-07 Update
Sovereign Brain architecture documentation, 3-tier adaptive model loading, supply chain sovereignty, annotation inheritance, and vectorize embedding pipeline fixes.
Sovereign Brain architecture
- New Sovereign Brain documentation — full architecture guide explaining two-hemisphere design: left hemisphere (Laine search, 26ms paragraph retrieval) and right hemisphere (FiD mT5, grounded generation with citations).
- New Thalamus gateway — single entry point that validates every query and routes to the correct hemisphere before either side does any work.
- New Supply chain sovereignty — models run in the browser or on your server. No cloud API dependency, no model hosting subscription, no inference-per-token billing. ONNX open standard (ISO/IEC 17203), OCI-compliant containers.
Adaptive model loading
- New 3-tier loading — Lite (<4 GB, search only), Standard (4–16 GB, search + FiD generation), Full (>16 GB, all models including 552 NMT pairs and 21 domain classifiers).
- New Progressive download — search models load first (available in seconds), FiD generation loads second (10–30s), translation pairs load on demand.
- Improved FiD generation model fits in 300 MB of DRAM. Runs on commodity hardware without special procurement.
- Improved Browser-native inference via ONNX Runtime for WebAssembly. No server, no GPU required. Data never leaves the browser process.
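The tier boundaries above can be expressed as a small selection function. This is a sketch only: `selectTier`, the model group labels, and the idea of passing available memory in GB as a number are assumptions for illustration, not the platform's actual loader API.

```typescript
type Tier = "lite" | "standard" | "full";

// Thresholds follow the changelog: <4 GB Lite, 4-16 GB Standard, >16 GB Full.
function selectTier(memoryGb: number): Tier {
  if (memoryGb < 4) return "lite";
  if (memoryGb <= 16) return "standard";
  return "full";
}

// Hypothetical model group labels, listed in progressive download order:
// search first, FiD generation second, everything else afterwards.
const LOAD_ORDER: Record<Tier, string[]> = {
  lite: ["search"],
  standard: ["search", "fid-generation"],
  full: ["search", "fid-generation", "nmt-pairs", "domain-classifiers"],
};

function loadPlan(memoryGb: number): string[] {
  return LOAD_ORDER[selectTier(memoryGb)];
}
```

For example, `loadPlan(8)` yields the search group followed by FiD generation, so search is usable before generation finishes loading.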
Annotation inheritance
- New English-first Rosetta pattern — English is annotated with highest-quality NLP models, and structural annotations (topic, deontic modality, cross-references) are inherited by all 24 parallel language versions.
- New COALESCE/NULLIF SQL — inheritance logic prefers language-specific annotations when available, falls back to English otherwise. 24/24 EU language coverage for all topic domains.
- Improved Multilingual rollout in progress: multilingual indexing, backfill of existing EN-only documents, cross-language annotation verification.
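The COALESCE/NULLIF fallback can be mirrored in a few lines to show the intended behavior. This is a sketch under one stated assumption: an empty string marks a missing language-specific annotation. The helper names are hypothetical and the actual SQL is not reproduced here.

```typescript
// Mirrors SQL NULLIF(a, b): null when a equals b, otherwise a.
function nullIf(a: string | null, b: string): string | null {
  return a === b ? null : a;
}

// Mirrors SQL COALESCE: the first non-null argument, or null if there is none.
function coalesce(...values: Array<string | null>): string | null {
  for (let i = 0; i < values.length; i++) {
    const v = values[i];
    if (v !== null) return v;
  }
  return null;
}

// Roughly COALESCE(NULLIF(localized, ''), english):
// prefer the language-specific annotation, fall back to the English one.
function resolveAnnotation(localized: string | null, english: string | null): string | null {
  return coalesce(nullIf(localized, ""), english);
}
```

With this logic a Finnish document that has its own topic annotation keeps it, while one with an empty or missing value inherits the English annotation.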
Vectorize embedding pipeline
- Fixed Vector serialization — Float32Array output from BGE-M3 was not serializing correctly to the vector index. Fixed via Array.from() normalization before storage. Vectors now populate correctly across all 20 product indexes.
- Improved Documented full embedding path: object storage → annotation engine (STAM) → structured index (D1) → embedding service (BGE-M3, 1024 dimensions) → vector index (cosine similarity).
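One common way a Float32Array fails to serialize is through JSON, where a typed array is written as an index-keyed object instead of an array. The sketch below shows that behavior and the Array.from() normalization; whether the original bug involved JSON specifically is an assumption, and `toStorableVector` is a hypothetical helper.

```typescript
const embedding = new Float32Array([0.5, -0.25, 1]);

// JSON.stringify treats a typed array as a plain object keyed by index.
const broken = JSON.stringify(embedding); // '{"0":0.5,"1":-0.25,"2":1}'

// Array.from() produces an ordinary number[] that serializes as a JSON array.
const fixed = JSON.stringify(Array.from(embedding)); // '[0.5,-0.25,1]'

// Hypothetical helper: normalize and check dimensionality before storage.
function toStorableVector(v: Float32Array | number[], expectedDims: number): number[] {
  const values = Array.from(v);
  if (values.length !== expectedDims) {
    throw new Error("expected " + expectedDims + " dimensions, got " + values.length);
  }
  return values;
}
```

For BGE-M3 output the expected dimension would be 1024, as stated in the embedding path above.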
Documentation
- New CRM API reference — internal API documentation for sales pipeline: contacts, companies, deals, activities, tasks, email sequences, AI lead scoring.
- New Data pipeline guide — 14-section documentation covering ingestion, STAM annotation, paragraph indexing, semantic search, grounded generation, multilingual flow, annotation inheritance, vectorize embedding, adaptive loading, browser sidebar integration.
- New “Works Alongside Your Tools” — browser sidebar overlay for contextual search, grounded answers, terminology lookup, and local translation. No vendor lock-in, no per-seat licensing.
2026-03-06 Update
Security audit clearance, E2E staging verification, FiD dual-brain architecture, 24-language translation at 100% coverage, and container hardening.
Security
- Security FiD dual-brain audit complete — 0 CRITICAL findings. ONNX model SHA-256 checksums verified. STAM bounds checking confirmed. Server-side grounding guarantee validated.
- Security Container security clearance — all sovereign container images audited: FiD container cleared, Compass container cleared (XSS fix applied), LDS container conditionally passed.
- Security GDPR Art. 17 erasure — right-to-erasure flow verified end-to-end. Data retention policy documented.
- Security Air-gap readiness — government readiness audit scored 72/100. Offline mode detection and Finnish/Dutch language detection added.
Staging verification
- Improved End-to-end launch testing across all staging domains (pauhu.eu, pauhu.com, pauhu.ai, pauhu.dev). 8 tests PASS, 2 PASS with runtime notes, 2 deferred.
- Improved Data feeds page verified: 20 EU institutional source cards with correct attribution and licensing.
- Improved Sovereign deployment connector page verified.
FiD architecture
- New FiD browser neural network specification — Fusion-in-Decoder architecture for grounded answer generation. Encoder processes each retrieved paragraph independently; decoder attends to all simultaneously for cross-document reasoning.
- Improved FiD data format cleaned and validated. Retrieval comparison test against BGE-M3 cross-lingual embeddings completed.
Internationalization
- Improved 24/24 EU languages at 100% coverage — all translation keys verified across Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Irish, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, and Swedish.
- Improved FiD 24-language prompt templates delivered. Decoder locale keys wired.
Infrastructure
- Improved Worker count increased from 97 to 150. All new workers use EU jurisdiction placement.
- Improved 4.7M+ objects verified across 24 product R2 buckets.
- Improved LLM adapter port changed from 8000 to 8001 to resolve collision with TTS container.
- Fixed Chat renderer wired into workspace UI.
2026-03-03 Launch
The European launch release brings server-side document extraction, a transparent two-part pricing model, 20 live EU institutional data feeds, 2.4M IATE terminology terms, and browser-native ONNX inference with full offline support.
Document extraction integration
- New POST /api/v1/extract — Server-side document extraction via headless Chrome hosted on Hetzner in Helsinki (EU). Extracts text from any URL with optional IATE terminology lookup and STAM annotation.
- New POST /api/v1/extract-and-index — Extract, annotate, and store documents in EU storage for automatic indexing. Writes STAM sidecar JSON with full provenance metadata.
- New POST /api/v1/pdf-render — Render any URL to PDF via server-side Chrome. Returns raw PDF binary with inline Content-Disposition.
- Improved Document extraction runs on Hetzner Helsinki — documents never leave the EU. Tab is closed after each extraction (stateless).
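A request to the extraction endpoint could be assembled along these lines. This is a sketch with labeled assumptions: the `url` field name and the JSON content type are guesses, the base URL is deliberately left out, and only the path and the `terminology` flag come from these release notes.

```typescript
// Hypothetical request shape; only `terminology` is named in the release notes.
interface ExtractRequest {
  url: string;           // assumed field name for the page to extract
  terminology?: boolean; // enables IATE terminology lookup
}

interface HttpCall {
  method: "POST";
  path: string;
  headers: Record<string, string>;
  body: string;
}

// Builds the call description without sending it, so it can be inspected or logged first.
function buildExtractCall(req: ExtractRequest): HttpCall {
  return {
    method: "POST",
    path: "/api/v1/extract",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(req),
  };
}
```

The returned object can be handed to whichever HTTP client the caller already uses.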
Two-part tariff pricing
- New GET /v1/pricing/two-part — Transparent two-part tariff: Azure pass-through cost (live from Azure Retail Prices API, northeurope region, EUR) plus Pauhu Data License (fixed EUR amounts). No markup on compute, clear separation of infrastructure and data costs.
- New GET /v1/pricing/data-licenses — Enterprise data licensing endpoint with tiered pricing.
- New GET /v1/pricing/ranker — Semantic ranker add-on pricing.
- Improved All pricing pages across 5 domains (pauhu.eu, pauhu.com, pauhu.ai, pauhu.dev, pauhu.io) now use pauhu-pricing.js for live client-side price filling from the Azure Retail Prices API.
20 EU institutional data feeds
- New All 20 EU data sources are live with dedicated R2 buckets, D1 databases, Vectorize indexes, and queue pipelines per product.
- New Products: Commission, Consilium, CORDIS, CURIA, Data Europa, DPP, ECB, ECHA, EMA, EPO, Europarl, EUR-Lex, Eurostat, IATE, National Law (28 countries), OEIL, Publications, TED, Who is Who, Wiki.
- Improved Hybrid search across all 20 indexes: 70% BGE-M3 semantic similarity (1024 dimensions, cosine) + 30% BM25 keyword matching via the Laine Algorithm.
- Improved STAM (Stand-off Text Annotation Model) sidecar format for non-destructive annotations with full provenance tracking. All producers and consumers migrated from .annotation.json to .stam.json.
- Improved 11 data sync services migrated to eternal protocols (SPARQL, SDMX, REST) — zero fragile API patterns.
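The 70/30 weighting in the hybrid search bullet can be illustrated with a simple weighted sum. This is only a sketch of the blend, not the Laine Algorithm itself: it assumes both scores are already normalized to the range 0 to 1, and `hybridScore`, `rank`, and the `Candidate` shape are hypothetical.

```typescript
interface Candidate {
  id: string;
  semantic: number; // cosine similarity of embeddings, assumed scaled to [0, 1]
  keyword: number;  // BM25 score, assumed normalized to [0, 1]
}

// Weighted blend: 70% semantic similarity, 30% keyword match.
function hybridScore(semantic: number, keyword: number): number {
  return 0.7 * semantic + 0.3 * keyword;
}

// Scores every candidate and sorts by descending hybrid score.
function rank(candidates: Candidate[]): Array<Candidate & { score: number }> {
  return candidates
    .map((c) => ({ ...c, score: hybridScore(c.semantic, c.keyword) }))
    .sort((a, b) => b.score - a.score);
}

const ranked = rank([
  { id: "a", semantic: 0.9, keyword: 0.1 }, // strong semantic match, weak keyword match
  { id: "b", semantic: 0.6, keyword: 0.9 }, // moderate semantic match, strong keyword match
]);
```

In this example document "b" ranks first, because its keyword score outweighs the semantic gap once the weights are applied.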
IATE terminology
- Improved 2,456,445 terms across 24 EU official languages. Reliability scores and domain classification included in all responses.
- Improved Fuzzy search, exact lookup, entry by concept ID, language stats, and quality dashboard endpoints all live.
- Improved IATE terms are extracted automatically during document extraction when terminology: true is set.
Browser-native inference
- New ONNX Runtime Web integration with WebGPU/WASM backend auto-detection. Cold-start and warm-start benchmarking via the Onboarding Wizard.
- New Full offline mode with Service Worker caching. Pre-flight asset audit, download progress indicators, and offline workflow verification.
- New Browser-side vision models: Tesseract.js (OCR), ViT-GPT2 (captioning), MobileNetV3 (classification), Legal-BERT (NER), BART-Large-CNN (summarization). No data leaves the browser.
- Improved Model downloads show granular progress indicators with phase labels (detecting device, cold loading, cold inference, warm loading, warm inference).
Infrastructure
- Improved 97 Cloudflare Workers, all with [placement] jurisdiction = "eu". All R2 buckets EU jurisdiction. All D1 databases created with --location eu.
- Improved 16-gate orchestrator loop with deontic modality classification (prohibition, obligation, permission, exemption).
- Improved DSPy integration: 486 signatures, 401 modules, 34 orchestrators. MetaPromptEngine for prompt optimization with EU AI Act transparency compliance.
- Improved EWMA-based gate health monitoring replaces Z-score. Trend-aware statistical process control with 3-sigma control limits.
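An EWMA control chart with 3-sigma limits, as mentioned in the gate health bullet, can be sketched as below. The smoothing factor of 0.2, the function name, and the use of a known in-control mean and standard deviation are assumptions for illustration; the platform's actual parameters are not stated.

```typescript
interface EwmaPoint {
  ewma: number;
  lower: number;
  upper: number;
  outOfControl: boolean;
}

// Standard EWMA chart: z_t = lambda * x_t + (1 - lambda) * z_(t-1), starting from the mean.
// Limits: mean +/- L * sigma * sqrt(lambda / (2 - lambda) * (1 - (1 - lambda)^(2t))).
function ewmaChart(
  samples: number[],
  mean: number,
  sigma: number,
  lambda: number = 0.2, // smoothing factor (assumed value)
  L: number = 3         // 3-sigma control limits
): EwmaPoint[] {
  let z = mean;
  return samples.map((x, i) => {
    z = lambda * x + (1 - lambda) * z;
    const t = i + 1;
    const width =
      L * sigma * Math.sqrt((lambda / (2 - lambda)) * (1 - Math.pow(1 - lambda, 2 * t)));
    const lower = mean - width;
    const upper = mean + width;
    return { ewma: z, lower, upper, outOfControl: z < lower || z > upper };
  });
}

// Latency-like series with a sustained upward shift starting at the third sample.
const chart = ewmaChart([100, 100, 106, 106, 106, 106], 100, 2);
```

Unlike a per-sample Z-score, the smoothed statistic accumulates a sustained shift: each individual sample here is exactly 3 sigma from the mean, yet the chart flags the series by the fourth point.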
Accessibility
- Fixed Resolved 7 CRITICAL and 2 HIGH WCAG 2.1 AA violations across all staging domains.
- Improved Skip-to-content links, ARIA landmarks, keyboard navigation, focus traps, text size controls (A+/A-), and reduced-motion support on all 5 domains.
- Improved All images have descriptive alt text. All interactive elements are keyboard-navigable. All forms use proper role="search" and aria-label attributes.
Security
- Security IEC 62443-3-3 zone-based security verified. Model Last pattern enforced: gates run before inference.
- Security 81 services hardened with development mode disabled. Only the API router and landing page remain intentionally public.
- Security SHA-256 checksums for all model files. Credential rotation tracking. Python credential audit complete.
- Security GDPR Article 25 (Privacy by Design) and Article 32 (Security of Processing) compliance verified. All data flows EU-only — no third-country transfer.
Compliance
- Improved EU AI Act Article 52 transparency endpoints on all AI-powered services.
- Improved DSA Article 27 ranking transparency metadata in search results.
- Improved NIS2 Important Entity self-assessment (Art. 21 partial).