API Reference

Pauhu® EU provides IATE terminology (2.4M terms), neural translation (1,440+ language pairs), semantic search (20 Vectorize indexes), and document annotation. All services run on Cloudflare Workers in EU jurisdiction.

Architecture

Pauhu® EU runs as a fleet of Cloudflare Workers in EU jurisdiction. Each service is a separate worker with its own domain. There is no single unified gateway URL — each API is accessed at its own worker endpoint.

API service endpoints
Service       Worker                  Purpose
Terminology   terminology.pauhu.eu    IATE term lookup, search, TBX/TMX export
Translation   translate.pauhu.eu      Helsinki-NLP OPUS-MT neural translation
Search        search.pauhu.eu         Semantic search across 20 data sources
Annotation    annotate.pauhu.eu       Topic + deontic classification
Models        models.pauhu.ai         ONNX model CDN (2,342 models)
Gateway       staging.pauhu.eu        Gate orchestration

All workers return JSON by default. CORS is enabled for pauhu.ai, pauhu.eu, and localhost development origins.

Authentication

API keys

Generate a self-service API key via the Pauhu search service. Keys follow the format pk_*. Each key is linked to a seat tier that determines data access entitlements.

POST /keys/generate

Create an API key. The request body must include an email address. New keys are created at the live tier, which is activated by a Stripe subscription.

curl -X POST https://staging.pauhu.eu/keys/generate \
  -H "Content-Type: application/json" \
  -d '{"email": "you@company.com"}'

Response:

{
  "api_key": "pk_...",
  "tier": "live",
  "entitlements": {
    "raw_feeds": true,
    "annotated_feeds": false,
    "training_export": false,
    "pauhu_ai": false,
    "search": true,
    "terminology": true,
    "translation": true,
    "rerank": true
  },
  "created_at": "2026-02-24T12:00:00.000Z"
}
GET /keys/usage

Check current entitlements and burst limits. Requires Authorization: Bearer <api_key>.

{
  "tier": "live",
  "entitlements": {
    "raw_feeds": true,
    "annotated_feeds": false,
    "training_export": false,
    "pauhu_ai": false,
    "search": true,
    "terminology": true,
    "translation": true,
    "rerank": true
  },
  "burst_limit": "10 req/sec sustained, 50 req/sec peak",
  "active": true,
  "org_id": null
}

Request authentication

Include your API key as a Bearer token:

Authorization: Bearer pk_...

Without a key, requests run in trial mode: 3 requests/day (IP-based), search + terminology + translation only. No access to raw feeds, annotations, or training export.
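A minimal Python sketch of attaching the key to a request with the standard library. It assumes /keys/usage is served from the same host as the /keys/generate example (staging.pauhu.eu); the helper name `build_usage_request` is hypothetical. The request is only constructed, not sent.

```python
import urllib.request

# Assumption: /keys/usage is served from the same host as the /keys/generate example.
BASE_URL = "https://staging.pauhu.eu"


def build_usage_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET /keys/usage request with a Bearer token."""
    if not api_key.startswith("pk_"):
        raise ValueError("Pauhu API keys use the pk_* format")
    return urllib.request.Request(
        f"{BASE_URL}/keys/usage",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


# Sending it requires network access and a real key:
# with urllib.request.urlopen(build_usage_request("pk_...")) as resp:
#     print(resp.read().decode("utf-8"))
```

Omitting the header entirely is what drops a client into trial mode, so a missing key is a silent downgrade rather than an error.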

Terminology API LIVE

Serves 2,456,445 IATE terms across 24 EU languages with exact lookup and semantic search (BGE-M3, 1024 dimensions).

Exact lookup

GET /lookup?term=<term>&lang=<code>

Exact match against IATE terminology database.

/lookup parameters
Parameter   Type     Description
term        string   Term to look up (required)
lang        string   ISO 639-1 language code (optional; searches all languages if omitted)

Response:
{
  "query": "data protection",
  "found": true,
  "count": 3,
  "results": [...],
  "source": "IATE",
  "stage": "LOOKUP"
}

Semantic search

POST /search

BGE-M3 embedding search via Vectorize. Returns semantically similar terms ranked by cosine similarity.

curl -X POST https://staging.pauhu.eu/search \
  -H "Content-Type: application/json" \
  -d '{"query": "personal data processing", "lang": "en", "limit": 10}'
{
  "query": "personal data processing",
  "count": 10,
  "results": [...],
  "source": "IATE Pauhu Search",
  "stage": "MODEL"
}
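To illustrate what "ranked by cosine similarity" means, here is a small pure-Python sketch that ranks candidate terms against a query vector. The toy 3-dimensional vectors stand in for real 1024-dimensional BGE-M3 embeddings, the function names and data are hypothetical, and in the service this ranking is done server-side by Vectorize.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|); 0.0 for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_terms(query_vec: list[float], term_vecs: dict[str, list[float]], limit: int = 10):
    """Return (term, score) pairs sorted by descending cosine similarity."""
    scored = [(term, cosine(query_vec, vec)) for term, vec in term_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Toy 3-d vectors; real BGE-M3 embeddings have 1024 dimensions.
terms = {
    "data processing": [0.9, 0.1, 0.0],
    "personal data": [0.7, 0.7, 0.0],
    "fisheries quota": [0.0, 0.1, 0.9],
}
print(rank_terms([0.8, 0.5, 0.0], terms, limit=2))
```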

Statistics

GET /stats

Term counts by language.

{
  "total": 2456445,
  "languages": 24,
  "byLanguage": [{"lang": "en", "count": 312847}, ...],
  "source": "IATE",
  "reliability": "4-star"
}

TBX export (ISO 30042)

GET /export/tbx?lang=<code>&limit=100&offset=0

Export terminology in TermBase eXchange format. Returns application/x-tbx+xml.

TMX export

GET /export/tmx?source=<code>&target=<code>&limit=100&offset=0

Export translation pairs in Translation Memory eXchange format. Returns application/x-tmx+xml.

Batch export

GET /batch?lang=<code>&offset=0&limit=1000

Paginated export for embedding generation pipelines.

{
  "lang": "en",
  "offset": 0,
  "limit": 1000,
  "count": 1000,
  "hasMore": true,
  "nextOffset": 1000,
  "terms": [...],
  "embedding_model": "bge-m3-onnx"
}
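The `hasMore` and `nextOffset` fields support a simple paging loop. The sketch below assumes those fields behave as in the example response. `fetch_page` is a hypothetical callable (for instance, a wrapper that issues GET /batch?lang=...&offset=...&limit=... and parses the JSON), injected so the loop can be exercised without network access.

```python
from typing import Callable, Iterator


def iter_terms(fetch_page: Callable[[int, int], dict], limit: int = 1000) -> Iterator[dict]:
    """Yield every term from /batch, following nextOffset until hasMore is false."""
    offset = 0
    while True:
        page = fetch_page(offset, limit)  # hypothetical wrapper around GET /batch
        yield from page["terms"]
        if not page.get("hasMore"):
            break
        offset = page["nextOffset"]


# Fake in-memory source of 2,500 terms, shaped like the /batch response.
def fake_fetch(offset: int, limit: int) -> dict:
    data = [{"id": i} for i in range(2500)]
    chunk = data[offset:offset + limit]
    end = offset + len(chunk)
    return {"terms": chunk, "hasMore": end < len(data), "nextOffset": end}


print(sum(1 for _ in iter_terms(fake_fetch)))
```

Using a generator keeps memory flat when streaming terms straight into an embedding job.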

Custom glossaries (tenant)

POST /tenant/upload

Upload a custom glossary (CSV or TBX). Terms are merged with IATE at lookup time, with tenant terms taking priority.

GET /tenant/lookup?tenant_id=<id>&source_lang=en&target_lang=fi&terms=data+protection

Merged lookup: tenant glossary first, then IATE fallback.

GET /tenant/list?tenant_id=<id>

List all glossaries for a tenant.

DELETE /tenant/glossary?glossary_id=<id>&tenant_id=<id>

Delete a tenant glossary.
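Merged lookup gives tenant terms priority over IATE. A minimal sketch of that precedence rule, assuming both sources can be modelled as dictionaries from a source term to its target-language translations (the real glossary schema is not documented here; `merged_lookup` and the case-insensitive matching are hypothetical choices):

```python
def merged_lookup(terms: list[str], tenant: dict[str, list[str]],
                  iate: dict[str, list[str]]) -> dict:
    """For each term, use the tenant glossary if it has an entry, else fall back to IATE."""
    results = {}
    for term in terms:
        key = term.lower()  # assumption: case-insensitive matching
        if key in tenant:
            results[term] = {"translations": tenant[key], "source": "tenant"}
        elif key in iate:
            results[term] = {"translations": iate[key], "source": "IATE"}
        else:
            results[term] = {"translations": [], "source": None}
    return results


tenant = {"data protection": ["henkilötietojen suoja"]}
iate = {"data protection": ["tietosuoja"], "controller": ["rekisterinpitäjä"]}
print(merged_lookup(["data protection", "controller", "widget"], tenant, iate))
```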

Translation API LIVE

Neural translation with Helsinki-NLP OPUS-MT models, which run in the browser via ONNX, so there is no server-side inference cost. 1,440+ language pairs are available.

Translate text

POST /translate

6-stage translation cascade: KV cache → IATE terminology → rules engine → (reserved) → browser inference (OPUS-MT) → Vectorize semantic verification.

/translate parameters
Parameter     Type     Description
text          string   Text to translate (required)
source_lang   string   Source language (ISO 639-1)
target_lang   string   Target language (required)

Example:
curl -X POST https://staging.pauhu.eu/translate \
  -H "Content-Type: application/json" \
  -d '{"text": "General Data Protection Regulation", "source_lang": "en", "target_lang": "fi"}'
{
  "source_lang": "en",
  "target_lang": "fi",
  "text": "General Data Protection Regulation",
  "translation": "Yleinen tietosuoja-asetus",
  "model_id": "Helsinki-NLP/opus-mt-en-fi"
}
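The cascade can be pictured as an ordered list of stages: the first stage that produces a translation wins, and a final step verifies it. The sketch below models only that control flow and is hypothetical throughout. Stage names follow the cascade list, the "(reserved)" stage is omitted, the lookups are stand-in dictionaries, and `verify` is a placeholder for the Vectorize semantic check.

```python
from typing import Callable, Optional

Stage = Callable[[str, str, str], Optional[str]]


def run_cascade(text: str, src: str, tgt: str, stages: list[tuple[str, Stage]],
                verify: Callable[[str, str], bool]) -> dict:
    """Try each stage in order; return the first hit with its provenance and verification."""
    for name, stage in stages:
        result = stage(text, src, tgt)
        if result is not None:
            return {"translation": result, "stage": name, "verified": verify(text, result)}
    return {"translation": None, "stage": None, "verified": False}


# Hypothetical stand-ins for the real stages.
kv_cache: dict = {}
iate = {("General Data Protection Regulation", "en", "fi"): "Yleinen tietosuoja-asetus"}

stages = [
    ("kv_cache", lambda t, s, g: kv_cache.get((t, s, g))),
    ("iate", lambda t, s, g: iate.get((t, s, g))),
    ("rules", lambda t, s, g: None),                     # rules engine: no rule matched
    ("opus_mt", lambda t, s, g: f"[MT {s}->{g}] {t}"),   # placeholder for browser inference
]
print(run_cascade("General Data Protection Regulation", "en", "fi", stages,
                  verify=lambda src, out: bool(out)))
```

Putting cheap, deterministic sources (cache, IATE, rules) ahead of model inference means known terminology never depends on the neural model's output.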

Full cascade (verbose)

POST /v1/complete

Returns all 6 cascade stages with timing and provenance for each step.

Batch segment translation

POST /v1/translate/segments

Translate an array of segments in a single request.

Supported languages

GET /languages

List all supported language codes and available pairs.

Search API

Fan-out semantic search across 20 indexes (BGE-M3, 1024 dimensions, cosine similarity), powered by the Laine Algorithm.

Semantic search

GET /search?q=<query>&limit=20&domain=<eurovoc_id>

Search across all 20 data source Vectorize indexes simultaneously.

/search parameters
Parameter   Type      Description
q           string    Search query (required)
limit       integer   Max results (default: 20)
domain      string    EuroVoc domain filter (1-21)

Response:
{
  "query": "digital product passport ESPR",
  "limit": 20,
  "results": [
    {"product": "eurlex", "id": "32024R1781", "title": "...", "score": 0.89, "url": "..."},
    {"product": "commission", "id": "...", "title": "...", "score": 0.84, "url": "..."}
  ]
}
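Fan-out search queries every index and merges the hits into one ranked list. This sketch of the merge step assumes each index returns hits shaped like the `results` entries above and that scores from different indexes are directly comparable. Both are assumptions; the Laine Algorithm's actual ranking is not described here. `query_index` is a hypothetical callable, and threads stand in for concurrent subrequests.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def fan_out_search(query: str, indexes: list[str],
                   query_index: Callable[[str, str], list[dict]], limit: int = 20) -> list[dict]:
    """Query all indexes concurrently, then merge hits by descending score."""
    with ThreadPoolExecutor(max_workers=max(1, len(indexes))) as pool:
        per_index = pool.map(lambda name: query_index(name, query), indexes)
        hits = [hit for batch in per_index for hit in batch]
    hits.sort(key=lambda h: h["score"], reverse=True)
    return hits[:limit]


# Fake per-index results for illustration only.
fake = {
    "eurlex": [{"product": "eurlex", "id": "32024R1781", "score": 0.89}],
    "commission": [{"product": "commission", "id": "c-1", "score": 0.84}],
    "ted": [{"product": "ted", "id": "t-1", "score": 0.91}],
}
print(fan_out_search("digital product passport ESPR", list(fake),
                     lambda name, q: fake[name], limit=2))
```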

Instant answers

GET /instant?q=<query>

Knowledge panels from IATE, EUR-Lex, and Wikidata. Returns structured answer snippets.

Web proxy

GET /proxy?source=arxiv&q=<query>

CORS proxy for institutional search sources. Normalizes results into a common schema.

/proxy source types
Source     Description
arxiv      arXiv academic papers
eurostat   Eurostat datasets
ted        TED procurement notices

Cross-language siblings

GET /siblings?celex=<CELEX>

Find all language versions of a EUR-Lex document by CELEX number.

Semantic reranking

POST /rerank

Rerank a set of results using BGE-M3 cross-encoder scoring.

DLC packs (browser models)

GET /manifest.json

Signed manifest of downloadable model packs with Ed25519 signatures and SHA-256 checksums.

GET /core

Core DLC pack: ONNX models and terminology for browser-native inference.

GET /delta

Delta pack: incremental updates since last core download.
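Before loading a downloaded pack, a client can compare its SHA-256 digest with the checksum listed in the signed manifest. This Python sketch covers only the checksum step. Verifying the Ed25519 signature on the manifest itself needs a cryptography library and is omitted. The function names are hypothetical, and no particular manifest field layout is assumed.

```python
import hashlib
import hmac


def sha256_matches(data: bytes, expected_hex: str) -> bool:
    """Return True if data's SHA-256 digest equals the manifest checksum (hex string)."""
    actual = hashlib.sha256(data).hexdigest()
    return hmac.compare_digest(actual, expected_hex.lower())


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a large file in 1 MiB chunks instead of reading it into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A mismatch should be treated as a failed download (retry or discard) rather than loaded into the ONNX runtime.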

Annotation API LIVE

Classifies documents with topic annotations, deontic modalities, and language detection. Split across two service instances (A–E and E–W) for the 20 data sources.

Annotate document

POST /annotate

Full annotation pipeline: language detection, topic classification, deontic modality (obligation/prohibition/permission/exemption), word count, product-specific metadata.

curl -X POST https://annotate.pauhu.eu/annotate \
  -H "Content-Type: application/json" \
  -d '{"text": "Member States shall ensure...", "product": "eurlex"}'
{
  "original_path": "...",
  "organized_path": "...",
  "language": "en",
  "deontic": {"modality": "obligation", "confidence": 0.95},
  "product": "eurlex",
  "topic_domain": "law",
  "word_count": 847,
  "char_count": 5231
}

Batch annotate

POST /batch

Annotate up to 50 documents in a single request.
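Collections larger than 50 documents have to be split client-side. A minimal chunking helper (`chunks` is a hypothetical name). The request body shape for POST /batch is not documented here, so sending each chunk is left to the caller.

```python
from typing import Iterator, Sequence

MAX_BATCH = 50  # documented per-request limit for POST /batch


def chunks(docs: Sequence[dict], size: int = MAX_BATCH) -> Iterator[Sequence[dict]]:
    """Yield consecutive slices of at most `size` documents."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


docs = [{"text": f"Document {i}", "product": "eurlex"} for i in range(120)]
print([len(batch) for batch in chunks(docs)])
```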

Deontic classification only

POST /classify

Lightweight endpoint: returns only deontic modality classification.

{
  "language": "en",
  "annotation": {
    "modality": "prohibition",
    "confidence": 0.92
  }
}

Deontic modalities

Modality      Meaning                  Example
Prohibition   Action is forbidden      "Member States shall not permit..."
Obligation    Action is required       "Member States shall ensure..."
Permission    Action is allowed        "Member States may designate..."
Exemption     No requirement applies   "This Regulation shall not apply to..."
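The examples in the table show why cue order matters: "shall not apply" must be checked before "shall not", which must be checked before "shall". The toy keyword heuristic below illustrates that ordering for English text only. It is not the service's classifier, which also reports a confidence score. The function name and cue lists are hypothetical.

```python
import re

# Ordered from most to least specific; the first match wins.
CUES = [
    ("exemption", re.compile(r"\bshall not apply\b|\bdoes not apply\b", re.I)),
    ("prohibition", re.compile(r"\bshall not\b|\bmay not\b|\bmust not\b", re.I)),
    ("obligation", re.compile(r"\bshall\b|\bmust\b", re.I)),
    ("permission", re.compile(r"\bmay\b", re.I)),
]


def classify_modality(sentence: str):
    """Return the first modality whose cue phrase matches, or None."""
    for modality, pattern in CUES:
        if pattern.search(sentence):
            return modality
    return None


for s in ["Member States shall not permit...", "Member States shall ensure...",
          "Member States may designate...", "This Regulation shall not apply to..."]:
    print(s, "->", classify_modality(s))
```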

Service metadata

GET /products

List all registered data source annotators and their product codes.

GET /stats

Annotation counts from R2 sidecar metadata, grouped by product.

GET /audit

Full provenance audit: total annotated documents, per-product breakdown, provenance tier distribution (NATIVE 1.0, PARSED 0.95, KEYWORD ≤0.9).

Indexing API LIVE

Hybrid semantic + BM25 search, alert monitoring, and document health checks across all 20 data sources. Split across two service instances (A–E and E–W).

Health check

GET /health

Binding smoke test. Returns status of R2, D1, Vectorize, and KV bindings for the worker’s product set.

{
  "service": "index",
  "status": "healthy",
  "bindings": {
    "R2_COMMISSION": "ok",
    "D1_COMMISSION": "ok",
    "RECIPE_ALERTS": "bound",
    "CF_AI_TOKEN": "set"
  },
  "timestamp": "2026-03-03T09:00:00Z"
}

Alerts

GET /alerts?recipe=<name>&severity=<level>&limit=20

Query stored regulatory alerts from KV. Filter by recipe name and severity level.

/alerts parameters
Parameter   Type      Description
recipe      string    Recipe name filter (optional; defaults to all)
severity    string    Severity filter: critical, high, medium, low (optional)
limit       integer   Max results (default: 20)

Response:
{
  "count": 5,
  "filters": { "recipe": "*", "severity": "*" },
  "alerts": [...]
}

Hybrid query

GET /query?product=<PRODUCT>&q=<text>&lang=<code>&domain=<id>&limit=10

Hybrid semantic (70% BGE-M3) + BM25 keyword (30%) search within a single product index. Includes DSA Article 27 ranking transparency metadata.

/query parameters
Parameter   Type      Description
product     string    Product code, e.g. COMMISSION, EURLEX (required)
q           string    Search query (required)
lang        string    ISO 639-1 language filter (optional)
domain      string    EuroVoc domain ID (optional)
limit       integer   Max results (default: 10)
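The 70/30 weighting can be read as a linear fusion of two scores. The sketch assumes both scores are scaled to [0, 1] before fusing. BM25 scores are unbounded, so per-result-set min-max normalization is used here as an assumption; the service's actual normalization is not documented. All names are hypothetical.

```python
SEMANTIC_WEIGHT = 0.7  # documented weight for BGE-M3 similarity
KEYWORD_WEIGHT = 0.3   # documented weight for BM25


def min_max(values: list[float]) -> list[float]:
    """Scale values to [0, 1]; if all values are equal, map them to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_rank(docs: list[dict]) -> list[tuple[str, float]]:
    """docs: [{'id', 'semantic', 'bm25'}]; returns (id, fused score) sorted descending."""
    if not docs:
        return []
    bm25_norm = min_max([d["bm25"] for d in docs])
    fused = [
        (d["id"], SEMANTIC_WEIGHT * d["semantic"] + KEYWORD_WEIGHT * b)
        for d, b in zip(docs, bm25_norm)
    ]
    return sorted(fused, key=lambda pair: pair[1], reverse=True)


docs = [
    {"id": "A", "semantic": 0.82, "bm25": 4.0},
    {"id": "B", "semantic": 0.75, "bm25": 12.0},
    {"id": "C", "semantic": 0.60, "bm25": 8.0},
]
print(hybrid_rank(docs))
```

Here B overtakes A despite a lower semantic score because its keyword match is much stronger, which is the behaviour the 30% BM25 share is meant to allow.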

Backfill ADMIN

POST /backfill?product=<PRODUCT>&limit=2000&prefix=en/

Admin-only endpoint. Index unprocessed R2 sidecars for a product. Useful after initial data seeding or to recover from indexing gaps. Requires infrastructure-level access (not available via public API keys).

/backfill parameters
Parameter   Type      Description
product     string    Product code, e.g. COMMISSION, EURLEX (required)
limit       integer   Max documents to index in one batch (default: 2000)
prefix      string    R2 key prefix filter, e.g. en/ for English documents only (optional)

Response:
{
  "product": "COMMISSION",
  "limit": 2000,
  "prefix": "en/",
  "indexed": 500,
  "errors": 3
}

Cross-language siblings

GET /siblings?celex=<CELEX>

Find all language versions of a EUR-Lex document. Returns available languages and document paths.

Ranking methodology

GET /ranking-methodology

DSA Article 27 ranking transparency. Returns algorithm weights (0.7 semantic, 0.3 keyword), manipulation resistance details, and update frequency. Cached for 24 hours.

Statistics

GET /stats

Document counts per product. The secondary index instance includes IATE term counts.

IATE deontic distribution

GET /iate/deontic?lang=<code>

Distribution of deontic modalities across IATE terminology entries.

IATE cross-lingual translation

GET /iate/translate?term=<term>&from=<code>

Look up a term in one language and retrieve translations across all 24 EU languages via IATE concept IDs.

{
  "concept_id": "C12345",
  "source_term": "tietosuoja",
  "source_language": "fi",
  "languages": 24,
  "translations": {
    "en": [{ "term": "data protection", "reliability": 4 }],
    "de": [{ "term": "Datenschutz", "reliability": 4 }]
  }
}
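A short sketch for picking the most reliable term in a target language from a response shaped like the example above. It parses an inline copy of that example. The helper name is hypothetical, and ties are resolved in favour of the first entry in the list.

```python
import json

response = json.loads("""
{
  "concept_id": "C12345",
  "source_term": "tietosuoja",
  "source_language": "fi",
  "languages": 24,
  "translations": {
    "en": [{ "term": "data protection", "reliability": 4 }],
    "de": [{ "term": "Datenschutz", "reliability": 4 }]
  }
}
""")


def best_term(resp: dict, lang: str):
    """Return the highest-reliability term for `lang`, or None if there is no entry."""
    candidates = resp.get("translations", {}).get(lang, [])
    if not candidates:
        return None
    return max(candidates, key=lambda c: c.get("reliability", 0))["term"]


print(best_term(response, "de"), best_term(response, "sv"))
```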

Model CDN LIVE

Serves ONNX models from EU storage (2,342 models). Supports HTTP range requests for large files.

GET /

Model manifest with all available models and SHA-256 checksums.

GET /models/{model_id}/*

Serve model files with Accept-Ranges: bytes for resumable downloads. CORS enabled for browser-native inference via ONNX Runtime Web / Transformers.js.
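Because the CDN advertises Accept-Ranges: bytes, an interrupted download can resume from the bytes already on disk. This standard-library sketch takes the base URL from the service table. The model ID and file names in the usage line are placeholders, not real catalog entries. A 206 Partial Content status means the range was honoured; a 200 means the whole file was sent, so the local copy is rewritten.

```python
import os
import urllib.request

CDN = "https://models.pauhu.ai"


def range_header(local_path: str) -> dict:
    """Request only the missing tail of a partially downloaded file."""
    have = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    return {"Range": f"bytes={have}-"} if have else {}


def resume_download(model_id: str, file_name: str, local_path: str) -> None:
    """Download or resume a model file. An already-complete file typically yields
    HTTP 416, which urlopen raises as urllib.error.HTTPError."""
    req = urllib.request.Request(f"{CDN}/models/{model_id}/{file_name}",
                                 headers=range_header(local_path))
    with urllib.request.urlopen(req) as resp:
        # 206 = partial content (append); 200 = full body (start over).
        mode = "ab" if resp.status == 206 else "wb"
        with open(local_path, mode) as out:
            while chunk := resp.read(1 << 20):
                out.write(chunk)


# resume_download("<model_id>", "model.onnx", "model.onnx")  # placeholders
```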

Model categories

Available model categories
Category         Models                           Format
Translation      1,440+ OPUS-MT pairs             ONNX
Embeddings       BGE-M3 (1024d)                   ONNX
Speech-to-text   Whisper variants                 ONNX
Text-to-speech   TTS models                       ONNX
Classification   DistilBERT, domain classifiers   ONNX
Code             Qwen2.5-Coder, Phi-3 Mini        ONNX

Data infrastructure

20 EU institutional data sources are ingested into per-product R2 buckets, annotated via queue-triggered workers, and indexed in Vectorize for semantic search. Each source has matching R2, Queue, D1, and Vectorize resources.

Data sources (20)

EUR-Lex        EU law and regulations
CURIA          Court of Justice case law
TED            Public procurement
IATE           Terminology (2.4M terms)
ECB            European Central Bank
ECHA           Chemical substances (REACH/CLP)
EMA            Medicinal products
EPO            Patent publications
Parliament     Legislative proceedings
Eurostat       Statistical indicators
Commission     EU Commission documents
Consilium      Council of the EU
CORDIS         EU research projects
Data Europa    EU Open Data Portal
DPP            Digital Product Passports
OEIL           Legislative Observatory
Publications   EU Publications Office
National Law   27 EU member states + UK
Who is Who     EU institutional directory
Wikidata       EU entity knowledge base

Pipeline

Documents flow through: Ingestion → EU storage (with metadata) → Event notification → Queue → Annotation service (topic + deontic classification) → Sidecar JSON. They are then searchable via the Laine Algorithm across all 20 indexes.

National law databases (27 EU member states + UK)

28 national law adapters (the 27 EU member states plus the United Kingdom) with source database links. Connected to EUR-Lex Sector 7 (290,172 national transposition measures linking EU directives to national implementations).

AT Austria — RIS
BE Belgium — Belgisch Staatsblad
BG Bulgaria — lex.bg
CY Cyprus — CyLaw
CZ Czech Republic — e-Sbírka
DE Germany — Gesetze im Internet
DK Denmark — Retsinformation
EE Estonia — Riigi Teataja
ES Spain — BOE
FI Finland — Finlex
FR France — Légifrance
GB United Kingdom — legislation.gov.uk
GR Greece — Kodiko
HR Croatia — Narodne Novine
HU Hungary — Nemzeti Jogszabálytár
IE Ireland — Irish Statute Book
IT Italy — Normattiva
LT Lithuania — TAR
LU Luxembourg — Legilux
LV Latvia — Likumi.lv
MT Malta — Laws of Malta
NL Netherlands — Wetten.nl
PL Poland — ISAP
PT Portugal — DRE
RO Romania — Legislatie.just.ro
SE Sweden — Riksdagen
SI Slovenia — Uradni list
SK Slovakia — Slov-Lex

EuroVoc domains (21)

All annotations use the EU Publications Office EuroVoc thesaurus for domain classification:

04 Politics                   28 Social Questions                  56 Agriculture, Forestry & Fisheries
08 International Relations    32 Education & Communications        60 Agri-foodstuffs
10 European Union             36 Science                           64 Production, Technology & Research
12 Law                        40 Business & Competition            66 Energy
16 Economics                  44 Employment & Working Conditions   68 Industry
20 Trade                      48 Transport                         72 Geography
24 Finance                    52 Environment                       76 International Organisations

EU AI Act transparency

All workers expose Article 52 transparency information at two endpoints:

GET /transparency
GET /.well-known/ai-plugin.json
{
  "ai_system": true,
  "provider": "Pauhu Ltd",
  "eu_ai_act_article": 52,
  "purpose": "...",
  "risk_category": "limited",
  "jurisdiction": "EU"
}

Access control

Pauhu uses entitlement-based access control, not volume-based rate limiting. Your seat tier determines what data you can access, not how many requests you can make.

Seat tiers

Seat tier entitlements
Tier        Data access                                                                  Auth
Trial       Search, terminology, translation (3 req/day)                                 IP-based (no key needed)
Live        Raw feeds from 20 sources + search + terminology + translation + reranking   API key
Annotated   Live + annotated feeds (EuroVoc, deontic) + Pauhu AI platform                API key
Training    Live + bulk export for ML training                                           API key

Burst protection

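Paying seats have no daily request caps. Burst protection prevents abuse instead: for example, GET /keys/usage reports a burst limit of 10 req/sec sustained and 50 req/sec peak for the live tier.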

Trial tier: 3 requests/day (IP-based), plus burst protection.

Response headers

X-Pauhu-Tier: live
Retry-After: 1          (only if burst limit hit)

Trial tier also receives X-RateLimit-Limit: 50.
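A client-side sketch for respecting burst protection. When a response signals the burst limit, the client waits for the number of seconds in Retry-After and tries again. It assumes the service uses HTTP 429 for this case (this section documents only the Retry-After header) and that headers arrive as a plain dict with canonical casing. `send` is a hypothetical callable returning (status, headers, body), so the retry policy can be shown without network access.

```python
import time
from typing import Callable


def with_retry_after(send: Callable[[], tuple], max_attempts: int = 3,
                     sleep: Callable[[float], None] = time.sleep) -> tuple:
    """Call send(); on a 429 with Retry-After, wait that many seconds and retry."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts:
            return status, headers, body
        # Only the delay-seconds form (e.g. "Retry-After: 1") is handled, not HTTP dates.
        sleep(float(headers.get("Retry-After", "1")))


# Simulated responses: one burst rejection, then success.
responses = iter([(429, {"Retry-After": "1"}, b""), (200, {"X-Pauhu-Tier": "live"}, b"{}")])
waits = []
result = with_retry_after(lambda: next(responses), sleep=waits.append)
print(result[0], waits)
```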

Guides

Guide (domain): Description
IATE API Reference (pauhu.eu): Full reference for all 11 terminology endpoints: lookup, search, TBX/TMX export, custom glossaries
Recipe Catalog (pauhu.eu): 6 pre-configured monitoring recipes with alert format specification
How We Protect Your Data (pauhu.eu): Zone isolation, EU data residency, encryption, access control, audit trails
GPU Extensions (pauhu.eu): 6 GPU extension types (LLMs, video, image, real-time video, audio, 3D). Bring your own API keys.
Data Source Attributions (pauhu.eu): Licenses, publishers, and modification notices for all 35 data sources
Getting Started (Recipe Wizard) (pauhu.ai): Configure your EU regulatory feed in 3 steps
Benchmark Guide (pauhu.ai): Interpret browser inference benchmark results
Search Guide (pauhu.com): Query syntax, filters, boolean operators, CELEX lookup, 20 product examples
Translation Quality Pipeline (pauhu.com): 10-stage translation quality cascade
Getting Started (Containerverse) (pauhu.dev): Install and run the EU context container in 3 lines
Who is Who Privacy Notice (pauhu.eu): GDPR privacy notice for personal data from the EU Who is Who directory
LDS Connector Deployment (pauhu.eu): Deploy and configure the Language Data Space connector
LDS Demo Runbook (pauhu.eu): Step-by-step: login, Swagger, certificate upload, data publishing for lds.pauhu.eu
Data Catalog (pauhu.eu): 24 data products: source institution, update frequency, record count, languages, license. Machine-readable YAML + /v1/search API reference.
Cross-References (pauhu.eu): How EUR-Lex, CURIA, OEIL, TED, and national law link together. Example API responses with linked documents.
Data Freshness (pauhu.eu): Sync schedules per product, what “Last updated” means, data currency SLA.
Multilingual Search (pauhu.eu): Cross-lingual semantic search with BGE-M3. Query in one language, find documents in another. 24 EU languages.
Data Pipeline (pauhu.eu): From EU source to grounded answer: ingestion, STAM annotation, paragraph indexing, semantic search, FiD generation. Sovereign deployment data flow.
FiD Dual-Brain Architecture (pauhu.eu): Fusion-in-Decoder: how the retrieval brain (20 data sources) and generation brain (ONNX specialists) work together, cloud and sovereign modes
Compass Search (pauhu.com): How the Compass works: 3.2B endpoints, sync, annotation, Vectorize indexing, Laine Algorithm ranking. Developer guide for adding new data sources.
MCP Sovereign Mode (pauhu.dev): API reference for sovereign MCP server: 4 tools, air-gapped deployment, SHA-256 audit logging, model adapter patterns
AI Transparency (Art. 52) (pauhu.eu): EU AI Act Art. 52 compliance: how Pauhu discloses AI involvement, system classification, user notification, training data sources
Pauhu for Government (pauhu.eu): Data sovereignty, GDPR Art. 25/32, NIS2, Traficom compliance, data residency guarantees, procurement compatibility
Testausopas (Anne), "Testing Guide" (pauhu.eu): 5-step testing guide: open, log in, search “julkinen hankinta” (public procurement), translate, chat, with expected results
Pauhu julkishallinnolle (yleiskatsaus), "Pauhu for Public Administration (overview)" (pauhu.eu): 1-page overview for Anne Miettinen’s 40-org GovAI network: procurement, legal compliance, terminology, translation
API Quickstart (EN) (pauhu.eu): English API quickstart with curl examples, 20 data sources, eForms procurement, translation, security overview
eForms-kenttäopas (BT), "eForms Field Guide" (pauhu.eu): eForms SDK 1.14 BT field reference for TED procurement data: 40+ Business Terms with Finnish descriptions, CPV codes, API response example
Government Procurement Training (pauhu.eu): 6-module training guide for procurement officials: EU law search, TED notices, IATE terminology, cross-references, multilingual search, compliance checklists
Demo: Government Procurement (pauhu.eu): Step-by-step walkthrough: search EUR-Lex, translate to Finnish, check national transposition, sovereign FiD deployment
Demo: eForms Procurement Search (pauhu.eu): Search TED notices by BT fields, CPV codes, country comparison, monitoring recipes, CSV/JSON export
Demo: Pharmaceutical EMA Compliance (pauhu.eu): EMA variation procedures, ECHA substance checks, SmPC translation, CURIA case law monitoring
MACC Guide (all): Microsoft Azure Consumption Commitment: hot-swap Azure workloads to Pauhu at identical North Europe EUR rates
CRM API Reference (internal): Sales pipeline API: contacts, companies, deals, activities, tasks, email sequences, AI lead scoring
Changelog (all): Release notes: Document extraction integration, two-part tariff, 20 data feeds, browser-native inference
Getting Started (pauhu.eu): 7-section guide: signup, first search, filters, products, export, chat, next steps
Sovereign Brain (pauhu.eu): How Pauhu thinks: dual-hemisphere architecture, browser-native inference, grounded generation
Install Sovereign AI (pauhu.eu): 8-container self-hosted deployment guide for air-gapped and on-premises environments
Chip-Agnostic Architecture (pauhu.eu): Why Pauhu runs on any device: ONNX Runtime, WebAssembly, ARM/x86, browser-native inference
Two-Path Pricing (pauhu.eu): Pauhu license + Azure pass-through pricing model explained
Onboarding Wizard (pauhu.eu): Step-by-step account setup and configuration walkthrough
Guide vs. Encyclopedia (pauhu.eu): How Pauhu differs from static reference databases: guided search vs. keyword lookup

Support

Technical: support@pauhu.eu

Sales: sales@pauhu.eu