Compass Integration

How search retrieval feeds grounded AI answers. Left hemisphere meets right.

Architecture overview

Pauhu® uses a two-hemisphere architecture inspired by human neuroanatomy. The left hemisphere retrieves facts; the right hemisphere generates grounded answers from them.

Left hemisphere — Retrieval

Hybrid search covers 24 EU data sources, combining BM25 keyword scoring with semantic similarity. It returns ranked paragraphs with confidence scores in approximately 26 ms. Each paragraph carries full provenance metadata: source document, publication date, language, and institution.
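This page does not say how the BM25 and semantic scores are combined. As a rough illustration only, the sketch below assumes a weighted sum of min-max-normalized scores. `fuseScores`, `minMax`, the `alpha` weight, and the `bm25`/`semantic` field names are hypothetical and not part of the Pauhu API.

```javascript
// Hypothetical sketch: fuse lexical (BM25) and semantic scores into one ranking.
// The weighted-sum formula and all names here are assumptions for illustration.

// Scale an array of numbers into [0, 1]; a constant array maps to all zeros.
function minMax(values) {
  const lo = Math.min(...values);
  const hi = Math.max(...values);
  return values.map(v => (hi === lo ? 0 : (v - lo) / (hi - lo)));
}

// alpha = 1 ranks by BM25 only, alpha = 0 by semantic similarity only.
function fuseScores(candidates, alpha = 0.5) {
  const bm25 = minMax(candidates.map(c => c.bm25));
  const semantic = minMax(candidates.map(c => c.semantic));
  return candidates
    .map((c, i) => ({ ...c, score: alpha * bm25[i] + (1 - alpha) * semantic[i] }))
    .sort((a, b) => b.score - a.score);
}

// Made-up scores for three candidate paragraphs.
const sampleCandidates = [
  { id: 'p1', bm25: 12.0, semantic: 0.4 },
  { id: 'p2', bm25: 3.0, semantic: 0.9 },
  { id: 'p3', bm25: 8.0, semantic: 0.8 }
];
const ranked = fuseScores(sampleCandidates);
console.log(ranked.map(r => r.id));
```

With equal weighting, a paragraph that does reasonably well on both signals (`p3` here) outranks one that is strong on only one of them.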

Right hemisphere — Generation

FiD (Fusion-in-Decoder) reads the retrieved paragraphs and generates a grounded answer. The model runs entirely in the browser via ONNX Runtime, producing responses in approximately 3 seconds. No server-side inference is required for the browser-native path.
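Fusion-in-Decoder pairs the question with each retrieved paragraph, encodes every pair independently, and lets the decoder attend over all encoder outputs at once. The sketch below covers only the input-preparation half of that idea. The `question:` / `title:` / `context:` template is a convention commonly used with FiD and is an assumption about this system, as are `buildFidInputs`, `maxPassages`, and the paragraph field names.

```javascript
// Hypothetical sketch of FiD input preparation. Field names and the prompt
// template are assumptions; the deployed model's template may differ.
function buildFidInputs(question, paragraphs, maxPassages = 10) {
  return paragraphs
    .slice(0, maxPassages) // keep only the top-ranked passages
    .map(p => `question: ${question} title: ${p.source} context: ${p.text}`);
}

const fidInputs = buildFidInputs('What are the GDPR fines for data breaches?', [
  { source: 'eurlex', text: 'Example paragraph one.' },
  { source: 'curia', text: 'Example paragraph two.' }
]);

// One string per passage: each is encoded separately, and the decoder
// attends over the concatenated encoder outputs to generate the answer.
console.log(fidInputs.length);
```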

Bridge

Ranked paragraphs cross from retrieval to generation through a structured bridge. Every generated token traces back to a source paragraph. If the retrieval step returns no relevant sources, the generation step does not produce an answer. This is the core anti-hallucination guarantee.
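In code terms, this guarantee amounts to a gate in front of generation. A minimal sketch, assuming a confidence threshold decides what counts as relevant (this page does not say how relevance is decided). `bridge`, `minConfidence`, and the result shape are hypothetical.

```javascript
// Hypothetical gate between retrieval and generation.
// Assumption: a paragraph counts as relevant when its confidence
// meets a threshold; the default value is illustrative only.
function bridge(paragraphs, minConfidence = 0.5) {
  const relevant = paragraphs.filter(p => p.confidence >= minConfidence);
  if (relevant.length === 0) {
    // No grounding available: skip generation and return sources only.
    return { generate: false, sources: paragraphs };
  }
  return { generate: true, sources: relevant };
}

const weakOnly = bridge([{ id: 'p1', confidence: 0.2 }]);
const withStrong = bridge([
  { id: 'p1', confidence: 0.2 },
  { id: 'p2', confidence: 0.8 }
]);
console.log(weakOnly.generate, withStrong.generate);
```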

Data flow

A query passes through the following steps from user input to grounded answer:

Query received — User sends a query via POST /v1/chat.
Intent classification — The query is classified by intent: search, translate, code, app, or chat.
Left hemisphere search — The products in scope for the requesting domain are searched (see Domain scoping below).
Semantic ranking — The hybrid search ranks the top paragraphs by cosine similarity via the Born rule and assigns each paragraph a confidence score.
Sources streamed — Paragraphs and metadata are streamed to the client as an SSE sources event.
Right hemisphere reads — The browser-side FiD model reads the retrieved paragraphs and fuses them into context.
Grounded answer generated — The generated answer includes inline citations to source documents. Each citation links to the exact paragraph.
No hallucination — If no relevant sources are found, no answer is generated. The system returns the source results only.
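The semantic ranking step in the list above relies on cosine similarity between a query embedding and paragraph embeddings. The sketch below shows plain cosine similarity and top-k selection only. How the Born rule enters the scoring is not described on this page, so it is left out. `cosine`, `topK`, and the `embedding` field are illustrative names.

```javascript
// Cosine similarity of two equal-length numeric vectors.
// Returns 0 if either vector has zero length.
function cosine(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank paragraphs by similarity to the query and keep the best k.
function topK(queryEmbedding, paragraphs, k = 3) {
  return paragraphs
    .map(p => ({ id: p.id, similarity: cosine(queryEmbedding, p.embedding) }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, k);
}

// Toy 2-dimensional embeddings; real embeddings have far more dimensions.
const best = topK([1, 0], [
  { id: 'a', embedding: [0, 1] },
  { id: 'b', embedding: [1, 0] },
  { id: 'c', embedding: [1, 1] }
], 2);
console.log(best.map(p => p.id));
```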

24 products searchable

Compass searches across 24 data products covering EU institutions, national law, terminology, and open knowledge:

Product	Source institution
`commission`	European Commission
`consilium`	Council of the European Union
`cordis`	Community Research and Development Information Service
`curia`	Court of Justice of the European Union
`dataeuropa`	data.europa.eu (European Data Portal)
`dpp`	Digital Product Passport (ESPR)
`ecb`	European Central Bank
`echa`	European Chemicals Agency
`ema`	European Medicines Agency
`epo`	European Patent Office
`europarl`	European Parliament
`eurlex`	EUR-Lex (Official Journal of the EU)
`eurostat`	Eurostat (Statistical Office)
`iate`	Inter-Active Terminology for Europe
`lex`	National legislation (27 EU member states)
`news`	EU institutional press releases
`oeil`	Legislative Observatory (European Parliament)
`osm`	OpenStreetMap (geospatial)
`publications`	EU Publications Office
`ted`	Tenders Electronic Daily (public procurement)
`weather`	Meteorological data
`whoiswho`	EU institutional directory
`wiki`	Wikipedia (multilingual)
`code`	Open-source code (GitHub, npm, PyPI, crates.io)

Domain scoping

Each Pauhu domain searches a different subset of products, tailored to its use case:

Domain	Scope
`pauhu.ai` / `pauhu.eu` / `pauhu.com`	All 24 products
`pauhu.dev`	`eurlex`, `iate`, `wiki`, `code`
`pauhu.io`	`eurostat`, `echa`, `ema`, `dpp`, `dataeuropa`, `osm`, `weather`

Domain scoping is enforced server-side. A query on `pauhu.dev` will never return results from `ted` or `europarl`, for example.
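The scope table can be expressed as a lookup plus a filter. The product and domain names below are transcribed from the tables on this page. The function names and the `product` field on each result are hypothetical, and the actual server-side enforcement is not shown here.

```javascript
// All 24 product identifiers, as listed in the product table.
const ALL_PRODUCTS = [
  'commission', 'consilium', 'cordis', 'curia', 'dataeuropa', 'dpp',
  'ecb', 'echa', 'ema', 'epo', 'europarl', 'eurlex',
  'eurostat', 'iate', 'lex', 'news', 'oeil', 'osm',
  'publications', 'ted', 'weather', 'whoiswho', 'wiki', 'code'
];

// Domain-to-scope mapping, as listed in the domain scoping table.
const DOMAIN_SCOPES = new Map([
  ['pauhu.ai', ALL_PRODUCTS],
  ['pauhu.eu', ALL_PRODUCTS],
  ['pauhu.com', ALL_PRODUCTS],
  ['pauhu.dev', ['eurlex', 'iate', 'wiki', 'code']],
  ['pauhu.io', ['eurostat', 'echa', 'ema', 'dpp', 'dataeuropa', 'osm', 'weather']]
]);

// Hypothetical helper: look up the products a domain may search.
function productsFor(hostname) {
  const scope = DOMAIN_SCOPES.get(hostname);
  if (!scope) throw new Error(`Unknown domain: ${hostname}`);
  return scope;
}

// Hypothetical helper: drop results from products outside the domain's scope.
function filterByScope(hostname, results) {
  const allowed = new Set(productsFor(hostname));
  return results.filter(r => allowed.has(r.product));
}

const devResults = filterByScope('pauhu.dev', [
  { product: 'ted', text: 'Example tender notice.' },
  { product: 'eurlex', text: 'Example regulation paragraph.' }
]);
console.log(devResults.map(r => r.product));
```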

Integration example

Stream a grounded answer with source paragraphs using the chat endpoint:

const response = await fetch('https://staging.pauhu.eu/v1/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    query: 'What are the GDPR fines for data breaches?',
    language: 'en'
  })
});
if (!response.ok) throw new Error(`Request failed: ${response.status}`);

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // stream: true keeps multi-byte characters intact across chunk boundaries
  buffer += decoder.decode(value, { stream: true });
  // A chunk can end mid-line, so hold back the last (possibly partial) line
  const lines = buffer.split('\n');
  buffer = lines.pop();
  // Parse SSE events: sources, paragraphs, status, done
  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const event = JSON.parse(line.slice(6));
      console.log(event);
    }
  }
}

The SSE stream emits the following event types:

Event	Description
`sources`	Ranked source paragraphs with metadata and confidence scores
`paragraphs`	Full paragraph text for each source
`status`	Generation progress updates
`done`	Final answer with inline citations
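To route these event types to separate handlers, the stream can be parsed incrementally. The sketch below assumes the server names each event with a standard SSE `event:` field and sends a JSON payload in `data:`, which this page does not confirm. `createSseParser` is a hypothetical helper, and the `count` and `answer` payload fields are placeholders.

```javascript
// Incremental SSE parser sketch. Assumption: each event carries an
// `event:` field with its type and a JSON payload in `data:`.
// Only "\n" line endings are handled here.
function createSseParser(onEvent) {
  let buffer = '';
  return function feed(chunk) {
    buffer += chunk;
    // Events are separated by a blank line; keep the trailing partial event.
    const blocks = buffer.split('\n\n');
    buffer = blocks.pop();
    for (const block of blocks) {
      let type = 'message'; // SSE default when no event field is present
      const data = [];
      for (const line of block.split('\n')) {
        if (line.startsWith('event:')) type = line.slice(6).trim();
        else if (line.startsWith('data:')) data.push(line.slice(5).trimStart());
      }
      if (data.length > 0) onEvent(type, JSON.parse(data.join('\n')));
    }
  };
}

const seen = [];
const feed = createSseParser((type, payload) => seen.push({ type, payload }));

// A chunk boundary may fall anywhere, including in the middle of a JSON payload.
feed('event: sources\ndata: {"count"');
feed(': 2}\n\nevent: done\ndata: {"answer": "..."}\n\n');
console.log(seen.map(e => e.type));
```

In a real client, each decoded chunk from `reader.read()` would be passed to `feed`, and the callback would switch on `type` to update the source list, show progress, or render the final answer.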

Key properties

Property	Description
Grounded	Every answer cites exact source paragraphs. No answer is generated without supporting evidence.
Verifiable	Click any citation to view the original document at its source institution.
Offline-capable	The FiD model is cached in the browser. Once loaded, generation works without an internet connection.
Zero inference cost	The browser-native path has no per-query server-side charges. Inference runs on the user's device.
24 languages	Query in any EU official language. The hybrid search and FiD both support all 24.

Support

Technical: support@pauhu.eu

API keys: Get an API key

Full documentation: Documentation index