Compass Integration
How search retrieval feeds grounded AI answers. Left hemisphere meets right.
Architecture overview
Pauhu® uses a two-hemisphere architecture inspired by human neuroanatomy. The left hemisphere retrieves facts; the right hemisphere generates grounded answers from them.
Left hemisphere — Retrieval
Hybrid search queries 24 EU data sources, combining BM25 lexical matching with semantic similarity. It returns ranked paragraphs with confidence scores in approximately 26 ms. Each paragraph carries full provenance metadata: source document, publication date, language, and institution.
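One way to picture the hybrid step is as a weighted blend of a lexical score and a semantic score. The sketch below is illustrative only: the source does not say how Pauhu fuses the two signals, so the min-max normalization, the `alpha` weight, and the field names `bm25`, `semantic`, and `confidence` are all assumptions.

```javascript
// Hypothetical score fusion: NOT Pauhu's documented method.
// Scales an array of raw scores to the range [0, 1].
function minMax(values) {
  const lo = Math.min(...values);
  const hi = Math.max(...values);
  // If every score is identical, treat all candidates as equally relevant.
  return values.map(v => (hi === lo ? 1 : (v - lo) / (hi - lo)));
}

// Blends normalized BM25 and semantic scores; alpha weights the lexical side.
function hybridRank(candidates, alpha = 0.5) {
  const lexical = minMax(candidates.map(c => c.bm25));
  const semantic = minMax(candidates.map(c => c.semantic));
  return candidates
    .map((c, i) => ({
      ...c,
      confidence: alpha * lexical[i] + (1 - alpha) * semantic[i],
    }))
    .sort((a, b) => b.confidence - a.confidence);
}

// Example with made-up scores for three paragraphs.
const ranked = hybridRank([
  { id: 'p1', bm25: 12.0, semantic: 0.62 },
  { id: 'p2', bm25: 4.0, semantic: 0.91 },
  { id: 'p3', bm25: 8.0, semantic: 0.40 },
]);
console.log(ranked.map(p => p.id));
```

Normalizing before blending matters because raw BM25 scores are unbounded while cosine similarity is not, so an unnormalized sum would let the lexical signal dominate.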
Right hemisphere — Generation
FiD (Fusion-in-Decoder) reads the retrieved paragraphs and generates a grounded answer. The model runs entirely in the browser via ONNX Runtime, producing responses in approximately 3 seconds. No server-side inference is required for the browser-native path.
Bridge
Ranked paragraphs cross from retrieval to generation through a structured bridge. Every generated token traces back to a source paragraph. If the retrieval step returns no relevant sources, the generation step does not produce an answer — this is the core anti-hallucination guarantee.
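The gate described above can be sketched as a small function that decides whether generation may run at all. This is a minimal illustration, assuming a hypothetical `minConfidence` threshold and a paragraph object with a `confidence` field; the source does not specify how relevance is decided.

```javascript
// Hypothetical bridge gate: names and threshold are assumptions.
// Generation is allowed only when at least one paragraph is relevant enough.
function bridge(paragraphs, minConfidence = 0.5) {
  const relevant = paragraphs.filter(p => p.confidence >= minConfidence);
  if (relevant.length === 0) {
    // No supporting evidence: skip generation and pass nothing to the model.
    return { generate: false, context: [] };
  }
  return { generate: true, context: relevant };
}

// Example: one paragraph clears the threshold, one does not.
const decision = bridge([
  { id: 'p1', confidence: 0.82 },
  { id: 'p2', confidence: 0.31 },
]);
console.log(decision.generate, decision.context.map(p => p.id));
```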
Data flow
A query passes through the following steps from user input to grounded answer:
- Query received: The user sends a query via `POST /v1/chat`.
- Intent classification: The query is classified as search, translate, code, app, or chat.
- Left hemisphere search: Relevant products are searched based on domain scope (see Domain scoping below).
- Semantic ranking: The hybrid search ranks the top paragraphs using cosine similarity via the Born rule. Each paragraph receives a confidence score.
- Sources streamed: Paragraphs and metadata are streamed to the client as an SSE `sources` event.
- Right hemisphere reads: The browser-side FiD model reads the retrieved paragraphs and fuses them into context.
- Grounded answer generated: The generated answer includes inline citations to source documents. Each citation links to the exact paragraph.
- No hallucination: If no relevant sources are found, no answer is generated. The system returns the source results only.
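The semantic ranking step can be illustrated with plain cosine similarity over embedding vectors. This sketch covers only the cosine part; how the Born rule enters the scoring is not specified in this section, so it is left out. The vectors, the `vector` field, and the `topK` parameter are made up for illustration.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosine(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as unrelated.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Scores every paragraph against the query and keeps the topK best.
function rankByCosine(queryVector, paragraphs, topK = 3) {
  return paragraphs
    .map(p => ({ id: p.id, confidence: cosine(queryVector, p.vector) }))
    .sort((a, b) => b.confidence - a.confidence)
    .slice(0, topK);
}

// Example with toy two-dimensional embeddings.
const top = rankByCosine([1, 0], [
  { id: 'a', vector: [0, 1] },
  { id: 'b', vector: [1, 0] },
  { id: 'c', vector: [1, 1] },
], 2);
console.log(top.map(p => p.id));
```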
24 products searchable
Compass searches across 24 data products covering EU institutions, national law, terminology, and open knowledge:
| Product | Source institution |
|---|---|
| `commission` | European Commission |
| `consilium` | Council of the European Union |
| `cordis` | Community Research and Development Information Service |
| `curia` | Court of Justice of the European Union |
| `dataeuropa` | data.europa.eu (European Data Portal) |
| `dpp` | Digital Product Passport (ESPR) |
| `ecb` | European Central Bank |
| `echa` | European Chemicals Agency |
| `ema` | European Medicines Agency |
| `epo` | European Patent Office |
| `europarl` | European Parliament |
| `eurlex` | EUR-Lex (Official Journal of the EU) |
| `eurostat` | Eurostat (Statistical Office) |
| `iate` | Inter-Active Terminology for Europe |
| `lex` | National legislation (27 EU member states) |
| `news` | EU institutional press releases |
| `oeil` | Legislative Observatory (European Parliament) |
| `osm` | OpenStreetMap (geospatial) |
| `publications` | EU Publications Office |
| `ted` | Tenders Electronic Daily (public procurement) |
| `weather` | Meteorological data |
| `whoiswho` | EU institutional directory |
| `wiki` | Wikipedia (multilingual) |
| `code` | Open-source code (GitHub, npm, PyPI, crates.io) |
Domain scoping
Each Pauhu domain searches a different subset of products, tailored to its use case:
| Domain | Scope |
|---|---|
| pauhu.ai / pauhu.eu / pauhu.com | All 24 products |
| pauhu.dev | `eurlex`, `iate`, `wiki`, `code` |
| pauhu.io | `eurostat`, `echa`, `ema`, `dpp`, `dataeuropa`, `osm`, `weather` |
Domain scoping is enforced server-side. A query on pauhu.dev will never return results from ted or europarl, for example.
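The scoping table can be read as a lookup from hostname to an allowed product list. The sketch below is a hypothetical illustration of that lookup, not the server's actual code: the function name `productsForDomain` and the choice to return an empty scope for unknown hosts are assumptions, while the product identifiers and domain mappings come from the tables on this page.

```javascript
// All 24 product identifiers from the product table.
const ALL_PRODUCTS = [
  'commission', 'consilium', 'cordis', 'curia', 'dataeuropa', 'dpp',
  'ecb', 'echa', 'ema', 'epo', 'europarl', 'eurlex',
  'eurostat', 'iate', 'lex', 'news', 'oeil', 'osm',
  'publications', 'ted', 'weather', 'whoiswho', 'wiki', 'code',
];

// Domain-to-scope mapping taken from the scoping table.
const DOMAIN_SCOPE = new Map([
  ['pauhu.ai', ALL_PRODUCTS],
  ['pauhu.eu', ALL_PRODUCTS],
  ['pauhu.com', ALL_PRODUCTS],
  ['pauhu.dev', ['eurlex', 'iate', 'wiki', 'code']],
  ['pauhu.io', ['eurostat', 'echa', 'ema', 'dpp', 'dataeuropa', 'osm', 'weather']],
]);

// Returns the products a hostname may search.
// Assumption: unknown hosts fail closed and get an empty scope.
function productsForDomain(hostname) {
  return DOMAIN_SCOPE.get(hostname) ?? [];
}

// Drops any requested product that falls outside the domain's scope.
function scopeRequest(hostname, requestedProducts) {
  const allowed = new Set(productsForDomain(hostname));
  return requestedProducts.filter(p => allowed.has(p));
}

console.log(scopeRequest('pauhu.dev', ['eurlex', 'ted', 'europarl', 'code']));
```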
Integration example
Stream a grounded answer with source paragraphs using the chat endpoint:
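```javascript
const response = await fetch('https://staging.pauhu.eu/v1/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    query: 'What are the GDPR fines for data breaches?',
    language: 'en'
  })
});
if (!response.ok) {
  throw new Error(`Chat request failed with HTTP ${response.status}`);
}

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = ''; // holds a partial line split across network chunks

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // stream: true keeps multi-byte characters intact across chunk boundaries
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop(); // the last element may be an incomplete line

  // Parse SSE events: sources, paragraphs, status, done
  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const event = JSON.parse(line.slice(6));
      console.log(event);
    }
  }
}
```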
The SSE stream emits the following event types:
| Event | Description |
|---|---|
| `sources` | Ranked source paragraphs with metadata and confidence scores |
| `paragraphs` | Full paragraph text for each source |
| `status` | Generation progress updates |
| `done` | Final answer with inline citations |
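To route each event type to its own handler, a client needs the event name as well as the payload. The sketch below parses one SSE event block following the standard `event:` and `data:` field syntax and dispatches on the name. It assumes the server sets the `event:` field to the names in the table and sends JSON in `data:`; this page does not show the wire format, so treat both as assumptions. The function names `parseSseEvent` and `dispatchSseEvent` are hypothetical.

```javascript
// Parses one SSE event block (the text between two blank lines).
function parseSseEvent(block) {
  let event = 'message'; // default event type in the SSE specification
  const dataLines = [];
  for (const line of block.split('\n')) {
    if (line.startsWith(':')) continue; // comment line, ignored
    const colon = line.indexOf(':');
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? '' : line.slice(colon + 1);
    if (value.startsWith(' ')) value = value.slice(1); // strip one leading space
    if (field === 'event') event = value;
    else if (field === 'data') dataLines.push(value);
  }
  // Multiple data lines are joined with newlines, per the specification.
  return { event, data: dataLines.join('\n') };
}

// Calls the handler registered for the event name, if any.
// Assumption: the data payload is JSON.
function dispatchSseEvent(block, handlers) {
  const { event, data } = parseSseEvent(block);
  if (Object.prototype.hasOwnProperty.call(handlers, event)) {
    handlers[event](JSON.parse(data));
  }
  return event;
}

// Example with a made-up payload shape.
const received = [];
dispatchSseEvent('event: sources\ndata: {"count": 2}', {
  sources: payload => received.push(payload.count),
});
console.log(received);
```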
Key properties
| Property | Description |
|---|---|
| Grounded | Every answer cites exact source paragraphs. No answer is generated without supporting evidence. |
| Verifiable | Click any citation to view the original document at its source institution. |
| Offline-capable | The FiD model is cached in the browser. Once loaded, generation works without an internet connection. |
| Zero inference cost | The browser-native path has no per-query server-side charges. Inference runs on the user's device. |
| 24 languages | Query in any EU official language. The hybrid search and FiD both support all 24. |
Support
Technical: support@pauhu.eu
API keys: Get an API key
Full documentation: Documentation index