The Sovereign Brain

One brain, two hemispheres. The left comprehends. The right synthesises. Together they answer questions without hallucinating — because every answer is grounded in 4.8 million EU documents.

1. How the brain is organised

The human brain has two hemispheres. The left hemisphere is analytical — it reads, classifies, and comprehends language. The right hemisphere is holistic — it synthesises information, recognises patterns, and generates coherent narrative from fragments.

Pauhu® follows the same structural design. The brain is more than a metaphor here; it reflects an engineering decision. We separated comprehension from synthesis because combining them in a single system is a root cause of hallucinations: when one AI model both retrieves and generates, it tends to invent things. When two specialised subsystems work together, one that finds evidence and one that writes only from that evidence, the output stays grounded in the retrieved sources.

  LEFT HEMISPHERE                         RIGHT HEMISPHERE
  Analytical comprehension                Holistic synthesis

  Reads your question              ───►   Reads the evidence
  Searches 4.8M documents                 Generates a grounded answer
  Classifies by topic and domain          Renders in 24 languages
  Applies regulatory rules                Presents in your browser
  Guards data integrity                   Produces the response

              ═══════════════════════
              ║   CORPUS CALLOSUM   ║
              ║ (the bridge between ║
              ║   comprehension     ║
              ║   and synthesis)    ║
              ═══════════════════════

  MEMORY       ◄── 4.8M documents + 2.4M terms (bilateral) ──►
  CLOCK        ◄── 23 automated sync jobs keep data fresh   ──►
  GATEWAY      ◄── single entry point, sensory relay        ──►

The thalamus — sensory relay

In the human brain, the thalamus is the gateway through which all sensory input passes before reaching the cortex. Nothing reaches the hemispheres without going through the thalamus first.

Pauhu has an equivalent: a single entry point that receives every query, validates it, and routes it to the correct hemisphere. The thalamus decides what the brain pays attention to. The gateway decides which data sources, language models, and domain specialists are relevant to your question — before either hemisphere does any work.
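The gateway's validate-then-route step can be pictured with a minimal sketch. It assumes a hypothetical keyword table (`ROUTES`) that maps query terms to a data source and policy domain, plus an illustrative length limit (`MAX_QUERY_CHARS`). `validate_query` and `route_query` are invented names, not Pauhu's API, and a real router would classify far more robustly than by keyword.

```python
from dataclasses import dataclass

# Hypothetical routing table: keyword -> (data source, policy domain).
ROUTES = {
    "procurement": ("TED", "procurement"),
    "tender": ("TED", "procurement"),
    "medicine": ("EMA", "pharmaceuticals"),
    "patent": ("EPO", "patents"),
    "regulation": ("EUR-Lex", "law"),
}

MAX_QUERY_CHARS = 2000  # assumed limit, for illustration only


@dataclass(frozen=True)
class RoutedQuery:
    text: str
    sources: tuple[str, ...]
    domains: tuple[str, ...]


def validate_query(text: str) -> str:
    """Normalise whitespace and reject empty or oversized input."""
    cleaned = " ".join(text.split())
    if not cleaned:
        raise ValueError("empty query")
    if len(cleaned) > MAX_QUERY_CHARS:
        raise ValueError("query too long")
    return cleaned


def route_query(text: str) -> RoutedQuery:
    """Validate the query, then decide which sources and domains to consult."""
    cleaned = validate_query(text)
    words = set(cleaned.lower().split())
    hits = [ROUTES[w] for w in sorted(words) if w in ROUTES]
    if not hits:
        # Fall back to the general legal corpus when no keyword matches.
        hits = [("EUR-Lex", "law")]
    sources = tuple(sorted({source for source, _ in hits}))
    domains = tuple(sorted({domain for _, domain in hits}))
    return RoutedQuery(cleaned, sources, domains)
```

Because routing happens before either hemisphere runs, a malformed query is rejected without spending any retrieval or generation work.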

2. Left hemisphere — comprehension

The left hemisphere is where understanding happens. When you ask a question, this is the side that reads it, determines what you need, and finds the relevant evidence from nearly five million documents.

What it does

Comprehends your query — understands what you are asking, even across languages, and finds the most relevant passages in 26 milliseconds
Classifies by domain — automatically determines whether your query relates to law, environment, procurement, pharmaceuticals, patents, or any of 21 EU policy domains
Applies regulatory rules — determines what is prohibited, what is mandatory, what is permitted, and what is exempt under the relevant regulation
Guards data quality — validates sources, checks document integrity, and ensures that only verified institutional data reaches the synthesis side
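Two of these tasks, finding relevant passages and applying regulatory rules, can be illustrated with a toy sketch. It is not Pauhu's implementation. It ranks passages by cosine similarity of plain term-frequency vectors, whereas production retrieval uses learned embeddings such as the BGE-M3 model mentioned in Section 10. It labels obligations with a hypothetical keyword heuristic (`MODALITY_RULES`) rather than a trained classifier.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, passages: dict[str, str], k: int = 3) -> list[tuple[str, float]]:
    """Return (passage id, score) for the k passages most similar to the query."""
    q = Counter(tokenize(query))
    scored = [(pid, cosine(q, Counter(tokenize(text)))) for pid, text in passages.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical keyword rules for deontic modality. Order matters: exemptions
# and "shall not" must be tested before the bare "shall".
MODALITY_RULES = [
    ("exempt", re.compile(r"\b(shall not apply|exempt(ed)? from)\b")),
    ("prohibited", re.compile(r"\b(shall not|is prohibited|are prohibited)\b")),
    ("mandatory", re.compile(r"\b(shall|must)\b")),
    ("permitted", re.compile(r"\bmay\b")),
]


def modality(sentence: str) -> str:
    """Label a provision as prohibited, mandatory, permitted or exempt."""
    lowered = sentence.lower()
    for label, pattern in MODALITY_RULES:
        if pattern.search(lowered):
            return label
    return "unclassified"
```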

Analogy

In the human brain, Wernicke's area comprehends language — it understands what words mean. The angular gyrus classifies sensory input into categories. Broca's area governs the rules of grammar and syntax. The amygdala guards against threats.

Pauhu's left hemisphere does the same with EU regulatory data: comprehend the question, classify the domain, apply the rules, protect the integrity.

3. Right hemisphere — synthesis

The right hemisphere takes the evidence found by the left hemisphere and produces the output you see. It reads multiple passages simultaneously and writes a single coherent answer — always grounded in the source documents, never fabricated.

What it does

Fuses multiple sources — reads 3 to 10 retrieved passages at once and generates an answer that draws on all of them, with citations
Renders in your browser — every response is displayed natively, with no plugins or extensions required
Speaks 24 languages — answers in any EU official language, with terminology validated against 2.4 million IATE entries
Drives the interface — search, chat, translation, and document analysis all come from this hemisphere
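Fusing several passages into one answer is what the Fusion-in-Decoder (FiD) model named in Section 10 is designed for. The question is paired with each passage separately, each pair is encoded on its own, and the decoder attends over all encodings at once. The sketch below prepares those inputs with numbered citation markers. The `question: / title: / context:` layout follows a common FiD convention and is an assumption here. So is the choice to refuse when fewer than three passages are available. `Passage`, `build_fid_inputs` and `citation_map` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    doc_id: str  # source document identifier; the exact scheme is assumed
    title: str
    text: str


MIN_PASSAGES, MAX_PASSAGES = 3, 10  # range stated in this section


def build_fid_inputs(question: str, passages: list[Passage]) -> list[str]:
    """Pair the question with each passage; each string is encoded separately."""
    if len(passages) < MIN_PASSAGES:
        raise ValueError("not enough evidence to synthesise an answer")
    selected = passages[:MAX_PASSAGES]
    return [
        f"question: {question} title: [{i}] {p.title} context: {p.text}"
        for i, p in enumerate(selected, start=1)
    ]


def citation_map(passages: list[Passage]) -> dict[int, str]:
    """Map the [n] markers back to source document ids for the final answer."""
    return {i: p.doc_id for i, p in enumerate(passages[:MAX_PASSAGES], start=1)}
```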

Analogy

In the human brain, the right temporal lobe recognises complex patterns and assembles fragments into a whole. The right parietal cortex handles spatial awareness, such as where things are on the page. Right-hemisphere regions that process prosody control the tone and rhythm of speech.

Pauhu's right hemisphere does the same: fuse fragments into narrative, render the layout, and adapt the language to the audience.

4. The bridge between them

In the human brain, the corpus callosum is the thick bundle of nerve fibres connecting the two hemispheres. Without it, the left hand literally does not know what the right hand is doing.

Pauhu has the same structure: a dedicated conduit that carries evidence from the comprehension hemisphere to the synthesis hemisphere. This is not a single API call — it is a structured pipeline that ensures:

Only verified evidence crosses. The left hemisphere's quality checks must pass before data reaches the right hemisphere. No unverified claims, no unchecked sources.
Every crossing is auditable. Each passage that moves from comprehension to synthesis is logged with a SHA-256 integrity hash. Your compliance team can reconstruct exactly what evidence informed each answer.
The pipeline is directional. Evidence flows from left to right. The synthesis hemisphere cannot reach back into raw data — it can only work with what the comprehension hemisphere has validated and delivered.
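The three properties above (verified-only, auditable, directional) can be sketched in a few lines. This is an illustration under stated assumptions, not the actual pipeline. `Evidence` and `Bridge` are hypothetical names. The `verified` flag stands in for the left hemisphere's quality checks. The in-memory list stands in for whatever store the real audit log uses.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    doc_id: str
    text: str
    verified: bool  # set by the comprehension side's quality checks


class Bridge:
    """Illustrative one-way conduit from comprehension to synthesis."""

    def __init__(self) -> None:
        self.log: list[dict[str, str]] = []

    def cross(self, query_id: str, evidence: list[Evidence]) -> tuple[Evidence, ...]:
        passed = []
        for item in evidence:
            if not item.verified:
                continue  # unverified evidence never reaches synthesis
            digest = hashlib.sha256(item.text.encode("utf-8")).hexdigest()
            self.log.append({"query": query_id, "doc": item.doc_id, "sha256": digest})
            passed.append(item)
        # Synthesis receives an immutable tuple of frozen records and no
        # reference to the stores they came from: data only flows left to right.
        return tuple(passed)

    def evidence_for(self, query_id: str) -> list[str]:
        """Reconstruct which documents informed a given answer."""
        return [entry["doc"] for entry in self.log if entry["query"] == query_id]
```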

Why this matters for accuracy: Large language models hallucinate because they generate text from statistical patterns rather than from verified evidence. By separating comprehension from synthesis and connecting them through a controlled bridge, Pauhu ensures that every generated answer is traceable to specific, verified EU institutional documents.

5. How it stays current

A brain needs to stay awake. EU institutions publish new legislation, court rulings, procurement notices, and regulatory updates continuously. Pauhu's "biological clock" consists of 23 automated synchronisation jobs that keep the data current:

Every 15 minutes — national legislation from 28 EU/EEA countries
Every 4 hours — EUR-Lex (1.7 million legal documents), OEIL (legislative tracking)
Every 6 hours — TED procurement notices, Commission and Council publications, research projects
Daily — court judgments, patent publications, medicines agency approvals, parliamentary proceedings
Weekly — statistical updates, chemical substance registrations, institutional directory changes
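The schedule above amounts to a set of intervals checked against each job's last successful run. In the minimal sketch below, the job names in `SYNC_INTERVALS` are hypothetical groupings that cover only a few of the 23 jobs, and `due_jobs` is an invented helper.

```python
from datetime import datetime, timedelta

# Intervals taken from the schedule above; job names are illustrative.
SYNC_INTERVALS = {
    "national-law": timedelta(minutes=15),
    "eur-lex": timedelta(hours=4),
    "oeil": timedelta(hours=4),
    "ted": timedelta(hours=6),
    "curia": timedelta(days=1),
    "eurostat": timedelta(weeks=1),
}


def due_jobs(last_run: dict[str, datetime], now: datetime) -> list[str]:
    """Return jobs whose interval has elapsed, or that have never run."""
    due = []
    for job, interval in SYNC_INTERVALS.items():
        previous = last_run.get(job)
        if previous is None or now - previous >= interval:
            due.append(job)
    return sorted(due)
```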

When new data arrives, it flows through the same two-hemisphere pipeline: the left hemisphere indexes, classifies, and validates it. Only then does it become available to the right hemisphere for synthesis. This means the system never serves an answer based on data it hasn't verified.

For sovereign deployments: The cloud version receives these updates automatically. In an on-premises installation, data updates are delivered periodically via secure transfer — typically monthly or quarterly. See Section 6.

6. The sovereign deployment

Everything described above — both hemispheres, the bridge, the memory, the biological clock — can run entirely on your hardware. This is the Sovereign Brain: the same system, the same models, the same data, with no cloud dependency and no data leaving your premises.

One sentence for the CTO

A Docker container with 4.8 million EU documents, 2.4 million terminology entries, and 21 domain-specialist AI models — two-hemisphere architecture on a single server, no internet connection required.

What changes in a sovereign deployment

Capability	Cloud (pauhu.eu)	Sovereign Brain
Architecture	Two hemispheres across EU servers	Same two hemispheres on your server
Data sources	20 EU institutional sources	Same 20 sources (snapshot included)
Documents	4,788,464	Same (delivered with container)
Languages	24 EU official languages	Same 24 languages
Terminology	2,456,445 IATE terms	Same (bundled locally)
AI models	21 domain specialists + chat	Same models (ONNX format)
Data leaves your network	Queries go to EU servers (Helsinki)	Never
Internet required	Yes	No (after deployment)
Data freshness	Continuous (23 automated sync jobs, every 15 minutes to weekly)	Snapshot at delivery; periodic updates via secure transfer
Hardware	Managed by Pauhu	Your server (16 GB RAM minimum)

Hardware requirements

Requirement	Minimum	Recommended
CPU	4 cores (x86_64 or ARM64)	8+ cores
RAM	16 GB	32 GB
Storage	100 GB SSD	250 GB NVMe
GPU	Not required (CPU inference)	NVIDIA GPU for faster inference
OS	Any Linux with Docker	Ubuntu 22.04 LTS
Network	None (air-gap compatible)	LAN only (no internet)

Delivery and installation

Three delivery methods: encrypted download via SFTP, physical media for classified environments, or push to your private container registry. Installation takes two commands, one to load the image and one to start the container:

# Load and run the Sovereign Brain
docker load -i pauhu-sovereign-brain.tar.gz
docker run -d --name pauhu-sovereign --restart unless-stopped \
  -e PAUHU_SOVEREIGN=true -p 3000:3000 pauhu/sovereign-brain:latest

# Verify: search for EU AI Act
curl "http://localhost:3000/v1/search?q=artificial+intelligence+regulation"

No configuration files, no API keys, no cloud accounts, no database setup. The container includes both hemispheres, the bridge, and the complete data snapshot.

7. What data is included

The Sovereign Brain ships with a complete snapshot of all 20 EU institutional data sources — the same data that feeds the cloud version's left hemisphere:

Source	Documents	What it covers
EUR-Lex	1,667,952	EU legislation, case law, preparatory acts, international agreements
TED	1,602,496	Public procurement notices from all EU member states
National Law	256,303	National legislation from 28 countries (transposition tracking)
OEIL	203,632	Legislative Observatory — procedure files, committee reports
Consilium	199,565	Council of the EU documents, meeting outcomes
Publications Office	172,098	Official publications, EU bookshop
Who is Who	161,553	EU institutional directory (organisational charts)
Data Europa	160,584	EU Open Data Portal (datasets, metadata)
CURIA	144,026	Court of Justice of the EU (judgments, opinions)
Eurostat	130,410	Statistical tables, indicators
IATE	2,456,445 terms	Inter-Active Terminology for Europe (24 languages)
ECB	8,415	European Central Bank legal framework, opinions
CORDIS	8,934	EU research and innovation projects
EMA	5,236	European Medicines Agency (EPARs, product information)
EPO	4,987	European Patent Office (patent publications)
ECHA	494	Chemical substances (REACH, CLP, biocides)
DPP	259	Digital Product Passport requirements (ESPR)
Commission	194	European Commission press and decisions
Europarl	—	European Parliament plenary proceedings
Wiki	5,467	Curated EU entity knowledge base

Total: 4,788,464 documents plus 2,456,445 terminology entries in 24 languages. This is the memory that both hemispheres share — the hippocampus of the system.

8. What it guarantees

Zero outbound connections

Once deployed, the Sovereign Brain makes no network calls. It does not contact any external server, cloud API, or telemetry service. Verify this with your network monitoring tools.
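One way to check this from inside the host or container is to read the kernel's TCP table. The sketch below parses the `/proc/net/tcp` format described in the Linux proc(5) man page and reports connections to public IPv4 addresses. Private LAN addresses are ignored, since LAN access is expected. It assumes a little-endian host (x86_64 and typical ARM64), covers IPv4 only (`/proc/net/tcp6` would need its own parser), and the function names are hypothetical. Your network monitoring tools remain the authoritative check.

```python
import ipaddress

# TCP state codes from /proc/net/tcp that indicate an outgoing or live link.
OUTBOUND_STATES = {"01": "ESTABLISHED", "02": "SYN_SENT"}


def decode(addr: str) -> tuple[ipaddress.IPv4Address, int]:
    """Decode 'HHHHHHHH:PPPP'; the IP is stored in host (little-endian) order."""
    ip_hex, port_hex = addr.split(":")
    ip = ipaddress.IPv4Address(bytes(reversed(bytes.fromhex(ip_hex))))
    return ip, int(port_hex, 16)


def outbound_connections(proc_net_tcp: str) -> list[tuple[str, int, str]]:
    """List connections to public (non-LAN, non-loopback) IPv4 addresses."""
    found = []
    for line in proc_net_tcp.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        remote, state = fields[2], fields[3]
        if state not in OUTBOUND_STATES:
            continue
        ip, port = decode(remote)
        if ip.is_global:
            found.append((str(ip), port, OUTBOUND_STATES[state]))
    return found
```

For example, `outbound_connections(open("/proc/net/tcp").read())` should return an empty list on a correctly isolated deployment.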

Data stays on your hardware

All queries, search results, translations, and AI responses are processed locally. Both hemispheres run inside the same container on your server.

Full audit trail

Every operation, including every crossing from comprehension to synthesis, produces an audit record protected by a SHA-256 integrity hash and stored in a local database. Your compliance team can inspect the complete history.
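A common way to make such a trail tamper-evident is to chain the hashes, so that each record's hash also covers the previous one. The sketch below does this with SQLite from the Python standard library. Whether Pauhu chains its records, and what its schema looks like, is not stated here. The table layout and the `append`/`verify` helpers are assumptions for illustration. A hash chain detects edits, but it cannot stop someone who can rewrite the whole table. Stronger guarantees need signatures or an externally anchored copy of the latest hash.

```python
import hashlib
import json
import sqlite3

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS audit ("
        "id INTEGER PRIMARY KEY, payload TEXT NOT NULL, "
        "prev_hash TEXT NOT NULL, hash TEXT NOT NULL)"
    )
    return db


def record_hash(prev_hash: str, payload: str) -> str:
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(db: sqlite3.Connection, event: dict) -> str:
    """Store an event whose hash also covers the previous record's hash."""
    row = db.execute("SELECT hash FROM audit ORDER BY id DESC LIMIT 1").fetchone()
    prev_hash = row[0] if row else GENESIS
    payload = json.dumps(event, sort_keys=True)
    digest = record_hash(prev_hash, payload)
    db.execute(
        "INSERT INTO audit (payload, prev_hash, hash) VALUES (?, ?, ?)",
        (payload, prev_hash, digest),
    )
    db.commit()
    return digest


def verify(db: sqlite3.Connection) -> bool:
    """Recompute every hash; editing a row or deleting one mid-chain fails."""
    expected_prev = GENESIS
    for payload, prev_hash, digest in db.execute(
        "SELECT payload, prev_hash, hash FROM audit ORDER BY id"
    ):
        if prev_hash != expected_prev or record_hash(prev_hash, payload) != digest:
            return False
        expected_prev = digest
    return True
```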

Air-gap compatible

Works in classified environments and air-gapped networks. The container is delivered via physical media or secure file transfer. No internet required for installation or operation.

9. Supply chain sovereignty

Most AI systems depend on a chain of external providers: cloud compute, proprietary APIs, third-party model hosting, and centralised inference services. Remove any link in the chain and the system stops working, so every link is a single point of failure.

Your AI runs in your browser

No cloud provider. No government. No single point of failure. The models run in your browser or on your server. The data sits on your storage. The inference happens on your hardware. You own the entire chain from question to answer.

What supply chain sovereignty means

No API dependency: Pauhu does not call external AI APIs. The models are ONNX files that execute locally — in the browser via WebAssembly, or on the server via ONNX Runtime. If every cloud provider went offline simultaneously, Pauhu would still work.
No model hosting dependency: The models ship with the container or are downloaded once to the browser cache. No ongoing model-as-a-service subscription. No inference-per-token billing.
No data dependency: The 4.8 million EU documents are included. You do not need to query an external database. The data is yours, on your volume.
No vendor lock-in: ONNX is an open format maintained under the Linux Foundation (LF AI & Data). Docker containers are OCI-compliant. The REST API follows OpenAPI 3.1. Every component uses open formats.

The geopolitical dimension

Government agencies increasingly recognise that depending on foreign-controlled AI infrastructure creates a strategic vulnerability. Executive orders, sanctions, licensing changes, or corporate acquisitions can cut off access to critical AI services overnight. The Sovereign Brain eliminates this risk: the entire system — models, data, inference — is under your control, on your soil, subject to your laws.

10. Adaptive model loading

The Sovereign Brain adapts to the hardware it runs on. Not every deployment has a GPU server with 32 GB of RAM. A civil servant’s laptop, a ministry’s standard-issue workstation, a dedicated inference server — the same brain architecture works on all of them, at different performance levels.

Three tiers

Lite

< 4 GB memory

Search + embeddings only. BGE-M3 quantized (~80 MB). Paragraph retrieval in the browser. No generation.

Standard

4–16 GB memory

Search + FiD generation (~300 MB). Grounded answers with citations. Selected NMT translation pairs.

Full

> 16 GB memory

All models: search, FiD, 552 NMT pairs, 21 domain classifiers, NER, specialists. Complete capability.
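The tier thresholds can be expressed as a small sketch. The 552 translation pairs in the Full tier are every ordered pair of distinct EU official languages: 24 source languages × 23 possible targets = 552. The GiB units, the `select_tier` and `physical_memory` helpers, and the tier strings are assumptions for illustration. `physical_memory` uses `os.sysconf`, which is available on Linux. A browser deployment would detect memory differently.

```python
import os
from itertools import permutations

# ISO 639-1 codes of the 24 EU official languages.
EU_LANGUAGES = [
    "bg", "cs", "da", "de", "el", "en", "es", "et", "fi", "fr", "ga", "hr",
    "hu", "it", "lt", "lv", "mt", "nl", "pl", "pt", "ro", "sk", "sl", "sv",
]

GIB = 1024 ** 3


def physical_memory() -> int:
    """Total physical RAM in bytes (Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def select_tier(memory_bytes: int) -> str:
    """Map available memory to a tier using the thresholds above."""
    if memory_bytes < 4 * GIB:
        return "lite"
    if memory_bytes <= 16 * GIB:
        return "standard"
    return "full"


# Every ordered source -> target pair of distinct languages.
NMT_PAIRS = list(permutations(EU_LANGUAGES, 2))
```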

Why 300 MB matters

The global semiconductor supply chain is under sustained pressure. Memory prices fluctuate, procurement cycles lengthen, and government IT budgets rarely include high-end GPU servers. Pauhu’s FiD generation model fits in 300 MB of DRAM — less than a typical browser tab. This is not a limitation; it is a design decision. A model that fits in commodity hardware is a model that every government agency can deploy without special procurement.

Progressive download

Models are loaded in priority order, not all at once:

Search models first — paragraph retrieval is available within seconds of start
FiD generation second — grounded answers become available in 10–30 seconds
Translation on demand — only the language pairs you use are loaded. Finnish-English loads on first Finnish query, not at startup
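The loading order above can be sketched as a priority queue for startup models plus lazy loading for translation pairs. `ModelLoader`, the model names and the priorities are hypothetical. The injected `load` callable stands in for whatever actually fetches a model and creates its ONNX Runtime session.

```python
import heapq
from typing import Callable, Optional


class ModelLoader:
    """Load startup models by priority; load translation pairs on first use."""

    # Lower number = loaded earlier. Names are illustrative.
    STARTUP = [(0, "search-embeddings"), (1, "fid-generator")]

    def __init__(self, load: Callable[[str], None]) -> None:
        self._load = load
        self.loaded: list[str] = []
        self._queue = list(self.STARTUP)
        heapq.heapify(self._queue)

    def load_next(self) -> Optional[str]:
        """Load the highest-priority pending startup model, if any remain."""
        if not self._queue:
            return None
        _, name = heapq.heappop(self._queue)
        self._load(name)
        self.loaded.append(name)
        return name

    def translator(self, source: str, target: str) -> str:
        """Return a translation model, loading it only the first time."""
        name = f"nmt-{source}-{target}"
        if name not in self.loaded:
            self._load(name)
            self.loaded.append(name)
        return name
```

Usage: call `load_next()` repeatedly in the background after start-up, and `translator("fi", "en")` when the first Finnish query arrives.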

Browser-native advantage: In Lite and Standard tiers, all inference runs inside the browser via ONNX Runtime for WebAssembly. No server required. No GPU required. The user’s own device does the work — which means your IT department does not need to provision additional infrastructure.

11. Double Anti-Hallucination

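Most AI systems rely on a single layer of defence against hallucination: either they check the output after generation, or they constrain the input. Pauhu uses two layers at once: it constrains the input to verified evidence, and it suppresses hallucination-associated activity inside the model itself.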

Layer 1: Passage-level grounding

The synthesis hemisphere (right brain) can only generate text from passages that the comprehension hemisphere (left brain) has retrieved and verified. If the evidence does not exist in the corpus, the answer cannot be generated. This is architectural — it is not a filter applied after the fact.
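The key point is that the check sits before generation rather than after it. A minimal sketch, assuming a hypothetical relevance threshold (`MIN_SCORE`) and an injected `generate` function standing in for the synthesis model. `Hit` and `answer` are invented names.

```python
from dataclasses import dataclass
from typing import Callable

MIN_SCORE = 0.35  # hypothetical relevance threshold
NO_EVIDENCE = "No supporting passage was found in the corpus."


@dataclass(frozen=True)
class Hit:
    doc_id: str
    score: float
    text: str


def answer(question: str, hits: list[Hit],
           generate: Callable[[str, list[Hit]], str]) -> str:
    """Call the generator only with evidence that cleared retrieval.

    An unsupported question never reaches the generator, so there is
    nothing to filter afterwards.
    """
    evidence = [h for h in hits if h.score >= MIN_SCORE]
    if not evidence:
        return NO_EVIDENCE
    return generate(question, evidence)
```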

Layer 2: Neuron-level suppression (H-Neurons)

Inside the synthesis model itself, specific neurons associated with hallucinated output are identified during training and suppressed during inference, so the model cannot activate the pathways that produce ungrounded text.
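How the neurons are identified is not described here. The suppression step itself can be illustrated with generic activation ablation: after each layer, the flagged positions in the activation vector are set to zero. This is equivalent to multiplying by a fixed 0/1 mask. The sketch uses toy Python layers. The contents of `SUPPRESSED` are hypothetical, and a real model would apply the mask inside the ONNX graph rather than in Python.

```python
from typing import Callable, Sequence

# Hypothetical output of a training-time analysis:
# layer index -> indices of neurons associated with ungrounded output.
SUPPRESSED = {0: {2}, 1: {0, 3}}

Layer = Callable[[list[float]], list[float]]


def suppress(layer: int, activations: Sequence[float]) -> list[float]:
    """Zero the flagged neurons in one layer's activation vector."""
    flagged = SUPPRESSED.get(layer, set())
    return [0.0 if i in flagged else a for i, a in enumerate(activations)]


def forward(x: list[float], layers: Sequence[Layer]) -> list[float]:
    """Run the layers in order, applying suppression after each one."""
    for idx, layer in enumerate(layers):
        x = suppress(idx, layer(x))
    return x
```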

The result: every claim in a Pauhu answer traces back to a specific paragraph in a verified EU document. If the system cannot ground a statement, it says so — rather than inventing a plausible-sounding answer.

Why this matters for government: In legal and regulatory contexts, a confident but wrong answer is worse than no answer. Pauhu’s double anti-hallucination means your staff can trust the citations without manually verifying every response against the source documents.

12. For procurement officers

Why government agencies choose the Sovereign Brain

Data sovereignty: Your queries and results never leave your premises. No cloud processing, no data transfer to third countries, no dependency on foreign infrastructure.
Classified environments: Works in air-gapped networks, SCIFs, and restricted environments where internet access is not available or permitted.
GDPR Article 44: No third-country data transfers. All processing happens within your jurisdiction.
EU AI Act Article 53: Full training data transparency. Every model includes a published summary of its training data. See the AI transparency disclosure.
Grounded by design: The two-hemisphere architecture ensures every answer is traceable to verified EU institutional documents. The system cannot hallucinate because the synthesis hemisphere only works with evidence the comprehension hemisphere has validated.
No vendor lock-in: Standard Docker container, standard REST API, standard ONNX models. If you stop using Pauhu, your data and audit trail remain on your hardware.

Tender-ready specifications

For public procurement (CPV code 72000000 — IT services):

On-premises deployment with zero cloud dependency
EU-origin software (Pauhu Ltd, Helsinki, Finland, Y-tunnus 3425757-6)
All training data sourced from EU institutional open data
IEC 62443-3-3 zone-based security architecture
WCAG 2.1 AA compliant web interface
REST API with OpenAPI specification
Docker container (OCI-compliant)
ONNX model format (open standard, LF AI & Data Foundation)
24 EU official languages supported

Contract model

Item	What you get
Initial delivery	Sovereign Brain container with both hemispheres, all 4.8M documents, 2.4M terminology entries, 21 AI models, and translation models for 24 languages
Data updates	Monthly or quarterly data snapshots delivered via your preferred secure channel
Model updates	Updated ONNX models when improved versions are available (included in subscription)
Support	Helsinki-based technical support team. On-site deployment assistance available for EU government customers.
SLA	Custom SLAs available. Because the system runs on your hardware, uptime is under your control.

Contact for government sales

Email: sales@pauhu.eu
For a demo, see the government procurement walkthrough (10-step guide using Finnish government as an example).

13. Frequently asked questions

Why two hemispheres instead of one AI model?

Single-model systems generate text from statistical patterns. They can produce fluent, confident answers that are completely wrong. By separating comprehension from synthesis — the same way the human brain separates language comprehension (left hemisphere) from holistic synthesis (right hemisphere) — we ensure that the generation side can only work with evidence the comprehension side has verified. The result: grounded answers with citations, not plausible-sounding fabrications.

Does it really work offline?

Yes. After installation, you can disconnect the server from the network entirely. Both hemispheres, the bridge between them, and all 4.8 million documents run locally. We encourage you to verify this with your network monitoring tools.

How fresh is the data?

The cloud version receives continuous updates (every 15 minutes for some sources). The Sovereign Brain contains a snapshot at the time of delivery. Data updates are delivered periodically via secure transfer — typically monthly or quarterly. The update process is a single command.

What hardware do we need?

A standard server with 16 GB RAM and 100 GB storage. No GPU required — all models run on CPU. A GPU speeds up inference but is not necessary. See Section 6 for full requirements.

Can we run it in a VM?

Yes. Docker runs on VMware, Hyper-V, KVM, or bare metal. The container has no hardware-specific dependencies.

Is the source code available?

The Sovereign Brain is provided as a container image. Source code review is available under NDA for government customers. Contact sales@pauhu.eu.

Can we integrate it with our existing systems?

The Sovereign Brain exposes a standard REST API. Any system that can make HTTP requests can use it. API documentation is included in the container.
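Integrating from Python needs only the standard library. The sketch below reuses the `/v1/search` endpoint and port from the installation example. It assumes the response body is JSON, which is not confirmed here, so check the API documentation shipped in the container for the actual schema. `search_url` and `search` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:3000"  # port from the installation example


def search_url(query: str, base_url: str = BASE_URL) -> str:
    """Build the /v1/search URL with a properly encoded query string."""
    return f"{base_url}/v1/search?{urlencode({'q': query})}"


def search(query: str, base_url: str = BASE_URL, timeout: float = 10.0):
    """Call the search endpoint and decode the body (assumed to be JSON)."""
    with urlopen(search_url(query, base_url), timeout=timeout) as resp:
        return json.load(resp)
```

Building the URL with `urlencode` escapes characters such as `&` that would otherwise break the query string.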

What is the licensing model?

Annual subscription per deployment. Volume discounts available for multiple installations. Contact sales@pauhu.eu for pricing.
