Data Freshness
How often each data source is synced, what the timestamps mean, and what latency to expect.
How Sync Works
Pauhu runs automated sync jobs for all 24 data products. Each job polls the upstream institutional API or portal for new and updated documents, downloads them into EU storage, annotates them (topic classification, language detection, deontic modality), and indexes them for semantic search.
The pipeline has three stages:
- Sync: Fetch new/changed documents from the source institution
- Annotate: Classify with topic annotations and deontic modalities (queue-triggered, typically within seconds)
- Index: Update the semantic search index (runs every 5 minutes)
End-to-end latency from source publication to searchability is typically the sync interval plus 5–10 minutes for annotation and indexing.
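As a rough illustration of the rule above, the upper end of the typical latency for a product can be estimated from its sync interval. This is a minimal sketch, not Pauhu code: the function name `expected_max_latency` is hypothetical, and the 10-minute overhead is simply the top of the 5–10 minute range quoted above.

```python
from datetime import timedelta

# Upper end of the annotation + indexing overhead quoted in the text.
PROCESSING_MAX = timedelta(minutes=10)


def expected_max_latency(sync_interval: timedelta) -> timedelta:
    """Typical upper bound: one full sync interval plus processing overhead.

    Hypothetical helper for illustration; it models the rule of thumb
    in the text, not a guarantee.
    """
    return sync_interval + PROCESSING_MAX


# A product on a 15-minute schedule:
print(expected_max_latency(timedelta(minutes=15)))  # 0:25:00
```

A document published just after a sync run waits almost a full interval before it is fetched, which is why the interval dominates the estimate.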
Continuous Sync (every 15 minutes)
| Product | Source | Schedule | Typical Latency |
|---|---|---|---|
| National Law (lex) | 28 national law portals | Every 15 minutes | 15–25 min |
National law is synced most frequently because transposition deadlines and national legislative changes are time-sensitive. The sync job rotates through all 28 country adapters on each run.
Frequent Sync (every 4–6 hours)
| Product | Source | Schedule | Typical Latency |
|---|---|---|---|
| EUR-Lex | Publications Office of the EU | Every 4 hours (weekdays) | 4–4.5 h |
| OEIL | European Parliament | Every 4 hours | 4–4.5 h |
| Consilium | General Secretariat of the Council | Every 6 hours | 6–6.5 h |
| CORDIS | European Commission, DG Research | Every 6 hours | 6–6.5 h |
| data.europa.eu | Publications Office of the EU | Every 6 hours | 6–6.5 h |
| TED | Publications Office of the EU | Every 6 hours | 6–6.5 h |
| Commission | European Commission | Every 6 hours | 6–6.5 h |
| National Law sync | 28 national portals | Every 4 hours | 4–4.5 h |
EUR-Lex sync runs only on weekdays because the Publications Office rarely publishes on weekends. TED and the other 6-hour products sync around the clock, including weekends.
Daily Sync (once per day)
| Product | Source | Schedule | Typical Latency |
|---|---|---|---|
| CURIA | Court of Justice of the EU | Daily (04:00 UTC) | < 24 h |
| DPP | European Commission (ESPR) | Daily (03:00 UTC) | < 24 h |
| ECB | European Central Bank | Daily (06:00 UTC) | < 24 h |
| EMA | European Medicines Agency | Daily (05:00 UTC) | < 24 h |
| EPO | European Patent Office | Daily (07:00 UTC) | < 24 h |
| European Parliament | European Parliament | Daily (03:00 UTC) | < 24 h |
| Publications | Publications Office of the EU | Daily (04:00 UTC) | < 24 h |
| Wiki | Wikimedia Foundation | Daily (04:00 UTC) | < 24 h |
Weekly Sync (once per week)
| Product | Source | Schedule | Typical Latency |
|---|---|---|---|
| ECHA | European Chemicals Agency | Weekly (Sunday 00:00 UTC) | < 7 days |
| Eurostat | Eurostat | Weekly (Monday 05:00 UTC) | < 7 days |
| Who is Who | Publications Office of the EU | Weekly (Monday 02:00 UTC) | < 7 days |
These products publish infrequently, so weekly sync is sufficient. ECHA substances change only after formal regulatory decisions. Eurostat datasets update on fixed release calendars.
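To see when a weekly product is next due, the schedule column can be turned into a concrete timestamp. The sketch below is an assumption-laden illustration: `next_weekly_run` is a hypothetical helper, and it assumes each weekly job starts exactly at the listed UTC hour.

```python
from datetime import datetime, timedelta, timezone

MONDAY, SUNDAY = 0, 6  # datetime.weekday() numbering


def next_weekly_run(now: datetime, weekday: int, hour: int) -> datetime:
    """Return the next run strictly after `now` for a weekly UTC schedule."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    if candidate <= now:
        # This week's slot has already passed, so move to next week.
        candidate += timedelta(days=7)
    return candidate


now = datetime(2026, 3, 12, 8, 15, tzinfo=timezone.utc)  # a Thursday
print(next_weekly_run(now, MONDAY, 5))  # Eurostat: Monday 05:00 UTC
print(next_weekly_run(now, SUNDAY, 0))  # ECHA: Sunday 00:00 UTC
```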
Search Index Updates
After documents are synced and annotated, the search index is updated every 5 minutes. The indexing job processes newly annotated documents across all products and updates the semantic search vectors.
| Stage | Frequency | Description |
|---|---|---|
| Sync | Per product (see above) | Fetch from source institution |
| Annotation | Queue-triggered | Topic + deontic classification (typically < 30 seconds) |
| D1 + Vectorize indexing | Every 5 minutes | Insert into database and semantic search index |
Understanding Timestamps
API responses include a last_updated field. It records when Pauhu last synced the document from the source institution and indexed it, not when the source institution published it.
```json
{
  "id": "32024R1689",
  "title": "Regulation (EU) 2024/1689 (AI Act)",
  "date": "2024-07-12",
  "last_updated": "2026-03-12T08:15:00Z",
  "score": 0.96
}
```
- date: The official publication or decision date from the source institution
- last_updated: When Pauhu last synced and verified this document
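The two fields answer different questions, so it can help to parse them into different types. The following sketch uses the example response above; the variable names are illustrative only, and it assumes `last_updated` always carries a `Z` suffix or an explicit UTC offset.

```python
import json
from datetime import date, datetime

response_body = """
{
  "id": "32024R1689",
  "title": "Regulation (EU) 2024/1689 (AI Act)",
  "date": "2024-07-12",
  "last_updated": "2026-03-12T08:15:00Z",
  "score": 0.96
}
"""

doc = json.loads(response_body)

# "date" is a calendar date set by the source institution.
published = date.fromisoformat(doc["date"])

# "last_updated" is a UTC instant set by the sync pipeline. Replacing "Z"
# keeps this working on Python versions before 3.11, whose fromisoformat()
# does not accept the "Z" suffix.
synced = datetime.fromisoformat(doc["last_updated"].replace("Z", "+00:00"))

# An old publication date with a recent last_updated is normal: the
# document itself is old, but the copy was refreshed recently.
days_since_publication = (synced.date() - published).days
print(published, synced.isoformat(), days_since_publication)
```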
To check when a product was last synced, use the /v1/search/:product endpoint. The response header X-Pauhu-Last-Sync contains the ISO 8601 timestamp of the most recent successful sync run.
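A client could use this header to flag a product whose last sync is older than its published interval. The sketch below is hedged accordingly: `BASE_URL` is a placeholder rather than the real API host, the function names and the 10-minute grace period are assumptions, and any authentication or query parameters the endpoint may require are omitted.

```python
from datetime import datetime, timedelta, timezone
from typing import Mapping, Optional
from urllib.request import urlopen

BASE_URL = "https://api.pauhu.example"  # placeholder host, not the real one


def parse_last_sync(headers: Mapping[str, str]) -> Optional[datetime]:
    """Read X-Pauhu-Last-Sync from a header mapping, or None if absent.

    Assumes the value carries "Z" or an explicit UTC offset.
    """
    raw = headers.get("X-Pauhu-Last-Sync")
    if raw is None:
        return None
    return datetime.fromisoformat(raw.replace("Z", "+00:00"))


def is_overdue(last_sync: datetime, sync_interval: timedelta, now: datetime,
               grace: timedelta = timedelta(minutes=10)) -> bool:
    """True if the last successful sync is older than interval + grace."""
    return now - last_sync > sync_interval + grace


def fetch_last_sync(product: str, base_url: str = BASE_URL) -> Optional[datetime]:
    """Call /v1/search/:product and return the last sync time (needs network)."""
    with urlopen(f"{base_url}/v1/search/{product}", timeout=10) as resp:
        return parse_last_sync(resp.headers)


# Offline usage with a captured header value:
last = parse_last_sync({"X-Pauhu-Last-Sync": "2026-03-12T08:15:00Z"})
now = datetime(2026, 3, 12, 13, 0, tzinfo=timezone.utc)
print(is_overdue(last, timedelta(hours=4), now))  # True
```

Keeping the parsing and the staleness check separate from the HTTP call makes both easy to test without network access.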
Freshness Guarantees
| Tier | Guarantee |
|---|---|
| Free (3 requests/day) | Best effort. No guaranteed sync latency. Data is typically within the published sync interval. |
| Paid tiers | Data will be updated within the published sync interval for each product. If a sync job fails, the previous data remains available and the job retries automatically. |
Sync failures are rare and typically caused by upstream source outages (e.g., EUR-Lex maintenance windows). Failed syncs retry automatically. No data is lost during temporary outages — the next successful sync picks up all missed updates.