CUA Architecture

Computer Use Agent — screenshot, reason, act, verify. Every browser action passes through deontic safety gates with human-in-the-loop confirmation before form submission.

Overview

The Computer Use Agent (CUA) automates browser-based tasks on EU institutional portals — such as TED eProcurement, EUR-Lex, and national law databases — while maintaining strict safety controls. Unlike unguarded browser automation, every CUA action passes through deontic safety gates before execution. Destructive actions (form submissions, data entry) require explicit human approval.

The CUA operates in a continuous loop: it observes the current browser state via screenshot, reasons about the next action using an AI model, executes the action (subject to safety gates), and verifies the result. This loop continues until the task is complete or the user intervenes.

The CUA loop

Step-by-step

  1. Screenshot. The CUA captures a screenshot of the current browser viewport. This is the agent’s “eyes” — it has no DOM access, only pixel-level observation.
  2. Reason. An AI model analyses the screenshot together with the task description and action history. It determines the next action: what to click, what to type, where to scroll, or whether the task is complete.
  3. Deontic safety gate. Before execution, the proposed action is classified against the four deontic modalities (see below). Prohibited actions are blocked. Obligatory confirmations pause for human approval.
  4. Act. The action is executed in the browser: a click at specific coordinates, text entry into a field, scrolling, navigation, or form submission.
  5. Verify. A new screenshot is taken. The CUA compares the result against the expected state. If the action failed or produced unexpected results, the loop can retry or escalate to the user.
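
The five steps above can be sketched as a single loop. This is a minimal illustration rather than the product's implementation: runCuaLoop and the injected functions on deps (captureScreenshot, proposeAction, checkGate, executeAction, verifyResult) are hypothetical names, and the maxSteps cap and the returned status strings are assumptions added so the sketch terminates cleanly.

```javascript
// Hypothetical sketch of the screenshot, reason, gate, act, verify loop.
// Every function on `deps` is an assumed, injected stand-in for the real
// browser, AI model and safety gate; none of these names come from the CUA API.
async function runCuaLoop(task, deps, maxSteps = 50) {
  const history = [];
  for (let step = 0; step < maxSteps; step++) {
    const before = await deps.captureScreenshot();                  // 1. Screenshot
    const action = await deps.proposeAction(task, before, history); // 2. Reason
    if (action.type === 'done') return { status: 'completed', history };

    // 3. Deontic safety gate. For an M2 action, checkGate is assumed to wait
    // for the human decision before resolving.
    const gate = await deps.checkGate(action);
    if (!gate.allowed) {
      history.push({ ...action, status: 'cancelled', gate: gate.modality });
      return { status: 'blocked', history };
    }

    await deps.executeAction(action);                               // 4. Act
    const after = await deps.captureScreenshot();                   // 5. Verify
    const ok = await deps.verifyResult(action, after);
    history.push({ ...action, status: ok ? 'completed' : 'failed', gate: gate.modality });
    if (!ok) return { status: 'needs_user', history };              // escalate rather than retry
  }
  return { status: 'step_limit', history };
}
```

Injecting the dependencies keeps the loop itself free of browser or model code, so it can be exercised with stubs. The sketch escalates on the first failed verification; a retry policy would slot in at that point.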

Deontic safety gates

Every action passes through deontic safety gates before execution, following IEC 62443-3-3 zone-based security principles. The four modalities control what the CUA may, must, must not, and need not do.

Deontic modalities applied to CUA actions

Modality | Gate behaviour | CUA actions
M1 Prohibition | Blocks the action entirely. The CUA cannot proceed. | Submitting forms without human approval. Navigating to domains not on the allowlist. Entering credentials. Clicking “delete” or “remove” buttons.
M2 Obligation | Pauses execution. Requires explicit human confirmation before the action proceeds. | Form submission (human reviews all filled fields). Data entry into official portals. Any action that creates or modifies a record.
M3 Permission | Action may proceed while the user session is active. No confirmation needed, but the action is logged. | Clicking navigation links. Selecting dropdown values. Typing search queries. Scrolling.
M4 Exemption | Read-only actions that require no gate check. | Taking screenshots. Observing page state. Reading text. Waiting for page load.

Human-in-the-loop flow

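When a CUA action triggers an M2 Obligation gate (typically form submission), the system pauses and presents a confirmation modal. The user sees all fields the CUA has filled, their values, and the target form, and chooses to approve the submission as-is, edit the values first, or cancel.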

The CUA does not resume until the user makes an explicit choice. This ensures no data is submitted to external portals without human review.
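
One way to model this pause is a promise that only a user decision can settle. The sketch below is an assumption about how such a gate could look, not the CuaConfirmationModal code: createConfirmationGate, the lower-case choice names and the proceed flag are hypothetical.

```javascript
// Hypothetical sketch: an M2 gate that stays pending until the user chooses.
// The function name, the choice strings and the result shape are assumptions.
function createConfirmationGate(form) {
  let settle;
  const decision = new Promise((resolve) => { settle = resolve; });

  function decide(choice, editedFields) {
    if (!['approve', 'edit', 'cancel'].includes(choice)) {
      throw new Error('Unknown choice: ' + choice);
    }
    // An edit carries the user's corrected values; otherwise the filled values stand.
    const fields = choice === 'edit' ? { ...form.fields, ...editedFields } : form.fields;
    settle({ choice, fields, proceed: choice !== 'cancel' });
  }

  // No timer is started anywhere: the promise can only settle through decide().
  return { decision, decide };
}
```

The executor would await gate.decision before submitting, while the modal's buttons call gate.decide('approve'), gate.decide('edit', values) or gate.decide('cancel').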

Action types

The CUA supports 7 browser action types. Each maps to a specific deontic gate level.

CUA action types and their default deontic classification

Action | Description | Default gate
click | Click at specific (x, y) coordinates in the viewport | M3 Permission
type | Enter text into the currently focused field | M3 Permission
scroll | Scroll the page in a given direction and distance | M3 Permission
navigate | Navigate to a URL (must be on the domain allowlist) | M3 Permission
select | Select an option from a dropdown or list | M3 Permission
submit | Submit a form; always requires human confirmation | M2 Obligation
wait | Wait for page load or element to appear | M4 Exemption

Context can elevate the gate classification. For example, a click on a “Submit” button is reclassified from M3 to M2, and a navigate action targeting a domain not on the allowlist is reclassified from M3 to M1 (blocked).
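
The default gates and the two elevation examples can be sketched as a small classifier. The names classifyAction, targetLabel, url and allowlist are hypothetical, matching the allowlist by exact hostname is an assumption, and so is blocking unknown action types.

```javascript
// Default gates per action type, as listed in the table above.
const DEFAULT_GATE = {
  click: 'M3', type: 'M3', scroll: 'M3', navigate: 'M3',
  select: 'M3', submit: 'M2', wait: 'M4',
};

// Hypothetical classifier: returns 'M1'..'M4' for a proposed action.
// `action.url` and `action.targetLabel` are assumed field names.
function classifyAction(action, allowlist) {
  const gate = DEFAULT_GATE[action.type];
  if (!gate) return 'M1'; // assumption: unknown action types are blocked

  if (action.type === 'navigate') {
    let host;
    try {
      host = new URL(action.url).hostname;
    } catch (e) {
      return 'M1'; // assumption: an unparseable URL is treated as off-allowlist
    }
    return allowlist.includes(host) ? gate : 'M1'; // elevation: M3 to M1
  }

  if (action.type === 'click' && /submit/i.test(action.targetLabel || '')) {
    return 'M2'; // elevation: M3 to M2
  }
  return gate;
}
```

For example, classifyAction({ type: 'click', targetLabel: 'Submit' }, []) yields 'M2', while the same click on a navigation link stays at 'M3'.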

Action lifecycle

Each action progresses through a defined state machine with 7 possible statuses.

Action statuses

Status | Meaning
pending | Action proposed by the AI model, awaiting gate check
confirmed | Gate check passed (or human approved for M2 actions)
executing | Action is being performed in the browser
completed | Action finished successfully, verified by screenshot
failed | Action execution failed (element not found, timeout, etc.)
cancelled | User or system cancelled the action before execution
rolled_back | Action was reverted by rollback to a previous step
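
The statuses can be read as a state machine. The document lists the seven statuses but not the allowed transitions, so the edges in this sketch are an assumption inferred from the status descriptions; TRANSITIONS and transition are hypothetical names.

```javascript
// Assumed transition table. Only the status names come from the document;
// which status may follow which is an illustrative guess.
const TRANSITIONS = {
  pending: ['confirmed', 'cancelled'],
  confirmed: ['executing', 'cancelled'],
  executing: ['completed', 'failed'],
  completed: ['rolled_back'],
  failed: ['rolled_back'],
  cancelled: [],
  rolled_back: [],
};

// Returns a copy of the action in its next status, or throws if the move
// is not in the assumed table.
function transition(action, next) {
  const allowed = TRANSITIONS[action.status] || [];
  if (!allowed.includes(next)) {
    throw new Error('Illegal transition: ' + action.status + ' -> ' + next);
  }
  return { ...action, status: next };
}
```

Keeping the table as data makes it easy to audit against the real lifecycle and to adjust if, for instance, failed actions turn out to be retried rather than left terminal.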

Architecture components

CuaActionOverlay

Visual action indicator. Shows what the CUA is about to do: a highlight overlay positioned at the target coordinates (x, y) with confirm and cancel buttons. For click actions, a crosshair marks the exact pixel. For type actions, the overlay shows the text to be entered. The user can approve or reject any action directly from the overlay.

CuaSessionPanel

Session sidebar. Displays the full action history for the current session: each step with its type, target, status, and timestamp. Includes a progress bar showing completion toward the task goal. Supports rollback: clicking any previously completed step offers to revert the session to that point.

CuaConfirmationModal

Human-in-the-loop gate (M2 Obligation). Triggered before form submission. Displays all fields the CUA has filled, their values, and the target form. Three buttons: Approve (submit as-is), Edit (modify values before submission), Cancel (abort). The CUA is fully paused until the user responds. There is no timeout: the modal stays open until the user explicitly acts.

CuaReplayViewer

Session replay. Step-through viewer for completed CUA sessions. Each step shows the screenshot captured at that point, the action taken, coordinates, and result. Navigate forward and backward through the entire session. Useful for auditing and debugging.

CuaDashboard

Portal statistics. Aggregated view of CUA activity: active sessions, total action count, success rate per portal (e.g., TED eProcurement, EUR-Lex). Tracks how many actions required human confirmation and how many were auto-approved via M3/M4 gates.

SSE event stream

The CUA executor streams real-time events to the frontend via Server-Sent Events (SSE). The connection reconnects automatically with exponential backoff (maximum 5 attempts).
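
The native EventSource has no option for a maximum number of attempts, so a client that wants the behaviour described above would manage reconnection itself. The following is a sketch under stated assumptions: backoffDelay and connectWithBackoff are hypothetical helpers, and the 1 second base delay and 30 second cap are invented values, not documented ones.

```javascript
// Hypothetical helper: delay before reconnect attempt n (1-based), doubling
// each time. The base and cap defaults are assumptions.
function backoffDelay(attempt, baseMs = 1000, capMs = 30000) {
  return Math.min(capMs, baseMs * 2 ** (attempt - 1));
}

// `connect` is an injected function that opens the stream and returns an object
// with onopen/onerror hooks and close(), for example a browser EventSource.
// `schedule` defaults to setTimeout and is injectable for testing.
function connectWithBackoff(connect, { maxAttempts = 5, schedule = setTimeout } = {}) {
  let attempt = 0;
  const open = () => {
    const source = connect();
    source.onopen = () => { attempt = 0; };   // healthy again: reset the counter
    source.onerror = () => {
      source.close();                          // stop any built-in retry
      attempt += 1;
      if (attempt > maxAttempts) return;       // give up after maxAttempts reconnects
      schedule(open, backoffDelay(attempt));
    };
  };
  open();
}
```

In a browser this could be called as connectWithBackoff(() => new EventSource(url)). Event listeners would need to be attached inside connect, since each reconnect creates a fresh source.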

Event types

SSE event types

Event | Payload | When
action_pending | Action type, target coordinates, description | AI model proposes a new action
action_executing | Action ID, type | Action passes gate check and begins execution
action_completed | Action ID, result, screenshot URL | Action finishes successfully
action_failed | Action ID, error message, screenshot URL | Action execution fails
session_update | Session status, current step, progress | Session state changes (pause, resume, complete)
form_confirmation | Form fields, values, target URL | M2 gate triggers; awaiting human approval
screenshot | Screenshot data (base64 or URL) | New viewport capture available

Connection example

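// sessionId is the id of an existing CUA session; it is encoded so that it is
// safe to place in the query string.
const source = new EventSource('/v1/cua/stream?session=' + encodeURIComponent(sessionId));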

source.addEventListener('action_pending', (e) => {
  const action = JSON.parse(e.data);
  // Show CuaActionOverlay at action.coordinates
});

source.addEventListener('form_confirmation', (e) => {
  const form = JSON.parse(e.data);
  // Show CuaConfirmationModal with form.fields
});

source.addEventListener('action_completed', (e) => {
  const result = JSON.parse(e.data);
  // Update CuaSessionPanel with completed step
});

Session model

Each CUA session targets a specific portal and task. Sessions maintain full state across the action lifecycle.

Session properties

Session object fields

Field | Type | Description
id | string | Unique session identifier (UUID v4)
status | string | Session status: active, paused, completed, failed, cancelled
portal | string | Target portal (e.g., “TED eProcurement”, “EUR-Lex”)
taskDescription | string | Natural-language description of the task to perform
actions | array | Ordered list of all actions in the session
currentStep | integer | Index of the current action (0-based)
createdAt | ISO 8601 | Session creation timestamp
updatedAt | ISO 8601 | Last state change timestamp

Session example

{
  "id": "cua_sess_a1b2c3d4",
  "status": "active",
  "portal": "TED eProcurement",
  "taskDescription": "Search for procurement notices matching CPV 72000000 in Finland",
  "actions": [
    {
      "step": 0,
      "type": "navigate",
      "target": "https://ted.europa.eu/en/search",
      "status": "completed",
      "timestamp": "2026-03-10T14:00:01Z",
      "screenshot": "screenshots/step-0.png"
    },
    {
      "step": 1,
      "type": "type",
      "target": "Search query input",
      "value": "CPV 72000000",
      "coordinates": { "x": 540, "y": 312 },
      "status": "completed",
      "timestamp": "2026-03-10T14:00:03Z",
      "screenshot": "screenshots/step-1.png"
    },
    {
      "step": 2,
      "type": "click",
      "target": "Country filter: Finland",
      "coordinates": { "x": 180, "y": 480 },
      "status": "pending",
      "timestamp": "2026-03-10T14:00:05Z"
    }
  ],
  "currentStep": 2,
  "createdAt": "2026-03-10T14:00:00Z",
  "updatedAt": "2026-03-10T14:00:05Z"
}

Pause and resume

Sessions can be paused at any time. When paused, the CUA stops proposing new actions but retains full state. Resuming continues from the current step. This is useful when the user needs to inspect intermediate results or perform manual steps.

Audit trail

Every CUA action is logged with full provenance. The audit trail is immutable — entries cannot be modified or deleted after creation.

Audit record fields

Fields captured for every CUA action

Field | Description
Session ID | Links the action to its parent session
Step number | Sequential position in the session (0-based)
Action type | One of the 7 action types (click, type, scroll, navigate, select, submit, wait)
Target | Human-readable description of the action target (e.g., “Search button”, “CPV code input”)
Coordinates | Pixel coordinates (x, y) where the action was performed
Value | For type and select actions: the text entered or option selected
Deontic gate | Which modality was applied (M1/M2/M3/M4) and whether it passed
Human approval | For M2 actions: whether the user approved, edited, or cancelled
Screenshot (before) | Screenshot of the viewport before the action was executed
Screenshot (after) | Screenshot of the viewport after the action completed
Timestamp | ISO 8601 timestamp with millisecond precision
Duration | Time in milliseconds between action start and completion
Status | Final status of the action (completed, failed, cancelled, rolled_back)
Error | For failed actions: error message and context

Audit records are stored in EU jurisdiction and retained per GDPR data retention policies. Screenshots are stored alongside action metadata for complete session reconstruction.
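
The immutability property described in this section can be illustrated with an in-memory, append-only log. This shows the idea only and says nothing about how the audit store is actually built: createAuditLog, deepFreeze and the seq field are hypothetical, and records are assumed to be plain JSON-compatible data.

```javascript
// Recursively freezes an object so that nested fields cannot be changed either.
function deepFreeze(value) {
  if (value && typeof value === 'object' && !Object.isFrozen(value)) {
    Object.freeze(value);
    Object.values(value).forEach(deepFreeze);
  }
  return value;
}

// Hypothetical append-only log: it exposes append and list, and deliberately
// offers no update or delete operation.
function createAuditLog() {
  const entries = [];
  return {
    append(record) {
      // Copy first so later changes to the caller's object cannot leak in.
      const copy = JSON.parse(JSON.stringify(record));
      const entry = deepFreeze({ ...copy, seq: entries.length });
      entries.push(entry);
      return entry;
    },
    list() {
      return entries.slice(); // a copy: callers cannot reorder or drop entries
    },
  };
}
```

A persistent store would enforce the same rule with write-once storage or database permissions; the frozen objects here only protect the in-process copy.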

Rollback capability

The CUA supports rollback to any previously completed step within a session. When a rollback is triggered:

  1. All actions after the target step are marked as rolled_back
  2. The browser navigates back to the state captured in the target step’s screenshot
  3. The session’s currentStep is reset to the target step
  4. The CUA loop resumes from that point, proposing new actions based on the restored state

Rollback is non-destructive: rolled-back actions remain in the audit trail with their original timestamps and screenshots. The audit record shows that a rollback occurred, when it was triggered, and which step was the target.
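
The bookkeeping in steps 1 and 3 can be sketched as a pure function over the session object. rollbackTo is a hypothetical name; restoring the browser (step 2) and restarting the loop (step 4) are outside the scope of this sketch, and refusing to roll back to a step that is not completed is an assumption drawn from the wording above.

```javascript
// Hypothetical sketch: returns a new session rolled back to targetStep.
// The input session is not modified, and timestamps and screenshots of
// rolled-back actions are left as they were.
function rollbackTo(session, targetStep, now = new Date()) {
  const target = session.actions.find((a) => a.step === targetStep);
  if (!target || target.status !== 'completed') {
    throw new Error('Step ' + targetStep + ' is not a completed step');
  }
  const actions = session.actions.map((a) =>
    a.step > targetStep ? { ...a, status: 'rolled_back' } : a
  );
  return {
    ...session,
    actions,
    currentStep: targetStep,
    updatedAt: now.toISOString(),
  };
}
```

Because the function returns a copy, the caller can write the rollback event to the audit trail before swapping in the new session state.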

Rollback limitations

Support

Technical: support@pauhu.eu

Sales: sales@pauhu.eu