Human-in-the-Loop
Why fully automated AI decisions are dangerous, and how to implement approval workflows, escalation patterns, and audit trails, including EU AI Act Art. 14 requirements.
Human-in-the-Loop (HITL) means a human is involved in the AI system's decision process. Not for every minor task — but for critical, irreversible or uncertain decisions. The EU AI Act makes human oversight mandatory for high-risk systems (Art. 14). But even without regulation, HITL is the difference between a useful tool and a liability trap.
Why Automated AI Decisions Are Dangerous
LLMs are impressively good at generating plausible answers. But "plausible" is not "correct." When an LLM automatically makes decisions — answering emails, approving invoices, modifying customer data — a single mistake can cause significant damage.
The three main risks of fully automated AI decisions:
- Hallucinations in Action: The LLM invents a customer number and modifies the wrong record.
- Irreversible Actions: A deleted file, a sent email, an approved payment cannot be undone.
- Liability: Who is liable when an AI agent makes a wrong decision? Without documented human oversight: the company.
An AI agent automatically answers support tickets. A customer writes: "Please cancel my subscription." The agent cancels — but it was an enterprise contract with a 12-month term and cancellation period. Without human approval, that would be an expensive mistake.
Approval Workflows
An approval workflow interrupts the automatic flow and waits for human approval. The agent prepares the decision, but a human makes it.
| Pattern | When to Use | Example |
|---|---|---|
| Pre-Approval | Before every critical action | Agent shows email draft, human clicks 'Send' |
| Batch Approval | Multiple decisions together | Agent collects 10 support responses, human reviews all at once |
| Exception-Only | Only for deviations from standard | Agent handles standard tickets itself, escalates only special cases |
| Time-Delayed | Delay before execution | Agent plans action, 30 min wait, auto-execute if no veto |
Too many approvals make the agent useless — if every action needs approval, you might as well do it yourself. The art is finding the right thresholds: what can the agent handle alone, what needs approval?
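The time-delayed pattern from the table can be sketched as a veto window: the agent queues the action, and it executes only if no human objects before the deadline. A minimal sketch; the `PendingAction` shape, the function names, and the 30-minute default are illustrative assumptions, not from any real workflow engine:

```typescript
// Veto-window sketch for the "Time-Delayed" approval pattern.
type PendingAction = {
  id: string;
  execute: () => void;
  scheduledAt: number;   // epoch ms when the action was queued
  vetoWindowMs: number;  // how long humans may object
  vetoed: boolean;
};

const queue: PendingAction[] = [];

function schedule(id: string, execute: () => void, vetoWindowMs = 30 * 60 * 1000): void {
  queue.push({ id, execute, scheduledAt: Date.now(), vetoWindowMs, vetoed: false });
}

// A human veto before the window closes cancels the action.
function veto(id: string): void {
  const action = queue.find(a => a.id === id);
  if (action) action.vetoed = true;
}

// Called periodically (e.g. by a cron tick): run actions whose
// window has expired un-vetoed, then remove them from the queue.
function tick(now: number): string[] {
  const executed: string[] = [];
  for (const a of queue) {
    if (!a.vetoed && now - a.scheduledAt >= a.vetoWindowMs) {
      a.execute();
      executed.push(a.id);
    }
  }
  for (const id of executed) {
    queue.splice(queue.findIndex(a => a.id === id), 1);
  }
  return executed;
}
```

In a real system the veto would arrive via a chat button or dashboard, and the queue would live in durable storage rather than memory.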
Escalation Patterns
Escalation means the agent recognizes it cannot safely handle a situation and hands off to a human. This is not a failure — it's intelligent behavior.
| Trigger | Description | Implementation |
|---|---|---|
| Low Confidence | Agent is uncertain about the right action | Confidence score < threshold → escalation |
| Repeated Failure | Agent has already failed at the same task type | Error counter per task type > 1 → escalation |
| Out of Scope | Request falls outside the agent's mandate | Topic classification → no match → escalation |
| High Impact | Action has potentially large consequences | Action classification: delete, payment, contract → escalation |
| Adversarial Input | Suspected manipulation or injection | Injection detection score > threshold → escalation |
Escalation logic, rendered here as TypeScript for concreteness (`getErrorCount` and `injectionScore` are assumed helpers supplied elsewhere):

```typescript
type Task = { action: string; type: string };
type Context = { userInput: string };
type EscalationDecision = { escalate: boolean; reason?: string };

// Assumed helpers: error history per task type, and an injection classifier.
declare function getErrorCount(taskType: string, window: string): number;
declare function injectionScore(input: string): number;

function shouldEscalate(task: Task, confidence: number, context: Context): EscalationDecision {
  // Rule 1: Low confidence
  if (confidence < 0.7) {
    return { escalate: true, reason: "Low confidence" };
  }
  // Rule 2: Critical action
  if (["delete", "payment", "contract_change"].includes(task.action)) {
    return { escalate: true, reason: "High impact action" };
  }
  // Rule 3: Repeated failure within the last 24 hours
  if (getErrorCount(task.type, "24h") > 1) {
    return { escalate: true, reason: "Repeated failures" };
  }
  // Rule 4: Injection suspected
  if (injectionScore(context.userInput) > 0.8) {
    return { escalate: true, reason: "Possible injection" };
  }
  return { escalate: false };
}
```

Confidence Thresholds
Confidence thresholds define at what certainty level the agent may act autonomously. There are three zones:
| Zone | Confidence | Behavior |
|---|---|---|
| Green (Autonomous) | > 0.85 | Agent executes action, logs result |
| Yellow (Review) | 0.6 - 0.85 | Agent proposes action, waits for approval |
| Red (Escalation) | < 0.6 | Agent stops, escalates to human with context |
LLMs are notoriously poorly calibrated — an LLM can be 95% confident and still be wrong. Confidence scores should therefore never be the sole decision basis. Combine them with rule-based checks (e.g., "is this an irreversible action?") and historical error rates per task type.
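Combining the zone table with rule-based checks can be sketched as a router where rules only tighten the outcome, never loosen it (thresholds taken from the table; the `Action` shape is an assumption for illustration):

```typescript
type Zone = "green" | "yellow" | "red";
type Action = { kind: string; irreversible: boolean };

// Zone from confidence alone, using the thresholds from the table above.
function confidenceZone(confidence: number): Zone {
  if (confidence > 0.85) return "green";
  if (confidence >= 0.6) return "yellow";
  return "red";
}

// Rule-based checks may only make the outcome stricter: an irreversible
// action is never executed autonomously, regardless of reported confidence.
function decideZone(confidence: number, action: Action): Zone {
  const zone = confidenceZone(confidence);
  if (action.irreversible && zone === "green") return "yellow";
  return zone;
}
```

This keeps the poorly calibrated confidence score as one input among several, rather than the sole decision basis.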
Audit Trail & Logging
A complete audit trail documents every decision of the AI system — what was decided, why, and who approved it. This is not just best practice but mandatory for high-risk systems under the EU AI Act.
What Must Be Logged?
Audit Trail Entry (example):

```json
{
  "timestamp": "2026-03-22T14:30:00Z",
  "agent_id": "support-agent-01",
  "task_type": "ticket_response",
  "input": "Customer asks about contract cancellation",
  "decision": "escalate_to_human",
  "confidence": 0.62,
  "reason": "High impact action (contract_change) + low confidence",
  "context_chunks": ["contract_123.pdf", "cancellation_terms.md"],
  "approved_by": "[email protected]",
  "approved_at": "2026-03-22T14:35:00Z",
  "final_action": "manual_response_sent",
  "retention_days": 365
}
```

- Immutability: Logs must not be modified after the fact. Use append-only storage.
- Retention: Store in GDPR-compliant fashion. Delete personal data after defined periods.
- Accessibility: Supervisory authorities must be able to inspect logs. Machine-readable format.
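Immutability can be approximated in application code by hash-chaining entries, so that any after-the-fact modification breaks the chain. A sketch using Node's built-in `crypto` module; real deployments would additionally use append-only storage (WORM buckets, a ledger database) rather than an in-memory array:

```typescript
import { createHash } from "crypto";

type AuditEntry = {
  payload: Record<string, unknown>; // the fields shown in the example entry above
  prevHash: string;                 // hash of the previous entry ("" for the first)
  hash: string;                     // SHA-256 over payload + prevHash
};

const log: AuditEntry[] = [];

function append(payload: Record<string, unknown>): void {
  const prevHash = log.length ? log[log.length - 1].hash : "";
  const hash = createHash("sha256")
    .update(JSON.stringify(payload) + prevHash)
    .digest("hex");
  log.push({ payload, prevHash, hash });
}

// Recompute every hash in order; returns false if any entry was altered.
function verifyChain(): boolean {
  let prevHash = "";
  for (const entry of log) {
    const expected = createHash("sha256")
      .update(JSON.stringify(entry.payload) + prevHash)
      .digest("hex");
    if (entry.hash !== expected || entry.prevHash !== prevHash) return false;
    prevHash = entry.hash;
  }
  return true;
}
```

Note that hash-chaining detects tampering but does not prevent it; prevention requires storage-level controls.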
Practice: n8n Approval Workflow
A concrete approval workflow in n8n for a support agent:
n8n Approval Workflow:

```text
1. Trigger: New support ticket (Webhook)
2. AI Agent Node (Ollama/OpenAI)
   → Analyzes ticket: category, urgency, solution proposal
   → Output: { category, urgency, confidence, draft_response }
3. Switch Node: Confidence Check
   → confidence > 0.85 AND category == "standard"
     → Send directly (with disclaimer "AI-generated")
   → confidence 0.6-0.85 OR category == "billing"
     → Approval request (continue with step 4)
   → confidence < 0.6 OR category == "legal"
     → Direct escalation (continue with step 5)
4. Approval Request
   → Team chat/Slack: draft + context to the support team
   → Wait Node: max. 4 hours
   → Approved? → Send
   → Rejected? → Handle manually
   → Timeout? → Escalate
5. Escalation
   → Mark ticket as "human required"
   → Assign to next available agent
   → Attach AI analysis as context
6. Audit Log (at every exit)
   → Log decision, confidence, approval status
```

EU AI Act Art. 14: Human Oversight
Article 14 of the EU AI Act requires that high-risk AI systems be designed so they can be effectively overseen by natural persons. The core requirements:
| Requirement | What Does It Mean? | Implementation |
|---|---|---|
| Understand | User must understand the system's capabilities and limitations | Documentation, training, confidence display |
| Monitor | User must be able to monitor the system during operation | Dashboard, alerts, real-time logging |
| Intervene | User must be able to intervene or stop at any time | Kill switch, override, pause button |
| Override | User must be able to ignore AI recommendation | Recommendation instead of automation, opt-out |
Oversight measures must be proportionate to the risk. A chatbot that answers opening hours needs less oversight than a system that makes credit decisions. The risk class determines the HITL level.
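The "Intervene" and "Override" rows of the table can be sketched as a wrapper that checks a mutable kill switch before every autonomous action and records whether the final action came from the AI or from a human. All names here are illustrative assumptions, not a prescribed API:

```typescript
type Recommendation = { action: string; confidence: number };

// Set by an operator dashboard; checked before every autonomous action.
let killSwitch = false;

function halt(): void { killSwitch = true; }
function resume(): void { killSwitch = false; }

// Executes the AI recommendation unless the operator has halted the system
// or supplied an explicit override; returns what was done and on whose authority.
function act(
  recommendation: Recommendation,
  humanOverride?: string
): { performed: string; source: "human" | "ai" | "none" } {
  if (killSwitch) return { performed: "none", source: "none" };
  if (humanOverride !== undefined) {
    return { performed: humanOverride, source: "human" };
  }
  return { performed: recommendation.action, source: "ai" };
}
```

Recording the `source` field in the audit trail is what later proves that human oversight actually took place.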
Key Takeaways
- ✓ Fully automated AI decisions are dangerous for critical, irreversible, or uncertain actions.
- ✓ Approval workflows: Pre-Approval, Batch, Exception-Only, or Time-Delayed, depending on risk and frequency.
- ✓ Escalation triggers: low confidence, repeated failure, out of scope, high impact, adversarial input.
- ✓ Three confidence zones: Green (autonomous, > 0.85), Yellow (review, 0.6-0.85), Red (escalation, < 0.6).
- ✓ An audit trail is mandatory: timestamp, decision, confidence, approver, final action — immutable and GDPR-compliant.
- ✓ EU AI Act Art. 14: high-risk systems must be understandable, monitorable, interruptible, and overridable.
Sources
- Regulation (EU) 2024/1689 — EU AI Act — Article 14: Human Oversight
- European Commission — AI Act Regulatory Framework — Overview of risk classes and obligations
- n8n Documentation — Wait & Approval Nodes — Technical basis for approval workflows
- EU AI Act — Wiki Article — Risk classes, prohibitions, transparency obligations
- Safety Hooks Pattern — Guardrails and output validation
- Self-Improving Agents — Self-escalation as HITL mechanism
Need help implementing HITL workflows?
We help with designing approval workflows and escalation patterns — using n8n, Team-Chat and EU AI Act compliance.
Next step: move from knowledge to implementation
If you want more than theory: setups, workflows and templates from real operations for teams that want local, documented AI systems.
- Local and self-hosted by default
- Documented and auditable
- Built from our own runtime
- Made in Austria