AI EngineeringWiki


Human-in-the-Loop

Why fully automated AI decisions are dangerous and how to implement approval workflows, escalation patterns, and audit trails, including the EU AI Act Art. 14 requirements.

Reading time: 13 min · Last updated: March 2026
📋 At a Glance

Human-in-the-Loop (HITL) means a human is involved in the AI system's decision process. Not for every minor task — but for critical, irreversible or uncertain decisions. The EU AI Act makes human oversight mandatory for high-risk systems (Art. 14). But even without regulation, HITL is the difference between a useful tool and a liability trap.

Why Automated AI Decisions Are Dangerous

LLMs are impressively good at generating plausible answers. But "plausible" is not "correct." When an LLM automatically makes decisions — answering emails, approving invoices, modifying customer data — a single mistake can cause significant damage.

The three main risks of fully automated AI decisions:

  • Hallucinations in Action: The LLM invents a customer number and modifies the wrong record.
  • Irreversible Actions: A deleted file, a sent email, an approved payment cannot be undone.
  • Liability: Who is liable when an AI agent makes a wrong decision? Without documented human oversight: the company.
⚠️ Real-World Example

An AI agent automatically answers support tickets. A customer writes: "Please cancel my subscription." The agent cancels — but it was an enterprise contract with a 12-month term and a notice period. Without human approval, that would have been an expensive mistake.

Approval Workflows

An approval workflow interrupts the automatic flow and waits for human approval. The agent prepares the decision, but a human makes it.

Pattern        | When to Use                       | Example
Pre-Approval   | Before every critical action      | Agent shows email draft, human clicks 'Send'
Batch Approval | Multiple decisions together       | Agent collects 10 support responses, human reviews all at once
Exception-Only | Only for deviations from standard | Agent handles standard tickets itself, escalates only special cases
Time-Delayed   | Delay before execution            | Agent plans action, 30 min wait, auto-execute if no veto
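The Pre-Approval pattern from the table can be sketched as a simple gate: the agent only prepares a draft, and nothing executes until a reviewer approves it. A minimal Python sketch; the `Draft` type and the `approve`/`execute` callbacks are illustrative names, not part of any specific framework:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Draft:
    """An action the agent has prepared but not yet executed."""
    action: str   # e.g. "send_email"
    payload: dict # the prepared content

def execute_with_pre_approval(draft: Draft,
                              approve: Callable[[Draft], bool],
                              execute: Callable[[Draft], str]) -> str:
    # The agent only prepares; the human decision gates execution.
    if approve(draft):
        return execute(draft)
    return "rejected: handle manually"

# Usage with a simulated reviewer callback (hypothetical values):
result = execute_with_pre_approval(
    Draft("send_email", {"to": "[email protected]", "body": "Hi"}),
    approve=lambda d: d.action != "delete_account",  # simulated human review
    execute=lambda d: f"executed {d.action}",
)
print(result)  # executed send_email
```

In a real system, `approve` would block on a UI button, a chat reaction, or a webhook rather than a lambda.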
ℹ️ Finding the Balance

Too many approvals make the agent useless — if every action needs approval, you might as well do it yourself. The art is finding the right thresholds: what can the agent handle alone, what needs approval?

Escalation Patterns

Escalation means the agent recognizes it cannot safely handle a situation and hands off to a human. This is not a failure — it's intelligent behavior.

Trigger           | Description                                    | Implementation
Low Confidence    | Agent is uncertain about the right action      | Confidence score < threshold → escalation
Repeated Failure  | Agent has already failed at the same task type | Error counter per task type > 1 → escalation
Out of Scope      | Request falls outside the agent's mandate      | Topic classification → no match → escalation
High Impact       | Action has potentially large consequences      | Action classification: delete, payment, contract → escalation
Adversarial Input | Suspected manipulation or injection            | Injection detection score > threshold → escalation
Escalation Logic (Pseudocode):

function shouldEscalate(task, confidence, context):
  # Rule 1: Low confidence
  if confidence < 0.7:
    return { escalate: true, reason: "Low confidence" }

  # Rule 2: Critical action
  if task.action in ["delete", "payment", "contract_change"]:
    return { escalate: true, reason: "High impact action" }

  # Rule 3: Repeated failure within the last 24 hours
  if getErrorCount(task.type, last_24h) > 1:
    return { escalate: true, reason: "Repeated failures" }

  # Rule 4: Injection suspected
  if injectionScore(context.userInput) > 0.8:
    return { escalate: true, reason: "Possible injection" }

  return { escalate: false }

Confidence Thresholds

Confidence thresholds define at what certainty level the agent may act autonomously. There are three zones:

Zone               | Confidence | Behavior
Green (Autonomous) | > 0.85     | Agent executes action, logs result
Yellow (Review)    | 0.6 - 0.85 | Agent proposes action, waits for approval
Red (Escalation)   | < 0.6      | Agent stops, escalates to human with context
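The zone logic from the table translates directly into code. A minimal sketch using the thresholds above (0.85 and 0.6):

```python
def confidence_zone(confidence: float) -> str:
    """Map a confidence score to one of the three zones from the table."""
    if confidence > 0.85:
        return "green"   # agent executes the action and logs the result
    if confidence >= 0.6:
        return "yellow"  # agent proposes the action, waits for approval
    return "red"         # agent stops and escalates with context

print(confidence_zone(0.9))   # green
print(confidence_zone(0.7))   # yellow
print(confidence_zone(0.5))   # red
```

The exact thresholds are a starting point, not a standard; tune them per task type based on observed error rates.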
⚠️ LLM Confidence Is Unreliable

LLMs are notoriously poorly calibrated — an LLM can be 95% confident and still be wrong. Confidence scores should therefore never be the sole decision basis. Combine them with rule-based checks (e.g., "is this an irreversible action?") and historical error rates per task type.

Audit Trail & Logging

A complete audit trail documents every decision of the AI system — what was decided, why, and who approved it. This is not just best practice but mandatory for high-risk systems under the EU AI Act.

What Must Be Logged?

Audit Trail Entry:
{
  "timestamp": "2026-03-22T14:30:00Z",
  "agent_id": "support-agent-01",
  "task_type": "ticket_response",
  "input": "Customer asks about contract cancellation",
  "decision": "escalate_to_human",
  "confidence": 0.62,
  "reason": "High impact action (contract_change) + low confidence",
  "context_chunks": ["contract_123.pdf", "cancellation_terms.md"],
  "approved_by": "[email protected]",
  "approved_at": "2026-03-22T14:35:00Z",
  "final_action": "manual_response_sent",
  "retention_days": 365
}
  • Immutability: Logs must not be modified after the fact. Append-only storage.
  • Retention: Store in GDPR-compliant fashion. Delete personal data after defined periods.
  • Accessibility: Supervisory authorities must be able to inspect logs. Machine-readable format.
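Append-only storage can be as simple as a JSON-lines file opened in append mode. A minimal sketch (the file name and field subset are illustrative, following the entry above); a real deployment would also enforce immutability at the storage layer:

```python
import json
from datetime import datetime, timezone

def append_audit_entry(path: str, entry: dict) -> None:
    """Write one audit record as a JSON line. Mode 'a' only appends,
    never overwrites existing records."""
    record = {"timestamp": datetime.now(timezone.utc).isoformat(), **entry}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

# Usage with illustrative values from the example entry above:
append_audit_entry("audit.jsonl", {
    "agent_id": "support-agent-01",
    "decision": "escalate_to_human",
    "confidence": 0.62,
})
```

JSON lines keep the log machine-readable for inspection, and one record per line makes append-only writes atomic enough for most setups.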

Practice: n8n Approval Workflow

A concrete approval workflow in n8n for a support agent:

n8n Approval Workflow:

1. Trigger: New support ticket (Webhook)

2. AI Agent Node (Ollama/OpenAI)
   → Analyzes ticket: category, urgency, solution proposal
   → Output: { category, urgency, confidence, draft_response }

3. Switch Node: Confidence Check
   → confidence > 0.85 AND category == "standard"
     → Send directly (with disclaimer "AI-generated")
   → confidence 0.6-0.85 OR category == "billing"
     → Approval request (continue to step 4)
   → confidence < 0.6 OR category == "legal"
     → Direct escalation (continue to step 5)

4. Approval Request
   → Team-Chat/Slack: Draft + context to support team
   → Wait Node: max 4 hours
   → Approved? → Send
   → Rejected? → Handle manually
   → Timeout? → Escalation

5. Escalation
   → Mark ticket as "human required"
   → Assign to next available agent
   → Attach AI analysis as context

6. Audit Log (at every exit)
   → Log decision, confidence, approval status
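The Switch node in step 3 is a three-way routing decision. A Python sketch of the same logic, with the thresholds and category names taken from the workflow above:

```python
def route_ticket(confidence: float, category: str) -> str:
    """Mirror the n8n Switch node: decide which path a ticket takes."""
    if confidence < 0.6 or category == "legal":
        return "escalate"          # step 5: direct escalation
    if confidence > 0.85 and category == "standard":
        return "send_direct"       # AI-generated answer with disclaimer
    return "approval_request"      # step 4: human approval needed

print(route_ticket(0.9, "standard"))  # send_direct
print(route_ticket(0.7, "standard"))  # approval_request
print(route_ticket(0.9, "legal"))     # escalate
```

Note the order: the escalation check comes first, so a high-confidence legal ticket still goes to a human.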

EU AI Act Art. 14: Human Oversight

Article 14 of the EU AI Act requires that high-risk AI systems be designed so they can be effectively overseen by natural persons. The core requirements:

Requirement | What Does It Mean?                                             | Implementation
Understand  | User must understand the system's capabilities and limitations | Documentation, training, confidence display
Monitor     | User must be able to monitor the system during operation       | Dashboard, alerts, real-time logging
Intervene   | User must be able to intervene or stop at any time             | Kill switch, override, pause button
Override    | User must be able to ignore the AI recommendation              | Recommendation instead of automation, opt-out
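The Intervene requirement boils down to a flag the agent must check before every action. A minimal kill-switch sketch in Python (illustrative only, not a compliance implementation):

```python
import threading

class KillSwitch:
    """Lets a human pause or stop the agent at any time."""
    def __init__(self) -> None:
        self._stopped = threading.Event()

    def stop(self) -> None:
        # Human presses the pause/stop button.
        self._stopped.set()

    def resume(self) -> None:
        self._stopped.clear()

    def allow(self) -> bool:
        # The agent calls this before executing any action.
        return not self._stopped.is_set()

switch = KillSwitch()
print(switch.allow())  # True
switch.stop()
print(switch.allow())  # False
```

Using a `threading.Event` makes the flag safe to flip from another thread, e.g. a web dashboard, while the agent loop is running.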
ℹ️ Art. 14(4): Proportionate Measures

Oversight measures must be proportionate to the risk. A chatbot that answers opening hours needs less oversight than a system that makes credit decisions. The risk class determines the HITL level.


Key Takeaways

  • Fully automated AI decisions are dangerous for critical, irreversible or uncertain actions.
  • Approval workflows: Pre-Approval, Batch, Exception-Only or Time-Delayed — depending on risk and frequency.
  • Escalation patterns: Low confidence, repeated failure, out of scope, high impact, adversarial input.
  • Three confidence zones: Green (autonomous, >0.85), Yellow (review, 0.6-0.85), Red (escalation, <0.6).
  • Audit trail is mandatory: Timestamp, decision, confidence, approver, final action — immutable and GDPR-compliant.
  • EU AI Act Art. 14: High-risk systems must be understandable, monitorable, interruptible and overridable.


Not legal advice.