AI EngineeringWiki


Human-in-the-Loop

Why fully automated AI decisions are dangerous and how to implement approval workflows, escalation patterns, and audit trails, including the EU AI Act Art. 14 requirements.

Reading time: 13 min · Last updated: March 2026
📋 At a Glance

Human-in-the-Loop (HITL) means a human is involved in the AI system's decision process. Not for every minor task — but for critical, irreversible or uncertain decisions. The EU AI Act makes human oversight mandatory for high-risk systems (Art. 14). But even without regulation, HITL is the difference between a useful tool and a liability trap.

Why Automated AI Decisions Are Dangerous

LLMs are impressively good at generating plausible answers. But "plausible" is not "correct." When an LLM automatically makes decisions — answering emails, approving invoices, modifying customer data — a single mistake can cause significant damage.

The three main risks of fully automated AI decisions:

  • Hallucinations in Action: The LLM invents a customer number and modifies the wrong record.
  • Irreversible Actions: A deleted file, a sent email, an approved payment cannot be undone.
  • Liability: Who is liable when an AI agent makes a wrong decision? Without documented human oversight: the company.
⚠️ Real-World Example

An AI agent automatically answers support tickets. A customer writes: "Please cancel my subscription." The agent cancels — but it was an enterprise contract with a 12-month term and a notice period. Without human approval, that would have been an expensive mistake.

Approval Workflows

An approval workflow interrupts the automatic flow and waits for human approval. The agent prepares the decision, but a human makes it.

Pattern        | When to Use                       | Example
Pre-Approval   | Before every critical action      | Agent shows email draft, human clicks 'Send'
Batch Approval | Multiple decisions together       | Agent collects 10 support responses, human reviews all at once
Exception-Only | Only for deviations from standard | Agent handles standard tickets itself, escalates only special cases
Time-Delayed   | Delay before execution            | Agent plans action, 30 min wait, auto-execute if no veto
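The Pre-Approval pattern from the table can be sketched as a simple gate: the agent only prepares a draft, and nothing executes until a reviewer approves it. A minimal Python sketch; the `Draft` type and the `approve`/`execute` callbacks are illustrative names, not part of any specific framework:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Draft:
    """An action the agent has prepared but not yet executed."""
    action: str   # e.g. "send_email"
    payload: dict # the prepared content

def execute_with_pre_approval(draft: Draft,
                              approve: Callable[[Draft], bool],
                              execute: Callable[[Draft], str]) -> str:
    # The agent only prepares; the human decision gates execution.
    if approve(draft):
        return execute(draft)
    return "rejected: handle manually"

# Usage with a simulated reviewer callback (hypothetical values):
result = execute_with_pre_approval(
    Draft("send_email", {"to": "[email protected]", "body": "Hi"}),
    approve=lambda d: d.action != "delete_account",  # simulated human review
    execute=lambda d: f"executed {d.action}",
)
print(result)  # executed send_email
```

In a real system, `approve` would block on a UI button, a chat reaction, or a webhook rather than a lambda.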
ℹ️ Finding the Balance

Too many approvals make the agent useless — if every action needs approval, you might as well do it yourself. The art is finding the right thresholds: what can the agent handle alone, what needs approval?

Escalation Patterns

Escalation means the agent recognizes it cannot safely handle a situation and hands off to a human. This is not a failure — it's intelligent behavior.

Trigger           | Description                                    | Implementation
Low Confidence    | Agent is uncertain about the right action      | Confidence score < threshold → escalation
Repeated Failure  | Agent has already failed at the same task type | Error counter per task type > 1 → escalation
Out of Scope      | Request falls outside the agent's mandate      | Topic classification → no match → escalation
High Impact       | Action has potentially large consequences      | Action classification: delete, payment, contract → escalation
Adversarial Input | Suspected manipulation or injection            | Injection detection score > threshold → escalation
Escalation Logic (Pseudocode):

function shouldEscalate(task, confidence, context):
  # Rule 1: Low confidence
  if confidence < 0.7:
    return { escalate: true, reason: "Low confidence" }

  # Rule 2: Critical action
  if task.action in ["delete", "payment", "contract_change"]:
    return { escalate: true, reason: "High impact action" }

  # Rule 3: Repeated failure within the last 24 hours
  if getErrorCount(task.type, last_24h) > 1:
    return { escalate: true, reason: "Repeated failures" }

  # Rule 4: Injection suspected
  if injectionScore(context.userInput) > 0.8:
    return { escalate: true, reason: "Possible injection" }

  return { escalate: false }

Confidence Thresholds

Confidence thresholds define at what certainty level the agent may act autonomously. There are three zones:

Zone               | Confidence | Behavior
Green (Autonomous) | > 0.85     | Agent executes action, logs result
Yellow (Review)    | 0.6 - 0.85 | Agent proposes action, waits for approval
Red (Escalation)   | < 0.6      | Agent stops, escalates to human with context
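The zone logic from the table translates directly into code. A minimal sketch using the thresholds above (0.85 and 0.6):

```python
def confidence_zone(confidence: float) -> str:
    """Map a confidence score to one of the three zones from the table."""
    if confidence > 0.85:
        return "green"   # agent executes the action and logs the result
    if confidence >= 0.6:
        return "yellow"  # agent proposes the action, waits for approval
    return "red"         # agent stops and escalates with context

print(confidence_zone(0.9))   # green
print(confidence_zone(0.7))   # yellow
print(confidence_zone(0.5))   # red
```

The exact thresholds are a starting point, not a standard; tune them per task type based on observed error rates.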
⚠️ LLM Confidence Is Unreliable

LLMs are notoriously poorly calibrated — an LLM can be 95% confident and still be wrong. Confidence scores should therefore never be the sole decision basis. Combine them with rule-based checks (e.g., "is this an irreversible action?") and historical error rates per task type.

Audit Trail & Logging

A complete audit trail documents every decision of the AI system — what was decided, why, and who approved it. This is not just best practice but mandatory for high-risk systems under the EU AI Act.

What Must Be Logged?

Audit Trail Entry:
{
  "timestamp": "2026-03-22T14:30:00Z",
  "agent_id": "support-agent-01",
  "task_type": "ticket_response",
  "input": "Customer asks about contract cancellation",
  "decision": "escalate_to_human",
  "confidence": 0.62,
  "reason": "High impact action (contract_change) + low confidence",
  "context_chunks": ["contract_123.pdf", "cancellation_terms.md"],
  "approved_by": "[email protected]",
  "approved_at": "2026-03-22T14:35:00Z",
  "final_action": "manual_response_sent",
  "retention_days": 365
}
  • Immutability: Logs must not be modified after the fact. Append-only storage.
  • Retention: Store in GDPR-compliant fashion. Delete personal data after defined periods.
  • Accessibility: Supervisory authorities must be able to inspect logs. Machine-readable format.
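Append-only storage can be as simple as a JSON-lines file opened in append mode. A minimal sketch (the file name and field subset are illustrative, following the entry above); a real deployment would also enforce immutability at the storage layer:

```python
import json
from datetime import datetime, timezone

def append_audit_entry(path: str, entry: dict) -> None:
    """Write one audit record as a JSON line. Mode 'a' only appends,
    never overwrites existing records."""
    record = {"timestamp": datetime.now(timezone.utc).isoformat(), **entry}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

# Usage with illustrative values from the example entry above:
append_audit_entry("audit.jsonl", {
    "agent_id": "support-agent-01",
    "decision": "escalate_to_human",
    "confidence": 0.62,
})
```

JSON lines keep the log machine-readable for inspection, and one record per line makes append-only writes atomic enough for most setups.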

Practice: n8n Approval Workflow

A concrete approval workflow in n8n for a support agent:

n8n Approval Workflow:

1. Trigger: New support ticket (Webhook)

2. AI Agent Node (Ollama/OpenAI)
   → Analyzes ticket: category, urgency, solution proposal
   → Output: { category, urgency, confidence, draft_response }

3. Switch Node: Confidence Check
   → confidence > 0.85 AND category == "standard"
     → Send directly (with disclaimer "AI-generated")
   → confidence 0.6-0.85 OR category == "billing"
     → Approval request (continue to step 4)
   → confidence < 0.6 OR category == "legal"
     → Direct escalation (continue to step 5)

4. Approval Request
   → Team-Chat/Slack: Draft + context to support team
   → Wait Node: max 4 hours
   → Approved? → Send
   → Rejected? → Handle manually
   → Timeout? → Escalation

5. Escalation
   → Mark ticket as "human required"
   → Assign to next available agent
   → Attach AI analysis as context

6. Audit Log (at every exit)
   → Log decision, confidence, approval status
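The Switch node in step 3 is a three-way routing decision. A Python sketch of the same logic, with the thresholds and category names taken from the workflow above:

```python
def route_ticket(confidence: float, category: str) -> str:
    """Mirror the n8n Switch node: decide which path a ticket takes."""
    if confidence < 0.6 or category == "legal":
        return "escalate"          # step 5: direct escalation
    if confidence > 0.85 and category == "standard":
        return "send_direct"       # AI-generated answer with disclaimer
    return "approval_request"      # step 4: human approval needed

print(route_ticket(0.9, "standard"))  # send_direct
print(route_ticket(0.7, "standard"))  # approval_request
print(route_ticket(0.9, "legal"))     # escalate
```

Note the order: the escalation check comes first, so a high-confidence legal ticket still goes to a human.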

EU AI Act Art. 14: Human Oversight

Article 14 of the EU AI Act requires that high-risk AI systems be designed so they can be effectively overseen by natural persons. The core requirements:

Requirement | What Does It Mean?                                             | Implementation
Understand  | User must understand the system's capabilities and limitations | Documentation, training, confidence display
Monitor     | User must be able to monitor the system during operation       | Dashboard, alerts, real-time logging
Intervene   | User must be able to intervene or stop at any time             | Kill switch, override, pause button
Override    | User must be able to ignore the AI recommendation              | Recommendation instead of automation, opt-out
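The Intervene requirement boils down to a flag the agent must check before every action. A minimal kill-switch sketch in Python (illustrative only, not a compliance implementation):

```python
import threading

class KillSwitch:
    """Lets a human pause or stop the agent at any time."""
    def __init__(self) -> None:
        self._stopped = threading.Event()

    def stop(self) -> None:
        # Human presses the pause/stop button.
        self._stopped.set()

    def resume(self) -> None:
        self._stopped.clear()

    def allow(self) -> bool:
        # The agent calls this before executing any action.
        return not self._stopped.is_set()

switch = KillSwitch()
print(switch.allow())  # True
switch.stop()
print(switch.allow())  # False
```

Using a `threading.Event` makes the flag safe to flip from another thread, e.g. a web dashboard, while the agent loop is running.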
ℹ️ Art. 14(4): Proportionate Measures

Oversight measures must be proportionate to the risk. A chatbot that answers opening hours needs less oversight than a system that makes credit decisions. The risk class determines the HITL level.


Key Takeaways

  • Fully automated AI decisions are dangerous for critical, irreversible or uncertain actions.
  • Approval workflows: Pre-Approval, Batch, Exception-Only or Time-Delayed — depending on risk and frequency.
  • Escalation patterns: Low confidence, repeated failure, out of scope, high impact, adversarial input.
  • Three confidence zones: Green (autonomous, >0.85), Yellow (review, 0.6-0.85), Red (escalation, <0.6).
  • Audit trail is mandatory: Timestamp, decision, confidence, approver, final action — immutable and GDPR-compliant.
  • EU AI Act Art. 14: High-risk systems must be understandable, monitorable, interruptible and overridable.


Not legal advice.