Error Strategy

⚠️ This is a v2 design that is not yet implemented. See Architecture v2 for context.

The problem

An agent framework has several failure modes of differing severity. Treating them all the same way fails in one of two directions: propagating everything produces fragile agents that crash on tool failures, and swallowing everything produces silent agents that hide real problems.

Layered error handling


LLM failures     → fatal (bubble up as AgentError)
Tool failures    → wrapped as ToolExecution { success: false }
                   (LLM sees the error and adapts)
Memory failures  → logged and degraded (turn continues without memory)
Event failures   → silent (broadcast send errors are ignored)
Hook failures    → depends on hook.fail_open():
                   true  → log and continue
                   false → abort turn with AgentError::Aborted

AgentError — what can fail a turn


pub enum AgentError {
    /// LLM call failed. No fallback.
    Llm(LlmError),
    /// A gate hook returned Abort.
    Aborted { hook: &'static str, reason: String },
    /// Input failed validation against core's schema.
    InputValidation(String),
    /// Output failed serialization.
    OutputSerialization(String),
    /// Catch-all for unexpected errors.
    Internal(Box<dyn std::error::Error + Send + Sync>),
}

What is NOT in AgentError:

No ToolError — tool errors go in ToolExecution.output
No MemoryError — memory failures are logged and ignored
No PolicyError — policy denial is tool-level, not turn-level
No EventError — event failures are silent

Tool errors — conversation data, not exceptions

Tool failures are wrapped as ToolExecution { success: false }. The LLM sees the error and can self-heal:


// Tool not found
ToolExecution {
    success: false,
    output: json!({"error": "Tool 'web.search2' not found. Available: web.search, http.fetch"})
}
// Permission denied
ToolExecution {
    success: false,
    output: json!({"error": "Permission denied: tool 'code.exec' requires 'process:spawn'"})
}
// Execution failure
ToolExecution {
    success: false,
    output: json!({"error": "HTTP request failed: connection refused"})
}

The LLM can retry with different arguments, fall back to a different tool, or explain the failure to the user. Propagating these failures as a Rust Err would end the turn before the LLM could react, which removes this ability to self-heal.
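The wrapping step can be sketched as a small conversion function. This is a minimal illustration, not the framework's implementation: `wrap_tool_result` is a hypothetical helper, and `output` is a plain `String` here (rather than a JSON value) so the example needs no external crates.

```rust
use std::fmt::Display;

/// Simplified stand-in for the design's ToolExecution.
/// Assumption: `output` is a String instead of a JSON value.
#[derive(Debug, Clone, PartialEq)]
pub struct ToolExecution {
    pub success: bool,
    pub output: String,
}

/// Hypothetical helper: turn any tool result into conversation data.
/// It never returns Err, so a tool failure cannot abort the turn.
pub fn wrap_tool_result<T: Display, E: Display>(result: Result<T, E>) -> ToolExecution {
    match result {
        Ok(value) => ToolExecution {
            success: true,
            output: value.to_string(),
        },
        Err(e) => ToolExecution {
            success: false,
            output: format!("error: {e}"),
        },
    }
}
```

Because the function is total, the dispatch loop has no error branch to propagate; the failure text simply becomes part of what the LLM reads on its next step.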

Memory errors — logged and degraded


// In AgentRuntime::run_turn():
let context = match self.memory.read(&query).await {
    Ok(entries) => entries,
    Err(e) => {
        log::warn!("Memory read failed: {e}");
        self.events.send(AgentEvent::Warning(format!("Memory unavailable: {e}")));
        vec![]  // continue without context
    }
};

An agent with no memory still works — it just has less context. Crashing because SQLite is locked is worse than proceeding with incomplete context.
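The same degrade-on-failure pattern can be exercised in isolation. The sketch below is synchronous for brevity, whereas the snippet above is async; the `Memory` trait, `read_or_degrade`, and `LockedStore` are hypothetical names introduced only for this example.

```rust
/// Hypothetical synchronous memory interface (the design's version is async).
pub trait Memory {
    fn read(&self, query: &str) -> Result<Vec<String>, String>;
}

/// Read context, degrading to an empty context when the store fails.
/// Returns the entries plus an optional warning for the caller to log or emit.
pub fn read_or_degrade(memory: &dyn Memory, query: &str) -> (Vec<String>, Option<String>) {
    match memory.read(query) {
        Ok(entries) => (entries, None),
        Err(e) => (Vec::new(), Some(format!("Memory unavailable: {e}"))),
    }
}

/// Test double that always fails, standing in for a locked database.
pub struct LockedStore;

impl Memory for LockedStore {
    fn read(&self, _query: &str) -> Result<Vec<String>, String> {
        Err("database is locked".to_string())
    }
}
```

Returning the warning instead of logging it inside the helper keeps the sketch free of logging dependencies; a real runtime would presumably log and emit the event at this point, as in the snippet above.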

Event errors — silent


pub fn send(&self, event: AgentEvent) {
    // A send error only means there are no receivers; drop it deliberately.
    let _ = self.tx.send(event);
}

Events are observability. An agent without a UI should work identically to one with full instrumentation.
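To see that sending without a listener is harmless, here is a small self-contained sketch. It uses the standard library's `std::sync::mpsc` channel as a stand-in for whatever broadcast channel the runtime ends up using, and `EventBus` is a hypothetical name.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

#[derive(Debug, Clone, PartialEq)]
pub enum AgentEvent {
    Warning(String),
}

/// Hypothetical wrapper around the sending half of a channel.
pub struct EventBus {
    tx: Sender<AgentEvent>,
}

impl EventBus {
    /// Create a bus plus the receiver a UI or logger would hold.
    pub fn new() -> (Self, Receiver<AgentEvent>) {
        let (tx, rx) = channel();
        (Self { tx }, rx)
    }

    /// Fire-and-forget: a send error only means nobody is listening.
    pub fn send(&self, event: AgentEvent) {
        let _ = self.tx.send(event);
    }
}
```

If the receiver is dropped, `Sender::send` returns an error, which `send` discards, so the caller behaves the same with or without an observer attached.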

Hook errors — configurable

Each hook declares fail_open():

true (default): hook errors are logged but don't stop execution
false: hook errors abort the current operation
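One way the two modes could be dispatched is sketched below. Only `fail_open()` comes from the design; the `Hook` trait's other methods, the `Aborted` struct (a stand-in for `AgentError::Aborted`), `run_hooks`, and `Failing` are assumptions made for illustration.

```rust
/// Hypothetical hook interface. Only `fail_open()` is named in the design.
pub trait Hook {
    fn name(&self) -> &'static str;
    fn run(&self) -> Result<(), String>;
    /// Default matches the design: errors are logged, not fatal.
    fn fail_open(&self) -> bool {
        true
    }
}

/// Stand-in for AgentError::Aborted.
#[derive(Debug, PartialEq)]
pub struct Aborted {
    pub hook: &'static str,
    pub reason: String,
}

/// Run hooks in order; a failing fail-closed hook stops the operation.
pub fn run_hooks(hooks: &[&dyn Hook]) -> Result<(), Aborted> {
    for hook in hooks {
        if let Err(reason) = hook.run() {
            if hook.fail_open() {
                // Fail-open: record the problem and keep going.
                eprintln!("hook '{}' failed, continuing: {reason}", hook.name());
            } else {
                // Fail-closed: abort with the hook's name and reason.
                return Err(Aborted { hook: hook.name(), reason });
            }
        }
    }
    Ok(())
}

/// Always-failing hook whose strictness is configurable, for demonstration.
pub struct Failing {
    pub strict: bool,
}

impl Hook for Failing {
    fn name(&self) -> &'static str {
        "failing"
    }
    fn run(&self) -> Result<(), String> {
        Err("boom".to_string())
    }
    fn fail_open(&self) -> bool {
        !self.strict
    }
}
```

Making `true` the default means an observability or logging hook cannot take down a turn by accident, while a security-critical hook can opt into fail-closed behavior explicitly.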

Error flow diagram


AgentRuntime::run_turn()
│
├── Hook: session start ──Abort──► AgentError::Aborted
│                         Continue ▼
├── Memory: read ─────────Err────► log + Warning + empty context
│                         Ok ▼
├── LLM: chat() ─────────Err────► AgentError::Llm (fatal)
│                         Ok ▼
├── Tool dispatch (per call)
│   ├── Hook: abort? ────Abort───► skip tool (None)
│   ├── Policy: deny? ──Deny────► ToolExecution { success: false }
│   ├── Execute: fail? ─Err─────► ToolExecution { success: false }
│   └── Execute: ok ────Ok──────► ToolExecution { success: true }
│
├── Memory: write ────────Err────► log + continue
├── Events: emit ─────────No rx──► silent drop
└── Return Ok(TurnOutput)
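In this flow, only two paths lead to Err:
  1. LLM failure  → AgentError::Llm
  2. Hook abort   → AgentError::Aborted
The other AgentError variants (InputValidation, OutputSerialization, Internal) are not shown in this diagram.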
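The flow can be condensed into a toy function that makes the two fatal paths explicit. This is a sketch under heavy simplification: the hook and memory results are passed in as plain values, the LLM is a closure, `AgentError` is trimmed to two variants with `String` payloads, and the `"session_start"` hook name and the `warnings` field on `TurnOutput` are hypothetical.

```rust
/// Trimmed-down AgentError with String payloads, for illustration only.
#[derive(Debug, PartialEq)]
pub enum AgentError {
    Llm(String),
    Aborted { hook: &'static str, reason: String },
}

#[derive(Debug, PartialEq)]
pub struct TurnOutput {
    pub reply: String,
    pub warnings: Vec<String>,
}

/// Toy turn: the gate hook and the LLM can fail the turn; memory cannot.
pub fn run_turn(
    gate: Result<(), String>,
    memory: Result<Vec<String>, String>,
    llm: impl FnOnce(&[String]) -> Result<String, String>,
) -> Result<TurnOutput, AgentError> {
    let mut warnings = Vec::new();

    // Hook abort: fatal.
    gate.map_err(|reason| AgentError::Aborted { hook: "session_start", reason })?;

    // Memory failure: record a warning and continue with empty context.
    let context = memory.unwrap_or_else(|e| {
        warnings.push(format!("Memory unavailable: {e}"));
        Vec::new()
    });

    // LLM failure: fatal, no fallback.
    let reply = llm(&context).map_err(AgentError::Llm)?;

    Ok(TurnOutput { reply, warnings })
}
```

The two `?` operators are the only exits that produce `Err`; every other failure is converted into data (a warning or an empty context) before the function returns.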
