Error Strategy

⚠️ This is a v2 design and is not yet implemented. See Architecture v2 for context.

The problem

An agent framework has multiple failure modes of differing severity. Treating all errors the same, whether by propagating everything or by swallowing everything, produces either fragile agents that crash on tool failures or silent agents that hide real problems.

Layered error handling


LLM failures    → fatal (bubble up as AgentError)
Tool failures   → wrapped as ToolExecution { success: false }
                  (LLM sees the error and adapts)
Memory failures → logged and degraded (turn continues without memory)
Event failures  → silent (broadcast send errors are ignored)
Hook failures   → depends on hook.fail_open():
                    true  → log and continue
                    false → abort turn with AgentError::Aborted

AgentError — what can fail a turn


pub enum AgentError {
    /// LLM call failed. No fallback.
    Llm(LlmError),
    /// A gate hook returned Abort.
    Aborted { hook: &'static str, reason: String },
    /// Input failed validation against core's schema.
    InputValidation(String),
    /// Output failed serialization.
    OutputSerialization(String),
    /// Catch-all for unexpected errors.
    Internal(Box<dyn std::error::Error + Send + Sync>),
}

What is NOT in AgentError:

  • No ToolError — tool errors go in ToolExecution.output
  • No MemoryError — memory failures are logged and ignored
  • No PolicyError — policy denial is tool-level, not turn-level
  • No EventError — event failures are silent
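
As a minimal sketch of how an enum like this could plug into Rust's standard error machinery, the block below implements Display, std::error::Error, and From<LlmError>, so that `?` turns an LLM failure into AgentError::Llm. The LlmError struct, the always-failing chat function, the simplified run_turn signature, and all message wording are assumptions made for illustration; the design above does not specify them.

```rust
use std::error::Error;
use std::fmt;

// Hypothetical stand-in: the design does not define LlmError.
#[derive(Debug)]
pub struct LlmError(pub String);

impl fmt::Display for LlmError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl Error for LlmError {}

// Same variants as the enum above.
#[derive(Debug)]
pub enum AgentError {
    Llm(LlmError),
    Aborted { hook: &'static str, reason: String },
    InputValidation(String),
    OutputSerialization(String),
    Internal(Box<dyn Error + Send + Sync>),
}

// Message wording is illustrative, not part of the design.
impl fmt::Display for AgentError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AgentError::Llm(e) => write!(f, "LLM call failed: {e}"),
            AgentError::Aborted { hook, reason } => {
                write!(f, "aborted by hook '{hook}': {reason}")
            }
            AgentError::InputValidation(msg) => write!(f, "invalid input: {msg}"),
            AgentError::OutputSerialization(msg) => {
                write!(f, "output serialization failed: {msg}")
            }
            AgentError::Internal(e) => write!(f, "internal error: {e}"),
        }
    }
}

impl Error for AgentError {}

// Lets `?` convert an LlmError into AgentError::Llm.
impl From<LlmError> for AgentError {
    fn from(e: LlmError) -> Self {
        AgentError::Llm(e)
    }
}

// Hypothetical LLM call that always fails, for illustration only.
fn chat(_prompt: &str) -> Result<String, LlmError> {
    Err(LlmError("rate limited".to_string()))
}

// An LLM failure is fatal for the turn: `?` bubbles it up with no fallback.
pub fn run_turn(prompt: &str) -> Result<String, AgentError> {
    let reply = chat(prompt)?;
    Ok(reply)
}
```

Because tool, memory, policy, and event failures have no variant, a caller matching on this enum only ever handles conditions that actually ended the turn.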

Tool errors — conversation data, not exceptions

Tool failures are wrapped as ToolExecution { success: false }. The LLM sees the error and can self-heal:


// Tool not found
ToolExecution {
    success: false,
    output: json!({"error": "Tool 'web.search2' not found. Available: web.search, http.fetch"})
}
// Permission denied
ToolExecution {
    success: false,
    output: json!({"error": "Permission denied: tool 'code.exec' requires 'process:spawn'"})
}
// Execution failure
ToolExecution {
    success: false,
    output: json!({"error": "HTTP request failed: connection refused"})
}

The LLM can retry with different arguments, fall back to a different tool, or explain the failure to the user. Propagating these as Rust Err would destroy this self-healing capability.
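
The idea can be sketched as a dispatcher whose return type is ToolExecution rather than Result, so no tool failure can escape as Err. This is an illustrative sketch under stated assumptions: ToolRegistry, ToolFn, and the two sample tools are hypothetical names, and output is a plain String here to keep the example dependency-free, whereas the examples above build it with json!.

```rust
use std::collections::BTreeMap;

// Simplified stand-in: `output` is a String here instead of a JSON value.
#[derive(Debug, Clone, PartialEq)]
pub struct ToolExecution {
    pub success: bool,
    pub output: String,
}

// Hypothetical tool signature: arguments in, text or an error message out.
type ToolFn = fn(&str) -> Result<String, String>;

pub struct ToolRegistry {
    tools: BTreeMap<&'static str, ToolFn>,
}

impl ToolRegistry {
    pub fn new() -> Self {
        Self { tools: BTreeMap::new() }
    }

    pub fn register(&mut self, name: &'static str, tool: ToolFn) {
        self.tools.insert(name, tool);
    }

    // Never returns Err: every failure becomes a ToolExecution the LLM can read.
    pub fn dispatch(&self, name: &str, args: &str) -> ToolExecution {
        let tool = match self.tools.get(name) {
            Some(tool) => tool,
            None => {
                // List the registered names so the LLM can correct itself.
                let available: Vec<&str> = self.tools.keys().copied().collect();
                return ToolExecution {
                    success: false,
                    output: format!(
                        "Tool '{name}' not found. Available: {}",
                        available.join(", ")
                    ),
                };
            }
        };
        match tool(args) {
            Ok(output) => ToolExecution { success: true, output },
            Err(e) => ToolExecution { success: false, output: e },
        }
    }
}

// Hypothetical tools used only to exercise the dispatcher.
pub fn web_search(query: &str) -> Result<String, String> {
    Ok(format!("results for {query}"))
}

pub fn http_fetch(_url: &str) -> Result<String, String> {
    Err("HTTP request failed: connection refused".to_string())
}
```

With both tools registered, dispatching the misspelled "web.search2" yields a failed ToolExecution that lists the registered names, which is the information the LLM needs to retry.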

Memory errors — logged and degraded


// In AgentRuntime::run_turn():
let context = match self.memory.read(&query).await {
    Ok(entries) => entries,
    Err(e) => {
        log::warn!("Memory read failed: {e}");
        self.events.send(AgentEvent::Warning(format!("Memory unavailable: {e}")));
        vec![] // continue without context
    }
};

An agent with no memory still works — it just has less context. Crashing because SQLite is locked is worse than proceeding with incomplete context.
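
The same degrade-and-continue pattern can be sketched in a standalone, synchronous form covering both the read and the write path. Everything here is an assumption for illustration: the Memory trait, the LockedMemory backend, and the helper names are hypothetical, and a Vec<String> of warnings stands in for the log::warn! call and the Warning event.

```rust
// Hypothetical memory interface; the design only shows `memory.read(&query)`.
pub trait Memory {
    fn read(&self, query: &str) -> Result<Vec<String>, String>;
    fn write(&mut self, entry: &str) -> Result<(), String>;
}

// Backend that always fails, standing in for a locked SQLite database.
pub struct LockedMemory;

impl Memory for LockedMemory {
    fn read(&self, _query: &str) -> Result<Vec<String>, String> {
        Err("database is locked".to_string())
    }

    fn write(&mut self, _entry: &str) -> Result<(), String> {
        Err("database is locked".to_string())
    }
}

// Read path: on failure, record a warning and continue with empty context.
pub fn read_or_empty(
    memory: &dyn Memory,
    query: &str,
    warnings: &mut Vec<String>,
) -> Vec<String> {
    match memory.read(query) {
        Ok(entries) => entries,
        Err(e) => {
            warnings.push(format!("Memory unavailable: {e}"));
            Vec::new()
        }
    }
}

// Write path: on failure, record a warning; the turn result is unaffected.
pub fn write_best_effort(memory: &mut dyn Memory, entry: &str, warnings: &mut Vec<String>) {
    if let Err(e) = memory.write(entry) {
        warnings.push(format!("Memory write failed: {e}"));
    }
}
```

Neither helper returns a Result, so a caller cannot accidentally propagate a memory failure with `?`.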

Event errors — silent


pub fn send(&self, event: AgentEvent) {
    let _ = self.tx.send(event);
}

Events are observability. An agent without a UI should work identically to one with full instrumentation.
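
A small runnable sketch of this behaviour follows, assuming a standard-library mpsc channel as a stand-in for the broadcast channel mentioned earlier. EventBus and with_receiver are hypothetical names, and AgentEvent is reduced to the single Warning variant used elsewhere on this page.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

// Reduced to one variant for the example.
#[derive(Debug, Clone, PartialEq)]
pub enum AgentEvent {
    Warning(String),
}

pub struct EventBus {
    tx: Sender<AgentEvent>,
}

impl EventBus {
    // Returns the bus plus the receiver that a UI or logger would hold.
    pub fn with_receiver() -> (Self, Receiver<AgentEvent>) {
        let (tx, rx) = channel();
        (Self { tx }, rx)
    }

    // Same shape as the `send` above: a failed send is discarded.
    // With std mpsc, `send` returns Err once the receiver has been dropped.
    pub fn send(&self, event: AgentEvent) {
        let _ = self.tx.send(event);
    }
}
```

While a receiver exists, events are delivered as usual; once it is dropped, send still returns normally, so the agent's control flow does not depend on whether anything is listening.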

Hook errors — configurable

Each hook declares fail_open():

  • true (default): hook errors are logged but don't stop execution
  • false: hook errors abort the current operation
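
One way this could look in code is a trait with a default fail_open() of true and a runner that consults it whenever a hook fails. This is a sketch under assumptions: apart from fail_open(), the trait methods, the HookOutcome type, the run_hooks function, and the Metrics and Budget hooks are all hypothetical, and a Vec<String> stands in for the logger.

```rust
// Hypothetical hook interface; only `fail_open()` comes from the design.
pub trait Hook {
    fn name(&self) -> &'static str;

    // Runs the hook; Err means the hook itself failed.
    fn run(&self) -> Result<(), String>;

    // Default is fail-open: errors are logged and execution continues.
    fn fail_open(&self) -> bool {
        true
    }
}

#[derive(Debug, PartialEq)]
pub enum HookOutcome {
    Continue,
    Abort { hook: &'static str, reason: String },
}

// Runs hooks in order. A failing fail-closed hook stops the run at once;
// failures of fail-open hooks are recorded in `log` and otherwise ignored.
pub fn run_hooks(hooks: &[&dyn Hook], log: &mut Vec<String>) -> HookOutcome {
    for hook in hooks {
        if let Err(e) = hook.run() {
            if hook.fail_open() {
                log.push(format!("hook '{}' failed: {e}", hook.name()));
            } else {
                return HookOutcome::Abort { hook: hook.name(), reason: e };
            }
        }
    }
    HookOutcome::Continue
}

// Illustrative hook that always fails and keeps the fail-open default.
pub struct Metrics;

impl Hook for Metrics {
    fn name(&self) -> &'static str {
        "metrics"
    }

    fn run(&self) -> Result<(), String> {
        Err("exporter offline".to_string())
    }
}

// Illustrative hook that always fails and opts into fail-closed.
pub struct Budget;

impl Hook for Budget {
    fn name(&self) -> &'static str {
        "budget"
    }

    fn run(&self) -> Result<(), String> {
        Err("limit lookup failed".to_string())
    }

    fn fail_open(&self) -> bool {
        false
    }
}
```

The Abort outcome carries the same hook and reason fields as AgentError::Aborted, so a runtime could map one onto the other directly.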

Error flow diagram


AgentRuntime::run_turn()
├── Hook: session start ──Abort──► AgentError::Aborted
│                       Continue ▼
├── Memory: read ─────────Err────► log + Warning + empty context
│                             Ok ▼
├── LLM: chat() ──────────Err────► AgentError::Llm (fatal)
│                             Ok ▼
├── Tool dispatch (per call)
│   ├── Hook: abort? ────Abort───► skip tool (None)
│   ├── Policy: deny? ───Deny────► ToolExecution { success: false }
│   ├── Execute: fail? ──Err─────► ToolExecution { success: false }
│   └── Execute: ok ─────Ok──────► ToolExecution { success: true }
├── Memory: write ────────Err────► log + continue
├── Events: emit ─────────No rx──► silent drop
└── Return Ok(TurnOutput)
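Only two paths in this flow lead to Err:
  1. LLM failure → AgentError::Llm
  2. Hook abort at session start → AgentError::Aborted (a per-tool hook abort only skips that tool)
The remaining AgentError variants (InputValidation, OutputSerialization, Internal) are not shown in this diagram.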