Error Strategy
This is a v2 design — not yet implemented. See Architecture v2 for context.
The problem
An agent framework has multiple failure modes of differing severity. Treating all errors the same produces one of two bad outcomes: propagating everything yields fragile agents that crash on tool failures, and swallowing everything yields silent agents that hide real problems.
Layered error handling
    LLM failures    → fatal (bubble up as AgentError)
    Tool failures   → wrapped as ToolExecution { success: false } (LLM sees the error and adapts)
    Memory failures → logged and degraded (turn continues without memory)
    Event failures  → silent (broadcast send errors are ignored)
    Hook failures   → depends on hook.fail_open():
                        true  → log and continue
                        false → abort turn with HookRejected error
AgentError — what can fail a turn
    pub enum AgentError {
        /// LLM call failed. No fallback.
        Llm(LlmError),
        /// A gate hook returned Abort.
        Aborted { hook: &'static str, reason: String },
        /// Input failed validation against core's schema.
        InputValidation(String),
        /// Output failed serialization.
        OutputSerialization(String),
        /// Catch-all for unexpected errors.
        Internal(Box<dyn std::error::Error + Send + Sync>),
    }
What is NOT in AgentError:
- No ToolError: tool errors go in ToolExecution.output
- No MemoryError: memory failures are logged and ignored
- No PolicyError: policy denial is tool-level, not turn-level
- No EventError: event failures are silent
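As a rough illustration of how the enum above could plug into Rust's standard error machinery, the sketch below implements Display, std::error::Error, and From<LlmError> so that `?` converts an LLM failure into AgentError::Llm. LlmError is not defined in this document, so the struct here is a hypothetical stand-in, as are call_llm, run_turn, and the message wording.

```rust
use std::error::Error;
use std::fmt;

// Hypothetical stand-in: this document does not define LlmError.
#[derive(Debug)]
pub struct LlmError(pub String);

impl fmt::Display for LlmError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl Error for LlmError {}

#[derive(Debug)]
pub enum AgentError {
    Llm(LlmError),
    Aborted { hook: &'static str, reason: String },
    InputValidation(String),
    OutputSerialization(String),
    Internal(Box<dyn Error + Send + Sync>),
}

impl fmt::Display for AgentError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Message wording is illustrative, not taken from the design.
        match self {
            AgentError::Llm(e) => write!(f, "LLM call failed: {e}"),
            AgentError::Aborted { hook, reason } => {
                write!(f, "aborted by hook '{hook}': {reason}")
            }
            AgentError::InputValidation(msg) => write!(f, "input validation failed: {msg}"),
            AgentError::OutputSerialization(msg) => {
                write!(f, "output serialization failed: {msg}")
            }
            AgentError::Internal(e) => write!(f, "internal error: {e}"),
        }
    }
}

impl Error for AgentError {
    // Expose the wrapped error so callers can walk the cause chain.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            AgentError::Llm(e) => Some(e),
            AgentError::Internal(e) => {
                let inner: &(dyn Error + 'static) = &**e;
                Some(inner)
            }
            _ => None,
        }
    }
}

// Lets `?` turn an LlmError into AgentError::Llm.
impl From<LlmError> for AgentError {
    fn from(e: LlmError) -> Self {
        AgentError::Llm(e)
    }
}

// Hypothetical LLM call that always fails, used only to exercise `?`.
fn call_llm(_prompt: &str) -> Result<String, LlmError> {
    Err(LlmError("connection timed out".to_string()))
}

// Hypothetical, heavily simplified turn: the LLM error is fatal and bubbles up.
pub fn run_turn(prompt: &str) -> Result<String, AgentError> {
    let reply = call_llm(prompt)?;
    Ok(reply)
}
```

Because there is deliberately no From<ToolError> or From<MemoryError>, a stray `?` on a tool or memory result would not compile in this sketch, which is one way the type system could help enforce the layering.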
Tool errors — conversation data, not exceptions
Tool failures are wrapped as ToolExecution { success: false }. The LLM sees the error and can self-heal:
    // Tool not found
    ToolExecution {
        success: false,
        output: json!({"error": "Tool 'web.search2' not found. Available: web.search, http.fetch"})
    }

    // Permission denied
    ToolExecution {
        success: false,
        output: json!({"error": "Permission denied: tool 'code.exec' requires 'process:spawn'"})
    }

    // Execution failure
    ToolExecution {
        success: false,
        output: json!({"error": "HTTP request failed: connection refused"})
    }
The LLM can retry with different arguments, fall back to a different tool, or explain the failure to the user. Propagating these as Rust Err would destroy this self-healing capability.
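A minimal sketch of a dispatcher that follows this rule is shown below. It never returns Err: an unknown tool or a failed execution both become a ToolExecution with success: false. To stay dependency-free, output is a plain String rather than the json! value used above, and ToolRegistry, ToolFn, demo_registry, and the two sample tools are hypothetical names introduced only for illustration.

```rust
use std::collections::BTreeMap;

/// Result of one tool call, always handed back to the LLM as conversation data.
/// Assumption: `output` is a String here; the design uses a JSON value.
#[derive(Debug, Clone, PartialEq)]
pub struct ToolExecution {
    pub success: bool,
    pub output: String,
}

/// Hypothetical tool signature: raw arguments in, text or error message out.
pub type ToolFn = fn(&str) -> Result<String, String>;

/// Hypothetical registry mapping tool names to functions.
pub struct ToolRegistry {
    tools: BTreeMap<&'static str, ToolFn>,
}

impl ToolRegistry {
    pub fn new() -> Self {
        ToolRegistry { tools: BTreeMap::new() }
    }

    pub fn register(&mut self, name: &'static str, tool: ToolFn) {
        self.tools.insert(name, tool);
    }

    /// Never returns Err: every failure becomes data the LLM can read.
    pub fn dispatch(&self, name: &str, args: &str) -> ToolExecution {
        match self.tools.get(name).copied() {
            None => {
                // List the known tools so the LLM can correct a typo.
                let available: Vec<&str> = self.tools.keys().copied().collect();
                ToolExecution {
                    success: false,
                    output: format!(
                        "Tool '{name}' not found. Available: {}",
                        available.join(", ")
                    ),
                }
            }
            Some(tool) => match tool(args) {
                Ok(output) => ToolExecution { success: true, output },
                Err(message) => ToolExecution { success: false, output: message },
            },
        }
    }
}

// Sample tools, invented for this sketch.
fn web_search(query: &str) -> Result<String, String> {
    Ok(format!("results for '{query}'"))
}

fn http_fetch(_url: &str) -> Result<String, String> {
    Err("HTTP request failed: connection refused".to_string())
}

pub fn demo_registry() -> ToolRegistry {
    let mut registry = ToolRegistry::new();
    registry.register("web.search", web_search);
    registry.register("http.fetch", http_fetch);
    registry
}
```

Calling demo_registry().dispatch("web.search2", "rust") yields a failed ToolExecution whose message lists the registered tools, which is the information the LLM needs to retry with a valid name.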
Memory errors — logged and degraded
    // In AgentRuntime::run_turn():
    let context = match self.memory.read(&query).await {
        Ok(entries) => entries,
        Err(e) => {
            log::warn!("Memory read failed: {e}");
            self.events.send(AgentEvent::Warning(format!("Memory unavailable: {e}")));
            vec![] // continue without context
        }
    };
An agent with no memory still works — it just has less context. Crashing because SQLite is locked is worse than proceeding with incomplete context.
Event errors — silent
    pub fn send(&self, event: AgentEvent) {
        let _ = self.tx.send(event);
    }
Events are observability. An agent without a UI should work identically to one with full instrumentation.
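The sketch below illustrates that claim with a turn that behaves the same whether or not anyone is listening. It uses std::sync::mpsc as a dependency-free stand-in for the broadcast channel mentioned earlier, so the channel type is an assumption, as are EventBus, the TurnCompleted variant, and the simplified run_turn.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

#[derive(Debug, Clone, PartialEq)]
pub enum AgentEvent {
    Warning(String),
    // Hypothetical variant, added only for this sketch.
    TurnCompleted,
}

/// Hypothetical wrapper around the sending half of an event channel.
pub struct EventBus {
    tx: Sender<AgentEvent>,
}

impl EventBus {
    pub fn new() -> (Self, Receiver<AgentEvent>) {
        let (tx, rx) = channel();
        (EventBus { tx }, rx)
    }

    /// Fire and forget: a send error (no receiver left) is deliberately dropped.
    pub fn send(&self, event: AgentEvent) {
        let _ = self.tx.send(event);
    }
}

/// Simplified stand-in for a turn: emits events, and its result never
/// depends on whether those events were delivered.
pub fn run_turn(events: &EventBus) -> &'static str {
    events.send(AgentEvent::Warning("Memory unavailable".to_string()));
    events.send(AgentEvent::TurnCompleted);
    "done"
}
```

With a live receiver the events can be read back; after the receiver is dropped, send fails internally but run_turn still returns the same value.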
Hook errors — configurable
Each hook declares fail_open():
- true (default): hook errors are logged but don't stop execution
- false: hook errors abort the current operation
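One way the fail_open() switch could be wired up is sketched below. Only fail_open() and its default of true come from this design. The Hook trait shape, the HookOutcome enum, run_hooks, and the two sample hooks are assumptions made for illustration.

```rust
/// Hypothetical hook interface; only `fail_open` is named in the design.
pub trait Hook {
    fn name(&self) -> &'static str;

    /// Default matches the design: errors are logged and execution continues.
    fn fail_open(&self) -> bool {
        true
    }

    fn run(&self, input: &str) -> Result<(), String>;
}

/// Hypothetical result of running a hook chain.
#[derive(Debug, PartialEq)]
pub enum HookOutcome {
    Continue,
    Abort { hook: &'static str, reason: String },
}

pub fn run_hooks(hooks: &[Box<dyn Hook>], input: &str) -> HookOutcome {
    for hook in hooks {
        if let Err(reason) = hook.run(input) {
            if hook.fail_open() {
                // Fail-open: record the problem and keep going.
                eprintln!("hook '{}' failed, continuing: {reason}", hook.name());
            } else {
                // Fail-closed: stop at the first error.
                return HookOutcome::Abort { hook: hook.name(), reason };
            }
        }
    }
    HookOutcome::Continue
}

/// Sample fail-open hook that always errors (for example, a metrics sink that is down).
pub struct Metrics;

impl Hook for Metrics {
    fn name(&self) -> &'static str {
        "metrics"
    }

    fn run(&self, _input: &str) -> Result<(), String> {
        Err("metrics backend unreachable".to_string())
    }
}

/// Sample fail-closed hook that rejects oversized input.
pub struct Budget {
    pub limit: usize,
}

impl Hook for Budget {
    fn name(&self) -> &'static str {
        "budget"
    }

    fn fail_open(&self) -> bool {
        false
    }

    fn run(&self, input: &str) -> Result<(), String> {
        if input.len() > self.limit {
            Err(format!("input exceeds {} bytes", self.limit))
        } else {
            Ok(())
        }
    }
}
```

In this sketch the always-failing Metrics hook never blocks a turn, while Budget aborts as soon as its check fails. A caller could map HookOutcome::Abort onto whichever turn-level error the runtime uses.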
Error flow diagram
    AgentRuntime::run_turn()
    │
    ├── Hook: session start ──Abort──► AgentError::Aborted
    │                         Continue ▼
    ├── Memory: read ─────────Err────► log + Warning + empty context
    │                         Ok ▼
    ├── LLM: chat() ──────────Err────► AgentError::Llm (fatal)
    │                         Ok ▼
    ├── Tool dispatch (per call)
    │   ├── Hook: abort? ─────Abort──► skip tool (None)
    │   ├── Policy: deny? ────Deny───► ToolExecution { success: false }
    │   ├── Execute: fail? ───Err────► ToolExecution { success: false }
    │   └── Execute: ok ──────Ok─────► ToolExecution { success: true }
    │
    ├── Memory: write ────────Err────► log + continue
    ├── Events: emit ─────────No rx──► silent drop
    └── Return Ok(TurnOutput)

Only two paths lead to Err:
1. LLM failure → AgentError::Llm
2. Hook abort → AgentError::Aborted
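The flow can be exercised with a toy, synchronous model in which each layer is forced to fail by a flag. This is a sketch under heavy assumptions, not the runtime itself: Faults, the trimmed two-variant AgentError, the TurnOutput fields, and all message strings are hypothetical.

```rust
/// Trimmed to the two variants the flow can actually produce.
#[derive(Debug, PartialEq)]
pub enum AgentError {
    Llm(String),
    Aborted { hook: &'static str, reason: String },
}

/// Hypothetical output shape; the design does not specify TurnOutput's fields.
#[derive(Debug, PartialEq)]
pub struct TurnOutput {
    pub reply: String,
    pub warnings: Vec<String>,
}

/// Hypothetical switches that force each layer to fail.
#[derive(Default)]
pub struct Faults {
    pub hook_abort: bool,
    pub memory_down: bool,
    pub llm_down: bool,
    pub tool_fails: bool,
}

pub fn run_turn(faults: &Faults) -> Result<TurnOutput, AgentError> {
    let mut warnings = Vec::new();

    // Hook: session start. An abort is one of the two fatal paths.
    if faults.hook_abort {
        return Err(AgentError::Aborted {
            hook: "session_start",
            reason: "rejected".to_string(),
        });
    }

    // Memory: a failed read degrades to an empty context plus a warning.
    let context: Vec<String> = if faults.memory_down {
        warnings.push("Memory unavailable".to_string());
        Vec::new()
    } else {
        vec!["prior fact".to_string()]
    };

    // LLM: the other fatal path.
    if faults.llm_down {
        return Err(AgentError::Llm("provider unreachable".to_string()));
    }

    // Tools: a failure is conversation data, so the turn still succeeds.
    let tool_note = if faults.tool_fails {
        "tool failed, model adapted"
    } else {
        "tool ok"
    };

    Ok(TurnOutput {
        reply: format!("{} context entries; {}", context.len(), tool_note),
        warnings,
    })
}
```

With memory and tools both failing, run_turn still returns Ok, with one warning attached. Only the hook_abort and llm_down flags make it return Err.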