LLM Abstractions

⚠️ This is a v2 design and is not yet implemented. See Architecture v2 for context.

The problem

Jan needs an LlmProvider trait that abstracts over inference backends: local LlamaCPP, remote OpenAI, and Anthropic's native API. The trait must handle text chat, streaming, multimodal input, structured output, usage tracking, and tool calling.

LlmProvider trait

Two chat methods instead of five. Tools and structured output go in ChatOptions rather than in dedicated methods:


use std::pin::Pin;

use async_trait::async_trait;
use futures::Stream;

#[async_trait]
pub trait LlmProvider: Send + Sync {
    /// Primary chat method.
    async fn chat(&self, messages: &[ChatMessage], options: &ChatOptions)
        -> Result<Box<dyn LlmResponse>, LlmError>;

    /// Streaming variant. Intended to ship with a default body (omitted here)
    /// that wraps chat() in a single-item stream.
    async fn chat_stream(&self, messages: &[ChatMessage], options: &ChatOptions)
        -> Result<Pin<Box<dyn Stream<Item = Result<StreamChunk, LlmError>> + Send>>, LlmError>;

    fn provider_name(&self) -> &str;
}

/// Per-request options shared by chat() and chat_stream().
pub struct ChatOptions {
    pub tools: Option<Vec<Tool>>,
    pub structured_output: Option<StructuredOutputFormat>,
    pub tool_choice: ToolChoice,
    pub max_tokens: Option<u32>,
}
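
The streaming fallback described in the doc comment could look roughly like the sketch below. It is a synchronous simplification that uses only the standard library: an Iterator stands in for the async Stream, chat() returns a plain String, and the two-variant StreamChunk, the LlmError newtype, and EchoProvider are all hypothetical stand-ins rather than the planned types.

```rust
// Simplified stand-ins for the design's types (assumptions, not the real API).
#[derive(Debug, Clone, PartialEq)]
pub enum StreamChunk {
    Text(String),
    Done { stop_reason: String },
}

#[derive(Debug, PartialEq)]
pub struct LlmError(pub String);

pub trait LlmProvider {
    /// Primary chat method; in this sketch it returns the reply text directly.
    fn chat(&self, prompt: &str) -> Result<String, LlmError>;

    /// Default fallback: run chat() to completion, then yield the whole
    /// reply as a single Text chunk. Providers with real streaming override it.
    fn chat_stream(
        &self,
        prompt: &str,
    ) -> Result<Box<dyn Iterator<Item = Result<StreamChunk, LlmError>>>, LlmError> {
        let text = self.chat(prompt)?;
        let item: Result<StreamChunk, LlmError> = Ok(StreamChunk::Text(text));
        Ok(Box::new(std::iter::once(item)))
    }
}

/// Hypothetical provider that implements only chat().
pub struct EchoProvider;

impl LlmProvider for EchoProvider {
    fn chat(&self, prompt: &str) -> Result<String, LlmError> {
        Ok(format!("echo: {}", prompt))
    }
}
```

Calling chat_stream() on EchoProvider yields exactly one Text chunk holding the full reply. Whether the real default should also emit a trailing Done chunk is an open choice that this sketch does not settle.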

ChatMessage with builder


pub struct ChatMessage {
    pub role: ChatRole,
    pub content_type: MessageContent,
    pub text: String,
}

pub enum MessageContent {
    Text,
    Image(ImageMime, Vec<u8>),
    Pdf(Vec<u8>),
    ImageUrl(String),
    ToolUse(Vec<ToolCall>),
    ToolResult(Vec<ToolCall>),
}

// Builder pattern for complex messages
let msg = ChatMessage::user()
    .text("What's in this image?")
    .image(ImageMime::Png, image_bytes)
    .build();
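
The design shows only the builder's call sites, so the following is one possible shape for it, not the planned implementation. ChatMessageBuilder is a hypothetical name, and the enums are trimmed to the variants the example needs.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ChatRole {
    User,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ImageMime {
    Png,
}

#[derive(Debug, Clone, PartialEq)]
pub enum MessageContent {
    Text,
    Image(ImageMime, Vec<u8>),
}

#[derive(Debug, Clone, PartialEq)]
pub struct ChatMessage {
    pub role: ChatRole,
    pub content_type: MessageContent,
    pub text: String,
}

/// Hypothetical builder type; only its call sites appear in the design.
pub struct ChatMessageBuilder {
    role: ChatRole,
    content_type: MessageContent,
    text: String,
}

impl ChatMessage {
    /// Start a user message with empty text and plain-text content.
    pub fn user() -> ChatMessageBuilder {
        ChatMessageBuilder {
            role: ChatRole::User,
            content_type: MessageContent::Text,
            text: String::new(),
        }
    }
}

impl ChatMessageBuilder {
    /// Set the text part of the message.
    pub fn text(mut self, text: impl Into<String>) -> Self {
        self.text = text.into();
        self
    }

    /// Attach raw image bytes; this replaces the content type.
    pub fn image(mut self, mime: ImageMime, bytes: Vec<u8>) -> Self {
        self.content_type = MessageContent::Image(mime, bytes);
        self
    }

    /// Finish building and return the message.
    pub fn build(self) -> ChatMessage {
        ChatMessage {
            role: self.role,
            content_type: self.content_type,
            text: self.text,
        }
    }
}
```

Because ChatMessage carries one content_type plus a text field, a message with both a caption and an image ends up as MessageContent::Image with the caption stored in text.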

LlmResponse trait

A trait (not struct) so providers can carry additional metadata:


pub trait LlmResponse: Debug + Send + Sync {
    fn text(&self) -> Option<String>;
    fn tool_calls(&self) -> Option<Vec<ToolCall>>;
    fn thinking(&self) -> Option<String> { None }
    fn usage(&self) -> Option<TokenUsage> { None }
}
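
To illustrate how a provider might carry extra metadata behind the trait, here is a minimal sketch. LocalResponse and its tokens_per_second field are hypothetical, and ToolCall and TokenUsage are reduced to placeholder structs so the example compiles on its own.

```rust
use std::fmt::Debug;

// Placeholder versions of the design's types (assumptions for this sketch).
#[derive(Debug, Clone, PartialEq)]
pub struct ToolCall {
    pub id: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct TokenUsage {
    pub prompt_tokens: u32,
    pub completion_tokens: u32,
}

pub trait LlmResponse: Debug + Send + Sync {
    fn text(&self) -> Option<String>;
    fn tool_calls(&self) -> Option<Vec<ToolCall>>;
    fn thinking(&self) -> Option<String> { None }
    fn usage(&self) -> Option<TokenUsage> { None }
}

/// Hypothetical response from a local backend. The extra field lives on
/// the concrete type and never has to appear in a shared struct.
#[derive(Debug)]
pub struct LocalResponse {
    pub content: String,
    pub tokens_per_second: f32,
}

impl LlmResponse for LocalResponse {
    fn text(&self) -> Option<String> {
        Some(self.content.clone())
    }

    fn tool_calls(&self) -> Option<Vec<ToolCall>> {
        None
    }
    // thinking() and usage() fall back to the trait defaults (None).
}
```

Generic callers hold a Box<dyn LlmResponse> and see only the four trait methods. Code that knows it is talking to the local backend can keep the concrete type and read tokens_per_second directly.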

Streaming protocol


pub enum StreamChunk {
    Text(String),
    ReasoningContent(String),
    ToolCallStart { index: usize, call_id: String, name: String },
    ToolCallDelta { index: usize, partial_json: String },
    ToolCallComplete { index: usize, call: ToolCall },
    Done { stop_reason: String },
    Usage(TokenUsage),
}
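
A consumer of this protocol has to stitch text fragments and tool-call deltas back together. The sketch below shows one way to do that over an already-collected sequence of chunks. The Accumulated struct and accumulate() function are hypothetical, and the enum is trimmed to four variants (ReasoningContent, ToolCallComplete, and Usage are left out for brevity).

```rust
use std::collections::BTreeMap;

// Trimmed copy of the design's enum; omitted variants are noted above.
#[derive(Debug, Clone, PartialEq)]
pub enum StreamChunk {
    Text(String),
    ToolCallStart { index: usize, call_id: String, name: String },
    ToolCallDelta { index: usize, partial_json: String },
    Done { stop_reason: String },
}

/// Hypothetical accumulated view of a finished stream.
#[derive(Debug, Default, PartialEq)]
pub struct Accumulated {
    pub text: String,
    /// Tool-call index -> (call_id, name, concatenated argument JSON).
    pub tool_calls: BTreeMap<usize, (String, String, String)>,
    pub stop_reason: Option<String>,
}

/// Fold a sequence of chunks into one Accumulated value.
pub fn accumulate(chunks: impl IntoIterator<Item = StreamChunk>) -> Accumulated {
    let mut acc = Accumulated::default();
    for chunk in chunks {
        match chunk {
            StreamChunk::Text(t) => acc.text.push_str(&t),
            StreamChunk::ToolCallStart { index, call_id, name } => {
                acc.tool_calls.insert(index, (call_id, name, String::new()));
            }
            StreamChunk::ToolCallDelta { index, partial_json } => {
                // A delta for an index that never started is ignored here.
                if let Some(entry) = acc.tool_calls.get_mut(&index) {
                    entry.2.push_str(&partial_json);
                }
            }
            StreamChunk::Done { stop_reason } => acc.stop_reason = Some(stop_reason),
        }
    }
    acc
}
```

The index field is what lets deltas for several parallel tool calls interleave safely: each delta is appended to the entry opened by the ToolCallStart with the same index.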

Token usage

Serde aliases absorb the naming differences between providers:


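use serde::Deserialize;

#[derive(Debug, Clone, Deserialize)]
pub struct TokenUsage {
    #[serde(alias = "input_tokens")]
    pub prompt_tokens: u32,
    #[serde(alias = "output_tokens")]
    pub completion_tokens: u32,
    // Required as written: a payload without total_tokens fails to deserialize.
    pub total_tokens: u32,
    pub reasoning_tokens: Option<u32>,
    pub cached_tokens: Option<u32>,
}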
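
To make the effect of the aliases concrete without pulling in serde, the sketch below does the same name mapping by hand over (field name, value) pairs. It is a hand-rolled stand-in for the derive, not the planned code: usage_from_fields is a hypothetical helper, the optional fields are dropped, and computing total_tokens when it is absent is an assumption added for illustration.

```rust
#[derive(Debug, Default, Clone, PartialEq)]
pub struct TokenUsage {
    pub prompt_tokens: u32,
    pub completion_tokens: u32,
    pub total_tokens: u32,
}

/// Build a TokenUsage from (field name, value) pairs, accepting either
/// naming scheme for the prompt and completion counts.
pub fn usage_from_fields(fields: &[(&str, u32)]) -> TokenUsage {
    let mut usage = TokenUsage::default();
    let mut saw_total = false;
    for &(name, value) in fields {
        match name {
            "prompt_tokens" | "input_tokens" => usage.prompt_tokens = value,
            "completion_tokens" | "output_tokens" => usage.completion_tokens = value,
            "total_tokens" => {
                usage.total_tokens = value;
                saw_total = true;
            }
            _ => {} // unknown fields are ignored in this sketch
        }
    }
    // Assumption: derive the total when the payload does not carry one.
    if !saw_total {
        usage.total_tokens = usage.prompt_tokens + usage.completion_tokens;
    }
    usage
}
```

Both naming schemes produce the same struct, which is the property the aliases are meant to give the rest of the code.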

Tool types


pub struct Tool {
    pub tool_type: String, // "function"
    pub function: FunctionDef,
}

pub struct ToolCall {
    pub id: String,
    pub call_type: String,
    pub function: FunctionCall,
}

pub enum ToolChoice {
    Auto,         // model decides
    Any,          // must use at least one
    Tool(String), // must use this specific tool
    None,         // tools disabled
}
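
Each provider adapter has to translate ToolChoice into its own wire format. As a sketch, assuming OpenAI's Chat Completions tool_choice values ("auto", "required", "none", or an object naming a function), the mapping could look like this. openai_tool_choice is a hypothetical helper, and it builds the JSON text by hand only to stay dependency-free.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum ToolChoice {
    Auto,
    Any,
    Tool(String),
    None,
}

/// Render a ToolChoice as the JSON text of an OpenAI-style `tool_choice`
/// field. Hypothetical helper; a real adapter would use a JSON library.
pub fn openai_tool_choice(choice: &ToolChoice) -> String {
    match choice {
        ToolChoice::Auto => "\"auto\"".to_string(),
        // The design's `Any` corresponds to what OpenAI calls "required".
        ToolChoice::Any => "\"required\"".to_string(),
        ToolChoice::None => "\"none\"".to_string(),
        // Assumes the tool name needs no JSON escaping.
        ToolChoice::Tool(name) => format!(
            "{{\"type\":\"function\",\"function\":{{\"name\":\"{}\"}}}}",
            name
        ),
    }
}
```

Keeping the enum provider-neutral means the naming mismatch (Any versus "required") stays inside the adapter rather than leaking into callers.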

Key design decisions

| Decision | Rationale |
|---|---|
| 2 methods, not 5 | ChatOptions eliminates the chat_with_tools(), chat_stream_struct() combinatorial explosion |
| Default streaming impl | Providers that don't support streaming get a free non-streaming fallback |
| LlmResponse as trait | Providers can carry extra metadata without forcing it into a common struct |
| Chat only, no embeddings | Start simple. EmbeddingProvider can be a separate trait later |
| Serde aliases on TokenUsage | input_tokens (Anthropic) and prompt_tokens (OpenAI) deserialize to the same field |
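
The "2 methods, not 5" decision is easiest to see from the caller's side. In the sketch below, every request variant is expressed by filling in fields of ChatOptions rather than by picking a differently named method. The Default impl (including ToolChoice::Auto as the default), the describe() helper, and the single-field Tool and StructuredOutputFormat structs are assumptions made for illustration.

```rust
// Simplified placeholders for the design's types (assumptions).
#[derive(Debug, Clone, PartialEq)]
pub struct Tool {
    pub name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct StructuredOutputFormat {
    pub schema_name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub enum ToolChoice {
    Auto,
    Any,
    Tool(String),
    None,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ChatOptions {
    pub tools: Option<Vec<Tool>>,
    pub structured_output: Option<StructuredOutputFormat>,
    pub tool_choice: ToolChoice,
    pub max_tokens: Option<u32>,
}

impl Default for ChatOptions {
    /// Assumed defaults: no tools, no schema, model picks, no token cap.
    fn default() -> Self {
        ChatOptions {
            tools: None,
            structured_output: None,
            tool_choice: ToolChoice::Auto,
            max_tokens: None,
        }
    }
}

/// Hypothetical helper naming the request variant an options value encodes.
pub fn describe(options: &ChatOptions) -> &'static str {
    match (options.tools.is_some(), options.structured_output.is_some()) {
        (false, false) => "plain chat",
        (true, false) => "chat with tools",
        (false, true) => "structured output",
        (true, true) => "tools + structured output",
    }
}
```

With a Default impl in place, callers can use struct update syntax such as `ChatOptions { max_tokens: Some(256), ..ChatOptions::default() }`, so adding a new option later does not break existing call sites or require another trait method.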