LLM Abstractions

⚠️

This is a v2 design — not yet implemented. See Architecture v2 for context.

The problem

Jan needs an LlmProvider trait that abstracts over inference backends: local llama.cpp, remote OpenAI, and Anthropic's native API. The trait must handle text chat, streaming, multimodal input, structured output, usage tracking, and tool calling.

LlmProvider trait

Two chat methods instead of five, plus a provider_name() accessor. Tools and structured output move into ChatOptions:


#[async_trait]
pub trait LlmProvider: Send + Sync {
    /// Primary chat method.
    async fn chat(&self, messages: &[ChatMessage], options: &ChatOptions)
        -> Result<Box<dyn LlmResponse>, LlmError>;
    /// Streaming variant. Intended default: wrap chat() in a single-item stream
    /// (the default body is not shown in this signature-only listing).
    async fn chat_stream(&self, messages: &[ChatMessage], options: &ChatOptions)
        -> Result<Pin<Box<dyn Stream<Item = Result<StreamChunk, LlmError>> + Send>>, LlmError>;
    fn provider_name(&self) -> &str;
}
pub struct ChatOptions {
    pub tools: Option<Vec<Tool>>,
    pub structured_output: Option<StructuredOutputFormat>,
    pub tool_choice: ToolChoice,
    pub max_tokens: Option<u32>,
}
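
A minimal sketch of how one options struct can stand in for the separate method variants, assuming ChatOptions gets a Default impl with ToolChoice::Auto as the default (the design above does not state this). The Tool and StructuredOutputFormat fields here are simplified placeholders, and plain_chat, with_tools, and structured are hypothetical helper names:

```rust
// Simplified stand-ins for the design's types (placeholder fields, illustration only).
#[derive(Debug, Clone, PartialEq)]
pub struct Tool {
    pub name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct StructuredOutputFormat {
    pub schema_name: String,
}

#[derive(Debug, Clone, PartialEq, Default)]
pub enum ToolChoice {
    #[default]
    Auto, // assumed default: the model decides
    Any,
    Tool(String),
    None,
}

#[derive(Debug, Clone, PartialEq, Default)]
pub struct ChatOptions {
    pub tools: Option<Vec<Tool>>,
    pub structured_output: Option<StructuredOutputFormat>,
    pub tool_choice: ToolChoice,
    pub max_tokens: Option<u32>,
}

/// Plain text chat: every option stays at its default.
pub fn plain_chat() -> ChatOptions {
    ChatOptions::default()
}

/// Covers what a dedicated `chat_with_tools()` method would have done.
pub fn with_tools(tools: Vec<Tool>) -> ChatOptions {
    ChatOptions {
        tools: Some(tools),
        tool_choice: ToolChoice::Any,
        ..ChatOptions::default()
    }
}

/// Covers what a dedicated structured-output method would have done.
pub fn structured(schema_name: &str, max_tokens: u32) -> ChatOptions {
    ChatOptions {
        structured_output: Some(StructuredOutputFormat {
            schema_name: schema_name.to_string(),
        }),
        max_tokens: Some(max_tokens),
        ..ChatOptions::default()
    }
}
```

Each of these values would be passed to the same chat() or chat_stream() call, so adding a new capability means adding a field rather than another pair of trait methods.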

ChatMessage with builder


pub struct ChatMessage {
    pub role: ChatRole,
    pub content_type: MessageContent,
    pub text: String,
}
pub enum MessageContent {
    Text,
    Image(ImageMime, Vec<u8>),
    Pdf(Vec<u8>),
    ImageUrl(String),
    ToolUse(Vec<ToolCall>),
    ToolResult(Vec<ToolCall>),
}
// Builder pattern for complex messages
let msg = ChatMessage::user()
    .text("What's in this image?")
    .image(ImageMime::Png, image_bytes)
    .build();
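
The builder itself is not specified above. One possible shape, as a sketch only: ChatMessageBuilder is a hypothetical name, the enums are trimmed to the variants used here, and attaching an image is assumed to set content_type while keeping the text as the accompanying prompt.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum ChatRole {
    User,
    Assistant,
}

#[derive(Debug, Clone, PartialEq)]
pub enum ImageMime {
    Png,
    Jpeg,
}

// Trimmed to the variants this sketch needs.
#[derive(Debug, Clone, PartialEq)]
pub enum MessageContent {
    Text,
    Image(ImageMime, Vec<u8>),
    ImageUrl(String),
}

#[derive(Debug, Clone, PartialEq)]
pub struct ChatMessage {
    pub role: ChatRole,
    pub content_type: MessageContent,
    pub text: String,
}

/// Hypothetical builder type; not named in the design.
pub struct ChatMessageBuilder {
    role: ChatRole,
    content_type: MessageContent,
    text: String,
}

impl ChatMessage {
    pub fn user() -> ChatMessageBuilder {
        ChatMessageBuilder::new(ChatRole::User)
    }

    pub fn assistant() -> ChatMessageBuilder {
        ChatMessageBuilder::new(ChatRole::Assistant)
    }
}

impl ChatMessageBuilder {
    fn new(role: ChatRole) -> Self {
        // A message starts as empty plain text.
        Self { role, content_type: MessageContent::Text, text: String::new() }
    }

    pub fn text(mut self, text: impl Into<String>) -> Self {
        self.text = text.into();
        self
    }

    /// Attaches raw image bytes; the text set earlier is kept as the prompt.
    pub fn image(mut self, mime: ImageMime, bytes: Vec<u8>) -> Self {
        self.content_type = MessageContent::Image(mime, bytes);
        self
    }

    pub fn image_url(mut self, url: impl Into<String>) -> Self {
        self.content_type = MessageContent::ImageUrl(url.into());
        self
    }

    pub fn build(self) -> ChatMessage {
        ChatMessage { role: self.role, content_type: self.content_type, text: self.text }
    }
}
```

Because ChatMessage holds a single content_type, a later attachment call in this sketch replaces an earlier one; a message carrying several attachments would need a different representation.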

LlmResponse trait

A trait (not struct) so providers can carry additional metadata:


pub trait LlmResponse: Debug + Send + Sync {
    fn text(&self) -> Option<String>;
    fn tool_calls(&self) -> Option<Vec<ToolCall>>;
    fn thinking(&self) -> Option<String> { None }
    fn usage(&self) -> Option<TokenUsage> { None }
}
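
To illustrate the "extra metadata" point, here is a sketch with two hypothetical response types. LocalResponse, RemoteResponse, request_id, and summarize are invented for this example, and ToolCall and TokenUsage are reduced to a few fields:

```rust
use std::fmt::Debug;

#[derive(Debug, Clone, PartialEq)]
pub struct ToolCall {
    pub id: String,
    pub name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct TokenUsage {
    pub prompt_tokens: u32,
    pub completion_tokens: u32,
    pub total_tokens: u32,
}

pub trait LlmResponse: Debug + Send + Sync {
    fn text(&self) -> Option<String>;
    fn tool_calls(&self) -> Option<Vec<ToolCall>>;
    fn thinking(&self) -> Option<String> { None }
    fn usage(&self) -> Option<TokenUsage> { None }
}

/// Hypothetical local backend: implements only the required methods.
#[derive(Debug)]
pub struct LocalResponse {
    pub content: String,
}

impl LlmResponse for LocalResponse {
    fn text(&self) -> Option<String> { Some(self.content.clone()) }
    fn tool_calls(&self) -> Option<Vec<ToolCall>> { None }
}

/// Hypothetical remote backend: overrides the defaults and carries a
/// provider-specific field that the common trait does not know about.
#[derive(Debug)]
pub struct RemoteResponse {
    pub content: String,
    pub reasoning: Option<String>,
    pub usage: TokenUsage,
    pub request_id: String,
}

impl LlmResponse for RemoteResponse {
    fn text(&self) -> Option<String> { Some(self.content.clone()) }
    fn tool_calls(&self) -> Option<Vec<ToolCall>> { None }
    fn thinking(&self) -> Option<String> { self.reasoning.clone() }
    fn usage(&self) -> Option<TokenUsage> { Some(self.usage.clone()) }
}

/// Caller code depends only on the trait, not on the concrete type.
pub fn summarize(resp: &dyn LlmResponse) -> String {
    let text = resp.text().unwrap_or_default();
    match resp.usage() {
        Some(u) => format!("{} ({} tokens)", text, u.total_tokens),
        None => text,
    }
}
```

Callers that know the concrete type can still read request_id directly; callers holding a Box<dyn LlmResponse> see only the shared surface.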

Streaming protocol


pub enum StreamChunk {
    Text(String),
    ReasoningContent(String),
    ToolCallStart { index: usize, call_id: String, name: String },
    ToolCallDelta { index: usize, partial_json: String },
    ToolCallComplete { index: usize, call: ToolCall },
    Done { stop_reason: String },
    Usage(TokenUsage),
}
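
On the consuming side, a client would typically fold these chunks back into a complete response. The following is a sketch under assumptions: StreamAccumulator is a hypothetical helper, ToolCall is flattened to id, name, and arguments, and ToolCallDelta is assumed to always follow a ToolCallStart with the same index.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq)]
pub struct ToolCall {
    pub id: String,
    pub name: String,
    pub arguments: String, // JSON text, accumulated from deltas
}

#[derive(Debug, Clone, PartialEq)]
pub struct TokenUsage {
    pub prompt_tokens: u32,
    pub completion_tokens: u32,
}

pub enum StreamChunk {
    Text(String),
    ReasoningContent(String),
    ToolCallStart { index: usize, call_id: String, name: String },
    ToolCallDelta { index: usize, partial_json: String },
    ToolCallComplete { index: usize, call: ToolCall },
    Done { stop_reason: String },
    Usage(TokenUsage),
}

/// Hypothetical helper that rebuilds a full response from a chunk sequence.
#[derive(Debug, Default)]
pub struct StreamAccumulator {
    pub text: String,
    pub reasoning: String,
    pub tool_calls: BTreeMap<usize, ToolCall>, // keyed by the chunk's `index`
    pub stop_reason: Option<String>,
    pub usage: Option<TokenUsage>,
}

impl StreamAccumulator {
    pub fn push(&mut self, chunk: StreamChunk) {
        match chunk {
            StreamChunk::Text(t) => self.text.push_str(&t),
            StreamChunk::ReasoningContent(r) => self.reasoning.push_str(&r),
            StreamChunk::ToolCallStart { index, call_id, name } => {
                let call = ToolCall { id: call_id, name, arguments: String::new() };
                self.tool_calls.insert(index, call);
            }
            StreamChunk::ToolCallDelta { index, partial_json } => {
                // Deltas for an unknown index are ignored in this sketch.
                if let Some(call) = self.tool_calls.get_mut(&index) {
                    call.arguments.push_str(&partial_json);
                }
            }
            StreamChunk::ToolCallComplete { index, call } => {
                // The completed call replaces whatever was accumulated so far.
                self.tool_calls.insert(index, call);
            }
            StreamChunk::Done { stop_reason } => self.stop_reason = Some(stop_reason),
            StreamChunk::Usage(u) => self.usage = Some(u),
        }
    }
}
```

Keying tool calls by index lets interleaved deltas for parallel tool calls land in the right slot.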

Token usage

Handles naming differences between providers with serde aliases:


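use serde::Deserialize;

// The serde attributes below only take effect with a Deserialize derive.
#[derive(Debug, Clone, Deserialize)]
pub struct TokenUsage {
    #[serde(alias = "input_tokens")]
    pub prompt_tokens: u32,
    #[serde(alias = "output_tokens")]
    pub completion_tokens: u32,
    // Assumption: not every backend sends a total, so a missing value defaults to 0.
    #[serde(default)]
    pub total_tokens: u32,
    pub reasoning_tokens: Option<u32>,
    pub cached_tokens: Option<u32>,
}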
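
Field aliases unify the names, but the counts may still need normalizing before they are aggregated. A standard-library-only sketch, assuming a backend may leave total_tokens at zero; normalized and combined are hypothetical helpers, not part of the design:

```rust
#[derive(Debug, Clone, PartialEq, Default)]
pub struct TokenUsage {
    pub prompt_tokens: u32,
    pub completion_tokens: u32,
    pub total_tokens: u32,
    pub reasoning_tokens: Option<u32>,
    pub cached_tokens: Option<u32>,
}

/// Adds two optional counters; stays `None` only when both are absent.
fn add_opt(a: Option<u32>, b: Option<u32>) -> Option<u32> {
    match (a, b) {
        (None, None) => None,
        (a, b) => Some(a.unwrap_or(0).saturating_add(b.unwrap_or(0))),
    }
}

impl TokenUsage {
    /// Returns a copy whose total is derived from prompt + completion
    /// when the provider left it at zero.
    pub fn normalized(&self) -> TokenUsage {
        let mut out = self.clone();
        if out.total_tokens == 0 {
            out.total_tokens = out.prompt_tokens.saturating_add(out.completion_tokens);
        }
        out
    }

    /// Sums usage from two calls, for example across a multi-turn tool loop.
    pub fn combined(&self, other: &TokenUsage) -> TokenUsage {
        TokenUsage {
            prompt_tokens: self.prompt_tokens.saturating_add(other.prompt_tokens),
            completion_tokens: self.completion_tokens.saturating_add(other.completion_tokens),
            total_tokens: self.total_tokens.saturating_add(other.total_tokens),
            reasoning_tokens: add_opt(self.reasoning_tokens, other.reasoning_tokens),
            cached_tokens: add_opt(self.cached_tokens, other.cached_tokens),
        }
    }
}
```

Saturating arithmetic is used so that a long session cannot overflow the u32 counters and panic in debug builds.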

Tool types


pub struct Tool {
    pub tool_type: String,  // "function"
    pub function: FunctionDef,
}
pub struct ToolCall {
    pub id: String,
    pub call_type: String,
    pub function: FunctionCall,
}
pub enum ToolChoice {
    Auto,           // model decides
    Any,            // must use at least one
    Tool(String),   // must use this specific tool
    None,           // tools disabled
}
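
Each provider adapter has to translate ToolChoice into its own wire format. As a sketch, this is how the mapping could look for an OpenAI-style chat completions request, where tool_choice is either a string ("auto", "required", "none") or an object naming one function. The function name openai_tool_choice is hypothetical, and the JSON is built by hand on the assumption that tool names are plain identifiers needing no escaping:

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum ToolChoice {
    Auto,         // model decides
    Any,          // must use at least one
    Tool(String), // must use this specific tool
    None,         // tools disabled
}

/// Renders the `tool_choice` value as a JSON fragment for an
/// OpenAI-style request body. Assumes `name` needs no JSON escaping.
pub fn openai_tool_choice(choice: &ToolChoice) -> String {
    match choice {
        ToolChoice::Auto => r#""auto""#.to_string(),
        // OpenAI-style APIs call "at least one tool" `required`.
        ToolChoice::Any => r#""required""#.to_string(),
        ToolChoice::Tool(name) => format!(
            r#"{{"type":"function","function":{{"name":"{}"}}}}"#,
            name
        ),
        ToolChoice::None => r#""none""#.to_string(),
    }
}
```

A real adapter would more likely serialize through serde_json than format strings by hand; the point here is only that the provider-neutral enum maps to provider-specific spellings in one place.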

Key design decisions

| Decision | Rationale |
| --- | --- |
| 2 chat methods, not 5 | `ChatOptions` replaces variants such as `chat_with_tools()` and `chat_stream_struct()`, avoiding a combinatorial explosion of methods |
| Default streaming impl | Providers that don't support streaming get a free non-streaming fallback |
| `LlmResponse` as trait | Providers can carry extra metadata without forcing it into a common struct |
| Chat only, no embeddings | Start simple. `EmbeddingProvider` can be a separate trait later |
| Serde aliases on `TokenUsage` | `input_tokens` (Anthropic) and `prompt_tokens` (OpenAI) deserialize to the same field |
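
The default streaming fallback can be sketched without an async runtime. This is a synchronous stand-in, not the real trait: Provider, EchoProvider, and the string-based chat signature are hypothetical, and a boxed Iterator takes the place of the boxed Stream.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum StreamChunk {
    Text(String),
}

#[derive(Debug, Clone, PartialEq)]
pub struct LlmError(pub String);

/// Boxed iterator used here in place of the design's boxed `Stream`.
pub type ChunkIter = Box<dyn Iterator<Item = Result<StreamChunk, LlmError>>>;

/// Synchronous stand-in for `LlmProvider`.
pub trait Provider {
    fn chat(&self, prompt: &str) -> Result<String, LlmError>;

    /// Default fallback: run the blocking call, then yield the whole
    /// reply as a single `Text` chunk.
    fn chat_stream(&self, prompt: &str) -> Result<ChunkIter, LlmError> {
        let reply = self.chat(prompt)?;
        let only: Result<StreamChunk, LlmError> = Ok(StreamChunk::Text(reply));
        Ok(Box::new(std::iter::once(only)))
    }
}

/// Hypothetical backend with no native streaming support.
pub struct EchoProvider;

impl Provider for EchoProvider {
    fn chat(&self, prompt: &str) -> Result<String, LlmError> {
        if prompt.is_empty() {
            return Err(LlmError("empty prompt".to_string()));
        }
        Ok(format!("echo: {}", prompt))
    }
    // chat_stream is inherited from the trait default.
}
```

Callers written against chat_stream work with every provider this way; backends with real streaming override the method and yield many chunks instead of one.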
