LLM Abstractions
⚠️ This is a v2 design that is not yet implemented. See Architecture v2 for context.
The problem
Jan needs an LlmProvider trait abstracting over inference backends — local LlamaCPP, remote OpenAI, Anthropic's native API. The trait must handle text chat, streaming, multimodal input, structured output, usage tracking, and tool calling.
LlmProvider trait
Two chat methods instead of five: tools and structured output go in ChatOptions rather than in dedicated method variants.
```rust
#[async_trait]
pub trait LlmProvider: Send + Sync {
    /// Primary chat method.
    async fn chat(
        &self,
        messages: &[ChatMessage],
        options: &ChatOptions,
    ) -> Result<Box<dyn LlmResponse>, LlmError>;

    /// Streaming variant. Default: wraps chat() in single-item stream.
    async fn chat_stream(
        &self,
        messages: &[ChatMessage],
        options: &ChatOptions,
    ) -> Result<Pin<Box<dyn Stream<Item = Result<StreamChunk, LlmError>> + Send>>, LlmError>;

    fn provider_name(&self) -> &str;
}

pub struct ChatOptions {
    pub tools: Option<Vec<Tool>>,
    pub structured_output: Option<StructuredOutputFormat>,
    pub tool_choice: ToolChoice,
    pub max_tokens: Option<u32>,
}
```
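As a rough illustration of how ChatOptions stands in for dedicated method variants, the sketch below builds options for a plain call and for a tool-enabled call. The `Tool` and `StructuredOutputFormat` bodies are simplified stand-ins, and `Default` with `ToolChoice::Auto` as the default variant is an assumption; the design does not say which variant is the default. The helper names `plain_options` and `tool_options` are hypothetical.

```rust
// Simplified stand-ins for the design's types; only the shape matters here.
#[derive(Debug, Clone, PartialEq)]
pub struct Tool {
    pub name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct StructuredOutputFormat {
    pub schema_name: String,
}

// Assumption: Auto is the default choice. The design does not specify one.
#[derive(Debug, Clone, PartialEq, Default)]
pub enum ToolChoice {
    #[default]
    Auto,
    Any,
    Tool(String),
    None,
}

#[derive(Debug, Clone, PartialEq, Default)]
pub struct ChatOptions {
    pub tools: Option<Vec<Tool>>,
    pub structured_output: Option<StructuredOutputFormat>,
    pub tool_choice: ToolChoice,
    pub max_tokens: Option<u32>,
}

/// Plain chat: every option left at its default.
pub fn plain_options() -> ChatOptions {
    ChatOptions::default()
}

/// Covers what a dedicated chat_with_tools() method would have done:
/// the same chat() call, with tools supplied through the options.
pub fn tool_options(tools: Vec<Tool>) -> ChatOptions {
    ChatOptions {
        tools: Some(tools),
        tool_choice: ToolChoice::Any,
        ..ChatOptions::default()
    }
}
```

Adding a new capability then means adding a field to ChatOptions, not another method on every provider.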
ChatMessage with builder
```rust
pub struct ChatMessage {
    pub role: ChatRole,
    pub content_type: MessageContent,
    pub text: String,
}

pub enum MessageContent {
    Text,
    Image(ImageMime, Vec<u8>),
    Pdf(Vec<u8>),
    ImageUrl(String),
    ToolUse(Vec<ToolCall>),
    ToolResult(Vec<ToolCall>),
}

// Builder pattern for complex messages
let msg = ChatMessage::user()
    .text("What's in this image?")
    .image(ImageMime::Png, image_bytes)
    .build();
```
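The design shows the builder only from the caller's side. One possible shape for it is sketched below, assuming a hypothetical `ChatMessageBuilder` type, trimmed-down `ChatRole`, `ImageMime`, and `MessageContent` enums, and the convention that `.image(...)` replaces the content type while `.text(...)` fills the `text` field.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum ChatRole {
    User,
    Assistant,
}

#[derive(Debug, Clone, PartialEq)]
pub enum ImageMime {
    Png,
    Jpeg,
}

// Trimmed to the variants this sketch needs.
#[derive(Debug, Clone, PartialEq)]
pub enum MessageContent {
    Text,
    Image(ImageMime, Vec<u8>),
    ImageUrl(String),
}

#[derive(Debug, Clone, PartialEq)]
pub struct ChatMessage {
    pub role: ChatRole,
    pub content_type: MessageContent,
    pub text: String,
}

/// Hypothetical builder type; the design only shows how it is called.
pub struct ChatMessageBuilder {
    role: ChatRole,
    content_type: MessageContent,
    text: String,
}

impl ChatMessage {
    /// Starts a user message with empty text and plain-text content.
    pub fn user() -> ChatMessageBuilder {
        ChatMessageBuilder {
            role: ChatRole::User,
            content_type: MessageContent::Text,
            text: String::new(),
        }
    }
}

impl ChatMessageBuilder {
    /// Sets the accompanying text.
    pub fn text(mut self, text: impl Into<String>) -> Self {
        self.text = text.into();
        self
    }

    /// Attaches raw image bytes, replacing any previous content type.
    pub fn image(mut self, mime: ImageMime, bytes: Vec<u8>) -> Self {
        self.content_type = MessageContent::Image(mime, bytes);
        self
    }

    /// Finishes the message.
    pub fn build(self) -> ChatMessage {
        ChatMessage {
            role: self.role,
            content_type: self.content_type,
            text: self.text,
        }
    }
}
```

Because `content_type` holds a single variant, a message in this sketch carries at most one attachment alongside its text.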
LlmResponse trait
A trait rather than a struct, so providers can carry additional metadata:
```rust
pub trait LlmResponse: Debug + Send + Sync {
    fn text(&self) -> Option<String>;
    fn tool_calls(&self) -> Option<Vec<ToolCall>>;
    fn thinking(&self) -> Option<String> { None }
    fn usage(&self) -> Option<TokenUsage> { None }
}
```
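To make the "extra metadata" point concrete, here is a minimal sketch of a provider-specific response type implementing the trait. `LocalResponse` and its `tokens_per_second` field are hypothetical, and `ToolCall` and `TokenUsage` are reduced to stand-ins so the block compiles on its own.

```rust
use std::fmt::Debug;

// Reduced stand-ins for the design's ToolCall and TokenUsage.
#[derive(Debug, Clone, PartialEq)]
pub struct ToolCall {
    pub id: String,
    pub name: String,
    pub arguments: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct TokenUsage {
    pub prompt_tokens: u32,
    pub completion_tokens: u32,
    pub total_tokens: u32,
}

pub trait LlmResponse: Debug + Send + Sync {
    fn text(&self) -> Option<String>;
    fn tool_calls(&self) -> Option<Vec<ToolCall>>;
    fn thinking(&self) -> Option<String> { None }
    fn usage(&self) -> Option<TokenUsage> { None }
}

/// Hypothetical response type for a local backend. It carries a
/// backend-specific field that the common trait does not mention.
#[derive(Debug)]
pub struct LocalResponse {
    pub content: String,
    pub tokens_per_second: f32,
}

impl LlmResponse for LocalResponse {
    fn text(&self) -> Option<String> {
        Some(self.content.clone())
    }

    // This sketch never produces tool calls.
    fn tool_calls(&self) -> Option<Vec<ToolCall>> {
        None
    }

    // thinking() and usage() fall back to the trait defaults (None).
}
```

Callers holding a `Box<dyn LlmResponse>` see only the four trait methods. In this sketch the extra field is reachable only by code that knows the concrete type.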
Streaming protocol
```rust
pub enum StreamChunk {
    Text(String),
    ReasoningContent(String),
    ToolCallStart { index: usize, call_id: String, name: String },
    ToolCallDelta { index: usize, partial_json: String },
    ToolCallComplete { index: usize, call: ToolCall },
    Done { stop_reason: String },
    Usage(TokenUsage),
}
```
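A consumer of this protocol has to stitch chunks back together. The sketch below folds a sequence of chunks into final text and tool calls. It works on any iterator, not an async stream, to stay dependency-free. The `Usage` variant is omitted, `ToolCall` is a reduced stand-in, and `Accumulated` and `accumulate` are hypothetical names. It also assumes that a `ToolCallComplete` chunk overrides whatever was assembled from deltas for the same index.

```rust
use std::collections::BTreeMap;

// Reduced stand-in for the design's ToolCall.
#[derive(Debug, Clone, PartialEq)]
pub struct ToolCall {
    pub id: String,
    pub name: String,
    pub arguments_json: String,
}

// Same variants as the design, minus Usage.
#[derive(Debug, Clone, PartialEq)]
pub enum StreamChunk {
    Text(String),
    ReasoningContent(String),
    ToolCallStart { index: usize, call_id: String, name: String },
    ToolCallDelta { index: usize, partial_json: String },
    ToolCallComplete { index: usize, call: ToolCall },
    Done { stop_reason: String },
}

/// Hypothetical result of draining a stream.
#[derive(Debug, Default, PartialEq)]
pub struct Accumulated {
    pub text: String,
    pub reasoning: String,
    pub tool_calls: Vec<ToolCall>,
    pub stop_reason: Option<String>,
}

pub fn accumulate(chunks: impl IntoIterator<Item = StreamChunk>) -> Accumulated {
    let mut out = Accumulated::default();
    // Tool calls still being assembled, keyed by their stream index.
    let mut pending: BTreeMap<usize, ToolCall> = BTreeMap::new();

    for chunk in chunks {
        match chunk {
            StreamChunk::Text(t) => out.text.push_str(&t),
            StreamChunk::ReasoningContent(r) => out.reasoning.push_str(&r),
            StreamChunk::ToolCallStart { index, call_id, name } => {
                let call = ToolCall { id: call_id, name, arguments_json: String::new() };
                pending.insert(index, call);
            }
            StreamChunk::ToolCallDelta { index, partial_json } => {
                // Deltas for an index that was never started are dropped.
                if let Some(call) = pending.get_mut(&index) {
                    call.arguments_json.push_str(&partial_json);
                }
            }
            StreamChunk::ToolCallComplete { index, call } => {
                // Assumption: the complete chunk is authoritative.
                pending.remove(&index);
                out.tool_calls.push(call);
            }
            StreamChunk::Done { stop_reason } => out.stop_reason = Some(stop_reason),
        }
    }

    // Calls that never received ToolCallComplete are kept as assembled,
    // in index order.
    out.tool_calls.extend(pending.into_values());
    out
}
```

Keying pending calls by `index` is what lets deltas for several parallel tool calls interleave without being mixed up.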
Token usage
Handles naming differences between providers with serde aliases:
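```rust
use serde::Deserialize;

// The serde field attributes below require a serde derive on the struct.
#[derive(Debug, Clone, Deserialize)]
pub struct TokenUsage {
    #[serde(alias = "input_tokens")]
    pub prompt_tokens: u32,
    #[serde(alias = "output_tokens")]
    pub completion_tokens: u32,
    pub total_tokens: u32,
    pub reasoning_tokens: Option<u32>,
    pub cached_tokens: Option<u32>,
}
```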
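The mapping the aliases express can be illustrated without serde. The sketch below is a hand-written stand-in, not the proposed implementation: `usage_from_fields` is a hypothetical helper that accepts either naming convention as name/value pairs. Deriving `total_tokens` when the payload omits it is an added assumption, since the struct above declares the field as required.

```rust
#[derive(Debug, Clone, PartialEq, Default)]
pub struct TokenUsage {
    pub prompt_tokens: u32,
    pub completion_tokens: u32,
    pub total_tokens: u32,
    pub reasoning_tokens: Option<u32>,
    pub cached_tokens: Option<u32>,
}

/// Hypothetical helper: builds a TokenUsage from raw name/value pairs,
/// accepting both naming conventions for the two aliased fields.
pub fn usage_from_fields(fields: &[(&str, u32)]) -> TokenUsage {
    let mut usage = TokenUsage::default();
    let mut saw_total = false;

    for &(name, value) in fields {
        match name {
            // Both spellings land in the same field, like a serde alias.
            "prompt_tokens" | "input_tokens" => usage.prompt_tokens = value,
            "completion_tokens" | "output_tokens" => usage.completion_tokens = value,
            "total_tokens" => {
                usage.total_tokens = value;
                saw_total = true;
            }
            "reasoning_tokens" => usage.reasoning_tokens = Some(value),
            "cached_tokens" => usage.cached_tokens = Some(value),
            // Unknown fields are ignored.
            _ => {}
        }
    }

    // Assumption: when no total is reported, derive it from the two counts.
    if !saw_total {
        usage.total_tokens = usage.prompt_tokens + usage.completion_tokens;
    }
    usage
}
```

Whichever naming a provider uses, downstream code reads `prompt_tokens` and `completion_tokens` and never branches on the provider.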
Tool types
```rust
pub struct Tool {
    pub tool_type: String, // "function"
    pub function: FunctionDef,
}

pub struct ToolCall {
    pub id: String,
    pub call_type: String,
    pub function: FunctionCall,
}

pub enum ToolChoice {
    Auto,         // model decides
    Any,          // must use at least one
    Tool(String), // must use this specific tool
    None,         // tools disabled
}
```
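Each provider implementation would translate ToolChoice into its own wire format. As a sketch of that translation for an OpenAI-style chat completions request, the hypothetical helper below renders the `tool_choice` value as a JSON fragment. It builds the string by hand to avoid dependencies and assumes tool names contain no characters that need JSON escaping. A real implementation would more likely use a JSON library.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum ToolChoice {
    Auto,         // model decides
    Any,          // must use at least one
    Tool(String), // must use this specific tool
    None,         // tools disabled
}

/// Hypothetical helper: renders the `tool_choice` value for an
/// OpenAI-style request body as a JSON fragment.
/// Assumes the tool name needs no JSON escaping.
pub fn openai_tool_choice_json(choice: &ToolChoice) -> String {
    match choice {
        ToolChoice::Auto => "\"auto\"".to_string(),
        // OpenAI spells "must call some tool" as "required".
        ToolChoice::Any => "\"required\"".to_string(),
        ToolChoice::None => "\"none\"".to_string(),
        ToolChoice::Tool(name) => format!(
            "{{\"type\":\"function\",\"function\":{{\"name\":\"{}\"}}}}",
            name
        ),
    }
}
```

The variant is written as `ToolChoice::None` in full because a bare `None` would refer to `Option::None`.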
Key design decisions
| Decision | Rationale |
|---|---|
| 2 methods, not 5 | ChatOptions avoids a combinatorial explosion of variants such as chat_with_tools() and chat_stream_struct() |
| Default streaming impl | Providers that don't support streaming get a free non-streaming fallback |
| LlmResponse as trait | Providers can carry extra metadata without forcing it into a common struct |
| Chat only, no embeddings | Start simple. EmbeddingProvider can be a separate trait later |
| Serde aliases on TokenUsage | input_tokens (Anthropic) and prompt_tokens (OpenAI) deserialize to the same field |