Memory System
Memory is what separates an agent from a chatbot. Without memory, every turn starts from scratch. With memory, the agent accumulates context, learns from past actions, and recalls relevant information.
In Jan Agent Framework, memory is a plugin — the same way tools are. Different agents need different kinds of memory, so the framework defines a trait and lets you plug in whichever backends fit your use case.
The MemoryPlugin trait

    #[async_trait]
    pub trait MemoryPlugin: Plugin {
        /// Query memory. Returns entries ordered by relevance.
        async fn read(&self, query: &MemoryQuery) -> Result<Vec<MemoryEntry>, MemoryError>;

        /// Write an entry to memory.
        async fn write(&self, entry: &MemoryEntry) -> Result<MemoryId, MemoryError>;

        /// Remove entries matching the query.
        async fn forget(&self, query: &MemoryQuery) -> Result<u64, MemoryError>;

        /// Called before each agent turn to inject relevant context.
        async fn pre_turn_context(&self, input: &TurnInput) -> Result<Vec<MemoryEntry>, MemoryError> {
            Ok(vec![])
        }

        /// Called after each agent turn to observe what happened.
        async fn post_turn_observe(&self, input: &TurnInput, output: &TurnOutput) -> Result<(), MemoryError> {
            Ok(())
        }
    }

The key design: memory hooks into the agent loop at two points — before a turn (inject context) and after a turn (learn from what happened). This happens automatically. The agent core doesn't manage memory directly — the runtime orchestrates it.
How memory fits into the agent loop

    User message arrives
        │
        ▼
    memory.pre_turn_context()      ← "What do I already know that's relevant?"
        │
        │  Inject context into the conversation
        ▼
    AgentCore::run_turn()          ← Reasoning + tool calls (unchanged)
        │
        │  Turn completes
        ▼
    memory.post_turn_observe()     ← "What did I just learn?"
        │
        ▼
    Response to user

This is transparent to the core. A ReActCore written without any memory awareness still benefits from memory — the runtime handles injection and observation.
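The flow above can be sketched as a small runtime step. This is a simplified, synchronous illustration, not the framework's code: the real MemoryPlugin trait is async and returns Result, and the names `Memory`, `Core`, `run_with_memory`, `LogMemory`, and `CountingCore` are hypothetical stand-ins.

```rust
// Simplified stand-ins for the framework types (assumptions for illustration).
pub struct TurnInput {
    pub message: String,
    pub context: Vec<String>,
}

pub struct TurnOutput {
    pub response: String,
}

// Synchronous mirror of the two hooks. `post_turn_observe` takes `&mut self`
// here only to keep the toy implementation free of interior mutability.
pub trait Memory {
    fn pre_turn_context(&self, input: &TurnInput) -> Vec<String>;
    fn post_turn_observe(&mut self, input: &TurnInput, output: &TurnOutput);
}

pub trait Core {
    fn run_turn(&self, input: &TurnInput) -> TurnOutput;
}

/// Hypothetical runtime step: the core never touches memory itself.
pub fn run_with_memory(memory: &mut dyn Memory, core: &dyn Core, message: &str) -> TurnOutput {
    let mut input = TurnInput { message: message.to_string(), context: Vec::new() };
    // 1. Before the turn: ask memory for relevant context and inject it.
    let context = memory.pre_turn_context(&input);
    input.context = context;
    // 2. The core reasons over the enriched input.
    let output = core.run_turn(&input);
    // 3. After the turn: let memory observe what happened.
    memory.post_turn_observe(&input, &output);
    output
}

/// Toy memory that remembers every user message it has observed.
#[derive(Default)]
pub struct LogMemory {
    pub seen: Vec<String>,
}

impl Memory for LogMemory {
    fn pre_turn_context(&self, _input: &TurnInput) -> Vec<String> {
        self.seen.clone()
    }

    fn post_turn_observe(&mut self, input: &TurnInput, _output: &TurnOutput) {
        self.seen.push(input.message.clone());
    }
}

/// Toy core with no memory awareness: it only reports how much context it got.
pub struct CountingCore;

impl Core for CountingCore {
    fn run_turn(&self, input: &TurnInput) -> TurnOutput {
        TurnOutput { response: format!("{} context entries", input.context.len()) }
    }
}
```

Running two turns through `run_with_memory` with the same `LogMemory` gives the second turn one context entry, even though `CountingCore` contains no memory logic of its own.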
Built-in memory backends
Conversation memory (first-class citizen)
The simplest and most important memory: the conversation itself.
- Stores: Full conversation turns (user messages, assistant responses, tool calls)
- Persists: To disk via jan-data thread storage (existing JSONL format)
- Recalls: Recent conversation history, injected as context each turn
- Compacts: Summarizes old turns when the token budget is exceeded

    {
      "memory": {
        "backends": [
          {
            "type": "conversation",
            "persist": true,
            "max_turns": 100,
            "compaction_threshold": 0.8
          }
        ]
      }
    }

This wraps the existing thread persistence in jan-data. The current behavior (stateless per session) is preserved when persist is set to false or when no memory is configured.
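One way compaction could work is sketched below. It assumes that compaction_threshold is a fraction of the token budget, that tokens are approximated by counting whitespace-separated words, and that the summary is a placeholder string where a real backend would presumably ask the LLM to summarize. `Turn`, `estimate_tokens`, and `compact` are hypothetical names, not framework APIs.

```rust
/// Hypothetical stand-in for a stored conversation turn.
#[derive(Debug, Clone, PartialEq)]
pub struct Turn {
    pub role: String,
    pub text: String,
}

/// Rough token estimate: one token per whitespace-separated word (assumption).
fn estimate_tokens(turns: &[Turn]) -> usize {
    turns.iter().map(|t| t.text.split_whitespace().count()).sum()
}

/// Compacts `turns` in place when usage exceeds `threshold * budget`.
/// The oldest turns are folded into a single placeholder summary turn until
/// the remaining turns fit in half the limit; the newest turn is always kept.
/// Returns true if anything was compacted.
pub fn compact(turns: &mut Vec<Turn>, budget: usize, threshold: f64) -> bool {
    let limit = (budget as f64 * threshold) as usize;
    if estimate_tokens(&turns[..]) <= limit {
        return false;
    }

    // Count how many of the oldest turns have to go.
    let mut dropped = 0;
    while turns.len() - dropped > 1 && estimate_tokens(&turns[dropped..]) > limit / 2 {
        dropped += 1;
    }
    if dropped == 0 {
        return false;
    }

    // Placeholder summary; a real backend would generate this text.
    let summary = Turn {
        role: "system".to_string(),
        text: format!("[summary of {} earlier turns]", dropped),
    };

    // Replace the dropped prefix with the summary, keeping recent turns intact.
    let kept = turns.split_off(dropped);
    turns.clear();
    turns.push(summary);
    turns.extend(kept);
    true
}
```

Compacting down to half the limit, rather than just below it, is a design choice in this sketch so that compaction does not trigger again on the very next turn.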
Working memory (first-class citizen)
A short-term scratchpad for the current task.
- Stores: Key-value pairs — intermediate results, notes, partial computations
- Persists: No. Cleared when the session ends.
- Exposed as tools: memory.write and memory.read, so the LLM can explicitly decide what to remember

    Agent: "I'll save these search results to working memory for later."
      → memory.write({ key: "search_results", value: [...] })
    Agent: "Let me check what I found earlier."
      → memory.read({ key: "search_results" })

This gives the agent explicit control over its own short-term memory during multi-step tasks.
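A scratchpad like this can be as small as a map that lives for the session. The sketch below is an assumption about the shape, not the framework's WorkingMemory: values are stored as plain strings (for example serialized JSON) to stay dependency-free, and the method names simply mirror the two tools.

```rust
use std::collections::HashMap;

/// Hypothetical in-process scratchpad backing the two memory tools.
#[derive(Default)]
pub struct WorkingMemory {
    entries: HashMap<String, String>,
}

impl WorkingMemory {
    pub fn new() -> Self {
        Self::default()
    }

    /// Backs the memory.write tool: stores or overwrites a key and
    /// returns the previous value, if there was one.
    pub fn write(&mut self, key: &str, value: &str) -> Option<String> {
        self.entries.insert(key.to_string(), value.to_string())
    }

    /// Backs the memory.read tool: looks up a key.
    pub fn read(&self, key: &str) -> Option<&str> {
        self.entries.get(key).map(String::as_str)
    }

    /// Called when the session ends: nothing is persisted.
    pub fn clear(&mut self) {
        self.entries.clear();
    }
}
```

Writing `search_results` and reading it back later in the same session returns the stored value; after `clear()` the same read returns `None`.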
Semantic memory (future)
Vector-indexed knowledge for RAG-style recall.
- Stores: Text chunks with embeddings
- Recalls: Semantically similar entries — "find memories related to this query"
- Storage: SQLite + embedded vector index (no external vector DB)
- Auto-embeds: New entries are embedded using the configured LLM or a local embedding model
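Since this backend is not built yet, the following is only an illustration of what "semantically similar" recall means. It ranks stored chunks by cosine similarity with a brute-force scan, which is swapped in here for the planned embedded vector index, and it takes embeddings as given instead of computing them. `Chunk`, `cosine`, and `recall` are hypothetical names.

```rust
/// Hypothetical stored chunk: text plus its precomputed embedding.
pub struct Chunk {
    pub text: String,
    pub embedding: Vec<f32>,
}

/// Cosine similarity between two vectors; 0.0 if either has zero length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Returns the texts of the `k` chunks most similar to `query`,
/// most similar first. Brute force: every chunk is scored.
pub fn recall<'a>(chunks: &'a [Chunk], query: &[f32], k: usize) -> Vec<&'a str> {
    let mut scored: Vec<(f32, &'a str)> = chunks
        .iter()
        .map(|c| (cosine(&c.embedding, query), c.text.as_str()))
        .collect();
    // Sort by descending similarity; total_cmp gives a total order on f32.
    scored.sort_by(|a, b| b.0.total_cmp(&a.0));
    scored.into_iter().take(k).map(|(_, text)| text).collect()
}
```

A linear scan is fine for a few thousand entries; an index only becomes necessary once the store grows well beyond that.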
Spatial memory (future — robot)
2D/3D spatial map for embodied agents.
- Stores: Observations at positions — "I saw a chair at (3.2, 1.5, 0.0)"
- Recalls: By region — "what's near my current position?"
- Decays: Old observations become less reliable over time
- Used by: EmbodiedCore for navigation and object tracking
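As a rough illustration of region recall and decay, the sketch below filters observations by Euclidean distance and models reliability with an exponential half-life. Both choices are assumptions made for this example; the design above does not specify a distance metric or a decay function, and `Observation`, `confidence`, and `near` are hypothetical names.

```rust
/// Hypothetical observation: what was seen, where, and how long ago.
pub struct Observation {
    pub label: String,
    pub position: [f64; 3],
    pub age_secs: f64,
}

/// Assumed decay model: confidence halves every `half_life_secs`.
pub fn confidence(obs: &Observation, half_life_secs: f64) -> f64 {
    0.5_f64.powf(obs.age_secs / half_life_secs)
}

/// Returns all observations within `radius` of `center` (Euclidean distance).
pub fn near<'a>(
    observations: &'a [Observation],
    center: [f64; 3],
    radius: f64,
) -> Vec<&'a Observation> {
    observations
        .iter()
        .filter(|o| {
            // Compare squared distances to avoid a square root per entry.
            let dist_sq: f64 = o
                .position
                .iter()
                .zip(center.iter())
                .map(|(a, b)| (a - b).powi(2))
                .sum();
            dist_sq <= radius * radius
        })
        .collect()
}
```

With a chair observed at (3.2, 1.5, 0.0), a query centered on (3.0, 1.0, 0.0) with radius 1.0 finds it, and an observation that is one half-life old has confidence 0.5.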
Composite memory
Agents typically use multiple memory backends simultaneously. CompositeMemory combines them:

    let memory = CompositeMemory::new()
        .add("conversation", ConversationMemory::new(thread_id))
        .add("working", WorkingMemory::new());

When the agent reads memory, CompositeMemory queries all backends and merges results by relevance. When the agent writes, it routes to the appropriate backend.
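The read path could look roughly like this. The sketch is synchronous and uses hypothetical types (`Hit`, `Backend`, `Composite`, `Fixed`) in place of the framework's; it also assumes that relevance scores from different backends are directly comparable, which a real implementation may need to normalize. Write routing is not shown.

```rust
/// Hypothetical merged result, tagged with the backend it came from.
#[derive(Debug, Clone, PartialEq)]
pub struct Hit {
    pub source: String,
    pub text: String,
    pub relevance: f32,
}

/// Simplified backend interface: returns (text, relevance) pairs.
pub trait Backend {
    fn read(&self, query: &str) -> Vec<(String, f32)>;
}

pub struct Composite {
    backends: Vec<(String, Box<dyn Backend>)>,
}

impl Composite {
    pub fn new() -> Self {
        Composite { backends: Vec::new() }
    }

    /// Builder-style registration, mirroring the `.add(name, backend)` shape.
    pub fn add(mut self, name: &str, backend: impl Backend + 'static) -> Self {
        self.backends.push((name.to_string(), Box::new(backend)));
        self
    }

    /// Queries every backend and merges the results, most relevant first.
    pub fn read(&self, query: &str) -> Vec<Hit> {
        let mut merged = Vec::new();
        for (name, backend) in &self.backends {
            for (text, relevance) in backend.read(query) {
                merged.push(Hit { source: name.clone(), text, relevance });
            }
        }
        merged.sort_by(|a, b| b.relevance.total_cmp(&a.relevance));
        merged
    }
}

/// Toy backend that always returns the same entries (for demonstration).
pub struct Fixed(pub Vec<(String, f32)>);

impl Backend for Fixed {
    fn read(&self, _query: &str) -> Vec<(String, f32)> {
        self.0.clone()
    }
}
```

Tagging each hit with its source keeps the merged list traceable, which helps when deciding how much weight to give, say, a working-memory note versus an old conversation turn.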
Memory as a tool
Memory backends are automatically exposed to the LLM as callable tools:
| Tool | Description |
|---|---|
| memory.read | Recall stored information relevant to a query |
| memory.write | Store information for later recall |
This means the LLM can actively manage its own memory — deciding what's worth remembering and when to look something up. The automatic pre_turn_context / post_turn_observe hooks handle the passive side; the tools handle the active side.
Different agents, different memory
| Agent Type | Memory Stack |
|---|---|
| Chat assistant | Conversation + Working |
| Coding agent | Conversation + Working + Semantic (project docs) |
| Research agent | Conversation + Semantic (knowledge base) |
| Robot | Spatial + Episodic (past task outcomes) |
The framework doesn't prescribe which memory backends an agent should use. The config determines the stack, and CompositeMemory wires them together.
When no memory plugin is configured, the framework uses NullMemory — a no-op implementation that preserves current stateless behavior. Existing users see no change.
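The no-op idea is the classic null-object pattern. A minimal synchronous sketch follows, with a hypothetical `Memory` trait standing in for the async MemoryPlugin trait and simplified signatures chosen for this example only.

```rust
/// Simplified, synchronous stand-in for the memory interface (assumption).
pub trait Memory {
    /// Returns entries relevant to the query.
    fn read(&self, query: &str) -> Vec<String>;
    /// Stores an entry; returns whether anything was stored.
    fn write(&mut self, entry: &str) -> bool;
    /// Removes matching entries; returns how many were removed.
    fn forget(&mut self, query: &str) -> u64;
}

/// Null object: accepts every call and remembers nothing.
pub struct NullMemory;

impl Memory for NullMemory {
    fn read(&self, _query: &str) -> Vec<String> {
        Vec::new()
    }

    fn write(&mut self, _entry: &str) -> bool {
        false
    }

    fn forget(&mut self, _query: &str) -> u64 {
        0
    }
}
```

Because the runtime can always call into some memory implementation, it needs no special case for "memory disabled".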