Memory System

Memory is what separates an agent from a chatbot. Without memory, every turn starts from scratch. With memory, the agent accumulates context, learns from past actions, and recalls relevant information.

In the Jan Agent Framework, memory is a plugin, just as tools are. Different agents need different kinds of memory, so the framework defines a trait and lets you plug in whichever backends fit your use case.

The `MemoryPlugin` trait


#[async_trait]
pub trait MemoryPlugin: Plugin {
    /// Query memory. Returns entries ordered by relevance.
    async fn read(&self, query: &MemoryQuery) -> Result<Vec<MemoryEntry>, MemoryError>;
    /// Write an entry to memory.
    async fn write(&self, entry: &MemoryEntry) -> Result<MemoryId, MemoryError>;
    /// Remove entries matching the query.
    async fn forget(&self, query: &MemoryQuery) -> Result<u64, MemoryError>;
    /// Called before each agent turn — inject relevant context.
    async fn pre_turn_context(&self, input: &TurnInput) -> Result<Vec<MemoryEntry>, MemoryError> {
        Ok(vec![])
    }
    /// Called after each agent turn — observe what happened.
    async fn post_turn_observe(&self, input: &TurnInput, output: &TurnOutput) -> Result<(), MemoryError> {
        Ok(())
    }
}

The key design decision is that memory hooks into the agent loop at two points: before a turn (to inject context) and after a turn (to learn from what happened). Both hooks have default no-op implementations, so a backend overrides only the ones it needs. The runtime calls the hooks automatically; the agent core never manages memory directly.

How memory fits into the agent loop


User message arrives
    │
    ▼
memory.pre_turn_context()       ← "What do I already know that's relevant?"
    │
    │  Inject context into the conversation
    ▼
AgentCore::run_turn()           ← Reasoning + tool calls (unchanged)
    │
    │  Turn completes
    ▼
memory.post_turn_observe()      ← "What did I just learn?"
    │
    ▼
Response to user

This is transparent to the core. A ReActCore written without any memory awareness still benefits from memory — the runtime handles injection and observation.
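The flow above can be sketched in a few lines. This is a simplified, synchronous illustration and not the framework's actual runtime: the `Memory` and `AgentCore` traits below are stand-ins for the async `MemoryPlugin` trait and the real core interface, and `RecentMessages`, `EchoCore`, and `run_with_memory` are hypothetical names introduced only for this example.

```rust
// Simplified stand-ins for the framework types (assumptions, not the real API).
#[derive(Clone, Debug, PartialEq)]
pub struct MemoryEntry {
    pub content: String,
}

pub struct TurnInput {
    pub message: String,
    /// Context entries injected by the runtime before the core runs.
    pub context: Vec<MemoryEntry>,
}

pub struct TurnOutput {
    pub response: String,
}

/// Synchronous stand-in for the two hooks of `MemoryPlugin`.
/// The real trait takes `&self` and is async; `&mut self` keeps this sketch short.
pub trait Memory {
    fn pre_turn_context(&self, input: &TurnInput) -> Vec<MemoryEntry>;
    fn post_turn_observe(&mut self, input: &TurnInput, output: &TurnOutput);
}

pub trait AgentCore {
    fn run_turn(&self, input: &TurnInput) -> TurnOutput;
}

/// Hypothetical backend: remembers every user message it has observed.
#[derive(Default)]
pub struct RecentMessages {
    seen: Vec<String>,
}

impl Memory for RecentMessages {
    fn pre_turn_context(&self, _input: &TurnInput) -> Vec<MemoryEntry> {
        self.seen
            .iter()
            .map(|m| MemoryEntry { content: m.clone() })
            .collect()
    }

    fn post_turn_observe(&mut self, input: &TurnInput, _output: &TurnOutput) {
        self.seen.push(input.message.clone());
    }
}

/// A core with no memory awareness: it only reports how much context it received.
pub struct EchoCore;

impl AgentCore for EchoCore {
    fn run_turn(&self, input: &TurnInput) -> TurnOutput {
        TurnOutput {
            response: format!("{} (context entries: {})", input.message, input.context.len()),
        }
    }
}

/// The runtime wraps the core: inject context, run the turn, then observe.
pub fn run_with_memory(core: &dyn AgentCore, memory: &mut dyn Memory, message: &str) -> String {
    let mut input = TurnInput {
        message: message.to_string(),
        context: Vec::new(),
    };
    input.context = memory.pre_turn_context(&input);
    let output = core.run_turn(&input);
    memory.post_turn_observe(&input, &output);
    output.response
}
```

Calling `run_with_memory` twice with the same `RecentMessages` value shows the effect: the first turn sees no context, and the second sees one entry, even though `EchoCore` contains no memory logic of its own.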

Built-in memory backends

Conversation memory (first-class)

The simplest and most important memory: the conversation itself.

Stores: Full conversation turns (user messages, assistant responses, tool calls)
Persists: To disk via jan-data thread storage (existing JSONL format)
Recalls: Recent conversation history, injected as context each turn
Compacts: Summarizes old turns when the token budget is exceeded


{
  "memory": {
    "backends": [
      {
        "type": "conversation",
        "persist": true,
        "max_turns": 100,
        "compaction_threshold": 0.8
      }
    ]
  }
}

This wraps the existing thread persistence in jan-data. The current behavior (stateless per session) is preserved when `persist` is `false` or when no memory is configured.
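To make the compaction step concrete, here is a minimal sketch under stated assumptions. It assumes `compaction_threshold` is a fraction of the token budget, that tokens can be approximated by counting words, and that the oldest turns are replaced by a single summary turn. The `Turn` type, `compact`, and the `keep_recent` parameter are hypothetical; a real implementation would use the model's tokenizer and ask the LLM to write the summary.

```rust
/// One stored conversation turn (simplified; real turns also carry tool calls).
#[derive(Clone, Debug, PartialEq)]
pub struct Turn {
    pub role: String,
    pub text: String,
}

/// Crude token estimate: whitespace-separated words.
fn estimate_tokens(turns: &[Turn]) -> usize {
    turns.iter().map(|t| t.text.split_whitespace().count()).sum()
}

/// Assumed policy: once usage exceeds `threshold * budget`, fold everything
/// except the `keep_recent` newest turns into one summary turn.
pub fn compact(turns: Vec<Turn>, budget: usize, threshold: f64, keep_recent: usize) -> Vec<Turn> {
    let limit = (budget as f64 * threshold) as usize;
    if estimate_tokens(&turns) <= limit || turns.len() <= keep_recent {
        return turns; // Under the limit, or nothing old enough to fold.
    }

    let split = turns.len() - keep_recent;
    let (old, recent) = turns.split_at(split);

    // Placeholder summary text; a real system would summarize `old` with the LLM.
    let summary = Turn {
        role: "system".to_string(),
        text: format!("[summary of {} earlier turns]", old.len()),
    };

    let mut out = Vec::with_capacity(recent.len() + 1);
    out.push(summary);
    out.extend_from_slice(recent);
    out
}
```

With four three-word turns, a budget of 10, and a threshold of 0.8, the estimated usage (12) exceeds the limit (8), so `compact(turns, 10, 0.8, 2)` returns one summary turn followed by the two most recent turns.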

Working memory (first-class)

A short-term scratchpad for the current task.

Stores: Key-value pairs — intermediate results, notes, partial computations
Persists: No. Cleared when the session ends.
Exposed as tools: memory.write and memory.read — the LLM can explicitly decide what to remember


Agent: "I'll save these search results to working memory for later."
  → memory.write({ key: "search_results", value: [...] })
Agent: "Let me check what I found earlier."
  → memory.read({ key: "search_results" })

This gives the agent explicit control over its own short-term memory during multi-step tasks.
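A scratchpad like this could be as small as a map that lives for the length of the session. The sketch below is illustrative only: it stores values as strings for simplicity (the real backend may hold structured values), and the method names are assumptions chosen to mirror the `memory.write` and `memory.read` tools.

```rust
use std::collections::HashMap;

/// Session-scoped scratchpad. Nothing here is written to disk.
#[derive(Default)]
pub struct WorkingMemory {
    entries: HashMap<String, String>,
}

impl WorkingMemory {
    pub fn new() -> Self {
        Self::default()
    }

    /// Would back the `memory.write` tool: store or overwrite a value.
    pub fn write(&mut self, key: &str, value: &str) {
        self.entries.insert(key.to_string(), value.to_string());
    }

    /// Would back the `memory.read` tool: look up a value by key.
    pub fn read(&self, key: &str) -> Option<&str> {
        self.entries.get(key).map(String::as_str)
    }

    /// Called when the session ends, so nothing carries over.
    pub fn clear(&mut self) {
        self.entries.clear();
    }
}
```

After `write("search_results", "[a, b]")`, a later `read("search_results")` returns `Some("[a, b]")`; after `clear()`, the same read returns `None`.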

Semantic memory (future)

Vector-indexed knowledge for RAG-style recall.

Stores: Text chunks with embeddings
Recalls: Semantically similar entries — "find memories related to this query"
Storage: SQLite + embedded vector index (no external vector DB)
Auto-embeds: New entries are embedded using the configured LLM or a local embedding model
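Semantic recall comes down to ranking stored chunks by how close their embeddings are to the query embedding. The sketch below uses cosine similarity with a plain linear scan, which is a stand-in for the embedded vector index mentioned above and not a description of it. `Chunk` and `top_k` are hypothetical names, and the embeddings are assumed to be produced elsewhere.

```rust
/// A stored text chunk with its embedding vector.
pub struct Chunk {
    pub text: String,
    pub embedding: Vec<f32>,
}

/// Cosine similarity between two vectors; returns 0.0 if either has zero length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Return the texts of the `k` chunks most similar to `query`, best first.
pub fn top_k<'a>(chunks: &'a [Chunk], query: &[f32], k: usize) -> Vec<&'a str> {
    let mut scored: Vec<(f32, &str)> = chunks
        .iter()
        .map(|c| (cosine(&c.embedding, query), c.text.as_str()))
        .collect();
    // Sort by descending similarity; `total_cmp` gives a total order on floats.
    scored.sort_by(|a, b| b.0.total_cmp(&a.0));
    scored.into_iter().take(k).map(|(_, text)| text).collect()
}
```

A linear scan is fine for a few thousand entries; an index becomes worthwhile only when the store grows well beyond that.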

Spatial memory (future — robot)

2D/3D spatial map for embodied agents.

Stores: Observations at positions — "I saw a chair at (3.2, 1.5, 0.0)"
Recalls: By region — "what's near my current position?"
Decays: Old observations become less reliable over time
Used by: EmbodiedCore for navigation and object tracking
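One way to picture region recall with decay is the sketch below. It is an assumption-heavy illustration: the exponential half-life model, the `Observation` type, and the `confidence` and `nearby` functions are all hypothetical, since the spatial backend is not yet specified.

```rust
/// Something the agent saw at a position, with the time it was seen (in seconds).
pub struct Observation {
    pub label: String,
    pub position: [f64; 3],
    pub observed_at: f64,
}

/// Assumed decay model: confidence halves every `half_life` seconds.
pub fn confidence(obs: &Observation, now: f64, half_life: f64) -> f64 {
    let age = (now - obs.observed_at).max(0.0);
    0.5_f64.powf(age / half_life)
}

/// Observations within `radius` of `center` that are still reliable enough.
pub fn nearby<'a>(
    observations: &'a [Observation],
    center: [f64; 3],
    radius: f64,
    now: f64,
    half_life: f64,
    min_confidence: f64,
) -> Vec<&'a Observation> {
    observations
        .iter()
        .filter(|o| {
            // Euclidean distance between the observation and the query point.
            let dist_sq: f64 = o
                .position
                .iter()
                .zip(center.iter())
                .map(|(a, b)| (a - b).powi(2))
                .sum();
            dist_sq.sqrt() <= radius && confidence(o, now, half_life) >= min_confidence
        })
        .collect()
}
```

Under this model, a chair seen ten seconds ago half a metre away is returned, while a cup seen at the same spot a thousand seconds ago is filtered out as too stale.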

Composite memory

Agents typically use multiple memory backends simultaneously. CompositeMemory combines them:


let memory = CompositeMemory::new()
    .add("conversation", ConversationMemory::new(thread_id))
    .add("working", WorkingMemory::new());

When the agent reads memory, CompositeMemory queries all backends and merges results by relevance. When the agent writes, it routes to the appropriate backend.
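The read path can be sketched as a fan-out followed by a sort. The code below is a simplified stand-in, not the framework's `CompositeMemory`: the `Backend` trait, the `Scored` type, and the `Fixed` test backend are hypothetical, and it assumes relevance scores are comparable across backends, which a real implementation would need to ensure (for example by normalizing them).

```rust
/// A result tagged with the backend it came from.
#[derive(Clone, Debug, PartialEq)]
pub struct Scored {
    pub source: String,
    pub content: String,
    pub relevance: f32,
}

/// Simplified synchronous backend: returns (content, relevance) pairs.
pub trait Backend {
    fn read(&self, query: &str) -> Vec<(String, f32)>;
}

pub struct CompositeMemory {
    backends: Vec<(String, Box<dyn Backend>)>,
}

impl CompositeMemory {
    pub fn new() -> Self {
        Self { backends: Vec::new() }
    }

    /// Builder-style registration, mirroring the `.add(name, backend)` calls above.
    pub fn add(mut self, name: &str, backend: impl Backend + 'static) -> Self {
        self.backends.push((name.to_string(), Box::new(backend)));
        self
    }

    /// Query every backend, then merge all results by descending relevance.
    pub fn read(&self, query: &str) -> Vec<Scored> {
        let mut merged = Vec::new();
        for (name, backend) in &self.backends {
            for (content, relevance) in backend.read(query) {
                merged.push(Scored { source: name.clone(), content, relevance });
            }
        }
        merged.sort_by(|a, b| b.relevance.total_cmp(&a.relevance));
        merged
    }
}

/// Hypothetical backend that always returns the same entries, for illustration.
pub struct Fixed(pub Vec<(String, f32)>);

impl Backend for Fixed {
    fn read(&self, _query: &str) -> Vec<(String, f32)> {
        self.0.clone()
    }
}
```

Tagging each result with its source keeps the merged list traceable, which helps when deciding how much context from each backend to inject.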

Memory as a tool

Memory backends are automatically exposed to the LLM as callable tools:

Tool	Description
`memory.read`	Recall stored information relevant to a query
`memory.write`	Store information for later recall

This means the LLM can actively manage its own memory — deciding what's worth remembering and when to look something up. The automatic pre_turn_context / post_turn_observe hooks handle the passive side; the tools handle the active side.
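On the active side, the runtime has to turn a tool call from the LLM into a memory operation. The dispatcher below is a rough sketch of that step. It assumes key-based arguments like those in the working-memory example, models the arguments as a string map in place of JSON, and uses the hypothetical function names `dispatch` and `required`.

```rust
use std::collections::HashMap;

/// Fetch a required argument or return an error the LLM can read.
fn required<'a>(args: &'a HashMap<String, String>, name: &str) -> Result<&'a String, String> {
    args.get(name)
        .ok_or_else(|| format!("missing `{}` argument", name))
}

/// Route a memory tool call to the backing store and return the tool result.
pub fn dispatch(
    store: &mut HashMap<String, String>,
    tool: &str,
    args: &HashMap<String, String>,
) -> Result<String, String> {
    match tool {
        "memory.write" => {
            let key = required(args, "key")?;
            let value = required(args, "value")?;
            store.insert(key.clone(), value.clone());
            Ok("stored".to_string())
        }
        "memory.read" => {
            let key = required(args, "key")?;
            store
                .get(key)
                .cloned()
                .ok_or_else(|| format!("no entry for key `{}`", key))
        }
        other => Err(format!("unknown tool `{}`", other)),
    }
}
```

Returning errors as plain strings lets the runtime pass them back to the LLM as the tool result, so the model can correct its next call.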

Different agents, different memory

Agent Type	Memory Stack
Chat assistant	Conversation + Working
Coding agent	Conversation + Working + Semantic (project docs)
Research agent	Conversation + Semantic (knowledge base)
Robot	Spatial + Episodic (past task outcomes)

The framework doesn't prescribe which memory backends an agent should use. The config determines the stack, and CompositeMemory wires them together.

When no memory plugin is configured, the framework uses NullMemory — a no-op implementation that preserves current stateless behavior. Existing users see no change.
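A no-op backend is short to write because the two hooks already default to doing nothing. The sketch below uses a simplified synchronous `Memory` trait as a stand-in for `MemoryPlugin` (an assumption made to keep the example free of external crates); only `read`, `write`, and `forget` need bodies.

```rust
#[derive(Clone, Debug, PartialEq)]
pub struct MemoryEntry {
    pub content: String,
}

/// Simplified synchronous stand-in for the `MemoryPlugin` trait.
pub trait Memory {
    fn read(&self, query: &str) -> Vec<MemoryEntry>;
    fn write(&mut self, entry: MemoryEntry);
    fn forget(&mut self, query: &str) -> u64;

    /// Default hook: inject nothing.
    fn pre_turn_context(&self, _message: &str) -> Vec<MemoryEntry> {
        Vec::new()
    }

    /// Default hook: observe nothing.
    fn post_turn_observe(&mut self, _message: &str, _response: &str) {}
}

/// Stores nothing and recalls nothing, so every turn stays stateless.
pub struct NullMemory;

impl Memory for NullMemory {
    fn read(&self, _query: &str) -> Vec<MemoryEntry> {
        Vec::new()
    }

    fn write(&mut self, _entry: MemoryEntry) {}

    fn forget(&mut self, _query: &str) -> u64 {
        0 // Nothing is stored, so nothing is removed.
    }
}
```

Because the runtime can always call the same hooks, it needs no special case for "memory disabled".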
