
Memory

Continuity

By default an LLM forgets everything between calls. Memory is what lets a Mastra agent remember a user — within a conversation and across them. It comes in three layers that stack on the same storage, and you turn on only what you need.

Figure: The three layers of agent memory. Conversation history (recent turns; lastMessages), semantic recall (similar past messages; topK, messageRange, vector) and working memory (persistent profile; template, enabled), all built on storage + vector as the durable backbone (LibSQL, Postgres).
Each layer reads from the same storage (and a vector index for recall). Conversation history is automatic; semantic recall and working memory are opt-in.
  1. Conversation history — the last N messages, included verbatim. Cheap, automatic, bounded by lastMessages.
  2. Semantic recall — older messages retrieved by similarity from a vector index, so the agent can surface a relevant exchange from days ago without stuffing the whole transcript into context.
  3. Working memory — a small, persistent profile the agent maintains (name, preferences, current goals) via a template it updates as it learns. This is what makes an agent feel like it knows you.
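To make the layering concrete, here is a minimal sketch of how the three layers could be combined into one context for a model call. It assumes the recall step has already produced the indices of relevant older messages. `Message` and `assembleContext` are hypothetical names, and Mastra's real ordering and formatting may differ.

```typescript
// Hypothetical message shape for this sketch; not Mastra's internal type.
type Message = { role: 'user' | 'assistant'; content: string };

// Combine the three memory layers into one ordered context (illustrative only).
function assembleContext(
  history: Message[], // the whole thread, oldest first
  lastMessages: number, // layer 1: how many recent turns to keep verbatim
  recalledIdx: number[], // layer 2: indices of similar older messages
  workingMemory: string, // layer 3: the persistent profile text
): string[] {
  const start = Math.max(0, history.length - lastMessages);
  // Keep only recalled messages older than the recent window, deduplicated and in order.
  const recalled = Array.from(new Set(recalledIdx))
    .filter((i) => i >= 0 && i < start)
    .sort((a, b) => a - b)
    .map((i) => `[recalled] ${history[i].role}: ${history[i].content}`);
  // The most recent turns are included verbatim.
  const recent = history.slice(start).map((m) => `${m.role}: ${m.content}`);
  return [`[working memory]\n${workingMemory}`, ...recalled, ...recent];
}

const history: Message[] = [
  { role: 'user', content: 'My dog is called Rex.' },
  { role: 'assistant', content: 'Nice to meet Rex!' },
  { role: 'user', content: 'What is 2 + 2?' },
  { role: 'assistant', content: '4.' },
  { role: 'user', content: 'What was my dog called?' },
];

// Keep the last 2 turns; pretend recall matched message 0 and message 4 (already recent).
const ctx = assembleContext(history, 2, [0, 4], '- Name: Ada');
console.log(ctx);
```

Recalled messages that already fall inside the recent window are dropped, so the same turn is never sent twice, and the old exchange about Rex reaches the model without the rest of the transcript.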

A Memory instance needs storage (for messages), and for semantic recall also a vector store and an embedder. Everything else is tuning under options.
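Semantic recall depends on a vector index: stored messages are embedded, and a new query is matched against those vectors by similarity. The sketch below is a toy, in-memory stand-in for that idea. It assumes cosine similarity and uses hand-written three-dimensional vectors in place of a real embedder such as `openai/text-embedding-3-small`. `ToyVectorIndex` is a hypothetical class and is not part of `@mastra/libsql`.

```typescript
// Hypothetical toy index: real setups embed text and store vectors in e.g. LibSQLVector.
type Entry = { id: string; vector: number[] };

// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

class ToyVectorIndex {
  private entries: Entry[] = [];

  // Insert or replace the vector stored under an id.
  upsert(id: string, vector: number[]): void {
    this.entries = this.entries.filter((e) => e.id !== id);
    this.entries.push({ id, vector });
  }

  // Return the ids of the topK entries most similar to the query vector.
  query(vector: number[], topK: number): string[] {
    return this.entries
      .map((e) => ({ id: e.id, score: cosine(vector, e.vector) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, topK)
      .map((r) => r.id);
  }
}

const index = new ToyVectorIndex();
index.upsert('msg-1', [1, 0, 0]); // stand-in embedding for "My dog is called Rex."
index.upsert('msg-2', [0, 1, 0]); // stand-in embedding for "What is 2 + 2?"
index.upsert('msg-3', [0.7, 0.3, 0]); // stand-in embedding for "Rex loves the park."
const hits = index.query([1, 0.05, 0], 2); // a question about the dog
console.log(hits);
```

A production vector store replaces the linear scan with an index, but the contract is the same: a query vector in, the `topK` nearest stored messages out.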

import { Agent } from '@mastra/core/agent';
import { Memory } from '@mastra/memory';
import { LibSQLStore, LibSQLVector } from '@mastra/libsql';

const memory = new Memory({
  storage: new LibSQLStore({ id: 'mem-store', url: 'file:./memory.db' }),
  vector: new LibSQLVector({ id: 'mem-vector', url: 'file:./vector.db' }),
  embedder: 'openai/text-embedding-3-small',
  options: {
    lastMessages: 20, // conversation history
    semanticRecall: { topK: 3, messageRange: { before: 2, after: 1 } },
    workingMemory: { enabled: true }, // persistent profile
  },
});

export const memoryAgent = new Agent({
  id: 'memory-agent',
  name: 'Memory Agent',
  instructions: 'Remember what the user tells you about themselves and use it.',
  model: 'openai/gpt-4o',
  memory,
});
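The `semanticRecall` settings above combine as follows: `topK: 3` asks for the three most similar messages, and `messageRange: { before: 2, after: 1 }` adds two messages before and one after each hit for context. Each hit therefore brings at most 2 + 1 + 1 = 4 messages, so one recall returns at most 3 × 4 = 12. A minimal sketch of that expansion, assuming overlapping or adjacent windows are merged (the function name `expandHits` is hypothetical, not a Mastra API):

```typescript
// Hypothetical helper: expand each recall hit into a window of neighboring messages.
type Range = { before: number; after: number };

function expandHits(hitIdx: number[], total: number, range: Range): number[][] {
  // One [lo, hi] window per hit, clamped to the bounds of the thread.
  const windows = hitIdx
    .map((i) => [Math.max(0, i - range.before), Math.min(total - 1, i + range.after)])
    .sort((a, b) => a[0] - b[0]);
  // Merge windows that overlap or touch, so no message is returned twice.
  const merged: number[][] = [];
  for (const [lo, hi] of windows) {
    const last = merged[merged.length - 1];
    if (last && lo <= last[1] + 1) {
      last[1] = Math.max(last[1], hi);
    } else {
      merged.push([lo, hi]);
    }
  }
  // Turn each [lo, hi] window into the list of message indices it covers.
  return merged.map(([lo, hi]) => Array.from({ length: hi - lo + 1 }, (_, k) => lo + k));
}

// topK: 3 hits at positions 10, 12 and 30 in a 40-message thread.
const groups = expandHits([10, 12, 30], 40, { before: 2, after: 1 });
console.log(groups);
```

Here the hits at positions 10 and 12 overlap, so they collapse into one window covering messages 8 to 13, and the recall returns 10 messages rather than the 12-message maximum.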

Give working memory a template and the agent fills it in over time — a structured profile beats free-form notes.

workingMemory: {
  enabled: true,
  template: `
# User Profile
- Name:
- Timezone:
- Preferences:
- Current goals:
`,
}
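A template gives the profile a fixed shape, so an update can change one field without disturbing the rest. The source describes the agent itself updating working memory as it learns. The hypothetical `fillTemplate` helper below only imitates the outcome for `- Field:` lines, to show why a structured profile stays easy to read and revise. It is not how Mastra performs the update.

```typescript
// Hypothetical helper: fill or overwrite "- Field: value" lines in a markdown template.
function fillTemplate(template: string, values: Record<string, string>): string {
  return template
    .split('\n')
    .map((line) => {
      const m = line.match(/^- ([^:]+):\s*(.*)$/);
      if (!m) return line; // headings and blank lines pass through unchanged
      const key = m[1].trim();
      // A new value wins; otherwise keep whatever the field already holds.
      const value = values[key] ?? m[2];
      return value ? `- ${key}: ${value}` : `- ${key}:`;
    })
    .join('\n');
}

const template = `
# User Profile
- Name:
- Timezone:
- Preferences:
- Current goals:
`;

// First conversation fills two fields; a later one adds a goal and keeps the rest.
const profile = fillTemplate(template, { Name: 'Ada', Timezone: 'Europe/London' });
const updated = fillTemplate(profile, { 'Current goals': 'learn Mastra memory' });
console.log(updated);
```

Because every field has a fixed line, later updates leave earlier facts in place, which is harder to guarantee with free-form notes.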

Memory is scoped per resource (usually a user) and thread (a conversation). Pass them when you call the agent so continuity stays isolated per user/conversation:

await memoryAgent.generate('What did I ask you to remember?', {
  memory: { resource: 'user-123', thread: 'support-chat' },
});
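The scoping rule can be pictured as a store keyed by the pair (resource, thread). The sketch below is a hypothetical in-memory class, not Mastra's storage layer, which persists messages in whatever store you configured. It shows that one user's two conversations, and another user's conversation, each keep separate histories.

```typescript
// Hypothetical in-memory store illustrating per-(resource, thread) isolation.
type Msg = { role: 'user' | 'assistant'; content: string };

class ScopedHistory {
  private threads = new Map<string, Msg[]>();

  // Encode the pair unambiguously so "a:b" + "c" never collides with "a" + "b:c".
  private key(resource: string, thread: string): string {
    return JSON.stringify([resource, thread]);
  }

  append(resource: string, thread: string, msg: Msg): void {
    const k = this.key(resource, thread);
    const list = this.threads.get(k) ?? [];
    list.push(msg);
    this.threads.set(k, list);
  }

  // Return up to lastMessages of the most recent messages in one scope only.
  recent(resource: string, thread: string, lastMessages: number): Msg[] {
    const list = this.threads.get(this.key(resource, thread)) ?? [];
    return list.slice(Math.max(0, list.length - lastMessages));
  }
}

const store = new ScopedHistory();
store.append('user-123', 'support-chat', { role: 'user', content: 'My order is #42.' });
store.append('user-123', 'billing', { role: 'user', content: 'Please resend my invoice.' });
store.append('user-456', 'onboarding', { role: 'user', content: 'Hi!' });

const seen = store.recent('user-123', 'support-chat', 20);
const billing = store.recent('user-123', 'billing', 20);
console.log(seen, billing);
```

Keying on `JSON.stringify([resource, thread])` rather than plain string concatenation keeps two different pairs from ever mapping to the same key.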

Reference: Memory overview · Working memory · Semantic recall

Next: RAG — ground answers in your own documents.