
RAG

Ground answers

RAG (retrieval-augmented generation) grounds an agent’s answers in your documents. The pipeline always has the same shape: split documents into chunks, turn each chunk into an embedding, store those vectors, and at query time retrieve the chunks closest to the question and feed them to the model as context. Mastra ships building blocks for each stage, so you don’t have to wire it up from scratch.

The RAG pipeline

Figure: 1. Chunk (MDocument) → 2. Embed (embedder model) → 3. Store (vector index) → 4. Retrieve (top-K by similarity) → 5. Augment (answer from context)
Build-time: chunk → embed → store. Query-time: embed the question, retrieve nearest chunks, and pass them to the agent as grounding context.
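To make the "retrieve nearest chunks" step concrete, here is a minimal in-memory sketch of top-K retrieval by cosine similarity. The names `StoredChunk`, `cosineSimilarity`, and `topK` are hypothetical, and the three-dimensional vectors are toy values; a real vector store would do this work with an index rather than a full scan.

```typescript
// Hypothetical in-memory sketch; not part of Mastra's API.
type StoredChunk = { text: string; vector: number[] };

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Score every stored chunk against the query vector and keep the K best.
function topK(query: number[], chunks: StoredChunk[], k: number): StoredChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.chunk);
}

// Toy "embeddings"; real ones come from an embedding model.
const storedChunks: StoredChunk[] = [
  { text: 'Refund policy', vector: [1, 0, 0] },
  { text: 'Shipping times', vector: [0, 1, 0] },
  { text: 'Returns and refunds', vector: [0.9, 0.1, 0] },
];

const nearest = topK([1, 0, 0], storedChunks, 2);
console.log(nearest.map((c) => c.text)); // [ 'Refund policy', 'Returns and refunds' ]
```

The query vector must come from the same embedding model as the stored vectors; otherwise the similarity scores are meaningless.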

Split a document, embed the chunks, and upsert them into a vector store. The embedder below is an AI SDK embedding model (the retrieval example later uses a model-router string instead); the vector store is whichever backend you registered.

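import { MDocument } from '@mastra/rag';
import { openai } from '@ai-sdk/openai';
import { embedMany } from 'ai';

// Assumes `longText`, a configured `pgVector` store (PgVector from '@mastra/pg'),
// and an existing 'docs' index are set up elsewhere.
const doc = MDocument.fromText(longText);
// 1. Chunk: split the document into overlapping pieces.
const chunks = await doc.chunk({ strategy: 'recursive', size: 512, overlap: 50 });
// 2. Embed: one vector per chunk, in the same order as `chunks`.
const { embeddings } = await embedMany({
  model: openai.embedding('text-embedding-3-small'),
  values: chunks.map((c) => c.text),
});
// 3. Store: keep each chunk's text as metadata so retrieval can return it.
await pgVector.upsert({
  indexName: 'docs',
  vectors: embeddings,
  metadata: chunks.map((c) => ({ text: c.text })),
});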
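To illustrate what the size and overlap parameters control, here is a minimal fixed-size sliding-window chunker. It is a simplified stand-in, not the 'recursive' strategy used above, and the function name `chunkText` is hypothetical. Sizes are counted in characters here purely for simplicity.

```typescript
// Fixed-size sliding window: each chunk is up to `size` characters long and
// repeats the last `overlap` characters of the previous chunk.
function chunkText(text: string, size: number, overlap: number): string[] {
  if (size <= 0) throw new Error('size must be positive');
  if (overlap < 0 || overlap >= size) throw new Error('overlap must be in [0, size)');
  const step = size - overlap; // how far the window advances each iteration
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // this window already reached the end
  }
  return chunks;
}

const pieces = chunkText('abcdefghijklmnopqrstuvwxyz', 10, 3);
console.log(pieces); // [ 'abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz' ]
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice.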

The cleanest path is a vector query tool — hand it to an agent and the model retrieves grounding context on its own.

import { createVectorQueryTool } from '@mastra/rag';
import { ModelRouterEmbeddingModel } from '@mastra/core/llm';
import { Agent } from '@mastra/core/agent';

const vectorQueryTool = createVectorQueryTool({
  // Must match the name the vector store is registered under.
  vectorStoreName: 'pgVector',
  indexName: 'docs',
  // Use the same embedding model that produced the stored vectors.
  model: new ModelRouterEmbeddingModel('openai/text-embedding-3-small'),
});

// Give it to an agent like any other tool:
export const docsAgent = new Agent({
  id: 'docs-agent',
  name: 'Docs Agent',
  instructions: 'Answer using retrieved context. Cite the source chunks.',
  model: 'openai/gpt-4o',
  tools: { vectorQueryTool },
});
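The agent handles the augment step itself once it has the tool, but it can help to see what "answer from context, cite the chunks" amounts to. The sketch below assembles a grounded prompt by hand. The `RetrievedChunk` shape, the `buildGroundedPrompt` helper, and the sample chunks are all hypothetical illustrations, not what the vector query tool returns.

```typescript
// Hypothetical shape for a retrieved chunk; real tools may return different fields.
type RetrievedChunk = { id: string; text: string; score: number };

// Number each chunk so the model can cite it as [n], then append the question.
function buildGroundedPrompt(question: string, chunks: RetrievedChunk[]): string {
  const contextBlock = chunks
    .map((chunk, i) => `[${i + 1}] (source: ${chunk.id})\n${chunk.text}`)
    .join('\n\n');
  return [
    'Answer using only the context below. Cite sources as [n].',
    '',
    'Context:',
    contextBlock,
    '',
    `Question: ${question}`,
  ].join('\n');
}

// Made-up example data for illustration only.
const groundedPrompt = buildGroundedPrompt('How long do refunds take?', [
  { id: 'policy.md#2', text: 'Refunds are issued within 5 business days.', score: 0.91 },
  { id: 'faq.md#7', text: 'Contact support to start a refund.', score: 0.84 },
]);
console.log(groundedPrompt);
```

Numbering the chunks gives the model a stable handle for citations, which is what the agent instructions above ask for.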

The pipeline is backend-agnostic: swap the store and keep the chunking, embedding, and retrieval code. Common targets:

| Store | Package | Notes |
| --- | --- | --- |
| PgVector | @mastra/pg | Postgres + pgvector; pairs with PostgresStore for memory too. |
| LibSQL | @mastra/libsql | LibSQLVector; zero-infra option for local development. |
| Pinecone | @mastra/pinecone | Managed, serverless vector DB. |
| Qdrant | @mastra/qdrant | Self-host or cloud. |
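Swapping stores works because application code depends on a small upsert/query surface rather than on a specific database. The sketch below shows that idea with a hypothetical `VectorStoreLike` interface and an `InMemoryStore` test double. Neither is Mastra's actual interface, which is richer, and the dot-product scoring assumes normalized vectors.

```typescript
// Hypothetical minimal interface, for illustration only.
type UpsertArgs = { indexName: string; vectors: number[][]; metadata: { text: string }[] };
type QueryArgs = { indexName: string; queryVector: number[]; topK: number };
type QueryHit = { text: string; score: number };

interface VectorStoreLike {
  upsert(args: UpsertArgs): Promise<void>;
  query(args: QueryArgs): Promise<QueryHit[]>;
}

// A toy backend that keeps everything in a Map.
class InMemoryStore implements VectorStoreLike {
  private rowsByIndex = new Map<string, { vector: number[]; text: string }[]>();

  async upsert({ indexName, vectors, metadata }: UpsertArgs): Promise<void> {
    const rows = this.rowsByIndex.get(indexName) ?? [];
    vectors.forEach((vector, i) => rows.push({ vector, text: metadata[i].text }));
    this.rowsByIndex.set(indexName, rows);
  }

  async query({ indexName, queryVector, topK }: QueryArgs): Promise<QueryHit[]> {
    const rows = this.rowsByIndex.get(indexName) ?? [];
    return rows
      .map((row) => ({
        text: row.text,
        // Dot product as the similarity score (assumes normalized vectors).
        score: row.vector.reduce((sum, v, i) => sum + v * queryVector[i], 0),
      }))
      .sort((a, b) => b.score - a.score)
      .slice(0, topK);
  }
}

// Application code sees only the interface, so any backend can be passed in.
async function indexAndSearch(backend: VectorStoreLike): Promise<QueryHit[]> {
  await backend.upsert({
    indexName: 'docs',
    vectors: [[1, 0], [0, 1]],
    metadata: [{ text: 'about refunds' }, { text: 'about shipping' }],
  });
  return backend.query({ indexName: 'docs', queryVector: [0, 1], topK: 1 });
}

indexAndSearch(new InMemoryStore()).then((hits) => console.log(hits));
```

An in-memory double like this is also handy in unit tests, where spinning up a real database would be overkill.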

Reference: RAG overview · Chunking & embedding · Retrieval

Next: Multi-agent systems — coordinate specialists under a supervisor.