# ZajLibrary — handbook

You are a **technical tutor**. You help the user understand the handbook below and apply it to their own situation.

**Mission:** Explain the handbook below and help the user apply it to what they are building.

## Metadata
- title: RAG
- url: https://library.zajapps.com/ai-systems/agent-frameworks/mastra/learn/handbooks/rag/
- shelf: Learn & Understand
- doc_type: handbook
- status: current
- kind: handbook
- collection: mastra
- category: ai-systems
- subcategory: agent-frameworks
- topic: mastra
- description: Retrieval-augmented generation in Mastra — chunk, embed, store, and retrieve your own documents so an agent answers from your data, not just its training set.
- tags: mastra, rag, vectors, embeddings

## How to use this page
- Use the body as the source of truth: explain the ideas, then help the user apply them to their own situation.
- Surface trade-offs, decisions, and prerequisites — not just definitions.
- Cite section headings (`##`, `###`) when quoting or referring to specific parts.

---

# RAG

<p class="eyebrow">Ground answers</p>

**RAG** (retrieval-augmented generation) grounds an agent's answers in *your* documents. The shape is always the same pipeline: split documents into **chunks**, turn each chunk into an **embedding**, **store** those vectors, and at query time **retrieve** the closest chunks to feed the model as context. Mastra ships building blocks for each stage so you don't wire it from scratch.

<StageFlow
  title="The RAG pipeline"
  caption="Build-time: chunk → embed → store. Query-time: embed the question, retrieve nearest chunks, and pass them to the agent as grounding context."
  stages={[
    { label: 'Chunk', sub: 'MDocument' },
    { label: 'Embed', sub: 'embedder model' },
    { label: 'Store', sub: 'vector index', tone: 'core' },
    { label: 'Retrieve', sub: 'top-K by similarity' },
    { label: 'Augment', sub: 'answer from context', tone: 'good' },
  ]}
/>
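
To make the Retrieve stage concrete, here is a minimal, dependency-free sketch of top-K retrieval by cosine similarity. It is not Mastra's implementation: `StoredChunk`, `cosineSimilarity`, and `topK` are hypothetical names chosen for illustration, and the hand-written three-dimensional vectors stand in for real embeddings.

```typescript
// Illustrative only: hypothetical helpers, not part of Mastra.
interface StoredChunk {
  text: string;
  vector: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector has zero magnitude.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every stored chunk against the query vector and keep the k best.
function topK(query: number[], store: StoredChunk[], k: number): StoredChunk[] {
  return store
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.chunk);
}

// Toy "embeddings" standing in for real model output.
const store: StoredChunk[] = [
  { text: 'Refunds are processed within 5 days.', vector: [0.9, 0.1, 0.0] },
  { text: 'Our office is closed on Sundays.', vector: [0.0, 0.2, 0.9] },
  { text: 'Refund requests need an order number.', vector: [0.8, 0.3, 0.1] },
];

// A query vector pointing along the first axis ranks the two refund chunks first.
const nearest = topK([1, 0, 0], store, 2);
```

A real vector store does the same ranking, typically with an approximate index so it does not have to score every row.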

## Build the index

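Split a document, embed the chunks, and upsert them into a vector store. The embedder in this example is an AI SDK embedding model (`openai.embedding(...)`); a model-router string such as `'openai/text-embedding-3-small'` also works where the API accepts one. The vector store is whichever backend you registered.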

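```ts
import { MDocument } from '@mastra/rag';
import { openai } from '@ai-sdk/openai';
import { embedMany } from 'ai';

// `longText` (the source string) and `pgVector` (a configured vector store
// instance) are assumed to be defined elsewhere in your app.

// 1. Split the document into overlapping chunks.
const doc = MDocument.fromText(longText);
const chunks = await doc.chunk({ strategy: 'recursive', size: 512, overlap: 50 });

// 2. Embed every chunk in one batch call.
const { embeddings } = await embedMany({
  model: openai.embedding('text-embedding-3-small'),
  values: chunks.map((c) => c.text),
});

// 3. Write the vectors to the `docs` index.
await pgVector.upsert({ indexName: 'docs', vectors: embeddings });
```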
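
As a rough illustration of what `size` and `overlap` control, the sketch below splits text with a fixed sliding window measured in characters. It is a simplification and an assumption on my part: Mastra's `recursive` strategy is not reproduced here, and `chunkWithOverlap` is a hypothetical helper.

```typescript
// Illustrative only: a fixed-size sliding window, not Mastra's recursive splitter.
function chunkWithOverlap(text: string, size: number, overlap: number): string[] {
  if (size <= 0) throw new Error('size must be positive');
  if (overlap < 0 || overlap >= size) throw new Error('overlap must be in [0, size)');

  const step = size - overlap; // how far the window advances each iteration
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    // Stop once the window has reached the end of the text.
    if (start + size >= text.length) break;
  }
  return chunks;
}

const sample = 'abcdefghijklmnopqrstuvwxyz';
const pieces = chunkWithOverlap(sample, 10, 3);
// pieces: ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk repeats the last three characters of the one before it, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.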

## Retrieve at query time

The cleanest path is a **vector query tool** — hand it to an agent and the model retrieves grounding context on its own.

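```ts
import { Agent } from '@mastra/core/agent';
import { createVectorQueryTool } from '@mastra/rag';
import { ModelRouterEmbeddingModel } from '@mastra/core/llm';

// The tool embeds the query with the same embedding model used at index time,
// then searches the `docs` index in the store registered as `pgVector`.
const vectorQueryTool = createVectorQueryTool({
  vectorStoreName: 'pgVector',
  indexName: 'docs',
  model: new ModelRouterEmbeddingModel('openai/text-embedding-3-small'),
});

// Give it to an agent like any other tool:
export const docsAgent = new Agent({
  id: 'docs-agent',
  name: 'Docs Agent',
  instructions: 'Answer using retrieved context. Cite the source chunks.',
  model: 'openai/gpt-4o',
  tools: { vectorQueryTool },
});
```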
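
The Augment step can be pictured as stitching the retrieved chunks into a numbered context block that the model can cite. The sketch below shows that general pattern under stated assumptions; it is not Mastra's internal prompt format, and `RetrievedChunk`, `formatContext`, and the `source` field are hypothetical.

```typescript
// Illustrative only: hypothetical shapes, not Mastra's tool output.
interface RetrievedChunk {
  text: string;
  source: string; // e.g. a file name or URL stored alongside the vector
  score: number; // similarity score reported by the vector store
}

// Drop weak matches, then number the rest so the model can cite them as [1], [2], ...
function formatContext(chunks: RetrievedChunk[], minScore: number): string {
  return chunks
    .filter((c) => c.score >= minScore)
    .map((c, i) => `[${i + 1}] (${c.source}) ${c.text}`)
    .join('\n');
}

const context = formatContext(
  [
    { text: 'Refunds are processed within 5 days.', source: 'policy.md', score: 0.91 },
    { text: 'Our office is closed on Sundays.', source: 'hours.md', score: 0.12 },
  ],
  0.5,
);
// context === '[1] (policy.md) Refunds are processed within 5 days.'
```

Filtering by a minimum score keeps loosely related chunks out of the prompt, which matters for an instruction like "Cite the source chunks".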

## Vector stores

The pipeline is backend-agnostic — swap the store, keep the code. Common targets:

| Store | Package | Notes |
| --- | --- | --- |
| **PgVector** | `@mastra/pg` | Postgres + pgvector; pairs with `PostgresStore` for memory too. |
| **LibSQL** | `@mastra/libsql` | `LibSQLVector` — zero-infra local/dev option. |
| **Pinecone** | `@mastra/pinecone` | Managed, serverless vector DB. |
| **Qdrant** | `@mastra/qdrant` | Self-host or cloud. |

> [!TIP]
> Chunk size is the highest-leverage knob: too large dilutes relevance, too small loses context. Start at ~512 tokens with light overlap and tune against an [eval set](/ai-systems/agent-frameworks/mastra/learn/handbooks/evals/).
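
A sketch of how tuning against an eval set might look: compute a hit rate (the fraction of questions whose top-k results contain the expected phrase) for each candidate chunk size and compare. `EvalCase`, `hitRate`, and `makeKeywordRetriever` are hypothetical names, and the keyword-overlap retriever is a stand-in so the example runs without an embedding model; in practice the retriever would query your vector index, rebuilt at each chunk size.

```typescript
// Illustrative only: hypothetical eval helpers, not part of Mastra.
interface EvalCase {
  question: string;
  expected: string; // a phrase the correct chunk must contain
}

type Retriever = (question: string, k: number) => string[];

// Fraction of questions whose top-k results contain the expected phrase.
function hitRate(cases: EvalCase[], retrieve: Retriever, k: number): number {
  if (cases.length === 0) return 0;
  const hitCount = cases.filter((c) =>
    retrieve(c.question, k).some((chunk) => chunk.indexOf(c.expected) !== -1),
  ).length;
  return hitCount / cases.length;
}

// Stand-in retriever: ranks chunks by naive keyword overlap instead of embeddings.
function makeKeywordRetriever(chunks: string[]): Retriever {
  return (question, k) => {
    const words = question
      .toLowerCase()
      .split(/\W+/)
      .filter((w) => w.length > 3);
    return chunks
      .map((chunk) => ({
        chunk,
        score: words.filter((w) => chunk.toLowerCase().indexOf(w) !== -1).length,
      }))
      .sort((a, b) => b.score - a.score)
      .slice(0, k)
      .map((entry) => entry.chunk);
  };
}

const cases: EvalCase[] = [
  { question: 'How long do refunds take?', expected: '5 days' },
  { question: 'When is the office closed?', expected: 'Sundays' },
];
const chunks = ['Refunds are processed within 5 days.', 'Our office is closed on Sundays.'];

const rate = hitRate(cases, makeKeywordRetriever(chunks), 1);
// rate === 1: both questions retrieve the chunk containing the expected phrase
```

Running the same cases against indexes built with different `size` and `overlap` values gives a like-for-like number to compare, rather than judging retrieval quality by eye.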

---

**Reference:** [RAG overview](https://mastra.ai/docs/rag/overview) · [Chunking & embedding](https://mastra.ai/docs/rag/chunking-and-embedding) · [Retrieval](https://mastra.ai/docs/rag/retrieval)

Next: [**Multi-agent systems**](/ai-systems/agent-frameworks/mastra/learn/handbooks/multi-agent/) — coordinate specialists under a supervisor.
