
Add a chatbot to your app

Learn · guide · AI System

This guide is for you if: you want a chat assistant that answers from your content, not the open web. Worked example: the live assistant at library.zajapps.com/chat.

A useful product chatbot doesn’t free-associate — it answers from your content and cites it. The pattern is Retrieval-Augmented Generation (RAG): the model searches your library first, then answers grounded in what it found. ZajLibrary’s /chat does exactly this, reusing the same search engine that powers its MCP server.


A React island for the UI, one streaming API route, and the model calling a search tool — with each turn saved to Postgres.

graph TB
User["Visitor on /chat"] --> Island["React island<br/>(assistant-ui)"]
Island -->|"POST messages"| API["/api/chat<br/>(AI SDK streamText)"]
API -->|"tool call"| Search["search_library<br/>(same engine as MCP)"]
Search --> Index[("content index")]
API -->|"stream answer + citations"| Island
API -->|"save turn"| DB[("Postgres<br/>threads + messages")]
Island -->|"list / resume"| DB

The stack: assistant-ui (chat primitives), the Vercel AI SDK (streamText + tools), OpenRouter for the model, and Neon Postgres (via Drizzle) for persistence — all inside the existing Astro site.


/api/chat runs streamText with a backend tool the model can call to search your content, then returns toUIMessageStreamResponse() to stream the answer back. Ground the model with a system prompt that tells it to search first, then answer, and cite its sources:

  • Define a search_library tool whose execute calls your existing search; return title + url + snippet.
  • stopWhen: stepCountIs(8) caps the tool loop at eight steps, so a model that keeps searching cannot run indefinitely.
  • Cite sources as markdown links so answers are verifiable.
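The tool side of the list above can be sketched without any SDK. This is a minimal illustration, not ZajLibrary's actual code: `IndexedDoc`, `searchLibrary`, `toMarkdownCitations`, the term-matching scorer, the snippet length, and the wording of `SYSTEM_PROMPT` are all assumptions standing in for your existing search engine and prompt. The part taken from the guide is the return shape: title, url, and snippet per hit.

```typescript
// Shape returned to the model for each hit: enough to answer from and to cite.
interface SearchHit {
  title: string;
  url: string;
  snippet: string;
}

// Hypothetical stand-in for a document in your existing content index.
interface IndexedDoc {
  title: string;
  url: string;
  body: string;
}

const SNIPPET_LENGTH = 160; // assumed budget; tune for your model's context

// Hypothetical wording for a "search first, then answer, and cite" prompt.
const SYSTEM_PROMPT = [
  "Search the library before answering.",
  "Answer only from the search results.",
  "Cite every source as a markdown link.",
].join(" ");

// What the search_library tool's execute could do: query in, compact hits out.
// The scorer is a naive term count; swap in your real search call here.
function searchLibrary(index: IndexedDoc[], query: string, limit = 5): SearchHit[] {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  return index
    .map((doc) => {
      const haystack = `${doc.title} ${doc.body}`.toLowerCase();
      const score = terms.filter((t) => haystack.includes(t)).length;
      return { doc, score };
    })
    .filter(({ score }) => score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map(({ doc }) => ({
      title: doc.title,
      url: doc.url,
      snippet: doc.body.slice(0, SNIPPET_LENGTH),
    }));
}

// Render hits in the citation format the prompt asks the model to use.
function toMarkdownCitations(hits: SearchHit[]): string {
  return hits.map((h) => `- [${h.title}](${h.url})`).join("\n");
}
```

Returning only these three fields keeps each tool result small, which matters when the model may call the tool more than once within the step cap.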

assistant-ui’s headless primitives (ThreadPrimitive, MessagePrimitive, ComposerPrimitive) need no Tailwind — style them with your own tokens. Wire the runtime with useChatRuntime({ transport: new AssistantChatTransport({ api: '/api/chat' }) }) inside <AssistantRuntimeProvider>. In Astro, mount it as a client:only="react" island.

Save every turn in the route’s onFinish (best-effort — a DB hiccup must never break the chat). Key threads by an owner id (an anonymous localStorage id to start; a real user id once you add auth). Add list/load endpoints and a “recent chats” sidebar; resume a thread via ?thread=<id>.
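As a rough sketch of the two ideas in this paragraph, the helpers below wrap a save call so it can never throw into the chat, and mint a stable anonymous owner id. Everything here is illustrative: `ChatTurn`, `SaveTurn`, `saveTurnBestEffort`, `KeyValueStore`, `getOwnerId`, and the `chat-owner-id` key are hypothetical names, and the real save would be whatever insert your database layer provides.

```typescript
interface ChatTurn {
  threadId: string;
  ownerId: string; // anonymous id at first, a real user id once auth exists
  role: "user" | "assistant";
  text: string;
}

// Anything that can persist a turn, for example a database insert in the route.
type SaveTurn = (turn: ChatTurn) => Promise<void>;

// Best-effort persistence: log and report failure, never throw to the caller.
async function saveTurnBestEffort(save: SaveTurn, turn: ChatTurn): Promise<boolean> {
  try {
    await save(turn);
    return true;
  } catch (err) {
    console.error("chat turn not saved", err);
    return false;
  }
}

// Minimal storage surface; window.localStorage satisfies it in the browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const OWNER_KEY = "chat-owner-id"; // hypothetical storage key

// Reuse the stored anonymous owner id, or create one and remember it.
function getOwnerId(store: KeyValueStore, newId: () => string): string {
  const existing = store.getItem(OWNER_KEY);
  if (existing) return existing;
  const created = newId();
  store.setItem(OWNER_KEY, created);
  return created;
}
```

Called from onFinish, the wrapper means a failed write costs one history entry rather than the user's answer. Passing the store and id generator as parameters keeps the owner-id logic usable outside the browser; in the island you might pass `window.localStorage` and a UUID generator.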

Render tool calls as rich UI with makeAssistantToolUI — e.g. show search_library results as clickable cards while the assistant works, instead of raw JSON.


Real lessons from building this one:

  • Pick the model for the job. A reasoning model over-searched (12 tool calls) and blew the step cap before answering. A direct tool-calling model (gpt-4.1-mini) + a prompt that says “search once, then answer” gave a clean, cited response.
  • Secrets at runtime, not build time. Read keys and DB URLs from process.env (resolved at runtime), not import.meta.env (which can be inlined into the bundle at build time). Locally, load .env.local into process.env for the dev server.
  • One retrieval layer, many surfaces. The chatbot’s search_library tool is the same engine behind the MCP server and (conceptually) site search — build retrieval once, reuse everywhere.
  • Search → chat hand-off is just a query param. /chat?q=<query> lets any search box escalate to the assistant.
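Two of the lessons above reduce to small helpers, sketched here under stated assumptions. `requireEnv`, `chatUrlForQuery`, and `readChatParams` are hypothetical names, not part of the site's code. The env helper takes the environment as a parameter so the lookup happens when the function is called at runtime; the URL helpers show one way the ?q= hand-off and the ?thread= resume parameter could be written and read.

```typescript
// Runtime secret lookup: pass process.env in, fail loudly if a key is missing.
function requireEnv(env: Record<string, string | undefined>, name: string): string {
  const value = env[name];
  if (!value) throw new Error(`Missing environment variable: ${name}`);
  return value;
}

// Search box side: escalate a query to the assistant.
function chatUrlForQuery(query: string): string {
  return `/chat?q=${encodeURIComponent(query)}`;
}

// Chat page side: read the hand-off query and an optional thread to resume.
// Accepts a location.search style string, with or without the leading "?".
function readChatParams(search: string): { q: string | null; thread: string | null } {
  const params = new URLSearchParams(search);
  return { q: params.get("q"), thread: params.get("thread") };
}
```

In the API route this might be called as `requireEnv(process.env, "DATABASE_URL")`, where the variable name is only an example. Encoding the query with encodeURIComponent keeps characters such as ? and & from corrupting the URL.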

Open /chat and ask a question, or read Build an MCP server for the retrieval side.