- Let users ask questions in natural language
- Use Vectorize to retrieve the most relevant document chunks
- Use Workers AI to generate answers grounded in the retrieved content
- Return the result to the frontend
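This route assumes your Worker already has Workers AI and Vectorize bindings named `AI` and `VECTORIZE`. If you haven't set those up yet, the relevant `wrangler.toml` entries look roughly like this (the index name here is a placeholder — use your own):

```toml
[ai]
binding = "AI"

[[vectorize]]
binding = "VECTORIZE"
index_name = "docs-index"  # placeholder: the Vectorize index you created earlier
```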
In your `index.ts`, replace your existing chat route with the following RAG-enabled logic:
```ts
app.post('/api/chat', async (c) => {
  const env = c.env;
  const { question } = await c.req.json();

  try {
    // Step 1: Embed the user question
    const embeddingResponse = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
      text: question,
    });
    const queryEmbedding = embeddingResponse.data[0];

    // Step 2: Query Vectorize for relevant context
    const vectorResult = await env.VECTORIZE.query(queryEmbedding, {
      topK: 3,
      returnValues: true,
      returnMetadata: 'all',
    });
    const matches = vectorResult.matches || [];
    const contextChunks = matches.map((m) => m.metadata?.text).filter(Boolean);

    // Step 3: Format the retrieved context for the model
    const contextBlock = contextChunks.length
      ? `Context:\n${contextChunks.map((chunk) => `- ${chunk}`).join('\n')}`
      : 'No relevant context found.';
    const prompt = [
      { role: 'system', content: contextBlock },
      { role: 'user', content: question },
    ];

    // Step 4: Generate the AI response
    const response = await env.AI.run(
      '@cf/meta/llama-4-scout-17b-16e-instruct',
      {
        messages: prompt,
      },
      {
        gateway: {
          id: 'cf-gateway', // Replace with your AI Gateway ID
          skipCache: true,
        },
      },
    );

    return c.json({
      message: response,
      context: contextChunks,
    });
  } catch (err) {
    console.error('RAG error:', err);
    return c.json({ error: 'Failed to generate response' }, 500);
  }
});
```

Then deploy your Worker:
```sh
npm run deploy
```

You’ve now built a full-stack RAG (Retrieval-Augmented Generation) chat system:
- 💬 Users send a question
- 🧠 It’s embedded and matched against your vector store
- 🔍 Relevant content is retrieved and passed into the AI model
- 🤖 The AI generates a grounded response
- 🌐 The frontend receives the answer + matched context
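End to end, the flow above can be sketched as a plain function with stubbed services — the `embed`, `queryIndex`, and `generate` helpers below are local stand-ins for illustration, not the real `AI` and `VECTORIZE` bindings:

```typescript
type Match = { score: number; metadata?: { text?: string } };

// Local stand-ins for the real bindings (assumptions for illustration only).
const embed = (text: string): number[] => [text.length % 7, text.length % 3];
const queryIndex = (_vec: number[]): Match[] => [
  { score: 0.92, metadata: { text: 'Workers AI runs models at the edge.' } },
  { score: 0.81, metadata: { text: 'Vectorize stores embeddings for retrieval.' } },
];
const generate = (system: string, question: string): string =>
  `Answered "${question}" using:\n${system}`;

function ragAnswer(question: string): { message: string; context: string[] } {
  const vec = embed(question);                // 1. embed the question
  const matches = queryIndex(vec);            // 2. retrieve nearest chunks
  const context = matches
    .map((m) => m.metadata?.text)
    .filter((t): t is string => Boolean(t));  // 3. collect chunk text
  const contextBlock = context.length
    ? `Context:\n${context.map((c) => `- ${c}`).join('\n')}`
    : 'No relevant context found.';
  const message = generate(contextBlock, question); // 4. grounded generation
  return { message, context };                      // 5. back to the frontend
}
```

The shape mirrors the Worker route: swap the stubs for `env.AI.run(...)` and `env.VECTORIZE.query(...)` and you have the production version.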
All powered by Cloudflare Workers, Vectorize, Workflows, and Workers AI.