
Build a Personal Knowledge Base Assistant


This tutorial walks you through building a personal knowledge base assistant from scratch. The goal is not to stack frameworks but to understand the full RAG (retrieval-augmented generation) flow, from a user question to a grounded answer.

By the end, you should be able to explain:

Final Result

After a user uploads or stores Markdown files, PDFs, or web notes, they can ask:

In my 30-day AI learning plan, what should I do in week 2?

The system returns:

Week 2 should focus on RAG basics, including reading the RAG docs, preparing test documents, implementing chunking, retrieval, a minimal RAG flow, source citations, and 20 evaluation questions.

Sources:
- 30-day-plan.md / Week 2

Recommended Stack

For beginners, start with a small stack:

| Module | Recommendation |
| --- | --- |
| Language | Python |
| API service | FastAPI |
| Local storage | SQLite |
| Vector search | Start with in-memory or SQLite, then move to pgvector/Qdrant |
| Document format | Support Markdown first, then PDF |
| Evaluation | CSV/JSON + a simple scoring script |

Do not introduce too many frameworks on day one. First make the data flow work, then replace components as needed.

Overall Architecture

Online query flow:

User question
  -> Backend API
  -> Query rewrite
  -> Retrieve relevant chunks
  -> Build prompt
  -> Call model
  -> Parse answer and citations
  -> Log result and feedback

Offline indexing flow:

Document ingest
  -> Document parsing
  -> Text cleanup
  -> Chunking
  -> Generate embeddings
  -> Store chunks + metadata + vectors

Step 1: Make a Minimal Chat Call

Do not start with a knowledge base. First verify that your model API call works.

You should be able to:

Checkpoints:

Step 2: Read a Local Document

Start with Markdown files. Read a file, then put its content and the user question into the prompt.

Example prompt:

You are a personal knowledge base assistant. Answer only from the material below.

Material:
{document_text}

User question:
{question}

If the material does not contain the answer, say "I could not find the answer in the provided material."
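A minimal sketch of this step in Python, assuming the document is a UTF-8 Markdown file on disk. `PROMPT_TEMPLATE` and `build_prompt_from_file` are illustrative names, not part of any library; the template text is the example prompt above.

```python
from pathlib import Path

# The example prompt from this step; {document_text} and {question} are filled in below.
PROMPT_TEMPLATE = """You are a personal knowledge base assistant. Answer only from the material below.

Material:
{document_text}

User question:
{question}

If the material does not contain the answer, say "I could not find the answer in the provided material."
"""


def build_prompt_from_file(path: str, question: str) -> str:
    """Read a whole Markdown file and insert it, plus the question, into the prompt."""
    document_text = Path(path).read_text(encoding="utf-8")
    # Braces inside document_text are safe: format() does not re-parse substituted values.
    return PROMPT_TEMPLATE.format(document_text=document_text, question=question)
```

For example, `build_prompt_from_file("checklists/30-day-plan.md", "What should I do in week 2?")` returns the full prompt string, which you then send through the model call you verified in Step 1.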

This approach quickly runs into context length limits as documents grow, so you cannot keep sending entire documents to the model.

Step 3: Implement Chunking

A chunk is a smaller section of a document. Beginners can start with a simple strategy:

Example data structure:

{
  "chunk_id": "30-day-plan#week-2#001",
  "title": "30-Day Plan",
  "source": "checklists/30-day-plan.md",
  "section": "Week 2: RAG Basics",
  "text": "Day 8: Read the RAG docs..."
}

Good chunking is not about the number of chunks. It is about whether retrieval can find complete evidence for real user questions.
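One possible starting point, sketched in Python: split a Markdown file at its headings, then pack paragraphs into chunks of at most `max_chars` characters. The heading-based split, the 800-character default, and the `chunk_id` scheme (`{file stem}#{section slug}#{index}`) are assumptions to tune against your own documents; with this scheme the example chunk above would get the ID `30-day-plan#week-2-rag-basics#001`.

```python
import re
from pathlib import PurePosixPath


def slugify(text: str) -> str:
    """Lowercase, keep letters and digits, join the words with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", text.lower()))


def chunk_markdown(source: str, title: str, markdown: str, max_chars: int = 800) -> list[dict]:
    """Split at Markdown headings, then pack paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars still becomes one (oversized) chunk.
    """
    doc_slug = PurePosixPath(source).stem  # "checklists/30-day-plan.md" -> "30-day-plan"
    # Text before the first heading is attributed to the document title.
    sections: list[tuple[str, list[str]]] = [(title, [])]
    for line in markdown.splitlines():
        heading = re.match(r"^#{1,6}\s+(.*)", line)
        if heading:
            sections.append((heading.group(1).strip(), []))
        else:
            sections[-1][1].append(line)

    chunks = []
    for section, lines in sections:
        paragraphs = [p.strip() for p in "\n".join(lines).split("\n\n") if p.strip()]
        pieces, current = [], ""
        for para in paragraphs:
            # Start a new piece when adding this paragraph would exceed max_chars.
            if current and len(current) + 2 + len(para) > max_chars:
                pieces.append(current)
                current = para
            else:
                current = f"{current}\n\n{para}" if current else para
        if current:
            pieces.append(current)
        for i, piece in enumerate(pieces, start=1):
            chunks.append({
                "chunk_id": f"{doc_slug}#{slugify(section)}#{i:03d}",
                "title": title,
                "source": source,
                "section": section,
                "text": piece,
            })
    return chunks
```

Keeping the section heading on every chunk matters later: it is what lets an answer cite `30-day-plan.md / Week 2` instead of only a file name.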

Step 4: Start with Keyword Retrieval

Before adding vector search, build a keyword retrieval version. It is not fancy, but it helps you understand the retrieval flow.

Minimum requirements:

This step lets you run the main RAG flow before adding embeddings.
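A minimal keyword retriever, sketched in Python under simple assumptions: English text, lowercase alphanumeric tokens, a tiny hand-picked stopword list, and a score equal to the number of query-term occurrences in the chunk's section title and text. It expects chunk dicts shaped like the Step 3 example. Production systems usually rank with BM25 (for example via SQLite FTS5) rather than raw counts.

```python
import re
from collections import Counter

# Assumption: a tiny English stopword list; extend it for your own questions.
STOPWORDS = {"a", "an", "and", "do", "for", "how", "i", "in", "is", "my",
             "of", "on", "or", "should", "the", "to", "what"}


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens (no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_search(question: str, chunks: list[dict], top_k: int = 3) -> list[tuple[float, dict]]:
    """Return up to top_k (score, chunk) pairs, best first; chunks with no match are dropped."""
    query_terms = {t for t in tokenize(question) if t not in STOPWORDS}
    results = []
    for chunk in chunks:
        # Count every occurrence of each query term in the section title plus body.
        counts = Counter(tokenize(chunk["section"] + " " + chunk["text"]))
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            results.append((float(score), chunk))
    results.sort(key=lambda pair: pair[0], reverse=True)
    return results[:top_k]
```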

Step 5: Add Embeddings and Vector Retrieval

Embeddings convert text into vectors so you can perform semantic similarity search.

During indexing:

chunk text -> embedding model -> vector -> store

During querying:

question -> embedding model -> query vector -> similarity search -> top-k chunks
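The two flows above can be sketched with an in-memory index and cosine similarity. `toy_embed` is a hypothetical placeholder, not a real embedding model: it hashes words into a fixed-size vector, so it only captures word overlap. Swap in a real embedding model behind the same `str -> list[float]` shape to get semantic search. The class and function names are illustrative.

```python
import math
import re
import zlib
from typing import Callable

Vector = list[float]


def toy_embed(text: str, dim: int = 256) -> Vector:
    """Placeholder embedding: hashed bag of words, L2-normalized.

    It captures word overlap only, NOT semantic similarity.
    """
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # crc32 is deterministic across runs, unlike Python's built-in str hash.
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class InMemoryVectorIndex:
    """Indexing: chunk text -> embedding -> stored vector. Querying: question -> vector -> top-k."""

    def __init__(self, embed: Callable[[str], Vector] = toy_embed):
        self.embed = embed
        self.items: list[tuple[Vector, dict]] = []

    def add(self, chunk: dict) -> None:
        self.items.append((self.embed(chunk["text"]), chunk))

    def search(self, question: str, top_k: int = 3) -> list[tuple[float, dict]]:
        query_vec = self.embed(question)
        # Brute-force scan: fine for a personal knowledge base, not for millions of chunks.
        scored = [(cosine(query_vec, vec), chunk) for vec, chunk in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]
```

The same `add`/`search` interface could later be backed by pgvector or Qdrant without changing the code that calls it.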

Important notes:

Step 6: Build the RAG Prompt

Treat retrieved chunks as available evidence. Do not let the model freely invent answers.

Recommended template:

You are a personal knowledge base assistant.

Rules:
- Answer only from the available material
- Do not invent information that is not in the material
- Cite sources for every important claim
- If the material is insufficient, explain what is missing

Available material:
{retrieved_chunks}

User question:
{question}

Output format:
Answer:
{answer}

Sources:
- {source title} / {section}
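A sketch of filling this template from retrieved chunks, assuming chunk dicts with the Step 3 fields. The numbered `[1] title / section (chunk_id)` header on each chunk is an assumption, chosen so the model has something concrete to cite; the doubled braces keep `{answer}` and the source placeholders literal in the rendered prompt.

```python
# The recommended template; {{...}} renders as literal braces for the model to fill in.
RAG_TEMPLATE = """You are a personal knowledge base assistant.

Rules:
- Answer only from the available material
- Do not invent information that is not in the material
- Cite sources for every important claim
- If the material is insufficient, explain what is missing

Available material:
{retrieved_chunks}

User question:
{question}

Output format:
Answer:
{{answer}}

Sources:
- {{source title}} / {{section}}
"""


def format_chunks(chunks: list[dict]) -> str:
    """Render each chunk with a numbered header that carries its citation metadata."""
    blocks = []
    for i, chunk in enumerate(chunks, start=1):
        header = f"[{i}] {chunk['title']} / {chunk['section']} ({chunk['chunk_id']})"
        blocks.append(f"{header}\n{chunk['text']}")
    return "\n\n".join(blocks)


def build_rag_prompt(question: str, chunks: list[dict]) -> str:
    return RAG_TEMPLATE.format(retrieved_chunks=format_chunks(chunks), question=question)
```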

Step 7: Add Source Citations

Citations are not decoration. They are the trust layer of a RAG system. Each answer should trace back to:

Recommended API response:

{
  "answer": "string",
  "sources": [
    {
      "title": "30-Day Plan",
      "source": "checklists/30-day-plan.md",
      "section": "Week 2",
      "chunk_id": "30-day-plan#week-2#001",
      "score": 0.82
    }
  ]
}
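One way to produce this response shape in Python, sketched with standard-library dataclasses and assuming retrieval returns `(score, chunk)` pairs. `Source`, `AnswerResponse`, and `build_response` are illustrative names. This sketch attaches every hit you pass in; restricting the list to chunks the answer actually cites is a separate check.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Source:
    title: str
    source: str
    section: str
    chunk_id: str
    score: float


@dataclass
class AnswerResponse:
    answer: str
    sources: list[Source] = field(default_factory=list)


def build_response(answer: str, hits: list[tuple[float, dict]]) -> dict:
    """Turn (score, chunk) retrieval hits into the JSON-ready response shape above."""
    sources = [
        Source(
            title=chunk["title"],
            source=chunk["source"],
            section=chunk["section"],
            chunk_id=chunk["chunk_id"],
            score=round(score, 2),  # two decimals are enough for display and logs
        )
        for score, chunk in hits
    ]
    # asdict() recurses into nested dataclasses, giving plain dicts and lists.
    return asdict(AnswerResponse(answer=answer, sources=sources))
```

In FastAPI, the same shape can be declared as Pydantic models and set as the endpoint's `response_model`.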

Step 8: Handle Unanswerable Questions

When the material is insufficient, the system should refuse to answer or clearly state what is missing.

Common refusal conditions:

Example refusal:

I could not find a clear answer in the provided material. The retrieved content is mostly about the RAG learning plan and does not include deployment cost information.
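A minimal sketch of one refusal condition: refuse before calling the model when no retrieved chunk clears a score threshold. `generate` is a hypothetical stand-in for your prompt building and model call, and `min_score=0.5` is a placeholder: keyword counts and cosine similarities live on different scales, so the threshold must be tuned on your evaluation set.

```python
from typing import Callable

REFUSAL = "I could not find a clear answer in the provided material."


def answer_or_refuse(
    question: str,
    hits: list[tuple[float, dict]],
    generate: Callable[[str, list[dict]], str],
    min_score: float = 0.5,  # placeholder threshold; tune it per retriever
) -> str:
    """Skip the model call entirely when no (score, chunk) hit reaches min_score."""
    evidence = [chunk for score, chunk in hits if score >= min_score]
    if not evidence:
        return REFUSAL
    # Only chunks above the threshold are passed on as evidence.
    return generate(question, evidence)
```

Refusing before the model call also saves cost, but the model can still decide the evidence is insufficient, so keep the refusal rule in the prompt as well.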

Step 9: Build a Minimal Evaluation Set

Prepare 30 questions:

Each record can look like this:

{
  "id": "kb_001",
  "question": "What should I learn in week 2?",
  "expected_behavior": "Answer the RAG basics tasks and cite 30-day-plan",
  "required_sources": ["checklists/30-day-plan.md"],
  "tags": ["rag", "easy"]
}
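A simple scoring script, sketched in Python, assuming records are stored one JSON object per line (JSONL) and that `ask` is your own function returning the Step 7 response shape. It only checks `required_sources`; judging `expected_behavior` still needs a manual review or a separate grader, and unanswerable questions with an empty `required_sources` list would pass trivially, so check those for refusals separately.

```python
import json
from typing import Callable


def load_jsonl(path: str) -> list[dict]:
    """Read one JSON record per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def evaluate(records: list[dict], ask: Callable[[str], dict]) -> dict:
    """Pass a record only if every required source appears in the response's sources."""
    passed, failures = 0, []
    for record in records:
        response = ask(record["question"])
        cited = {s["source"] for s in response.get("sources", [])}
        missing = [src for src in record["required_sources"] if src not in cited]
        if missing:
            failures.append({"id": record["id"], "missing": missing})
        else:
            passed += 1
    total = len(records)
    return {
        "total": total,
        "passed": passed,
        "source_hit_rate": passed / total if total else 0.0,
        "failures": failures,
    }
```

Rerun it after every change to chunking, retrieval, or the prompt, and compare the failure lists rather than only the rate.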

Step 10: Check Before Launch

Before launch, confirm that:

For a more complete checklist, see RAG Production Checklist.

Common Iteration Directions

| Problem | Optimization |
| --- | --- |
| Retrieval misses the answer | Improve chunking, metadata, and query rewrite |
| Retrieval finds evidence but the answer is wrong | Improve the prompt, citation checks, and refusal behavior |
| Citations are inaccurate | Store more precise metadata and restrict source usage |
| Cost is too high | Reduce top-k, compress context, cache embeddings |
| Latency is too high | Use async indexing, caching, and faster models |
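For the "cache embeddings" row, a minimal in-memory sketch: key each embedding by a SHA-256 hash of the model name plus the text, so re-indexing unchanged chunks does not call the embedding model again. `CachedEmbedder` is an illustrative name; persisting the cache (for example in the SQLite database you already use) is left out.

```python
import hashlib
from typing import Callable


class CachedEmbedder:
    """Wrap an embedding function so identical (model, text) pairs are embedded once."""

    def __init__(self, embed: Callable[[str], list[float]], model_name: str):
        self.embed = embed
        self.model_name = model_name  # part of the key: switching models must not reuse old vectors
        self.cache: dict[str, list[float]] = {}
        self.misses = 0  # number of real embedding calls made

    def __call__(self, text: str) -> list[float]:
        key = hashlib.sha256(f"{self.model_name}\n{text}".encode("utf-8")).hexdigest()
        if key not in self.cache:
            self.misses += 1
            self.cache[key] = self.embed(text)
        return self.cache[key]
```

Because it is itself a `str -> list[float]` callable, it can replace the raw embedding function anywhere one is expected.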

What You Should Deliver

That is much more convincing than simply saying "I know RAG."