Build a Personal Knowledge Base Assistant


This tutorial walks you through building a personal knowledge base assistant from zero to one. The goal is not to stack frameworks. The goal is to understand the full RAG application flow from a user question to a grounded answer.

By the end, you should be able to explain how documents enter the system, why chunking is necessary, what embeddings and retrieval do, how prompts constrain the model to answer from evidence, how source citations work, and how evaluation helps you improve the system over time.

Final Result

After a user uploads or stores Markdown files, PDFs, or web notes, they can ask:

In my 30-day AI learning plan, what should I do in week 2?

The system returns:

Week 2 should focus on RAG basics, including reading the RAG docs, preparing test documents, implementing chunking, retrieval, a minimal RAG flow, source citations, and 20 evaluation questions.

Sources:
- 30-day-plan.md / Week 2

Recommended Stack

Module | Recommendation
--- | ---
Language | Python
API service | FastAPI
Local storage | SQLite
Vector search | Start with in-memory or SQLite, then move to pgvector/Qdrant
Document format | Support Markdown first, then PDF
Evaluation | CSV/JSON + a simple scoring script

Overall Architecture

User question
  -> Backend API
  -> Query rewrite
  -> Retrieve relevant chunks
  -> Build prompt
  -> Call model
  -> Parse answer and citations
  -> Log result and feedback
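As a rough sketch, the online flow above can be wired as one Python function whose stages are passed in as callables. The function name `answer_question` and its parameters are illustrative assumptions, not part of any framework. The citation step here simply echoes the retrieved chunk IDs rather than parsing citations out of the model output.

```python
import time
from typing import Callable


def answer_question(
    question: str,
    rewrite: Callable[[str], str],
    retrieve: Callable[[str], list[dict]],
    build_prompt: Callable[[str, list[dict]], str],
    call_model: Callable[[str], str],
    log: list[dict],
) -> dict:
    """Run one question through the online flow; every stage is injected (hypothetical wiring)."""
    start = time.perf_counter()
    query = rewrite(question)                # Query rewrite
    chunks = retrieve(query)                 # Retrieve relevant chunks
    prompt = build_prompt(question, chunks)  # Build prompt
    answer = call_model(prompt)              # Call model
    # Parse answer and citations: simplified to "every retrieved chunk is a source".
    result = {"answer": answer, "sources": [c["chunk_id"] for c in chunks]}
    # Log result; user feedback can be attached to this record later.
    log.append({
        "question": question,
        "query": query,
        "sources": result["sources"],
        "latency_s": time.perf_counter() - start,
    })
    return result
```

Injecting each stage means you can start with trivial versions (no rewrite, keyword retrieval) and replace them one at a time as you work through the steps below.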

Offline indexing flow:

Document ingest
  -> Document parsing
  -> Text cleanup
  -> Chunking
  -> Generate embeddings
  -> Store chunks + metadata + vectors

Step 1: Make a Minimal Chat Call

Do not start with a knowledge base. First verify that your model API call works. You should be able to store the API key in an environment variable, send a user question, print the model answer, and record latency and errors.
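Here is a minimal sketch of this check. It assumes the key lives in an environment variable named `MODEL_API_KEY` (a hypothetical name) and that `send` is a placeholder for whatever client call your model provider exposes.

```python
import os
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CallResult:
    answer: Optional[str]
    latency_s: float
    error: Optional[str]


def ask_model(
    question: str,
    send: Callable[[str, str], str],  # hypothetical: (api_key, question) -> answer text
    key_env: str = "MODEL_API_KEY",   # hypothetical environment variable name
) -> CallResult:
    """Call the model once, recording latency and any error instead of crashing."""
    api_key = os.environ.get(key_env)
    if not api_key:
        return CallResult(None, 0.0, f"environment variable {key_env} is not set")
    start = time.perf_counter()
    try:
        answer = send(api_key, question)
        return CallResult(answer, time.perf_counter() - start, None)
    except Exception as exc:  # network errors, rate limits, bad keys, ...
        return CallResult(None, time.perf_counter() - start, repr(exc))
```

Passing `send` in keeps the sketch provider-neutral. It also lets you test with a fake function such as `lambda key, q: "echo: " + q` before wiring in a real client and printing `result.answer`, `result.latency_s`, and `result.error`.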

Step 2: Read a Local Document

Start with Markdown files. Read a file, then put its content and the user question into the prompt.

You are a personal knowledge base assistant. Answer only from the material below.

Material:
{document_text}

User question:
{question}

If the material does not contain the answer, say "I could not find the answer in the provided material."
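The following sketch shows one way to implement this step in Python. It assumes UTF-8 Markdown files, and `build_whole_document_prompt` is a hypothetical helper name.

```python
from pathlib import Path

# The prompt above, with {document_text} and {question} as str.format placeholders.
PROMPT_TEMPLATE = """You are a personal knowledge base assistant. Answer only from the material below.

Material:
{document_text}

User question:
{question}

If the material does not contain the answer, say "I could not find the answer in the provided material."
"""


def build_whole_document_prompt(path: str, question: str) -> str:
    """Read one Markdown file and place its full text plus the question into the prompt."""
    document_text = Path(path).read_text(encoding="utf-8")
    return PROMPT_TEMPLATE.format(document_text=document_text, question=question)
```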

This approach quickly hits the model's context length limit as your notes grow, so you should not keep sending the entire document to the model with every question.

Step 3: Implement Chunking

A chunk is a smaller section of a document. Beginners can start with a simple strategy, storing each chunk as a record together with metadata about where it came from:

{
  "chunk_id": "30-day-plan#week-2#001",
  "title": "30-Day Plan",
  "source": "checklists/30-day-plan.md",
  "section": "Week 2: RAG Basics",
  "text": "Day 8: Read the RAG docs..."
}
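A minimal heading-based chunker might look like the following. It assumes Markdown headings mark section boundaries and cuts long sections into fixed-size character windows. The names `chunk_markdown` and `slugify`, the `max_chars` default, and the exact ID format are illustrative choices that only loosely follow the example record above.

```python
import re
from pathlib import PurePosixPath


def slugify(text: str) -> str:
    """Lowercase the text and join alphanumeric runs with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def chunk_markdown(markdown: str, source: str, title: str, max_chars: int = 800) -> list[dict]:
    """Split a Markdown document on headings, then cut long sections into windows."""
    doc_id = PurePosixPath(source).stem  # "checklists/30-day-plan.md" -> "30-day-plan"

    # Group lines under the most recent heading; skip sections with no body text.
    sections: list[tuple[str, list[str]]] = []
    heading, lines = "", []
    for line in markdown.splitlines():
        if line.startswith("#"):
            if any(ln.strip() for ln in lines):
                sections.append((heading, lines))
            heading, lines = line.lstrip("#").strip(), []
        else:
            lines.append(line)
    if any(ln.strip() for ln in lines):
        sections.append((heading, lines))

    chunks = []
    for heading, lines in sections:
        text = "\n".join(lines).strip()
        # Fixed-size character windows keep every chunk at most max_chars long.
        for n, start in enumerate(range(0, len(text), max_chars), start=1):
            chunks.append({
                "chunk_id": f"{doc_id}#{slugify(heading) or 'intro'}#{n:03d}",
                "title": title,
                "source": source,
                "section": heading,
                "text": text[start:start + max_chars],
            })
    return chunks
```

Splitting on headings keeps the section metadata accurate for citations later. The sketch is naive: it treats any line starting with `#` as a heading, including lines inside code fences, and it may cut a sentence in half at a window boundary.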

Step 4: Start with Keyword Retrieval

Before adding vector search, build a keyword retrieval version. It is not fancy, but it helps you understand the retrieval flow. Take a user question as input, match keywords against chunk text and titles, return top-k chunks, and put those chunks into the prompt.
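A keyword retriever can be as simple as counting how often the question's terms appear in each chunk. This sketch assumes chunks are dicts shaped like the record in Step 3. The small stopword list and term-count scoring are illustrative choices, not a standard ranking function such as BM25.

```python
import re
from collections import Counter

# A tiny, illustrative stopword list; extend it for your own questions.
STOPWORDS = {"a", "an", "the", "in", "of", "to", "and", "is", "my", "for",
             "on", "how", "what", "which", "should", "i", "do"}


def tokenize(text: str) -> list[str]:
    """Lowercase and keep alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_search(question: str, chunks: list[dict], top_k: int = 3) -> list[dict]:
    """Score chunks by how often non-stopword question terms appear in title, section, and text."""
    query_terms = set(tokenize(question)) - STOPWORDS
    scored = []
    for chunk in chunks:
        counts = Counter(tokenize(f"{chunk['title']} {chunk['section']} {chunk['text']}"))
        score = sum(counts[term] for term in query_terms)
        if score > 0:  # drop chunks that share no terms with the question
            scored.append((score, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]
```

The returned chunks can go straight into the prompt from Step 2 in place of the whole document.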

Step 5: Add Embeddings and Vector Retrieval

Embeddings convert text into vectors so you can perform semantic similarity search.

chunk text -> embedding model -> vector -> store
question -> embedding model -> query vector -> similarity search -> top-k chunks
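The two flows above can be sketched with an in-memory store and cosine similarity. Here `embed` stands for any function that maps text to a vector. Because a real embedding model is out of scope, `toy_embed` is a hypothetical stand-in that hashes words into buckets. It only captures word overlap, not meaning, so swap in a real embedding model to get semantic search.

```python
import math
import re
import zlib
from typing import Callable

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryVectorStore:
    """Keeps (vector, chunk) pairs in a list and does brute-force similarity search."""

    def __init__(self, embed: Callable[[str], Vector]):
        self.embed = embed
        self.rows: list[tuple[Vector, dict]] = []

    def add(self, chunks: list[dict]) -> None:
        # chunk text -> embedding model -> vector -> store
        for chunk in chunks:
            self.rows.append((self.embed(chunk["text"]), chunk))

    def search(self, question: str, top_k: int = 3) -> list[tuple[float, dict]]:
        # question -> embedding model -> query vector -> similarity search -> top-k chunks
        query = self.embed(question)
        scored = [(cosine(query, vector), chunk) for vector, chunk in self.rows]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


def toy_embed(text: str, dims: int = 256) -> Vector:
    """Hypothetical stand-in for a real embedding model: a hashed bag of words."""
    vector = [0.0] * dims
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vector[zlib.crc32(token.encode("utf-8")) % dims] += 1.0
    return vector
```

Brute-force search over a list is fine for a few thousand chunks. That is roughly the point where moving to SQLite, pgvector, or Qdrant (as in the recommended stack) starts to pay off.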

Step 6: Build the RAG Prompt

Treat retrieved chunks as available evidence. Do not let the model freely invent answers.

You are a personal knowledge base assistant.

Rules:
- Answer only from the available material
- Do not invent information that is not in the material
- Cite sources for every important claim
- If the material is insufficient, explain what is missing

Available material:
{retrieved_chunks}

User question:
{question}

Output format:
Answer:
{answer}

Sources:
- {source title} / {section}
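The sketch below fills this template from retrieved chunks, assuming the chunk dicts from Step 3. Labelling each chunk with its title, section, and chunk_id gives the model something concrete to cite. The output-format placeholders are written as doubled braces so Python's `str.format` leaves them in the prompt verbatim. `format_chunks` and `build_rag_prompt` are hypothetical names.

```python
RAG_TEMPLATE = """You are a personal knowledge base assistant.

Rules:
- Answer only from the available material
- Do not invent information that is not in the material
- Cite sources for every important claim
- If the material is insufficient, explain what is missing

Available material:
{retrieved_chunks}

User question:
{question}

Output format:
Answer:
{{answer}}

Sources:
- {{source title}} / {{section}}
"""


def format_chunks(chunks: list[dict]) -> str:
    """Number each chunk and prefix it with the metadata the model should cite."""
    blocks = [
        f"[{i}] {c['title']} / {c['section']} (chunk_id: {c['chunk_id']})\n{c['text']}"
        for i, c in enumerate(chunks, start=1)
    ]
    return "\n\n".join(blocks)


def build_rag_prompt(question: str, chunks: list[dict]) -> str:
    """Fill the template; doubled braces survive as literal {answer}, {section}, etc."""
    return RAG_TEMPLATE.format(retrieved_chunks=format_chunks(chunks), question=question)
```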

Step 7: Add Source Citations

Citations are not decoration. They are the trust layer of a RAG system. Each answer should trace back to the document name, section or page, chunk_id, and source snippet.

{
  "answer": "string",
  "sources": [
    {
      "title": "30-Day Plan",
      "source": "checklists/30-day-plan.md",
      "section": "Week 2",
      "chunk_id": "30-day-plan#week-2#001",
      "score": 0.82
    }
  ]
}
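Here is a sketch of assembling this response from `(score, chunk)` pairs. It includes a check that the model only cites chunk IDs that were actually retrieved. The `snippet` field, the optional `cited_ids` filter, and the helper names are assumptions added to cover the source-snippet requirement above. None of them is part of the JSON schema shown.

```python
from typing import Optional


def build_response(
    answer: str,
    scored_chunks: list[tuple[float, dict]],
    cited_ids: Optional[list[str]] = None,  # if given, keep only these chunks as sources
    snippet_chars: int = 120,
) -> dict:
    """Turn retrieved chunks into source entries, each with a short text snippet."""
    keep = set(cited_ids) if cited_ids is not None else None
    sources = [
        {
            "title": chunk["title"],
            "source": chunk["source"],
            "section": chunk["section"],
            "chunk_id": chunk["chunk_id"],
            "score": round(score, 2),
            "snippet": chunk["text"][:snippet_chars],
        }
        for score, chunk in scored_chunks
        if keep is None or chunk["chunk_id"] in keep
    ]
    return {"answer": answer, "sources": sources}


def unknown_citations(cited_ids: list[str], scored_chunks: list[tuple[float, dict]]) -> list[str]:
    """Return cited chunk IDs that were never retrieved, a sign of an invented citation."""
    retrieved = {chunk["chunk_id"] for _, chunk in scored_chunks}
    return [cid for cid in cited_ids if cid not in retrieved]
```

A non-empty result from `unknown_citations` is a cheap signal that the answer should be regenerated or flagged rather than shown as-is.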

Step 8: Handle Unanswerable Questions

When the material is insufficient, the system should refuse to answer or clearly state what is missing. Common refusal conditions include low retrieval scores, unrelated retrieved chunks, no direct answer in the material, or a user request outside the knowledge base scope.

I could not find a clear answer in the provided material. The retrieved content is mostly about the RAG learning plan and does not include deployment cost information.
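The low-score condition is the easiest one to enforce in code, and you can check it before the model is even called. This sketch assumes scores are cosine similarities and uses a hypothetical threshold of 0.3, which you would tune on your evaluation set. The other conditions (unrelated chunks, no direct answer, out-of-scope requests) usually need the model's own judgment or a separate check.

```python
from typing import Optional


def refusal_reason(scored_chunks: list[tuple[float, dict]], min_score: float = 0.3) -> Optional[str]:
    """Return why the system should refuse, or None if retrieval looks usable."""
    if not scored_chunks:
        return "No relevant material was retrieved."
    best = max(score for score, _ in scored_chunks)
    if best < min_score:  # min_score is an assumed, tunable threshold
        return f"The best retrieval score ({best:.2f}) is below the threshold ({min_score:.2f})."
    return None


def refusal_message(reason: str) -> str:
    """Phrase the refusal so the user knows the answer was not found, and why."""
    return "I could not find a clear answer in the provided material. " + reason
```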

Step 9: Build a Minimal Evaluation Set

Prepare 30 questions: 15 with clear answers in the material, 5 that require combining multiple chunks, 5 that should be refused, and 5 historical bad cases.

{
  "id": "kb_001",
  "question": "What should I learn in week 2?",
  "expected_behavior": "Answer the RAG basics tasks and cite 30-day-plan",
  "required_sources": ["checklists/30-day-plan.md"],
  "tags": ["rag", "easy"]
}
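A simple scoring script can run every case through the system and check whether the required sources were cited. This sketch makes three assumptions: cases are stored as a JSON list of records like the one above, the system returns the response format from Step 7, and should-refuse cases carry a hypothetical `refuse` tag. A refusal passes if it cites no sources. The free-text `expected_behavior` still needs human or model-graded review.

```python
import json
from typing import Callable


def score_case(case: dict, response: dict) -> bool:
    """Pass if required sources are cited, or if a should-refuse case cites nothing."""
    cited = {s["source"] for s in response.get("sources", [])}
    if "refuse" in case.get("tags", []):  # hypothetical tag for should-refuse cases
        return not cited
    return set(case.get("required_sources", [])) <= cited


def evaluate(cases: list[dict], run: Callable[[str], dict]) -> dict:
    """Run each question through the system and report per-case results and a pass rate."""
    results = [{"id": c["id"], "passed": score_case(c, run(c["question"]))} for c in cases]
    passed = sum(r["passed"] for r in results)
    return {"pass_rate": passed / len(cases) if cases else 0.0, "results": results}


def load_cases(path: str) -> list[dict]:
    """Load evaluation cases from a JSON file containing a list of case records."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Re-running `evaluate` after every change to chunking, prompts, or thresholds turns "it feels better" into a number you can compare.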

Step 10: Check Before Launch

For a more complete checklist, see the RAG Production Checklist in the GitHub repository.

Common Iteration Directions

Problem | Optimization
--- | ---
Retrieval misses the answer | Improve chunking, metadata, and query rewrite
Retrieval finds evidence but the answer is wrong | Improve the prompt, citation checks, and refusal behavior
Citations are inaccurate | Store more precise metadata and restrict source usage
Cost is too high | Reduce top-k, compress context, cache embeddings
Latency is too high | Use async indexing, caching, and faster models
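For the "cache embeddings" direction, here is a sketch of a content-addressed cache. It hashes the model name together with the text, so unchanged chunks are not re-embedded when you re-index, and switching models does not reuse stale vectors. `EmbeddingCache` is a hypothetical helper. Its in-memory dict would need to be persisted (for example in SQLite) to survive restarts.

```python
import hashlib
from typing import Callable


class EmbeddingCache:
    """Wraps an embedding function and reuses vectors for identical (model, text) pairs."""

    def __init__(self, embed: Callable[[str], list[float]], model_name: str):
        self.embed = embed
        self.model_name = model_name
        self.store: dict[str, list[float]] = {}
        self.misses = 0  # number of times the underlying model was actually called

    def key(self, text: str) -> str:
        # Include the model name so vectors from different models never mix.
        return hashlib.sha256(f"{self.model_name}\n{text}".encode("utf-8")).hexdigest()

    def get(self, text: str) -> list[float]:
        k = self.key(text)
        if k not in self.store:
            self.misses += 1
            self.store[k] = self.embed(text)
        return self.store[k]
```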

What You Should Deliver

That is much more convincing than simply saying "I know RAG."