Build a Personal Knowledge Base Assistant

This tutorial walks you through building a personal knowledge base assistant from scratch. The goal is not to stack frameworks but to understand the full RAG (retrieval-augmented generation) flow, from a user question to a grounded answer.

By the end, you should be able to explain how documents enter the system, why chunking is necessary, what embeddings and retrieval do, how prompts constrain the model to answer from evidence, how source citations work, and how evaluation helps you improve the system over time.

Final Result

After a user uploads or stores Markdown files, PDFs, or web notes, they can ask:

In my 30-day AI learning plan, what should I do in week 2?

The system returns:

Week 2 should focus on RAG basics, including reading the RAG docs, preparing test documents, implementing chunking, retrieval, a minimal RAG flow, source citations, and 20 evaluation questions.

Sources:
- 30-day-plan.md / Week 2

Recommended Stack

Module	Recommendation
Language	Python
API service	FastAPI
Local storage	SQLite
Vector search	Start in memory or in SQLite, then move to pgvector or Qdrant
Document format	Support Markdown first, then PDF
Evaluation	CSV/JSON + a simple scoring script

Overall Architecture

User question
  -> Backend API
  -> Query rewrite
  -> Retrieve relevant chunks
  -> Build prompt
  -> Call model
  -> Parse answer and citations
  -> Log result and feedback

Offline indexing flow:

Document ingest
  -> Document parsing
  -> Text cleanup
  -> Chunking
  -> Generate embeddings
  -> Store chunks + metadata + vectors
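The last indexing step, storing chunks with their metadata and vectors, can be sketched with SQLite from the recommended stack. This is a minimal sketch under assumptions: the table layout, the function names (`open_store`, `save_chunk`, `load_chunks`), and the choice to store the vector as JSON text are illustrative, not a fixed design.

```python
import json
import sqlite3

# Hypothetical schema: one row per chunk, with the vector stored as JSON text.
SCHEMA = """
CREATE TABLE IF NOT EXISTS chunks (
    chunk_id        TEXT PRIMARY KEY,
    title           TEXT NOT NULL,
    source          TEXT NOT NULL,
    section         TEXT NOT NULL,
    text            TEXT NOT NULL,
    embedding_model TEXT,
    vector          TEXT
)
"""

COLUMNS = ["chunk_id", "title", "source", "section", "text",
           "embedding_model", "vector"]


def open_store(path=":memory:"):
    """Open (or create) the chunk store; ':memory:' keeps everything in RAM."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_chunk(conn, chunk, vector=None, embedding_model=None):
    """Insert or overwrite one chunk together with its metadata and vector."""
    conn.execute(
        "INSERT OR REPLACE INTO chunks VALUES (?, ?, ?, ?, ?, ?, ?)",
        (
            chunk["chunk_id"], chunk["title"], chunk["source"],
            chunk["section"], chunk["text"], embedding_model,
            json.dumps(vector) if vector is not None else None,
        ),
    )
    conn.commit()


def load_chunks(conn):
    """Return all chunks as dicts, decoding the stored vectors."""
    rows = conn.execute(
        f"SELECT {', '.join(COLUMNS)} FROM chunks ORDER BY chunk_id"
    )
    chunks = []
    for row in rows:
        record = dict(zip(COLUMNS, row))
        record["vector"] = json.loads(record["vector"]) if record["vector"] else None
        chunks.append(record)
    return chunks
```

Storing vectors as JSON is easy to inspect while the knowledge base is small. A dedicated vector store such as pgvector or Qdrant becomes worthwhile once scanning every row gets slow.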

Step 1: Make a Minimal Chat Call

Do not start with the knowledge base. First verify that your model API call works: read the API key from an environment variable, send a user question, print the model's answer, and record latency and errors. Then confirm the following:

The API key is not hardcoded
Failures show useful error messages
Outputs can be logged
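The checks above can be sketched without committing to a specific provider. In this sketch, `MODEL_API_KEY` is a placeholder variable name and `call_model` is a hypothetical function you would write around your provider's client; neither comes from a real SDK.

```python
import os
import time


def get_api_key(env_var="MODEL_API_KEY"):
    """Read the key from the environment; the variable name is a placeholder."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running.")
    return key


def timed_chat(call_model, question):
    """Call call_model(question) and return a loggable record.

    call_model is a hypothetical callable that wraps your provider's client
    and returns the answer text.
    """
    record = {"question": question, "answer": None, "error": None}
    start = time.perf_counter()
    try:
        record["answer"] = call_model(question)
    except Exception as exc:
        # Keep the exception type and message so failures are easy to diagnose.
        record["error"] = f"{type(exc).__name__}: {exc}"
    record["latency_s"] = round(time.perf_counter() - start, 3)
    return record


if __name__ == "__main__":
    # A stand-in model so the script runs without network access.
    print(timed_chat(lambda q: f"(echo) {q}", "Hello?"))
```

Passing the model call in as a function keeps the logging logic testable offline and makes it easy to swap providers later.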

Step 2: Read a Local Document

Start with Markdown files. Read a file, then put its content and the user question into the prompt.

You are a personal knowledge base assistant. Answer only from the material below.

Material:
{document_text}

User question:
{question}

If the material does not contain the answer, say "I could not find the answer in the provided material."

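This approach quickly runs into the model's context length limit as documents grow, so you cannot keep sending the entire document with every question. The following steps split documents into chunks and send only the relevant ones.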
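Reading a file and filling the prompt for this step might look like the sketch below. The function names `build_prompt` and `prompt_from_file` are illustrative; the template text is the one shown in this step.

```python
from pathlib import Path

PROMPT_TEMPLATE = """You are a personal knowledge base assistant. Answer only from the material below.

Material:
{document_text}

User question:
{question}

If the material does not contain the answer, say "I could not find the answer in the provided material."
"""


def build_prompt(document_text, question):
    """Fill the template with the whole document and the user question."""
    return PROMPT_TEMPLATE.format(document_text=document_text, question=question)


def prompt_from_file(path, question):
    """Read a Markdown file as UTF-8 and build the prompt from its full text."""
    document_text = Path(path).read_text(encoding="utf-8")
    return build_prompt(document_text, question)
```

Printing `len(prompt)` for a few of your real files is a quick way to see how fast the prompt grows with document size.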

Step 3: Implement Chunking

A chunk is a smaller section of a document. Beginners can start with a simple strategy:

Split Markdown by headings when possible
Keep each chunk around 500 to 1,000 Chinese characters, or roughly 300 to 700 English words
Keep a small overlap between neighboring chunks
Store title, source, section, and chunk_id for every chunk

{
  "chunk_id": "30-day-plan#week-2#001",
  "title": "30-Day Plan",
  "source": "checklists/30-day-plan.md",
  "section": "Week 2: RAG Basics",
  "text": "Day 8: Read the RAG docs..."
}
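A minimal sketch of this strategy follows. It makes several simplifying assumptions: chunk size is measured in characters rather than words, `#` lines inside fenced code are not treated specially, and the `chunk_id` is built from the file name and the full heading text, so it only approximates the id format in the example above. All function names are illustrative.

```python
import re
from pathlib import Path


def slugify(text):
    """Lowercase and replace runs of non-alphanumerics with '-'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def split_with_overlap(text, max_chars, overlap):
    """Cut text into windows of max_chars that share `overlap` characters."""
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be non-negative and smaller than max_chars")
    pieces, start = [], 0
    while start < len(text):
        pieces.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # the current window already reaches the end of the text
        start += max_chars - overlap
    return pieces


def chunk_markdown(markdown, title, source, max_chars=1000, overlap=100):
    """Split by Markdown headings, then window sections that are too long."""
    sections = []
    heading, lines = title, []  # text before the first heading uses the title

    def flush():
        body = "\n".join(lines).strip()
        if body:
            sections.append((heading, body))

    for line in markdown.splitlines():
        match = re.match(r"#{1,6}\s+(.+)", line)
        if match:
            flush()
            heading, lines = match.group(1).strip(), []
        else:
            lines.append(line)
    flush()

    doc_slug = slugify(Path(source).stem)
    chunks = []
    for section, body in sections:
        pieces = split_with_overlap(body, max_chars, overlap)
        for n, piece in enumerate(pieces, start=1):
            chunks.append({
                "chunk_id": f"{doc_slug}#{slugify(section)}#{n:03d}",
                "title": title,
                "source": source,
                "section": section,
                "text": piece,
            })
    return chunks
```

Splitting on headings first keeps each chunk inside one topic. The overlap only matters for sections long enough to be cut more than once.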

Step 4: Start with Keyword Retrieval

Before adding vector search, build a keyword retrieval version. It is not fancy, but it helps you understand the retrieval flow. Take a user question as input, match keywords against chunk text and titles, return top-k chunks, and put those chunks into the prompt.
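The flow described above fits in a few lines. This sketch scores each chunk by counting question terms in its text and, with a higher weight, in its title and section. The tokenizer handles only ASCII letters and digits (Chinese text would need a word segmenter), and `title_weight=2` is an arbitrary starting value.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into ASCII alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_search(question, chunks, top_k=3, title_weight=2):
    """Return up to top_k (score, chunk) pairs, best first."""
    query_terms = set(tokenize(question))
    scored = []
    for chunk in chunks:
        text_counts = Counter(tokenize(chunk["text"]))
        heading = chunk.get("title", "") + " " + chunk.get("section", "")
        heading_counts = Counter(tokenize(heading))
        score = sum(
            text_counts[term] + title_weight * heading_counts[term]
            for term in query_terms
        )
        if score > 0:  # drop chunks that share no term with the question
            scored.append((score, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Common words such as "what" or "in" count as matches here. Removing stop words or switching to a ranking function like BM25 is a natural next refinement.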

Step 5: Add Embeddings and Vector Retrieval

Embeddings convert text into vectors so you can perform semantic similarity search.

chunk text -> embedding model -> vector -> store
question -> embedding model -> query vector -> similarity search -> top-k chunks

Keep these practices in mind:

Regenerate embeddings when a chunk changes
Store the embedding model name and version
Keep chunk metadata for filtering and citations
Log similarity scores for debugging
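The similarity search in the query path can be sketched as a brute-force cosine similarity scan. This sketch assumes each stored record already carries a `vector` produced by your embedding model; the embedding call itself is left out because it depends on the provider.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def vector_search(query_vector, records, top_k=3):
    """Return up to top_k (score, record) pairs, most similar first."""
    scored = [
        (cosine_similarity(query_vector, record["vector"]), record)
        for record in records
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Scanning every record is fine for a personal knowledge base with a few thousand chunks. The returned scores are the values worth logging for debugging.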

Step 6: Build the RAG Prompt

Treat the retrieved chunks as the only evidence the model may use. Do not let it invent answers beyond them.

You are a personal knowledge base assistant.

Rules:
- Answer only from the available material
- Do not invent information that is not in the material
- Cite sources for every important claim
- If the material is insufficient, explain what is missing

Available material:
{retrieved_chunks}

User question:
{question}

Output format:
Answer:
{answer}

Sources:
- {source title} / {section}
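Filling this prompt from retrieved chunks might look like the sketch below. It assumes each chunk is a dict with `title`, `section`, and `text`, and it labels every chunk with its source so the model has something concrete to cite. The output-format placeholders are written as `<...>` so that they are not confused with the fields the code fills in.

```python
RAG_TEMPLATE = """You are a personal knowledge base assistant.

Rules:
- Answer only from the available material
- Do not invent information that is not in the material
- Cite sources for every important claim
- If the material is insufficient, explain what is missing

Available material:
{retrieved_chunks}

User question:
{question}

Output format:
Answer:
<your answer>

Sources:
- <source title> / <section>
"""


def format_chunks(chunks):
    """Render each chunk under a '[title / section]' label."""
    blocks = [f"[{c['title']} / {c['section']}]\n{c['text']}" for c in chunks]
    return "\n\n".join(blocks)


def build_rag_prompt(question, chunks):
    """Combine the rules, the labeled chunks, and the user question."""
    return RAG_TEMPLATE.format(
        retrieved_chunks=format_chunks(chunks),
        question=question,
    )
```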

Step 7: Add Source Citations

Citations are not decoration. They are the trust layer of a RAG system. Each answer should trace back to the document name, section or page, chunk_id, and source snippet.

{
  "answer": "string",
  "sources": [
    {
      "title": "30-Day Plan",
      "source": "checklists/30-day-plan.md",
      "section": "Week 2",
      "chunk_id": "30-day-plan#week-2#001",
      "score": 0.82
    }
  ]
}
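A response in this shape can be assembled directly from retrieval results. The sketch assumes retrieval returns `(score, chunk)` pairs and adds a `snippet` field, which is not in the JSON above, so that each citation carries the source text it points to.

```python
def build_response(answer, scored_chunks, snippet_chars=120):
    """Attach one citation per distinct retrieved chunk to the answer."""
    sources = []
    seen = set()
    for score, chunk in scored_chunks:
        if chunk["chunk_id"] in seen:
            continue  # cite each chunk at most once
        seen.add(chunk["chunk_id"])
        sources.append({
            "title": chunk["title"],
            "source": chunk["source"],
            "section": chunk["section"],
            "chunk_id": chunk["chunk_id"],
            "score": round(score, 2),
            # Assumed extra field: the first characters of the cited text.
            "snippet": chunk["text"][:snippet_chars],
        })
    return {"answer": answer, "sources": sources}
```

Building citations from the chunks that were actually retrieved, rather than from whatever the model writes, keeps the source list verifiable even when the model's own citation text is sloppy.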

Step 8: Handle Unanswerable Questions

When the material is insufficient, the system should refuse to answer or clearly state what is missing. Common refusal conditions include low retrieval scores, unrelated retrieved chunks, no direct answer in the material, or a user request outside the knowledge base scope.

I could not find a clear answer in the provided material. The retrieved content is mostly about the RAG learning plan and does not include deployment cost information.
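The low-score condition is the easiest one to enforce in code. In this sketch, `min_score=0.5` is an untuned placeholder that you would calibrate on your own evaluation questions, and `generate` is a hypothetical function that calls the model. The other refusal conditions still depend on the prompt rules.

```python
REFUSAL_MESSAGE = "I could not find a clear answer in the provided material."


def should_refuse(scored_chunks, min_score=0.5):
    """Refuse when no retrieved chunk reaches min_score (a placeholder value)."""
    return not any(score >= min_score for score, _ in scored_chunks)


def answer_or_refuse(question, scored_chunks, generate, min_score=0.5):
    """Skip the model call entirely when retrieval found nothing usable.

    generate(question, chunks) is a hypothetical callable that runs the
    RAG prompt and returns the answer text.
    """
    if should_refuse(scored_chunks, min_score):
        return {"answer": REFUSAL_MESSAGE, "chunk_ids": [], "refused": True}
    usable = [chunk for score, chunk in scored_chunks if score >= min_score]
    return {
        "answer": generate(question, usable),
        "chunk_ids": [chunk["chunk_id"] for chunk in usable],
        "refused": False,
    }
```

Refusing before the model call also saves tokens on questions the knowledge base cannot answer.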

Step 9: Build a Minimal Evaluation Set

Prepare 30 questions: 15 with clear answers in the material, 5 that require combining multiple chunks, 5 that should be refused, and 5 historical bad cases.

{
  "id": "kb_001",
  "question": "What should I learn in week 2?",
  "expected_behavior": "Answer the RAG basics tasks and cite 30-day-plan",
  "required_sources": ["checklists/30-day-plan.md"],
  "tags": ["rag", "easy"]
}
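A simple scoring script over cases in this format might start with the one check that can be automated reliably: whether the required sources were cited. In this sketch, `answer_fn` is a hypothetical function that runs your pipeline and returns a dict with a `sources` list. Judging `expected_behavior` still needs a human or a separate grading step.

```python
def score_case(case, response):
    """Pass when every required source appears among the cited sources."""
    cited = {s["source"] for s in response.get("sources", [])}
    missing = sorted(set(case.get("required_sources", [])) - cited)
    return {"id": case["id"], "passed": not missing, "missing_sources": missing}


def run_eval(cases, answer_fn):
    """Run every case through answer_fn(question) and summarize the results."""
    results = [score_case(case, answer_fn(case["question"])) for case in cases]
    passed = sum(1 for r in results if r["passed"])
    return {"total": len(results), "passed": passed, "results": results}
```

Running this after every change to chunking, retrieval, or the prompt turns the evaluation set into a regression test.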

Step 10: Check Before Launch

The API key is not in frontend code or Git history
Users can only access their own documents
Retrieval and answers are logged
Answers include citations
The system refuses when material is insufficient
Token usage, cost, and latency are tracked
There is a feedback path for bad cases

For a more complete checklist, see the RAG Production Checklist in the GitHub repository.
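One checklist item, that users can only access their own documents, is easiest to guarantee by filtering before retrieval rather than after generation. The sketch below assumes every chunk carries an `owner_id` field, which is a hypothetical addition to the chunk metadata shown earlier.

```python
def chunks_visible_to(user_id, chunks):
    """Keep only chunks owned by the requesting user.

    owner_id is an assumed metadata field; chunks without it are never returned.
    """
    if user_id is None:
        raise ValueError("user_id is required")
    return [chunk for chunk in chunks if chunk.get("owner_id") == user_id]
```

Calling this on the candidate set before keyword or vector search means another user's text can never reach the prompt, regardless of how the model behaves.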

Common Iteration Directions

Problem	Optimization
Retrieval misses the answer	Improve chunking, metadata, and query rewrite
Retrieval finds evidence but the answer is wrong	Improve the prompt, citation checks, and refusal behavior
Citations are inaccurate	Store more precise metadata and restrict source usage
Cost is too high	Reduce top-k, compress context, cache embeddings
Latency is too high	Use async indexing, caching, and faster models
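Of these optimizations, caching embeddings is the simplest to sketch. The cache below keys each entry on the model name plus the text, so changing either one triggers a fresh embedding. `embed_fn` is a hypothetical function wrapping your embedding model, and the in-memory dict would be replaced by a persistent store in practice.

```python
import hashlib


class EmbeddingCache:
    """Memoize embed_fn(text) per (model_name, text) pair."""

    def __init__(self, embed_fn, model_name):
        self._embed_fn = embed_fn      # hypothetical wrapper around your model
        self._model_name = model_name
        self._store = {}
        self.hits = 0
        self.misses = 0

    def _key(self, text):
        raw = f"{self._model_name}\n{text}".encode("utf-8")
        return hashlib.sha256(raw).hexdigest()

    def get(self, text):
        """Return the cached vector, computing it on the first request."""
        key = self._key(text)
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = self._embed_fn(text)
        return self._store[key]
```

Tracking `hits` and `misses` shows how much re-indexing work the cache actually saves when only a few chunks change.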

What You Should Deliver

A runnable RAG Q&A demo
A set of test documents
30 evaluation questions
A project README
A bad case log
An architecture diagram

That is much more convincing than simply saying "I know RAG."