RAG Knowledge Base

A retrieval-augmented generation system that lets teams ask questions about their internal documents and get accurate, sourced answers.

Python · LlamaIndex · ChromaDB · Claude API · AI

A production-ready RAG (Retrieval-Augmented Generation) system that transforms internal documents into a searchable, conversational knowledge base.

The Problem

Teams drown in documentation: Confluence pages, PDFs, Slack threads, README files. Finding the right answer means searching across multiple tools and hoping what turns up is the latest version.

The Solution

This system ingests documents from multiple sources, chunks and embeds them, and uses Claude to generate accurate answers grounded in the actual content — with source citations.
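A minimal sketch of that flow, assuming LlamaIndex with the Chroma integration package (llama-index-vector-stores-chroma). The folder path, collection name, and example question are illustrative, and the LLM/embedding configuration that points LlamaIndex at Claude is omitted for brevity:

```python
import chromadb
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore

# Load PDF, Markdown, HTML, and plain-text files from a local folder.
documents = SimpleDirectoryReader("./docs").load_data()

# Store embeddings in a persistent local ChromaDB collection.
chroma_client = chromadb.PersistentClient(path="./chroma_db")
collection = chroma_client.get_or_create_collection("knowledge_base")
vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Chunk, embed, and index the documents in one step.
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

# Ask a question; the source nodes back the citations in the answer.
response = index.as_query_engine().query("How do we rotate the staging API keys?")
print(response)
for node in response.source_nodes:
    print(node.metadata.get("file_name"), node.score)
```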

Key Features

  • Multi-format ingestion - PDF, Markdown, HTML, plain text, and Confluence pages
  • Hybrid retrieval - Combines semantic search with keyword matching for better recall (see the retrieval sketch after this list)
  • Source citations - Every answer includes links back to the original documents
  • Conversation memory - Follow-up questions maintain context from previous turns
  • Admin dashboard - Monitor usage, view popular queries, and manage document sources
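The hybrid retrieval and query expansion steps can be sketched with LlamaIndex's fusion retriever, assuming the llama-index-retrievers-bm25 package; `index` and `documents` come from the ingestion sketch above, and the top-k values are illustrative rather than the project's tuned settings:

```python
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.retrievers import QueryFusionRetriever
from llama_index.retrievers.bm25 import BM25Retriever

# Semantic search over the ChromaDB embeddings.
vector_retriever = index.as_retriever(similarity_top_k=10)

# Keyword (BM25) search over the same content, built directly from parsed
# nodes since an external vector store can leave the index docstore empty.
nodes = SentenceSplitter().get_nodes_from_documents(documents)
bm25_retriever = BM25Retriever.from_defaults(nodes=nodes, similarity_top_k=10)

# Fuse both result lists with reciprocal-rank fusion; num_queries > 1 also
# generates reworded query variants, i.e. query expansion before retrieval.
retriever = QueryFusionRetriever(
    [vector_retriever, bm25_retriever],
    num_queries=4,  # 1 original + 3 generated variants
    mode="reciprocal_rerank",
    similarity_top_k=5,
)
results = retriever.retrieve("How do we rotate the staging API keys?")
```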

Architecture

Documents → Chunking → Embedding → ChromaDB (Vector Store)

User Query → Query Expansion → Hybrid Retrieval → Re-ranking → Claude API → Answer + Sources

Tech Stack

  • Framework: Python + LlamaIndex for the RAG pipeline
  • Vector DB: ChromaDB for embeddings storage and retrieval
  • LLM: Claude API (Sonnet for queries, Haiku for summarization; see the routing sketch after this list)
  • Frontend: React + TypeScript chat interface
  • Deployment: Docker containers on AWS ECS
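A sketch of how that Sonnet/Haiku split might look with the Anthropic Python SDK; the model IDs and the helper name are placeholders, not the project's actual code:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def complete(prompt: str, task: str = "query") -> str:
    # Sonnet answers user queries; the cheaper Haiku model handles
    # background summarization of ingested documents.
    # Model IDs below are placeholders for whatever versions are current.
    model = "claude-sonnet-4-5" if task == "query" else "claude-haiku-4-5"
    message = client.messages.create(
        model=model,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return message.content[0].text
```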

Lessons Learned

  • Chunking strategy matters more than model choice: overlapping chunks of ~500 tokens with parent-document retrieval gave the best results (see the sketch after this list)
  • Hybrid retrieval (semantic + keyword) consistently outperforms pure vector search
  • Query expansion before retrieval improves answer quality by ~25%
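A sketch of that chunking lesson using LlamaIndex node parsers; the 50-token overlap and the 2000-token parent size are illustrative, since the write-up only pins ~500-token chunks:

```python
from llama_index.core.node_parser import HierarchicalNodeParser, SentenceSplitter

# Flat chunking: ~500-token windows with overlap, so a sentence that spans a
# chunk boundary still appears intact in at least one chunk.
splitter = SentenceSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.get_nodes_from_documents(documents)

# Parent-document retrieval: index small leaf chunks, but merge hits back
# into their larger parents at query time (pairs with AutoMergingRetriever).
hierarchical = HierarchicalNodeParser.from_defaults(chunk_sizes=[2000, 500])
nodes = hierarchical.get_nodes_from_documents(documents)
```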