RAG Knowledge Base
A production-ready RAG (Retrieval-Augmented Generation) system that turns a team's internal documents into a searchable, conversational knowledge base, answering questions with accurate, source-cited responses.
The Problem
Teams drown in documentation — Confluence pages, PDFs, Slack threads, README files. Finding the right answer means searching across multiple tools and hoping you find the latest version.
The Solution
This system ingests documents from multiple sources, chunks and embeds them, and uses Claude to generate accurate answers grounded in the actual content — with source citations.
Key Features
- Multi-format ingestion - PDF, Markdown, HTML, plain text, and Confluence pages
- Hybrid retrieval - Combines semantic search with keyword matching for better recall
- Source citations - Every answer includes links back to the original documents
- Conversation memory - Follow-up questions maintain context from previous turns
- Admin dashboard - Monitor usage, view popular queries, and manage document sources
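A common way to combine semantic and keyword rankings, as the hybrid retrieval feature describes, is reciprocal rank fusion (RRF). A minimal sketch, not this project's actual code; the function name, document IDs, and the `k=60` smoothing constant are illustrative:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc IDs: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists from vector search and keyword (BM25-style) search.
semantic = ["doc_a", "doc_b", "doc_c"]
keyword = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([semantic, keyword])
# doc_b ranks first: it appears near the top of both lists.
```

Documents found by both retrievers float to the top even when neither retriever ranked them first, which is why hybrid fusion tends to improve recall over either method alone.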
Architecture
Ingestion:  Documents → Chunking → Embedding → ChromaDB (Vector Store)
                                                    ↓
Query:      User Query → Query Expansion → Hybrid Retrieval → Re-ranking
                                                    ↓
                                       Claude API → Answer + Sources
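The query path above can be sketched end to end with placeholder stages. Every function body here is an illustrative stub (the tiny in-memory corpus stands in for ChromaDB, and the final step would call the Claude API in the real system):

```python
def expand_query(query):
    # Stub: a real system might generate paraphrases with Haiku; here we
    # just add a lowercased variant.
    return [query, query.lower()]

def hybrid_retrieve(queries, top_k=5):
    # Stub: stands in for ChromaDB vector search fused with keyword matching.
    corpus = {"doc1": "How to rotate API keys", "doc2": "Deploying on ECS"}
    hits = [doc_id for doc_id, text in corpus.items()
            if any(word in text.lower() for q in queries for word in q.split())]
    return hits[:top_k]

def rerank(query, doc_ids):
    # Stub: a cross-encoder re-ranker would reorder candidates here.
    return doc_ids

def answer(query, doc_ids):
    # Stub: a real system would send the query plus retrieved chunks to Claude
    # and return the grounded answer with citations.
    return {"answer": f"(grounded answer to: {query})", "sources": doc_ids}

query = "rotate API keys"
result = answer(query, rerank(query, hybrid_retrieve(expand_query(query))))
```

Keeping each stage behind its own function boundary like this makes it easy to swap retrievers or re-rankers without touching the rest of the pipeline.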
Tech Stack
- Framework: Python + LlamaIndex for the RAG pipeline
- Vector DB: ChromaDB for embeddings storage and retrieval
- LLM: Claude API (Sonnet for queries, Haiku for summarization)
- Frontend: React + TypeScript chat interface
- Deployment: Docker containers on AWS ECS
Lessons Learned
- Chunking strategy matters more than model choice — overlapping chunks of ~500 tokens with parent-document retrieval gave the best results
- Hybrid retrieval (semantic + keyword) consistently outperforms pure vector search
- Query expansion before retrieval improves answer quality by ~25%
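The overlapping-chunk lesson can be illustrated with a sliding token window. This is a sketch, not the project's chunker: whitespace tokens stand in for real tokenizer tokens, the 500-token size echoes the note above, and the 50-token overlap is an assumed value:

```python
def chunk_tokens(tokens, size=500, overlap=50):
    """Split a token list into windows of `size`, each sharing `overlap`
    tokens with the previous window so no context is cut mid-thought."""
    step = size - overlap
    chunks = []
    for start in range(0, max(len(tokens) - overlap, 1), step):
        chunks.append(tokens[start:start + size])
    return chunks

# 1200 synthetic tokens → three chunks: [0:500], [450:950], [900:1200].
tokens = [f"tok{i}" for i in range(1200)]
chunks = chunk_tokens(tokens)
```

For parent-document retrieval, each chunk would also record its source document ID so the retriever can fetch the full parent for generation after matching on the small chunk.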