RAG Knowledge Base

A retrieval-augmented generation system that lets teams ask questions about their internal documents and get accurate, sourced answers.

Python · LlamaIndex · ChromaDB · Claude API · AI

A production-ready RAG (Retrieval-Augmented Generation) system that transforms internal documents into a searchable, conversational knowledge base.

The Problem

Teams drown in documentation: Confluence pages, PDFs, Slack threads, README files. Finding the right answer means searching across multiple tools and hoping what turns up is the latest version.

The Solution

This system ingests documents from multiple sources, chunks and embeds them, and uses Claude to generate accurate answers grounded in the actual content — with source citations.
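A minimal sketch of that flow, assuming LlamaIndex with the Chroma integration package (llama-index-vector-stores-chroma). The folder path, collection name, and example question are illustrative, and the LLM/embedding configuration that points LlamaIndex at Claude is omitted for brevity:

```python
import chromadb
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore

# Load PDF, Markdown, HTML, and plain-text files from a local folder.
documents = SimpleDirectoryReader("./docs").load_data()

# Store embeddings in a persistent local ChromaDB collection.
chroma_client = chromadb.PersistentClient(path="./chroma_db")
collection = chroma_client.get_or_create_collection("knowledge_base")
vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Chunk, embed, and index the documents in one step.
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

# Ask a question; the source nodes back the citations in the answer.
response = index.as_query_engine().query("How do we rotate the staging API keys?")
print(response)
for node in response.source_nodes:
    print(node.metadata.get("file_name"), node.score)
```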

Key Features

  • Multi-format ingestion - PDF, Markdown, HTML, plain text, and Confluence pages
  • Hybrid retrieval - Combines semantic search with keyword matching for better recall (see the retrieval sketch after this list)
  • Source citations - Every answer includes links back to the original documents
  • Conversation memory - Follow-up questions maintain context from previous turns
  • Admin dashboard - Monitor usage, view popular queries, and manage document sources
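The hybrid retrieval and query expansion steps can be sketched with LlamaIndex's fusion retriever, assuming the llama-index-retrievers-bm25 package; `index` and `documents` come from the ingestion sketch above, and the top-k values are illustrative rather than the project's tuned settings:

```python
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.retrievers import QueryFusionRetriever
from llama_index.retrievers.bm25 import BM25Retriever

# Semantic search over the ChromaDB embeddings.
vector_retriever = index.as_retriever(similarity_top_k=10)

# Keyword (BM25) search over the same content, built directly from parsed
# nodes since an external vector store can leave the index docstore empty.
nodes = SentenceSplitter().get_nodes_from_documents(documents)
bm25_retriever = BM25Retriever.from_defaults(nodes=nodes, similarity_top_k=10)

# Fuse both result lists with reciprocal-rank fusion; num_queries > 1 also
# generates reworded query variants, i.e. query expansion before retrieval.
retriever = QueryFusionRetriever(
    [vector_retriever, bm25_retriever],
    num_queries=4,  # 1 original + 3 generated variants
    mode="reciprocal_rerank",
    similarity_top_k=5,
)
results = retriever.retrieve("How do we rotate the staging API keys?")
```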

Architecture

Documents → Chunking → Embedding → ChromaDB (Vector Store)

User Query → Query Expansion → Hybrid Retrieval → Re-ranking → Claude API → Answer + Sources

Tech Stack

  • Framework: Python + LlamaIndex for the RAG pipeline
  • Vector DB: ChromaDB for embeddings storage and retrieval
  • LLM: Claude API (Sonnet for queries, Haiku for summarization; see the routing sketch after this list)
  • Frontend: React + TypeScript chat interface
  • Deployment: Docker containers on AWS ECS
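A sketch of how that Sonnet/Haiku split might look with the Anthropic Python SDK; the model IDs and the helper name are placeholders, not the project's actual code:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def complete(prompt: str, task: str = "query") -> str:
    # Sonnet answers user queries; the cheaper Haiku model handles
    # background summarization of ingested documents.
    # Model IDs below are placeholders for whatever versions are current.
    model = "claude-sonnet-4-5" if task == "query" else "claude-haiku-4-5"
    message = client.messages.create(
        model=model,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return message.content[0].text
```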

Lessons Learned

  • Chunking strategy matters more than model choice: overlapping chunks of ~500 tokens with parent-document retrieval gave the best results (see the sketch after this list)
  • Hybrid retrieval (semantic + keyword) consistently outperforms pure vector search
  • Query expansion before retrieval improves answer quality by ~25%
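A sketch of that chunking lesson using LlamaIndex node parsers; the 50-token overlap and the 2000-token parent size are illustrative, since the write-up only pins ~500-token chunks:

```python
from llama_index.core.node_parser import HierarchicalNodeParser, SentenceSplitter

# Flat chunking: ~500-token windows with overlap, so a sentence that spans a
# chunk boundary still appears intact in at least one chunk.
splitter = SentenceSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.get_nodes_from_documents(documents)

# Parent-document retrieval: index small leaf chunks, but merge hits back
# into their larger parents at query time (pairs with AutoMergingRetriever).
hierarchical = HierarchicalNodeParser.from_defaults(chunk_sizes=[2000, 500])
nodes = hierarchical.get_nodes_from_documents(documents)
```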