Azure RAG Document Q&A

Production

Full-stack document intelligence platform built on Azure — zero to deployed in 3 hours

Problem

Organizations struggle with information discoverability. Employees need fast, accurate answers from internal documents without manual searching. Traditional full-text search yields poor results for semantic queries, while raw LLM chatbots hallucinate when facts matter.

Solution

A web application that ingests documents, generates vector embeddings via Azure OpenAI, stores them in Azure AI Search with hybrid indexing, and answers questions using GPT-4o grounded in retrieved document context. Includes both a web Q&A interface and an Azure Bot Service chat channel.
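The grounding step can be sketched as a prompt-assembly helper: retrieved chunks are packed into the system prompt so GPT-4o answers only from document context and can cite sources. Names and prompt wording here are illustrative, not the project's actual code.

```python
# Hypothetical sketch of the grounding step used before calling GPT-4o.
def build_grounded_prompt(question: str, retrieved: list[dict]) -> list[dict]:
    """Assemble chat messages that ground the model in retrieved chunks.

    `retrieved` items look like {"source": "handbook.pdf", "content": "..."}.
    """
    context = "\n\n".join(
        f"[{i + 1}] ({doc['source']}) {doc['content']}"
        for i, doc in enumerate(retrieved)
    )
    system = (
        "Answer using ONLY the numbered context below. "
        "Cite sources as [n]. If the context is insufficient, say so.\n\n"
        + context
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]
```

The message list then goes to the chat-completions endpoint as-is; the citation markers `[n]` map back to the uploaded source documents.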

Key Features

  • Document upload with automatic chunking (800-character chunks, 200-character overlap)
  • Vector embeddings via Azure OpenAI text-embedding-3-small
  • Hybrid search (vector similarity + keyword matching)
  • GPT-4o answer generation with source citations
  • Dual interface: web Q&A and Azure Bot Service chat
  • Analytics endpoint tracking visitor activity
  • Environment-agnostic configuration (local dev to Azure)

Tech Stack

Python · Flask · Azure OpenAI · Azure AI Search · Azure App Service · Azure Bot Service

Architecture

RAG pipeline: Documents → Chunks → Embeddings (text-embedding-3-small) → Azure AI Search (HNSW vector index). Queries → Embedding → Hybrid search (top 5) → Context + question → GPT-4o → Grounded answer with citations. Flask app deployed on Azure App Service (Linux, Python 3.12, gunicorn).
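Azure AI Search fuses the vector and keyword rankings with Reciprocal Rank Fusion (RRF). The service does this internally; a minimal illustration of the scoring idea, with illustrative names:

```python
# Reciprocal Rank Fusion: each retriever contributes 1 / (k + rank)
# per document; summed scores decide the final order. k=60 is the
# constant commonly cited for RRF.
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked ID lists into one fused ranking."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# A document ranked well by both retrievers rises to the top.
vector_hits = ["doc2", "doc1", "doc3"]   # by cosine similarity
keyword_hits = ["doc1", "doc3", "doc2"]  # by BM25 keyword match
top5 = rrf_fuse([vector_hits, keyword_hits])[:5]
```

In the deployed pipeline the top 5 fused results become the context passed to GPT-4o.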

Screenshots

Screenshots coming soon

Metrics

  • Build Time: ~3 hours
  • Azure Services: 6
  • Vector Dimensions: 1536
  • LLM: GPT-4o

My Role

Sole developer. Built from zero Azure experience to deployed, publicly accessible system in approximately 3 hours. Configured 6 Azure services, implemented the full RAG pipeline, and optimized deployment from 17 minutes to 80 seconds.