Azure RAG Document Q&A

Production

Full-stack document intelligence platform built on Azure — zero to deployed in 3 hours

Problem

Organizations struggle with information discoverability. Employees need fast, accurate answers from internal documents without manual searching. Traditional full-text search yields poor results for semantic queries, while raw LLM chatbots hallucinate when facts matter.

Solution

A web application that ingests documents, generates vector embeddings via Azure OpenAI, stores them in Azure AI Search with hybrid indexing, and answers questions using GPT-4o grounded in retrieved document context. Includes both a web Q&A interface and an Azure Bot Service chat channel.
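The grounding step can be sketched as a prompt-assembly helper: retrieved chunks are packed into the system prompt so GPT-4o answers only from document context and can cite sources. Names and prompt wording here are illustrative, not the project's actual code.

```python
# Hypothetical sketch of the grounding step used before calling GPT-4o.
def build_grounded_prompt(question: str, retrieved: list[dict]) -> list[dict]:
    """Assemble chat messages that ground the model in retrieved chunks.

    `retrieved` items look like {"source": "handbook.pdf", "content": "..."}.
    """
    context = "\n\n".join(
        f"[{i + 1}] ({doc['source']}) {doc['content']}"
        for i, doc in enumerate(retrieved)
    )
    system = (
        "Answer using ONLY the numbered context below. "
        "Cite sources as [n]. If the context is insufficient, say so.\n\n"
        + context
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]
```

The message list then goes to the chat-completions endpoint as-is; the citation markers `[n]` map back to the uploaded source documents.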

Key Features

  • Document upload with automatic chunking (800-character chunks, 200-character overlap)
  • Vector embeddings via Azure OpenAI text-embedding-3-small
  • Hybrid search (vector similarity + keyword matching)
  • GPT-4o answer generation with source citations
  • Dual interface: web Q&A and Azure Bot Service chat
  • Analytics endpoint tracking visitor activity
  • Environment-agnostic configuration (local dev to Azure)

Tech Stack

Python · Flask · Azure OpenAI · Azure AI Search · Azure App Service · Azure Bot Service

Architecture

RAG pipeline: Documents → Chunks → Embeddings (text-embedding-3-small) → Azure AI Search (HNSW vector index). Queries → Embedding → Hybrid search (top 5) → Context + question → GPT-4o → Grounded answer with citations. Flask app deployed on Azure App Service (Linux, Python 3.12, gunicorn).
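Azure AI Search fuses the vector and keyword rankings with Reciprocal Rank Fusion (RRF). The service does this internally; a minimal illustration of the scoring idea, with illustrative names:

```python
# Reciprocal Rank Fusion: each retriever contributes 1 / (k + rank)
# per document; summed scores decide the final order. k=60 is the
# constant commonly cited for RRF.
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked ID lists into one fused ranking."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# A document ranked well by both retrievers rises to the top.
vector_hits = ["doc2", "doc1", "doc3"]   # by cosine similarity
keyword_hits = ["doc1", "doc3", "doc2"]  # by BM25 keyword match
top5 = rrf_fuse([vector_hits, keyword_hits])[:5]
```

In the deployed pipeline the top 5 fused results become the context passed to GPT-4o.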

Screenshots

Screenshots coming soon

Metrics

  • Build Time: ~3 hours
  • Azure Services: 6
  • Vector Dimensions: 1536
  • LLM: GPT-4o

My Role

Sole developer. Built from zero Azure experience to deployed, publicly accessible system in approximately 3 hours. Configured 6 Azure services, implemented the full RAG pipeline, and optimized deployment from 17 minutes to 80 seconds.