Document Intake Agent

Production

LangGraph agent with a deterministic test suite — the testing discipline AI agents need

Problem

AI agents are non-deterministic, which makes them hard to test — yet untested agents can't be trusted in production. The hard part of agent work is proving the workflow behaves correctly under controlled conditions.

Solution

A multi-node LangGraph graph (ingest → extract → validate/route → summarize) with a conditional retry loop, fronted by a FastAPI demo. The model is swapped for a scripted fake in tests — making the whole suite deterministic and free — while a real Claude model powers the live, PIN-gated demo.
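The control flow described above can be sketched without any framework. This is a minimal, library-free stand-in for the graph, not the project's LangGraph code: the state fields, `REQUIRED_FIELDS`, `MAX_ATTEMPTS`, and every function name are assumptions made for illustration. The model is passed in as a plain callable, which is what lets a scripted fake replace it.

```python
import json
from typing import Callable

# Assumed schema and retry budget; the real project's values are not shown.
REQUIRED_FIELDS = ("sender", "date", "total")
MAX_ATTEMPTS = 3


def ingest(state: dict) -> dict:
    """Normalize the raw input and initialize bookkeeping fields."""
    return {**state, "text": state["raw"].strip(), "attempts": 0, "errors": []}


def extract(state: dict, llm: Callable[[str], str]) -> dict:
    """Ask the injected model for JSON; treat unparseable replies as empty."""
    reply = llm(state["text"])
    try:
        fields = json.loads(reply)
    except json.JSONDecodeError:
        fields = {}
    if not isinstance(fields, dict):
        fields = {}
    return {**state, "fields": fields, "attempts": state["attempts"] + 1}


def validate(state: dict) -> dict:
    """Record which required fields are missing from the extraction."""
    missing = [key for key in REQUIRED_FIELDS if key not in state["fields"]]
    return {**state, "errors": missing}


def route(state: dict) -> str:
    """Conditional edge: proceed, retry extraction, or stop at the budget."""
    if not state["errors"]:
        return "summarize"
    if state["attempts"] < MAX_ATTEMPTS:
        return "extract"
    return "give_up"


def summarize(state: dict) -> dict:
    fields = state["fields"]
    summary = f"{fields['sender']} on {fields['date']}: {fields['total']}"
    return {**state, "summary": summary}


def run(raw: str, llm: Callable[[str], str]) -> dict:
    """Drive ingest -> extract -> validate/route -> summarize with retries."""
    state = ingest({"raw": raw})
    while True:
        state = validate(extract(state, llm))
        next_node = route(state)
        if next_node == "summarize":
            return summarize(state)
        if next_node == "give_up":
            return {**state, "summary": "extraction failed"}
```

Because `route` is a pure function of the state, the retry decision can be tested on its own, and the bounded attempt count keeps a persistently invalid model reply from looping forever.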

Key Features

  • Multimodal input — reads photos, scans (even handwriting), and PDFs via Claude vision
  • LangGraph graph with a conditional retry edge (re-extract until schema-valid)
  • Node-level pytest unit tests with mocked model state
  • Graph-level integration tests with controlled inputs
  • LLM mocking via a scripted fake model for deterministic, zero-cost tests
  • Pydantic structured-output validation
  • Regression tests that name the failure mode each one guards
  • Playwright E2E test against the live FastAPI UI
  • PIN-gated live Claude mode with a daily cost cap
  • Containerized and deployed to GCP Cloud Run
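As an illustration of the scripted-fake and named-regression-test items in the list above, here is a small sketch. `FakeLLM` follows the name used on this page, but its `complete` method, the `extract_with_retry` helper, and the single-field schema check are hypothetical stand-ins; the project itself validates with Pydantic, which is replaced here by a plain dictionary check to keep the example dependency-free.

```python
import json


class FakeLLM:
    """Scripted stand-in: returns queued replies in order and records prompts."""

    def __init__(self, replies):
        self._replies = list(replies)
        self.prompts = []

    def complete(self, prompt: str) -> str:
        self.prompts.append(prompt)
        if not self._replies:
            # An exhausted script means the code under test made an extra call.
            raise AssertionError("FakeLLM script exhausted: unexpected model call")
        return self._replies.pop(0)


def extract_with_retry(llm, text: str, max_attempts: int = 3) -> dict:
    """Hypothetical node under test: re-ask until the reply is schema-valid."""
    for _ in range(max_attempts):
        reply = llm.complete(f"Extract fields as JSON:\n{text}")
        try:
            fields = json.loads(reply)
        except json.JSONDecodeError:
            continue  # malformed JSON: retry
        if isinstance(fields, dict) and "total" in fields:
            return fields
    raise ValueError("no schema-valid extraction")


# Each test name states the failure mode it guards against.
def test_malformed_json_triggers_exactly_one_retry():
    llm = FakeLLM(["{oops", '{"total": "12.50"}'])
    assert extract_with_retry(llm, "invoice") == {"total": "12.50"}
    assert len(llm.prompts) == 2


def test_persistently_invalid_output_stops_at_attempt_cap():
    llm = FakeLLM(['{"note": "no total"}'] * 3)
    try:
        extract_with_retry(llm, "invoice")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError after the attempt cap")
    assert len(llm.prompts) == 3
```

The fake both scripts the replies and counts the calls, so a test can assert on how many times the model was consulted as well as on the final result.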

Tech Stack

Python, LangGraph, Claude API, Pydantic, pytest, Playwright, FastAPI, Docker, GCP Cloud Run

Architecture

LangGraph StateGraph with a dependency-injected LLM interface: production passes a real Claude client, tests pass a scripted FakeLLM, and the graph is unchanged. The FastAPI front door has a PIN-gated live mode and an in-memory daily cap. Deployed on Cloud Run (single instance) with secrets in Secret Manager. The suite is tiered: unit and graph tests run by default, while Playwright and real-model integration tests are opt-in. Test coverage is 96%.
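One way the PIN gate and in-memory daily cap could fit together is sketched below. Everything here is an assumption for illustration: the `DailyCap` class, the `choose_mode` helper, the dollar-denominated limit, and the fallback to a scripted mode are not taken from the project's code. The clock is injected so the reset-at-midnight behavior can be tested deterministically.

```python
import hmac
from datetime import date


class DailyCap:
    """In-memory spend counter that resets when the calendar day changes."""

    def __init__(self, limit_usd: float, today=date.today):
        self.limit_usd = limit_usd
        self._today = today  # injectable clock for deterministic tests
        self._day = today()
        self._spent = 0.0

    def try_spend(self, cost_usd: float) -> bool:
        """Reserve budget for one call; return False if it would exceed the cap."""
        now = self._today()
        if now != self._day:
            self._day, self._spent = now, 0.0
        if self._spent + cost_usd > self.limit_usd:
            return False
        self._spent += cost_usd
        return True


def pin_ok(submitted: str, expected: str) -> bool:
    """Constant-time comparison to avoid leaking the PIN through timing."""
    return hmac.compare_digest(submitted.encode(), expected.encode())


def choose_mode(submitted_pin: str, expected_pin: str,
                cap: DailyCap, est_cost_usd: float) -> str:
    """Use the live model only with a valid PIN and remaining budget."""
    if pin_ok(submitted_pin, expected_pin) and cap.try_spend(est_cost_usd):
        return "live"
    return "scripted"  # assumed fallback: the scripted, zero-cost path
```

A counter held in process memory is only meaningful when a single instance serves all requests, and it resets on restart; a multi-instance deployment would need shared storage instead.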

Metrics

  • Test Coverage: 96%
  • Stack: LangGraph
  • Model: Claude
  • Deploy: Cloud Run

My Role

Sole developer. Scoped the project to mirror a real AI/agent-testing role, built the graph and the full test apparatus with AI assistance, and deployed it live with cost guardrails.