LLMs & GenAI Platform
DEPLOYED
Enterprise document intelligence at production scale, built to handle the edge cases that kill most LLM demos before they reach real users.
Reduced document review time by 68% across a team of 30 analysts
Processes 10,000+ documents per day with P95 latency under 3 seconds
99.2% uptime over 6 months of production operation
Sub-agent routing achieves 91% task decomposition accuracy on held-out eval set
Enterprise clients needed to extract structured insights from thousands of unstructured documents — contracts, reports, research papers — without manual review. Existing keyword search failed on semantic queries. Off-the-shelf RAG solutions couldn't handle domain-specific terminology or multi-document reasoning.
Built a hierarchical document processing pipeline: semantic chunking preserves contextual coherence, a custom reranker filters noise before LLM calls, and a router dispatches queries to specialised sub-agents (summarisation, extraction, comparison). Each reasoning step is logged to an audit chain so outputs are traceable and correctable.
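The router's dispatch step can be sketched as follows. This is a minimal stand-in, not the production implementation: the handler names, keyword lists, and fallback behaviour here are illustrative assumptions (the real system presumably uses an LLM-based classifier to hit the 91% decomposition accuracy cited above).

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sub-agent handlers; in the real pipeline each of these
# would be a specialised LLM workflow with its own prompts and tools.
def summarise(query: str) -> str:
    return f"[summarisation agent] {query}"

def extract(query: str) -> str:
    return f"[extraction agent] {query}"

def compare(query: str) -> str:
    return f"[comparison agent] {query}"

@dataclass
class Route:
    name: str
    keywords: tuple[str, ...]  # crude keyword trigger, assumption only
    handler: Callable[[str], str]

ROUTES = [
    Route("comparison", ("compare", "difference", "versus"), compare),
    Route("extraction", ("extract", "list all", "find all"), extract),
    Route("summarisation", ("summarise", "summary", "overview"), summarise),
]

def route_query(query: str) -> str:
    """Dispatch a query to the first sub-agent whose keywords match.

    Falls back to summarisation when nothing matches, so every query
    gets some answer rather than an error."""
    q = query.lower()
    for route in ROUTES:
        if any(k in q for k in route.keywords):
            return route.handler(query)
    return summarise(query)
```

The useful property of this shape is that adding a new sub-agent is a one-line change to the route table, and each dispatch decision can be written to the audit chain alongside the agent's output.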
FastAPI backend handles concurrent requests via async workers. Pinecone stores embeddings with metadata filters for tenant isolation. Redis caches frequent query patterns, cutting LLM costs by 34%. The whole stack runs on AWS ECS with auto-scaling tied to queue depth.
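The caching layer's core idea is a normalised, tenant-scoped cache key checked before any LLM call. A sketch, with an in-process dict standing in for Redis and a caller-supplied `llm_call` standing in for the model client (both are assumptions for illustration):

```python
import hashlib
import json
from typing import Callable

_cache: dict[str, str] = {}  # stand-in for Redis; production uses a shared store

def cache_key(tenant_id: str, query: str) -> str:
    """Normalise the query so trivially different phrasings share a key,
    and scope the key by tenant so cached answers never cross tenants."""
    normalised = " ".join(query.lower().split())
    payload = json.dumps({"tenant": tenant_id, "q": normalised}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def answer(tenant_id: str, query: str, llm_call: Callable[[str], str]) -> str:
    key = cache_key(tenant_id, query)
    if key in _cache:
        return _cache[key]  # cache hit: zero LLM cost
    result = llm_call(query)
    _cache[key] = result
    return result
```

Keying on the normalised query rather than the raw string is what makes repeated phrasings collapse into one entry; scoping the key by tenant mirrors the metadata-filter isolation used on the vector store side.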
The hardest part wasn't the model — it was evaluation. Building a harness that catches retrieval regressions before they reach production required creating a domain-specific test set of 500 question-answer pairs. That investment paid off within two weeks when a chunking change that looked neutral in offline metrics turned out to degrade a specific query type by 22%.
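The harness's key design choice is scoring per query type rather than one aggregate number, since an aggregate can look neutral while one slice regresses badly. A simplified sketch of that idea, with hit-rate@k as an assumed metric and illustrative field names:

```python
from collections import defaultdict
from typing import Callable

def eval_retrieval(test_set: list[dict],
                   retrieve: Callable[[str], list[str]],
                   k: int = 5) -> dict[str, float]:
    """Score hit-rate@k per query type on a held-out QA set.

    test_set: dicts with 'question', 'gold_doc_id', 'query_type'
              (field names are assumptions for this sketch).
    retrieve: the retrieval stage under test; returns ranked doc ids."""
    hits: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for case in test_set:
        totals[case["query_type"]] += 1
        if case["gold_doc_id"] in retrieve(case["question"])[:k]:
            hits[case["query_type"]] += 1
    return {t: hits[t] / totals[t] for t in totals}

def check_regression(baseline: dict[str, float],
                     candidate: dict[str, float],
                     max_drop: float = 0.05) -> list[str]:
    """Return the query types whose hit rate fell more than max_drop
    relative to the baseline run; any non-empty result blocks the change."""
    return [t for t, score in candidate.items()
            if baseline.get(t, 0.0) - score > max_drop]
```

Run against a stored baseline in CI, a harness of this shape surfaces exactly the failure mode described above: an overall-neutral chunking change that sinks one query type.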