Status: Released, Phases 1–3 complete (CLI MVP, multi-agent orchestration, quality and evaluation with model selection).
A multi-agent retrieval-augmented research assistant that combines cloud LLM reasoning, live research APIs, and persistent vector memory with context-aware model selection and factual-consistency scoring.
The Problem
Research workflows that lean on a single LLM run into three recurring failures. The model picks a planning strategy that’s wrong for the complexity of the query. It synthesizes from its own pre-training instead of live sources. And it hallucinates citations or stitches contradictory sources together without flagging the conflict. A general-purpose chat interface gives you no way to harden any of those steps.
The Approach
A three-agent workflow with model selection scoped to each agent’s task:
- Seed Agent decomposes the query and plans the search. Runs on o1-mini because the work is structural, not synthesis.
- Sourcing Agent calls research APIs, then filters and evaluates the results. Runs on sonar-pro (Perplexity) for live web research.
- Research Agent does retrieval, synthesis, and conflict detection across local Chroma vectors and live results. Runs on o1 for the heavier reasoning.
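The handoff between the three agents can be pictured as a chain of typed results, each agent consuming the previous one's output. The sketch below is illustrative only: the SearchPlan, Source, and ResearchResult types and the seed_agent, sourcing_agent, and research_agent functions are hypothetical names, and each "agent" is a deterministic stand-in for the model call it would make, so the data flow can be run without API keys.

```python
from dataclasses import dataclass, field

# Hypothetical handoff types; the project's real schemas are not shown in this write-up.
@dataclass
class SearchPlan:
    query: str
    sub_questions: list[str]

@dataclass
class Source:
    title: str
    url: str
    snippet: str

@dataclass
class ResearchResult:
    answer: str
    citations: list[Source] = field(default_factory=list)

# Per-agent model assignment as described above.
AGENT_MODELS = {"seed": "o1-mini", "sourcing": "sonar-pro", "research": "o1"}

def seed_agent(query: str) -> SearchPlan:
    # Stand-in for an o1-mini call: naively split a compound query into sub-questions.
    parts = [p.strip() for p in query.split(" and ") if p.strip()]
    return SearchPlan(query=query, sub_questions=parts or [query])

def sourcing_agent(plan: SearchPlan) -> list[Source]:
    # Stand-in for sonar-pro: one placeholder source per sub-question.
    return [
        Source(title=q, url=f"https://example.com/{i}", snippet=f"Notes on {q}")
        for i, q in enumerate(plan.sub_questions)
    ]

def research_agent(plan: SearchPlan, sources: list[Source]) -> ResearchResult:
    # Stand-in for o1: join snippets and keep every source as a citation.
    answer = " ".join(s.snippet for s in sources)
    return ResearchResult(answer=answer, citations=sources)

def run(query: str) -> ResearchResult:
    # Seed -> Sourcing -> Research, each step receiving a structured object.
    plan = seed_agent(query)
    sources = sourcing_agent(plan)
    return research_agent(plan, sources)
```

Keeping the handoffs as explicit types means each agent can be swapped to a different model without the others noticing, which is what makes per-agent model selection practical.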
A model selection layer routes by context: smaller models for planning, larger models for synthesis, and a high-context fallback (gpt-4.1) when the working set crosses ~100k tokens. Fallback chains catch model errors without dropping the run.
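A minimal sketch of that routing and fallback behavior follows. It assumes a hypothetical call_model(model, prompt) callable that wraps the real provider SDKs; the task names, the chain contents after each primary model, and the strict > cutoff are assumptions, since the write-up only gives the primary models, the gpt-4.1 high-context fallback, and the approximate ~100k-token threshold.

```python
from typing import Callable

# "~100k tokens" comes from the description above; the exact cutoff is an assumption.
HIGH_CONTEXT_THRESHOLD = 100_000
HIGH_CONTEXT_MODEL = "gpt-4.1"

# Hypothetical per-task chains: the first entry is the primary model named in the
# write-up; the gpt-4.1 fallback after it is assumed for illustration.
CHAINS: dict[str, list[str]] = {
    "planning": ["o1-mini", "gpt-4.1"],
    "research": ["sonar-pro", "gpt-4.1"],
    "synthesis": ["o1", "gpt-4.1"],
}


def select_chain(task: str, context_tokens: int) -> list[str]:
    """Return the ordered list of models to try for a task."""
    if context_tokens > HIGH_CONTEXT_THRESHOLD:
        # Very large working sets go straight to the high-context model.
        return [HIGH_CONTEXT_MODEL]
    return CHAINS[task]


def call_with_fallback(
    task: str,
    context_tokens: int,
    prompt: str,
    call_model: Callable[[str, str], str],
) -> tuple[str, str]:
    """Try each model in the chain; return (model_used, output) from the first success."""
    errors: list[str] = []
    for model in select_chain(task, context_tokens):
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # a real client would catch narrower API errors
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models in chain failed: " + "; ".join(errors))
```

Returning the model that actually answered, not just the output, lets performance monitoring attribute successes and failures to the right model when a fallback fires.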
What’s Implemented
- Three-agent orchestration: Seed → Sourcing → Research, with structured handoffs and per-agent prompt templates
- Local vector store: Chroma DB with persistent memory across sessions; python app/cli.py add ingests text, titles, and URLs
- Live research: Perplexity Sonar integration for real-time web sources, mixed with local citations and conflict detection
- Vectara factual consistency scoring (FCS): runs on synthesized responses; combined with model confidence and citation quality into a multi-factor confidence score
- Performance monitoring: real-time metrics on model usage, success rates, and per-model comparison; exportable
- CLI surface: ask, add, stats, report, models, performance, select-model
- Test suite: end-to-end basic pipeline test, smoke tests, and model-selection tests under tests/
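One way to combine the three quality signals listed above (Vectara FCS, model confidence, citation quality) into a single score is a normalized weighted average. The sketch below assumes every input is already scaled to [0, 1]; the ConfidenceInputs type, the WEIGHTS values, and the label bands are all hypothetical, since the write-up does not state how the factors are weighted.

```python
from dataclasses import dataclass


@dataclass
class ConfidenceInputs:
    fcs: float               # Vectara factual consistency score, assumed in [0, 1]
    model_confidence: float  # model-derived confidence, assumed in [0, 1]
    citation_quality: float  # e.g. share of claims with a resolvable citation (assumed)


# Hypothetical weights; the project's actual weighting is not documented here.
WEIGHTS = {"fcs": 0.5, "model_confidence": 0.2, "citation_quality": 0.3}


def _clamp(x: float) -> float:
    # Guard against out-of-range inputs from upstream scorers.
    return max(0.0, min(1.0, x))


def multi_factor_confidence(inp: ConfidenceInputs, weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted average of the three signals, normalized by the weight total."""
    total = sum(weights.values())
    score = (
        weights["fcs"] * _clamp(inp.fcs)
        + weights["model_confidence"] * _clamp(inp.model_confidence)
        + weights["citation_quality"] * _clamp(inp.citation_quality)
    )
    return score / total


def confidence_label(score: float) -> str:
    # Hypothetical bands for presenting the score alongside a response.
    if score >= 0.8:
        return "high"
    if score >= 0.5:
        return "medium"
    return "low"
```

With these example weights, an answer with FCS 0.9, model confidence 0.6, and citation quality 0.8 scores about 0.81. Weighting FCS most heavily is one reasonable choice because it is the only signal that checks the response against its sources rather than relying on the model's view of itself.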
Architecture
Query → Seed Agent (o1-mini) # plan search
→ Sourcing Agent (sonar-pro) # fetch + evaluate live sources
→ Research Agent (o1) # synthesize against Chroma + live
→ Vectara FCS # score factual consistency
→ Response + multi-factor confidence
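The Research Agent step in this pipeline merges local Chroma hits with live results and surfaces disagreements instead of silently picking one. A minimal sketch of that merge follows, assuming claims have already been extracted and normalized to a shared topic key (the hard part, which would fall to the LLM in practice). The Claim type, its fields, and the exact-string comparison of values are hypothetical simplifications.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    topic: str   # normalized subject of the claim, e.g. "release_year" (assumed upstream)
    value: str   # the asserted value
    source: str  # citation, e.g. a Chroma document id or a live URL
    origin: str  # "local" (Chroma) or "live" (web research)


def merge_with_conflicts(
    local: list[Claim], live: list[Claim]
) -> tuple[dict[str, list[Claim]], dict[str, list[Claim]]]:
    """Split claims into agreed topics and conflicting topics, keeping every citation."""
    by_topic: dict[str, list[Claim]] = defaultdict(list)
    for claim in [*local, *live]:
        by_topic[claim.topic].append(claim)

    agreed: dict[str, list[Claim]] = {}
    conflicts: dict[str, list[Claim]] = {}
    for topic, claims in by_topic.items():
        if len({c.value for c in claims}) == 1:
            agreed[topic] = claims
        else:
            # Keep all competing claims so the response can cite both sides.
            conflicts[topic] = claims
    return agreed, conflicts
```

Returning conflicts as their own structure, with sources attached, is what lets the final response flag a disagreement to the reader rather than blending the two values into one sentence.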
Configuration lives in model_config.yaml. The default chain pairs OpenAI’s reasoning models with sonar-pro for research and falls back to gpt-4.1 for very large contexts. Embeddings use text-embedding-3-large; reranking uses Cohere’s rerank-english-v3.0. Alternate LLMs (Claude 3.5 Sonnet) can be plugged in for diverse perspectives.
What This Demonstrates
- Multi-agent orchestration with per-agent model selection grounded in actual cost/quality tradeoffs, not “one model for everything”
- Hybrid RAG with both persistent local vectors and live web research, with explicit conflict surfacing rather than silent merging
- Quality engineering for LLM outputs: factual consistency scoring, multi-factor confidence, and performance instrumentation built into the pipeline rather than bolted on after