AI Research Pipeline

Status: Released. Phases 1–3 are complete (CLI MVP; multi-agent orchestration; quality and evaluation, including model selection).

A multi-agent retrieval-augmented research assistant that combines cloud LLM reasoning, live research APIs, and persistent vector memory with per-agent model selection and factual-consistency scoring.

The Problem

Research workflows that lean on a single LLM run into three recurring failures. The model picks a planning strategy that’s wrong for the complexity of the query. It synthesizes from its own pre-training instead of live sources. And it hallucinates citations or stitches contradictory sources together without flagging the conflict. A general-purpose chat interface gives you no way to harden any of those steps.

The Approach

A three-agent workflow with model selection scoped to each agent’s task:

  • Seed Agent decomposes the query and plans the search. Runs on o1-mini because the work is structural planning rather than synthesis.
  • Sourcing Agent calls research APIs, then filters and evaluates the returned content. Runs on sonar-pro (Perplexity) for live web research.
  • Research Agent does retrieval, synthesis, and conflict detection across local Chroma vectors and live results. Runs on o1 for the heavier reasoning.
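The conflict-detection step amounts to grouping what the local and live sources say about the same thing and flagging disagreements instead of merging them. The following is a minimal sketch, assuming claims have already been extracted into a hypothetical `Claim` record (subject, attribute, value, source). The real Research Agent presumably uses the LLM for extraction and comparison, which this toy version skips. The sample data is invented.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    # Hypothetical claim record; not the project's actual data model.
    subject: str
    attribute: str
    value: str
    source: str  # e.g. "local:doc-1" for Chroma hits, "live:source-a" for web results


def find_conflicts(claims):
    """Group claims by (subject, attribute) and return only the groups whose values disagree."""
    groups = defaultdict(lambda: defaultdict(list))
    for c in claims:
        # Light normalization so case/whitespace differences are not reported as conflicts.
        key = (c.subject.strip().lower(), c.attribute.strip().lower())
        groups[key][c.value.strip().lower()].append(c.source)

    conflicts = {}
    for key, by_value in groups.items():
        if len(by_value) > 1:
            # Keep every competing value with its sources so the answer can cite both sides.
            conflicts[key] = {value: sorted(srcs) for value, srcs in by_value.items()}
    return conflicts


claims = [
    Claim("Widget X", "release year", "2021", "local:doc-1"),
    Claim("Widget X", "release year", "2022", "live:source-a"),
    Claim("Widget X", "vendor", "Acme", "local:doc-2"),
    Claim("Widget X", "vendor", "acme ", "live:source-b"),
]
print(find_conflicts(claims))
# {('widget x', 'release year'): {'2021': ['local:doc-1'], '2022': ['live:source-a']}}
```

The vendor claims agree after normalization, so only the release-year disagreement is surfaced, together with the sources behind each value.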

A model selection layer routes by context: smaller models for planning, larger models for synthesis, and a high-context fallback (gpt-4.1) when the working set crosses ~100k tokens. Fallback chains catch model errors without dropping the run.
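The routing rule and the fallback chain can be sketched in a few lines. This is a minimal illustration, assuming the caller already knows the working-set size in tokens. `route_model`, `call_with_fallback`, and `flaky_call` are hypothetical names, the dict layout does not reflect `model_config.yaml`'s real schema, and using gpt-4.1 as the error fallback for o1 is an assumption about chain order.

```python
from typing import Callable, Sequence, Tuple

# Per-agent defaults from the text; this dict layout is hypothetical.
AGENT_MODELS = {"seed": "o1-mini", "sourcing": "sonar-pro", "research": "o1"}
HIGH_CONTEXT_MODEL = "gpt-4.1"
HIGH_CONTEXT_THRESHOLD = 100_000  # the "~100k tokens" cutoff, treated here as a hard limit


def route_model(agent: str, context_tokens: int) -> str:
    """Return the agent's default model, or the high-context model for very large working sets."""
    if context_tokens > HIGH_CONTEXT_THRESHOLD:
        return HIGH_CONTEXT_MODEL
    return AGENT_MODELS[agent]


def call_with_fallback(
    prompt: str,
    chain: Sequence[str],
    call: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Try each model in order and return (model, output) from the first call that succeeds."""
    errors = []
    for model in chain:
        try:
            return model, call(model, prompt)
        except Exception as exc:  # a production router would catch provider-specific errors
            errors.append(f"{model}: {exc}")
    raise RuntimeError("every model in the fallback chain failed: " + "; ".join(errors))


def flaky_call(model: str, prompt: str) -> str:
    # Stand-in for a real API client: the primary model times out, the fallback answers.
    if model == "o1":
        raise TimeoutError("simulated timeout")
    return f"[{model}] answer to: {prompt}"


primary = route_model("research", context_tokens=12_000)
model, text = call_with_fallback("summarize sources", [primary, HIGH_CONTEXT_MODEL], flaky_call)
print(primary, "->", model)  # o1 -> gpt-4.1
```

Because the client is injected as a plain callable, the same fallback loop can be tested without network access. This sketch applies the token cutoff to every agent. A real router might exempt the Sourcing Agent, since swapping sonar-pro for gpt-4.1 would drop live web search.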

What’s Implemented

  • Three-agent orchestration: Seed → Sourcing → Research, with structured handoffs and per-agent prompt templates
  • Local vector store: Chroma DB with persistent memory across sessions; python app/cli.py add ingests text, titles, and URLs
  • Live research: Perplexity Sonar integration for real-time web sources, mixed with local citations and conflict detection
  • Vectara factual consistency scoring (FCS): runs on synthesized responses; combined with model confidence and citation quality into a multi-factor confidence score
  • Performance monitoring: real-time metrics on model usage and success rates, with per-model comparison and export
  • CLI surface: ask, add, stats, report, models, performance, select-model
  • Test suite: end-to-end basic pipeline test, smoke tests, and model-selection tests under tests/
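The multi-factor confidence score combines the Vectara factual consistency score with model confidence and citation quality. The following is a minimal sketch using a weighted average, assuming all three signals are already normalized to [0, 1]. The weights and the `ConfidenceInputs` / `combined_confidence` names are hypothetical, because the project's actual weighting is not described here.

```python
from dataclasses import dataclass

# Hypothetical weights; the real pipeline's weighting is not documented here.
WEIGHTS = {"fcs": 0.5, "model_confidence": 0.3, "citation_quality": 0.2}


@dataclass
class ConfidenceInputs:
    fcs: float               # Vectara factual consistency score, assumed in [0, 1]
    model_confidence: float  # model-reported or calibrated confidence, assumed in [0, 1]
    citation_quality: float  # e.g. share of claims backed by a retrievable source, in [0, 1]


def combined_confidence(inputs: ConfidenceInputs, weights=WEIGHTS) -> float:
    """Weighted average of the three signals, clamped to [0, 1]."""
    total = sum(weights.values())
    score = sum(weights[name] * getattr(inputs, name) for name in weights) / total
    return max(0.0, min(1.0, score))


example = ConfidenceInputs(fcs=0.9, model_confidence=0.7, citation_quality=0.5)
score = combined_confidence(example)
print(round(score, 2))  # 0.76
```

Dividing by the sum of the weights keeps the result in range even if the weights are later tuned without summing to 1.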

Architecture

Query → Seed Agent (o1-mini)         # plan search
      → Sourcing Agent (sonar-pro)   # fetch + evaluate live sources
      → Research Agent (o1)          # synthesize against Chroma + live
      → Vectara FCS                  # score factual consistency
      → Response + multi-factor confidence
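The flow above is a straight chain of stages, each bound to one model. The following is a minimal sketch of that chaining. It uses hypothetical stub functions in place of the real agents (no API calls are made) and keeps a per-stage record of model, outcome, and latency, which is the kind of data the `stats` and `performance` commands could aggregate. `Stage`, `RunLog`, `run_pipeline`, the stub payload shapes, and the `vectara-fcs` label are all illustrative assumptions.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Stage:
    name: str
    model: str
    run: Callable[[Any], Any]  # takes the previous stage's output, returns this stage's output


@dataclass
class RunLog:
    entries: List[dict] = field(default_factory=list)

    def record(self, stage: str, model: str, ok: bool, seconds: float) -> None:
        self.entries.append({"stage": stage, "model": model, "ok": ok, "seconds": seconds})


def run_pipeline(query: str, stages: List[Stage], log: RunLog) -> Any:
    """Feed the query through each stage in order, logging model, outcome, and latency."""
    payload: Any = query
    for stage in stages:
        start = time.perf_counter()
        try:
            payload = stage.run(payload)
        except Exception:
            log.record(stage.name, stage.model, False, time.perf_counter() - start)
            raise
        log.record(stage.name, stage.model, True, time.perf_counter() - start)
    return payload


# Stub stages: each only reshapes the payload so the structured handoffs are visible.
def plan_search(query):
    return {"query": query, "subqueries": [query + " (background)", query + " (recent work)"]}


def fetch_sources(plan):
    return {**plan, "sources": [f"live:{i}" for i, _ in enumerate(plan["subqueries"])]}


def synthesize(ctx):
    return {**ctx, "answer": f"synthesis of {len(ctx['sources'])} sources"}


def score_fcs(result):
    return {**result, "fcs": 0.9}  # placeholder score, not a real Vectara call


stages = [
    Stage("seed", "o1-mini", plan_search),
    Stage("sourcing", "sonar-pro", fetch_sources),
    Stage("research", "o1", synthesize),
    Stage("fcs", "vectara-fcs", score_fcs),
]

log = RunLog()
result = run_pipeline("vector memory for agents", stages, log)
print(result["answer"])                   # synthesis of 2 sources
print([e["model"] for e in log.entries])  # ['o1-mini', 'sonar-pro', 'o1', 'vectara-fcs']
```

A failing stage is still logged before the exception propagates, so success rates per model stay accurate even for aborted runs.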

Configuration lives in model_config.yaml. The default chain pairs OpenAI’s reasoning models with sonar-pro for research and falls back to gpt-4.1 for very large contexts. Embeddings use text-embedding-3-large; reranking uses Cohere’s rerank-english-v3.0. Alternative LLMs such as Claude 3.5 Sonnet can be plugged in for a different perspective.

What This Demonstrates

  • Multi-agent orchestration with per-agent model selection grounded in actual cost/quality tradeoffs, not “one model for everything”
  • Hybrid RAG with both persistent local vectors and live web research, with explicit conflict surfacing rather than silent merging
  • Quality engineering for LLM outputs: factual consistency scoring, multi-factor confidence, and performance instrumentation built into the pipeline rather than bolted on after