1.5 RAG Over Agent History

The problem

Classic RAG retrieves from a static document corpus — a wiki, a PDF set, a transcript dump. The retriever's job is to find passages that look like the user's question.

An agent that has been running for months has a different problem. Its "corpus" is the chain of decisions, mistakes, lessons, and policies it has appended over time — thousands of thoughts in the agent's own voice, full of corrections and supersedes. The retriever's job is no longer "find passages that look like the question." It is "find the agent's own position on this question, including every prior turn of the argument."

The user asks "what did we decide about rate limiting?" and the agent must answer not just with the latest Decision, but with the Constraint and Mistake thoughts, and the Supersedes links between decisions, that produced it. That is what this chapter builds.

Why it's hard

Short query, long answer. "What did we decide about X?" is three words, but the answer may span five thoughts across two weeks.
The latest position is rarely the only position. A Decision from March may have been Superseded in May, and the superseder is the one that matters.
Time-sensitive. "What was true on April 12th?" requires a temporal filter that respects valid_at / invalid_at.
Off-chain. A teammate's Decision on a shared topic may live in a branch chain that forked from yours.
Pure lexical misses the reasoning chain (a decision about rate limiting may not contain the words "rate limiting"). Pure vector misses corrections. You need both, plus the graph.

The retrieval pipeline

When the user asks an open-ended question and you call chain.query_ranked, MentisDB runs a six-stage pipeline:

Filter — a deterministic ThoughtQuery narrows the universe by thought_types, concepts_any, tags_any, agent_ids, since, until, and as_of. Cheap; runs first.
Lexical — BM25 over the surviving thoughts, with automatic thesaurus expansion (since 0.9.9).
Vector — cosine similarity from any registered embedding sidecars.
Graph expansion — from the top lexical seeds, traverse refs and ThoughtRelation edges up to max_depth hops. This is what surfaces the reasoning chain behind a decision.
RRF fusion (only if enable_reranking) — produce lexical-only and vector-only rank lists, then merge via 1/(60 + rank_lex) + 1/(60 + rank_vec). Graph, importance, recency, and session cohesion are added back as small tie-breakers.
Rerank — apply min_score cutoff, sort, truncate to limit, return.

Heads up: graph expansion is seeded from the lexical matches, not from arbitrary indices. If with_text is omitted, the graph pass is silently skipped — always pass a text query when you want graph expansion to run.
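
To make the seeding rule concrete, here is a minimal sketch of a bounded breadth-first expansion from lexical seeds. It is not MentisDB's implementation: the expand_from_seeds function, the adjacency map, and the use of u64 ids are assumptions made for illustration, and only max_depth and max_visited mirror names used in this chapter. The sketch follows outgoing edges only.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical helper (not part of MentisDB): breadth-first expansion
/// from lexical seed thoughts.
/// `edges` maps a thought id to the ids it points at (refs or relations).
/// Returns (id, hops_from_seed) pairs in visit order.
pub fn expand_from_seeds(
    edges: &HashMap<u64, Vec<u64>>,
    seeds: &[u64],
    max_depth: usize,
    max_visited: usize,
) -> Vec<(u64, usize)> {
    let mut visited: HashSet<u64> = HashSet::new();
    let mut queue: VecDeque<(u64, usize)> = VecDeque::new();
    let mut out = Vec::new();

    // With no seeds (for example, no text query) there is nothing to expand.
    for &seed in seeds {
        if visited.len() >= max_visited {
            break;
        }
        if visited.insert(seed) {
            queue.push_back((seed, 0));
        }
    }

    while let Some((id, hops)) = queue.pop_front() {
        out.push((id, hops));
        if hops == max_depth {
            continue; // depth budget spent on this path
        }
        if let Some(neighbors) = edges.get(&id) {
            for &next in neighbors {
                if visited.len() >= max_visited {
                    break; // global visit budget spent
                }
                if visited.insert(next) {
                    queue.push_back((next, hops + 1));
                }
            }
        }
    }
    out
}
```

An empty seed list returns an empty result, which is the same shape as the behavior described above when no text query is supplied. The max_visited cap bounds total work regardless of how dense the graph is.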

Implementation: a complete ask → retrieve → answer flow

1. Record a few decisions about rate limiting

use mentisdb::{MentisDb, ThoughtInput, ThoughtType, ThoughtRelationKind,
               RankedSearchQuery, RankedSearchGraph};

fn seed_decisions(chain: &mut MentisDb, agent_id: &str) -> Result<()> {
    let d1 = chain.append_thought(agent_id,
        ThoughtInput::new(ThoughtType::Decision,
            "Rate limit public API endpoints to 100 req/min per user. \
             Rationale: matches tier-1 traffic; protects against \
             unbounded retry loops in client SDKs.")
        .with_concepts(["api", "rate-limiting", "policy"])
        .with_tags(["policy:api", "scope:user"])
        .with_importance(0.9)
    )?;
    chain.append_thought(agent_id,
        ThoughtInput::new(ThoughtType::Constraint,
            "Tier-1 customers average 60 req/min per user; peaks hit 200 \
             req/min during business hours.")
        .with_concepts(["api", "rate-limiting"]).with_importance(0.7)
        .with_refs(vec![d1])
    )?;
    let m1 = chain.append_thought(agent_id,
        ThoughtInput::new(ThoughtType::Mistake,
            "2026-04-03: 4xx rate-limit errors for top 1% of customers. \
             100 req/min was too tight; we briefly raised the limit to 500 \
             in prod without a Decision.")
        .with_concepts(["api", "rate-limiting", "incident"])
        .with_importance(0.85)
    )?;
    // The newer Decision references both the Decision it replaces and the
    // Mistake that triggered the change.
    chain.append_thought(agent_id,
        ThoughtInput::new(ThoughtType::Decision,
            "Rate limit public API to 500 req/min per user by default, \
             with per-customer override up to 5000. Supersedes the 100 \
             req/min decision from 2026-03-12.")
        .with_concepts(["api", "rate-limiting", "policy"])
        .with_importance(0.95)
        .with_refs(vec![d1, m1])
    )?;
    // Note: there is no add_relation method. To record the Supersedes edge
    // explicitly, attach it to the ThoughtInput being appended:
    // .with_relations(vec![ThoughtRelation::new(kind, target_uuid)]).
    Ok(())
}

2. Ask the question

fn answer_about_rate_limiting(chain: &MentisDb) -> Result<String> {
    let results = chain.query_ranked(
        &RankedSearchQuery::new()
            .with_text("what did we decide about rate limiting")
            .with_filter(mentisdb::ThoughtQuery::new()
                .with_types([ThoughtType::Decision, ThoughtType::Correction,
                             ThoughtType::Constraint, ThoughtType::LessonLearned])
                .with_concepts_any(["rate-limiting"]))
            .with_graph(RankedSearchGraph::new()
                .with_max_depth(2).with_max_visited(50))
            .with_reranking(50)
            .with_limit(10)
            .with_min_score(0.1)
    );

    Ok(results.hits.iter().map(|h| format!(
        "- [{}] (score {:.3}) {}",
        h.thought.thought_type, h.score,
        h.thought.content.lines().next().unwrap_or(""),
    )).collect::<Vec<_>>().join("\n"))
}

What comes back, in roughly this order: the latest Decision at the top, then the superseded older Decision, then the Mistake and Constraint that the graph pass pulled in by following refs around the top hits. Neither of those last two states the policy itself, and the Constraint never mentions "rate limiting" in its text. That is the point of the graph pass: it surfaces the reasoning chain, not just the keyword match.

Temporal queries: "what was true at time T?"

Decisions get superseded. To answer "what did we believe on April 12th?", pass as_of:

use chrono::{TimeZone, Utc};

let april_12 = Utc.with_ymd_and_hms(2026, 4, 12, 0, 0, 0).unwrap();
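let results = chain.query_ranked(
    &RankedSearchQuery::new()
        .with_text("rate limiting policy")
        // as_of and the type restriction belong to the deterministic filter stage.
        .with_filter(mentisdb::ThoughtQuery::new()
            .with_types([ThoughtType::Decision])
            .with_as_of(april_12))
        .with_limit(5)
);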
// Returns the 100 req/min Decision but NOT the 500 req/min
// one — the superseder was appended on April 14.

as_of excludes thoughts appended after the timestamp, relations whose valid_at / invalid_at window doesn't cover it, and targets of any Supersedes / Corrects / Invalidates relation that was already in force.

Gotcha: as_of filters the view at that point in time; it doesn't edit history. A Supersedes appended after April 12th still exists; it just doesn't apply to an April 12th query. That is the right behavior for audit and replay. If you want the latest answer, omit as_of.
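
The visibility rule can be sketched in a few lines. This is a simplified model, not MentisDB's code: the Thought struct, the visible_as_of function, and the YYYYMMDD integer timestamps are all assumptions for illustration, and the sketch ignores valid_at / invalid_at windows on relations.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified thought record (not MentisDB's type).
#[derive(Debug, Clone, PartialEq)]
pub struct Thought {
    pub id: u32,
    /// Append date as a YYYYMMDD integer; any order-preserving stamp works.
    pub appended_at: u32,
    /// Id of the thought this one supersedes, if any.
    pub supersedes: Option<u32>,
}

/// Ids visible at `as_of`: appended on or before it, and not superseded
/// by a thought that was itself appended on or before it.
pub fn visible_as_of(thoughts: &[Thought], as_of: u32) -> Vec<u32> {
    // Step 1: drop everything appended after the cutoff.
    let in_force: Vec<&Thought> = thoughts
        .iter()
        .filter(|t| t.appended_at <= as_of)
        .collect();

    // Step 2: only superseders that already exist at the cutoff count.
    let superseded: HashSet<u32> = in_force.iter().filter_map(|t| t.supersedes).collect();

    // Step 3: hide the targets of those superseders.
    in_force
        .iter()
        .map(|t| t.id)
        .filter(|id| !superseded.contains(id))
        .collect()
}
```

With the 100 req/min Decision appended on 2026-03-12 and its superseder on 2026-04-14, a cutoff of 20260412 yields only the older Decision, and any cutoff on or after 20260414 yields only the newer one. Nothing is deleted in either case; the input slice is never modified.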

Cross-chain queries: "did any team decide this?"

Two teams may have made conflicting decisions in branch chains forked from a shared parent. To ask "did any team in our org decide this?", use the federated search path:

use std::collections::BTreeMap;
use mentisdb::federated::FederatedSearchRequest;

let policy_q = || RankedSearchQuery::new()
    .with_text("rate limiting policy")
    .with_filter(mentisdb::ThoughtQuery::new()
        .with_types([ThoughtType::Decision]))
    .with_limit(5);

let org_query = FederatedSearchRequest::new()
    .with_self_query(policy_q())
    .with_chain_queries(BTreeMap::from([
        ("team-payments".to_string(), policy_q()),
        ("team-search".to_string(),   policy_q()),
    ]));

let federated = chain.federated_search(&org_query)?;
for (chain_key, ranked) in federated.per_chain.iter() {
    println!("=== {} ===", chain_key);
    for hit in &ranked.hits {
        println!("  [{}] {}", hit.thought.thought_type,
            hit.thought.content.lines().next().unwrap_or(""));
    }
}

To walk a branch line back to its origin, use the BranchesFrom relation: chain.ancestor_chain_keys(&chain_key, max_depth) returns every parent chain key reachable via BranchesFrom edges.
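
As a rough mental model of that walk, the sketch below follows child-to-parent links until it hits the root, a cycle, or the depth limit. It is an illustration only: the ancestor_keys function and the single-parent HashMap are assumptions, and the real ancestor_chain_keys may represent branches differently.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical model (not MentisDB's API): `branches_from` maps a chain key
/// to the parent chain it forked from. Returns ancestors nearest-first.
pub fn ancestor_keys(
    branches_from: &HashMap<String, String>,
    start: &str,
    max_depth: usize,
) -> Vec<String> {
    let mut out = Vec::new();
    let mut seen: HashSet<&str> = HashSet::new();
    seen.insert(start);
    let mut current = start;

    while out.len() < max_depth {
        match branches_from.get(current) {
            // `insert` returns false for a key we already visited, which
            // stops the walk instead of looping forever on a cycle.
            Some(parent) if seen.insert(parent.as_str()) => {
                out.push(parent.clone());
                current = parent.as_str();
            }
            _ => break, // reached the root or a cycle
        }
    }
    out
}
```

The returned keys could then be used as the chain keys of a federated request, so that a query covers the branch and everything it forked from.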

Multi-hop: following a chain of decisions

Often the answer isn't in any single thought; it's in a sequence — a Decision based on a Constraint that came from a LessonLearned triggered by a Mistake months earlier. Raise graph depth and read graph_path to walk it:

let results = chain.query_ranked(
    &RankedSearchQuery::new()
        .with_text("current API rate limit policy")
        .with_graph(RankedSearchGraph::new()
            .with_max_depth(3).with_max_visited(100)
            .with_mode(mentisdb::search::GraphExpansionMode::Bidirectional))
        .with_limit(20)
);

let mut timeline: Vec<_> = results.hits.iter().collect();
timeline.sort_by_key(|h| h.thought.index);
for hit in timeline {
    let hops = hit.graph_path.as_ref()
        .map(|p| p.hops_from_seed).unwrap_or(0);
    println!("[{}] (hops={}) {}", hit.thought.timestamp.format("%Y-%m-%d"),
             hops, hit.thought.content.lines().next().unwrap_or(""));
}

How the hybrid score is built

By default (no RRF), a hit's score is a weighted blend of lexical similarity, vector cosine (if a sidecar is present), graph path bonus, importance, confidence, recency, and session cohesion. Weights live in src/search/scoring.rs; you don't usually need to touch them.
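
The shape of such a blend can be sketched as a plain weighted sum. Everything here is an assumption for illustration: the Signals and Weights structs, the blended_score function, and above all the numbers in EXAMPLE_WEIGHTS, which are placeholders and not the values in src/search/scoring.rs.

```rust
/// The seven signals named in the text, each assumed normalized to 0..=1.
#[derive(Debug, Clone, Copy)]
pub struct Signals {
    pub lexical: f64,
    /// None when no embedding sidecar is registered.
    pub vector: Option<f64>,
    pub graph: f64,
    pub importance: f64,
    pub confidence: f64,
    pub recency: f64,
    pub session: f64,
}

/// One weight per signal.
#[derive(Debug, Clone, Copy)]
pub struct Weights {
    pub lexical: f64,
    pub vector: f64,
    pub graph: f64,
    pub importance: f64,
    pub confidence: f64,
    pub recency: f64,
    pub session: f64,
}

/// Placeholder weights, invented for this sketch. They sum to 1.0.
pub const EXAMPLE_WEIGHTS: Weights = Weights {
    lexical: 0.40,
    vector: 0.25,
    graph: 0.10,
    importance: 0.10,
    confidence: 0.05,
    recency: 0.05,
    session: 0.05,
};

/// Weighted sum of the signals; a missing vector signal contributes zero.
pub fn blended_score(s: &Signals, w: &Weights) -> f64 {
    w.lexical * s.lexical
        + w.vector * s.vector.unwrap_or(0.0)
        + w.graph * s.graph
        + w.importance * s.importance
        + w.confidence * s.confidence
        + w.recency * s.recency
        + w.session * s.session
}
```

One consequence of this shape: without a sidecar the vector term drops out, so the maximum attainable score shrinks by the vector weight. That matters when choosing a min_score.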

With enable_reranking on, the pipeline runs in two stages. First it scores candidates with the default blend and keeps the top rerank_k. Then it produces fresh lexical-only and vector-only rank lists over that set and combines them with RRF: 1/(60 + rank_lex) + 1/(60 + rank_vec). The other signals come back as small tie-breakers. When lexical and vector agree, RRF amplifies the agreement. When they disagree, a thought that ranks moderately on both can outrank one that ranks first on one and fiftieth on the other. That is usually right for agent history, where a routine re-mention can lexically outrank the real policy and RRF pulls the two back into balance.
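
The fusion step itself is small enough to sketch directly from the formula. The rrf_fuse function below is an illustration, not MentisDB's code; it assumes 1-based ranks, u64 ids, and that an id missing from one list simply contributes nothing for that list (the text does not say how gaps are handled). Tie-breaker signals are left out.

```rust
use std::collections::HashMap;

/// The constant 60 from the formula in the text.
const RRF_K: f64 = 60.0;

/// Fuse two rank lists (best first) with Reciprocal Rank Fusion.
/// Each list contributes 1 / (RRF_K + rank) for every id it contains.
pub fn rrf_fuse(lexical: &[u64], vector: &[u64]) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for list in [lexical, vector] {
        for (i, &id) in list.iter().enumerate() {
            let rank = (i + 1) as f64; // ranks are 1-based
            *scores.entry(id).or_insert(0.0) += 1.0 / (RRF_K + rank);
        }
    }
    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by id for deterministic output.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}
```

Working the example from the paragraph above: a thought ranked 1st lexically and 50th by vector scores 1/61 + 1/110 ≈ 0.0255, while a thought ranked 10th on both scores 2/70 ≈ 0.0286 and wins. Note also that the raw two-list sum can never exceed 2/61 ≈ 0.033. If a min_score cutoff were applied to the raw fused value, a threshold tuned for the default blend would discard every hit. Whether MentisDB rescales the fused score is not stated here, so inspect the returned scores before carrying a threshold over.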

Production notes

When RRF helps

Vague questions where lexical and vector pull in different directions ("what went wrong with the auth refactor?" — lexical wants Mistake, vector wants the related LessonLearned).
Chains with lots of routinely rephrased thoughts: RRF demotes them because they tend to rank high on one signal and low on the other, while the real policy ranks well on both.
Small candidate sets (< ~200): RRF is cheap at that scale.

When RRF hurts

Highly specific queries (an error code, a UUID, a tag): vector side adds noise.
Huge chains: RRF only reranks the top rerank_k of pre-RRF candidates; with rerank_k at 50, an answer at rank 200 won't be seen. Pre-filter harder or raise rerank_k.
No vector sidecar: RRF degenerates to lexical ranking and you've paid for nothing — just call query_ranked without with_reranking.

The `min_score` threshold

Depends on the embedding model and domain. Short, technical queries: 0.2 - 0.3. Long, vague queries: drop to 0.05 or omit. Always inspect the top hits with the threshold disabled before setting one — a threshold that hides the top hit is worse than no threshold.
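
One way to do that inspection is to run the query once with no threshold, collect the scores, and see what each candidate cutoff would keep. The two helpers below are hypothetical utilities written for this sketch; they operate on plain f64 scores and do not call MentisDB.

```rust
/// For each candidate threshold, count how many hits would survive.
/// `scores` are hit scores from a query run with the threshold disabled.
pub fn survivors_per_threshold(scores: &[f64], candidates: &[f64]) -> Vec<(f64, usize)> {
    candidates
        .iter()
        .map(|&t| (t, scores.iter().filter(|&&s| s >= t).count()))
        .collect()
}

/// Largest candidate threshold that still keeps the top hit, if any.
/// Returns None when there are no scores or every candidate hides the top hit.
pub fn safe_min_score(scores: &[f64], candidates: &[f64]) -> Option<f64> {
    let top = scores.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    candidates
        .iter()
        .cloned()
        .filter(|&t| t <= top)
        .fold(None, |best, t| match best {
            Some(b) if b >= t => Some(b),
            _ => Some(t),
        })
}
```

For scores of 0.27, 0.18, 0.09, and 0.04, a cutoff of 0.3 keeps nothing, 0.2 keeps only the top hit, and 0.05 keeps three. The second helper only guards against hiding the top hit; it does not decide how much of the tail is worth keeping.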

The `enable_reranking` flag

Master switch. Default false. On for end-user retrieval; off for internal pipelines (consolidation, audit walks) where you want the cheapest possible score.

Common questions and how to phrase them

"What did we decide about X?"

RankedSearchQuery::new()
    .with_text(format!("what did we decide about {}", topic))
    .with_filter(ThoughtQuery::new()
        .with_types(vec![ThoughtType::Decision])
        .with_concepts_any([topic]))
    .with_graph(RankedSearchGraph::new().with_max_depth(2))
    .with_reranking(50)
    .with_limit(10)

"What did we learn about X?" / "What went wrong when we did X?" / "What's our policy on X?"

// Lessons / insights about a topic:
RankedSearchQuery::new()
    .with_text(format!("lessons about {}", topic))
    .with_filter(ThoughtQuery::new()
        .with_types([ThoughtType::LessonLearned, ThoughtType::Insight])
        .with_min_importance(0.6))
    .with_limit(15)

// Mistakes + lessons scoped to a task tag:
RankedSearchQuery::new()
    .with_text(format!("mistakes failures errors during {}", task))
    .with_filter(ThoughtQuery::new()
        .with_types([ThoughtType::Mistake, ThoughtType::LessonLearned])
        .with_tags_any([format!("task:{}", task)]))
    .with_graph(RankedSearchGraph::new().with_max_depth(2))

// Current policy (Decision + Constraint + Correction, scoped by policy tag,
// matching the "policy:api" tag convention used when seeding):
RankedSearchQuery::new()
    .with_text(format!("policy for {}", topic))
    .with_filter(ThoughtQuery::new()
        .with_types([ThoughtType::Decision, ThoughtType::Constraint,
                     ThoughtType::Correction])
        .with_tags_any([format!("policy:{}", topic)]))
    .with_reranking(50)
    .with_limit(5)

Testing this pattern

A minimal test that the ask → retrieve → answer flow surfaces the supersession chain:

#[test]
fn rate_limit_query_returns_supersede_chain() {
    let mut chain = test_chain();
    chain.upsert_agent("executor", None, None, None, None).unwrap();
    seed_decisions(&mut chain, "executor").unwrap();

    let results = chain.query_ranked(
        &RankedSearchQuery::new()
            .with_text("rate limiting")
            .with_graph(RankedSearchGraph::new().with_max_depth(2))
            .with_limit(20)
    );

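    // Check the latest Decision by content: the superseded Decision has the
    // same thought type, so a type-only check would pass without it.
    assert!(results.hits.iter().any(|h|
        h.thought.thought_type == ThoughtType::Decision
            && h.thought.content.contains("500 req/min")),
        "should find the latest decision");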
    assert!(results.hits.iter().any(|h|
        h.thought.content.contains("100 req/min")),
        "should also surface the superseded older decision");
    assert!(results.hits.iter().any(|h|
        h.thought.thought_type == ThoughtType::Mistake),
        "graph expansion should pull in the triggering mistake");
}

Cross-references

0.4 Search-First Discipline — the same RankedSearchQuery powers the search-before-append loop.
1.1 Episodic Task Memory — produces the graph edges this chapter's graph pass traverses.
2.4 Federated Team Memory — the cross-chain search and BranchesFrom machinery in production.

What's next

You can now answer "what did we decide about X?" with the full reasoning chain, at any point in time, across chain boundaries. The next pattern, Preference Learning, closes the loop: the agent not only retrieves prior decisions, it updates them as the user expresses new preferences — and the same retrieval pipeline is what surfaces the conflict.