1.5 RAG Over Agent History
The problem
Classic RAG retrieves from a static document corpus — a wiki, a PDF set, a transcript dump. The retriever's job is to find passages that look like the user's question.
An agent that has been running for months has a different problem. Its "corpus" is the chain of decisions, mistakes, lessons, and policies it has appended over time — thousands of thoughts in the agent's own voice, full of corrections and supersedes. The retriever's job is no longer "find passages that look like the question." It is "find the agent's own position on this question, including every prior turn of the argument."
The user asks "what did we decide about rate limiting?" and the agent must answer not just with the latest Decision, but with the chain that produced it: the Constraint and Mistake thoughts behind it and the Supersedes relation that retired the older Decision. That is what this chapter builds.
Why it's hard
- Short query, long answer. "What did we decide about X?" is three words, but the answer may span five thoughts across two weeks.
- The latest position is rarely the only position. A Decision from March may have been Superseded in May, and the superseder is the one that matters.
- Time-sensitive. "What was true on April 12th?" requires a temporal filter that respects valid_at / invalid_at.
- Off-chain. A teammate's Decision on a shared topic may live in a branch chain that forked from yours.
- Pure lexical misses the reasoning chain (a decision about rate limiting may not contain the words "rate limiting"). Pure vector misses corrections: similarity alone can't tell which of two near-identical thoughts was superseded. You need both, plus the graph.
The retrieval pipeline
When the user asks an open-ended question and you call chain.query_ranked, MentisDB runs a six-stage pipeline:
- Filter: a deterministic ThoughtQuery narrows the universe by thought_types, concepts_any, tags_any, agent_ids, since, until, and as_of. Cheap; runs first.
- Lexical: BM25 over the surviving thoughts, with automatic thesaurus expansion (since 0.9.9).
- Vector: cosine similarity from any registered embedding sidecars.
- Graph expansion: from the top lexical seeds, traverse refs and ThoughtRelation edges up to max_depth hops. This is what surfaces the reasoning chain behind a decision.
- RRF fusion (only if enable_reranking): produce lexical-only and vector-only rank lists, then merge via 1/(60 + rank_lex) + 1/(60 + rank_vec). Graph, importance, recency, and session cohesion are added back as small tie-breakers.
- Rerank: apply the min_score cutoff, sort, truncate to limit, and return.
Note: if with_text is omitted, the graph pass is silently skipped, because there are no lexical seeds to expand from. Always pass a text query when you want graph expansion to run.
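The Filter stage scores nothing, so it can be written as a plain predicate over thought metadata. The sketch below assumes a simplified in-memory record. Thought, Filter, passes, and the rule that an empty list means "no constraint" are hypothetical stand-ins, not MentisDB's ThoughtQuery. as_of is left out because it also has to consult relation validity windows, not just the thought's own timestamp.

```rust
// Hypothetical stand-ins for illustration; not MentisDB's types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ThoughtType {
    Decision,
    Constraint,
    Mistake,
}

struct Thought {
    thought_type: ThoughtType,
    concepts: Vec<String>,
    tags: Vec<String>,
    agent_id: String,
    appended_at: u64, // e.g. seconds since the Unix epoch
}

// Assumption: an empty list / None means "don't constrain on this field".
#[derive(Default)]
struct Filter {
    thought_types: Vec<ThoughtType>,
    concepts_any: Vec<String>,
    tags_any: Vec<String>,
    agent_ids: Vec<String>,
    since: Option<u64>,
    until: Option<u64>,
}

// True if `wanted` is empty or shares at least one entry with `have`.
fn any_overlap(wanted: &[String], have: &[String]) -> bool {
    wanted.is_empty() || wanted.iter().any(|w| have.contains(w))
}

// The whole stage: every configured field must match.
fn passes(f: &Filter, t: &Thought) -> bool {
    (f.thought_types.is_empty() || f.thought_types.contains(&t.thought_type))
        && any_overlap(&f.concepts_any, &t.concepts)
        && any_overlap(&f.tags_any, &t.tags)
        && (f.agent_ids.is_empty() || f.agent_ids.contains(&t.agent_id))
        && f.since.map_or(true, |s| t.appended_at >= s)
        && f.until.map_or(true, |u| t.appended_at <= u)
}

// Convenience constructor for the examples (hypothetical).
fn thought(thought_type: ThoughtType, concepts: &[&str], appended_at: u64) -> Thought {
    Thought {
        thought_type,
        concepts: concepts.iter().map(|c| c.to_string()).collect(),
        tags: Vec::new(),
        agent_id: "executor".to_string(),
        appended_at,
    }
}

fn main() {
    let universe = vec![
        thought(ThoughtType::Decision, &["api", "rate-limiting"], 100),
        thought(ThoughtType::Constraint, &["api", "rate-limiting"], 200),
        thought(ThoughtType::Mistake, &["incident"], 300),
    ];
    let f = Filter {
        thought_types: vec![ThoughtType::Decision, ThoughtType::Constraint],
        concepts_any: vec!["rate-limiting".to_string()],
        ..Default::default()
    };
    let kept: Vec<&Thought> = universe.iter().filter(|t| passes(&f, t)).collect();
    println!("{} of {} thoughts survive the filter", kept.len(), universe.len());
}
```

Here the Mistake is dropped twice over: its type isn't requested and it lacks the rate-limiting concept. That is also why the Filter stage can quietly remove thoughts the graph pass was supposed to surface.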
Implementation: a complete ask → retrieve → answer flow
1. Record a few decisions about rate limiting
use mentisdb::{MentisDb, ThoughtInput, ThoughtType, ThoughtRelationKind,
               RankedSearchQuery, RankedSearchGraph};

// Appends four thoughts that tell one story: a Decision, the Constraint
// behind it, the Mistake that exposed it, and the Decision that replaced it.
fn seed_decisions(chain: &mut MentisDb, agent_id: &str) -> Result<()> {
    // The original policy.
    let d1 = chain.append_thought(agent_id,
        ThoughtInput::new(ThoughtType::Decision,
            "Rate limit public API endpoints to 100 req/min per user. \
             Rationale: matches tier-1 traffic; protects against \
             unbounded retry loops in client SDKs.")
            .with_concepts(["api", "rate-limiting", "policy"])
            .with_tags(["policy:api", "scope:user"])
            .with_importance(0.9)
    )?;

    // The traffic data behind it; its refs point back at d1.
    chain.append_thought(agent_id,
        ThoughtInput::new(ThoughtType::Constraint,
            "Tier-1 customers average 60 req/min per user; peaks hit 200 \
             req/min during business hours.")
            .with_concepts(["api", "rate-limiting"])
            .with_importance(0.7)
            .with_refs(vec![d1])
    )?;

    // The incident that showed 100 req/min was too tight.
    let m1 = chain.append_thought(agent_id,
        ThoughtInput::new(ThoughtType::Mistake,
            "2026-04-03: 4xx rate-limit errors for top 1% of customers. \
             100 req/min was too tight; we briefly raised the limit to 500 \
             in prod without a Decision.")
            .with_concepts(["api", "rate-limiting", "incident"])
            .with_importance(0.85)
    )?;

    // The replacement policy. Its refs to d1 and m1 are what the graph
    // pass walks back along.
    //
    // There is no add_relation call. To also record an explicit Supersedes
    // relation (the kind as_of honors), attach it to this input before
    // appending:
    //     .with_relations(vec![ThoughtRelation::new(
    //         ThoughtRelationKind::Supersedes, /* d1's UUID */)])
    chain.append_thought(agent_id,
        ThoughtInput::new(ThoughtType::Decision,
            "Rate limit public API to 500 req/min per user by default, \
             with per-customer override up to 5000. Supersedes the 100 \
             req/min decision from 2026-03-12.")
            .with_concepts(["api", "rate-limiting", "policy"])
            .with_importance(0.95)
            .with_refs(vec![d1, m1])
    )?;

    Ok(())
}
2. Ask the question
// Asks the question and renders one line per hit: type, score, first line.
fn answer_about_rate_limiting(chain: &MentisDb) -> Result<String> {
    let results = chain.query_ranked(
        &RankedSearchQuery::new()
            .with_text("what did we decide about rate limiting")
            // Mistake is listed so the Filter stage doesn't drop the
            // incident that the graph pass is meant to surface.
            .with_filter(mentisdb::ThoughtQuery::new()
                .with_types([ThoughtType::Decision, ThoughtType::Correction,
                             ThoughtType::Constraint, ThoughtType::LessonLearned,
                             ThoughtType::Mistake])
                .with_concepts_any(["rate-limiting"]))
            .with_graph(RankedSearchGraph::new()
                .with_max_depth(2)
                .with_max_visited(50))
            .with_reranking(50)
            .with_limit(10)
            .with_min_score(0.1)
    );
    Ok(results.hits.iter().map(|h| format!(
        "- [{}] (score {:.3}) {}",
        h.thought.thought_type, h.score,
        h.thought.content.lines().next().unwrap_or(""),
    )).collect::<Vec<_>>().join("\n"))
}
What comes back, in roughly this order: the latest Decision at the top, then the superseded older Decision, then the Mistake and Constraint that the graph pass pulled in along the refs edges around those Decisions. Neither is a strong lexical match for the query: the Constraint shares no terms with it at all, and the Mistake matches only partially ("rate-limit" rather than "rate limiting"). That is the point of the graph pass: it surfaces the reasoning chain, not just the keyword match.
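One quick way to see why the graph pass matters here is to check raw term overlap between the query and each seeded thought. This sketch is an exact-token check with no stemming or thesaurus expansion, so it understates what MentisDB's BM25 stage would match. terms and shared_terms are hypothetical helpers, not library calls.

```rust
use std::collections::HashSet;

// Lowercased tokens, split on every non-alphanumeric character.
fn terms(text: &str) -> HashSet<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

// Query terms that also appear in the document, sorted for stable output.
fn shared_terms(query: &str, doc: &str) -> Vec<String> {
    let q = terms(query);
    let d = terms(doc);
    let mut shared: Vec<String> = q.intersection(&d).cloned().collect();
    shared.sort();
    shared
}

fn main() {
    let query = "rate limiting";
    let seeded = [
        ("Constraint", "Tier-1 customers average 60 req/min per user; peaks hit 200 req/min during business hours."),
        ("Mistake", "2026-04-03: 4xx rate-limit errors for top 1% of customers. 100 req/min was too tight; we briefly raised the limit to 500 in prod without a Decision."),
    ];
    for (kind, text) in seeded {
        println!("{kind}: {:?}", shared_terms(query, text));
    }
}
```

The Constraint shares no token with "rate limiting", and the Mistake shares only "rate" because, without stemming, "limit" and "limiting" are different tokens. On lexical scoring alone the Constraint would never rank. The refs edges are what bring it in.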
Temporal queries: "what was true at time T?"
Decisions get superseded. To answer "what did we believe on April 12th?",
pass as_of:
use chrono::{TimeZone, Utc};

// Midnight UTC on 2026-04-12, two days before the superseding Decision.
let april_12 = Utc.with_ymd_and_hms(2026, 4, 12, 0, 0, 0).unwrap();
let results = chain.query_ranked(
    &RankedSearchQuery::new()
        .with_text("rate limiting policy")
        .with_as_of(april_12)
        .with_types([ThoughtType::Decision])
        .with_limit(5)
);
// Returns the 100 req/min Decision but NOT the 500 req/min one:
// the superseder was appended on April 14.
as_of excludes thoughts appended after the timestamp, relations whose
valid_at / invalid_at window doesn't cover it, and
targets of any Supersedes / Corrects /
Invalidates relation that was already in force.
as_of filters the view at that point in time; it doesn't edit history. The Supersedes appended on April 14 still exists; it just doesn't apply to an April 12th query. That is the right behavior for audit and replay. If you want the latest answer, omit as_of.
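The three exclusion rules can be sketched as a pure function over append-only history. This is an illustration under simplifying assumptions, not MentisDB's implementation. The record types are hypothetical, timestamps are day-of-year integers for 2026, and a relation is treated as existing only once its source thought has been appended.

```rust
use std::collections::HashSet;

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum RelationKind {
    Supersedes,
    Corrects,
    Invalidates,
    References,
}

// Hypothetical simplified records; times are day-of-year integers.
struct Thought {
    id: u32,
    appended_at: u32,
}

struct Relation {
    kind: RelationKind,
    source: u32, // the thought that carries the relation
    target: u32,
    valid_at: Option<u32>,   // None: valid as soon as it exists
    invalid_at: Option<u32>, // None: never expires
}

// Ids of thoughts visible in the as_of view, in slice order.
fn visible_at(thoughts: &[Thought], relations: &[Relation], as_of: u32) -> Vec<u32> {
    // Rule 1: only thoughts appended at or before as_of exist in the view.
    let appended = |id: u32| thoughts.iter().any(|t| t.id == id && t.appended_at <= as_of);

    // Rules 2 and 3: retiring relations in force at as_of hide their targets.
    let retired: HashSet<u32> = relations
        .iter()
        .filter(|r| matches!(r.kind, RelationKind::Supersedes | RelationKind::Corrects | RelationKind::Invalidates))
        .filter(|r| appended(r.source))
        .filter(|r| r.valid_at.map_or(true, |v| v <= as_of))
        .filter(|r| r.invalid_at.map_or(true, |i| as_of < i))
        .map(|r| r.target)
        .collect();

    thoughts
        .iter()
        .filter(|t| t.appended_at <= as_of && !retired.contains(&t.id))
        .map(|t| t.id)
        .collect()
}

fn main() {
    // d1 (id 1) appended 2026-03-12 (day 71); d2 (id 2) appended 2026-04-14 (day 104).
    let thoughts = vec![Thought { id: 1, appended_at: 71 }, Thought { id: 2, appended_at: 104 }];
    let relations = vec![Relation { kind: RelationKind::Supersedes, source: 2, target: 1, valid_at: None, invalid_at: None }];
    println!("as of April 12 (day 102): {:?}", visible_at(&thoughts, &relations, 102));
    println!("as of day 200:            {:?}", visible_at(&thoughts, &relations, 200));
}
```

On day 102 the superseder does not exist yet, so d1 is visible. From day 104 onward the Supersedes relation is in force and only d2 remains, while d1 is still stored.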
Cross-chain queries: "did any team decide this?"
Two teams may have made conflicting decisions in branch chains forked from a shared parent. To ask "did any team in our org decide this?", use the federated search path:
use std::collections::BTreeMap;
use mentisdb::federated::FederatedSearchRequest;

// One query builder, reused for this chain and for each team chain.
let policy_q = || RankedSearchQuery::new()
    .with_text("rate limiting policy")
    .with_types([ThoughtType::Decision])
    .with_limit(5);

let org_query = FederatedSearchRequest::new()
    .with_self_query(policy_q())
    .with_chain_queries(BTreeMap::from([
        ("team-payments".to_string(), policy_q()),
        ("team-search".to_string(), policy_q()),
    ]));

let federated = chain.federated_search(&org_query)?;
// Results come back grouped by chain key.
for (chain_key, ranked) in federated.per_chain.iter() {
    println!("=== {} ===", chain_key);
    for hit in &ranked.hits {
        println!("  [{}] {}", hit.thought.thought_type,
                 hit.thought.content.lines().next().unwrap_or(""));
    }
}
To walk a branch back to its origin, use the BranchesFrom relation: chain.ancestor_chain_keys(&chain_key, max_depth) returns every ancestor chain key reachable via BranchesFrom edges.
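chain.ancestor_chain_keys is MentisDB's own call. The sketch below only illustrates the kind of walk it describes: a breadth-first traversal over BranchesFrom edges, bounded by max_depth. It uses a hypothetical map from each chain key to the key(s) it forked from, and the chain names "team-payments-exp" and "org" are made up.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

// Hypothetical: each chain key maps to the key(s) it branched from.
fn ancestor_keys(branches_from: &HashMap<String, Vec<String>>, start: &str, max_depth: usize) -> Vec<String> {
    let mut seen: HashSet<String> = HashSet::from([start.to_string()]);
    let mut out = Vec::new();
    let mut queue = VecDeque::from([(start.to_string(), 0usize)]);
    while let Some((key, depth)) = queue.pop_front() {
        if depth == max_depth {
            continue; // don't expand past the depth limit
        }
        for parent in branches_from.get(&key).into_iter().flatten() {
            // `seen` prevents revisiting a chain, and looping if edges form a cycle.
            if seen.insert(parent.clone()) {
                out.push(parent.clone());
                queue.push_back((parent.clone(), depth + 1));
            }
        }
    }
    out
}

fn main() {
    let branches_from = HashMap::from([
        ("team-payments-exp".to_string(), vec!["team-payments".to_string()]),
        ("team-payments".to_string(), vec!["org".to_string()]),
    ]);
    println!("{:?}", ancestor_keys(&branches_from, "team-payments-exp", 5));
}
```

Breadth-first order returns nearer ancestors first, which is usually the order you want when comparing a branch's decisions against its parents'.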
Multi-hop: following a chain of decisions
Often the answer isn't in any single thought; it's in a sequence — a
Decision based on a Constraint that came from a
LessonLearned triggered by a Mistake months earlier.
Raise graph depth and read graph_path to walk it:
let results = chain.query_ranked(
    &RankedSearchQuery::new()
        .with_text("current API rate limit policy")
        // Bidirectional: follow edges to what a hit references and
        // from thoughts that reference a hit, up to three hops.
        .with_graph(RankedSearchGraph::new()
            .with_max_depth(3)
            .with_max_visited(100)
            .with_mode(mentisdb::search::GraphExpansionMode::Bidirectional))
        .with_limit(20)
);

// Re-sort by append order to read the chain as a timeline.
let mut timeline: Vec<_> = results.hits.iter().collect();
timeline.sort_by_key(|h| h.thought.index);
for hit in timeline {
    // Hits without a graph_path are reported as hop 0.
    let hops = hit.graph_path.as_ref()
        .map(|p| p.hops_from_seed)
        .unwrap_or(0);
    println!("[{}] (hops={}) {}", hit.thought.timestamp.format("%Y-%m-%d"),
             hops, hit.thought.content.lines().next().unwrap_or(""));
}
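To see what bidirectional expansion buys, here is a sketch that assumes refs edges stored as (from, to) pairs and a breadth-first walk bounded by max_depth and max_visited. Mode, expand, and the edge list are hypothetical, not MentisDB's graph code. The ids mirror the rate-limiting example: d2 refs d1 and m1, and the Constraint c1 refs d1.

```rust
use std::collections::{HashSet, VecDeque};

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Mode {
    Outgoing,      // follow refs from a thought to what it points at
    Bidirectional, // also follow refs backwards, to thoughts pointing at it
}

// Breadth-first expansion from seeds over (from, to) refs edges.
// Returns (id, hops from the nearest seed) in visit order.
fn expand(edges: &[(&str, &str)], seeds: &[&str], mode: Mode, max_depth: usize, max_visited: usize) -> Vec<(String, usize)> {
    let mut visited: HashSet<String> = HashSet::new();
    let mut order: Vec<(String, usize)> = Vec::new();
    let mut queue: VecDeque<(String, usize)> = VecDeque::new();
    for &s in seeds {
        if visited.insert(s.to_string()) {
            order.push((s.to_string(), 0));
            queue.push_back((s.to_string(), 0));
        }
    }
    while let Some((node, depth)) = queue.pop_front() {
        if depth == max_depth {
            continue;
        }
        for &(from, to) in edges {
            let next = if from == node {
                to
            } else if mode == Mode::Bidirectional && to == node {
                from
            } else {
                continue;
            };
            if visited.insert(next.to_string()) {
                if order.len() >= max_visited {
                    return order; // visit budget exhausted
                }
                order.push((next.to_string(), depth + 1));
                queue.push_back((next.to_string(), depth + 1));
            }
        }
    }
    order
}

fn main() {
    let edges = [("d2", "d1"), ("d2", "m1"), ("c1", "d1")];
    for mode in [Mode::Outgoing, Mode::Bidirectional] {
        println!("{:?}: {:?}", mode, expand(&edges, &["d2"], mode, 2, 100));
    }
}
```

Following only outgoing refs from the latest Decision reaches d1 and the Mistake but never the Constraint, because the Constraint points at d1 rather than the other way round. Bidirectional mode reaches it at hop 2. That hop count plays the same role as graph_path.hops_from_seed in the loop above.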
How the hybrid score is built
By default (no RRF), a hit's score is a weighted blend of lexical
similarity, vector cosine (if a sidecar is present), graph path bonus,
importance, confidence, recency, and session cohesion. Weights live in
src/search/scoring.rs; you don't usually need to touch them.
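The shape of that default blend is a weighted sum of per-hit signals. In this sketch the weights and signal values are invented for illustration and will not match src/search/scoring.rs. Signals and blend are hypothetical names, and every signal is assumed to be pre-normalized to [0, 1].

```rust
// Per-hit signals, each assumed pre-normalized to [0, 1].
struct Signals {
    lexical: f64,
    vector: Option<f64>, // None when no embedding sidecar is registered
    graph: f64,
    importance: f64,
    confidence: f64,
    recency: f64,
    session: f64, // session cohesion
}

// Illustrative weights only (they sum to 1.0); not MentisDB's values.
const W_LEXICAL: f64 = 0.35;
const W_VECTOR: f64 = 0.25;
const W_GRAPH: f64 = 0.15;
const W_IMPORTANCE: f64 = 0.10;
const W_CONFIDENCE: f64 = 0.05;
const W_RECENCY: f64 = 0.05;
const W_SESSION: f64 = 0.05;

// Weighted linear blend; a missing vector score simply contributes nothing.
fn blend(s: &Signals) -> f64 {
    W_LEXICAL * s.lexical
        + W_VECTOR * s.vector.unwrap_or(0.0)
        + W_GRAPH * s.graph
        + W_IMPORTANCE * s.importance
        + W_CONFIDENCE * s.confidence
        + W_RECENCY * s.recency
        + W_SESSION * s.session
}

fn main() {
    // Made-up signal values for a direct hit and a graph-only hit.
    let latest_decision = Signals { lexical: 0.8, vector: Some(0.7), graph: 0.0, importance: 0.95, confidence: 0.9, recency: 1.0, session: 0.5 };
    let graph_only_constraint = Signals { lexical: 0.0, vector: Some(0.4), graph: 0.8, importance: 0.7, confidence: 0.9, recency: 0.6, session: 0.5 };
    println!("decision:   {:.3}", blend(&latest_decision));
    println!("constraint: {:.3}", blend(&graph_only_constraint));
}
```

A hit with zero lexical match can still earn a usable score from the graph bonus and importance. That is how graph-expanded thoughts survive into the final list.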
With enable_reranking on, the pipeline runs in two stages. First it scores candidates with the default blend and keeps the top rerank_k as the candidate set. Then it builds fresh lexical-only and vector-only rank lists over that set and combines them with RRF: 1/(60 + rank_lex) + 1/(60 + rank_vec). The other signals come back as small tie-breakers. When lexical and vector agree, RRF amplifies the agreement. When they disagree, a thought that ranks moderately on both can outrank one that ranks first on one list and fiftieth on the other (1/61 + 1/110 ≈ 0.0255, below the 2/70 ≈ 0.0286 of a thought ranked tenth on both). That is usually right for agent history, where a routine re-mention can top the lexical list on keywords alone while the real policy ranks well on both lists; RRF pulls them back into balance.
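The fusion formula and the arithmetic above can be checked directly. This is a minimal sketch of Reciprocal Rank Fusion with the constant 60 from the formula. rrf_score, rrf, and the example ids are hypothetical, and ties are broken by id only to make the output stable.

```rust
use std::collections::HashMap;

// The RRF constant used in the formula above.
const K: f64 = 60.0;

// RRF score for a thought from the ranks (1 = best) it holds in each list.
fn rrf_score(ranks: &[usize]) -> f64 {
    ranks.iter().map(|&r| 1.0 / (K + r as f64)).sum()
}

// Fuse several ranked id lists, best first; ties broken by id.
fn rrf(lists: &[&[&str]]) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in lists {
        for (i, id) in list.iter().enumerate() {
            *scores.entry(id.to_string()).or_insert(0.0) += 1.0 / (K + (i + 1) as f64);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    fused
}

fn main() {
    // First on one list and fiftieth on the other, vs. moderate on both.
    println!("ranks (1, 50):  {:.4}", rrf_score(&[1, 50]));
    println!("ranks (10, 10): {:.4}", rrf_score(&[10, 10]));
    println!("ranks (20, 20): {:.4}", rrf_score(&[20, 20]));

    let lexical: &[&str] = &["routine-mention", "policy", "old-decision"];
    let semantic: &[&str] = &["policy", "old-decision", "routine-mention"];
    for (id, score) in rrf(&[lexical, semantic]) {
        println!("{id}: {score:.4}");
    }
}
```

Tenth on both beats first-and-fiftieth, but twentieth on both no longer does, so "moderately" has a fairly tight range. In the small fusion, "policy" comes out ahead of the lexical leader "routine-mention" because both lists rank it highly.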
Production notes
When RRF helps
- Vague questions where lexical and vector pull in different directions ("what went wrong with the auth refactor?": lexical wants the Mistake, vector wants the related LessonLearned).
- Chains with lots of routinely rephrased thoughts: a rephrased re-mention tends to top one list and rank poorly on the other, so RRF demotes it below thoughts that both signals agree on.
- Small candidate sets (< ~200): RRF is cheap at that scale.
When RRF hurts
- Highly specific queries (an error code, a UUID, a tag): vector side adds noise.
- Huge chains: RRF only reranks the top rerank_k pre-RRF candidates, so an answer at rank 200 won't be seen. Pre-filter harder or raise rerank_k (the argument to with_reranking).
- No vector sidecar: RRF degenerates to lexical ranking and you've paid for nothing. Just call query_ranked without with_reranking.
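Both "When RRF hurts" failure modes can be reproduced in a few lines. In this sketch, rerank_k is modeled as a plain truncation of the pre-RRF ranking, an assumption based on the description of the two-stage pipeline. candidates and lexical_only_rrf are hypothetical helpers.

```rust
// The RRF constant from the formula above.
const K: f64 = 60.0;

// Candidate set handed to RRF: the first `rerank_k` ids of the pre-RRF ranking.
fn candidates<'a>(pre_rrf: &[&'a str], rerank_k: usize) -> Vec<&'a str> {
    pre_rrf.iter().take(rerank_k).copied().collect()
}

// RRF with only a lexical list (no vector sidecar): each id scores 1/(K + rank).
fn lexical_only_rrf<'a>(lexical: &[&'a str]) -> Vec<(&'a str, f64)> {
    lexical
        .iter()
        .enumerate()
        .map(|(i, &id)| (id, 1.0 / (K + (i + 1) as f64)))
        .collect()
}

fn main() {
    let ranked: Vec<String> = (1..=300).map(|r| format!("thought-{r}")).collect();
    let ids: Vec<&str> = ranked.iter().map(|s| s.as_str()).collect();
    let cands = candidates(&ids, 50);
    println!("thought-200 reaches RRF: {}", cands.contains(&"thought-200"));

    let fused = lexical_only_rrf(&["a", "b", "c"]);
    println!("{:?}", fused);
}
```

With rerank_k = 50, the thought at rank 200 is gone before fusion starts, so no amount of agreement between signals can bring it back. With a single list, 1/(60 + rank) is strictly decreasing in rank, so the fused order is exactly the lexical order and the extra pass changes nothing.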
The min_score threshold
The right value depends on the embedding model and the domain. For short, technical queries, use 0.2 to 0.3; for long, vague queries, drop to 0.05 or omit it. Always inspect the top hits with the threshold disabled before setting one: a threshold that hides the top hit is worse than no threshold.
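The final stage and the "inspect with the threshold disabled" advice can be sketched together. This assumes a hit scoring exactly at min_score survives. Hit, finalize, hidden_by, and threshold_is_safe are hypothetical names, and the scores in main are made up.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Hit {
    id: String,
    score: f64,
}

fn hit(id: &str, score: f64) -> Hit {
    Hit { id: id.to_string(), score }
}

// Final stage: drop hits below min_score, sort best-first, keep `limit`.
fn finalize(mut hits: Vec<Hit>, min_score: Option<f64>, limit: usize) -> Vec<Hit> {
    if let Some(min) = min_score {
        hits.retain(|h| h.score >= min);
    }
    hits.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap());
    hits.truncate(limit);
    hits
}

// With the threshold disabled, list the ids a candidate threshold would hide.
fn hidden_by(hits: &[Hit], min_score: f64) -> Vec<String> {
    finalize(hits.to_vec(), None, usize::MAX)
        .into_iter()
        .filter(|h| h.score < min_score)
        .map(|h| h.id)
        .collect()
}

// Reject any threshold that would hide the top hit.
fn threshold_is_safe(hits: &[Hit], min_score: f64) -> bool {
    let top = finalize(hits.to_vec(), None, 1);
    top.first().map_or(true, |h| h.score >= min_score)
}

fn main() {
    let hits = vec![hit("old-decision", 0.12), hit("latest-decision", 0.31), hit("constraint", 0.07)];
    println!("{:?}", finalize(hits.clone(), Some(0.1), 10));
    for t in [0.05, 0.2, 0.35] {
        println!("min_score {t}: hides {:?}, safe = {}", hidden_by(&hits, t), threshold_is_safe(&hits, t));
    }
}
```

Running a few candidate thresholds against one unfiltered result set shows exactly what each value would cost before you commit to it.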
The enable_reranking flag
Master switch. Default false. On for end-user retrieval; off for
internal pipelines (consolidation, audit walks) where you want the cheapest
possible score.
Common questions and how to phrase them
"What did we decide about X?"
RankedSearchQuery::new()
    .with_text(format!("what did we decide about {}", topic))
    .with_filter(ThoughtQuery::new()
        .with_types(vec![ThoughtType::Decision])
        .with_concepts_any([topic]))
    .with_graph(RankedSearchGraph::new().with_max_depth(2))
    .with_reranking(50)
    .with_limit(10)
"What did we learn about X?" / "What went wrong when we did X?" / "What's our policy on X?"
// Lessons / insights about a topic:
RankedSearchQuery::new()
    .with_text(format!("lessons about {}", topic))
    .with_types([ThoughtType::LessonLearned, ThoughtType::Insight])
    .with_min_importance(0.6)
    .with_limit(15)

// Mistakes + lessons scoped to a task tag:
RankedSearchQuery::new()
    .with_text(format!("mistakes failures errors during {}", task))
    .with_types([ThoughtType::Mistake, ThoughtType::LessonLearned])
    .with_tags_any([format!("task:{}", task)])
    .with_graph(RankedSearchGraph::new().with_max_depth(2))

// Current policy (Decision + Constraint + Correction), scoped by the
// policy:<topic> tag convention used when seeding (e.g. "policy:api"):
RankedSearchQuery::new()
    .with_text(format!("policy for {}", topic))
    .with_types([ThoughtType::Decision, ThoughtType::Constraint,
                 ThoughtType::Correction])
    .with_tags_any([format!("policy:{}", topic)])
    .with_reranking(50)
    .with_limit(5)
Testing this pattern
A minimal test that the ask → retrieve → answer flow surfaces the supersession chain:
#[test]
fn rate_limit_query_returns_supersede_chain() {
    // test_chain() is a test helper (not shown) that opens an empty chain.
    let mut chain = test_chain();
    chain.upsert_agent("executor", None, None, None, None).unwrap();
    seed_decisions(&mut chain, "executor").unwrap();

    let results = chain.query_ranked(
        &RankedSearchQuery::new()
            .with_text("rate limiting")
            .with_graph(RankedSearchGraph::new().with_max_depth(2))
            .with_limit(20)
    );

    // Match on each Decision's opening words. The new Decision also mentions
    // "100 req/min" in its Supersedes note, so a bare contains("100 req/min")
    // would pass even if the old Decision were missing.
    assert!(results.hits.iter().any(|h|
        h.thought.content.starts_with("Rate limit public API to 500 req/min")),
        "should find the latest decision");
    assert!(results.hits.iter().any(|h|
        h.thought.content.starts_with("Rate limit public API endpoints to 100 req/min")),
        "should also surface the superseded older decision");
    assert!(results.hits.iter().any(|h|
        h.thought.thought_type == ThoughtType::Mistake),
        "should surface the mistake that triggered the change");
}
Cross-references
- 0.4 Search-First Discipline: the same RankedSearchQuery powers the search-before-append loop.
- 1.1 Episodic Task Memory: produces the graph edges this chapter's graph pass traverses.
- 2.4 Federated Team Memory: the cross-chain search and BranchesFrom machinery in production.
What's next
You can now answer "what did we decide about X?" with the full reasoning chain, at any point in time, across chain boundaries. The next pattern, Preference Learning, closes the loop: the agent not only retrieves prior decisions, it updates them as the user expresses new preferences — and the same retrieval pipeline is what surfaces the conflict.