A common question from agent authors integrating MentisDB is: when I search
for a thought type with some keywords, do I have to refine in a loop?
The short answer is no. One call to mentisdb_ranked_search runs
the entire hybrid retrieval pipeline — filter, BM25, vector similarity,
graph expansion, session cohesion, Reciprocal Rank Fusion — and returns a
single ordered list with a fully decomposed score breakdown for every hit.
This post walks through the code path top to bottom so you can reason about
what each knob in RankedSearchQuery actually changes. Every
reference points at the real file and line number in the current source tree.
TL;DR. A single mentisdb_ranked_search call
returns hits scored by lexical BM25 + optional dense vector + optional graph
expansion, blended through smooth exponential fusion, optionally reranked
with RRF, and sorted deterministically. Every hit carries its matched terms,
match sources, and a per-signal score vector so the agent can see why
it ranked where it did — no follow-up queries needed.
┌──── 1. filter-first (indexed) ────┐
request ───► │ thought_type / role / agent_id / │ ───► candidate set
│ tags / concepts / since / until │ (Vec<&Thought>)
└───────────────────────────────────┘
│
2. as_of temporal filter (optional)
│
┌──────────────────── 3. three parallel scorers ────────────────────┐
▼ ▼ ▼
rank_candidates_ rank_candidates_ expand_ranked_candidates
lexically(text) semantically(text) (if graph + lexical hits)
│ │ │
│ BM25 per-field │ embed query, cosine vs │ top-20 lexical seeds,
│ (content / tags / │ sidecar, max across │ BFS on adjacency index,
│ concepts / │ providers, freshness │ bounded by depth,
│ agent_id / │ weight (0.5 fresh, 0.3 stale) │ visits, direction
│ agent_registry) │ │ + typed-edge boosts
│ + Porter + lemma │ │ (ContinuesFrom=0.60,
│ + per-field DF gate │ │ Corrects=0.50, …)
▼ ▼ ▼
HashMap<pos,LexicalHit> HashMap<pos,f32> HashMap<pos,RankedGraphHit>
│
4. rank_search_hit per candidate
(combine signals — formula below)
│
5. session cohesion boost (mutates in place)
│
6. RRF rerank (opt-in, top rerank_k)
│
7. stable deterministic sort + truncate(limit)
▼
RankedSearchResult
Everything below is one function: MentisDb::query_ranked at
src/lib.rs:4962. The agent hands in a RankedSearchQuery
and receives a RankedSearchResult. No intermediate steps, no
follow-up refinement, no tool ping-pong.
Before walking the phases in order, it is worth asking the more fundamental question: why does the pipeline carry three different scorers at all? The answer is that each of them is strong exactly where the other two are blind. They don't duplicate each other — they cover each other's failure modes.
BM25 matches literal words after normalization. The tokenizer
lowercases, strips punctuation, Porter-stems (prefers,
preferred, preferences all collapse to
prefer), and expands irregular-verb lemmas (went
also matches go-stemmed documents). A term either appears in
the document or it doesn't. BM25 then scores by
term frequency × inverse document frequency × length
normalization.
It is precise and explainable — you can see exactly which tokens
matched and which indexed field they came from. But it has a fundamental
blind spot: paraphrase. A query for
"cache latency" will not match a document that says
"memory lookups are slow", because the two share zero tokens.
The best BM25 implementation in the world still cannot score words it never
sees.
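The mechanics are standard enough to sketch. Below is a minimal, self-contained BM25 term scorer in Rust; it illustrates the formula only, not MentisDB's implementation, so the per-field weights, stemming, and DF gate are omitted.

```rust
// Minimal BM25 sketch: term frequency x inverse document frequency x length
// normalization. Illustrative only; per-field weights, stemming, and the
// DF gate from the real index are omitted.
fn bm25_term(tf: f32, df: f32, n_docs: f32, doc_len: f32, avg_len: f32) -> f32 {
    let (k1, b) = (1.2_f32, 0.75_f32); // standard BM25 free parameters
    let idf = ((n_docs - df + 0.5) / (df + 0.5) + 1.0).ln();
    let norm = k1 * (1.0 - b + b * doc_len / avg_len);
    idf * (tf * (k1 + 1.0)) / (tf + norm)
}
```

Rarer terms score higher, repeated terms score higher with diminishing returns, and shorter documents beat longer ones at equal term frequency.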
Dense vector similarity sidesteps that. fastembed-minilm
(local ONNX inference, no cloud) maps every thought's text into a
384-dimensional vector such that meanings near each other in conceptual
space sit near each other in vector space. The query is embedded the same
way. Candidates are scored by cosine similarity — the
angle between vectors.
happy and joyful score high.
"cache latency" and "memory lookups are slow"
score high. The trade-off is that the score is less explainable and noisier
on exact-match queries: a thought about databases can look close to a
thought about warehouses because the embedding model learned they cluster.
That is why MentisDB fuses vector and lexical rather than replacing one
with the other — when the lexical score is strong, trust lexical;
when it is weak, let the vector carry the hit. That is what the
vector * (1 + 35 * exp(-lexical / 3)) term does.
BM25 and the vector model both score a single document in isolation. Neither of them knows that a great hit is sitting two hops away, hidden behind the wrong vocabulary.
Thoughts in MentisDB form a graph. Two kinds of edges connect them:
- refs: positional back-references (raw append-order indices);
- relations: typed edges — ContinuesFrom, Corrects, Invalidates, Supersedes, DerivedFrom, Summarizes, CausedBy, Supports, Contradicts, BranchesFrom, RelatedTo, References.
These edges encode why one memory relates to another. A
Decision thought that says “we picked LRU eviction”
can carry a DerivedFrom edge pointing at the
Finding that measured the cache-miss rate — even though
the Finding uses entirely different vocabulary.
That is exactly the failure mode lexical and vector can't fix alone: the
right answer exists, but it is a hop or two away. So after lexical scoring
finds the top-20 seed hits, BFS walks the adjacency graph outward from each
seed. Every reached thought inherits a graph-proximity score of
1 / depth, plus a per-relation-kind boost
(ContinuesFrom is worth more than RelatedTo,
which is worth more than References). The expanded neighbors
re-enter the ranking pool alongside the original lexical and vector hits.
Once you have decided to traverse the graph, the choice of traversal matters:

- BFS reaches every thought at its minimum distance from the seed, which makes graph_proximity = 1 / depth a well-defined signal. DFS has no such guarantee.
- The max_visited cap on BFS is a clean, monotonic budget. DFS can wander deep into one branch and never explore a closer branch sitting right next to the seed.

In one line: lexical = vocabulary match, semantic = meaning match, BFS = relationship match. The three cover each other's failure modes, and MentisDB fuses all three into one score.
The pipeline starts at src/lib.rs:4913 with
MentisDb::query(&request.filter). The filter is
an embedded ThoughtQuery with exactly the same semantics as the
plain mentisdb_search tool: thought_type, role,
agent_id, tags, concepts, and the
since/until position bounds narrow the candidate
set using per-index lookups before any ranking happens.
This matters: BM25 never scores a thought that was eliminated by the type filter. That keeps the scoring corpus small and the results deterministic across repeat calls.
If the request carries as_of, src/lib.rs:4968
removes two groups of thoughts from the candidate set:
- thoughts with timestamp > as_of — the agent asked “what did we know at time T?”, so later writes are invisible;
- thoughts in the invalidated_thought_ids set whose invalidating Supersedes / Corrects / Invalidates edge was authored at or before as_of.
Without as_of, this phase is a no-op. With it, you get
point-in-time retrieval semantics for free.
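The visibility rule is small enough to sketch as a pure predicate. The struct and field names below (Thought, invalidated_at) are illustrative stand-ins, not MentisDB's actual types.

```rust
// Point-in-time visibility sketch. Field names are hypothetical:
// `invalidated_at` stands for the authoring time of a Supersedes /
// Corrects / Invalidates edge targeting this thought, if any.
struct Thought {
    timestamp: u64,
    invalidated_at: Option<u64>,
}

fn visible_at(t: &Thought, as_of: u64) -> bool {
    if t.timestamp > as_of {
        return false; // written after the snapshot instant
    }
    match t.invalidated_at {
        Some(when) if when <= as_of => false, // already invalidated at time T
        _ => true,
    }
}
```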
The surviving candidates are handed to three independent scoring functions. Each produces a map keyed by the thought's append-order index.
rank_candidates_lexically (src/lib.rs:6345) builds
a LexicalIndex over the full chain (with agent-registry tokens
included so agent-name hits can surface) and runs
search_in_positions restricted to the candidate positions. The
index is a classic inverted-file BM25 implementation
(src/search/lexical.rs) with two refinements:

- Porter stemming plus irregular-verb lemma expansion at tokenization time;
- a per-field document-frequency gate: an over-common term is suppressed in
content, tags, and concepts, but can still fire in agent_registry
(60%) or agent_id (70%), where repetition is inherent.
rank_candidates_semantically (src/lib.rs:6365) runs
only when a managed vector sidecar is configured — by default, local
ONNX inference via fastembed-minilm. The query text is embedded
once, cosine-compared against every candidate's stored vector, filtered by a
minimum cosine of 0.04, and the scores across providers are combined by
max weighted by sidecar freshness (0.5 if fresh, 0.3 if stale).
If no sidecar is active, this phase returns an empty map and the final score falls back to pure lexical.
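The scoring shape can be sketched with the constants quoted above (the 0.04 floor and the 0.5 fresh / 0.3 stale weights); the provider plumbing is simplified here to a slice of (cosine, is_fresh) pairs.

```rust
// Cosine similarity plus freshness-weighted max across providers.
// The 0.04 floor and 0.5/0.3 weights come from the post; the function
// shapes are an illustrative sketch, not MentisDB's signatures.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

fn semantic_score(provider_sims: &[(f32, bool)]) -> Option<f32> {
    provider_sims
        .iter()
        .copied()
        .filter(|(sim, _)| *sim >= 0.04)                          // minimum-cosine floor
        .map(|(sim, fresh)| sim * if fresh { 0.5 } else { 0.3 })  // freshness weight
        .fold(None, |best: Option<f32>, s| Some(best.map_or(s, |b| b.max(s)))) // max across providers
}
```

A stale sidecar can still win, but only if its raw cosine is high enough to beat a fresh provider at the lower weight.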
If request.graph is set and there are lexical hits,
expand_ranked_candidates (src/lib.rs:6475) picks
the top 20 lexical seeds, builds a ThoughtAdjacencyIndex from
the chain's refs and typed relations, and runs
bounded BFS from each seed.
Expansion is controlled by max_depth, max_visited,
and the traversal mode (Outgoing / Incoming
/ Bidirectional). Each reached thought records its depth, the
number of seed paths that reached it, and the relation kinds traversed.
Relation kinds carry different boosts — ContinuesFrom 0.60,
Corrects/Invalidates 0.50, Supersedes
0.45, down to References 0.06. Graph proximity contributes
1 / depth.
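The quoted boost values read naturally as a lookup added to the 1 / depth proximity term. The enum below is a stand-in covering a subset of the twelve relation kinds, and the exact way proximity and boost combine is an illustrative assumption, not the source's formula.

```rust
// Typed-edge boost values quoted in the post; the full table lives in the
// source. RelationKind is a hypothetical stand-in, not MentisDB's type.
#[derive(Clone, Copy)]
enum RelationKind { ContinuesFrom, Corrects, Invalidates, Supersedes, References }

fn relation_boost(kind: RelationKind) -> f32 {
    match kind {
        RelationKind::ContinuesFrom => 0.60,
        RelationKind::Corrects | RelationKind::Invalidates => 0.50,
        RelationKind::Supersedes => 0.45,
        RelationKind::References => 0.06,
    }
}

// One illustrative combination: proximity plus the best traversed boost.
fn graph_score(depth: u32, kinds: &[RelationKind]) -> f32 {
    let best = kinds.iter().map(|&k| relation_boost(k)).fold(0.0f32, f32::max);
    1.0 / depth as f32 + best
}
```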
For each surviving candidate, rank_search_hit
(src/lib.rs:6213) assembles one RankedSearchScore:
vector_contribution = vector * (1 + 35 * exp(-lexical / 3))   // smooth fusion
importance_boost    = lexical * (importance - 0.5) * 0.3      // if lexical > 0
                    = (importance - 0.5) * 0.1                // otherwise
confidence          = thought.confidence * 0.1
recency             = recency_score(thought)
total = lexical + vector_contribution + graph + relation + seed_support
      + importance_boost + confidence + recency
The vector fusion term is worth a second look. When lexical is zero
(pure-semantic match), vector is amplified by roughly 36×. By
lexical = 3 the boost has decayed to ~12×. By
lexical = 6 it is effectively additive. A smooth exponential
avoids the nasty rank discontinuities that tiered boost step-functions
introduce at bin boundaries.
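To make the decay concrete, here is the fusion term as a standalone function, using exactly the constants quoted above.

```rust
// The smooth fusion term: when lexical evidence is weak, the vector score
// is amplified; as lexical grows, the amplification decays exponentially
// rather than stepping between tiers, so ranks never jump at a boundary.
fn vector_contribution(vector: f32, lexical: f32) -> f32 {
    vector * (1.0 + 35.0 * (-lexical / 3.0).exp())
}
```

The multiplier falls monotonically: 36 at lexical = 0, about 13.9 at lexical = 3, about 5.7 at lexical = 6, and it approaches 1 as lexical keeps growing.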
The importance term is differential rather than flat: a higher-importance thought gets a boost proportional to its lexical score, so it tips close BM25 races without overriding strong lexical signal.
Lines 5047–5076 walk the hits and promote thoughts that
sit near a strong lexical seed, even if they themselves share no
query terms. The rule (current v0.8.9 values):
- seeds are hits with lexical ≥ 3.0;
- hits with lexical < 5.0 within 12 positions of any seed receive a boost of 1.2 · (1 − nearest / 12);
- hits with lexical ≥ 5.0 are skipped — they stand on their own.

This is the trick that surfaces evidence turns adjacent to the matching turn in LoCoMo-style benchmarks, and it's one of the three changes that moved LoCoMo-10P from 72.8% to 74.6% in v0.8.5.
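A sketch of the rule using the quoted thresholds; the seed-collection step and the in-place mutation are simplified here into a pure function over one hit.

```rust
// Session-cohesion boost sketch with the thresholds quoted in the post:
// neighbors with lexical < 5.0 within 12 positions of a seed get
// 1.2 * (1 - nearest / 12); strong hits (lexical >= 5.0) are skipped.
fn cohesion_boost(lexical: f32, pos: i64, seed_positions: &[i64]) -> f32 {
    if lexical >= 5.0 {
        return 0.0; // strong lexical hits stand on their own
    }
    let nearest = seed_positions
        .iter()
        .map(|s| (pos - s).abs())
        .min()
        .unwrap_or(i64::MAX);
    if nearest > 12 {
        return 0.0; // too far from every seed
    }
    1.2 * (1.0 - nearest as f32 / 12.0) // linear falloff with distance
}
```

A hit three positions from a seed gets 0.9; one twelve positions away gets nothing, so the boost cannot drag in distant, unrelated turns.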
If the query sets enable_reranking and rerank_k
(default 50), lines 5084–5147 do a second, rank-based
pass:
- build three rankings: one from lexical scores, one from vector scores, and one from graph + relation + seed_support — list G;
- take the top rerank_k of each;
- score every document by sum(1 / (60 + rank_i(d))) across the three lists (src/search/ranked.rs:28);
- recompute total with rrf + graph + relation + seed_support + importance + confidence + recency + session_cohesion.

RRF is robust when the absolute magnitudes of lexical and vector scores are not comparable: a document that places in the top-10 of both lists dominates a document that tops one list and is absent from the other. It is pure arithmetic — no LLM, no external call.
Finally, lines 5151–5174 sort by
total descending, with a long tiebreaker cascade over the
individual score components, importance, confidence,
and finally thought.index. The result is then truncated to
limit.
That determinism means two identical queries on the same chain snapshot return identical results in the same order. No hidden randomness, no “try again and hope.”
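The ordering can be sketched as a comparator: total descending, with the append-order index as the final tiebreak. The real cascade also compares the individual score components, importance, and confidence before falling back to the index; this sketch keeps only the two ends of that cascade.

```rust
// Deterministic sort sketch: total descending, then lower append-order
// index wins ties. The real tiebreaker cascade has more steps in between.
fn sort_hits(hits: &mut [(f32, usize)]) { // (total, thought index)
    hits.sort_by(|a, b| {
        b.0.partial_cmp(&a.0)
            .unwrap_or(std::cmp::Ordering::Equal)
            .then(a.1.cmp(&b.1)) // lower index wins ties
    });
}
```

Ending the cascade on thought.index guarantees a total order, which is what makes repeat queries byte-for-byte reproducible.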
Every returned hit carries a full RankedSearchScore plus the
matched query terms and the indexed fields that fired:
{
"score": {
"lexical": 2.91,
"vector": 0.27,
"graph": 0.18,
"relation": 0.05,
"seed_support": 0.00,
"importance": 0.00,
"confidence": 0.00,
"recency": 0.00,
"session_cohesion": 0.40,
"rrf": 0.00,
"total": 3.14
},
"matched_terms": ["latency", "ranking"],
"match_sources": ["content", "tags", "agent_registry"],
"graph_distance": 1,
"graph_path": [ ...locators and relation kinds... ]
}
This is the auditability story. If a result surprises you, the breakdown tells you exactly which signal fired — strong BM25? a graph-expanded neighbor? a cohesion boost from an adjacent turn? — with no guesswork.
| Phase | Function | Location |
|---|---|---|
| 1. Filter-first | MentisDb::query | src/lib.rs:4913 |
| 2. As-of filter | inline in query_ranked | src/lib.rs:4968 |
| 3a. Lexical BM25 | rank_candidates_lexically | src/lib.rs:6345 |
| 3a. Lexical index | LexicalIndex::search_in_positions | src/search/lexical.rs:500 |
| 3b. Vector | rank_candidates_semantically | src/lib.rs:6365 |
| 3c. Graph expand | expand_ranked_candidates | src/lib.rs:6475 |
| 3c. BFS | GraphExpansionResult::expand | src/search/expansion.rs:117 |
| 4. Per-hit fusion | rank_search_hit | src/lib.rs:6213 |
| 5. Session cohesion | inline | src/lib.rs:5042–5076 |
| 6. RRF rerank | inline + rrf_merge_three | src/lib.rs:5084, src/search/ranked.rs:28 |
| 7. Final sort | inline | src/lib.rs:5151–5174 |
You do not need a refinement loop. One call returns a
fully-scored, ordered list. For grouped context (seed plus its
graph neighbors clustered beneath it), call
mentisdb_context_bundles instead — also single-pass.
The knobs that shape the pipeline without changing its shape:
- graph — turns on phase 3c and adds a graph list to RRF;
- enable_reranking / rerank_k — turns on phase 6;
- as_of — enables phase 2 point-in-time semantics;
- min_confidence / min_importance / since / until — pre-filter during phase 1;
- thought_types, roles, agent_ids, tags_any, concepts_any, entity_type — the embedded ThoughtQuery used by phase 1.

Everything composes inside one function call. The agent reads the score breakdown, decides what to do with it, and moves on. That simplicity is the whole point.