A common question from agent authors integrating MentisDB is: when I search
for a thought type with some keywords, do I have to refine in a loop?
The short answer is no. One call to mentisdb_ranked_search runs
the entire hybrid retrieval pipeline — filter, BM25, vector similarity,
graph expansion, session cohesion, Reciprocal Rank Fusion — and returns a
single ordered list with a fully decomposed score breakdown for every hit.
This post walks through the code path top to bottom so you can reason about
what each knob in RankedSearchQuery actually changes. Every
reference points at the real file and line number in the current source tree.
TL;DR. A single mentisdb_ranked_search call
returns hits scored by lexical BM25 + optional dense vector + optional graph
expansion, blended through smooth exponential fusion, optionally reranked
with RRF, and sorted deterministically. Every hit carries its matched terms,
match sources, and a per-signal score vector so the agent can see why
it ranked where it did — no follow-up queries needed.
┌──── 1. filter-first (indexed) ────┐
request ───► │ thought_type / role / agent_id / │ ───► candidate set
│ tags / concepts / since / until │ (Vec<&Thought>)
└───────────────────────────────────┘
│
2. as_of temporal filter (optional)
│
┌──────────────────── 3. three parallel scorers ────────────────────┐
▼ ▼ ▼
rank_candidates_ rank_candidates_ expand_ranked_candidates
lexically(text) semantically(text) (if graph + lexical hits)
│ │ │
│ BM25 per-field │ embed query, cosine vs │ top-20 lexical seeds,
│ (content / tags / │ sidecar, max across │ BFS on adjacency index,
│ concepts / │ providers, freshness │ bounded by depth,
│ agent_id / │ weight (0.5 fresh, 0.3 stale) │ visits, direction
│ agent_registry) │ │ + typed-edge boosts
│ + Porter + lemma │ │ (ContinuesFrom=0.60,
│ + per-field DF gate │ │ Corrects=0.50, …)
▼ ▼ ▼
HashMap<pos,LexicalHit> HashMap<pos,f32> HashMap<pos,RankedGraphHit>
│
4. rank_search_hit per candidate
(combine signals — formula below)
│
5. session cohesion boost (mutates in place)
│
6. RRF rerank (opt-in, top rerank_k)
│
7. stable deterministic sort + truncate(limit)
▼
RankedSearchResult
Everything below is one function: MentisDb::query_ranked at
src/lib.rs:4962. The agent hands in a RankedSearchQuery
and receives a RankedSearchResult. No intermediate steps, no
follow-up refinement, no tool ping-pong.
Before walking the phases in order, it is worth asking the more fundamental question: why does the pipeline carry three different scorers at all? The answer is that each of them is strong exactly where the other two are blind. They don't duplicate each other — they cover each other's failure modes.
BM25 matches literal words after normalization. The tokenizer
lowercases, strips punctuation, Porter-stems (prefers,
preferred, preferences all collapse to
prefer), and expands irregular-verb lemmas (went
also matches go-stemmed documents). A term either appears in
the document or it doesn't. BM25 then scores by
term frequency × inverse document frequency × length
normalization.
It is precise and explainable — you can see exactly which tokens
matched and which indexed field they came from. But it has a fundamental
blind spot: paraphrase. A query for
"cache latency" will not match a document that says
"memory lookups are slow", because the two share zero tokens.
The best BM25 implementation in the world still cannot score words it never
sees.
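The mechanics are standard enough to sketch. Below is a minimal, self-contained BM25 term scorer in Rust; it illustrates the formula only, not MentisDB's implementation, so the per-field weights, stemming, and DF gate are omitted.

```rust
// Minimal BM25 sketch: term frequency x inverse document frequency x length
// normalization. Illustrative only; per-field weights, stemming, and the
// DF gate from the real index are omitted.
fn bm25_term(tf: f32, df: f32, n_docs: f32, doc_len: f32, avg_len: f32) -> f32 {
    let (k1, b) = (1.2_f32, 0.75_f32); // standard BM25 free parameters
    let idf = ((n_docs - df + 0.5) / (df + 0.5) + 1.0).ln();
    let norm = k1 * (1.0 - b + b * doc_len / avg_len);
    idf * (tf * (k1 + 1.0)) / (tf + norm)
}
```

Rarer terms score higher, repeated terms score higher with diminishing returns, and shorter documents beat longer ones at equal term frequency.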
Dense vector similarity sidesteps that. fastembed-minilm
(local ONNX inference, no cloud) maps every thought's text into a
384-dimensional vector such that meanings near each other in conceptual
space sit near each other in vector space. The query is embedded the same
way. Candidates are scored by cosine similarity — the
angle between vectors.
happy and joyful score high.
"cache latency" and "memory lookups are slow"
score high. The trade-off is that the score is less explainable and noisier
on exact-match queries: a thought about databases can look close to a
thought about warehouses because the embedding model learned they cluster.
That is why MentisDB fuses vector and lexical rather than replacing one
with the other — when the lexical score is strong, trust lexical;
when it is weak, let the vector carry the hit. That is what the
vector * (1 + 35 * exp(-lexical / 3)) term does.
BM25 and the vector model both score a single document in isolation. Neither of them knows that a great hit is sitting two hops away, hidden behind the wrong vocabulary.
Thoughts in MentisDB form a graph. Two kinds of edges connect them:
- refs: positional back-references (raw append-order indices);
- relations: typed edges — ContinuesFrom, Corrects, Invalidates, Supersedes, DerivedFrom, Summarizes, CausedBy, Supports, Contradicts, BranchesFrom, RelatedTo, References.
These edges encode why one memory relates to another. A
Decision thought that says “we picked LRU eviction”
can carry a DerivedFrom edge pointing at the
Finding that measured the cache-miss rate — even though
the Finding uses entirely different vocabulary.
That is exactly the failure mode lexical and vector can't fix alone: the
right answer exists, but it is a hop or two away. So after lexical scoring
finds the top-20 seed hits, BFS walks the adjacency graph outward from each
seed. Every reached thought inherits a graph-proximity score of
1 / depth, plus a per-relation-kind boost
(ContinuesFrom is worth more than RelatedTo,
which is worth more than References). The expanded neighbors
re-enter the ranking pool alongside the original lexical and vector hits.
Once you have decided to traverse the graph, the choice of traversal matters:

- BFS reaches every thought at its minimum distance from the seed, which makes graph_proximity = 1 / depth a well-defined signal. DFS has no such guarantee.
- The max_visited cap on BFS is a clean, monotonic budget. DFS can wander deep into one branch and never explore a closer branch sitting right next to the seed.

In one line: lexical = vocabulary match, semantic = meaning match, BFS = relationship match. The three cover each other's failure modes, and MentisDB fuses all three into one score.
The pipeline starts at src/lib.rs:4913 with
MentisDb::query(&request.filter). The filter is
an embedded ThoughtQuery with exactly the same semantics as the
plain mentisdb_search tool: thought_type, role,
agent_id, tags, concepts, and the
since/until position bounds narrow the candidate
set using per-index lookups before any ranking happens.
This matters: BM25 never scores a thought that was eliminated by the type filter. That keeps the scoring corpus small and the results deterministic across repeat calls.
If the request carries as_of, src/lib.rs:4968
removes two groups of thoughts from the candidate set:
- thoughts with timestamp > as_of — the agent asked “what did we know at time T?”, so later writes are invisible;
- thoughts in the invalidated_thought_ids set whose invalidating Supersedes / Corrects / Invalidates edge was authored at or before as_of.
Without as_of, this phase is a no-op. With it, you get
point-in-time retrieval semantics for free.
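The visibility rule is small enough to sketch as a pure predicate. The struct and field names below (Thought, invalidated_at) are illustrative stand-ins, not MentisDB's actual types.

```rust
// Point-in-time visibility sketch. Field names are hypothetical:
// `invalidated_at` stands for the authoring time of a Supersedes /
// Corrects / Invalidates edge targeting this thought, if any.
struct Thought {
    timestamp: u64,
    invalidated_at: Option<u64>,
}

fn visible_at(t: &Thought, as_of: u64) -> bool {
    if t.timestamp > as_of {
        return false; // written after the snapshot instant
    }
    match t.invalidated_at {
        Some(when) if when <= as_of => false, // already invalidated at time T
        _ => true,
    }
}
```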
The surviving candidates are handed to three independent scoring functions. Each produces a map keyed by the thought's append-order index.
rank_candidates_lexically (src/lib.rs:6345) builds
a LexicalIndex over the full chain (with agent-registry tokens
included so agent-name hits can surface) and runs
search_in_positions restricted to the candidate positions. The
index is a classic inverted-file BM25 implementation
(src/search/lexical.rs) with two refinements:

- Porter stemming plus irregular-verb lemma expansion at tokenization time;
- a per-field document-frequency gate: an over-common term is suppressed in
content, tags, and concepts, but can still fire in agent_registry
(60%) or agent_id (70%), where repetition is inherent.
rank_candidates_semantically (src/lib.rs:6365) runs
only when a managed vector sidecar is configured — by default, local
ONNX inference via fastembed-minilm. The query text is embedded
once, cosine-compared against every candidate's stored vector, filtered by a
minimum cosine of 0.04, and the scores across providers are combined by
max weighted by sidecar freshness (0.5 if fresh, 0.3 if stale).
If no sidecar is active, this phase returns an empty map and the final score falls back to pure lexical.
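The scoring shape can be sketched with the constants quoted above (the 0.04 floor and the 0.5 fresh / 0.3 stale weights); the provider plumbing is simplified here to a slice of (cosine, is_fresh) pairs.

```rust
// Cosine similarity plus freshness-weighted max across providers.
// The 0.04 floor and 0.5/0.3 weights come from the post; the function
// shapes are an illustrative sketch, not MentisDB's signatures.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

fn semantic_score(provider_sims: &[(f32, bool)]) -> Option<f32> {
    provider_sims
        .iter()
        .copied()
        .filter(|(sim, _)| *sim >= 0.04)                          // minimum-cosine floor
        .map(|(sim, fresh)| sim * if fresh { 0.5 } else { 0.3 })  // freshness weight
        .fold(None, |best: Option<f32>, s| Some(best.map_or(s, |b| b.max(s)))) // max across providers
}
```

A stale sidecar can still win, but only if its raw cosine is high enough to beat a fresh provider at the lower weight.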
If request.graph is set and there are lexical hits,
expand_ranked_candidates (src/lib.rs:6475) picks
the top 20 lexical seeds, builds a ThoughtAdjacencyIndex from
the chain's refs and typed relations, and runs
bounded BFS from each seed.
Expansion is controlled by max_depth, max_visited,
and the traversal mode (Outgoing / Incoming
/ Bidirectional). Each reached thought records its depth, the
number of seed paths that reached it, and the relation kinds traversed.
Relation kinds carry different boosts — ContinuesFrom 0.60,
Corrects/Invalidates 0.50, Supersedes
0.45, down to References 0.06. Graph proximity contributes
1 / depth.
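The quoted boost values read naturally as a lookup added to the 1 / depth proximity term. The enum below is a stand-in covering a subset of the twelve relation kinds, and the exact way proximity and boost combine is an illustrative assumption, not the source's formula.

```rust
// Typed-edge boost values quoted in the post; the full table lives in the
// source. RelationKind is a hypothetical stand-in, not MentisDB's type.
#[derive(Clone, Copy)]
enum RelationKind { ContinuesFrom, Corrects, Invalidates, Supersedes, References }

fn relation_boost(kind: RelationKind) -> f32 {
    match kind {
        RelationKind::ContinuesFrom => 0.60,
        RelationKind::Corrects | RelationKind::Invalidates => 0.50,
        RelationKind::Supersedes => 0.45,
        RelationKind::References => 0.06,
    }
}

// One illustrative combination: proximity plus the best traversed boost.
fn graph_score(depth: u32, kinds: &[RelationKind]) -> f32 {
    let best = kinds.iter().map(|&k| relation_boost(k)).fold(0.0f32, f32::max);
    1.0 / depth as f32 + best
}
```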
For each surviving candidate, rank_search_hit
(src/lib.rs:6213) assembles one RankedSearchScore:
vector_contribution = vector * (1 + 35 * exp(-lexical / 3))   // smooth fusion
importance_boost    = lexical * (importance - 0.5) * 0.3      // if lexical > 0
                    = (importance - 0.5) * 0.1                // otherwise
confidence          = thought.confidence * 0.1
recency             = recency_score(thought)
total = lexical + vector_contribution + graph + relation + seed_support
      + importance_boost + confidence + recency
The vector fusion term is worth a second look. When lexical is zero
(pure-semantic match), vector is amplified by roughly 36×. By
lexical = 3 the boost has decayed to ~12×. By
lexical = 6 it is effectively additive. A smooth exponential
avoids the nasty rank discontinuities that tiered boost step-functions
introduce at bin boundaries.
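To make the decay concrete, here is the fusion term as a standalone function, using exactly the constants quoted above.

```rust
// The smooth fusion term: when lexical evidence is weak, the vector score
// is amplified; as lexical grows, the amplification decays exponentially
// rather than stepping between tiers, so ranks never jump at a boundary.
fn vector_contribution(vector: f32, lexical: f32) -> f32 {
    vector * (1.0 + 35.0 * (-lexical / 3.0).exp())
}
```

The multiplier falls monotonically: 36 at lexical = 0, about 13.9 at lexical = 3, about 5.7 at lexical = 6, and it approaches 1 as lexical keeps growing.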
The importance term is differential rather than flat: a higher-importance thought gets a boost proportional to its lexical score, so it tips close BM25 races without overriding strong lexical signal.
Lines 5047–5076 walk the hits and promote thoughts that
sit near a strong lexical seed, even if they themselves share no
query terms. The rule (current v0.8.9 values):
- seeds are hits with lexical ≥ 3.0;
- hits with lexical < 5.0 within 12 positions of any seed receive a boost of 1.2 · (1 − nearest / 12);
- hits with lexical ≥ 5.0 are skipped — they stand on their own.

This is the trick that surfaces evidence turns adjacent to the matching turn in LoCoMo-style benchmarks, and it's one of the three changes that moved LoCoMo-10P from 72.8% to 74.6% in v0.8.5.
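A sketch of the rule using the quoted thresholds; the seed-collection step and the in-place mutation are simplified here into a pure function over one hit.

```rust
// Session-cohesion boost sketch with the thresholds quoted in the post:
// neighbors with lexical < 5.0 within 12 positions of a seed get
// 1.2 * (1 - nearest / 12); strong hits (lexical >= 5.0) are skipped.
fn cohesion_boost(lexical: f32, pos: i64, seed_positions: &[i64]) -> f32 {
    if lexical >= 5.0 {
        return 0.0; // strong lexical hits stand on their own
    }
    let nearest = seed_positions
        .iter()
        .map(|s| (pos - s).abs())
        .min()
        .unwrap_or(i64::MAX);
    if nearest > 12 {
        return 0.0; // too far from every seed
    }
    1.2 * (1.0 - nearest as f32 / 12.0) // linear falloff with distance
}
```

A hit three positions from a seed gets 0.9; one twelve positions away gets nothing, so the boost cannot drag in distant, unrelated turns.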
If the query sets enable_reranking and rerank_k
(default 50), lines 5084–5147 do a second, rank-based
pass:
- build three rankings: one from lexical scores, one from vector scores, and one from graph + relation + seed_support — list G;
- take the top rerank_k of each;
- score every document by sum(1 / (60 + rank_i(d))) across the three lists (src/search/ranked.rs:28);
- recompute total with rrf + graph + relation + seed_support + importance + confidence + recency + session_cohesion.

RRF is robust when the absolute magnitudes of lexical and vector scores are not comparable: a document that places in the top-10 of both lists dominates a document that tops one list and is absent from the other. It is pure arithmetic — no LLM, no external call.
Finally, lines 5151–5174 sort by
total descending, with a long tiebreaker cascade over the
individual score components, importance, confidence,
and finally thought.index. The result is then truncated to
limit.
That determinism means two identical queries on the same chain snapshot return identical results in the same order. No hidden randomness, no “try again and hope.”
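The ordering can be sketched as a comparator: total descending, with the append-order index as the final tiebreak. The real cascade also compares the individual score components, importance, and confidence before falling back to the index; this sketch keeps only the two ends of that cascade.

```rust
// Deterministic sort sketch: total descending, then lower append-order
// index wins ties. The real tiebreaker cascade has more steps in between.
fn sort_hits(hits: &mut [(f32, usize)]) { // (total, thought index)
    hits.sort_by(|a, b| {
        b.0.partial_cmp(&a.0)
            .unwrap_or(std::cmp::Ordering::Equal)
            .then(a.1.cmp(&b.1)) // lower index wins ties
    });
}
```

Ending the cascade on thought.index guarantees a total order, which is what makes repeat queries byte-for-byte reproducible.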
Every returned hit carries a full RankedSearchScore plus the
matched query terms and the indexed fields that fired:
{
"score": {
"lexical": 2.91,
"vector": 0.27,
"graph": 0.18,
"relation": 0.05,
"seed_support": 0.00,
"importance": 0.00,
"confidence": 0.00,
"recency": 0.00,
"session_cohesion": 0.40,
"rrf": 0.00,
"total": 3.14
},
"matched_terms": ["latency", "ranking"],
"match_sources": ["content", "tags", "agent_registry"],
"graph_distance": 1,
"graph_path": [ ...locators and relation kinds... ]
}
This is the auditability story. If a result surprises you, the breakdown tells you exactly which signal fired — strong BM25? a graph-expanded neighbor? a cohesion boost from an adjacent turn? — with no guesswork.
| Phase | Function | Location |
|---|---|---|
| 1. Filter-first | MentisDb::query | src/lib.rs:4913 |
| 2. As-of filter | inline in query_ranked | src/lib.rs:4968 |
| 3a. Lexical BM25 | rank_candidates_lexically | src/lib.rs:6345 |
| 3a. Lexical index | LexicalIndex::search_in_positions | src/search/lexical.rs:500 |
| 3b. Vector | rank_candidates_semantically | src/lib.rs:6365 |
| 3c. Graph expand | expand_ranked_candidates | src/lib.rs:6475 |
| 3c. BFS | GraphExpansionResult::expand | src/search/expansion.rs:117 |
| 4. Per-hit fusion | rank_search_hit | src/lib.rs:6213 |
| 5. Session cohesion | inline | src/lib.rs:5042–5076 |
| 6. RRF rerank | inline + rrf_merge_three | src/lib.rs:5084, src/search/ranked.rs:28 |
| 7. Final sort | inline | src/lib.rs:5151–5174 |
You do not need a refinement loop. One call returns a
fully-scored, ordered list. For grouped context (seed plus its
graph neighbors clustered beneath it), call
mentisdb_context_bundles instead — also single-pass.
The knobs that shape the pipeline without changing its shape:
- graph — turns on phase 3c and adds a graph list to RRF;
- enable_reranking / rerank_k — turns on phase 6;
- as_of — enables phase 2 point-in-time semantics;
- min_confidence / min_importance / since / until — pre-filter during phase 1;
- thought_types, roles, agent_ids, tags_any, concepts_any, entity_type — the embedded ThoughtQuery used by phase 1.

Everything composes inside one function call. The agent reads the score breakdown, decides what to do with it, and moves on. That simplicity is the whole point.