MentisDB 0.8.0 is the biggest search quality release we've ever shipped. We took our LongMemEval score from 57.2% to 65.0% recall, hardened the security model, made writes 13.8% faster, and cleaned up the skill file that agents actually read at startup. Here's what changed and why.
The headline number: 65.0% R@5 on LongMemEval — the standard benchmark for long-term memory retrieval. That means two out of three times, the correct memory surfaces in the first five results your agent sees. We started this release at 57.2%.
Three changes got us there. None of them required changing the storage format, re-indexing manually, or breaking the API.
The single biggest improvement was stemming. Our lexical tokenizer now runs every token through the Porter stemming algorithm before both indexing and querying. "prefers", "preferred", and "preferences" all map to "prefer" — so a query for "food preferences" now matches a memory that says "I prefer Thai cuisine."
This one change took overall R@5 from 57.2% to 61.6%. Temporal-reasoning queries improved by 9.0 points and user-fact queries by 8.5 points — categories where queries often use different word forms than the original evidence.
Stemming is automatic. Your existing chains rebuild their lexical index on first access after upgrading. No configuration needed.
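To see why word-form normalization closes the gap, here is a deliberately simplified suffix-stripping sketch — not the real Porter algorithm, which has five rule phases with measure-based conditions — that maps all three forms above to the same stem:

```rust
// Simplified sketch of suffix stripping; the real Porter stemmer is far
// more careful. Shown only to illustrate why "food preferences" can now
// match "I prefer Thai cuisine".
fn stem(token: &str) -> String {
    let mut t = token.to_lowercase();
    // Try longer suffixes first so "-ences" wins over "-s".
    for suffix in ["ences", "ence", "ed", "ing", "s"] {
        if let Some(stripped) = t.strip_suffix(suffix) {
            // Crude guard against over-stripping short stems.
            if stripped.len() >= 4 {
                t = stripped.to_string();
                break;
            }
        }
    }
    // Collapse a doubled final consonant left behind by "-ed"/"-ing"
    // removal ("preferred" -> "preferr" -> "prefer").
    let collapse = {
        let b = t.as_bytes();
        b.len() >= 2
            && b[b.len() - 1] == b[b.len() - 2]
            && !b"aeiou".contains(&b[b.len() - 1])
    };
    if collapse {
        t.pop();
    }
    t
}

fn main() {
    for word in ["prefers", "preferred", "preferences"] {
        println!("{} -> {}", word, stem(word));
    }
}
```

Because the same function runs at index time and query time, both sides of the match land on the identical stem.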
The old scoring just added BM25 and vector scores together. The problem: vector scores range from 0–0.35 while BM25 goes from 0–30+. Semantic-only matches — thoughts with zero word overlap but strong meaning similarity — never surfaced because their vector signal was drowned out.
We replaced flat addition with a tiered boost:
| Lexical Match | Vector Treatment |
|---|---|
| No match at all | 60× boost — semantic hits compete with BM25 |
| Weak match (< 1.0) | 20× ramp — partial credit for both signals |
| Strong match | Additive — vector nudges, doesn't disrupt |
We also tried Reciprocal Rank Fusion (RRF), the standard hybrid search approach. It hurt. RRF demoted strong BM25 hits by giving vector matches equal weight. For memory retrieval where keyword search is already strong, re-ranking by vector similarity makes things worse, not better. Tiered boost preserves what works and only uses vectors to promote what BM25 can't find.
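The tiered boost can be sketched as a small scoring function. The multipliers (60×, 20×, additive) come from the table above; the exact thresholds and ramp shape are assumptions for illustration:

```rust
// Sketch of tiered hybrid scoring. Multipliers are from the release
// notes; the precise threshold/ramp details are assumed.
fn hybrid_score(bm25: f64, vector: f64) -> f64 {
    if bm25 == 0.0 {
        // No lexical match: boost the vector signal (0-0.35 range) so
        // semantic-only hits can compete with BM25 scores (0-30+).
        vector * 60.0
    } else if bm25 < 1.0 {
        // Weak lexical match: partial credit for both signals.
        bm25 + vector * 20.0
    } else {
        // Strong lexical match: the vector score only nudges the ranking.
        bm25 + vector
    }
}

fn main() {
    println!("semantic-only : {}", hybrid_score(0.0, 0.3)); // boosted to 18.0
    println!("weak lexical  : {}", hybrid_score(0.5, 0.3)); // 0.5 + 6.0
    println!("strong lexical: {}", hybrid_score(5.0, 0.3)); // 5.0 + 0.3
}
```

Unlike RRF, a strong BM25 score is never demoted here: the vector term can only add to it.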
User-originated thoughts carry importance ≈ 0.8; verbose assistant responses carry importance ≈ 0.2. Previously, the importance weight in scoring was 0.2× — essentially noise. We raised it to 3.0×.
Now user thoughts get a +2.4 boost versus +0.6 for assistant thoughts. When BM25 scores are close (as they often are for preference and factual queries), the importance signal tips the race toward the memory the user actually said, not the assistant's paraphrase.
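Concretely: the 3.0× multiplier and the 0.8 / 0.2 importance values are from this release; the additive score shape around them is an assumption for illustration:

```rust
// Importance term sketch. The 3.0x multiplier is from the release notes;
// the surrounding additive score shape is an assumption.
fn final_score(hybrid: f64, importance: f64) -> f64 {
    hybrid + importance * 3.0
}

fn main() {
    // Two memories with near-identical base scores: the user's own
    // statement (importance 0.8) vs the assistant's paraphrase (0.2).
    let user = final_score(10.0, 0.8);      // 10.0 + 2.4
    let assistant = final_score(10.2, 0.2); // 10.2 + 0.6
    assert!(user > assistant); // the user's own memory wins the tie
    println!("user = {}, assistant = {}", user, assistant);
}
```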
Append latency dropped 13.8% (statistically significant, p=0.01) from three hot-path improvements:
A new `MENTISDB_GROUP_COMMIT_MS` environment variable (default: 2 ms) lets you tune the write-batching window. Set it to 0 for the lowest latency, or higher for bulk-load throughput.
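For example (the variable name is from this release; the bare `mentisdb` invocation below is a placeholder for however you normally launch the daemon):

```shell
# MENTISDB_GROUP_COMMIT_MS is from the release notes; the launch command
# is a placeholder for your usual daemon invocation.
MENTISDB_GROUP_COMMIT_MS=0 mentisdb    # lowest per-append latency
MENTISDB_GROUP_COMMIT_MS=10 mentisdb   # wider batching window for bulk loads
```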
The background writer also got double the queue capacity (128 slots) and pre-allocated buffers, reducing backpressure stalls under concurrent multi-agent writes.
The `local-embeddings` feature flag now uses FastEmbed `all-MiniLM-L6-v2` (384 dimensions, ONNX runtime) instead of the previous `local-text-v1` provider. It runs entirely on your machine, with no cloud dependencies and no GPU required.
The daemon auto-detects and registers FastEmbed for search when the feature is compiled in. Vector sidecars rebuild automatically on first access after upgrading.
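If you install from source, the flag can presumably be enabled at build time through cargo's standard feature mechanism (the exact invocation is an assumption; the feature name comes from this release):

```shell
# Feature name from the release notes; --features is standard cargo usage.
cargo install mentisdb --features local-embeddings
```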
Several changes close attack surfaces without affecting normal usage:
`read_skill()` now returns `SkillReadOutput { content, warnings, status }` instead of a plain string. Callers can no longer silently ignore skill revocation status or safety warnings.
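A hypothetical caller-side sketch of what the structured return forces you to handle (the field names match the release notes; the `SkillStatus` variants and helper are assumptions):

```rust
// Hypothetical mirror of the new return type; everything beyond the
// field names shown in the release notes is an assumption.
struct SkillReadOutput {
    content: String,
    warnings: Vec<String>,
    status: SkillStatus,
}

#[derive(PartialEq)]
enum SkillStatus {
    Active,
    Revoked,
}

// The caller must now decide explicitly what to do with status and
// warnings instead of receiving a bare string.
fn usable_content(out: &SkillReadOutput) -> Option<&str> {
    for w in &out.warnings {
        eprintln!("skill warning: {}", w);
    }
    if out.status == SkillStatus::Revoked {
        return None; // revoked skills must never reach the agent
    }
    Some(&out.content)
}

fn main() {
    let revoked = SkillReadOutput {
        content: "...".into(),
        warnings: vec!["deprecated".into()],
        status: SkillStatus::Revoked,
    };
    assert!(usable_content(&revoked).is_none());
    println!("revoked skill correctly rejected");
}
```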
The dashboard login now compares secrets with `subtle::ConstantTimeEq` to prevent timing attacks.
The `MENTISDB_SKILL.md` — the operating instructions agents read at startup — went from 68,270 characters across 86 sections down to ~8,000 characters.
Why? Because agents were skimming it. When you hand an LLM a 68K document and tell it to follow the rules inside, it reads the first few sections and misses the operational constraints buried at line 280.
The rewrite:
The new `Goal` thought type captures high-level objectives — broader than `Plan` (which describes how) and `Subgoal` (which is a component of a goal). Use it to record what the agent is trying to achieve, so future sessions can orient quickly even if the plan details have changed.
That makes 30 semantic thought types total. The new variant is appended at the end of the enum (bincode-safe — no reindexing needed).
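Why appending is bincode-safe can be sketched in miniature: bincode's default encoding identifies a fieldless enum variant by its index, so adding a variant at the end leaves every existing index intact (variant names other than those mentioned above are illustrative, and the real enum has 30 variants):

```rust
// Miniature sketch: appending a variant preserves all existing encoded
// indices, so previously serialized data still decodes. Only Goal, Plan,
// and Subgoal are real names from the release notes.
enum ThoughtType {
    Observation = 0, // existing variants keep their indices
    Plan = 1,
    Subgoal = 2,
    // ... the other existing variants sit here unchanged ...
    Goal = 29, // appended last: no reindexing needed
}

fn main() {
    println!("Goal encodes as index {}", ThoughtType::Goal as u32);
}
```

Inserting the variant anywhere earlier would shift every later index and corrupt decoding of existing chains, which is exactly what appending avoids.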
A new REST endpoint `POST /v1/chains/merge` and MCP tool `mentisdb_merge_chains` let you merge all thoughts from a source chain into a target chain, then permanently delete the source. Agent identities are remapped automatically by similarity matching.
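A hypothetical call shape for the REST side (the endpoint path is from this release; the base URL and JSON field names are assumptions):

```shell
# Endpoint path is from the release notes; base URL and field names are
# illustrative assumptions.
curl -X POST "$MENTISDB_URL/v1/chains/merge" \
  -H 'Content-Type: application/json' \
  -d '{"source_chain": "chain-a", "target_chain": "chain-b"}'
```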
The `mcp-remote` bridge now uses the explicit `node` path as the command, bypassing shebang resolution issues on systems with multiple Node versions. Node ≥ 20 is required and validated at setup time.
Internal cleanup consolidated shared code into `PlatformPaths`, a `HasOptionalQueryFields` trait, and a `read_length_prefixed_thoughts` helper, eliminating ~260 lines of duplication.
```shell
cargo install mentisdb
```

Or from source:

```shell
git pull
cargo install --path . --locked
```
Existing chains, vector sidecars, and skill registries are migrated automatically on first startup. No manual steps required.
Single-session-preference queries (13.3% R@5) are the clearest target. These require bridging the semantic gap between implicit evidence ("I've been really into Thai cuisine lately") and preference queries ("What kind of food do I like?"). Better embeddings or a reranking step are the most likely path forward.
Multi-session recall (59.4%) also has room to grow. Graph expansion currently contributes 0.0 on misses — the traversal finds related thoughts but not the specific evidence buried in long conversations.
MentisDB is an open-source durable memory layer for AI agents. It stores memories in an append-only hash-chained log, retrieves them with hybrid lexical+semantic+graph search, and runs entirely locally with no cloud dependencies. GitHub · Docs · Website