April 14, 2026

MentisDB 0.9.1 — The 0.9.x Journey: From Competitive Analysis to 74% R@10

Four days ago we published a competitive analysis and discovered MentisDB was missing temporal facts, memory dedup, a CLI, webhooks, federated search, and an official Python client. Today, after 11 releases (0.8.2 → 0.9.1), all of those gaps are closed and we've run our first full 10-persona LoCoMo benchmark: 74.0% R@10 on 1,977 queries.

This post is the full story: what we found, what we shipped, how we compare to the competitive field today, and what the benchmark tells us about where to improve next.


The Starting Point: April 10 Competitive Analysis

On April 10, we published a detailed competitive analysis of six agentic memory systems. The conclusion was honest: MentisDB had unique strengths (Rust, embedded storage, cryptographic hash chain, no-LLM-required core) but was missing most of the features users expected — temporal facts, memory dedup, multi-level scopes, custom ontology, episode provenance, and an MCP server.

The competitive landscape we surveyed:

| System | Language | Storage | LLM Required | Local-First | Crypto Integrity | Hybrid Retrieval |
|---|---|---|---|---|---|---|
| MentisDB | Rust | Embedded (sled) | No (opt-in) | Yes | Hash chain | BM25+vec+graph |
| Mem0 | Python | External DB | Yes | Self-host option | No | vec+keyword |
| Graphiti/Zep | Python | External DB | Yes | Self-host only | No | semantic+kw+graph |
| Letta/MemGPT | Python/TS | External DB | Yes | Self-host option | No | No |
| Neo4j LLM KB | Python | Neo4j | Yes | No | No | Multi-mode |
| Cognee | Python | External DB | Yes | Partial | No | vec+graph |

The analysis identified six major gaps we needed to close before 1.0. We set a roadmap targeting 0.8.2 for temporal facts, dedup, and CLI; 0.9.0 for ecosystem features. We shipped all of it — and more.


What We Shipped: 0.8.2 → 0.9.1

In 11 releases, we closed every feature gap identified in the April 10 analysis:

| Feature | Version | Description |
|---|---|---|
| Temporal Facts | 0.8.2 | valid_at / invalid_at on thoughts; as_of query parameter for point-in-time retrieval |
| Memory Dedup | 0.8.2 | Jaccard similarity threshold on append; auto-Supersedes relation for near-duplicate thoughts |
| Multi-Level Scopes | 0.8.2 | MemoryScope enum (User, Session, Agent) on thoughts; scoped search filters |
| CLI Tool | 0.8.2 | mentisdb CLI: add, search, list, agents, chain subcommands |
| Reciprocal Rank Fusion | 0.8.6 | RRF reranking merges BM25, vector, and graph signals; --reranking flag on benchmark |
| Memory Branching | 0.8.6 | BranchesFrom relation; POST /v1/chains/branch creates divergent chains from any checkpoint |
| Per-Field BM25 DF Cutoffs | 0.8.6 | Document-frequency-based field weighting improves precision on high-signal fields |
| Custom Ontology | 0.8.7 | entity_type field on thoughts; per-chain entity type registry; schema validation at API layer |
| Episode Provenance | 0.8.8 | source_episode field; DerivedFrom relation kind; full lineage from derived fact to source |
| LLM Reranking | 0.8.8 | Optional cross-encoder reranking on candidate lists; pluggable reranker interface |
| Federated Cross-Chain Search | 0.9.1 | ancestor_chain_keys() walks BranchesFrom; ranked search transparently queries ancestors |
| Webhooks | 0.9.1 | HTTP POST callbacks on thought append; mentisdb_register_webhook MCP tool + REST endpoint |
| Opt-in LLM Extraction | 0.9.1 | GPT-4o (or any OpenAI-compatible endpoint) extracts structured ThoughtInput records from raw text; review-before-append workflow |
| Python Client (pymentisdb) | 0.9.1 | Full MentisDbClient on PyPI; LangChain MentisDbMemory; typed enums and relations |
| Wizard Brew-First Setup | 0.9.1 | Interactive setup wizard detects Homebrew mcp-remote and writes correct Claude Desktop config automatically |
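The RRF reranking shipped in 0.8.6 merges ranked lists from the BM25, vector, and graph signals by summed reciprocal rank. A minimal sketch of the idea — the function name, toy data, and the k=60 smoothing constant are illustrative, not MentisDB internals:

```python
def rrf_merge(ranked_lists, k=60):
    """Reciprocal Rank Fusion: score each id by the sum of
    1 / (k + rank) over every ranked list it appears in."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

bm25   = ["t3", "t1", "t7"]   # top hits from the BM25 signal
vector = ["t1", "t3", "t9"]   # top hits from the vector signal
graph  = ["t1", "t4"]         # top hits from graph traversal

fused = rrf_merge([bm25, vector, graph])
# "t1" fuses to the top: it appears near the head of all three lists
```

The appeal of RRF is that it needs no score normalization across heterogeneous signals — only ranks — which is why it works well for merging BM25, cosine similarity, and graph-distance orderings.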

LoCoMo Benchmark Results: 74.0% R@10

We ran the full LoCoMo 10-persona benchmark against MentisDB 0.9.1: 1,977 queries across 10 personas (~197 queries each), over conversations of up to 300 turns, ingested into a single chain with ContinuesFrom relations.

LoCoMo 10-Persona Results (1,977 queries)

R@10: 74.0%  |  R@20: 80.8%  |  R@50: 88.5%

Single-hop: 78.0%  |  Multi-hop: 59.1%

Evaluation: 94 seconds  |  20.9 queries/second

By Question Type

| Type | R@10 | R@20 | R@50 | Correct / Total |
|---|---|---|---|---|
| Single-hop | 78.0% | 84.0% | 90.2% | 1,212 / 1,554 |
| Multi-hop | 59.1% | 69.0% | 82.0% | 250 / 423 |
| Overall | 74.0% | 80.8% | 88.5% | 1,462 / 1,977 |
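For reference, R@k here means the fraction of queries whose gold evidence appears in the top k retrieved results. A minimal scorer under that definition (toy data; not the benchmark harness itself):

```python
def recall_at_k(results, gold, k):
    """Fraction of queries whose gold evidence id appears
    among the first k retrieved ids."""
    hits = sum(1 for q in gold if gold[q] in results[q][:k])
    return hits / len(gold)

results = {"q1": ["t4", "t9", "t2"], "q2": ["t7", "t1"]}  # ranked ids per query
gold    = {"q1": "t9", "q2": "t5"}                        # gold evidence per query

print(recall_at_k(results, gold, 2))  # q1 hits at rank 2, q2 misses -> 0.5
```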

Near-Miss Analysis

Of the 515 missed queries, 44.3% do not appear in the top-50 results at all. That is a coverage gap (the evidence never enters the candidate set), not a ranking problem:

Sample Misses (Single-Hop)

Missed queries consistently show high lexical scores but near-zero vector scores. The lexical matcher finds related content using surface-term overlap, but vector similarity fails to connect semantically related content with different wording:

Q: what did caroline research?
Evidence: "researching adoption agencies — it's been a dream to have a family..."
Retrieved: "that's great news. what did you do?"  (lexical=9.9, vector=0.0)

Q: when did caroline have a picnic?
Evidence: "...picnic last week...talked about my transition journey..."
Retrieved: "sounds good, when did you have in mind?"  (lexical=9.7, vector=0.0)

This is the core improvement opportunity: closing the multi-hop gap and improving semantic matching when vocabulary diverges between query and memory.
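One low-cost way to attack this vocabulary divergence, referenced later under "add query expansion," is to expand query terms before lexical scoring. The synonym table and the substring-overlap scorer below are illustrative, not MentisDB internals:

```python
# Hypothetical synonym table; in practice this could come from a
# lexical resource or an LLM suggestion pass.
SYNONYMS = {
    "research": {"researching", "studying"},
    "picnic": {"outing"},
}

def expand(query_terms):
    """Union the query terms with their known synonyms."""
    expanded = set(query_terms)
    for term in query_terms:
        expanded |= SYNONYMS.get(term, set())
    return expanded

def overlap_score(query_terms, memory_text):
    """Count expanded terms that occur in the memory text."""
    text = memory_text.lower()
    return sum(1 for t in expand(query_terms) if t in text)

memory = "researching adoption agencies, a dream to have a family"
print(overlap_score({"caroline", "research"}, memory))
```

Expansion lets the lexical signal reach evidence like "researching" from the query term "research" even when the vector signal contributes nothing.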


The Competitive Field Today

Since April 10, three significant new entrants have emerged that change the landscape:

Hindsight (vectorize-io/hindsight)

9.2k GitHub stars. Server mode with external PostgreSQL, or embedded Python mode. LLM required for core operations (retain/recall/reflect). Claims SOTA on LongMemEval with independently verified scores from Virginia Tech. Four-signal retrieval (semantic, keyword, graph, temporal) merged via RRF + cross-encoder reranking. Unique: Mental Models — reflected higher-order understanding generated by the LLM.

Cognee (topoteretes/cognee)

Crossed 15k GitHub stars and shipped v1.0.0 on April 11. Native Hermes Agent integration as memory provider, MCP client package, and Cognee Cloud managed service. Still requires external databases and an LLM for the cognify pipeline.

LangMem (langchain-ai/langmem)

1.4k stars. Memory primitives library integrated natively into LangGraph Platform. Pluggable storage backend (in-memory → Postgres). Being the default memory layer in LangGraph Platform deployments gives it a massive distribution advantage. LLM required; no graph traversal or temporal facts.

Updated Feature Comparison (April 14)

| Feature | MentisDB | Hindsight | Cognee | LangMem | Mem0 | Graphiti |
|---|---|---|---|---|---|---|
| Language | Rust | Python | Python | Python | Python | Python |
| Storage | Embedded (sled) | External (PG) | External | External | External DB | External DB |
| LLM Required | No (opt-in) | Yes | Yes | Yes | Yes | Yes |
| Local-First | Yes | No | No | No | Partial | No |
| Crypto Integrity | Hash chain | No | No | No | No | No |
| Hybrid Retrieval | BM25+vec+graph | 4-signal RRF | vec+graph | vec only | vec+keyword | sem+kw+graph |
| MCP Server | Built-in | No | MCP client | No | No | Yes |
| Agent Registry | Yes | No | No | No | No | No |
| Federated Search | Cross-chain | No | No | No | No | No |
| Skills/Extensions | Skill registry | No | No | No | No | No |
| Webhooks | Yes | No | No | No | No | No |
| Temporal Facts | 0.8.2 | Via metadata | No | No | Updates | valid_at |
| Memory Dedup | 0.8.2 | No | Merge | No | Yes | Merge |
| Benchmark R@10 | 74.0% | SOTA (indep.) | N/A | N/A | N/A | N/A |

What Makes MentisDB Different Today

The combination of properties that no competitor shares has grown since April 10:

  1. Rust + embedded storage + no-LLM-required + cryptographic hash chain. Still unique. Every competitor requires external databases and/or an LLM for core operations.
  2. Federated cross-chain search (0.9.1). Walk BranchesFrom relations to query parent chains from a branch. No competitor has this.
  3. Skill registry with versioning, deprecation, and revocation (0.9.x). Signed thought support. No competitor has anything comparable.
  4. Webhooks (0.9.1). HTTP POST callbacks on thought append. Fire-and-forget delivery via tokio spawn. No competitor has this.
  5. Opt-in LLM extraction (0.9.1). Keep the no-LLM core for pure structural retrieval; add LLM-powered extraction only when needed. All competitors require LLM for core.
  6. pymentisdb Python client (0.9.1). Full MentisDbClient with LangChain integration, complete enum coverage, typed relations. Enables Python ecosystem adoption.
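The federated search in point 2 is easiest to picture as a parent-pointer walk: each branched chain records the chain it branched from, and ranked search unions results over the resulting ancestor set. A toy sketch of that walk (data shapes are illustrative, not MentisDB's internal representation):

```python
def ancestor_chain_keys(chain_key, branches_from):
    """Walk BranchesFrom links from a chain up to its root,
    returning the chain itself plus every ancestor in order."""
    keys = [chain_key]
    while chain_key in branches_from:
        chain_key = branches_from[chain_key]
        keys.append(chain_key)
    return keys

# child -> parent map derived from BranchesFrom relations (toy data)
branches_from = {"experiment-b": "experiment-a", "experiment-a": "main"}

print(ancestor_chain_keys("experiment-b", branches_from))
# ['experiment-b', 'experiment-a', 'main']
```

Searching from "experiment-b" then transparently covers "experiment-a" and "main" as well, which is what makes divergent chains usable without duplicating their shared history.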

What We're Still Missing

Honest gaps — what we'd need to win in specific competitive situations:

| Gap | Impact | Path to Close |
|---|---|---|
| Academic benchmark verification | Hindsight's scores are independently verified by Virginia Tech; ours are self-reported | Partner with an academic group (Sanghani Center at VT or similar) |
| Native LangChain store | LangMem is the default in LangGraph Platform; massive distribution advantage | Build langchain-mentisdb pip package with BaseStore implementation |
| Multi-hop recall (59.1% R@10) | Multi-hop is 19pp behind single-hop; biggest single improvement opportunity | Improve graph traversal depth, add entity coreference, expand ContinuesFrom chains |
| Vector sidecar contribution | Near-zero vector scores on misses; semantic layer not firing in many cases | Debug FastEmbed loading; improve embedding coverage; add query expansion |
| Managed cloud service | Mem0, Cognee, Hindsight, Fast.io all offer hosted versions | MentisDB Cloud: managed service with zero infrastructure setup |
| Memory consolidation tiers | agentmemory uses Ebbinghaus decay; Hindsight uses Mental Models reflection | Implement automatic memory consolidation (working → episodic → semantic) |
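On the consolidation gap: the Ebbinghaus model referenced above scores retention as exponential decay, R = exp(-t/S), where stability S grows with reinforcement. A sketch of how tiered consolidation could use it — the thresholds and tier names are hypothetical, not a shipped MentisDB feature:

```python
import math

def retention(hours_since_access, stability):
    """Ebbinghaus forgetting curve: R = exp(-t / S)."""
    return math.exp(-hours_since_access / stability)

def tier(hours_since_access, stability):
    """Hypothetical promotion rule: recently reinforced memories
    (high retention) stay in working memory; stale ones consolidate
    down through episodic into semantic storage."""
    r = retention(hours_since_access, stability)
    if r > 0.5:
        return "working"
    if r > 0.1:
        return "episodic"
    return "semantic"

print(tier(1, 24))    # accessed an hour ago -> still working memory
print(tier(240, 24))  # ten days stale, low stability -> consolidated
```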

What's Next

The 0.9.x release establishes the feature foundation. The next push is on three fronts:

  1. Benchmark quality: Multi-hop is the primary gap (59.1% vs 78.0%). Improving ContinuesFrom chain density, adding entity coreference resolution, and fixing the vector sidecar contribution should close most of the 19pp gap.
  2. Ecosystem distribution: Native LangChain store, LlamaIndex connector, and explicit Claude Code / Cursor plugins. The skill registry gives us a hook for agent-specific memory capture.
  3. Academic credibility: Independent benchmark verification would put us on equal footing with Hindsight's independently verified claims.

See the ROADMAP.md for the full 1.0 pipeline.


Upgrade Instructions

Server and CLI:

cargo install mentisdb --force

Or download the binary from GitHub Releases.

Python client:

pip install pymentisdb --upgrade