Four days ago we published a competitive analysis and discovered MentisDB was missing temporal facts, memory dedup, a CLI, webhooks, federated search, and an official Python client. Today, after 11 releases (0.8.2 → 0.9.1), all of those gaps are closed and we've run our first full 10-persona LoCoMo benchmark: 74.0% R@10 on 1,977 queries.
This post is the full story: what we found, what we shipped, how we compare to the competitive field today, and what the benchmark tells us about where to improve next.
On April 10, we published a detailed competitive analysis of six agentic memory systems. The conclusion was honest: MentisDB had unique strengths (Rust, embedded storage, cryptographic hash chain, no-LLM-required core) but was missing most of the features users expected — temporal facts, memory dedup, multi-level scopes, custom ontology, episode provenance, and an MCP server.
The competitive landscape we surveyed:
| System | Language | Storage | LLM Required | Local-First | Crypto Integrity | Hybrid Retrieval |
|---|---|---|---|---|---|---|
| MentisDB | Rust | Embedded (sled) | No (opt-in) | Yes | Hash chain | BM25+vec+graph |
| Mem0 | Python | External DB | Yes | Self-host option | No | vec+keyword |
| Graphiti/Zep | Python | External DB | Yes | Self-host only | No | semantic+kw+graph |
| Letta/MemGPT | Python/TS | External DB | Yes | Self-host option | No | No |
| Neo4j LLM KB | Python | Neo4j | Yes | No | No | Multi-mode |
| Cognee | Python | External DB | Yes | Partial | No | vec+graph |
The analysis identified six major gaps we needed to close before 1.0. We set a roadmap targeting 0.8.2 for temporal facts, dedup, and CLI; 0.9.0 for ecosystem features. We shipped all of it — and more.
In 11 releases, we closed every feature gap identified in the April 10 analysis:
| Feature | Version | Description |
|---|---|---|
| Temporal Facts | 0.8.2 | valid_at / invalid_at on thoughts; as_of query parameter for point-in-time retrieval |
| Memory Dedup | 0.8.2 | Jaccard similarity threshold on append; auto-Supersedes relation for near-duplicate thoughts |
| Multi-Level Scopes | 0.8.2 | MemoryScope enum (User, Session, Agent) on thoughts; scoped search filters |
| CLI Tool | 0.8.2 | mentisdb CLI: add, search, list, agents, chain subcommands |
| Reciprocal Rank Fusion | 0.8.6 | RRF reranking merges BM25, vector, and graph signals; --reranking flag on benchmark |
| Memory Branching | 0.8.6 | BranchesFrom relation; POST /v1/chains/branch creates divergent chains from any checkpoint |
| Per-Field BM25 DF Cutoffs | 0.8.6 | Document-frequency-based field weighting improves precision on high-signal fields |
| Custom Ontology | 0.8.7 | entity_type field on thoughts; per-chain entity type registry; schema validation at API layer |
| Episode Provenance | 0.8.8 | source_episode field; DerivedFrom relation kind; full lineage from derived fact to source |
| LLM Reranking | 0.8.8 | Optional cross-encoder reranking on candidate lists; pluggable reranker interface |
| Federated Cross-Chain Search | 0.9.1 | ancestor_chain_keys() walks BranchesFrom; ranked search transparently queries ancestors |
| Webhooks | 0.9.1 | HTTP POST callbacks on thought append; mentisdb_register_webhook MCP tool + REST endpoint |
| Opt-in LLM Extraction | 0.9.1 | GPT-4o (or any OpenAI-compatible endpoint) extracts structured ThoughtInput records from raw text; review-before-append workflow |
| Python Client (pymentisdb) | 0.9.1 | Full MentisDbClient on PyPI; LangChain MentisDbMemory; typed enums and relations |
| Wizard Brew-First Setup | 0.9.1 | Interactive setup wizard detects Homebrew mcp-remote and writes correct Claude Desktop config automatically |
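Reciprocal Rank Fusion (shipped in 0.8.6) is worth unpacking, since it is the glue between the three retrieval signals. The sketch below is a minimal, generic RRF implementation, not MentisDB's actual code; the list names and `k=60` constant are illustrative, with each document scored by the sum of `1 / (k + rank)` across the ranked lists it appears in:

```python
def rrf_merge(ranked_lists, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank).

    Documents that rank well in several lists accumulate the highest
    fused score, even if no single list puts them first.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score, highest first.
    return sorted(scores, key=scores.get, reverse=True)

# Illustrative ranked candidate lists from three signals.
bm25   = ["t3", "t1", "t7"]
vector = ["t1", "t9", "t3"]
graph  = ["t7", "t1"]
merged = rrf_merge([bm25, vector, graph])
```

Here `t1` wins the fused ranking because it appears near the top of all three lists, which is exactly the behavior that makes RRF a robust way to merge BM25, vector, and graph candidates without tuning per-signal weights.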
We ran the full LoCoMo 10-persona benchmark against MentisDB 0.9.1: 1,977 queries across 10 personas (~197 per persona), with conversations of up to 300 turns each, ingested into a single chain with ContinuesFrom relations.
**LoCoMo 10-Persona Results (1,977 queries)**

- R@10: 74.0% · R@20: 80.8% · R@50: 88.5%
- Single-hop R@10: 78.0% · Multi-hop R@10: 59.1%
- Evaluation time: 94 seconds (20.9 queries/second)
| Type | R@10 | R@20 | R@50 | Correct / Total |
|---|---|---|---|---|
| Single-hop | 78.0% | 84.0% | 90.2% | 1,212 / 1,554 |
| Multi-hop | 59.1% | 69.0% | 82.0% | 250 / 423 |
| Overall | 74.0% | 80.8% | 88.5% | 1,462 / 1,977 |
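To make the R@K numbers concrete, here is the standard way this metric is scored, which we assume here: a query counts as a hit at K if any gold evidence item appears in the top-K retrieved results. This is a generic sketch, not the benchmark harness itself:

```python
def recall_at_k(results, k):
    """results: list of (retrieved_ids, gold_ids) pairs per query.

    A query scores 1 if any gold evidence id appears among the
    top-k retrieved ids, 0 otherwise; R@K is the mean over queries.
    """
    hits = sum(
        1 for retrieved, gold in results
        if set(retrieved[:k]) & set(gold)
    )
    return hits / len(results)

# Two illustrative queries: one hit at rank 3, one complete miss.
queries = [
    (["m1", "m2", "m3"], {"m3"}),
    (["m4", "m5", "m6"], {"m9"}),
]
```

Under this definition, the first query is a hit for any K ≥ 3 and the second never hits, so R@10 for this toy set is 0.5.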
Of the 515 missed queries, 44.3% do not appear anywhere in the top-50 results. That is a coverage gap, not a ranking problem. Missed queries consistently show high lexical scores but near-zero vector scores: the lexical matcher surfaces related content through surface-term overlap, but the vector side fails to connect semantically related content when the wording diverges:
```
Q: what did caroline research?
Evidence:  "researching adoption agencies — it's been a dream to have a family..."
Retrieved: "that's great news. what did you do?"       (lexical=9.9, vector=0.0)

Q: when did caroline have a picnic?
Evidence:  "...picnic last week...talked about my transition journey..."
Retrieved: "sounds good, when did you have in mind?"   (lexical=9.7, vector=0.0)
```
This is the core improvement opportunity: closing the multi-hop gap and improving semantic matching when vocabulary diverges between query and memory.
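One concrete lever on the multi-hop side is deeper graph expansion: starting from the initial lexical/vector hits and pulling in thoughts connected by relations before reranking. The sketch below is an illustrative breadth-first expansion over a toy relation graph; the relation names come from the release notes above, but the function and data shapes are assumptions, not MentisDB internals:

```python
from collections import deque

def expand_neighborhood(seed_ids, edges, max_depth=2):
    """Breadth-first expansion over relation edges (e.g. ContinuesFrom,
    DerivedFrom) to pull multi-hop context into the candidate set.

    Relations are treated as undirected here, favoring recall."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, set()).add(dst)
        adjacency.setdefault(dst, set()).add(src)

    seen = set(seed_ids)
    frontier = deque((s, 0) for s in seed_ids)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # stop expanding past the depth budget
        for nbr in adjacency.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return seen

# Toy chain: t1 -> t2 -> t3 -> t4 via ContinuesFrom.
edges = [("t1", "t2"), ("t2", "t3"), ("t3", "t4")]
```

With `max_depth=2`, a seed hit on `t1` also pulls in `t2` and `t3`; raising the depth budget reaches `t4`. The trade-off is candidate-set growth versus precision, which is why expansion depth is a tuning knob rather than a free win.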
Since April 10, three significant new entrants have emerged that change the landscape:
**Hindsight** (vectorize-io/hindsight): 9.2k GitHub stars. Server mode with external PostgreSQL, or embedded Python mode. LLM required for core operations (retain/recall/reflect). Claims SOTA on LongMemEval, with scores independently verified by Virginia Tech. Four-signal retrieval (semantic, keyword, graph, temporal) merged via RRF plus cross-encoder reranking. Unique feature: Mental Models, reflected higher-order understanding generated by the LLM.
**Cognee** (topoteretes/cognee): crossed 15k GitHub stars and shipped v1.0.0 on April 11. Native Hermes Agent integration as a memory provider, an MCP client package, and the Cognee Cloud managed service. Still requires external databases and an LLM for the cognify pipeline.
**LangMem** (langchain-ai/langmem): 1.4k stars. A memory-primitives library integrated natively into LangGraph Platform, with a pluggable storage backend (in-memory to Postgres). Being the default in LangGraph Platform deployments gives it a massive distribution advantage. LLM required; no graph traversal or temporal facts.
| Feature | MentisDB | Hindsight | Cognee | LangMem | Mem0 | Graphiti |
|---|---|---|---|---|---|---|
| Language | Rust | Python | Python | Python | Python | Python |
| Storage | Embedded (sled) | External (PG) | External | External | External DB | External DB |
| LLM Required | No (opt-in) | Yes | Yes | Yes | Yes | Yes |
| Local-First | Yes | No | No | No | Partial | No |
| Crypto Integrity | Hash chain | No | No | No | No | No |
| Hybrid Retrieval | BM25+vec+graph | 4-signal RRF | vec+graph | vec only | vec+keyword | sem+kw+graph |
| MCP Server | Built-in | No | MCP client | No | No | Yes |
| Agent Registry | Yes | No | No | No | No | No |
| Federated Search | Cross-chain | No | No | No | No | No |
| Skills/Extensions | Skill registry | No | No | No | No | No |
| Webhooks | Yes | No | No | No | No | No |
| Temporal Facts | Yes (0.8.2) | Via metadata | No | No | Updates | valid_at |
| Memory Dedup | Yes (0.8.2) | No | Merge | No | Yes | Merge |
| Benchmark R@10 | 74.0% | SOTA (indep.) | N/A | N/A | N/A | N/A |
The combination of properties that no competitor shares has grown since April 10. The newest addition is federated cross-chain search: BranchesFrom relations let a branch transparently query its parent chains. No competitor has this.
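The ancestor walk behind federated search is simple to picture. The sketch below borrows the `ancestor_chain_keys()` name from the 0.9.1 notes, but the body and data structure (a child-to-parent map derived from BranchesFrom relations) are illustrative assumptions, not MentisDB's implementation:

```python
def ancestor_chain_keys(chain_key, branches_from):
    """Walk BranchesFrom parent links from a branch back to the root,
    returning every ancestor chain key, nearest ancestor first.

    branches_from maps a chain key to its parent chain key (or absent
    for root chains). A seen-set guards against accidental cycles."""
    ancestors = []
    seen = {chain_key}
    current = branches_from.get(chain_key)
    while current is not None and current not in seen:
        ancestors.append(current)
        seen.add(current)
        current = branches_from.get(current)
    return ancestors

# Toy branch topology: main -> experiment-a -> experiment-b.
parents = {"experiment-b": "experiment-a", "experiment-a": "main"}
```

A ranked search against `experiment-b` can then fan out over `["experiment-a", "main"]` as well, which is what makes branch chains useful without copying memories into them.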
Honest gaps — what we'd need to win in specific competitive situations:
| Gap | Impact | Path to Close |
|---|---|---|
| Academic benchmark verification | Hindsight's scores are independently verified by Virginia Tech; ours are self-reported | Partner with an academic group (Sanghani Center at VT or similar) |
| Native LangChain store | LangMem is the default in LangGraph Platform; massive distribution advantage | Build langchain-mentisdb pip package with BaseStore implementation |
| Multi-hop recall (59.1% R@10) | Multi-hop is 19pp behind single-hop; biggest single improvement opportunity | Improve graph traversal depth, add entity coreference, expand ContinuesFrom chains |
| Vector sidecar contribution | Near-zero vector scores on misses; semantic layer not firing in many cases | Debug FastEmbed loading; improve embedding coverage; add query expansion |
| Managed cloud service | Mem0, Cognee, Hindsight, Fast.io all offer hosted versions | MentisDB Cloud — managed service with zero infrastructure setup |
| Memory consolidation tiers | agentmemory uses Ebbinghaus decay; Hindsight uses Mental Models reflection | Implement automatic memory consolidation (working → episodic → semantic) |
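For the consolidation-tiers gap in the last row, the Ebbinghaus-style approach the table attributes to agentmemory can be sketched as a retention curve `R = exp(-t / S)` plus a tiering rule. Everything below is a toy under assumed constants (the 24-hour base stability, the tier thresholds), meant only to show the shape of the idea:

```python
import math

def retention(age_hours, stability):
    """Ebbinghaus forgetting curve: R = exp(-t / S)."""
    return math.exp(-age_hours / stability)

def consolidation_tier(age_hours, access_count):
    """Toy tiering rule: frequently accessed memories gain stability
    and stay 'hot' longer (working -> episodic -> semantic)."""
    stability = 24.0 * (1 + access_count)  # hours; illustrative constant
    r = retention(age_hours, stability)
    if r > 0.5:
        return "working"
    if r > 0.1:
        return "episodic"
    return "semantic"
```

A fresh memory sits in the working tier, an untouched memory drifts to episodic within a couple of days, and old rarely-accessed memories settle in the semantic tier, where a real implementation would summarize or compress them.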
The 0.9.x releases establish the feature foundation. The next push is on three fronts: increasing ContinuesFrom chain density, adding entity coreference resolution, and fixing the vector sidecar contribution. Together these should close most of the 19pp multi-hop gap.
See the ROADMAP.md for the full 1.0 pipeline.
```
cargo install mentisdb --force
```
Or download the binary from GitHub Releases.
```
pip install pymentisdb --upgrade
```