April 16, 2026

What Our Own MentisDB Memory Chain Revealed

We audited our own live MentisDB project chain to answer a simple question: are agents using the memory model the way we designed it, or are they collapsing into a much smaller subset of the system?

The answer was both encouraging and uncomfortable. Agents are definitely using MentisDB. The chain is active, valuable, and full of durable lessons. But they are also using a much narrower slice of the semantic model than the code supports.

Short version:

Agents were writing plenty of memory, but most of the graph semantics were being left on the table. The chain strongly over-indexed on Summary, LessonLearned, and generic References links. We updated MENTISDB_SKILL.md to correct that.


What we audited

We analyzed the real local MentisDB project chain under ~/.cloudllm/mentisdb, not a synthetic fixture. The main project chain was mentisdb.

We counted:

  - how many thoughts existed in the main project chain, and how many of them used refs
  - how many typed relations existed, and of which ThoughtRelationKind
  - which ThoughtType and ThoughtRole values actually appeared
  - cross-chain and time-bounded relations

Then we compared real usage with the full set of enums supported by the codebase.
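The tally itself is simple. Here is a sketch of the kind of script we mean, assuming a hypothetical one-JSON-object-per-line chain file with `thought_type`, `role`, `refs`, and `relations` fields (MentisDB's actual on-disk format may differ):

```python
import json
from collections import Counter
from pathlib import Path

def audit_chain(path: Path) -> dict:
    """Tally thought types, roles, and relation kinds in a chain file.

    Assumes one JSON object per line with hypothetical fields:
    "thought_type", "role", "refs" (a list of referenced thought ids),
    and "relations" (a list of {"kind": ..., "target": ...} edges).
    """
    types, roles, kinds = Counter(), Counter(), Counter()
    total = with_refs = 0
    for line in path.read_text().splitlines():
        if not line.strip():
            continue
        thought = json.loads(line)
        total += 1
        types[thought.get("thought_type", "Unknown")] += 1
        roles[thought.get("role", "Unknown")] += 1
        if thought.get("refs"):
            with_refs += 1
        for rel in thought.get("relations", []):
            kinds[rel["kind"]] += 1
    return {
        "total_thoughts": total,
        "thoughts_using_refs": with_refs,
        "refs_coverage": with_refs / total if total else 0.0,
        "thought_types": dict(types),
        "roles": dict(roles),
        "relation_kinds": dict(kinds),
    }
```

The coverage numbers below fall straight out of a report like this; the schema above is our assumption for illustration, not MentisDB's real one.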


Headline numbers

| Metric | Value |
| --- | --- |
| Total thoughts in main project chain | 373 |
| Thoughts using refs | 116 |
| Total refs | 361 |
| Thoughts using typed relations | 117 |
| Total typed relations | 362 |
| Cross-chain relations | 0 |
| Time-bounded relations | 0 |

The immediate red flag was backlinks. Only about 31% of thoughts (116 of 373) used refs at all, which means a lot of memories were still being written as isolated notes instead of as nodes in a reusable thought graph.


Thought type usage

| ThoughtType | Count |
| --- | --- |
| Summary | 128 |
| LessonLearned | 125 |
| Insight | 31 |
| Decision | 27 |
| PreferenceUpdate | 22 |
| Plan | 11 |
| Constraint | 7 |
| Checkpoint | 6 |
| Correction | 3 |
| Mistake | 3 |
| TaskComplete | 3 |
| Surprise | 2 |
| Wonder | 2 |
| Finding | 1 |
| Idea | 1 |
| Question | 1 |

Used: 16 / 31 supported thought types.

The healthy part is obvious: agents are writing lots of durable memory. The concern is the shape. The chain was dominated by just a few categories, especially summaries and retrospectives.

What was missing: Hypothesis, AssumptionInvalidated, Subgoal, StrategyShift, StateSnapshot, Handoff, Reframe, Goal, and the new LLMExtracted marker never appeared in the main project chain at all.


Thought role usage

| ThoughtRole | Count |
| --- | --- |
| Memory | 148 |
| Checkpoint | 116 |
| Retrospective | 100 |
| Summary | 7 |
| Handoff | 2 |

Used: 5 / 8 roles.

Unused roles like WorkingMemory, Compression, and Audit are not necessarily a problem. They are naturally more niche. The more important signal was that the chain had settled into a durable pattern around checkpoints and retrospectives, which is useful but not the full system.


Relation usage was the real issue

| ThoughtRelationKind | Count |
| --- | --- |
| References | 361 |
| ContinuesFrom | 1 |

Used: 2 / 12 relation kinds.

This was the most important finding in the entire audit.

MentisDB supports much richer graph semantics than this: Corrects, Invalidates, DerivedFrom, Summarizes, Supersedes, BranchesFrom, Supports, Contradicts, and more. In practice, agents were mostly collapsing all of that into generic References edges.

That means the memory graph was valid, but semantically flatter than it should be.
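One concrete example of what flat edges cost: with typed Supersedes links, an agent can walk from a stale decision to its newest valid revision; with only generic References edges, nothing distinguishes "supersedes" from "merely mentions", so the walk is impossible. A minimal sketch, with illustrative field names rather than MentisDB's real schema:

```python
def latest_version(thought_id: str, edges: list[dict]) -> str:
    """Follow Supersedes edges to the newest revision of a thought.

    Each edge is a hypothetical {"source", "target", "kind"} record;
    an edge with kind "Supersedes" means source replaces target.
    Generic "References" edges are deliberately ignored.
    """
    superseded_by = {
        e["target"]: e["source"]
        for e in edges
        if e["kind"] == "Supersedes"
    }
    while thought_id in superseded_by:
        thought_id = superseded_by[thought_id]
    return thought_id
```

When every edge in the chain is a References edge, queries like this one simply have nothing to traverse.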


What went wrong

The good news is that this was mostly not a core model failure. The thought and relation system is fine. The problem was guidance.

Our own MENTISDB_SKILL.md strongly encouraged a few high-value workflows: writing summaries, capturing lessons learned, and checkpointing progress with retrospectives.

All of that was good, and agents followed it. But the skill was not nearly forceful enough about the rest of the model: choosing typed relations such as Corrects or Supersedes over generic References, attaching refs so new thoughts join the existing graph, and picking precise thought types instead of defaulting to Summary and LessonLearned.

There were also two documentation gaps.


What we fixed

We revised MENTISDB_SKILL.md to drive better real-world usage in the next version.

The changes were concrete, not cosmetic.

The point of the skill change was not to force every enum to be used. It was to make sure agents actually exploit the graph and semantic distinctions that are already valuable.


What this means for the next version

The next version should not just ship more features. It should make the existing model easier to use correctly in practice.

The audit suggests three priorities:

  1. Better prompting and operator guidance so agents choose richer relations and more precise types.
  2. More graph-aware workflows so summaries, checkpoints, and lessons do not live as isolated blobs.
  3. Smarter evaluation of real usage so we keep checking how the memory model is being used, not just whether it exists in the code.
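The third priority can start very small: a recurring check that compares the enums the code supports against the enums real chains actually use. A minimal sketch, using the relation kinds named in this post (the codebase supports 12; only 10 are named here, so the set is partial):

```python
def usage_gaps(supported: set[str], observed: dict[str, int]) -> set[str]:
    """Return the supported variants that never appear in real usage."""
    return supported - {kind for kind, count in observed.items() if count > 0}

# Relation kinds named in this audit (partial: the codebase supports
# 12 in total, but only these 10 are named in the post).
supported_kinds = {
    "References", "ContinuesFrom", "Corrects", "Invalidates",
    "DerivedFrom", "Summarizes", "Supersedes", "BranchesFrom",
    "Supports", "Contradicts",
}

# Actual counts from the main project chain.
observed_kinds = {"References": 361, "ContinuesFrom": 1}

unused = usage_gaps(supported_kinds, observed_kinds)
```

Run against the audited chain, a check like this flags eight of the ten named kinds as never used, which is exactly the signal we want surfaced continuously rather than discovered in a one-off audit.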

In other words: MentisDB's model was already more expressive than our actual usage. That is a better problem than the reverse, but it is still a problem worth fixing.


Bottom line

The audit was a good sign for MentisDB overall. Agents are using it for exactly the kinds of durable memory we care about most: summaries, lessons, decisions, and checkpoints.

But it also showed that the richest parts of the model were being underused. We designed a thought graph. In practice, we were often still writing linked notes.

That is fixable, and the fix is already underway.