We audited our own live MentisDB project chain to answer a simple question: are agents using the memory model the way we designed it, or are they collapsing into a much smaller subset of the system?
The answer was both encouraging and uncomfortable. Agents are definitely using MentisDB. The chain is active, valuable, and full of durable lessons. But they are also using a much narrower slice of the semantic model than the code supports.
Short version:
Agents were writing plenty of memory, but most of the graph semantics were being left on the
table. The chain strongly over-indexed on Summary, LessonLearned, and
generic References links. We updated MENTISDB_SKILL.md to correct that.
We analyzed the real local MentisDB project chain under ~/.cloudllm/mentisdb, not a
synthetic fixture. The main project chain was mentisdb.
We counted how many thoughts, refs, and typed relations the chain contained, and whether refs were used at all. Then we compared real usage with the full set of enums supported by the codebase.
| Metric | Value |
|---|---|
| Total thoughts in main project chain | 373 |
| Thoughts using refs | 116 |
| Total refs | 361 |
| Thoughts using typed relations | 117 |
| Total typed relations | 362 |
| Cross-chain relations | 0 |
| Time-bounded relations | 0 |
The immediate red flag was backlinks. Only about 31% of thoughts used refs at all.
That means many memories were still being written as isolated notes rather than as part of a
reusable thought graph.
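As a rough illustration, the coverage numbers above can be reproduced by a short script over the chain's thought records. The record shape here (dicts with a `refs` list) is a hypothetical stand-in, not MentisDB's actual storage format:

```python
# Hypothetical audit sketch. The dict-with-"refs" record shape is an
# assumption for illustration; only the counts (116 of 373 thoughts
# using refs) come from the real audit.

def ref_coverage(thoughts):
    """Return (thoughts_with_refs, total_refs, coverage_ratio)."""
    with_refs = [t for t in thoughts if t.get("refs")]
    total_refs = sum(len(t["refs"]) for t in with_refs)
    coverage = len(with_refs) / len(thoughts) if thoughts else 0.0
    return len(with_refs), total_refs, coverage

# Toy chain mirroring the audited ratio: 116 thoughts with refs, 257 without.
chain = [{"refs": ["t1", "t2"]}] * 116 + [{}] * 257
used, total, ratio = ref_coverage(chain)
print(f"{used}/{len(chain)} thoughts use refs ({ratio:.0%})")
```

On the real chain this ratio came out to roughly 31%, which is the red flag discussed above.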
| ThoughtType | Count |
|---|---|
| Summary | 128 |
| LessonLearned | 125 |
| Insight | 31 |
| Decision | 27 |
| PreferenceUpdate | 22 |
| Plan | 11 |
| Constraint | 7 |
| Checkpoint | 6 |
| Correction | 3 |
| Mistake | 3 |
| TaskComplete | 3 |
| Surprise | 2 |
| Wonder | 2 |
| Finding | 1 |
| Idea | 1 |
| Question | 1 |
Used: 16 / 31 supported thought types.
The healthy part is obvious: agents are writing lots of durable memory. The concern is the shape. The chain was dominated by just a few categories, especially summaries and retrospectives.
The underused side is just as telling: Hypothesis, AssumptionInvalidated, Subgoal, StrategyShift, StateSnapshot, Handoff, Reframe, Goal, and the new LLMExtracted marker were entirely absent from the main project chain.
| ThoughtRole | Count |
|---|---|
| Memory | 148 |
| Checkpoint | 116 |
| Retrospective | 100 |
| Summary | 7 |
| Handoff | 2 |
Used: 5 / 8 roles.
Unused roles like WorkingMemory, Compression, and Audit are not
necessarily a problem. They are naturally more niche. The more important signal was that the chain
had settled into a durable pattern around checkpoints and retrospectives, which is useful but not
the full system.
| ThoughtRelationKind | Count |
|---|---|
| References | 361 |
| ContinuesFrom | 1 |
Used: 2 / 12 relation kinds.
This was the most important finding in the entire audit.
MentisDB supports much richer graph semantics than this: Corrects,
Invalidates, DerivedFrom, Summarizes,
Supersedes, BranchesFrom, Supports,
Contradicts, and more. In practice, agents were mostly collapsing all of that into
generic References edges.
That means the memory graph was valid, but semantically flatter than it should be.
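To make the contrast concrete, here is a toy sketch of flat versus typed edges. The `Graph`/`relate` API is an invented stand-in, not MentisDB's real interface; only the relation-kind vocabulary (References, Corrects, Supersedes, Summarizes) comes from the codebase:

```python
from collections import Counter

class Graph:
    """Toy in-memory stand-in for a MentisDB chain (illustrative only)."""
    def __init__(self):
        self.edges = []

    def relate(self, src, dst, kind):
        self.edges.append((src, dst, kind))

g = Graph()

# The pattern the audit actually found: everything collapses to References.
g.relate("lesson-42", "summary-7", "References")

# The richer semantics the model already supports:
g.relate("correction-3", "lesson-9", "Corrects")     # fixes a prior claim
g.relate("plan-5", "plan-4", "Supersedes")           # replaces it outright
g.relate("summary-8", "thought-100", "Summarizes")   # compresses a span

print(Counter(kind for _, _, kind in g.edges))
```

The second group of edges carries meaning a traversal can act on: a reader following Corrects or Supersedes knows the older thought is no longer authoritative, which a bare References edge can never say.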
The good news is that this was mostly not a core model failure. The thought and relation system is fine. The problem was guidance.
Our own MENTISDB_SKILL.md strongly encouraged a few high-value workflows:
- Summary checkpoints
- LessonLearned retrospectives
- Decision and Constraint capture

All of that was good, and agents followed it. But the skill was not nearly forceful enough about:
- References
- Correction vs AssumptionInvalidated vs Reframe
- Question, Subgoal, and StateSnapshot where they were actually the better fit

There were also two documentation gaps:
- Invalidates existed in code but was missing from the skill relation table
- LLMExtracted existed in code but was missing from the skill's thought-type list
We revised MENTISDB_SKILL.md to drive better real-world usage in the next version.
The changes were concrete, not cosmetic:
- Relation guidance rewritten so DerivedFrom, Corrects, Invalidates, Summarizes, and ContinuesFrom are easier to choose correctly
- LLMExtracted added to the thought-type list
- Invalidates added to the relation table

The point of the skill change was not to force every enum to be used. It was to make sure agents actually exploit the graph and semantic distinctions that are already valuable.
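The revised relation guidance boils down to a small decision table. The helper below is a sketch of that idea, not code from the skill or the codebase; the intent phrases are paraphrases, and only the relation-kind names themselves come from MentisDB:

```python
# Hypothetical decision helper mirroring the revised skill guidance.
# The intent phrases are paraphrased assumptions; the relation-kind
# names (Corrects, Invalidates, ...) are the real enum values.

RELATION_GUIDE = {
    "fixes an earlier claim": "Corrects",
    "shows a premise no longer holds": "Invalidates",
    "condenses a span of earlier thoughts": "Summarizes",
    "builds directly on a prior result": "DerivedFrom",
    "continues the same line of work": "ContinuesFrom",
}

def pick_relation(intent):
    """Fall back to the generic edge only when nothing sharper applies."""
    return RELATION_GUIDE.get(intent, "References")

print(pick_relation("fixes an earlier claim"))
print(pick_relation("mentions in passing"))
```

The important property is the fallback order: References is the last resort, not the default, which inverts the pattern the audit found.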
The next version should not just ship more features. It should make the existing model easier to use correctly in practice.
The audit suggests a clear set of priorities.
In other words: MentisDB's model was already more expressive than our actual usage. That is a better problem than the reverse, but it is still a problem worth fixing.
The audit was a good sign for MentisDB overall. Agents are using it for exactly the kinds of durable memory we care about most: summaries, lessons, decisions, and checkpoints.
But it also showed that the richest parts of the model were being underused. We designed a thought graph. In practice, we were often still writing linked notes.
That is fixable, and the fix is already underway.