We audited our own live MentisDB project chain to answer a simple question: are agents using the memory model the way we designed it, or are they collapsing into a much smaller subset of the system?
The answer was both encouraging and uncomfortable. Agents are definitely using MentisDB. The chain is active, valuable, and full of durable lessons. But they are also using a much narrower slice of the semantic model than the code supports.
Short version:
Agents were writing plenty of memory, but most of the graph semantics were being left on the
table. The chain strongly over-indexed on Summary, LessonLearned, and
generic References links. We updated MENTISDB_SKILL.md to correct that.
We analyzed the real local MentisDB project chain under ~/.cloudllm/mentisdb, not a
synthetic fixture. The main project chain was mentisdb.
We counted how many thoughts, refs, and typed relations the chain contained, and whether refs were used at all. Then we compared real usage with the full set of enums supported by the codebase.
| Metric | Value |
|---|---|
| Total thoughts in main project chain | 373 |
| Thoughts using refs | 116 |
| Total refs | 361 |
| Thoughts using typed relations | 117 |
| Total typed relations | 362 |
| Cross-chain relations | 0 |
| Time-bounded relations | 0 |
The immediate red flag was backlinks. Only about 31% of thoughts used refs at all.
That means many memories were still being written as isolated notes rather than as part of a
reusable thought graph.
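As a rough illustration, the coverage numbers above can be reproduced by a short script over the chain's thought records. The record shape here (dicts with a `refs` list) is a hypothetical stand-in, not MentisDB's actual storage format:

```python
# Hypothetical audit sketch. The dict-with-"refs" record shape is an
# assumption for illustration; only the counts (116 of 373 thoughts
# using refs) come from the real audit.

def ref_coverage(thoughts):
    """Return (thoughts_with_refs, total_refs, coverage_ratio)."""
    with_refs = [t for t in thoughts if t.get("refs")]
    total_refs = sum(len(t["refs"]) for t in with_refs)
    coverage = len(with_refs) / len(thoughts) if thoughts else 0.0
    return len(with_refs), total_refs, coverage

# Toy chain mirroring the audited ratio: 116 thoughts with refs, 257 without.
chain = [{"refs": ["t1", "t2"]}] * 116 + [{}] * 257
used, total, ratio = ref_coverage(chain)
print(f"{used}/{len(chain)} thoughts use refs ({ratio:.0%})")
```

On the real chain this ratio came out to roughly 31%, which is the red flag discussed above.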
| ThoughtType | Count |
|---|---|
| Summary | 128 |
| LessonLearned | 125 |
| Insight | 31 |
| Decision | 27 |
| PreferenceUpdate | 22 |
| Plan | 11 |
| Constraint | 7 |
| Checkpoint | 6 |
| Correction | 3 |
| Mistake | 3 |
| TaskComplete | 3 |
| Surprise | 2 |
| Wonder | 2 |
| Finding | 1 |
| Idea | 1 |
| Question | 1 |
Used: 16 / 31 supported thought types.
The healthy part is obvious: agents are writing lots of durable memory. The concern is the shape. The chain was dominated by just a few categories, especially summaries and retrospectives.
The underused side is just as telling: Hypothesis, AssumptionInvalidated, Subgoal, StrategyShift, StateSnapshot, Handoff, Reframe, Goal, and the new LLMExtracted marker were entirely absent from the main project chain.
| ThoughtRole | Count |
|---|---|
| Memory | 148 |
| Checkpoint | 116 |
| Retrospective | 100 |
| Summary | 7 |
| Handoff | 2 |
Used: 5 / 8 roles.
Unused roles like WorkingMemory, Compression, and Audit are not
necessarily a problem. They are naturally more niche. The more important signal was that the chain
had settled into a durable pattern around checkpoints and retrospectives, which is useful but not
the full system.
| ThoughtRelationKind | Count |
|---|---|
| References | 361 |
| ContinuesFrom | 1 |
Used: 2 / 12 relation kinds.
This was the most important finding in the entire audit.
MentisDB supports much richer graph semantics than this: Corrects,
Invalidates, DerivedFrom, Summarizes,
Supersedes, BranchesFrom, Supports,
Contradicts, and more. In practice, agents were mostly collapsing all of that into
generic References edges.
That means the memory graph was valid, but semantically flatter than it should be.
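To make the contrast concrete, here is a toy sketch of flat versus typed edges. The `Graph`/`relate` API is an invented stand-in, not MentisDB's real interface; only the relation-kind vocabulary (References, Corrects, Supersedes, Summarizes) comes from the codebase:

```python
from collections import Counter

class Graph:
    """Toy in-memory stand-in for a MentisDB chain (illustrative only)."""
    def __init__(self):
        self.edges = []

    def relate(self, src, dst, kind):
        self.edges.append((src, dst, kind))

g = Graph()

# The pattern the audit actually found: everything collapses to References.
g.relate("lesson-42", "summary-7", "References")

# The richer semantics the model already supports:
g.relate("correction-3", "lesson-9", "Corrects")     # fixes a prior claim
g.relate("plan-5", "plan-4", "Supersedes")           # replaces it outright
g.relate("summary-8", "thought-100", "Summarizes")   # compresses a span

print(Counter(kind for _, _, kind in g.edges))
```

The second group of edges carries meaning a traversal can act on: a reader following Corrects or Supersedes knows the older thought is no longer authoritative, which a bare References edge can never say.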
The good news is that this was mostly not a core model failure. The thought and relation system is fine. The problem was guidance.
Our own MENTISDB_SKILL.md strongly encouraged a few high-value workflows:
- Summary checkpoints
- LessonLearned retrospectives
- Decision and Constraint capture

All of that was good, and agents followed it. But the skill was not nearly forceful enough about:
- References
- Correction vs AssumptionInvalidated vs Reframe
- Question, Subgoal, and StateSnapshot where they were actually the better fit

There were also two documentation gaps:
- Invalidates existed in code but was missing from the skill relation table
- LLMExtracted existed in code but was missing from the skill's thought-type list
We revised MENTISDB_SKILL.md to drive better real-world usage in the next version.
The changes were concrete, not cosmetic:
- Relation guidance rewritten so DerivedFrom, Corrects, Invalidates, Summarizes, and ContinuesFrom are easier to choose correctly
- LLMExtracted added to the thought-type list
- Invalidates added to the relation table

The point of the skill change was not to force every enum to be used. It was to make sure agents actually exploit the graph and semantic distinctions that are already valuable.
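The revised relation guidance boils down to a small decision table. The helper below is a sketch of that idea, not code from the skill or the codebase; the intent phrases are paraphrases, and only the relation-kind names themselves come from MentisDB:

```python
# Hypothetical decision helper mirroring the revised skill guidance.
# The intent phrases are paraphrased assumptions; the relation-kind
# names (Corrects, Invalidates, ...) are the real enum values.

RELATION_GUIDE = {
    "fixes an earlier claim": "Corrects",
    "shows a premise no longer holds": "Invalidates",
    "condenses a span of earlier thoughts": "Summarizes",
    "builds directly on a prior result": "DerivedFrom",
    "continues the same line of work": "ContinuesFrom",
}

def pick_relation(intent):
    """Fall back to the generic edge only when nothing sharper applies."""
    return RELATION_GUIDE.get(intent, "References")

print(pick_relation("fixes an earlier claim"))
print(pick_relation("mentions in passing"))
```

The important property is the fallback order: References is the last resort, not the default, which inverts the pattern the audit found.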
The next version should not just ship more features. It should make the existing model easier to use correctly in practice.
The audit suggests a clear set of priorities.
In other words: MentisDB's model was already more expressive than our actual usage. That is a better problem than the reverse, but it is still a problem worth fixing.
The audit was a good sign for MentisDB overall. Agents are using it for exactly the kinds of durable memory we care about most: summaries, lessons, decisions, and checkpoints.
But it also showed that the richest parts of the model were being underused. We designed a thought graph. In practice, we were often still writing linked notes.
That is fixable, and the fix is already underway.