1.7 Error and Mistake Memory
The problem
Last Tuesday your agent rewrote a Postgres migration in the wrong order, dropped a column, broke staging for forty minutes. You corrected it, the agent apologized, the day moved on. This Tuesday it did the same thing again.
The agent didn't learn — because nothing was recorded. The apology lived only in the conversation transcript; the correction lived only in your chat reply, which the agent treats as ephemeral feedback, not durable state. This is the mistake-memory gap: an agent that does not explicitly persist its failures will, with high probability, repeat them. LLMs have no built-in feedback loop from "I was wrong" to "I will not be wrong this way again." MentisDB gives you the storage; this chapter gives you the pattern.
Why it's hard
- Failures are often silent. A tool call returns an error, the agent retries with a slightly different parameter, the second attempt works. The chain of reasoning that almost failed is invisible to anyone not watching the live session.
- User corrections look like normal conversation. "no, the table is users_v2, not users" is a correction, but it reads like a clarification. Without a heuristic, the agent treats it as a one-off.
- Lessons decay. A rule true in March may be wrong in June. A LessonLearned that is never re-validated becomes folklore.
- Apologies are cheap. "Sorry, you're right" costs a handful of tokens. Writing the structured triple takes three deliberate appends. The asymmetry means most agents skip the latter.
The pattern: the mistake triple
Every durable failure gets three thoughts, in order:
- Mistake: recorded at the moment of failure. What went wrong, what was attempted, what the error was. Raw, specific, no spin.
- Correction: recorded when the correct approach is identified, whether by the user, a successful retry, or an external tool. This is the "what to do instead" thought.
- LessonLearned: recorded later, often at end of session. A generalization of the pair: "When X, do Y. When X, never do Z." The lesson is what future agents will actually search for.
The triple is connected by two relations:
- LessonLearned —[Corrects]→ Mistake: the lesson invalidates the original mistake as a viable approach.
- LessonLearned —[DerivedFrom]→ Correction: the lesson is a synthesis derived from the working alternative.
Optionally, Correction —[Corrects]→ Mistake as well, so
graph expansion from the mistake surfaces both the immediate fix
and the generalized lesson. Three thoughts, not one, because the
moment of failure, the moment of understanding, and the moment of
synthesis are three different cognitive events at three different
times. Collapsing them loses the timing — and the timing is what
makes the graph queryable. A future agent asking "when did I learn
this?" should be able to follow the DerivedFrom edge
backward in time.
Implementation
The helper below records the full triple and wires its relations in one call. Use it from the failure branch of any tool call. Relations are attached at append time through .with_relations(...) on the ThoughtInput being appended; there is no separate add_relation call. ThoughtRelation::new takes the target thought's UUID, so this sketch routes the index-to-UUID lookup through relation_to, a hypothetical helper that is not part of MentisDB and whose body you must supply.
use mentisdb::{
    MentisDb, ThoughtInput, ThoughtType, ThoughtRole,
    ThoughtRelation, ThoughtRelationKind, RankedSearchQuery,
};

pub struct MistakeRecord {
    pub mistake_index: u32,
    pub correction_index: u32,
    pub lesson_index: u32,
}

// Hypothetical helper, not a MentisDB API: build a relation that points at
// the thought stored at `index`. ThoughtRelation::new expects the target's
// UUID, so resolve `index` to that UUID with whatever lookup your MentisDB
// version provides.
fn relation_to(chain: &MentisDb, kind: ThoughtRelationKind, index: u32) -> ThoughtRelation {
    todo!("resolve `index` to its UUID, then call ThoughtRelation::new(kind, uuid)")
}

// `Result` is assumed to be a result alias already in scope.
pub fn record_mistake_triple(
    chain: &mut MentisDb, agent_id: &str, step: u32,
    mistake_desc: &str, correction_desc: &str,
    lesson: &str, concepts: &[&str],
) -> Result<MistakeRecord> {
    // 1. Mistake: what failed, recorded as-is and tied to the current step.
    let m = chain.append_thought(agent_id,
        ThoughtInput::new(ThoughtType::Mistake, mistake_desc)
            .with_concepts(concepts).with_importance(0.7)
            .with_tags(["mistake"]).with_refs(vec![step]))?;

    // 2. Correction: what to do instead. Carries the optional
    //    Correction Corrects Mistake edge.
    let correction_rels = vec![relation_to(chain, ThoughtRelationKind::Corrects, m)];
    let c = chain.append_thought(agent_id,
        ThoughtInput::new(ThoughtType::Correction, correction_desc)
            .with_concepts(concepts).with_importance(0.7)
            .with_tags(["correction"]).with_refs(vec![m, step])
            .with_relations(correction_rels))?;

    // 3. LessonLearned: the generalization. Corrects the mistake and is
    //    DerivedFrom the correction.
    let lesson_rels = vec![
        relation_to(chain, ThoughtRelationKind::Corrects, m),
        relation_to(chain, ThoughtRelationKind::DerivedFrom, c),
    ];
    let l = chain.append_thought(agent_id,
        ThoughtInput::new(ThoughtType::LessonLearned, lesson)
            .with_concepts(concepts).with_importance(0.85)
            .with_tags(["lesson", "scope:user"])
            .with_role(ThoughtRole::Retrospective)
            .with_confidence(0.6).with_refs(vec![m, c, step])
            .with_relations(lesson_rels))?;

    Ok(MistakeRecord { mistake_index: m, correction_index: c, lesson_index: l })
}

// In the failure branch of any tool call, after retrying and
// finding the right approach:
record_mistake_triple(&mut chain, agent_id, current_step,
    "Tried X. Failed: .",
    "Do Y instead. .",
    "When in situation Z, always do Y; never do X because .",
    &["topic-a", "topic-b"],
)?;
A future agent asking "what do I know about postgres migrations?" does
one ranked search, scoped to LessonLearned with a
confidence floor. See
chapter 1.1 for
the RankedSearchQuery builder pattern.
Auto-detecting user corrections in chat
A user correcting the agent is the highest-signal source of
Mistake content, but it looks like normal prose. A
small heuristic catches most English corrections — direct
negation ("no, that's wrong"), soft correction
("actually, ..."), and replacement patterns ("use X, not
Y"). When the heuristic fires, do not write a
Mistake immediately — the user may be correcting
something from two messages ago. Search the recent chain for what
the agent just said (use mentisdb_recent_context or a
RankedSearchQuery); if a candidate
Mistake already exists, append a Correction
referencing it; if not, append a fresh Mistake with low
confidence so a human (or weekly digest) can verify. "Actually, I
think we should ship on Friday" is a preference update, not a
mistake correction — confirm the agent actually said something
wrong before appending.
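The heuristic itself can be as small as a few string checks. The sketch below is one possible version in plain Rust with no MentisDB calls. The CorrectionCue enum, the detect_correction_cue function, and the phrase lists are all illustrative assumptions, not part of any library, and the lists are far from exhaustive. A cue only means "go look at what the agent just said"; it does not decide that a Mistake exists.

```rust
/// Kinds of correction cue this sketch recognizes (names are illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CorrectionCue {
    /// "no, that's wrong"
    DirectNegation,
    /// "actually, ..."
    SoftCorrection,
    /// "use X, not Y"
    Replacement,
}

/// Returns the first cue found in `message`, or `None` if it reads as
/// ordinary chat. The phrase lists are assumptions; tune them to your users.
pub fn detect_correction_cue(message: &str) -> Option<CorrectionCue> {
    // Normalize case and curly apostrophes so "That’s" matches "that's".
    let text = message.trim().to_lowercase().replace('’', "'");

    // Direct negation: a leading "no," / "nope", or an explicit "wrong".
    const NEGATIONS: [&str; 4] = [
        "that's wrong",
        "that is wrong",
        "that's incorrect",
        "that's not right",
    ];
    if text.starts_with("no,")
        || text.starts_with("nope")
        || NEGATIONS.iter().any(|p| text.contains(p))
    {
        return Some(CorrectionCue::DirectNegation);
    }

    // Soft correction: the message opens with "actually".
    if text.starts_with("actually") {
        return Some(CorrectionCue::SoftCorrection);
    }

    // Replacement: "use X, not Y" or "X instead of Y".
    if text.contains(", not ") || text.contains(" instead of ") {
        return Some(CorrectionCue::Replacement);
    }

    None
}
```

Note that "Actually, I think we should ship on Friday" also fires the soft-correction cue, which is why the search-and-confirm step still has to run before anything is appended.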
The weekly lessons-learned digest
Mistakes accumulate faster than lessons. The end-of-session (or
weekly cron) digest is what promotes them: pull all
Mistake thoughts from the last 7 days, cluster by
shared concept, and for any concept with 2+ occurrences write a
single LessonLearned referencing every mistake in the
cluster. If a lesson already exists for that concept, boost its
confidence instead of duplicating. Run on a schedule (cron, or at
end of every long session). The output is a
LessonLearned per concept cluster, with
Corrects edges pointing at every individual mistake.
Future agents searching for "what do I know about X" find the digest
first, then drill into the underlying mistakes on demand.
The sketch below implements the digest. The Corrects edges are attached at append time through .with_relations(...), since there is no separate add_relation call. corrects_relation is a hypothetical helper, not a MentisDB API: ThoughtRelation::new takes the target thought's UUID, and the index-to-UUID lookup is left for you to supply.
use chrono::{Duration, Utc};
use std::collections::HashMap;

pub fn weekly_lesson_digest(
    chain: &mut MentisDb, agent_id: &str,
) -> Result<u32> {
    // Hypothetical helper, not a MentisDB API: build a Corrects relation
    // pointing at the thought stored at `index`. Resolve `index` to the
    // target UUID with whatever lookup your MentisDB version provides.
    fn corrects_relation(chain: &MentisDb, index: u32) -> mentisdb::ThoughtRelation {
        todo!("resolve `index` to its UUID, then call ThoughtRelation::new(Corrects, uuid)")
    }

    let week_ago = Utc::now() - Duration::days(7);
    let mistakes = chain.query_ranked(
        &RankedSearchQuery::new()
            .with_types([ThoughtType::Mistake])
            .with_since(week_ago).with_limit(200)
    )?;

    // Cluster mistakes by shared concept.
    let mut by_concept: HashMap<String, Vec<u32>> = HashMap::new();
    for hit in &mistakes.hits {
        for c in &hit.thought.concepts {
            by_concept.entry(c.clone()).or_default()
                .push(hit.thought.index);
        }
    }

    let mut promoted = 0;
    for (concept, indices) in by_concept {
        if indices.len() < 2 { continue; } // one-off, not a pattern

        // Search-first: skip if a lesson already exists for this concept.
        // Boosting the existing lesson's confidence (read-modify-write)
        // is left out of this sketch.
        let existing = chain.query_ranked(
            &RankedSearchQuery::new()
                .with_text(&format!("lesson about {}", concept))
                .with_types([ThoughtType::LessonLearned])
                .with_concepts_any([concept.as_str()]).with_limit(1)
        )?;
        if !existing.hits.is_empty() { continue; }

        // One Corrects edge per mistake in the cluster.
        let corrects: Vec<_> = indices.iter()
            .map(|&idx| corrects_relation(chain, idx))
            .collect();

        chain.append_thought(agent_id,
            ThoughtInput::new(ThoughtType::LessonLearned,
                format!("Recurring issue in `{}` ({} this week). \
                         See linked mistakes.", concept, indices.len()))
                .with_concepts([concept.as_str()]).with_importance(0.9)
                .with_tags(["lesson", "scope:user", "digest:weekly"])
                .with_role(ThoughtRole::Retrospective)
                .with_confidence(0.7).with_refs(indices.clone())
                .with_relations(corrects))?;
        promoted += 1;
    }
    Ok(promoted)
}
Production notes
Promotion cadence: not every Mistake deserves a LessonLearned.
- Immediate: a mistake that broke the task at hand. Write the Mistake and Correction, defer the lesson to the digest.
- Weekly digest: a mistake in a concept with 2+ occurrences.
- Never: a one-off typo, a transient network error, a misunderstanding of the user's prompt.
Confidence decay: LessonLearned carries confidence. Start at 0.6, add 0.1 per re-confirmation by a new mistake in the same cluster, subtract 0.15 if contradicted by a newer LessonLearned (record a Supersedes edge), and cap at 1.0. Set a soft floor of confidence >= 0.5 when retrieving. Lessons below the floor still exist (durable, append-only); they just don't pollute retrieval.
Tagging for scope: tag Mistake with scope:session and LessonLearned with scope:user, matching the scope pattern from chapter 1.1.
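The confidence-decay arithmetic above is easy to get subtly wrong, so here is a minimal sketch of it as pure functions in plain Rust. LessonEvent, apply_event, and is_retrievable are illustrative names, not MentisDB APIs. Clamping at 0.0 on the low end is an assumption of this sketch; the rules above only specify the cap at 1.0.

```rust
/// Events that move a lesson's confidence (illustrative names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LessonEvent {
    /// A new mistake in the same cluster re-confirmed the lesson.
    Reconfirmed,
    /// A newer LessonLearned contradicted this one.
    Contradicted,
}

/// Starting confidence for a freshly written lesson.
pub const INITIAL_CONFIDENCE: f64 = 0.6;
/// Soft floor applied at retrieval time.
pub const RETRIEVAL_FLOOR: f64 = 0.5;

/// Applies one event: +0.1 on re-confirmation, -0.15 on contradiction.
/// The upper cap of 1.0 comes from the rules above; the lower bound of 0.0
/// is an assumption of this sketch.
pub fn apply_event(confidence: f64, event: LessonEvent) -> f64 {
    let next = match event {
        LessonEvent::Reconfirmed => confidence + 0.1,
        LessonEvent::Contradicted => confidence - 0.15,
    };
    next.clamp(0.0, 1.0)
}

/// True if a lesson with this confidence should surface in retrieval.
pub fn is_retrievable(confidence: f64) -> bool {
    confidence >= RETRIEVAL_FLOOR
}
```

One consequence worth noticing: a fresh lesson at 0.6 that is contradicted once lands at roughly 0.45 and drops out of retrieval immediately, while a lesson that has been re-confirmed even once survives a single contradiction.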
Anti-patterns
Apologizing without recording
The most common mistake. The agent says "Sorry, you're right" and
moves on. Nothing was persisted. The same mistake will recur next
week. Rule: an apology without a Mistake append
is a bug in the agent. The mirror failure — writing
Mistake + Correction thoughts every time
something fails, but never synthesizing a LessonLearned —
turns the chain into a forensic log, useless for generalization. Run
the digest. Promote ruthlessly.
Promoting without verifying, or recording every transient error
Two related failure modes. Promoting without verifying: a
LessonLearned is synthesized from a
Correction that was itself wrong, so the lesson is now
confidently wrong with a confidence: 0.7 badge, and
every future agent that retrieves it will be misled.
Recording every transient error: a flaky network blip isn't
a lesson, but the digest's "2+ occurrences" filter will eventually
promote it to a bogus LessonLearned about network
reliability. The fix for both: search-first
(chapter 0.4) before promoting, and only record
Mistake thoughts for failures the agent had control
over or that the user explicitly corrected.
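The recording rule can be enforced before anything reaches the chain. The following is a rough sketch in plain Rust, with no MentisDB calls: FailureSource, should_record_mistake, and promotable_concepts are hypothetical names, and how a failure gets classified into a FailureSource is left to the caller. It shows why filtering at record time matters: transient errors never count toward the digest's "2+ occurrences" threshold.

```rust
use std::collections::BTreeMap;

/// Where a failure came from (illustrative classification).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FailureSource {
    /// The agent chose the wrong table, order, flag, and so on.
    AgentDecision,
    /// The user explicitly said the agent was wrong.
    UserCorrection,
    /// Network blip, timeout, rate limit: outside the agent's control.
    TransientEnvironment,
}

/// Only failures the agent controlled, or that the user explicitly
/// corrected, are worth a Mistake thought.
pub fn should_record_mistake(source: FailureSource) -> bool {
    matches!(
        source,
        FailureSource::AgentDecision | FailureSource::UserCorrection
    )
}

/// Returns the concepts with 2+ recordable failures, in sorted order.
/// Each input pair is (source of the failure, concept it belongs to).
pub fn promotable_concepts(failures: &[(FailureSource, &str)]) -> Vec<String> {
    let mut counts: BTreeMap<&str, usize> = BTreeMap::new();
    for &(source, concept) in failures {
        if should_record_mistake(source) {
            *counts.entry(concept).or_insert(0) += 1;
        }
    }
    counts
        .into_iter()
        .filter(|&(_, n)| n >= 2)
        .map(|(concept, _)| concept.to_string())
        .collect()
}
```

With this filter in place, three network timeouts in a week produce no candidate lesson, while two recorded failures in the same concept do.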
Testing this pattern
A minimal test verifies all three relations exist with the right endpoints:
#[test]
fn mistake_triple_creates_correct_graph() {
    let mut chain = test_chain();
    chain.upsert_agent("executor", None, None, None, None).unwrap();
    let plan = chain.append_thought("executor",
        ThoughtInput::new(ThoughtType::Plan, "Test plan")).unwrap();

    let rec = record_mistake_triple(&mut chain, "executor", plan,
        "Used SELECT * in production query",
        "Use explicit column list; SELECT * hides schema changes",
        "Always use explicit columns in production SQL.",
        &["sql", "postgres"]).unwrap();

    // True if `from` has an outbound edge of `kind` pointing at `to`.
    let has = |from, kind, to| chain.outbound_relations(from).iter()
        .any(|r| r.kind == kind && r.target_index == to);

    assert!(has(rec.lesson_index, ThoughtRelationKind::Corrects, rec.mistake_index));
    assert!(has(rec.lesson_index, ThoughtRelationKind::DerivedFrom, rec.correction_index));
    assert!(has(rec.correction_index, ThoughtRelationKind::Corrects, rec.mistake_index));
}
What's next
Mistake memory closes the loop on what the agent did wrong, but
doesn't compress the chain. Part 2 starts with
2.1 Semantic
Compression: when and how to roll a thousand
Subgoal thoughts into a single Summary the
agent can actually re-read.