1.7 Error and Mistake Memory

The problem

Last Tuesday your agent rewrote a Postgres migration in the wrong order, dropped a column, broke staging for forty minutes. You corrected it, the agent apologized, the day moved on. This Tuesday it did the same thing again.

The agent didn't learn — because nothing was recorded. The apology lived only in the conversation transcript; the correction lived only in your chat reply, which the agent treats as ephemeral feedback, not durable state. This is the mistake-memory gap: an agent that does not explicitly persist its failures will, with high probability, repeat them. LLMs have no built-in feedback loop from "I was wrong" to "I will not be wrong this way again." MentisDB gives you the storage; this chapter gives you the pattern.

Why it's hard

Failures are often silent. A tool call returns an error, the agent retries with a slightly different parameter, the second attempt works. The chain of reasoning that almost failed is invisible to anyone not watching the live session.
User corrections look like normal conversation. "no, the table is users_v2, not users" is a correction, but it reads like a clarification. Without a heuristic, the agent treats it as a one-off.
Lessons decay. A rule true in March may be wrong in June. A LessonLearned that is never re-validated becomes folklore.
Apologies are cheap. "Sorry, you're right" costs a few tokens. Writing the structured triple takes three deliberate appends. The asymmetry means most agents skip the latter.

The pattern: the mistake triple

Every durable failure gets three thoughts, in order:

Mistake — recorded at the moment of failure. What went wrong, what was attempted, what the error was. Raw, specific, no spin.
Correction — recorded when the correct approach is identified. By the user, a successful retry, or an external tool. This is the "what to do instead" thought.
LessonLearned — recorded later, often at end of session. A generalization of the pair. "When X, do Y. When X, never do Z." The lesson is what future agents will actually search for.

The triple is connected by two relations:

LessonLearned —[Corrects]→ Mistake: the lesson invalidates the original mistake as a viable approach.
LessonLearned —[DerivedFrom]→ Correction: the lesson is a synthesis derived from the working alternative.

Optionally, add a Corrects edge from the Correction to the Mistake as well, so graph expansion from the mistake surfaces both the immediate fix and the generalized lesson.

Why three thoughts instead of one? The moment of failure, the moment of understanding, and the moment of synthesis are three different cognitive events that happen at three different times. Collapsing them into one thought loses the timing, and the timing is what makes the graph queryable: a future agent asking "when did I learn this?" should be able to follow the DerivedFrom edge backward in time.
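To make the edge directions and the ordering constraint concrete, here is a minimal in-memory sketch. It does not touch MentisDB: `Kind`, `Rel`, `Node`, `Edge`, `triple_edges`, and `is_well_formed` are hypothetical stand-ins, and `at` is an illustrative integer timestamp rather than a real clock value.

```rust
/// Hypothetical stand-ins for the three thought types in a mistake triple.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Mistake,
    Correction,
    LessonLearned,
}

/// Hypothetical stand-ins for the two relation kinds the triple uses.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Rel {
    Corrects,
    DerivedFrom,
}

/// A thought reduced to its type and the (illustrative) time it was appended.
#[derive(Debug, Clone, Copy)]
pub struct Node {
    pub kind: Kind,
    pub at: u64,
}

/// A directed, typed edge between two node positions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Edge {
    pub from: usize,
    pub rel: Rel,
    pub to: usize,
}

/// Edges for a triple stored at positions `m` (Mistake), `c` (Correction)
/// and `l` (LessonLearned). The Correction -> Mistake edge is optional.
pub fn triple_edges(m: usize, c: usize, l: usize, with_optional: bool) -> Vec<Edge> {
    let mut edges = vec![
        Edge { from: l, rel: Rel::Corrects, to: m },
        Edge { from: l, rel: Rel::DerivedFrom, to: c },
    ];
    if with_optional {
        edges.push(Edge { from: c, rel: Rel::Corrects, to: m });
    }
    edges
}

/// True when the three positions hold the right types, appended in order
/// (mistake, then correction, then lesson). That ordering is what lets a
/// reader follow DerivedFrom from the lesson backward in time.
pub fn is_well_formed(nodes: &[Node], m: usize, c: usize, l: usize) -> bool {
    match (nodes.get(m), nodes.get(c), nodes.get(l)) {
        (Some(nm), Some(nc), Some(nl)) => {
            nm.kind == Kind::Mistake
                && nc.kind == Kind::Correction
                && nl.kind == Kind::LessonLearned
                && nm.at <= nc.at
                && nc.at <= nl.at
        }
        _ => false,
    }
}
```

A lesson that claims to derive from a correction appended after it fails the check, which is exactly the timing information a single collapsed thought would lose.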

Implementation

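The helper below records the full triple and wires all three relations (the two required edges plus the optional Correction-to-Mistake edge) in one call. Use it from the failure branch of any tool call, once the correct approach is known.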

use mentisdb::{
    MentisDb, RankedSearchQuery, ThoughtInput, ThoughtRelation,
    ThoughtRelationKind, ThoughtRole, ThoughtType,
};

pub struct MistakeRecord {
    pub mistake_index: u32,
    pub correction_index: u32,
    pub lesson_index: u32,
}

// Assumptions: `append_thought` returns the appended thought (owned or by
// reference) exposing its `id` (the UUID a ThoughtRelation targets) and its
// `index` (the position used by `with_refs`); `Result` is the crate's alias.
// There is no add_relation call: relations are attached to the ThoughtInput
// with `.with_relations(...)` at append time, so a thought can only point at
// thoughts that already exist. That is why the lesson is appended last.
pub fn record_mistake_triple(
    chain: &mut MentisDb, agent_id: &str, step: u32,
    mistake_desc: &str, correction_desc: &str,
    lesson: &str, concepts: &[&str],
) -> Result<MistakeRecord> {
    // 1. Mistake: raw and specific, scoped to the session.
    let (m_id, m) = {
        let t = chain.append_thought(agent_id,
            ThoughtInput::new(ThoughtType::Mistake, mistake_desc)
                .with_concepts(concepts).with_importance(0.7)
                .with_tags(["mistake", "scope:session"]).with_refs(vec![step]))?;
        (t.id, t.index)
    };
    // 2. Correction, carrying the optional Correction -[Corrects]-> Mistake edge.
    let (c_id, c) = {
        let t = chain.append_thought(agent_id,
            ThoughtInput::new(ThoughtType::Correction, correction_desc)
                .with_concepts(concepts).with_importance(0.7)
                .with_tags(["correction"]).with_refs(vec![m, step])
                .with_relations(vec![
                    ThoughtRelation::new(ThoughtRelationKind::Corrects, m_id),
                ]))?;
        (t.id, t.index)
    };
    // 3. Lesson: Corrects the mistake, DerivedFrom the correction.
    let l = chain.append_thought(agent_id,
        ThoughtInput::new(ThoughtType::LessonLearned, lesson)
            .with_concepts(concepts).with_importance(0.85)
            .with_tags(["lesson", "scope:user"])
            .with_role(ThoughtRole::Retrospective)
            .with_confidence(0.6).with_refs(vec![m, c, step])
            .with_relations(vec![
                ThoughtRelation::new(ThoughtRelationKind::Corrects, m_id),
                ThoughtRelation::new(ThoughtRelationKind::DerivedFrom, c_id),
            ]))?
        .index;
    Ok(MistakeRecord { mistake_index: m, correction_index: c, lesson_index: l })
}

// In the failure branch of any tool call, after retrying and
// finding the right approach:
record_mistake_triple(&mut chain, agent_id, current_step,
    "Tried X. Failed: .",
    "Do Y instead. .",
    "When in situation Z, always do Y; never do X because .",
    &["topic-a", "topic-b"],
)?;

A future agent asking "what do I know about postgres migrations?" does one ranked search, scoped to LessonLearned with a confidence floor. See chapter 1.1 for the RankedSearchQuery builder pattern.
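The shape of that retrieval can be illustrated without the database. In this sketch `Lesson`, `lessons_for`, and `CONFIDENCE_FLOOR` are hypothetical names; simple concept matching stands in for MentisDB's ranked search (it says nothing about how that search actually scores results), and the 0.5 floor matches the soft floor suggested in Production notes.

```rust
/// Hypothetical stand-in for a LessonLearned returned by a ranked search.
#[derive(Debug, Clone)]
pub struct Lesson {
    pub text: String,
    pub concepts: Vec<String>,
    pub confidence: f32,
}

/// Soft retrieval floor: low-confidence lessons stay on the chain but are
/// not returned.
pub const CONFIDENCE_FLOOR: f32 = 0.5;

/// Lessons tagged with any of `concepts` whose confidence clears the floor,
/// most confident first, truncated to `limit`.
pub fn lessons_for<'a>(lessons: &'a [Lesson], concepts: &[&str], limit: usize) -> Vec<&'a Lesson> {
    let mut hits: Vec<&Lesson> = lessons
        .iter()
        .filter(|l| l.confidence >= CONFIDENCE_FLOOR)
        .filter(|l| l.concepts.iter().any(|c| concepts.contains(&c.as_str())))
        .collect();
    // Highest confidence first; total_cmp gives a total order over f32.
    hits.sort_by(|a, b| b.confidence.total_cmp(&a.confidence));
    hits.truncate(limit);
    hits
}
```

The floor is applied before ranking, so a stale low-confidence lesson can never crowd out a well-confirmed one, however closely its wording matches the query.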

Auto-detecting user corrections in chat

A user correcting the agent is the highest-signal source of Mistake content, but it looks like normal prose. A small heuristic catches the common English shapes: direct negation ("no, that's wrong"), soft correction ("actually, ..."), and replacement patterns ("use X, not Y").

When the heuristic fires, do not write a Mistake immediately; the user may be correcting something from two messages ago. First search the recent chain for what the agent just said (use mentisdb_recent_context or a RankedSearchQuery). If a candidate Mistake already exists, append a Correction referencing it; if not, append a fresh Mistake with low confidence so a human (or the weekly digest) can verify it. Not every match is a correction: "Actually, I think we should ship on Friday" is a preference update, so confirm the agent actually said something wrong before appending anything.
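A minimal sketch of the heuristic and the routing decision. The phrase lists are illustrative, not an exhaustive or validated classifier; `Action`, `looks_like_correction`, and `route` are hypothetical names; and the caller is assumed to have already searched the recent chain for a candidate Mistake and checked whether the agent actually said something wrong.

```rust
/// What to append after a user message, following the rules above.
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    /// A matching Mistake already exists: append a Correction that refs it.
    AppendCorrection { mistake_index: u32 },
    /// No prior Mistake found: append one with low confidence for review.
    AppendLowConfidenceMistake,
    /// Not a correction (or only a preference update): append nothing.
    Ignore,
}

/// Cheap lexical check for common English correction shapes.
pub fn looks_like_correction(msg: &str) -> bool {
    // Lowercase and normalize curly apostrophes so "that’s" matches "that's".
    let m = msg.trim().to_lowercase().replace('\u{2019}', "'");
    // Direct negation at the start of the message.
    let negation_start = ["no,", "no.", "nope"].iter().any(|p| m.starts_with(p));
    // Explicit "you got it wrong" phrasing anywhere in the message.
    let negation_anywhere = ["that's wrong", "that is wrong", "that's not right", "incorrect"]
        .iter()
        .any(|p| m.contains(p));
    // Soft correction.
    let soft = m.starts_with("actually");
    // Replacement: "use X, not Y" or "X, not Y".
    let replacement = m.contains(", not ");
    negation_start || negation_anywhere || soft || replacement
}

/// Route a user message. `agent_was_wrong` is the caller's confirmation that
/// the agent really said something incorrect; `candidate_mistake` is the
/// index of a matching Mistake found by searching the recent chain, if any.
pub fn route(msg: &str, agent_was_wrong: bool, candidate_mistake: Option<u32>) -> Action {
    if !looks_like_correction(msg) || !agent_was_wrong {
        return Action::Ignore;
    }
    match candidate_mistake {
        Some(mistake_index) => Action::AppendCorrection { mistake_index },
        None => Action::AppendLowConfidenceMistake,
    }
}
```

The lexical check is deliberately permissive; the `agent_was_wrong` gate is what filters out preference updates such as the Friday example.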

The weekly lessons-learned digest

Mistakes accumulate faster than lessons. The end-of-session (or weekly cron) digest is what promotes them: pull all Mistake thoughts from the last 7 days, cluster them by shared concept, and for any concept with 2+ occurrences write a single LessonLearned that references every mistake in the cluster. If a lesson already exists for that concept, boost its confidence instead of writing a duplicate.

use chrono::{Duration, Utc};
use std::collections::HashMap;

pub fn weekly_lesson_digest(
    chain: &mut MentisDb, agent_id: &str,
) -> Result<u32> {
    let week_ago = Utc::now() - Duration::days(7);
    let mistakes = chain.query_ranked(
        &RankedSearchQuery::new()
            .with_types([ThoughtType::Mistake])
            .with_since(week_ago).with_limit(200)
    )?;

    // Cluster mistakes by shared concept. Keep each mistake's index (for
    // refs) and a Corrects relation to its id (assumed to be the UUID that
    // ThoughtRelation targets). A mistake with several concepts joins
    // several clusters.
    let mut by_concept: HashMap<String, (Vec<u32>, Vec<mentisdb::ThoughtRelation>)> =
        HashMap::new();
    for hit in &mistakes.hits {
        for c in &hit.thought.concepts {
            let (refs, relations) = by_concept.entry(c.clone()).or_default();
            refs.push(hit.thought.index);
            relations.push(mentisdb::ThoughtRelation::new(
                ThoughtRelationKind::Corrects, hit.thought.id));
        }
    }

    let mut promoted = 0;
    for (concept, (refs, relations)) in by_concept {
        if refs.len() < 2 { continue; } // one-off, not a pattern
        // Search-first: if a lesson for this concept already exists, don't
        // write a duplicate. (Boosting its confidence is not shown here.)
        let existing = chain.query_ranked(
            &RankedSearchQuery::new()
                .with_text(&format!("lesson about {}", concept))
                .with_types([ThoughtType::LessonLearned])
                .with_concepts_any([concept.as_str()]).with_limit(1)
        )?;
        if !existing.hits.is_empty() { continue; }

        // There is no add_relation call: the Corrects edges to every mistake
        // in the cluster are attached to the lesson as it is appended.
        chain.append_thought(agent_id,
            ThoughtInput::new(ThoughtType::LessonLearned,
                format!("Recurring issue in `{}` ({} this week). \
                         See linked mistakes.", concept, refs.len()))
                .with_concepts([concept.as_str()]).with_importance(0.9)
                .with_tags(["lesson", "scope:user", "digest:weekly"])
                .with_role(ThoughtRole::Retrospective)
                .with_confidence(0.7).with_refs(refs)
                .with_relations(relations))?;
        promoted += 1;
    }
    Ok(promoted)
}
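The loop above only skips concepts that already have a lesson, while the prose calls for boosting the existing lesson instead. A minimal sketch of that three-way decision, kept free of MentisDB calls: `DigestAction`, `digest_action`, and the `existing_confidence` input are hypothetical; the 2-occurrence threshold is the one used above; the +0.1 per mistake and the 1.0 cap follow the Production notes. How a boost is persisted on an append-only chain depends on your setup and is left open here.

```rust
/// Minimum mistakes per concept before the digest treats it as a pattern.
pub const MIN_OCCURRENCES: usize = 2;
/// Confidence added per re-confirming mistake.
pub const REINFORCE: f32 = 0.1;

#[derive(Debug, PartialEq)]
pub enum DigestAction {
    /// One-off: leave the mistakes alone.
    Skip,
    /// New pattern: write a LessonLearned linked to every mistake.
    Promote { mistake_indices: Vec<u32> },
    /// A lesson exists: raise its confidence instead of duplicating it.
    Boost { new_confidence: f32 },
}

/// Decide what the digest should do for one concept cluster.
/// `existing_confidence` is the confidence of a lesson found by the
/// search-first query, if there was one.
pub fn digest_action(mistake_indices: &[u32], existing_confidence: Option<f32>) -> DigestAction {
    let n = mistake_indices.len();
    if n < MIN_OCCURRENCES {
        return DigestAction::Skip;
    }
    match existing_confidence {
        // +0.1 per re-confirming mistake, capped at 1.0.
        Some(c) => DigestAction::Boost { new_confidence: (c + REINFORCE * n as f32).min(1.0) },
        None => DigestAction::Promote { mistake_indices: mistake_indices.to_vec() },
    }
}
```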

Run on a schedule (cron, or at end of every long session). The output is a LessonLearned per concept cluster, with Corrects edges pointing at every individual mistake. Future agents searching for "what do I know about X" find the digest first, then drill into the underlying mistakes on demand.

Production notes

Promotion cadence. Not every Mistake deserves a LessonLearned. A mistake that broke the task at hand gets its Mistake and Correction immediately, with the lesson deferred to the digest. A mistake in a concept with 2+ occurrences is promoted by the weekly digest. A one-off typo, a transient network error, or a misunderstanding of the user's prompt is never promoted.
Confidence decay. Every LessonLearned carries a confidence score. Start at 0.6, add 0.1 per re-confirmation by a new mistake in the same cluster, subtract 0.15 if a newer LessonLearned contradicts it (record a Supersedes edge from the newer lesson), and cap at 1.0. When retrieving, apply a soft floor of confidence >= 0.5: lower-confidence lessons still exist on the durable, append-only chain; they just don't pollute retrieval.
Tagging for scope. Tag Mistake thoughts with scope:session and LessonLearned thoughts with scope:user, matching the scope pattern from chapter 1.1.
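The confidence rule can be written out as plain arithmetic. A minimal sketch: `Evidence`, `lesson_confidence`, and `retrievable` are hypothetical names; clamping at 0.0 is an added assumption (the notes only specify the 1.0 cap); and the cap is applied after every step, so the order of events matters.

```rust
/// One piece of evidence about an existing LessonLearned.
#[derive(Debug, Clone, Copy)]
pub enum Evidence {
    /// A new mistake in the same concept cluster re-confirmed the lesson.
    Reconfirmed,
    /// A newer LessonLearned contradicted it (and supersedes it).
    Contradicted,
}

pub const START: f32 = 0.6;
pub const RETRIEVAL_FLOOR: f32 = 0.5;

/// Fold the evidence history into a score: start at 0.6, +0.1 per
/// re-confirmation, -0.15 per contradiction, capped at 1.0 after each
/// step (and, as an assumption, never below 0.0).
pub fn lesson_confidence(history: &[Evidence]) -> f32 {
    history.iter().fold(START, |c, e| {
        let next = match e {
            Evidence::Reconfirmed => c + 0.1,
            Evidence::Contradicted => c - 0.15,
        };
        next.clamp(0.0, 1.0)
    })
}

/// Lessons below the floor stay on the append-only chain but are filtered
/// out of retrieval.
pub fn retrievable(history: &[Evidence]) -> bool {
    lesson_confidence(history) >= RETRIEVAL_FLOOR
}
```

For example, five re-confirmations saturate at 1.0, and a single contradiction then brings the lesson to 0.85; a brand-new lesson that is contradicted once drops to 0.45 and falls below the floor.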

Anti-patterns

Apologizing without recording

The most common mistake. The agent says "Sorry, you're right" and moves on. Nothing was persisted. The same mistake will recur next week. Rule: an apology without a Mistake append is a bug in the agent. The mirror failure — writing Mistake + Correction thoughts every time something fails, but never synthesizing a LessonLearned — turns the chain into a forensic log, useless for generalization. Run the digest. Promote ruthlessly.
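Both failure modes can be caught by cheap checks in the agent harness. A hypothetical sketch: the apology phrases are illustrative, `appended_mistake` is assumed to come from whatever per-turn record your harness keeps of appended thought types, and the `threshold` for the forensic-log check is an arbitrary parameter, not a recommended value.

```rust
/// Illustrative apology phrases (lowercase); extend for your agent's voice.
const APOLOGY_PHRASES: [&str; 5] = ["sorry", "you're right", "you are right", "my mistake", "apologies"];

/// Post-turn check: true when the reply apologizes but the turn appended
/// no Mistake thought, which the rule above treats as an agent bug.
pub fn unrecorded_apology(reply: &str, appended_mistake: bool) -> bool {
    let reply = reply.to_lowercase();
    let apologized = APOLOGY_PHRASES.iter().any(|p| reply.contains(p));
    apologized && !appended_mistake
}

/// The mirror failure: at least `threshold` Mistake thoughts recorded and
/// no LessonLearned synthesized from them.
pub fn forensic_log_only(mistakes: usize, lessons: usize, threshold: usize) -> bool {
    mistakes >= threshold && lessons == 0
}
```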

Promoting without verifying, or recording every transient error

Two related failure modes. Promoting without verifying: a LessonLearned is synthesized from a Correction that was itself wrong, so the lesson is now confidently wrong with a confidence: 0.7 badge, and every future agent that retrieves it will be misled. Recording every transient error: a flaky network blip isn't a lesson, but the digest's "2+ occurrences" filter will eventually promote it to a bogus LessonLearned about network reliability. The fix for both: search-first (chapter 0.4) before promoting, and only record Mistake thoughts for failures the agent had control over or that the user explicitly corrected.
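The two guards can be made explicit. A hypothetical sketch: `FailureSource` is an illustrative classification the harness would have to supply, and `correction_verified` stands for whatever check your workflow uses (user confirmation, a passing retry). The sketch only encodes the rules stated above: record failures the agent controlled or the user explicitly corrected, and promote only a verified correction when search-first finds no existing lesson.

```rust
/// Where a failure came from, as classified by the harness.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FailureSource {
    /// Wrong parameters, wrong ordering, wrong assumption: agent-controlled.
    AgentDecision,
    /// The user explicitly said the agent was wrong.
    UserCorrection,
    /// Timeouts, flaky network, rate limits: nothing to learn.
    Transient,
}

/// Only agent-controlled failures and explicit user corrections become
/// Mistake thoughts; transient errors never reach the digest.
pub fn should_record_mistake(source: FailureSource) -> bool {
    matches!(source, FailureSource::AgentDecision | FailureSource::UserCorrection)
}

/// Promotion guard: a LessonLearned may be written only from a verified
/// Correction, and only when search-first found no existing lesson.
pub fn may_promote(correction_verified: bool, existing_lesson_found: bool) -> bool {
    correction_verified && !existing_lesson_found
}
```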

Testing this pattern

A minimal test verifies all three relations exist with the right endpoints:

#[test]
fn mistake_triple_creates_correct_graph() {
    // test_chain(): your fixture that builds an empty, writable chain.
    let mut chain = test_chain();
    chain.upsert_agent("executor", None, None, None, None).unwrap();
    // Assumes append_thought returns the appended thought; its index is the step.
    let plan = chain.append_thought("executor",
        ThoughtInput::new(ThoughtType::Plan, "Test plan")).unwrap().index;
    let rec = record_mistake_triple(&mut chain, "executor", plan,
        "Used SELECT * in production query",
        "Use explicit column list; SELECT * hides schema changes",
        "Always use explicit columns in production SQL.",
        &["sql", "postgres"]).unwrap();
    // outbound_relations / target_index are assumed accessors that resolve a
    // thought's relations to chain indices; adapt to your MentisDB version.
    let has = |from, kind, to| chain.outbound_relations(from).iter()
        .any(|r| r.kind == kind && r.target_index == to);
    assert!(has(rec.lesson_index, ThoughtRelationKind::Corrects, rec.mistake_index));
    assert!(has(rec.lesson_index, ThoughtRelationKind::DerivedFrom, rec.correction_index));
    assert!(has(rec.correction_index, ThoughtRelationKind::Corrects, rec.mistake_index));
}

What's next

Mistake memory closes the loop on what the agent did wrong, but doesn't compress the chain. Part 2 starts with 2.1 Semantic Compression: when and how to roll a thousand Subgoal thoughts into a single Summary the agent can actually re-read.