The MentisDB Agent Memory Cookbook

Patterns and recipes for building AI agents that remember

1.2 Semantic Fact Extraction

The problem

Your agent has just read 400 lines of Slack from a product sync. Inside that thread are durable facts: a deadline slipped, a constraint was agreed upon, a decision was made. The agent should not paste the whole thread into a single memory thought. That would be a 6,000-token "blob" with no type, no importance score, and no way to find the one fact that matters six weeks from now.

The fix is semantic fact extraction: an LLM reads the raw text, splits it into individually-meaningful facts, classifies each one as Decision, Constraint, Insight, PreferenceUpdate, or Mistake, and emits a list of typed ThoughtInput records. The agent then reviews them before appending, using the search-first discipline to filter out duplicates and attach proper provenance.

The "always review" rule. LLM output is untrusted. The extractor is a proposer, not a writer. Every extracted thought must pass through the review loop (or a human, or both) before it gets appended. Extraction does not change that rule; it only makes the input noisier.

Why it's hard

The LLM invents deadlines, misattributes quotes, and summarizes opinions as decisions (hallucination). It over-splits or under-splits (granularity). It blurs Insight and Decision (type confusion). Two adjacent thoughts will say the same thing in different words (duplication). And every thought needs to remember which source document it came from (provenance). Review is what keeps the chain from accumulating garbage.

The pattern: extract → review → append

Three components, each with a single responsibility:

  1. Extractor — takes raw text plus a prompt template, returns Vec<ThoughtInput>. Stateless. Easy to swap (OpenAI, local model, regex heuristic).
  2. Reviewer — runs query_ranked against the chain, classifies each candidate as duplicate, related, or new, and suggests relations.
  3. Applier — for each surviving candidate, optionally asks a human (or a stricter model) to approve, then appends with proper refs and relations.

The extractor never writes to the chain. The applier never talks to an LLM. This separation is what keeps the system debuggable.
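The same separation can be expressed directly in types. Below is a minimal, chain-free sketch that assumes a plain Vec as the store and exact-text matching as a stand-in for the reviewer. Candidate, Proposer, FixedProposer, and apply are hypothetical names for illustration, not mentisdb APIs.

```rust
// Hypothetical, chain-free sketch of the three-role split. `Candidate`,
// `Proposer`, `FixedProposer`, and `apply` are illustrative, not mentisdb types.
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub thought_type: String,
    pub content: String,
}

/// The extractor role: turns raw text into candidates and never writes.
pub trait Proposer {
    fn propose(&self, raw_text: &str) -> Vec<Candidate>;
}

/// A canned proposer, standing in for an LLM or heuristic extractor.
pub struct FixedProposer(pub Vec<Candidate>);

impl Proposer for FixedProposer {
    fn propose(&self, _raw_text: &str) -> Vec<Candidate> {
        self.0.clone()
    }
}

/// The applier role: the only code that mutates the store. It receives
/// candidates as plain data and never calls a proposer itself.
pub fn apply(store: &mut Vec<Candidate>, candidates: Vec<Candidate>) -> usize {
    let mut appended = 0;
    for candidate in candidates {
        // Stand-in for the reviewer: skip exact-content duplicates, including
        // ones appended earlier in this same batch.
        if store.iter().any(|existing| existing.content == candidate.content) {
            continue;
        }
        store.push(candidate);
        appended += 1;
    }
    appended
}
```

Because the proposer only returns data, swapping an LLM for a canned list (as in tests) requires no change to the applier.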

Implementation

1. The extractor (OpenAI-compatible)

The crate ships an opt-in LLM extractor behind the llm-extraction feature flag. It calls an OpenAI-compatible chat completion API, asks the model for typed JSON, and validates the response. The wrapper is a one-liner — the interesting logic is the prompt and the reviewer.

use mentisdb::{LlmExtractionConfig, MentisDb, ThoughtInput};

pub struct Extractor { config: LlmExtractionConfig }

impl Extractor {
    pub fn from_env() -> Result<Self, mentisdb::LlmExtractionError> {
        Ok(Self { config: LlmExtractionConfig::from_env()? })
    }

    /// Run the extraction. Returns the candidate thoughts; does NOT append.
    pub async fn extract(
        &self,
        chain: &MentisDb,
        raw_text: &str,
    ) -> Result<Vec<ThoughtInput>, mentisdb::LlmExtractionError> {
        let result = chain.extract_memories(raw_text, &self.config).await?;
        Ok(result.thoughts)
    }
}
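The recipe does not show what the crate's own response validation checks. As a sketch of the kind of checks worth running on each parsed candidate before it reaches the reviewer, assuming the JSON has already been decoded into a plain struct (RawCandidate, Rejection, validate, and ALLOWED_TYPES are hypothetical names; the type list mirrors the prompt template):

```rust
// Hypothetical post-parse checks for one candidate. Field names mirror the
// prompt's JSON and are assumptions, not mentisdb's schema.
pub const ALLOWED_TYPES: [&str; 5] =
    ["Decision", "Constraint", "Insight", "PreferenceUpdate", "Mistake"];

#[derive(Debug)]
pub struct RawCandidate {
    pub thought_type: String,
    pub content: String,
    pub confidence: f32,
    pub importance: f32,
}

#[derive(Debug, PartialEq)]
pub enum Rejection {
    UnknownType(String),
    EmptyContent,
    OutOfRange(&'static str),
}

pub fn validate(c: &RawCandidate) -> Result<(), Rejection> {
    // Reject any thought_type the prompt did not allow.
    if !ALLOWED_TYPES.contains(&c.thought_type.as_str()) {
        return Err(Rejection::UnknownType(c.thought_type.clone()));
    }
    if c.content.trim().is_empty() {
        return Err(Rejection::EmptyContent);
    }
    // Range `contains` is false for NaN, so NaN scores are rejected too.
    if !(0.0..=1.0).contains(&c.confidence) {
        return Err(Rejection::OutOfRange("confidence"));
    }
    if !(0.0..=1.0).contains(&c.importance) {
        return Err(Rejection::OutOfRange("importance"));
    }
    Ok(())
}
```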

2. The prompt template

A narrow template with type definitions baked in works dramatically better than a generic "extract memories" prompt.

const PRODUCT_SYNC_PROMPT: &str = r#"
You extract durable product decisions from meeting transcripts.
Emit one JSON object per distinct fact. Use ONLY these thought_type values:

- Decision: an irreversible choice the team committed to (e.g. "ship X by Q3")
- Constraint: a binding rule (e.g. "must work offline", "no new vendors")
- Insight: a non-obvious finding backed by evidence in the text
- PreferenceUpdate: a user or stakeholder preference that affects future work
- Mistake: a past action the team explicitly flagged as wrong

Rules:
1. Each thought must be a single factual statement — no compound thoughts.
2. Include the speaker's name in the content when attributing an opinion.
3. Confidence < 0.7 means "I am guessing" — set it low rather than fabricating.
4. Skip routine status updates and unanswered questions.
5. Return {"thoughts": [...]}. Empty array if no durable facts are present.

Text to analyze:
{{text}}
"#;
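This recipe does not show how a custom template is handed to the crate's extractor. If you render it yourself, a small helper like the one below fills the {{text}} placeholder and can optionally prepend the meeting date so the model can anchor relative deadlines such as "end of Q3". render_prompt and the "Source date:" line are hypothetical, not part of mentisdb.

```rust
// Hypothetical helper: fills `{{text}}` in a prompt template. Returns None if
// the template has no placeholder, so a broken template fails loudly.
pub fn render_prompt(template: &str, text: &str, source_date: Option<&str>) -> Option<String> {
    if !template.contains("{{text}}") {
        return None;
    }
    // Prepending the date gives the model a reference point for relative dates.
    let body = match source_date {
        Some(date) => format!("Source date: {date}\n{text}"),
        None => text.to_string(),
    };
    Some(template.replace("{{text}}", &body))
}
```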

3. The reviewer (search-first dedup + relation suggestion)

use mentisdb::{MentisDb, RankedSearchQuery, ThoughtInput};

pub enum ReviewVerdict {
    /// Already captured by a near-duplicate thought. Drop the candidate.
    Duplicate { existing_index: u32, similarity: f32 },
    /// Genuinely new. Append with these relations.
    New { suggested_refs: Vec<u32> },
    /// Related to an existing thought but distinct. Append with DerivedFrom.
    Related { to: u32, similarity: f32 },
}

pub struct Reviewer {
    duplicate_threshold: f32,
    related_threshold: f32,
}

impl Reviewer {
    pub fn new() -> Self {
        Self { duplicate_threshold: 0.92, related_threshold: 0.65 }
    }

    pub fn review(&self, chain: &MentisDb, candidate: &ThoughtInput) -> ReviewVerdict {
        let hits = chain.query_ranked(
            &RankedSearchQuery::new()
                .with_text(&candidate.content)
                .with_limit(5)
                .with_min_score(0.3)
        ).hits;

        let Some(top) = hits.first() else {
            return ReviewVerdict::New { suggested_refs: vec![] };
        };
        if top.score >= self.duplicate_threshold {
            return ReviewVerdict::Duplicate {
                existing_index: top.thought.index as u32,
                similarity: top.score,
            };
        }
        if top.score >= self.related_threshold {
            return ReviewVerdict::Related {
                to: top.thought.index as u32,
                similarity: top.score,
            };
        }
        // New, but attach the top two as suggested refs for the graph.
        let suggested_refs: Vec<u32> = hits.iter().take(2)
            .map(|h| h.thought.index as u32).collect();
        ReviewVerdict::New { suggested_refs }
    }
}
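The threshold logic is easier to tune when it can be tested without a search index. The sketch below pulls the same decision out into a pure function over (thought index, score) pairs sorted best-first; Verdict is a hypothetical mirror of ReviewVerdict, and the usage passes the 0.92 and 0.65 thresholds from Reviewer::new.

```rust
// Chain-free sketch of the reviewer's decision. `hits` are (index, score)
// pairs, best first. `Verdict` is a hypothetical mirror of ReviewVerdict.
#[derive(Debug, PartialEq)]
pub enum Verdict {
    Duplicate { existing_index: u32 },
    Related { to: u32 },
    New { suggested_refs: Vec<u32> },
}

pub fn classify(hits: &[(u32, f32)], duplicate_threshold: f32, related_threshold: f32) -> Verdict {
    // No hits at all: brand-new fact with nothing to link to.
    let Some(&(top_index, top_score)) = hits.first() else {
        return Verdict::New { suggested_refs: vec![] };
    };
    if top_score >= duplicate_threshold {
        Verdict::Duplicate { existing_index: top_index }
    } else if top_score >= related_threshold {
        Verdict::Related { to: top_index }
    } else {
        // New, but keep the two best weak matches as graph refs.
        Verdict::New { suggested_refs: hits.iter().take(2).map(|&(i, _)| i).collect() }
    }
}
```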

4. The applier

use mentisdb::{MentisDb, ThoughtInput, ThoughtRelation, ThoughtRelationKind};

pub struct ExtractionApplier { reviewer: Reviewer }

impl ExtractionApplier {
    pub fn new() -> Self { Self { reviewer: Reviewer::new() } }

    /// Apply a batch of extracted thoughts. Returns the indices that
    /// were actually appended (duplicates are skipped).
    pub fn apply(
        &self,
        chain: &mut MentisDb,
        agent_id: &str,
        source_tag: &str,
        candidates: Vec<ThoughtInput>,
    ) -> mentisdb::io::Result<Vec<u32>> {
        let mut appended = Vec::new();
        for mut candidate in candidates {
            candidate.tags.push(format!("source:{}", source_tag));
            match self.reviewer.review(chain, &candidate) {
                ReviewVerdict::Duplicate { existing_index, similarity } => {
                    eprintln!("skip duplicate (sim={:.2}) of #{}: {}",
                        similarity, existing_index, candidate.content);
                }
                ReviewVerdict::Related { to, .. } => {
                    let rel = build_relation(chain, to, ThoughtRelationKind::DerivedFrom)?;
                    let thought = chain.append_thought(agent_id,
                        candidate.with_refs(vec![to]).with_relations(vec![rel]))?;
                    appended.push(thought.index as u32);
                }
                ReviewVerdict::New { suggested_refs } => {
                    let input = if suggested_refs.is_empty() {
                        candidate
                    } else { candidate.with_refs(suggested_refs) };
                    let thought = chain.append_thought(agent_id, input)?;
                    appended.push(thought.index as u32);
                }
            }
        }
        Ok(appended)
    }
}

fn build_relation(
    chain: &MentisDb,
    target_index: u32,
    kind: ThoughtRelationKind,
) -> mentisdb::io::Result<ThoughtRelation> {
    let target = chain.get_thought_by_index(target_index as u64)
        .expect("target thought must exist");
    Ok(ThoughtRelation::new(kind, target.id))
}

5. End-to-end pipeline on a sample transcript

use mentisdb::{BinaryStorageAdapter, MentisDb};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let dir = tempfile::tempdir()?;
    let mut chain = MentisDb::open_with_storage(Box::new(
        BinaryStorageAdapter::for_chain_key(dir.path(), "team-brain")
    ))?;
    chain.upsert_agent("ingester", Some("Meeting Ingester"),
        Some("engineering"),
        Some("Extracts durable facts from transcripts"), None)?;

    let transcript = "\
[14:02] Priya: Ship offline mode by end of Q3. Board committed.\n\
[14:04] Marcus: No new vendors — approved list only. Legal flagged.\n\
[14:08] Lin: 2.1 broke offline sync for 12% of users. Rolled back Friday.\n\
[14:09] Marcus: Lesson: never ship a sync change without a canary first.";

    let extractor = Extractor::from_env()?;
    let candidates = extractor.extract(&chain, transcript).await?;
    let total = candidates.len();

    let applier = ExtractionApplier::new();
    let appended = applier.apply(&mut chain, "ingester",
        "sync:2026-06-08-product-sync", candidates)?;
    // The applier only skips duplicates, so the difference is the duplicate count.
    println!("appended {} of {} candidates (the rest were duplicates)",
        appended.len(), total);
    Ok(())
}

The local-only fallback

Not every environment has an API key. For air-gapped deployments and tests, a keyword-matching extractor gets you 60-70% of the LLM's quality with zero network calls. The same ExtractionApplier works with any ThoughtInput producer.

use mentisdb::{ThoughtInput, ThoughtType};

pub fn extract_heuristic(text: &str) -> Vec<ThoughtInput> {
    text.lines().filter_map(|line| {
        let lower = line.to_lowercase();
        let ttype = if lower.contains("we decided") ||
                      (lower.contains("ship") && lower.contains("by")) {
            ThoughtType::Decision
        } else if lower.contains("must ") || lower.contains("no new ") {
            ThoughtType::Constraint
        } else if lower.contains("lesson:") { // lines start with "[hh:mm] Name:", so match anywhere
            ThoughtType::Mistake
        } else if lower.contains("%") {
            ThoughtType::Insight
        } else { return None };
        Some(ThoughtInput::new(ttype, line.trim().to_string())
            .with_importance(0.6).with_confidence(0.5))
    }).collect()
}

Heuristic extraction will miss things. The point is to never block the pipeline on an external service.

Production notes

Rate limits and batching

Most LLM providers cap you at 60-500 requests per minute. Call the extractor once on the full text and let the model do the splitting — do not make one request per paragraph. If the text is too long, chunk by semantic boundary and run extractions in parallel, then de-duplicate the union through the reviewer.
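A minimal sketch of chunk-then-fan-out using only the standard library. Blank lines stand in for semantic boundaries, and extract is any closure that turns a chunk into candidate strings (an LLM call in practice; assumed here). chunk_by_paragraph and extract_all are hypothetical helpers, not mentisdb APIs.

```rust
// Hypothetical chunker: splits on blank lines and packs whole paragraphs into
// chunks of at most `max_bytes`. An oversized paragraph is kept whole.
pub fn chunk_by_paragraph(text: &str, max_bytes: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();
    for para in text.split("\n\n").map(str::trim).filter(|p| !p.is_empty()) {
        let needed = if current.is_empty() { para.len() } else { current.len() + 2 + para.len() };
        // Flush the current chunk if this paragraph would overflow it.
        if !current.is_empty() && needed > max_bytes {
            chunks.push(std::mem::take(&mut current));
        }
        if !current.is_empty() {
            current.push_str("\n\n");
        }
        current.push_str(para);
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}

// Runs `extract` on every chunk in parallel (one scoped thread per chunk) and
// returns the union in chunk order. De-duplication is left to the reviewer.
pub fn extract_all<F>(chunks: &[String], extract: F) -> Vec<String>
where
    F: Fn(&str) -> Vec<String> + Sync,
{
    let extract = &extract;
    std::thread::scope(|s| {
        let handles: Vec<_> = chunks
            .iter()
            .map(|chunk| s.spawn(move || extract(chunk.as_str())))
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("extractor thread panicked"))
            .collect()
    })
}
```

One thread per chunk ignores the provider's rate limit; for long inputs, cap concurrency before pointing this at a real API.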

Cost estimation

At gpt-4o pricing (June 2026) the extraction prompt is roughly 400 tokens of overhead plus the input. A 2,000-word transcript costs about $0.005 per extraction, so 50 meetings a day over a 30-day month comes to about $7.50 (50 × 30 × $0.005). Switch to gpt-4o-mini if you do not need the discrimination quality of the larger model.
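The arithmetic is worth keeping in code so it can be rerun when prices change. In this sketch prices are parameters in USD per million tokens rather than hard-coded values; the function names are hypothetical, and the prices in any usage are placeholders, not quotes for any model. With the figure above, monthly_cost_usd(0.005, 50, 30) gives $7.50.

```rust
// Back-of-the-envelope cost model. Prices are USD per million tokens and are
// passed in because provider pricing changes.
pub fn extraction_cost_usd(
    input_tokens: u64,
    output_tokens: u64,
    in_price_per_m: f64,
    out_price_per_m: f64,
) -> f64 {
    (input_tokens as f64 * in_price_per_m + output_tokens as f64 * out_price_per_m) / 1_000_000.0
}

// Monthly spend for a fixed number of extractions per day.
pub fn monthly_cost_usd(per_extraction_usd: f64, extractions_per_day: u32, days: u32) -> f64 {
    per_extraction_usd * extractions_per_day as f64 * days as f64
}
```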

Confidence, importance, and source tags

The extractor sets three fields on each candidate: confidence (how sure the LLM is about the fact), importance (how durable the fact is), and tags. The applier then adds a source: tag that points back to the source document. Use confidence as an automatic gate: anything below 0.5 goes to a human-review queue instead of straight to the chain.
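The confidence gate can be sketched as a single partition run before the applier. Scored is a hypothetical stand-in for ThoughtInput that carries only the fields the gate reads; the 0.5 cutoff is the one suggested above.

```rust
// Hypothetical gate: splits candidates into (auto-append, human-review).
// `Scored` stands in for ThoughtInput.
#[derive(Debug, Clone, PartialEq)]
pub struct Scored {
    pub content: String,
    pub confidence: f32,
}

pub fn gate_by_confidence(candidates: Vec<Scored>, min_confidence: f32) -> (Vec<Scored>, Vec<Scored>) {
    // `>=` is false for NaN, so a NaN confidence also lands in review.
    candidates.into_iter().partition(|c| c.confidence >= min_confidence)
}
```

Only the first vector should go to the applier; the second feeds whatever review UI you use.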

Pitfalls

Hallucinated facts

The single biggest failure mode. The model will confidently state "Priya committed to offline mode by July 15" when the transcript said "end of Q3." Mitigations: include the source date in the prompt; for high-stakes extractions, run the model twice with two prompts and keep only thoughts that match; surface low-confidence candidates to a human-review UI instead of appending them.
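A sketch of the "run twice, keep what matches" mitigation, assuming both runs have already produced candidate strings. Token-set Jaccard similarity is swapped in here as a cheap lexical matcher; it is not the reviewer's ranked search, and the 0.5 cutoff in the usage is an arbitrary example. jaccard and agreed are hypothetical helpers.

```rust
use std::collections::HashSet;

// Lowercased alphanumeric tokens of a string.
fn tokens(s: &str) -> HashSet<String> {
    s.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Token-set Jaccard similarity: |A ∩ B| / |A ∪ B|, 0.0 for two empty strings.
pub fn jaccard(a: &str, b: &str) -> f32 {
    let (ta, tb) = (tokens(a), tokens(b));
    let union = ta.union(&tb).count();
    if union == 0 {
        return 0.0;
    }
    ta.intersection(&tb).count() as f32 / union as f32
}

/// Keeps a candidate from run A only if some candidate from run B matches it.
pub fn agreed<'a>(run_a: &'a [String], run_b: &[String], min_overlap: f32) -> Vec<&'a String> {
    run_a
        .iter()
        .filter(|a| run_b.iter().any(|b| jaccard(a.as_str(), b.as_str()) >= min_overlap))
        .collect()
}
```

A hallucinated specific such as an invented date tends to lower the overlap, so it is dropped unless the second run invents the same detail.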

Duplicate extractions and missing provenance

The reviewer catches near-duplicates with the duplicate_threshold similarity score — but only if you wired it in. The most common bug is calling chain.extract_memories(...).await and appending the result directly. Always go through the applier. The same rule applies to source: tags: if you forget them, you cannot answer "why is this in the chain?" six months later.

Type confusion and over-extraction

An LLM will happily call everything Insight because it sounds smart. Tighten the prompt: "Decision: an irreversible commitment by a named stakeholder. Insight: a finding backed by data, not a commitment." A 2,000-word transcript should produce 3-8 durable thoughts, not 30. If you see 20+ candidates, the prompt is too greedy — the "skip routine status updates" rule is what keeps the count sane.
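Both symptoms can be checked on a batch before review. In this sketch the 20-candidate limit comes from the rule of thumb above, while max_insight_share is a hypothetical tuning knob you would choose yourself; neither is a mentisdb setting.

```rust
// Hypothetical batch check for over-extraction and Insight-heavy batches.
// `types` holds the thought_type of each candidate in one extraction batch.
pub fn batch_warnings(types: &[&str], max_candidates: usize, max_insight_share: f32) -> Vec<String> {
    let mut warnings = Vec::new();
    if types.len() >= max_candidates {
        warnings.push(format!("{} candidates; the prompt is probably too greedy", types.len()));
    }
    let insights = types.iter().filter(|t| **t == "Insight").count();
    if !types.is_empty() && insights as f32 / types.len() as f32 > max_insight_share {
        warnings.push(format!(
            "{insights} of {} candidates are Insight; check for type confusion",
            types.len()
        ));
    }
    warnings
}
```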

Testing this pattern

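Use the heuristic extractor (or a mocked LLM response) so the test stays hermetic. The test below verifies that the duplicate threshold filters a paraphrase of an existing thought while a distinct candidate is still appended. It does not check relation attachment; add an assertion on the appended thought's refs if you need that.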

use mentisdb::{ThoughtInput, ThoughtType};

#[test]
fn applier_skips_duplicates() {
    // test_chain() is a test helper (not shown) that opens a fresh chain,
    // e.g. a BinaryStorageAdapter in a tempfile::tempdir().
    let mut chain = test_chain();
    chain.upsert_agent("ingester", None, None, None, None).unwrap();
    chain.append_thought("ingester",
        ThoughtInput::new(ThoughtType::Decision, "Ship offline mode by end of Q3.")
            .with_importance(0.8)
    ).unwrap();

    // The first candidate paraphrases the existing thought; the test assumes
    // the ranker scores it at or above the 0.92 duplicate_threshold.
    let candidates = vec![
        ThoughtInput::new(ThoughtType::Decision, "Offline mode must be shipped by end of Q3.")
            .with_importance(0.7),
        ThoughtInput::new(ThoughtType::Constraint, "No new vendors for the offline mode build.")
            .with_importance(0.8),
    ];
    let appended = ExtractionApplier::new()
        .apply(&mut chain, "ingester", "test-source", candidates)
        .unwrap();
    assert_eq!(appended.len(), 1, "duplicate should have been filtered");
}

What's next

Extraction fills the chain with what the agent learned. The next pattern, Multi-Agent Handoff, covers what happens when multiple agents share a chain and how to keep the search-first discipline enforced across a team.