1.2 Semantic Fact Extraction
The problem
Your agent has just read 400 lines of Slack from a product sync. Inside that thread are durable facts: a deadline slipped, a constraint was agreed upon, a decision was made. The agent cannot paste the whole thread into a single memory thought — that would be a 6,000-token "blob" with no type, no importance score, and no way to find the one fact that matters in six weeks.
The fix is semantic fact extraction: an LLM reads the raw text, splits it into individually meaningful facts, classifies each one as Decision, Constraint, Insight, PreferenceUpdate, or Mistake, and emits a list of typed ThoughtInput records. The agent then reviews them before appending, using the search-first discipline to filter out duplicates and attach proper provenance.
Why it's hard
The LLM invents deadlines, misattributes quotes, and summarizes opinions
as decisions (hallucination). It over-splits or under-splits (granularity).
It blurs Insight and Decision (type confusion).
Two adjacent thoughts will say the same thing in different words
(duplication). And every thought needs to remember which source document
it came from (provenance). Review is what keeps the chain from
accumulating garbage.
The pattern: extract → review → append
Three components, each with a single responsibility:
- Extractor: takes raw text plus a prompt template, returns Vec<ThoughtInput>. Stateless. Easy to swap (OpenAI, local model, regex heuristic).
- Reviewer: runs query_ranked against the chain, classifies each candidate as duplicate, related, or new, and suggests relations.
- Applier: for each surviving candidate, optionally asks a human (or a stricter model) to approve, then appends with proper refs and relations.
The extractor never writes to the chain. The applier never generates candidates; any approval step it runs only accepts or rejects what the extractor produced. This separation is what keeps the system debuggable.
Implementation
1. The extractor (OpenAI-compatible)
The crate ships an opt-in LLM extractor behind the llm-extraction feature flag. It calls an OpenAI-compatible chat completion API, asks the model for typed JSON, and validates the response. The wrapper is thin; the interesting logic is in the prompt and the reviewer.
use mentisdb::{LlmExtractionConfig, MentisDb, ThoughtInput};

pub struct Extractor { config: LlmExtractionConfig }

impl Extractor {
    pub fn from_env() -> Result<Self, mentisdb::LlmExtractionError> {
        Ok(Self { config: LlmExtractionConfig::from_env()? })
    }

    /// Run the extraction. Returns the candidate thoughts; does NOT append.
    pub async fn extract(
        &self,
        chain: &MentisDb,
        raw_text: &str,
    ) -> Result<Vec<ThoughtInput>, mentisdb::LlmExtractionError> {
        let result = chain.extract_memories(raw_text, &self.config).await?;
        Ok(result.thoughts)
    }
}
2. The prompt template
A narrow template with type definitions baked in works dramatically better than a generic "extract memories" prompt.
const PRODUCT_SYNC_PROMPT: &str = r#"
You extract durable product decisions from meeting transcripts.
Emit one JSON object per distinct fact. Use ONLY these thought_type values:
- Decision: an irreversible choice the team committed to (e.g. "ship X by Q3")
- Constraint: a binding rule (e.g. "must work offline", "no new vendors")
- Insight: a non-obvious finding backed by evidence in the text
- PreferenceUpdate: a user or stakeholder preference that affects future work
- Mistake: a past action the team explicitly flagged as wrong
Rules:
1. Each thought must be a single factual statement — no compound thoughts.
2. Include the speaker's name in the content when attributing an opinion.
3. Confidence < 0.7 means "I am guessing" — set it low rather than fabricating.
4. Skip routine status updates and unanswered questions.
5. Return {"thoughts": [...]}. Empty array if no durable facts are present.
Text to analyze:
{{text}}
"#;
3. The reviewer (search-first dedup + relation suggestion)
use mentisdb::{MentisDb, RankedSearchQuery, ThoughtInput};

pub enum ReviewVerdict {
    /// Already captured by a near-duplicate thought. Drop the candidate.
    Duplicate { existing_index: u32, similarity: f32 },
    /// Genuinely new. Append with these relations.
    New { suggested_refs: Vec<u32> },
    /// Related to an existing thought but distinct. Append with DerivedFrom.
    Related { to: u32, similarity: f32 },
}

pub struct Reviewer {
    duplicate_threshold: f32,
    related_threshold: f32,
}

impl Reviewer {
    pub fn new() -> Self {
        Self { duplicate_threshold: 0.92, related_threshold: 0.65 }
    }

    pub fn review(&self, chain: &MentisDb, candidate: &ThoughtInput) -> ReviewVerdict {
        let hits = chain.query_ranked(
            &RankedSearchQuery::new()
                .with_text(&candidate.content)
                .with_limit(5)
                .with_min_score(0.3)
        ).hits;

        let Some(top) = hits.first() else {
            return ReviewVerdict::New { suggested_refs: vec![] };
        };

        if top.score >= self.duplicate_threshold {
            return ReviewVerdict::Duplicate {
                existing_index: top.thought.index as u32,
                similarity: top.score,
            };
        }

        if top.score >= self.related_threshold {
            return ReviewVerdict::Related {
                to: top.thought.index as u32,
                similarity: top.score,
            };
        }

        // New, but attach the top two as suggested refs for the graph.
        let suggested_refs: Vec<u32> = hits.iter().take(2)
            .map(|h| h.thought.index as u32).collect();
        ReviewVerdict::New { suggested_refs }
    }
}
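The threshold logic in the reviewer can be checked in isolation, without a chain. The sketch below pulls it into a pure function; the Verdict enum and the classify function are hypothetical stand-ins for illustration, not part of the crate. Both comparisons are inclusive, so a score exactly at a threshold falls into the stricter bucket.

```rust
/// Hypothetical stand-in for the three review outcomes (labels only, no indices).
#[derive(Debug, PartialEq)]
pub enum Verdict {
    Duplicate,
    Related,
    New,
}

/// Classify the top hit's similarity score.
/// `None` means the search returned no hits at all.
pub fn classify(
    top_score: Option<f32>,
    duplicate_threshold: f32,
    related_threshold: f32,
) -> Verdict {
    match top_score {
        // At or above the duplicate threshold: drop the candidate.
        Some(s) if s >= duplicate_threshold => Verdict::Duplicate,
        // At or above the related threshold: append, linked to the hit.
        Some(s) if s >= related_threshold => Verdict::Related,
        // Weak hit or no hit: treat as new.
        _ => Verdict::New,
    }
}
```

With the defaults above (0.92 and 0.65), a score of 0.80 is classified as related and a score of 0.40 as new. Keeping this logic pure makes it cheap to tune thresholds against a labelled sample before touching a real chain.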
4. The applier
use mentisdb::{MentisDb, ThoughtInput, ThoughtRelation, ThoughtRelationKind};

pub struct ExtractionApplier { reviewer: Reviewer }

impl ExtractionApplier {
    pub fn new() -> Self { Self { reviewer: Reviewer::new() } }

    /// Apply a batch of extracted thoughts. Returns the indices that
    /// were actually appended (duplicates are skipped).
    pub fn apply(
        &self,
        chain: &mut MentisDb,
        agent_id: &str,
        source_tag: &str,
        candidates: Vec<ThoughtInput>,
    ) -> mentisdb::io::Result<Vec<u32>> {
        let mut appended = Vec::new();
        for mut candidate in candidates {
            candidate.tags.push(format!("source:{}", source_tag));
            match self.reviewer.review(chain, &candidate) {
                ReviewVerdict::Duplicate { existing_index, similarity } => {
                    eprintln!("skip duplicate (sim={:.2}) of #{}: {}",
                        similarity, existing_index, candidate.content);
                }
                ReviewVerdict::Related { to, .. } => {
                    let rel = build_relation(chain, to, ThoughtRelationKind::DerivedFrom)?;
                    let thought = chain.append_thought(agent_id,
                        candidate.with_refs(vec![to]).with_relations(vec![rel]))?;
                    appended.push(thought.index as u32);
                }
                ReviewVerdict::New { suggested_refs } => {
                    let input = if suggested_refs.is_empty() {
                        candidate
                    } else { candidate.with_refs(suggested_refs) };
                    let thought = chain.append_thought(agent_id, input)?;
                    appended.push(thought.index as u32);
                }
            }
        }
        Ok(appended)
    }
}

fn build_relation(
    chain: &MentisDb,
    target_index: u32,
    kind: ThoughtRelationKind,
) -> mentisdb::io::Result<ThoughtRelation> {
    let target = chain.get_thought_by_index(target_index as u64)
        .expect("target thought must exist");
    Ok(ThoughtRelation::new(kind, target.id))
}
5. End-to-end pipeline on a real transcript
use mentisdb::{BinaryStorageAdapter, MentisDb};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let dir = tempfile::tempdir()?;
    let mut chain = MentisDb::open_with_storage(Box::new(
        BinaryStorageAdapter::for_chain_key(dir.path(), "team-brain")
    ))?;
    chain.upsert_agent("ingester", Some("Meeting Ingester"),
        Some("engineering"),
        Some("Extracts durable facts from transcripts"), None)?;

    let transcript = "\
        [14:02] Priya: Ship offline mode by end of Q3. Board committed.\n\
        [14:04] Marcus: No new vendors — approved list only. Legal flagged.\n\
        [14:08] Lin: 2.1 broke offline sync for 12% of users. Rolled back Friday.\n\
        [14:09] Marcus: Lesson: never ship a sync change without a canary first.";

    let extractor = Extractor::from_env()?;
    let candidates = extractor.extract(&chain, transcript).await?;

    let applier = ExtractionApplier::new();
    let appended = applier.apply(&mut chain, "ingester",
        "sync:2026-06-08-product-sync", candidates)?;
    println!("appended {} thoughts (rest were duplicates)", appended.len());
    Ok(())
}
The local-only fallback
Not every environment has an API key. For air-gapped deployments and tests,
a regex-and-keyword extractor gets you 60-70% of the LLM's quality with
zero network calls. The same ExtractionApplier works with any
ThoughtInput producer.
use mentisdb::{ThoughtInput, ThoughtType};

pub fn extract_heuristic(text: &str) -> Vec<ThoughtInput> {
    text.lines().filter_map(|line| {
        let lower = line.to_lowercase();
        let ttype = if lower.contains("we decided") ||
            (lower.contains("ship") && lower.contains("by")) {
            ThoughtType::Decision
        } else if lower.contains("must ") || lower.contains("no new ") {
            ThoughtType::Constraint
        // `contains`, not `starts_with`: transcript lines begin with a
        // timestamp and speaker name, so "lesson:" is never at the start.
        } else if lower.contains("lesson:") {
            ThoughtType::Mistake
        } else if lower.contains("%") {
            ThoughtType::Insight
        } else { return None };
        // Fixed mid-range scores: the heuristic cannot judge durability or certainty.
        Some(ThoughtInput::new(ttype, line.trim().to_string())
            .with_importance(0.6).with_confidence(0.5))
    }).collect()
}
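One way to make the "any producer" idea explicit is a small trait that both the LLM extractor and the heuristic could implement. The sketch below is an assumption for illustration: the FactExtractor trait, the Candidate struct (a reduced stand-in for ThoughtInput), and both implementations are hypothetical names, and the trait is synchronous for brevity whereas the LLM-backed extractor shown earlier is async.

```rust
/// Hypothetical stand-in for ThoughtInput, reduced to the fields used here.
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub kind: &'static str,
    pub content: String,
}

/// Hypothetical trait: anything that turns raw text into candidate thoughts.
pub trait FactExtractor {
    fn extract(&self, text: &str) -> Vec<Candidate>;
}

/// Keyword-based extractor with no network calls.
pub struct KeywordExtractor;

impl FactExtractor for KeywordExtractor {
    fn extract(&self, text: &str) -> Vec<Candidate> {
        text.lines()
            .filter_map(|line| {
                let lower = line.to_lowercase();
                let kind = if lower.contains("must ") || lower.contains("no new ") {
                    "Constraint"
                } else if lower.contains("lesson:") {
                    "Mistake"
                } else {
                    // Lines matching no keyword are skipped.
                    return None;
                };
                Some(Candidate { kind, content: line.trim().to_string() })
            })
            .collect()
    }
}

/// Canned extractor for tests: returns a fixed list regardless of input.
pub struct FixedExtractor(pub Vec<Candidate>);

impl FactExtractor for FixedExtractor {
    fn extract(&self, _text: &str) -> Vec<Candidate> {
        self.0.clone()
    }
}

/// Downstream code depends only on the trait, not on a concrete extractor.
pub fn count_candidates(extractor: &dyn FactExtractor, text: &str) -> usize {
    extractor.extract(text).len()
}
```

With this shape, swapping the production extractor for a canned one in tests is a one-line change at the call site, and the review and apply steps stay untouched.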
Production notes
Rate limits and batching
Most LLM providers cap you at 60-500 requests per minute. Call the extractor once on the full text and let the model do the splitting — do not make one request per paragraph. If the text is too long, chunk by semantic boundary and run extractions in parallel, then de-duplicate the union through the reviewer.
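A minimal chunking sketch, under two stated assumptions: blank lines are treated as the semantic boundary, and a byte budget stands in for the model's token limit. The function name chunk_by_paragraph is hypothetical. Each returned chunk would then go through the extractor independently, with the combined candidates passed through the reviewer.

```rust
/// Pack paragraphs (separated by blank lines) into chunks of at most
/// `max_bytes` bytes. A single paragraph larger than the budget is kept
/// whole as its own chunk rather than being cut mid-sentence.
pub fn chunk_by_paragraph(text: &str, max_bytes: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();

    for para in text.split("\n\n").map(str::trim).filter(|p| !p.is_empty()) {
        // Size of the current chunk if this paragraph were appended;
        // the +2 is the blank-line separator re-inserted between paragraphs.
        let needed = if current.is_empty() {
            para.len()
        } else {
            current.len() + 2 + para.len()
        };

        // Close the current chunk when the next paragraph would overflow it.
        if !current.is_empty() && needed > max_bytes {
            chunks.push(std::mem::take(&mut current));
        }
        if !current.is_empty() {
            current.push_str("\n\n");
        }
        current.push_str(para);
    }

    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

Splitting only at paragraph boundaries keeps each speaker turn or topic intact, which matters because a fact cut in half tends to be extracted twice or not at all.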
Cost estimation
At gpt-4o pricing (June 2026) the extraction prompt is roughly
400 tokens of overhead plus the input. A 2,000-word transcript costs about
$0.005 per extraction; 50 daily meetings cost roughly $7/month. Switch to
gpt-4o-mini if you do not need the discrimination quality of
the larger model.
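The arithmetic can be sketched as two small functions. Both function names are hypothetical, and the per-million-token prices are parameters the reader must take from the provider's current price list; no real price is assumed here. Using the figures above, 50 extractions a day at $0.005 each over a 30-day month comes to $7.50, consistent with the rough monthly estimate.

```rust
/// Cost of one extraction in USD, given token counts and prices quoted
/// per million tokens. Prices are inputs: look them up, do not hard-code.
pub fn extraction_cost_usd(
    input_tokens: u32,
    output_tokens: u32,
    input_price_per_million: f64,
    output_price_per_million: f64,
) -> f64 {
    f64::from(input_tokens) / 1_000_000.0 * input_price_per_million
        + f64::from(output_tokens) / 1_000_000.0 * output_price_per_million
}

/// Monthly spend in USD: per-extraction cost x daily volume x days.
pub fn monthly_cost_usd(cost_per_extraction: f64, extractions_per_day: u32, days: u32) -> f64 {
    cost_per_extraction * f64::from(extractions_per_day) * f64::from(days)
}
```

Because output tokens are typically priced higher than input tokens, a prompt that over-extracts raises cost as well as noise.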
Confidence, importance, and source tags
The extractor sets confidence (how sure the LLM is about
this fact), importance (how durable it is), and
tags (the applier extends with a source: tag
pointing back to the source document). Use them as automatic gates —
anything below 0.5 confidence goes to a human-review queue.
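A confidence gate can be as small as a partition over the candidate list. In this sketch, ScoredCandidate is a hypothetical stand-in for ThoughtInput carrying only the fields needed, and gate_by_confidence is an illustrative name; the 0.5 cutoff is passed in rather than hard-coded.

```rust
/// Hypothetical stand-in for an extracted thought with its confidence score.
#[derive(Debug, Clone, PartialEq)]
pub struct ScoredCandidate {
    pub content: String,
    pub confidence: f32,
}

/// Split candidates into (auto_apply, needs_human_review).
/// Candidates with confidence below `min_confidence` go to review.
pub fn gate_by_confidence(
    candidates: Vec<ScoredCandidate>,
    min_confidence: f32,
) -> (Vec<ScoredCandidate>, Vec<ScoredCandidate>) {
    candidates
        .into_iter()
        .partition(|c| c.confidence >= min_confidence)
}
```

Only the first list would be handed to the applier; the second would be queued for a person to accept, edit, or discard.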
Pitfalls
Hallucinated facts
The single biggest failure mode. The model will confidently state "Priya committed to offline mode by July 15" when the transcript said "end of Q3." Mitigations: include the source date in the prompt; for high-stakes extractions, run the model twice with two prompts and keep only thoughts that match; surface low-confidence candidates to a human-review UI instead of appending them.
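The two-run check can be sketched as a set intersection over normalized text. This is a deliberately strict stand-in: it matches only thoughts whose wording agrees after lowercasing and punctuation removal, whereas a real pipeline would more likely compare by similarity score. The function names are hypothetical.

```rust
use std::collections::HashSet;

/// Lowercase, drop punctuation, and collapse runs of whitespace.
fn normalize(s: &str) -> String {
    s.chars()
        .filter(|c| c.is_alphanumeric() || c.is_whitespace())
        .collect::<String>()
        .to_lowercase()
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
}

/// Keep the thoughts from `run_a` whose normalized text also appears in
/// `run_b`. Order and original wording of `run_a` are preserved.
pub fn keep_agreed(run_a: &[String], run_b: &[String]) -> Vec<String> {
    let seen: HashSet<String> = run_b.iter().map(|s| normalize(s)).collect();
    run_a
        .iter()
        .filter(|s| seen.contains(&normalize(s)))
        .cloned()
        .collect()
}
```

A fabricated detail such as a specific date is unlikely to be invented identically by two different prompts, so it drops out of the intersection while the fact both runs agree on survives.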
Duplicate extractions and missing provenance
The reviewer catches near-duplicates with the duplicate_threshold
similarity score — but only if you wired it in. The most common bug is
calling chain.extract_memories(...).await and appending the
result directly. Always go through the applier. The same rule applies to
source: tags: if you forget them, you cannot answer "why is
this in the chain?" six months later.
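A small guard can make missing provenance hard to miss. The two helpers below are hypothetical and operate on a plain list of tag strings; they assume the source:<id> tag format used by the applier above.

```rust
/// Add a `source:<id>` tag unless an identical one is already present.
/// Returns true if a tag was added.
pub fn ensure_source_tag(tags: &mut Vec<String>, source_id: &str) -> bool {
    let tag = format!("source:{source_id}");
    if tags.iter().any(|t| t == &tag) {
        return false;
    }
    tags.push(tag);
    true
}

/// True if at least one tag records where the thought came from.
pub fn has_provenance(tags: &[String]) -> bool {
    tags.iter().any(|t| t.starts_with("source:"))
}
```

Checking has_provenance immediately before every append, and refusing to append when it is false, turns a forgotten tag into an immediate error instead of an unanswerable question months later.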
Type confusion and over-extraction
An LLM will happily call everything Insight because it
sounds smart. Tighten the prompt: "Decision: an irreversible
commitment by a named stakeholder. Insight: a finding backed by data,
not a commitment." A 2,000-word transcript should produce 3-8
durable thoughts, not 30. If you see 20+ candidates, the prompt is too
greedy — the "skip routine status updates" rule is what keeps the count
sane.
Testing this pattern
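Use the heuristic extractor (or a mocked LLM response) so the test stays hermetic. The test here checks that a near-duplicate candidate is skipped while a distinct one is appended; verifying relation attachment would need a further assertion on the appended thought's refs and relations: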
#[test]
fn applier_skips_duplicates() {
    // `test_chain()` is assumed to be a local test helper that returns an
    // empty chain; it is not defined in this section.
    let mut chain = test_chain();
    chain.upsert_agent("ingester", None, None, None, None).unwrap();
    chain.append_thought("ingester",
        ThoughtInput::new(ThoughtType::Decision, "Ship offline mode by end of Q3.")
            .with_importance(0.8)
    ).unwrap();

    let candidates = vec![
        // Near-duplicate of the seeded thought: expected to be skipped.
        ThoughtInput::new(ThoughtType::Decision, "Offline mode must be shipped by end of Q3.")
            .with_importance(0.7),
        // Distinct fact: expected to be appended.
        ThoughtInput::new(ThoughtType::Constraint, "No new vendors for the offline mode build.")
            .with_importance(0.8),
    ];

    let appended = ExtractionApplier::new()
        .apply(&mut chain, "ingester", "test-source", candidates)
        .unwrap();
    assert_eq!(appended.len(), 1, "duplicate should have been filtered");
}
What's next
Extraction fills the chain with what the agent learned. The next pattern, Multi-Agent Handoff, covers what happens when multiple agents share a chain and how to keep the search-first discipline enforced across a team.