MentisDB now has an opt-in content ingestion workflow for turning free-form text into
structured memory candidates. The feature is called the
LLM-extracted memories pipeline. You give it raw text such as meeting
notes, handoff logs, incident writeups, interview transcripts, analyst notes, or agent
transcripts, and it returns typed ThoughtInput records that you can review
before appending.
This is deliberate reviewable ingestion, not blind auto-ingestion.
MentisDB does not automatically write extracted memories into your chain. It returns candidate thoughts, and you decide what should become durable memory.
That design choice matters. It keeps MentisDB aligned with its core model: durable, attributed, reviewable memory instead of opaque prompt stuffing. For teams that care about traceability, handoffs, and long-lived agent memory, that is usually the right tradeoff.
The ingestion pipeline accepts free-form text and asks an OpenAI-compatible model to map it into one or more MentisDB thought candidates. Each candidate includes:
- thought_type such as Decision, Question, or PreferenceUpdate
- content — a concise statement of the durable memory
- importance and confidence
- tags and concepts
The returned thoughts are valid MentisDB ThoughtInput values, but they are not yet
signed, attributed to a specific app-level agent append call, or stored durably. You still
choose what to append and under which agent identity.
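For intuition, the candidate shape can be mirrored with a local struct. This is a hypothetical illustration only, not the crate's actual definition; the real type is mentisdb::ThoughtInput, which as the REST examples show also carries fields such as role, refs, and relations.

```rust
// Hypothetical local mirror of the candidate fields described above --
// not the real mentisdb::ThoughtInput definition.
#[derive(Debug)]
enum ThoughtType {
    Decision,
    Question,
    PreferenceUpdate,
}

#[derive(Debug)]
struct CandidateThought {
    thought_type: ThoughtType,
    content: String,   // concise statement of the durable memory
    importance: f32,   // 0.0..=1.0, how much this should influence recall
    confidence: f32,   // 0.0..=1.0, how sure the model is
    tags: Vec<String>,
    concepts: Vec<String>,
}
```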
| Surface | Status | What it does |
|---|---|---|
| Rust API | Built in | MentisDb::extract_memories(...) returns ExtractionResult |
| REST | Built in | POST /v1/extract-memories |
| MCP | Built in | mentisdb_extract_memories for agentic workflows |
| CLI wrapper | Not yet | No dedicated mentisdbd ingest command today |
Rather than just reading the code, we ran the feature against a disposable local chain. We wanted to verify two important behaviors: that extraction returns usable candidate thoughts, and that nothing is appended to the chain as a side effect.
cargo build --bin mentisdbd
MENTISDB_DIR=/tmp/mentisdb-ingestion-live-2 \
MENTISDB_REST_PORT=19722 \
MENTISDB_MCP_PORT=19721 \
MENTISDB_HTTPS_MCP_PORT=0 \
MENTISDB_HTTPS_REST_PORT=0 \
MENTISDB_DASHBOARD_PORT=0 \
target/debug/mentisdbd
Then we posted sample text to /v1/extract-memories and checked
/v1/head immediately afterwards. The response returned three candidate thoughts,
while the chain head still showed thought_count: 0.
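That extract-then-verify check is easy to script. Below is a deliberately naive helper that pulls thought_count out of a /v1/head response by string search; it is for illustration only, and a real client should use a proper JSON parser.

```rust
/// Extract the numeric value of "thought_count" from a /v1/head JSON
/// response. Naive string scanning, suitable only for quick smoke tests.
fn thought_count(head_json: &str) -> Option<u64> {
    // Locate the key, then skip past it, the colon, and any whitespace.
    let idx = head_json.find("\"thought_count\"")?;
    let rest = &head_json[idx + "\"thought_count\"".len()..];
    let rest = rest.trim_start().strip_prefix(':')?.trim_start();
    // Collect the run of digits that follows and parse it.
    let digits: String = rest.chars().take_while(|c| c.is_ascii_digit()).collect();
    digits.parse().ok()
}
```

With this helper, the live test reduces to asserting that thought_count stays at 0 after extraction.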
Real bug found and fixed: while testing, we found that some OpenAI-compatible endpoints rejected the request when the client sent a response_format hint. MentisDB now relies on the prompt plus strict JSON validation instead, which is more portable across providers.
We also hit a more subtle issue that is not a code bug but a workflow reality: the model can
still misclassify a sentence. In one live test, a requirement sentence came back as
TaskComplete. That is exactly why this feature is review-first rather than auto-append.
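Because of that, it is worth putting a small gate in front of append. The sketch below is one illustrative policy, not MentisDB behavior: it models candidates as (thought_type, confidence) pairs and routes low-confidence or completion-typed candidates to human review.

```rust
/// Split candidates into (auto-approvable, needs-human-review).
/// The 0.9 threshold and the TaskComplete special case are illustrative
/// choices for this sketch, not MentisDB policy.
fn partition_for_review(
    candidates: Vec<(String, f32)>,
) -> (Vec<(String, f32)>, Vec<(String, f32)>) {
    candidates
        .into_iter()
        .partition(|(ty, conf)| *conf >= 0.9 && ty.as_str() != "TaskComplete")
}
```

A gate like this would have caught the misclassified requirement sentence above before it ever reached the chain.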
cargo install mentisdb
# or from the repo
cargo build --bin mentisdbd
export OPENAI_API_KEY="sk-..."
export LLM_BASE_URL="https://api.openai.com/v1"
export LLM_MODEL="gpt-4o"
OPENAI_API_KEY is required. LLM_BASE_URL and LLM_MODEL are optional.
If you leave them unset, MentisDB defaults to OpenAI's chat-completions URL and a
default model value.
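To make that precedence concrete, here is a stdlib-only sketch of the lookup logic. It takes an explicit map instead of reading the process environment so it is easy to test; the variable names come from this section, but the default model string is a placeholder, not MentisDB's actual default.

```rust
use std::collections::HashMap;

/// Hypothetical mirror of the lookups LlmExtractionConfig::from_env performs.
struct ExtractionEnv {
    api_key: String,
    base_url: String,
    model: String,
}

fn load_extraction_env(
    vars: &HashMap<String, String>,
) -> Result<ExtractionEnv, String> {
    // OPENAI_API_KEY is required; the other two fall back to defaults.
    let api_key = vars
        .get("OPENAI_API_KEY")
        .cloned()
        .ok_or_else(|| "OPENAI_API_KEY is not set".to_string())?;
    Ok(ExtractionEnv {
        api_key,
        base_url: vars
            .get("LLM_BASE_URL")
            .cloned()
            .unwrap_or_else(|| "https://api.openai.com/v1".to_string()),
        // Placeholder default; check your MentisDB version for the real one.
        model: vars
            .get("LLM_MODEL")
            .cloned()
            .unwrap_or_else(|| "gpt-4o".to_string()),
    })
}
```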
mentisdbd
The REST API defaults to http://127.0.0.1:9472. If you already use MentisDB through
Claude Desktop, OpenCode, Codex, or another MCP-capable host, you can also trigger ingestion
through the MCP tool instead of talking to REST directly.
The simplest direct test is a POST to /v1/extract-memories.
curl -sS -X POST "http://127.0.0.1:9472/v1/extract-memories" \
-H "Content-Type: application/json" \
-d '{
"chain_key": "content-ingestion-demo",
"text": "User prefers terse release notes. They asked whether the backup command flushes storage before archiving. We decided to document the flush behavior in the handbook."
}'
Typical response shape:
{
"thoughts": [
{
"thought_type": "PreferenceUpdate",
"role": "Memory",
"content": "User prefers terse release notes.",
"importance": 0.7,
"confidence": 1.0,
"tags": ["user", "preference", "release notes"],
"concepts": ["communication", "documentation"],
"refs": [],
"relations": []
},
{
"thought_type": "Question",
"role": "Memory",
"content": "User asked whether the backup command flushes storage before archiving.",
"importance": 0.8,
"confidence": 1.0,
"tags": ["backup", "question"],
"concepts": ["storage"]
},
{
"thought_type": "Decision",
"role": "Memory",
"content": "We decided to document the flush behavior in the handbook.",
"importance": 0.9,
"confidence": 1.0,
"tags": ["decision", "documentation"],
"concepts": ["documentation"]
}
],
"model": "gpt-4o",
"usage": {
"prompt_tokens": 347,
"completion_tokens": 241,
"total_tokens": 588
}
}
curl -sS -X POST "http://127.0.0.1:9472/v1/head" \
-H "Content-Type: application/json" \
-d '{"chain_key":"content-ingestion-demo"}'
If you have only extracted and not appended, the chain can still be empty. That is expected.
For most agent users, MCP is the better interface than raw REST. Once your coding agent or
assistant is connected to MentisDB, it can call mentisdb_extract_memories directly.
A good workflow looks like this:
1. Call mentisdb_extract_memories on the raw text.
2. Review the returned candidate thoughts.
3. Append the approved candidates with mentisdb_append.
This is where MentisDB is strongest.
The extraction tool is useful by itself, but it becomes much more powerful when paired with the existing MCP memory workflow: extract, review, append, search, checkpoint, and hand off.
If you are embedding MentisDB in your own app or service, the Rust API is the cleanest path.
use mentisdb::{LlmExtractionConfig, MentisDb};
use std::path::PathBuf;

async fn run() -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
    let config = LlmExtractionConfig::from_env()?;
    let mut chain = MentisDb::open_with_key(PathBuf::from("/tmp/mentisdb"), "ingestion-demo")?;

    let extraction = chain.extract_memories(
        "Customer asked for SSO pricing and confirmed they want weekly rollout summaries.",
        &config,
    ).await?;

    // Inspect every candidate before deciding what becomes durable memory.
    for input in &extraction.thoughts {
        println!("{:?}: {}", input.thought_type, input.content);
    }

    // Review before append: here we only persist preference updates.
    for input in extraction.thoughts {
        if input.thought_type == mentisdb::ThoughtType::PreferenceUpdate {
            chain.append_thought("assistant", input)?;
        }
    }
    Ok(())
}
The important part is not the extraction call. It is the review step between extraction and append.
If you want good results, do not treat content ingestion as a one-shot automation feature. Treat it as a memory proposal generator.
That small review step is what keeps MentisDB useful over time instead of turning it into an untrusted dumping ground.
- Candidate thoughts are marked as LLMExtracted.
- agent_id on the request is not the same thing as append attribution. The durable agent identity is decided when you append, not when you extract.

| Symptom | Likely cause | Fix |
|---|---|---|
| OPENAI_API_KEY is not set | Missing provider credential | Export OPENAI_API_KEY before starting the daemon or your app |
| API error from provider | Bad key, wrong base URL, incompatible endpoint, or unavailable model | Check auth, base URL, and model settings |
| Parse error | Model returned prose or malformed JSON | Use the default prompt first; keep custom prompts schema-compatible |
| Schema mismatch | Output used the wrong field names or invalid thought types | Require thought_type, content, and top-level {"thoughts": [...]} |
| Chain still empty | You extracted but never appended | Review the returned thoughts and append the ones you want |
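The parse and schema rows can also be enforced client-side before you trust a candidate. A minimal stdlib-only check, with an allowed-type list limited to the thought types this article mentions (the real crate accepts whatever mentisdb::ThoughtType defines):

```rust
/// Reject candidates with an unrecognized thought_type or empty content.
/// The allowed list here covers only the types used in this article.
fn validate_candidate(thought_type: &str, content: &str) -> Result<(), String> {
    const ALLOWED: &[&str] = &["Decision", "Question", "PreferenceUpdate", "TaskComplete"];
    if !ALLOWED.contains(&thought_type) {
        return Err(format!("unknown thought_type: {thought_type}"));
    }
    if content.trim().is_empty() {
        return Err("empty content".to_string());
    }
    Ok(())
}
```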
MentisDB's content ingestion feature is already useful today, especially for teams using agentic workflows through MCP. But it is best understood as reviewable semantic extraction, not blind background ingestion.
That may sound less magical than "just throw documents at it," but for durable agent memory it is often the better shape: fewer silent hallucinations, better memory hygiene, and a cleaner line between candidate knowledge and trusted memory.