← Blog
April 16, 2026

Content Ingestion in MentisDB

MentisDB now has an opt-in content ingestion workflow for turning free-form text into structured memory candidates. The feature is called the LLM-extracted memories pipeline. You give it raw text such as meeting notes, handoff logs, incident writeups, interview transcripts, analyst notes, or agent transcripts, and it returns typed ThoughtInput records that you can review before append.

This is deliberate reviewable ingestion, not blind auto-ingestion.

MentisDB does not automatically write extracted memories into your chain. It returns candidate thoughts, and you decide what should become durable memory.

That design choice matters. It keeps MentisDB aligned with its core model: durable, attributed, reviewable memory instead of opaque prompt stuffing. For teams that care about traceability, handoffs, and long-lived agent memory, that is usually the right tradeoff.


What the feature actually does

The ingestion pipeline accepts free-form text and asks an OpenAI-compatible model to map it into one or more MentisDB thought candidates. Each candidate includes:

  - a thought_type (for example Decision, Question, or PreferenceUpdate)
  - a role and the extracted content
  - importance and confidence scores
  - tags and concepts
  - optional refs and typed relations

The returned thoughts are valid MentisDB ThoughtInput values, but they are not yet signed, attributed to a specific app-level agent append call, or stored durably. You still choose what to append and under which agent identity.

Surface     | Status   | What it does
Rust API    | Built in | MentisDb::extract_memories(...) returns ExtractionResult
REST        | Built in | POST /v1/extract-memories
MCP         | Built in | mentisdb_extract_memories for agentic workflows
CLI wrapper | Not yet  | No dedicated mentisdbd ingest command today

We tested it while writing this post

Rather than just reading the code, we ran the feature against a disposable local chain. We verified two important behaviors:

  1. The extraction call returns structured thought candidates successfully.
  2. The chain remains unchanged until you explicitly append something.

cargo build --bin mentisdbd

MENTISDB_DIR=/tmp/mentisdb-ingestion-live-2 \
MENTISDB_REST_PORT=19722 \
MENTISDB_MCP_PORT=19721 \
MENTISDB_HTTPS_MCP_PORT=0 \
MENTISDB_HTTPS_REST_PORT=0 \
MENTISDB_DASHBOARD_PORT=0 \
target/debug/mentisdbd

Then we posted sample text to /v1/extract-memories and checked /v1/head immediately afterwards. The response returned three candidate thoughts, while the chain head still showed thought_count: 0.

Real bug found and fixed: while testing, we found that some OpenAI-compatible endpoints rejected the request when the client sent a response_format hint. MentisDB now relies on the prompt plus strict JSON validation instead, which is more portable across providers.

We also hit a more subtle issue that is not a code bug but a workflow reality: the model can still misclassify a sentence. In one live test, a requirement sentence came back as TaskComplete. That is exactly why this feature is review-first rather than auto-append.


How to set it up

1. Install or build MentisDB

cargo install mentisdb

# or from the repo
cargo build --bin mentisdbd

2. Configure an OpenAI-compatible provider

export OPENAI_API_KEY="sk-..."
export LLM_BASE_URL="https://api.openai.com/v1"
export LLM_MODEL="gpt-4o"

OPENAI_API_KEY is required. LLM_BASE_URL and LLM_MODEL are optional. If you leave them unset, MentisDB defaults to OpenAI's chat-completions URL and a default model value.

3. Start the daemon

mentisdbd

The REST API defaults to http://127.0.0.1:9472. If you already use MentisDB through Claude Desktop, OpenCode, Codex, or another MCP-capable host, you can also trigger ingestion through the MCP tool instead of talking to REST directly.


Using the REST endpoint

The simplest direct test is a POST to /v1/extract-memories.

curl -sS -X POST "http://127.0.0.1:9472/v1/extract-memories" \
  -H "Content-Type: application/json" \
  -d '{
    "chain_key": "content-ingestion-demo",
    "text": "User prefers terse release notes. They asked whether the backup command flushes storage before archiving. We decided to document the flush behavior in the handbook."
  }'

Typical response shape:

{
  "thoughts": [
    {
      "thought_type": "PreferenceUpdate",
      "role": "Memory",
      "content": "User prefers terse release notes.",
      "importance": 0.7,
      "confidence": 1.0,
      "tags": ["user", "preference", "release notes"],
      "concepts": ["communication", "documentation"],
      "refs": [],
      "relations": []
    },
    {
      "thought_type": "Question",
      "role": "Memory",
      "content": "User asked whether the backup command flushes storage before archiving.",
      "importance": 0.8,
      "confidence": 1.0,
      "tags": ["backup", "question"],
      "concepts": ["storage"]
    },
    {
      "thought_type": "Decision",
      "role": "Memory",
      "content": "We decided to document the flush behavior in the handbook.",
      "importance": 0.9,
      "confidence": 1.0,
      "tags": ["decision", "documentation"],
      "concepts": ["documentation"]
    }
  ],
  "model": "gpt-4-0613",
  "usage": {
    "prompt_tokens": 347,
    "completion_tokens": 241,
    "total_tokens": 588
  }
}

Important: this does not append

You can verify that nothing was written by checking the chain head:
curl -sS -X POST "http://127.0.0.1:9472/v1/head" \
  -H "Content-Type: application/json" \
  -d '{"chain_key":"content-ingestion-demo"}'

If you have only extracted and not appended, the chain can still be empty. That is expected.
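For illustration, this is the result we saw in our live test against a fresh chain that had only been extracted against (only the thought_count field is shown here; any other head fields are omitted):

```
{
  "thought_count": 0
}
```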


Using it through MCP

For most agent users, MCP is a better interface than raw REST. Once your coding agent or assistant is connected to MentisDB, it can call mentisdb_extract_memories directly.

A good workflow looks like this:

  1. Ask the agent to extract durable memories from a block of notes or transcript text.
  2. Ask it to show you the candidate thoughts first.
  3. Approve, reject, or edit them.
  4. Only then append the good ones using mentisdb_append.

This is where MentisDB is strongest.

The extraction tool is useful by itself, but it becomes much more powerful when paired with the existing MCP memory workflow: extract, review, append, search, checkpoint, and hand off.
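As a concrete sketch, an MCP tools/call request for the extraction tool might look like the following. The argument names here mirror the REST endpoint; the exact parameter names accepted by the MCP tool are an assumption, so check your host's tool listing.

```
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "mentisdb_extract_memories",
    "arguments": {
      "chain_key": "content-ingestion-demo",
      "text": "User prefers terse release notes. We decided to document the flush behavior."
    }
  }
}
```

Your MCP host builds and sends this for you; the point is that the tool takes the same chain key and raw text as the REST endpoint and returns candidates for review, not appended thoughts.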


Using the Rust API

If you are embedding MentisDB in your own app or service, the Rust API is the cleanest path.

use mentisdb::{LlmExtractionConfig, MentisDb};
use std::path::PathBuf;

async fn run() -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
    let config = LlmExtractionConfig::from_env()?;
    let mut chain = MentisDb::open_with_key(PathBuf::from("/tmp/mentisdb"), "ingestion-demo")?;

    let extraction = chain
        .extract_memories(
            "Customer asked for SSO pricing and confirmed they want weekly rollout summaries.",
            &config,
        )
        .await?;

    for input in &extraction.thoughts {
        println!("{:?}: {}", input.thought_type, input.content);
    }

    // Review before append.
    for input in extraction.thoughts {
        if input.thought_type == mentisdb::ThoughtType::PreferenceUpdate {
            chain.append_thought("assistant", input)?;
        }
    }
    Ok(())
}

The important part is not the extraction call. It is the review step between extraction and append.


Use cases

For normal people

For enterprise knowledge workers

Finance:

Defense and security:

For coders and engineering teams


The right review workflow

If you want good results, do not treat content ingestion as a one-shot automation feature. Treat it as a memory proposal generator.

  1. Extract from raw notes, transcript, or tool log.
  2. Remove anything non-durable, overly specific, or speculative.
  3. Fix any misclassified thought types.
  4. Add refs or typed relations if you already know the relevant prior memories.
  5. Append only the thoughts you want to carry forward.
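The steps above can be partially automated before the human pass. As a minimal sketch, here is a review filter that pre-screens candidates by thought type and importance; the Candidate struct is a stand-in for mentisdb::ThoughtInput (same field names as the REST response), and the durable-type whitelist and threshold are illustrative policy, not anything MentisDB prescribes.

```rust
// Illustrative pre-review filter over extraction candidates.
// `Candidate` mirrors a few fields of the REST response; the
// filtering policy here is an example, not part of MentisDB.
#[derive(Debug, Clone)]
struct Candidate {
    thought_type: String,
    content: String,
    importance: f64,
}

/// Keep only candidates that look durable enough to propose for append:
/// a whitelisted thought type and importance above a floor. Everything
/// else is held back for manual review.
fn review(candidates: Vec<Candidate>, min_importance: f64) -> Vec<Candidate> {
    let durable = ["Decision", "PreferenceUpdate", "Requirement"];
    candidates
        .into_iter()
        .filter(|c| durable.contains(&c.thought_type.as_str()) && c.importance >= min_importance)
        .collect()
}

fn main() {
    let candidates = vec![
        Candidate {
            thought_type: "PreferenceUpdate".into(),
            content: "User prefers terse release notes.".into(),
            importance: 0.7,
        },
        Candidate {
            thought_type: "Question".into(),
            content: "Does the backup command flush storage first?".into(),
            importance: 0.8,
        },
        Candidate {
            thought_type: "Decision".into(),
            content: "Document the flush behavior in the handbook.".into(),
            importance: 0.9,
        },
    ];
    for c in review(candidates, 0.6) {
        println!("propose append: {}: {}", c.thought_type, c.content);
    }
}
```

A filter like this only narrows the proposal set; the final append decision should still be yours, which is the whole point of the review-first design.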

That small review step is what keeps MentisDB useful over time instead of turning it into an untrusted dumping ground.


Current limitations and caveats


Troubleshooting

Symptom | Likely cause | Fix
OPENAI_API_KEY is not set | Missing provider credential | Export OPENAI_API_KEY before starting the daemon or your app
API error from provider | Bad key, wrong base URL, incompatible endpoint, or unavailable model | Check auth, base URL, and model settings
Parse error | Model returned prose or malformed JSON | Use the default prompt first; keep custom prompts schema-compatible
Schema mismatch | Output used the wrong field names or invalid thought types | Require thought_type, content, and top-level {"thoughts": [...]}
Chain still empty | You extracted but never appended | Review the returned thoughts and append the ones you want

Bottom line

MentisDB's content ingestion feature is already useful today, especially for teams using agentic workflows through MCP. But it is best understood as reviewable semantic extraction, not blind background ingestion.

That may sound less magical than "just throw documents at it," but for durable agent memory it is often the better shape: fewer silent hallucinations, better memory hygiene, and a cleaner line between candidate knowledge and trusted memory.