April 19, 2026

Toward a Standard Agent Memory Protocol — Lessons from Hermes and MentisDB

AI agent frameworks all ship their own memory abstraction, and none of them talk to each other. LangChain has BaseMemory. Semantic Kernel has plugins. AutoGen has a conversation buffer. Each one is tied to its framework's idioms, and none expose the lifecycle hooks an external memory backend actually needs to do its job.

After studying Hermes — Nous Research's production self-improving agent — we found what we believe is the most complete open-source memory abstraction in any agent framework today: the MemoryProvider ABC in agent/memory_provider.py. It covers initialization, turn-by-turn sync, semantic prefetch, context compression, delegation observability, and clean shutdown. It deserves to be a community standard.

This post has three parts:

  1. The MemoryProvider Protocol — what it is and why it works
  2. Implementing MentisDB as a native Hermes memory provider
  3. Zero-code tutorial: MentisDB as an MCP tool inside Hermes

Part 1 — The MemoryProvider Protocol

Why existing abstractions fall short

LangChain's BaseMemory exposes two methods: load_memory_variables(inputs) and save_context(inputs, outputs). That covers recall and persistence. It doesn't cover: what tools the backend exposes to the model, what happens before context is compressed, how session boundaries are signaled, or how a parent agent observes subagent completions. You end up wiring those yourself, per framework, per project.
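For concreteness, here is the entire surface such a backend gets to implement. This is a schematic stand-in that follows the shape of LangChain's two-method contract, not its actual class (nothing here imports langchain; names are illustrative):

```python
from typing import Any, Dict, List


class BufferMemory:
    """Schematic stand-in for a LangChain-style memory backend.

    Recall and persistence are the whole contract: there is no hook for
    tool exposure, pre-compression extraction, session boundaries, or
    subagent observability -- everything else must be wired externally.
    """

    def __init__(self) -> None:
        self._turns: List[Dict[str, str]] = []

    def load_memory_variables(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        # Recall: return prior turns as one history string.
        history = "\n".join(
            f"user: {t['input']}\nassistant: {t['output']}" for t in self._turns
        )
        return {"history": history}

    def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, Any]) -> None:
        # Persistence: append the completed turn.
        self._turns.append({"input": inputs["input"], "output": outputs["output"]})


memory = BufferMemory()
memory.save_context({"input": "What port does the daemon use?"},
                    {"output": "9471 by default."})
print(memory.load_memory_variables({})["history"])
```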

Semantic Kernel plugins are richer but not memory-specific — they model everything as a generic skill with no lifecycle awareness. AutoGen's conversation buffers are in-process only; swapping them out requires forking the agent.

Hermes gets it right. Here is the full protocol, annotated:

The interface

from abc import ABC, abstractmethod
from typing import List, Dict, Any

class MemoryProvider(ABC):
    """Abstract base class for pluggable agent memory backends."""

    # ── Identity ──────────────────────────────────────────────────────

    @property
    @abstractmethod
    def name(self) -> str:
        """Short identifier, e.g. 'mentisdb', 'honcho', 'mem0'."""

    # ── Lifecycle (must implement) ─────────────────────────────────────

    @abstractmethod
    def is_available(self) -> bool:
        """Return True if the backend is reachable and configured.
        Called once at startup — keep it fast; any network check must use a short timeout."""

    @abstractmethod
    def initialize(self, session_id: str, **kwargs) -> None:
        """Called once per session. Receives:
          session_id      — stable session identifier
          platform        — 'cli', 'telegram', 'discord', 'gateway', …
          user_id         — for multi-tenant scoping
          agent_identity  — per-profile/persona scoping
          hermes_home     — path to ~/.hermes
          session_title   — human-readable label
        """

    @abstractmethod
    def get_tool_schemas(self) -> List[Dict[str, Any]]:
        """OpenAI-format tool schemas exposed directly to the model.
        The agent registers these alongside its built-in tools so the
        model can call them explicitly (e.g. mentisdb_recall, mentisdb_store)."""

    # ── Per-turn operations (optional, no-op defaults) ─────────────────

    def system_prompt_block(self) -> str:
        """Static guidance injected into the system prompt at session start.
        Use sparingly — every token here is spent on every API call."""
        return ""

    def prefetch(self, query: str, *, session_id: str = "") -> str:
        """Return relevant context for the upcoming turn.
        Called once at turn start; result is fenced and appended to the
        user message at API-call time (not persisted to message history).
        Should be fast — it blocks the turn. Use queue_prefetch for async."""
        return ""

    def queue_prefetch(self, query: str, *, session_id: str = "") -> None:
        """Schedule a background prefetch for the NEXT turn.
        Called at end-of-turn so results are ready before the next one starts.
        Implement as a daemon thread and cache the result for the next prefetch() call."""

    def sync_turn(self, user_content: str, assistant_content: str,
                  *, session_id: str = "", **kwargs) -> None:
        """Persist the completed turn to the backend.
        Called after every successful turn. Both user and assistant sides
        are available — store whichever is useful for your retrieval model."""

    def handle_tool_call(self, tool_name: str, args: Dict, **kwargs) -> str:
        """Dispatch a model-initiated tool call (from get_tool_schemas).
        Return a JSON string — the agent inserts it as the tool result."""
        return '{}'

    # ── Lifecycle hooks (opt-in, no-op defaults) ───────────────────────

    def on_turn_start(self, turn_number: int, message: str, **kwargs) -> None:
        """Called at the start of each turn with the user's message.
        Useful for cadence-gating: run deep ops every N turns, not every turn."""

    def on_session_end(self, messages: List[Dict]) -> None:
        """Called at session boundary (exit, /reset, gateway timeout).
        Full message history is available for extraction / summarization."""

    def on_pre_compress(self, messages: List[Dict]) -> str:
        """Called before context compression discards old messages.
        Extract and persist key facts before they are gone.
        Return a string to include in the compression summary (optional)."""
        return ""

    def on_memory_write(self, action: str, target: str, content: str) -> None:
        """Bridge: called when the agent's built-in memory tool fires.
        action: 'add' | 'replace' | 'remove'
        target: 'memory' | 'user'
        Lets you mirror MEMORY.md writes to the external backend automatically."""

    def on_delegation(self, task: str, result: str,
                       *, child_session_id: str = "", **kwargs) -> None:
        """Called when a subagent completes a delegated task.
        Parent agent's provider observes the subtask outcome — useful for
        cross-session memory propagation in multi-agent workflows."""

    def shutdown(self) -> None:
        """Close connections, flush buffers, release resources."""
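
A conforming provider only has to implement the four required members; every hook defaults to a no-op. The sketch below shows that minimum, plus the cadence-gating pattern the on_turn_start docstring suggests. The base class here is a self-contained stand-in so the snippet runs outside Hermes; in a real plugin you would subclass agent.memory_provider.MemoryProvider instead.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class MemoryProvider(ABC):
    """Stand-in for agent.memory_provider.MemoryProvider (hooks are no-ops)."""

    @property
    @abstractmethod
    def name(self) -> str: ...

    @abstractmethod
    def is_available(self) -> bool: ...

    @abstractmethod
    def initialize(self, session_id: str, **kwargs) -> None: ...

    @abstractmethod
    def get_tool_schemas(self) -> List[Dict[str, Any]]: ...

    def on_turn_start(self, turn_number: int, message: str, **kwargs) -> None: ...


class MinimalProvider(MemoryProvider):
    """Smallest conforming provider, with one opt-in hook."""

    DEEP_OP_EVERY = 5  # cadence: run the expensive operation every 5 turns

    def __init__(self) -> None:
        self.deep_ops_run = 0

    @property
    def name(self) -> str:
        return "minimal"

    def is_available(self) -> bool:
        return True  # nothing external to reach

    def initialize(self, session_id: str, **kwargs) -> None:
        self.session_id = session_id

    def get_tool_schemas(self) -> List[Dict[str, Any]]:
        return []  # no model-callable tools

    def on_turn_start(self, turn_number: int, message: str, **kwargs) -> None:
        # Cadence gate: skip the deep operation on most turns.
        if turn_number % self.DEEP_OP_EVERY != 0:
            return
        self.deep_ops_run += 1  # e.g. re-cluster memories, refresh an index


provider = MinimalProvider()
provider.initialize("session-1")
for turn in range(1, 11):
    provider.on_turn_start(turn, f"message {turn}")
print(provider.deep_ops_run)  # deep op fired on turns 5 and 10
```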

Why this design is correct

Six principles make it work:

  1. Frozen snapshot injection. Memory is loaded once at initialize() and injected into the system prompt. Mid-session writes go to disk or the backend immediately but do not mutate the prompt. This preserves the LLM's prefix cache across turns — the system prompt stays bit-identical, so the KV cache is reused. The model sees the updated memory at the next session, not mid-turn.
  2. Prefetch at turn boundary, not at API call. prefetch() is called once at turn start. The result is fenced into the user message at API-call time but never written back to message history. This decouples "what the model recalls" from "what is persisted" — you can recall rich context without polluting the chain. queue_prefetch() lets you warm the cache for the next turn while the current one is still processing.
  3. Built-in write bridging. When the agent's native memory tool fires (writing to MEMORY.md), the agent calls on_memory_write() on every registered provider. Your backend stays in sync with the built-in memory store without the model needing to call a separate tool.
  4. Pre-compression extraction. Before the agent compresses old messages out of context, it calls on_pre_compress(). This is the last chance to extract and persist key facts before they are gone. Without this hook, information that never made it into the memory store disappears permanently.
  5. Delegation observability. In multi-agent workflows, the parent agent calls on_delegation() when a subagent finishes. The memory backend can propagate what was learned in the subtask back to the parent's chain — enabling fleet-level memory compounding.
  6. One external provider, graceful degradation. Only one external provider is active at a time (alongside the always-on built-in store). Failures in the external provider are caught, logged, and do not crash the agent. The built-in memory is always the safety net.
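
Principles 1 and 2 can be sketched as a turn-assembly function. This is an illustrative sketch of the flow, not Hermes' actual code; the <recalled-context> fence name and the message shapes are assumptions:

```python
from typing import Dict, List


def build_api_messages(system_prompt: str,
                       history: List[Dict[str, str]],
                       user_msg: str,
                       prefetched: str) -> List[Dict[str, str]]:
    """Assemble the request for one turn.

    The system prompt is the frozen snapshot: bit-identical every turn, so
    the serving stack's prefix/KV cache stays valid. Prefetched context is
    fenced into the outgoing user message only -- it never enters `history`.
    """
    outgoing = user_msg
    if prefetched:
        outgoing = f"<recalled-context>\n{prefetched}\n</recalled-context>\n\n{user_msg}"
    return [{"role": "system", "content": system_prompt}, *history,
            {"role": "user", "content": outgoing}]


history: List[Dict[str, str]] = []
system_prompt = "You are Hermes."           # frozen at initialize()
prefetched = "[Decision] Auth uses RS256."  # result of prefetch()

request = build_api_messages(system_prompt, history, "Which JWT alg?", prefetched)

# What gets persisted is the bare user message, not the fenced version:
history.append({"role": "user", "content": "Which JWT alg?"})

print(request[-1]["content"].startswith("<recalled-context>"))  # True
print(history[-1]["content"])  # Which JWT alg?
```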

Comparison to other abstractions

Feature                  | MemoryProvider (Hermes)     | BaseMemory (LangChain) | Semantic Kernel Plugin | AutoGen Buffer
-------------------------|-----------------------------|------------------------|------------------------|---------------
Recall / persist         | ✓                           | ✓                      | ✓                      | ✓
Tool exposure to model   | get_tool_schemas()          |                        | ✓ (generic)            |
Async prefetch / caching | queue_prefetch()            |                        |                        |
Pre-compression hook     | on_pre_compress()           |                        |                        |
Built-in write bridge    | on_memory_write()           |                        |                        |
Delegation observability | on_delegation()             |                        |                        |
Session boundary hook    | on_session_end()            |                        |                        |
Failure isolation        | ✓ (catches & logs)          | Raises                 | Raises                 | Raises
Multi-tenant scoping     | ✓ (user_id, agent_identity) | Partial                |                        |
Config / setup wizard    | get_config_schema()         |                        |                        |

We believe this interface, with minor additions for async-native support and a standardized config schema, should be the basis of a community agent-memory-protocol spec — adopted across LangChain, AutoGen, LlamaIndex, and any framework that wants memory backends to be genuinely portable.
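
To make the portability claim concrete: a backend that implements the protocol can be exposed to a narrower framework through a thin shim. The adapter below maps the protocol's prefetch()/sync_turn() onto the two-method recall/persist shape discussed earlier; all names are illustrative, not shipped code:

```python
from typing import Any, Dict


class BaseMemoryAdapter:
    """Expose a MemoryProvider-conforming backend through the two-method
    recall/persist shape that LangChain-style frameworks expect.

    `provider` is any object implementing prefetch() and sync_turn() from
    the protocol; the richer lifecycle hooks simply go unused in frameworks
    that cannot call them.
    """

    def __init__(self, provider: Any, session_id: str) -> None:
        self.provider = provider
        self.session_id = session_id

    def load_memory_variables(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        # Recall maps onto the protocol's prefetch().
        recalled = self.provider.prefetch(inputs.get("input", ""),
                                          session_id=self.session_id)
        return {"history": recalled}

    def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, Any]) -> None:
        # Persistence maps onto the protocol's sync_turn().
        self.provider.sync_turn(inputs.get("input", ""),
                                outputs.get("output", ""),
                                session_id=self.session_id)


class EchoProvider:
    """Toy protocol-shaped backend for demonstration."""

    def __init__(self) -> None:
        self.synced = []

    def prefetch(self, query: str, *, session_id: str = "") -> str:
        return f"recall for: {query}"

    def sync_turn(self, user_content: str, assistant_content: str,
                  *, session_id: str = "", **kwargs) -> None:
        self.synced.append((user_content, assistant_content))


adapter = BaseMemoryAdapter(EchoProvider(), "session-1")
print(adapter.load_memory_variables({"input": "auth refactor"})["history"])
```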


Part 2 — Implementing MentisDB as a Native Hermes Provider

The full lifecycle integration — turn sync, semantic prefetch, built-in write bridging, session-end summarization — requires implementing the MemoryProvider ABC. The result is a plugin file that drops into ~/.hermes/plugins/mentisdb/__init__.py with zero changes to Hermes itself.

Prerequisites: mentisdbd running locally (cargo install mentisdb && mentisdbd &), Python 3.11+, httpx installed in the Hermes venv.

Plugin structure

~/.hermes/plugins/
└── mentisdb/
    ├── __init__.py     ← the provider (paste code below)
    └── plugin.yaml     ← metadata for the setup wizard

plugin.yaml

name: mentisdb
description: "Durable, hash-chained semantic memory via MentisDB"
pip_dependencies:
  - httpx
config:
  - key: MENTISDB_URL
    label: "MentisDB daemon URL"
    default: "http://localhost:9471"
    required: false
  - key: MENTISDB_CHAIN_KEY
    label: "Chain key (leave blank to use agent identity)"
    required: false

__init__.py — ~250 lines, full lifecycle integration:

from __future__ import annotations
import json, logging, os, threading
from typing import Dict, List, Any

try:
    import httpx
except ImportError:
    httpx = None

from agent.memory_provider import MemoryProvider

logger = logging.getLogger("mentisdb")

# ── Tool schemas exposed to the model ──────────────────────────────────────

RECALL_SCHEMA = {
    "name": "mentisdb_recall",
    "description": (
        "Search your long-term memory (MentisDB) for relevant context. "
        "Use this when you need to recall a past decision, fact, or lesson."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "What to search for"},
            "limit": {"type": "integer", "description": "Max results (default 8)"},
        },
        "required": ["query"],
    },
}

STORE_SCHEMA = {
    "name": "mentisdb_store",
    "description": (
        "Persist a fact, lesson, or decision to long-term memory (MentisDB). "
        "Use for things that should survive beyond this session."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "content": {"type": "string", "description": "Content to store"},
            "thought_type": {
                "type": "string",
                "description": (
                    "Memory category: Fact, LessonLearned, Decision, "
                    "Observation, Summary, Reference, or Hypothesis"
                ),
            },
            "tags": {
                "type": "array",
                "items": {"type": "string"},
                "description": "Optional tags for filtering",
            },
        },
        "required": ["content"],
    },
}


# ── Provider implementation ─────────────────────────────────────────────────

class MentisDBMemoryProvider(MemoryProvider):
    """Full-lifecycle MentisDB memory provider for Hermes.

    Features:
    - Semantic prefetch injected as fenced context before each turn
    - Background prefetch queued during current turn for next turn
    - Turn sync: both user and assistant messages persisted as thoughts
    - Built-in memory bridge: MEMORY.md writes mirrored to MentisDB
    - Pre-compression extraction: key facts saved before messages are discarded
    - Session-end summary stored as a Summary thought
    """

    def __init__(self):
        self._url = "http://localhost:9471"
        self._chain_key = "hermes"
        self._session_id = ""  # set properly in initialize()
        self._client: httpx.Client | None = None
        self._prefetch_cache = ""
        self._prefetch_thread: threading.Thread | None = None
        self._prefetch_lock = threading.Lock()

    @property
    def name(self) -> str:
        return "mentisdb"

    def is_available(self) -> bool:
        if httpx is None:
            return False
        url = os.environ.get("MENTISDB_URL", "http://localhost:9471")
        try:
            r = httpx.get(f"{url}/health", timeout=2.0)
            return r.status_code == 200
        except Exception:
            return False

    def initialize(self, session_id: str, **kwargs) -> None:
        self._url = os.environ.get("MENTISDB_URL", "http://localhost:9471")
        # Use explicit chain key, or fall back to agent identity, or 'hermes'
        self._chain_key = (
            os.environ.get("MENTISDB_CHAIN_KEY")
            or kwargs.get("agent_identity")
            or "hermes"
        )
        self._session_id = session_id
        self._client = httpx.Client(
            base_url=self._url,
            headers={"Content-Type": "application/json"},
            timeout=15.0,
        )
        logger.info("MentisDB provider initialized: chain=%s", self._chain_key)

    def system_prompt_block(self) -> str:
        return (
            "## Long-term Memory (MentisDB)\n"
            "Your memory is backed by MentisDB — a durable, hash-chained memory engine.\n"
            "Use `mentisdb_recall` to search past sessions. "
            "Use `mentisdb_store` for facts, decisions, and lessons that should persist.\n"
            "Relevant context from past sessions is automatically injected before each turn."
        )

    def prefetch(self, query: str, *, session_id: str = "") -> str:
        # Return cached background result if available
        if self._prefetch_thread and self._prefetch_thread.is_alive():
            self._prefetch_thread.join(timeout=3.0)
        with self._prefetch_lock:
            cached = self._prefetch_cache
            self._prefetch_cache = ""
        if cached:
            return cached
        # Synchronous fallback if no cache
        return self._search(query, limit=8)

    def queue_prefetch(self, query: str, *, session_id: str = "") -> None:
        def _run():
            result = self._search(query, limit=8)
            with self._prefetch_lock:
                self._prefetch_cache = result

        self._prefetch_thread = threading.Thread(target=_run, daemon=True)
        self._prefetch_thread.start()

    def sync_turn(self, user_content: str, assistant_content: str,
                  *, session_id: str = "", **kwargs) -> None:
        for content, role in [(user_content, "user"), (assistant_content, "assistant")]:
            if content:
                self._append(content, thought_type="Observation",
                             tags=["hermes", f"role:{role}",
                                   f"session:{self._session_id}"])

    def get_tool_schemas(self) -> List[Dict]:
        return [RECALL_SCHEMA, STORE_SCHEMA]

    def handle_tool_call(self, tool_name: str, args: Dict, **kwargs) -> str:
        try:
            if tool_name == "mentisdb_recall":
                results = self._search(args["query"], limit=args.get("limit", 8))
                return json.dumps({"results": results})
            elif tool_name == "mentisdb_store":
                self._append(
                    args["content"],
                    thought_type=args.get("thought_type", "LessonLearned"),
                    tags=args.get("tags", []),
                )
                return json.dumps({"status": "stored"})
            return json.dumps({"error": f"Unknown tool: {tool_name}"})
        except Exception as e:
            logger.error("MentisDB tool call failed: %s", e)
            return json.dumps({"error": str(e)})

    def on_memory_write(self, action: str, target: str, content: str) -> None:
        # Mirror built-in MEMORY.md / USER.md writes to MentisDB
        if action in ("add", "replace") and content:
            self._append(content, thought_type="LessonLearned",
                         tags=["hermes", f"memory-file:{target}"])

    def on_pre_compress(self, messages: List[Dict]) -> str:
        # Extract user messages being compressed; store as Observations
        for msg in messages:
            if msg.get("role") == "user" and msg.get("content"):
                self._append(str(msg["content"])[:2000],
                             thought_type="Observation",
                             tags=["hermes", "pre-compress"])
        return ""

    def on_session_end(self, messages: List[Dict]) -> None:
        # Store a compact session summary as a Summary thought
        user_msgs = [m["content"] for m in messages
                     if m.get("role") == "user" and m.get("content")]
        if user_msgs:
            summary = f"Session {self._session_id}: {len(user_msgs)} turns. "
            summary += "Topics: " + "; ".join(str(m)[:80] for m in user_msgs[:5])
            self._append(summary, thought_type="Summary",
                         tags=["hermes", "session-end",
                               f"session:{self._session_id}"])

    def on_delegation(self, task: str, result: str,
                       *, child_session_id: str = "", **kwargs) -> None:
        # Persist subagent outcome so the parent can recall it later
        content = f"Delegated task: {task[:300]}\nResult: {result[:500]}"
        self._append(content, thought_type="Observation",
                     tags=["hermes", "delegation",
                           f"child:{child_session_id}"])

    def shutdown(self) -> None:
        if self._client:
            self._client.close()
            self._client = None

    # ── Internal helpers ───────────────────────────────────────────────────────

    def _search(self, query: str, limit: int = 8) -> str:
        if not self._client:
            return ""
        try:
            r = self._client.post(
                "/v1/search",
                json={"query": query, "limit": limit,
                      "chain_key": self._chain_key},
            )
            r.raise_for_status()
            thoughts = r.json().get("thoughts", [])
            return "\n\n".join(
                f"[{t.get('thought_type', 'Memory')}] {t['content']}"
                for t in thoughts
            )
        except Exception as e:
            logger.warning("MentisDB search failed: %s", e)
            return ""

    def _append(self, content: str, thought_type: str = "Observation",
               tags: list | None = None) -> None:
        if not self._client or not content.strip():
            return
        try:
            self._client.post(
                "/v1/thoughts",
                json={
                    "content": content,
                    "thought_type": thought_type,
                    "chain_key": self._chain_key,
                    "tags": tags or [],
                },
            )
        except Exception as e:
            logger.warning("MentisDB append failed: %s", e)


# ── Plugin entry point ──────────────────────────────────────────────────────

def register(ctx):
    ctx.register_memory_provider(MentisDBMemoryProvider())

Activation

Run the setup wizard:

hermes memory setup

Pick mentisdb from the list. The wizard reads plugin.yaml, asks for the URL (default http://localhost:9471) and chain key, writes secrets to ~/.hermes/.env, and saves memory.provider: mentisdb to ~/.hermes/config.yaml. Restart Hermes and MentisDB is the active memory backend.
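
Assuming current Hermes conventions (the exact keys may differ by version), the wizard's output in ~/.hermes/config.yaml looks roughly like:

```yaml
# ~/.hermes/config.yaml (written by the wizard)
memory:
  provider: mentisdb
```

Secrets such as MENTISDB_URL and MENTISDB_CHAIN_KEY land in ~/.hermes/.env rather than in config.yaml.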

Verify it loaded:

hermes memory status

Part 3 — Zero-Code Tutorial: MentisDB as an MCP Tool in Hermes

If you just want to call MentisDB tools from inside Hermes without any lifecycle integration, you can use Hermes' built-in MCP support. MentisDB speaks MCP natively on port 9471. No code, no plugin file — just two lines of config.

What you get vs. the native provider: MCP tools are model-callable on demand. They do not fire automatically each turn, do not sync turns in the background, do not observe context compression, and do not participate in session boundaries. Use this for quick lookups and writes; use the native provider (Part 2) for full lifecycle integration.

Step 1 — Start the MentisDB daemon

# Install (if not already)
cargo install mentisdb

# Start the daemon (runs on port 9471 by default)
mentisdbd &

# Or in the foreground with the TUI
mentisdbd

Verify it's up:

curl -s http://localhost:9471/health
# → {"status":"ok"}

Step 2 — Register MentisDB as an MCP server in Hermes

Open ~/.hermes/config.yaml and add:

mcp_servers:
  mentisdb:
    url: "http://localhost:9471"

That's it. Restart Hermes.

Step 3 — Verify tool discovery

At startup, Hermes connects to the MCP server and discovers all available tools. You'll see them listed under the mentisdb server prefix:

hermes tools list | grep mcp_mentisdb
# mcp_mentisdb_mentisdb_append_thought
# mcp_mentisdb_mentisdb_search
# mcp_mentisdb_mentisdb_recent_context
# mcp_mentisdb_mentisdb_bootstrap
# … (all MentisDB MCP tools)

Step 4 — Use from a session

Start Hermes and ask it to use MentisDB directly. The model will call the MCP tools as needed:

# In a Hermes session:
You: Search my MentisDB memory for anything about the authentication refactor.

# Hermes calls: mcp_mentisdb_mentisdb_search({"query": "authentication refactor"})
# Returns ranked results from MentisDB
You: Store this in MentisDB: the auth service now uses RS256 JWTs, not HS256.

# Hermes calls: mcp_mentisdb_mentisdb_append_thought({
#   "content": "the auth service now uses RS256 JWTs, not HS256",
#   "thought_type": "LessonLearned"
# })

Step 5 — Prime an agent from MentisDB at session start

Tell Hermes to always bootstrap from MentisDB at the start of a session by adding this to your system prompt or ~/.hermes/memories/MEMORY.md:

At the start of each session, call mcp_mentisdb_mentisdb_bootstrap to load
the current chain context before answering any questions.

Or use the MentisDB primer line directly. From the TUI, press c while the Prime panel is focused to copy it, then paste it into Hermes:

prime yourself for optimal mentisdb usage. load agent orion from mentisdb chain.
call mentisdb_bootstrap then mentisdb_skill_md

Optional — restrict which tools Hermes exposes

If you want only search and append (not admin tools):

mcp_servers:
  mentisdb:
    url: "http://localhost:9471"
    tools:
      - mentisdb_search
      - mentisdb_append_thought
      - mentisdb_recent_context

Optional — use HTTPS

If your MentisDB daemon has TLS configured (MENTISDB_TLS_CERT / MENTISDB_TLS_KEY), point Hermes at the HTTPS MCP port instead:

mcp_servers:
  mentisdb:
    url: "https://my.mentisdb.com:9473"

What's next

The native provider plugin (Part 2) is the right integration for production use. We plan to submit it as a pull request to the Hermes repository so it ships bundled — the same way Honcho, Mem0, and Hindsight are bundled today.

On the protocol side: we want to work with the agent community — LangChain, AutoGen, LlamaIndex, and others — to formalize the MemoryProvider interface as a shared agent-memory-protocol spec. A backend that implements the protocol once should work unmodified in any conforming agent framework. MentisDB would be the reference implementation.

If you're building an agent framework and want to adopt the protocol, or if you've already built something similar and want to compare notes: open a discussion on GitHub.