AI agent frameworks all ship their own memory abstraction, and none of them talk to each
other. LangChain has BaseMemory. Semantic Kernel has plugins. AutoGen has
a conversation buffer. Each one is tied to its framework's idioms, and none expose the
lifecycle hooks an external memory backend actually needs to do its job.
After studying Hermes
— Nous Research's production self-improving agent — we found what we believe is the
most complete open-source memory abstraction in any agent framework today:
the MemoryProvider ABC in agent/memory_provider.py. It
covers initialization, turn-by-turn sync, semantic prefetch, context compression,
delegation observability, and clean shutdown. It deserves to be a community standard.
This post has three parts: why existing framework abstractions fall short and what the MemoryProvider protocol covers (Part 1), a full native MentisDB provider plugin for Hermes (Part 2), and a zero-code MCP integration for quick lookups (Part 3).
LangChain's BaseMemory exposes two methods:
load_memory_variables(inputs) and
save_context(inputs, outputs). That covers recall and persistence. It doesn't
cover: what tools the backend exposes to the model, what happens before context is
compressed, how session boundaries are signaled, or how a parent agent observes subagent
completions. You end up wiring those yourself, per framework, per project.
Semantic Kernel plugins are richer but not memory-specific — they model everything as a generic skill with no lifecycle awareness. AutoGen's conversation buffers are in-process only; swapping them out requires forking the agent.
Hermes gets it right. Here is the full protocol, annotated:
```python
from abc import ABC, abstractmethod
from typing import List, Dict, Any


class MemoryProvider(ABC):
    """Abstract base class for pluggable agent memory backends."""

    # ── Identity ──────────────────────────────────────────────────────

    @property
    @abstractmethod
    def name(self) -> str:
        """Short identifier, e.g. 'mentisdb', 'honcho', 'mem0'."""

    # ── Lifecycle (must implement) ─────────────────────────────────────

    @abstractmethod
    def is_available(self) -> bool:
        """Return True if the backend is reachable and configured.
        Called once at startup — no network calls that could stall init."""

    @abstractmethod
    def initialize(self, session_id: str, **kwargs) -> None:
        """Called once per session. Receives:
            session_id     — stable session identifier
            platform       — 'cli', 'telegram', 'discord', 'gateway', …
            user_id        — for multi-tenant scoping
            agent_identity — per-profile/persona scoping
            hermes_home    — path to ~/.hermes
            session_title  — human-readable label
        """

    @abstractmethod
    def get_tool_schemas(self) -> List[Dict[str, Any]]:
        """OpenAI-format tool schemas exposed directly to the model.
        The agent registers these alongside its built-in tools so the
        model can call them explicitly (e.g. mentisdb_recall, mentisdb_store)."""

    # ── Per-turn operations (optional, no-op defaults) ─────────────────

    def system_prompt_block(self) -> str:
        """Static guidance injected into the system prompt at session start.
        Use sparingly — every token here is spent on every API call."""
        return ""

    def prefetch(self, query: str, *, session_id: str = "") -> str:
        """Return relevant context for the upcoming turn.
        Called once at turn start; result is fenced and appended to the
        user message at API-call time (not persisted to message history).
        Should be fast — it blocks the turn. Use queue_prefetch for async."""
        return ""

    def queue_prefetch(self, query: str, *, session_id: str = "") -> None:
        """Schedule a background prefetch for the NEXT turn.
        Called at end-of-turn so results are ready before the next one starts.
        Implement as a daemon thread; result cached in self._prefetch_result."""

    def sync_turn(self, user_content: str, assistant_content: str,
                  *, session_id: str = "", **kwargs) -> None:
        """Persist the completed turn to the backend.
        Called after every successful turn. Both user and assistant sides
        are available — store whichever is useful for your retrieval model."""

    def handle_tool_call(self, tool_name: str, args: Dict, **kwargs) -> str:
        """Dispatch a model-initiated tool call (from get_tool_schemas).
        Return a JSON string — the agent inserts it as the tool result."""
        return '{}'

    # ── Lifecycle hooks (opt-in, no-op defaults) ───────────────────────

    def on_turn_start(self, turn_number: int, message: str, **kwargs) -> None:
        """Called at the start of each turn with the user's message.
        Useful for cadence-gating: run deep ops every N turns, not every turn."""

    def on_session_end(self, messages: List[Dict]) -> None:
        """Called at session boundary (exit, /reset, gateway timeout).
        Full message history is available for extraction / summarization."""

    def on_pre_compress(self, messages: List[Dict]) -> str:
        """Called before context compression discards old messages.
        Extract and persist key facts before they are gone.
        Return a string to include in the compression summary (optional)."""
        return ""

    def on_memory_write(self, action: str, target: str, content: str) -> None:
        """Bridge: called when the agent's built-in memory tool fires.
            action: 'add' | 'replace' | 'remove'
            target: 'memory' | 'user'
        Lets you mirror MEMORY.md writes to the external backend automatically."""

    def on_delegation(self, task: str, result: str,
                      *, child_session_id: str = "", **kwargs) -> None:
        """Called when a subagent completes a delegated task.
        Parent agent's provider observes the subtask outcome — useful for
        cross-session memory propagation in multi-agent workflows."""

    def shutdown(self) -> None:
        """Close connections, flush buffers, release resources."""
```
Six principles make it work:

1. Memory is read once at initialize() and injected into the system prompt. Mid-session writes go to disk or the backend immediately but do not mutate the prompt. This preserves the LLM's prefix cache across turns — the system prompt stays bit-identical, so the KV cache is reused. The model sees the updated memory at the next session, not mid-turn.

2. prefetch() is called once at turn start. The result is fenced into the user message at API-call time but never written back to message history. This decouples "what the model recalls" from "what is persisted" — you can recall rich context without polluting the chain.

3. queue_prefetch() lets you warm the cache for the next turn while the current one is still processing.

4. When the agent's built-in memory tool fires (writing to MEMORY.md), the agent calls on_memory_write() on every registered provider. Your backend stays in sync with the built-in memory store without the model needing to call a separate tool.

5. Before context compression discards old messages, the agent calls on_pre_compress(). This is the last chance to extract and persist key facts before they are gone. Without this hook, information that never made it into the memory store disappears permanently.

6. The parent agent's provider receives on_delegation() when a subagent finishes. The memory backend can propagate what was learned in the subtask back to the parent's chain — enabling fleet-level memory compounding.
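The queue_prefetch pattern from principle 3 can be sketched in isolation. PrefetchCache is a hypothetical name, and the sleep stands in for a real network search; the point is the daemon thread plus a bounded join so a slow backend never stalls a turn for long:

```python
import threading
import time

class PrefetchCache:
    """Background lookup during turn N; instant recall at turn N+1."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._result = ""
        self._thread: threading.Thread | None = None

    def queue(self, query: str) -> None:
        def _run() -> None:
            time.sleep(0.05)          # stands in for a network search
            with self._lock:
                self._result = f"context for {query!r}"
        self._thread = threading.Thread(target=_run, daemon=True)
        self._thread.start()

    def take(self) -> str:
        if self._thread and self._thread.is_alive():
            self._thread.join(timeout=3.0)   # bounded wait, never block the turn
        with self._lock:
            result, self._result = self._result, ""   # consume once
            return result

cache = PrefetchCache()
cache.queue("auth refactor")    # end of turn N
result = cache.take()           # start of turn N+1
print(result)  # → context for 'auth refactor'
```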
| Feature | MemoryProvider (Hermes) | BaseMemory (LangChain) | Semantic Kernel Plugin | AutoGen Buffer |
|---|---|---|---|---|
| Recall / persist | ✓ | ✓ | ✓ | ✓ |
| Tool exposure to model | ✓ get_tool_schemas() | — | ✓ (generic) | — |
| Async prefetch / caching | ✓ queue_prefetch() | — | — | — |
| Pre-compression hook | ✓ on_pre_compress() | — | — | — |
| Built-in write bridge | ✓ on_memory_write() | — | — | — |
| Delegation observability | ✓ on_delegation() | — | — | — |
| Session boundary hook | ✓ on_session_end() | — | — | — |
| Failure isolation | ✓ (catches & logs) | Raises | Raises | Raises |
| Multi-tenant scoping | ✓ (user_id, agent_identity) | — | Partial | — |
| Config / setup wizard | ✓ get_config_schema() | — | — | — |
We believe this interface, with minor additions for async-native support and a standardized config schema, should be the basis of a community agent-memory-protocol spec — adopted across LangChain, AutoGen, LlamaIndex, and any framework that wants memory backends to be genuinely portable.
The full lifecycle integration — turn sync, semantic prefetch, built-in write bridging,
session-end summarization — requires implementing the MemoryProvider ABC.
The result is a plugin file that drops into
~/.hermes/plugins/mentisdb/__init__.py with zero changes to
Hermes itself.
Prerequisites: mentisdbd running locally
(cargo install mentisdb && mentisdbd &),
Python 3.11+, httpx installed in the Hermes venv.
```
~/.hermes/plugins/
└── mentisdb/
    ├── __init__.py   ← the provider (paste code below)
    └── plugin.yaml   ← metadata for the setup wizard
```
plugin.yaml:

```yaml
name: mentisdb
description: "Durable, hash-chained semantic memory via MentisDB"
pip_dependencies:
  - httpx
config:
  - key: MENTISDB_URL
    label: "MentisDB daemon URL"
    default: "http://localhost:9471"
    required: false
  - key: MENTISDB_CHAIN_KEY
    label: "Chain key (leave blank to use agent identity)"
    required: false
```
__init__.py — ~250 lines, full lifecycle integration:
```python
from __future__ import annotations

import json, logging, os, threading
from typing import Dict, List, Any

try:
    import httpx
except ImportError:
    httpx = None

from agent.memory_provider import MemoryProvider

logger = logging.getLogger("mentisdb")

# ── Tool schemas exposed to the model ──────────────────────────────────────

RECALL_SCHEMA = {
    "name": "mentisdb_recall",
    "description": (
        "Search your long-term memory (MentisDB) for relevant context. "
        "Use this when you need to recall a past decision, fact, or lesson."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "What to search for"},
            "limit": {"type": "integer", "description": "Max results (default 8)"},
        },
        "required": ["query"],
    },
}

STORE_SCHEMA = {
    "name": "mentisdb_store",
    "description": (
        "Persist a fact, lesson, or decision to long-term memory (MentisDB). "
        "Use for things that should survive beyond this session."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "content": {"type": "string", "description": "Content to store"},
            "thought_type": {
                "type": "string",
                "description": (
                    "Memory category: Fact, LessonLearned, Decision, "
                    "Observation, Summary, Reference, or Hypothesis"
                ),
            },
            "tags": {
                "type": "array",
                "items": {"type": "string"},
                "description": "Optional tags for filtering",
            },
        },
        "required": ["content"],
    },
}

# ── Provider implementation ─────────────────────────────────────────────────

class MentisDBMemoryProvider(MemoryProvider):
    """Full-lifecycle MentisDB memory provider for Hermes.

    Features:
    - Semantic prefetch injected as fenced context before each turn
    - Background prefetch queued during current turn for next turn
    - Turn sync: both user and assistant messages persisted as thoughts
    - Built-in memory bridge: MEMORY.md writes mirrored to MentisDB
    - Pre-compression extraction: key facts saved before messages are discarded
    - Session-end summary stored as a Summary thought
    """

    def __init__(self):
        self._url = "http://localhost:9471"
        self._chain_key = "hermes"
        self._session_id = ""
        self._client: httpx.Client | None = None
        self._prefetch_cache = ""
        self._prefetch_thread: threading.Thread | None = None
        self._prefetch_lock = threading.Lock()

    @property
    def name(self) -> str:
        return "mentisdb"

    def is_available(self) -> bool:
        if httpx is None:
            return False
        url = os.environ.get("MENTISDB_URL", "http://localhost:9471")
        try:
            r = httpx.get(f"{url}/health", timeout=2.0)
            return r.status_code == 200
        except Exception:
            return False

    def initialize(self, session_id: str, **kwargs) -> None:
        self._url = os.environ.get("MENTISDB_URL", "http://localhost:9471")
        # Use explicit chain key, or fall back to agent identity, or 'hermes'
        self._chain_key = (
            os.environ.get("MENTISDB_CHAIN_KEY")
            or kwargs.get("agent_identity")
            or "hermes"
        )
        self._session_id = session_id
        self._client = httpx.Client(
            base_url=self._url,
            headers={"Content-Type": "application/json"},
            timeout=15.0,
        )
        logger.info("MentisDB provider initialized: chain=%s", self._chain_key)

    def system_prompt_block(self) -> str:
        return (
            "## Long-term Memory (MentisDB)\n"
            "Your memory is backed by MentisDB — a durable, hash-chained memory engine.\n"
            "Use `mentisdb_recall` to search past sessions. "
            "Use `mentisdb_store` for facts, decisions, and lessons that should persist.\n"
            "Relevant context from past sessions is automatically injected before each turn."
        )

    def prefetch(self, query: str, *, session_id: str = "") -> str:
        # Return cached background result if available
        if self._prefetch_thread and self._prefetch_thread.is_alive():
            self._prefetch_thread.join(timeout=3.0)
        with self._prefetch_lock:
            cached = self._prefetch_cache
            self._prefetch_cache = ""
        if cached:
            return cached
        # Synchronous fallback if no cache
        return self._search(query, limit=8)

    def queue_prefetch(self, query: str, *, session_id: str = "") -> None:
        def _run():
            result = self._search(query, limit=8)
            with self._prefetch_lock:
                self._prefetch_cache = result

        self._prefetch_thread = threading.Thread(target=_run, daemon=True)
        self._prefetch_thread.start()

    def sync_turn(self, user_content: str, assistant_content: str,
                  *, session_id: str = "", **kwargs) -> None:
        for content, role in [(user_content, "user"), (assistant_content, "assistant")]:
            if content:
                self._append(content, thought_type="Observation",
                             tags=["hermes", f"role:{role}",
                                   f"session:{self._session_id}"])

    def get_tool_schemas(self) -> List[Dict]:
        return [RECALL_SCHEMA, STORE_SCHEMA]

    def handle_tool_call(self, tool_name: str, args: Dict, **kwargs) -> str:
        try:
            if tool_name == "mentisdb_recall":
                results = self._search(args["query"], limit=args.get("limit", 8))
                return json.dumps({"results": results})
            elif tool_name == "mentisdb_store":
                self._append(
                    args["content"],
                    thought_type=args.get("thought_type", "LessonLearned"),
                    tags=args.get("tags", []),
                )
                return json.dumps({"status": "stored"})
            return json.dumps({"error": f"Unknown tool: {tool_name}"})
        except Exception as e:
            logger.error("MentisDB tool call failed: %s", e)
            return json.dumps({"error": str(e)})

    def on_memory_write(self, action: str, target: str, content: str) -> None:
        # Mirror built-in MEMORY.md / USER.md writes to MentisDB
        if action in ("add", "replace") and content:
            self._append(content, thought_type="LessonLearned",
                         tags=["hermes", f"memory-file:{target}"])

    def on_pre_compress(self, messages: List[Dict]) -> str:
        # Extract user messages being compressed; store as Observations
        for msg in messages:
            if msg.get("role") == "user" and msg.get("content"):
                self._append(str(msg["content"])[:2000],
                             thought_type="Observation",
                             tags=["hermes", "pre-compress"])
        return ""

    def on_session_end(self, messages: List[Dict]) -> None:
        # Store a compact session summary as a Summary thought
        user_msgs = [m["content"] for m in messages
                     if m.get("role") == "user" and m.get("content")]
        if user_msgs:
            summary = f"Session {self._session_id}: {len(user_msgs)} turns. "
            summary += "Topics: " + "; ".join(str(m)[:80] for m in user_msgs[:5])
            self._append(summary, thought_type="Summary",
                         tags=["hermes", "session-end",
                               f"session:{self._session_id}"])

    def on_delegation(self, task: str, result: str,
                      *, child_session_id: str = "", **kwargs) -> None:
        # Persist subagent outcome so the parent can recall it later
        content = f"Delegated task: {task[:300]}\nResult: {result[:500]}"
        self._append(content, thought_type="Observation",
                     tags=["hermes", "delegation",
                           f"child:{child_session_id}"])

    def shutdown(self) -> None:
        if self._client:
            self._client.close()
            self._client = None

    # ── Internal helpers ───────────────────────────────────────────────

    def _search(self, query: str, limit: int = 8) -> str:
        if not self._client:
            return ""
        try:
            r = self._client.post(
                "/v1/search",
                json={"query": query, "limit": limit,
                      "chain_key": self._chain_key},
            )
            r.raise_for_status()
            thoughts = r.json().get("thoughts", [])
            return "\n\n".join(
                f"[{t.get('thought_type', 'Memory')}] {t['content']}"
                for t in thoughts
            )
        except Exception as e:
            logger.warning("MentisDB search failed: %s", e)
            return ""

    def _append(self, content: str, thought_type: str = "Observation",
                tags: list | None = None) -> None:
        if not self._client or not content.strip():
            return
        try:
            self._client.post(
                "/v1/thoughts",
                json={
                    "content": content,
                    "thought_type": thought_type,
                    "chain_key": self._chain_key,
                    "tags": tags or [],
                },
            )
        except Exception as e:
            logger.warning("MentisDB append failed: %s", e)

# ── Plugin entry point ──────────────────────────────────────────────────────

def register(ctx):
    ctx.register_memory_provider(MentisDBMemoryProvider())
```
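To sanity-check the wire format the helpers above assume without a running daemon, you can stand up a stub server speaking the same two endpoints. StubMentis and its canned responses are illustrative, not real MentisDB output; only the /v1/search and /v1/thoughts request shapes mirror the plugin code:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class StubMentis(BaseHTTPRequestHandler):
    """Answers the two endpoints the plugin uses, with canned responses."""
    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        if self.path == "/v1/search":
            out = {"thoughts": [{"thought_type": "Fact",
                                 "content": f"hit for {body['query']}"}]}
        elif self.path == "/v1/thoughts":
            out = {"status": "stored"}
        else:
            self.send_error(404)
            return
        data = json.dumps(out).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):  # keep test output quiet
        pass

srv = HTTPServer(("127.0.0.1", 0), StubMentis)   # port 0 = pick a free port
threading.Thread(target=srv.serve_forever, daemon=True).start()

req = urllib.request.Request(
    f"http://127.0.0.1:{srv.server_port}/v1/search",
    data=json.dumps({"query": "auth", "limit": 8, "chain_key": "hermes"}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    hit = json.loads(resp.read())["thoughts"][0]["content"]
print(hit)  # → hit for auth
srv.shutdown()
```

Pointing the provider's MENTISDB_URL at a stub like this is also a cheap way to integration-test the full lifecycle before touching a real chain.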
Run the setup wizard:
```shell
hermes memory setup
```
Pick mentisdb from the list. The wizard reads plugin.yaml,
asks for the URL (default http://localhost:9471) and chain key, writes
secrets to ~/.hermes/.env, and saves memory.provider: mentisdb
to ~/.hermes/config.yaml. Restart Hermes and MentisDB is the active memory backend.
Verify it loaded:
```shell
hermes memory status
```
If you just want to call MentisDB tools from inside Hermes without any lifecycle integration, you can use Hermes' built-in MCP support. MentisDB speaks the MCP protocol natively on port 9471. No code, no plugin file — just two lines of config.
What you get vs. the native provider: MCP tools are model-callable on demand. They do not fire automatically on each turn, sync turns in the background, observe context compression, or participate in session boundaries. Use this for quick lookups and writes; use the native provider (Part 2) for full lifecycle integration.
```shell
# Install (if not already)
cargo install mentisdb

# Start the daemon (runs on port 9471 by default)
mentisdbd &

# Or run it in the foreground with the TUI
mentisdbd
```

Verify it's up:

```shell
curl -s http://localhost:9471/health
# → {"status":"ok"}
```
Open ~/.hermes/config.yaml and add:

```yaml
mcp_servers:
  mentisdb:
    url: "http://localhost:9471"
```

That's it. Restart Hermes.
At startup, Hermes connects to the MCP server and discovers all available tools.
You'll see them listed under the mentisdb server prefix:
```shell
hermes tools list | grep mcp_mentisdb
# mcp_mentisdb_mentisdb_append_thought
# mcp_mentisdb_mentisdb_search
# mcp_mentisdb_mentisdb_recent_context
# mcp_mentisdb_mentisdb_bootstrap
# … (all MentisDB MCP tools)
```
Start Hermes and ask it to use MentisDB directly. The model will call the MCP tools as needed:
```
# In a Hermes session:
You: Search my MentisDB memory for anything about the authentication refactor.
# Hermes calls: mcp_mentisdb_mentisdb_search({"query": "authentication refactor"})
# Returns ranked results from MentisDB

You: Store this in MentisDB: the auth service now uses RS256 JWTs, not HS256.
# Hermes calls: mcp_mentisdb_mentisdb_append_thought({
#   "content": "the auth service now uses RS256 JWTs, not HS256",
#   "thought_type": "LessonLearned"
# })
```
Tell Hermes to always bootstrap from MentisDB at the start of a session by adding
this to your system prompt or ~/.hermes/memories/MEMORY.md:
```
At the start of each session, call mcp_mentisdb_mentisdb_bootstrap to load
the current chain context before answering any questions.
```
Or use the MentisDB primer line directly. From the TUI, press c while
the Prime panel is focused to copy it, then paste it into Hermes:
```
prime yourself for optimal mentisdb usage. load agent orion from mentisdb chain.
call mentisdb_bootstrap then mentisdb_skill_md
```
If you want only search and append (not admin tools):
```yaml
mcp_servers:
  mentisdb:
    url: "http://localhost:9471"
    tools:
      - mentisdb_search
      - mentisdb_append_thought
      - mentisdb_recent_context
```
If your MentisDB daemon has TLS configured (MENTISDB_TLS_CERT /
MENTISDB_TLS_KEY), point Hermes at the HTTPS MCP port instead:
```yaml
mcp_servers:
  mentisdb:
    url: "https://my.mentisdb.com:9473"
```
The native provider plugin (Part 2) is the right integration for production use. We plan to submit it as a pull request to the Hermes repository so it ships bundled — the same way Honcho, Mem0, and Hindsight are bundled today.
On the protocol side: we want to work with the agent community — LangChain, AutoGen,
LlamaIndex, and others — to formalize the MemoryProvider interface as a
shared agent-memory-protocol spec. A backend that implements the
protocol once should work unmodified in any conforming agent framework. MentisDB would
be the reference implementation.
If you're building an agent framework and want to adopt the protocol, or if you've already built something similar and want to compare notes: open a discussion on GitHub.