3.1 Embedding Provider Selection

The problem

A sidecar is what makes semantic search possible in MentisDB: a per-chain vector index that lives next to the append-only log, refreshed on every append, queried at retrieval time. The provider turns thought text into vectors. Pick wrong and you pay for it in cold-start latency, monthly cost, and retrieval quality — all at once.

This chapter is the selection guide: the four built-in options (Local, FastEmbed, OpenAI, and the no-provider / lexical-only path), plus how to write a custom provider when none of them fit. The next chapter covers the operational side: when to rebuild, when to sync, when to ship a sidecar to a remote bucket.

The decision matrix

Four axes dominate: quality, cost, latency, and offline capability.

Provider	Dim	Quality	Cost	Latency	Offline
`LocalTextEmbeddingProvider`	256	Fair — token & trigram hash, no learned semantics	$0	~0 ms cold start, sub-ms per doc	Yes
`FastEmbedProvider` (AllMiniLML6V2)	384	Good — real semantic model, fine-tuned on 1B+ pairs	$0 (CPU inference, ~80 MB model)	~2-5 s cold start, ~5-20 ms per doc	Yes (after first download)
OpenAI `text-embedding-3-small`	1536	Excellent — top-tier general purpose	$0.02 / 1M tokens	~200-500 ms per batch	No
OpenAI `text-embedding-3-large`	3072	State of the art	$0.13 / 1M tokens	~300-800 ms per batch	No
No provider (lexical + graph only)	—	Depends on BM25 + graph coverage	$0	0 ms	Yes
Custom (your own model)	any	Whatever you can train or pay for	Your model, your bill	Your hardware, your profile	Up to you

When to use which

Local — the default

The built-in LocalTextEmbeddingProvider hashes normalized tokens and trigrams into a fixed 256-d dense vector. It has no learned semantics — "car" and "automobile" are not close in this space — but it is fast, deterministic, has zero dependencies, and requires no network. Use it for CLIs, embedded agents, air-gapped deployments, and any iteration loop where you want to defer the embedding-quality question. It also shines on rare proper nouns, file paths, error messages, and identifiers — text that semantic models tend to wash out.
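The hashing idea can be sketched in a few lines. This is a minimal illustration, not the crate's actual implementation: the `tok:` and `tri:` feature prefixes, the use of `DefaultHasher`, and the helper names `bucket`, `hash_embed`, and `cosine` are all assumptions made for the example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const DIM: usize = 256;

/// Map a feature string to one of DIM buckets.
fn bucket(feature: &str) -> usize {
    let mut hasher = DefaultHasher::new();
    feature.hash(&mut hasher);
    (hasher.finish() % DIM as u64) as usize
}

/// Illustrative hashed embedding: lowercase the text, count whole
/// tokens and character trigrams into buckets, then L2-normalize.
fn hash_embed(text: &str) -> Vec<f32> {
    let mut v = vec![0.0f32; DIM];
    let lower = text.to_lowercase();
    for token in lower
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
    {
        v[bucket(&format!("tok:{token}"))] += 1.0;
        let chars: Vec<char> = token.chars().collect();
        for window in chars.windows(3) {
            let trigram: String = window.iter().collect();
            v[bucket(&format!("tri:{trigram}"))] += 1.0;
        }
    }
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in &mut v {
            *x /= norm;
        }
    }
    v
}

/// Dot product; equals cosine similarity for unit-length inputs.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum::<f32>()
}
```

In this sketch "login" and "logins" share three trigrams and score well above "car" and "automobile", which share no features at all. That is the behavior described above: surface overlap is rewarded, synonymy is invisible.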

It is the daemon's default for a reason: 80% of the value of semantic search with 0% of the operational pain. Upgrade when lexical overlap stops carrying you ("how do I authenticate" vs. "login flow").

FastEmbed — the upgrade

FastEmbedProvider wraps all-MiniLM-L6-v2 via the fastembed crate. It produces real 384-d semantic vectors, fully on-device, with no API key and no per-token cost. Use it when quality matters and you can absorb an 80 MB model download and a 2-5 s cold start.

Feature flag: FastEmbed is gated behind the local-embeddings Cargo feature (it pulls in an ONNX runtime). The daemon activates it automatically when the model can be loaded; library consumers add the feature to their Cargo.toml.

OpenAI — the high-end

When "good enough" isn't good enough, OpenAI embeddings are the practical ceiling. text-embedding-3-small is cheap and competitive with much larger open models for retrieval; text-embedding-3-large is among the strongest off-the-shelf models for English retrieval as of 2026. Use it for online, multi-tenant services where retrieval quality is a product surface and the per-token bill is a rounding error, or for a private Azure / Bedrock deployment that fits your privacy story.

Cost math: at $0.02 per 1M tokens, one dollar buys 50 million tokens, so text-embedding-3-small embeds about 250,000 200-token thoughts for one dollar. Most personal agents never hit that. Most production agents on a busy day do. The benchmarking chapter shows how to measure whether the quality jump is worth it for your chain.
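The arithmetic is easy to rerun for your own thought length and model. The helper names below are illustrative, and the prices are the list prices from the decision matrix.

```rust
/// How many thoughts one US dollar embeds, given a price per million
/// tokens and an average thought length in tokens.
fn thoughts_per_dollar(usd_per_million_tokens: f64, tokens_per_thought: f64) -> f64 {
    let tokens_per_dollar = 1_000_000.0 / usd_per_million_tokens;
    tokens_per_dollar / tokens_per_thought
}

/// Cost in US dollars to embed `thoughts` thoughts exactly once.
fn embedding_cost_usd(
    thoughts: u64,
    tokens_per_thought: f64,
    usd_per_million_tokens: f64,
) -> f64 {
    thoughts as f64 * tokens_per_thought * usd_per_million_tokens / 1_000_000.0
}
```

With 200-token thoughts, `thoughts_per_dollar(0.02, 200.0)` comes to 250,000 for text-embedding-3-small, and embedding one million such thoughts costs about four dollars. At $0.13 per 1M tokens the same dollar covers roughly 38,000 thoughts.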

No provider — the lexical-only path

You can run MentisDB with no sidecar at all. Retrieval is then BM25-style lexical scoring (with per-field DF gating and the built-in thesaurus expansion since 0.9.9), plus graph expansion through relations and implicit edges. For many real workloads — anything where the user types names of functions, error strings, or specific identifiers — this beats a 256-d hash and is competitive with a 384-d semantic model.

Use it when memory is dominated by code, error logs, file paths, or configuration keys; when you are in a constrained environment where even 80 MB is too much; or when you are doing pure keyword/identifier search. There is no penalty: ranked retrieval works identically, and the only difference is the absence of a vector similarity score. The chain, the log, the relations, and the graph all live outside the sidecar.

Custom — when the others don't fit

A custom provider is the right answer when:

You have a domain-specific model (code, biomedical, legal, financial) that meaningfully beats the general-purpose options on your data.
You need a non-standard dimension (e.g., a 768-d CodeBERT-style model, or a 1024-d model fine-tuned on your repo).
You need an embedding space shared with another system — you already have vectors from another pipeline and want the same space for cross-system retrieval.
You have a private model on your own infrastructure and the privacy story forbids any third-party API.

Implementing the `EmbeddingProvider` trait

A provider is a struct that implements one trait with two methods: metadata (describes the embedding space) and embed_batch (turns a batch of inputs into a batch of vectors).

The trait, in full

pub trait EmbeddingProvider {
    type Error: std::error::Error + Send + Sync + 'static;
    fn metadata(&self) -> &EmbeddingMetadata;
    fn embed_batch(
        &self, inputs: &[EmbeddingInput],
    ) -> Result<Vec<EmbeddingVector>, Self::Error>;
}

EmbeddingMetadata is a (model_id, dimension, embedding_version) tuple that uniquely identifies the embedding space. Different metadata means different sidecars on disk (see Switching providers mid-chain below).
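When writing a provider it helps to check each batch against the declared embedding space before handing it to a chain. The helper below is a hypothetical test aid, not part of the crate. It assumes the contract is one vector per input, each with exactly `dimension` finite values, and it works on plain `Vec<f32>` so it does not depend on the real `EmbeddingVector` layout.

```rust
/// Check a batch result against the provider's declared dimension.
/// Assumed contract: one vector per input, `dimension` finite values each.
fn check_batch(
    dimension: usize,
    input_count: usize,
    vectors: &[Vec<f32>],
) -> Result<(), String> {
    if vectors.len() != input_count {
        return Err(format!(
            "expected {input_count} vectors, got {}",
            vectors.len()
        ));
    }
    for (i, v) in vectors.iter().enumerate() {
        if v.len() != dimension {
            return Err(format!(
                "vector {i} has {} values, expected {dimension}",
                v.len()
            ));
        }
        if v.iter().any(|x| !x.is_finite()) {
            return Err(format!("vector {i} contains NaN or infinity"));
        }
    }
    Ok(())
}
```

Running a check like this in the provider's unit tests catches dimension drift early, before a mismatched vector ever reaches a sidecar.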

A custom provider: tag-weighted local embeddings

Suppose you want the zero-dependency story of LocalTextEmbeddingProvider but with a twist: in your domain, certain tags (auth, billing, incident) are more discriminating than the raw text. A wrapper that boosts tag-derived features is a few dozen lines:

use mentisdb::search::{
    EmbeddingInput, EmbeddingMetadata, EmbeddingProvider, EmbeddingVector,
    LocalTextEmbeddingProvider,
};
use std::collections::HashMap;
use std::error::Error;
use std::fmt;

/// Wraps the local hashed-trigram provider and folds a tag-based
/// signal into the same 256-d dense vector. Tag hints are passed via
/// input_id ("doc-7|tag:auth|tag:billing") so the EmbeddingInput
/// contract is unchanged.
pub struct TagWeightedEmbeddingProvider {
    inner: LocalTextEmbeddingProvider,
    tag_slots: HashMap<String, [usize; 8]>,
}

impl TagWeightedEmbeddingProvider {
    pub fn new(important_tags: &[&str]) -> Self {
        let mut tag_slots = HashMap::new();
        for (idx, tag) in important_tags.iter().enumerate() {
            let slots: [usize; 8] = core::array::from_fn(|i| (idx * 8 + i) % 256);
            tag_slots.insert((*tag).to_string(), slots);
        }
        Self { inner: LocalTextEmbeddingProvider::new(), tag_slots }
    }
}

#[derive(Debug)]
pub struct TagProviderError(String);
impl fmt::Display for TagProviderError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        f.write_str(&self.0)
    }
}
impl Error for TagProviderError {}

impl EmbeddingProvider for TagWeightedEmbeddingProvider {
    type Error = TagProviderError;

    fn metadata(&self) -> &EmbeddingMetadata {
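        // Inherit inner metadata so the sidecar path stays
        // deterministic. Caveat: sidecars are keyed by metadata, so
        // this wrapper shares its sidecar file with the plain Local
        // provider. Do not register both on the same chain.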
        self.inner.metadata()
    }

    fn embed_batch(
        &self,
        inputs: &[EmbeddingInput],
    ) -> Result<Vec<EmbeddingVector>, Self::Error> {
        let mut vectors = self.inner.embed_batch(inputs)
            .map_err(|e| TagProviderError(e.to_string()))?;
        for (input, vec) in inputs.iter().zip(vectors.iter_mut()) {
            for tag in input.input_id.split('|').skip(1) {
                let Some(tag) = tag.strip_prefix("tag:") else { continue };
                let Some(slots) = self.tag_slots.get(tag) else { continue };
                for slot in slots { vec.values[*slot] += 0.25; }
            }
        }
        // Re-normalize so cosine similarity stays well-defined.
        for vec in &mut vectors {
            let mag: f32 = vec.values.iter().map(|v| v * v).sum::<f32>().sqrt();
            if mag > 0.0 {
                for v in &mut vec.values { *v /= mag; }
            }
        }
        Ok(vectors)
    }
}

Wiring it into a chain

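use std::io;

// ThoughtType is assumed to be exported from the crate root;
// adjust the path if your version exposes it elsewhere.
// `tempfile` is an external crate, used here only for a scratch dir.
use mentisdb::{BinaryStorageAdapter, MentisDb, ThoughtType};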

fn main() -> io::Result<()> {
    let dir = tempfile::tempdir()?;
    let adapter = BinaryStorageAdapter::for_chain_key(
        dir.path(), "auth-billing",
    );
    let mut chain = MentisDb::open_with_storage(Box::new(adapter))?;

    let provider = TagWeightedEmbeddingProvider::new(&[
        "auth", "billing", "incident", "deploy",
    ]);
    chain.manage_vector_sidecar(provider)?;

    chain.append("oncall", ThoughtType::Insight,
        "Auth refresh tokens on /v2/login expire in 1h.")?;
    Ok(())
}

As long as the tag hints reach the provider through input_id, a query for "login session" lands close to the auth thought not just because the words overlap, but because the shared tag slots pull tagged thoughts together. The same pattern generalizes: a wrapper that takes the inner provider's output, applies a domain transformation, and returns vectors in the same dimension space.
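The geometry of the boost is easy to check in isolation. The sketch below uses plain `f32` slices rather than the crate's types, and the helper names are illustrative. It mirrors the two passes in the wrapper: add 0.25 to each tag slot, then re-normalize.

```rust
/// L2-normalize in place; an all-zero vector is left untouched.
fn normalize(v: &mut [f32]) {
    let mag = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if mag > 0.0 {
        for x in v.iter_mut() {
            *x /= mag;
        }
    }
}

/// Cosine similarity that does not assume unit-length inputs.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let mag_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let mag_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if mag_a == 0.0 || mag_b == 0.0 {
        0.0
    } else {
        dot / (mag_a * mag_b)
    }
}

/// Add `boost` to every listed slot, then re-normalize.
fn boost_slots(v: &mut [f32], slots: &[usize], boost: f32) {
    for &slot in slots {
        v[slot] += boost;
    }
    normalize(v);
}
```

Take two 256-d unit vectors with no overlap at all (cosine 0). After both receive a 0.25 boost on the same eight slots, their dot product is 8 × 0.0625 = 0.5 and each squared magnitude is 1.5, so the cosine rises to 1/3. The tag alone is enough to make two lexically unrelated thoughts neighbors, which is also a reason to keep the boost small.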

Switching providers mid-chain

A chain is append-only and does not care which provider — or how many providers — have ever been used to embed its thoughts. The sidecar path is namespaced by the provider's EmbeddingMetadata (model id + dimension + version), so multiple sidecars can coexist on the same chain:

my-chain.tcbin
my-chain.vectors.mentisdb-local-text.v1.256d.json
my-chain.vectors.fastembed-all-minilm-l6-v2.v1.384d.json
my-chain.vectors.custom-bert-code.v2.768d.json

Each sidecar is a self-contained vector index over the same append-only stream. When you switch from Local to FastEmbed, the Local sidecar is preserved and the new FastEmbed sidecar is built alongside. Ranked retrieval picks which sidecar to consult at query time. This makes upgrade paths safe: register a FastEmbedProvider, let it rebuild in the background, and once it catches up ranked queries can use it. Roll back any time — the Local sidecar is still there. The sidecar management chapter covers the operational details.
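The naming scheme in the listing above can be reproduced with a small helper. This is a sketch inferred from those file names only; `SpaceId` and `sidecar_file_name` are hypothetical, and the crate's real path logic may differ in detail.

```rust
/// Hypothetical mirror of the (model_id, dimension, embedding_version)
/// triple that identifies an embedding space.
struct SpaceId<'a> {
    model_id: &'a str,
    dimension: usize,
    version: u32,
}

/// Build a sidecar file name in the pattern shown in the listing:
/// <chain>.vectors.<model_id>.v<version>.<dimension>d.json
fn sidecar_file_name(chain_key: &str, space: &SpaceId) -> String {
    format!(
        "{chain_key}.vectors.{}.v{}.{}d.json",
        space.model_id, space.version, space.dimension
    )
}
```

Because every component of the triple appears in the name, changing the model, the dimension, or the version yields a different file, which is what lets old and new sidecars coexist during an upgrade.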

Production notes

Cold start, footprint, batch sizes

Local: ~0 ms cold start, ~4 KB static state, sub-ms per doc.
FastEmbed: ~2-5 s cold start for the model load, ~80 MB on disk, ~200-300 MB resident. Plan for a 2-5 second wait at first provider use; subsequent opens reuse the cached ONNX file.
OpenAI: ~200-500 ms per batch round-trip, zero local footprint. The bill replaces the disk cost.
Custom: whatever your model and hardware dictate. Document cold start and footprint in the provider's Debug impl.

embed_batch takes a slice. The daemon and library callers chunk appends into batches of 32 by default. Slow per-call providers (remote APIs) benefit from 64-128; fast providers (local hash, fastembed) get better progress feedback at smaller batches.
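If you drive a provider yourself, the chunking is a thin loop over the input slice. The function below is a generic sketch, not the daemon's code: `embed_in_batches` and its progress callback are illustrative names, and the closure stands in for a call to the provider's `embed_batch`.

```rust
/// Embed `inputs` in fixed-size batches, reporting (done, total)
/// after each batch. Stops at the first batch that fails.
fn embed_in_batches<T, V, E>(
    inputs: &[T],
    batch_size: usize,
    mut embed_batch: impl FnMut(&[T]) -> Result<Vec<V>, E>,
    mut on_progress: impl FnMut(usize, usize),
) -> Result<Vec<V>, E> {
    let mut out = Vec::with_capacity(inputs.len());
    // `chunks` panics on a zero size, so clamp to at least 1.
    for chunk in inputs.chunks(batch_size.max(1)) {
        out.extend(embed_batch(chunk)?);
        on_progress(out.len(), inputs.len());
    }
    Ok(out)
}
```

With 70 inputs and a batch size of 32 this makes three calls of 32, 32, and 6 inputs. A smaller batch size gives more frequent progress callbacks; a larger one means fewer round trips for a remote API.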

The cost/quality frontier

For most agents, the practical frontier is:

Local for prototyping, CLI tools, and offline-first agents. Fair quality, operationally free.
FastEmbed for production agents on a single host. Good quality, zero per-token cost.
OpenAI small for online, multi-tenant services where the quality jump matters and a $0.02/1M token bill is a rounding error.
OpenAI large only when the gap to 3-small is measurable on your data. It often isn't.

The default — what MentisDB uses out of the box — is Local. Upgrade only when the benchmark numbers say you should.

Decision flowchart

Run this in your head when choosing a provider:

Are you running offline / in a constrained env?
├── Yes ─→ Use Local. (256-d, no deps, no network.)
│
└── No
    │
    Are retrieval quality gains worth >$0.02/1M tokens at your scale?
    ├── No ─→ Use FastEmbed. (384-d, ~80 MB model, $0/token.)
    │         Drop to Local if the model download is unacceptable.
    │
    └── Yes
        │
        Do you have a private model that beats OpenAI on your data?
        ├── Yes ─→ Use your custom provider.
        │          See "Implementing the EmbeddingProvider trait".
        │
        └── No ─→ Use OpenAI text-embedding-3-small.
                  Bump to 3-large only after measuring the gap.

(Or, if lexical + graph retrieval is enough, skip the sidecar entirely.)
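The same decision can be written as a small function, which is handy if you want the choice recorded in configuration code or covered by a test. The `Situation` struct and `Provider` enum are hypothetical names for this sketch; the three questions are the ones in the flowchart, asked in the same order.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Provider {
    Local,
    FastEmbed,
    Custom,
    OpenAiSmall,
}

/// Answers to the three flowchart questions.
struct Situation {
    offline_or_constrained: bool,
    quality_worth_per_token_cost: bool,
    private_model_beats_openai: bool,
}

/// Walk the flowchart top to bottom and return the first match.
fn choose_provider(s: &Situation) -> Provider {
    if s.offline_or_constrained {
        return Provider::Local;
    }
    if !s.quality_worth_per_token_cost {
        return Provider::FastEmbed;
    }
    if s.private_model_beats_openai {
        Provider::Custom
    } else {
        Provider::OpenAiSmall
    }
}
```

The lexical-only path and the bump to text-embedding-3-large are left out on purpose: both depend on measurements of your own chain rather than on a yes/no answer.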

What's next

Picking a provider is the easy part. Keeping the sidecar fresh as the chain grows, shipping it to a remote bucket, and rebuilding from scratch when you switch providers is the operational work. 3.2: Vector Sidecar Management covers that workflow.