A persistent agent memory layer on one database, with 0.89 recall and no cross-tenant leaks

Elastic's search-labs team has published how it built a persistent, multi-tenant memory layer for agents on a single Elasticsearch cluster, reporting 0.89 recall on a question-answering retrieval eval with zero cross-tenant leaks.

The design argument is the most useful part. The common pattern splits memory across a vector store, a keyword engine, an audit log and a separate auth service, which the authors note adds up to four things that can break, plus the plumbing between them. Folding all of it into one engine with hybrid retrieval and document-level security shrinks that failure surface. Memory is sorted into three kinds: episodic, semantic and procedural. Procedural memories carry success and failure counters, so a remembered fix that stops working can be demoted rather than blindly reused.
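The counter idea is easy to picture in a few lines of Python. What follows is a minimal in-memory sketch, not the authors' implementation: the names `Memory`, `record_outcome` and `rank_procedures` are hypothetical, the smoothed success rate is an assumed scoring rule, and the tenant check is done in application code only for illustration, whereas the design described above relies on the engine's document-level security.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    EPISODIC = "episodic"
    SEMANTIC = "semantic"
    PROCEDURAL = "procedural"


@dataclass
class Memory:
    # Hypothetical record shape; the field names are assumptions.
    tenant_id: str
    kind: Kind
    text: str
    successes: int = 0  # only meaningful for procedural memories
    failures: int = 0

    def reliability(self) -> float:
        # Assumed scoring rule: Laplace-smoothed success rate,
        # so an untried fix starts at 0.5 rather than 0 or 1.
        return (self.successes + 1) / (self.successes + self.failures + 2)


def record_outcome(memory: Memory, worked: bool) -> None:
    """Update the counters after the agent tries a remembered fix."""
    if worked:
        memory.successes += 1
    else:
        memory.failures += 1


def rank_procedures(store: list[Memory], tenant_id: str) -> list[Memory]:
    """Return one tenant's procedural memories, most reliable first."""
    own = [
        m for m in store
        if m.tenant_id == tenant_id and m.kind is Kind.PROCEDURAL
    ]
    return sorted(own, key=Memory.reliability, reverse=True)


store = [
    Memory("acme", Kind.PROCEDURAL, "restart the worker", successes=3),
    Memory("acme", Kind.PROCEDURAL, "clear the cache", successes=1),
    Memory("globex", Kind.PROCEDURAL, "rotate the API key", successes=5),
]

# The first fix stops working: four failed attempts in a row.
for _ in range(4):
    record_outcome(store[0], worked=False)

ranked = rank_procedures(store, "acme")
print([m.text for m in ranked])  # "clear the cache" now ranks first
```

The failing fix is kept but moved down the ranking, which matches the "demoted rather than blindly reused" behaviour, and the other tenant's memory never enters the candidate list.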

The authors are candid that the eval is corpus-specific rather than a shared benchmark like LoCoMo, so the recall figure is not yet comparable across systems. Still, it is a concrete reference architecture for the part of agent design most teams hand-wave: where long-term memory actually lives, and how you keep one tenant's facts out of another tenant's answers.
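For readers who want to run the same two checks on their own corpus, here is a rough sketch of how such numbers could be computed. It rests on assumptions rather than the authors' definitions: recall is taken as the fraction of questions whose gold memory appears in the top k results, a leak is any retrieved memory owned by a different tenant than the one asking, and the `evaluate` function and its input shapes are hypothetical.

```python
def evaluate(runs: list[dict], owner: dict[str, str], k: int = 5) -> tuple[float, int]:
    """Return (recall at k, number of cross-tenant leaks).

    runs:  one dict per question with keys "tenant", "gold", "retrieved"
           (assumed shape; "retrieved" is a ranked list of memory ids).
    owner: maps each memory id to the tenant that owns it.
    """
    hits = 0
    leaks = 0
    for run in runs:
        top_k = run["retrieved"][:k]
        if run["gold"] in top_k:
            hits += 1
        # Any result owned by someone other than the asking tenant is a leak.
        leaks += sum(1 for mem_id in top_k if owner[mem_id] != run["tenant"])
    recall = hits / len(runs) if runs else 0.0
    return recall, leaks


owner = {"m1": "acme", "m2": "acme", "m3": "globex"}
runs = [
    {"tenant": "acme", "gold": "m1", "retrieved": ["m2", "m1"]},
    {"tenant": "globex", "gold": "m3", "retrieved": ["m1"]},  # miss and leak
]
recall, leaks = evaluate(runs, owner)
print(recall, leaks)  # 0.5 1
```

Because the score depends on the question set and on the choice of k, numbers produced this way are only comparable between systems evaluated on the same corpus.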