## What It Does
MemPalace is a Python library and MCP server that gives AI assistants persistent cross-session memory by storing conversation history verbatim in a locally-hosted ChromaDB vector database. The core design metaphor is the ancient “method of loci” mnemonic: conversations are organized into a hierarchy of Wings (per-person or per-project containers), Rooms (topic areas), Halls (memory type corridors: facts, events, discoveries, preferences, advice), Closets (summaries), and Drawers (verbatim files). Retrieval uses ChromaDB’s default all-MiniLM-L6-v2 embeddings with optional metadata filtering by wing and room to narrow search scope.
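The wing/room narrowing can be illustrated with a toy in-memory search. This is a sketch of the filtering idea only — metadata filtering first, ranking second — using naive term overlap in place of real embeddings; it is not MemPalace's actual API.

```python
# Illustrative sketch, not MemPalace's API: narrow the candidate set by
# wing/room metadata (like a ChromaDB `where` filter), then rank.

def filtered_search(docs, query_terms, wing=None, room=None, top_k=3):
    """Keep only docs whose metadata matches the wing/room filter,
    then rank the remainder by naive term overlap with the query."""
    candidates = [
        d for d in docs
        if (wing is None or d["wing"] == wing)
        and (room is None or d["room"] == room)
    ]
    return sorted(
        candidates,
        key=lambda d: len(set(query_terms) & set(d["text"].lower().split())),
        reverse=True,
    )[:top_k]

docs = [
    {"wing": "alice", "room": "health", "text": "Alice is allergic to peanuts"},
    {"wing": "alice", "room": "work",   "text": "Alice prefers Python over Go"},
    {"wing": "bob",   "room": "work",   "text": "Bob ships the billing service"},
]
hits = filtered_search(docs, ["python"], wing="alice", room="work")
```

The point of the hierarchy is exactly this pre-filter: shrinking the search space before similarity ranking, which is where the claimed precision gains on large collections would come from.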
A four-layer memory stack controls token budget: L0 identity (~50 tokens, always loaded), L1 critical facts (~120 tokens via AAAK compression, always loaded), L2 room recall (on-demand), and L3 deep semantic search (on-demand). A secondary SQLite-based knowledge graph stores temporal entity-relationship triples with validity windows. An MCP server exposes 19 tools compatible with Claude, ChatGPT, Cursor, and Gemini CLI. An experimental “AAAK dialect” applies lossy text abbreviation for compression, but degrades benchmark performance by 12.4 percentage points and is not recommended for production use.
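The temporal knowledge graph described above can be sketched with stdlib `sqlite3`. The schema, column names, and query below are illustrative assumptions, not MemPalace's actual layout:

```python
import sqlite3

# Minimal sketch of a temporal triple store with validity windows, in the
# spirit of MemPalace's SQLite knowledge graph. Schema and column names
# are assumptions for illustration, not the project's actual code.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE triples (
        subject TEXT, predicate TEXT, object TEXT,
        valid_from TEXT, valid_to TEXT  -- NULL valid_to means still current
    )
""")
conn.execute("INSERT INTO triples VALUES "
             "('alice', 'works_at', 'AcmeCo', '2024-01-01', '2025-06-30')")
conn.execute("INSERT INTO triples VALUES "
             "('alice', 'works_at', 'InitechInc', '2025-07-01', NULL)")

def facts_at(conn, subject, as_of):
    """Point-in-time query: triples whose validity window covers `as_of`.
    ISO date strings compare correctly as plain text."""
    return conn.execute(
        """SELECT predicate, object FROM triples
           WHERE subject = ? AND valid_from <= ?
             AND (valid_to IS NULL OR valid_to >= ?)""",
        (subject, as_of, as_of),
    ).fetchall()

then_facts = facts_at(conn, "alice", "2024-06-01")  # employer in mid-2024
now_facts = facts_at(conn, "alice", "2026-01-01")   # current employer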
## Key Features
- Verbatim storage with no LLM writes: Writes are fully offline, deterministic, and free — no API calls during ingestion
- Hierarchical namespace filtering: Wing and room metadata filtering narrows ChromaDB search scope, improving retrieval precision on large collections
- Four-layer progressive loading: Predictable 170-token wake-up context with deeper layers loaded on demand
- Temporal knowledge graph: SQLite triples with start/end validity windows for point-in-time queries (partially implemented — contradiction detection not yet wired in)
- 19 MCP tools: Search, memory management, agent operations, and knowledge graph queries via Model Context Protocol
- Multi-mode mining: CLI commands for ingesting project files, conversation exports, or general auto-classified content
- Session splitting: Handles large conversation exports by splitting on configurable thresholds
- Cross-client compatibility: Works with Claude, ChatGPT, Cursor, Gemini CLI, and local models via MCP or Python API
- Zero operational cost: No cloud dependency, no subscription; ChromaDB and SQLite run locally
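The session-splitting feature above can be sketched as a greedy character-count splitter. The 4,000-character threshold and flat message list are assumptions for illustration, not the project's actual defaults:

```python
# Hedged sketch of threshold-based session splitting for large
# conversation exports; threshold and message format are assumptions.

def split_sessions(messages, max_chars=4000):
    """Greedily pack messages into sessions, starting a new session
    whenever the running character count would exceed the threshold."""
    sessions, current, size = [], [], 0
    for msg in messages:
        if current and size + len(msg) > max_chars:
            sessions.append(current)
            current, size = [], 0
        current.append(msg)
        size += len(msg)
    if current:
        sessions.append(current)
    return sessions

export = ["a" * 1500, "b" * 1500, "c" * 1500, "d" * 500]
chunks = split_sessions(export, max_chars=4000)
```

A greedy split like this keeps each chunk under the embedding-friendly size bound while never breaking an individual message in half.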
## Use Cases
- Solo developer persistent context: A developer using Claude Code who wants decisions, errors, and preferences remembered across sessions without connecting to a managed cloud service
- Local-first privacy requirement: Environments where sending conversation history to a third-party memory API (Mem0, Zep) is not acceptable for data residency or confidentiality reasons
- Low-cost long-term memory experiment: Teams evaluating verbatim-storage approaches for AI memory before committing to a production memory infrastructure
- MCP tool integration prototyping: Developers exploring how to expose agent memory as MCP tools for multi-client compatibility
## Adoption Level Analysis
Small teams (<20 engineers): Potential fit for personal or small-team use cases where local-first and zero-cost are the primary requirements. The MCP integration and CLI setup are accessible. However, the project launched April 2026 with 170 commits, 4 test files for 21 modules, and multiple corrected benchmark claims — production reliability is unverified. Treat as early-stage experimental tooling.
Medium orgs (20–200 engineers): Does not fit. ChromaDB’s single-node architecture limits scale; there are no multi-user access controls, no role-based permissions, no audit logs, and no compliance certifications. The verbatim storage model also has no forgetting/decay mechanism — memories accumulate indefinitely. Better alternatives exist at this scale (Mem0 managed, Zep, Weaviate Engram).
Enterprise (200+ engineers): Does not fit. No enterprise features, no SLA, no data governance controls, no integration with enterprise identity providers. Not designed for this use case.
## Alternatives
| Alternative | Key Difference | Prefer when… |
|---|---|---|
| Hippo Memory | TypeScript, biologically-inspired decay, BM25+embedding hybrid | You want TypeScript and memory that naturally expires unused entries |
| Honcho | Dialectic user modeling, peer-entity architecture, cloud-optional | You need user-centric relationship modeling beyond conversation storage |
| Weaviate Engram | Managed cloud memory on Weaviate, MCP integration, preview | You already use Weaviate and want managed memory infrastructure |
| OpenViking | Filesystem paradigm, tiered context, AGPL, ByteDance | You want filesystem-native context management with stronger typing |
| Mem0 | 19 vector store backends, graph memory, cloud + self-host, SOC 2 | You need production-ready memory with compliance and multi-backend support |
| Zep / Graphiti | Neo4j temporal knowledge graph, managed or self-hosted | You need strong temporal reasoning with entity relationship tracking |
| CLAUDE.md / MEMORY.md | File-based, zero tooling, natively understood by Claude Code | You want simplest possible persistent context with zero external dependencies |
| Mastra Observational Memory | No vector DB needed, text-only compression agents, 94.87% LongMemEval | You want SOTA benchmark performance without managing a vector database |
## Evidence & Sources
- Independent benchmark reproduction on an M2 Ultra (GitHub Issue #39): raw mode confirms 96.6% while aaak and rooms modes regress, a community reproduction confirming the benchmark measures embeddings, not architecture
- agentic-memory/ANALYSIS-mempalace.md (lhl, independent) — most thorough independent code-level analysis; documents AAAK lossiness, knowledge graph gaps, benchmark attribution issues
- Multiple issues with benchmark methodology and scoring (GitHub Issue #29) — community-identified benchmark methodology problems
- LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory (ICLR 2025) — the actual benchmark paper; shows GPT-4o baseline systems score 30–70%, confirming the benchmark is non-trivial
- Observational Memory: 95% on LongMemEval — Mastra Research — alternative SOTA approach (94.87% with gpt-5-mini) using no vector database at all
- Milla Jovovich creates MemPalace AI memory tool — Cybernews — independent reporting with developer community skepticism documented
## Notes & Caveats
- Benchmark attribution is the central problem: The headline “96.6% LongMemEval” measures ChromaDB’s `all-MiniLM-L6-v2` embeddings on verbatim text, not the palace architecture. Independent reproducers confirmed the benchmark runner never exercises wings, rooms, or any structural code. This is not a minor caveat; it invalidates the primary marketing claim.
- AAAK compression is lossy and degrades performance: Despite initial “zero information loss” claims, AAAK uses sentence truncation and regex substitution. The `decode()` method cannot reconstruct original text. Performance drops 12.4 points vs. raw mode. The project corrected this post-launch. Use raw mode if recall quality matters.
- Contradiction detection claimed but not implemented: `knowledge_graph.py` only blocks exact-duplicate triples. Conflicting facts accumulate silently. Any workflow that depends on contradiction detection (e.g., tracking fact updates over time) will produce incorrect results.
- No decay or forgetting mechanism: Memories accumulate indefinitely. For long-running agents, storage will grow unbounded and retrieval signal may degrade over time as the collection grows.
- ChromaDB single-node ceiling: ChromaDB targets prototyping workloads under roughly 10 million vectors; large-scale production with many agents or heavy memory accumulation exceeds what the underlying storage is built for.
- Celebrity-driven star inflation: 38k+ GitHub stars within days largely reflect Milla Jovovich’s media profile rather than technical community validation. Star count is not a proxy for production readiness here.
- LoCoMo benchmark methodology flaw acknowledged: The LoCoMo dataset has 19–32 sessions per conversation. When MemPalace set `top_k=50`, it retrieved more sessions than exist, guaranteeing the ground-truth answer was always in the candidate pool. The corrected LoCoMo score without reranking is 88.9%, not the headline figure.
- Early stage: Created April 5, 2026. 170 commits, 4 test files for 21 modules. No production case studies published. The rapid corrections post-launch indicate an honest team but also an immature release process.
- No named individual with established track record: Ben Sigman (technical lead) does not have a publicly verifiable track record in AI memory research. The project lacks academic citations or peer-reviewed validation of its architectural claims.
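The missing contradiction handling noted above is easiest to see in a sketch of what it could look like: closing the validity window of a superseded fact when a single-valued predicate changes. MemPalace's `knowledge_graph.py` does not do this today, and the names and behavior below are illustrative assumptions, not the project's code:

```python
# Illustrative only: what wired-in contradiction handling could look like.
# MemPalace currently only blocks exact-duplicate triples, so conflicting
# facts accumulate with no supersession. `assert_fact` is a hypothetical name.

def assert_fact(store, subject, predicate, obj, as_of):
    """For a single-valued predicate, close the validity window of any
    conflicting open triple before inserting the new one."""
    for t in store:
        if (t["subject"] == subject and t["predicate"] == predicate
                and t["valid_to"] is None and t["object"] != obj):
            t["valid_to"] = as_of  # supersede the old fact
    store.append({"subject": subject, "predicate": predicate,
                  "object": obj, "valid_from": as_of, "valid_to": None})

store = []
assert_fact(store, "alice", "works_at", "AcmeCo", "2024-01-01")
assert_fact(store, "alice", "works_at", "InitechInc", "2025-07-01")
open_facts = [t for t in store if t["valid_to"] is None]
```

Without a step like this, a point-in-time query over the accumulated triples can return two simultaneously "current" employers, which is the incorrect-results failure mode the caveat describes.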