## What It Does
MemPalace is a Python library and MCP server that gives AI assistants persistent cross-session memory by storing conversation history verbatim in a locally-hosted ChromaDB vector database. The core design metaphor is the ancient “method of loci” mnemonic: conversations are organized into a hierarchy of Wings (per-person or per-project containers), Rooms (topic areas), Halls (memory type corridors: facts, events, discoveries, preferences, advice), Closets (summaries), and Drawers (verbatim files). Retrieval uses ChromaDB’s default all-MiniLM-L6-v2 embeddings with optional metadata filtering by wing and room to narrow search scope.
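The wing/room narrowing can be illustrated with a toy in-memory search. This is a sketch of the filtering idea only — metadata filtering first, ranking second — using naive term overlap in place of real embeddings; it is not MemPalace's actual API.

```python
# Illustrative sketch, not MemPalace's API: narrow the candidate set by
# wing/room metadata (like a ChromaDB `where` filter), then rank.

def filtered_search(docs, query_terms, wing=None, room=None, top_k=3):
    """Keep only docs whose metadata matches the wing/room filter,
    then rank the remainder by naive term overlap with the query."""
    candidates = [
        d for d in docs
        if (wing is None or d["wing"] == wing)
        and (room is None or d["room"] == room)
    ]
    return sorted(
        candidates,
        key=lambda d: len(set(query_terms) & set(d["text"].lower().split())),
        reverse=True,
    )[:top_k]

docs = [
    {"wing": "alice", "room": "health", "text": "Alice is allergic to peanuts"},
    {"wing": "alice", "room": "work",   "text": "Alice prefers Python over Go"},
    {"wing": "bob",   "room": "work",   "text": "Bob ships the billing service"},
]
hits = filtered_search(docs, ["python"], wing="alice", room="work")
```

The point of the hierarchy is exactly this pre-filter: shrinking the search space before similarity ranking, which is where the claimed precision gains on large collections would come from.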
A four-layer memory stack controls token budget: L0 identity (~50 tokens, always loaded), L1 critical facts (~120 tokens via AAAK compression, always loaded), L2 room recall (on-demand), and L3 deep semantic search (on-demand). A secondary SQLite-based knowledge graph stores temporal entity-relationship triples with validity windows. An MCP server exposes 19 tools compatible with Claude, ChatGPT, Cursor, and Gemini CLI. An experimental “AAAK dialect” applies lossy text abbreviation for compression, but degrades benchmark performance by 12.4 percentage points and is not recommended for production use.
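The temporal knowledge graph described above can be sketched with stdlib `sqlite3`. The schema, column names, and query below are illustrative assumptions, not MemPalace's actual layout:

```python
import sqlite3

# Minimal sketch of a temporal triple store with validity windows, in the
# spirit of MemPalace's SQLite knowledge graph. Schema and column names
# are assumptions for illustration, not the project's actual code.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE triples (
        subject TEXT, predicate TEXT, object TEXT,
        valid_from TEXT, valid_to TEXT  -- NULL valid_to means still current
    )
""")
conn.execute("INSERT INTO triples VALUES "
             "('alice', 'works_at', 'AcmeCo', '2024-01-01', '2025-06-30')")
conn.execute("INSERT INTO triples VALUES "
             "('alice', 'works_at', 'InitechInc', '2025-07-01', NULL)")

def facts_at(conn, subject, as_of):
    """Point-in-time query: triples whose validity window covers `as_of`.
    ISO date strings compare correctly as plain text."""
    return conn.execute(
        """SELECT predicate, object FROM triples
           WHERE subject = ? AND valid_from <= ?
             AND (valid_to IS NULL OR valid_to >= ?)""",
        (subject, as_of, as_of),
    ).fetchall()

then_facts = facts_at(conn, "alice", "2024-06-01")  # employer in mid-2024
now_facts = facts_at(conn, "alice", "2026-01-01")   # current employer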
## Key Features
- Verbatim storage with no LLM writes: Writes are fully offline, deterministic, and free — no API calls during ingestion
- Hierarchical namespace filtering: Wing and room metadata filtering narrows ChromaDB search scope, improving retrieval precision on large collections
- Four-layer progressive loading: Predictable 170-token wake-up context with deeper layers loaded on demand
- Temporal knowledge graph: SQLite triples with start/end validity windows for point-in-time queries (partially implemented — contradiction detection not yet wired in)
- 19 MCP tools: Search, memory management, agent operations, and knowledge graph queries via Model Context Protocol
- Multi-mode mining: CLI commands for ingesting project files, conversation exports, or general auto-classified content
- Session splitting: Handles large conversation exports by splitting on configurable thresholds
- Cross-client compatibility: Works with Claude, ChatGPT, Cursor, Gemini CLI, and local models via MCP or Python API
- Zero operational cost: No cloud dependency, no subscription; ChromaDB and SQLite run locally
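The session-splitting feature above can be sketched as a greedy character-count splitter. The 4,000-character threshold and flat message list are assumptions for illustration, not the project's actual defaults:

```python
# Hedged sketch of threshold-based session splitting for large
# conversation exports; threshold and message format are assumptions.

def split_sessions(messages, max_chars=4000):
    """Greedily pack messages into sessions, starting a new session
    whenever the running character count would exceed the threshold."""
    sessions, current, size = [], [], 0
    for msg in messages:
        if current and size + len(msg) > max_chars:
            sessions.append(current)
            current, size = [], 0
        current.append(msg)
        size += len(msg)
    if current:
        sessions.append(current)
    return sessions

export = ["a" * 1500, "b" * 1500, "c" * 1500, "d" * 500]
chunks = split_sessions(export, max_chars=4000)
```

A greedy split like this keeps each chunk under the embedding-friendly size bound while never breaking an individual message in half.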
## Use Cases
- Solo developer persistent context: A developer using Claude Code who wants decisions, errors, and preferences remembered across sessions without connecting to a managed cloud service
- Local-first privacy requirement: Environments where sending conversation history to a third-party memory API (Mem0, Zep) is not acceptable for data residency or confidentiality reasons
- Low-cost long-term memory experiment: Teams evaluating verbatim-storage approaches for AI memory before committing to a production memory infrastructure
- MCP tool integration prototyping: Developers exploring how to expose agent memory as MCP tools for multi-client compatibility
## Adoption Level Analysis
Small teams (<20 engineers): Potential fit for personal or small-team use cases where local-first and zero-cost are the primary requirements. The MCP integration and CLI setup are accessible. However, the project launched April 2026 with 170 commits, 4 test files for 21 modules, and multiple corrected benchmark claims — production reliability is unverified. Treat as early-stage experimental tooling.
Medium orgs (20–200 engineers): Does not fit. ChromaDB’s single-node architecture limits scale; there are no multi-user access controls, no role-based permissions, no audit logs, and no compliance certifications. The verbatim storage model also has no forgetting/decay mechanism — memories accumulate indefinitely. Better alternatives exist at this scale (Mem0 managed, Zep, Weaviate Engram).
Enterprise (200+ engineers): Does not fit. No enterprise features, no SLA, no data governance controls, no integration with enterprise identity providers. Not designed for this use case.
## Alternatives
| Alternative | Key Difference | Prefer when… |
|---|---|---|
| Hippo Memory | TypeScript, biologically-inspired decay, BM25+embedding hybrid | You want TypeScript and memory that naturally expires unused entries |
| Honcho | Dialectic user modeling, peer-entity architecture, cloud-optional | You need user-centric relationship modeling beyond conversation storage |
| Weaviate Engram | Managed cloud memory on Weaviate, MCP integration, preview | You already use Weaviate and want managed memory infrastructure |
| OpenViking | Filesystem paradigm, tiered context, AGPL, ByteDance | You want filesystem-native context management with stronger typing |
| Mem0 | 19 vector store backends, graph memory, cloud + self-host, SOC 2 | You need production-ready memory with compliance and multi-backend support |
| Zep / Graphiti | Neo4j temporal knowledge graph, managed or self-hosted | You need strong temporal reasoning with entity relationship tracking |
| CLAUDE.md / MEMORY.md | File-based, zero tooling, natively understood by Claude Code | You want simplest possible persistent context with zero external dependencies |
| Mastra Observational Memory | No vector DB needed, text-only compression agents, 94.87% LongMemEval | You want SOTA benchmark performance without managing a vector database |
## Evidence & Sources
- Independent benchmark reproduction on an M2 Ultra (GitHub Issue #39): raw mode confirms 96.6% while aaak and rooms modes regress, a community reproduction confirming the benchmark measures embeddings, not architecture
- agentic-memory/ANALYSIS-mempalace.md (lhl, independent) — most thorough independent code-level analysis; documents AAAK lossiness, knowledge graph gaps, benchmark attribution issues
- Multiple issues with benchmark methodology and scoring (GitHub Issue #29) — community-identified benchmark methodology problems
- LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory (ICLR 2025) — the actual benchmark paper; shows GPT-4o baseline systems score 30–70%, confirming the benchmark is non-trivial
- Observational Memory: 95% on LongMemEval — Mastra Research — alternative SOTA approach (94.87% with gpt-5-mini) using no vector database at all
- Milla Jovovich creates MemPalace AI memory tool — Cybernews — independent reporting with developer community skepticism documented
## Notes & Caveats
- Benchmark attribution is the central problem: The headline “96.6% LongMemEval” measures ChromaDB’s `all-MiniLM-L6-v2` embeddings on verbatim text, not the palace architecture. Independent reproducers confirmed the benchmark runner never exercises wings, rooms, or any structural code. This is not a minor caveat; it invalidates the primary marketing claim.
- AAAK compression is lossy and degrades performance: Despite initial “zero information loss” claims, AAAK uses sentence truncation and regex substitution. The `decode()` method cannot reconstruct original text. Performance drops 12.4 points vs. raw mode. The project corrected this post-launch. Use raw mode if recall quality matters.
- Contradiction detection claimed but not implemented: `knowledge_graph.py` only blocks exact-duplicate triples. Conflicting facts accumulate silently. Any workflow that depends on contradiction detection (e.g., tracking fact updates over time) will produce incorrect results.
- No decay or forgetting mechanism: Memories accumulate indefinitely. For long-running agents, storage will grow unbounded and retrieval signal may degrade over time as the collection grows.
- ChromaDB single-node ceiling: ChromaDB targets prototyping workloads under roughly 10 million vectors; large-scale production with many agents or heavy memory accumulation exceeds what the underlying storage is built for.
- Celebrity-driven star inflation: 38k+ GitHub stars within days largely reflect Milla Jovovich’s media profile rather than technical community validation. Star count is not a proxy for production readiness here.
- LoCoMo benchmark methodology flaw acknowledged: The LoCoMo dataset has 19–32 sessions per conversation. When MemPalace set `top_k=50`, it retrieved more sessions than exist, guaranteeing the ground-truth answer was always in the candidate pool. The corrected LoCoMo score without reranking is 88.9%, not the headline figure.
- Early stage: Created April 5, 2026. 170 commits, 4 test files for 21 modules. No production case studies published. The rapid corrections post-launch indicate an honest team but also an immature release process.
- No named individual with established track record: Ben Sigman (technical lead) does not have a publicly verifiable track record in AI memory research. The project lacks academic citations or peer-reviewed validation of its architectural claims.
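The missing contradiction handling noted above is easiest to see in a sketch of what it could look like: closing the validity window of a superseded fact when a single-valued predicate changes. MemPalace's `knowledge_graph.py` does not do this today, and the names and behavior below are illustrative assumptions, not the project's code:

```python
# Illustrative only: what wired-in contradiction handling could look like.
# MemPalace currently only blocks exact-duplicate triples, so conflicting
# facts accumulate with no supersession. `assert_fact` is a hypothetical name.

def assert_fact(store, subject, predicate, obj, as_of):
    """For a single-valued predicate, close the validity window of any
    conflicting open triple before inserting the new one."""
    for t in store:
        if (t["subject"] == subject and t["predicate"] == predicate
                and t["valid_to"] is None and t["object"] != obj):
            t["valid_to"] = as_of  # supersede the old fact
    store.append({"subject": subject, "predicate": predicate,
                  "object": obj, "valid_from": as_of, "valid_to": None})

store = []
assert_fact(store, "alice", "works_at", "AcmeCo", "2024-01-01")
assert_fact(store, "alice", "works_at", "InitechInc", "2025-07-01")
open_facts = [t for t in store if t["valid_to"] is None]
```

Without a step like this, a point-in-time query over the accumulated triples can return two simultaneously "current" employers, which is the incorrect-results failure mode the caveat describes.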