TL;DR

Weaviate shipped Engram, a managed memory service for AI agents, in open preview on April 21, 2026. Instead of dumping every chat into a vector DB and hoping retrieval finds the right chunk, Engram runs asynchronous extract → transform → commit pipelines that deduplicate, reconcile contradictions, update outdated facts, and forget on purpose. In Weaviate's own dogfooding, decision-archaeology queries came back ~30% faster than reading reconstructed notes — at a cost of roughly 10% session overhead.

What is new

Engram ships three ideas that naive vector-DB memory does not have:

  • Topics — natural-language descriptions of what information actually matters. Think of them as magnets that pull matching facts out of raw data. Topics can be project-wide, user-scoped (multi-tenant), or property-scoped (soft isolation by conversation ID). A topic can also be marked bounded so only one memory exists per scope — perfect for user profiles.
  • Transform steps — for every new piece of data, Engram pulls related memories from Weaviate, then asks an LLM what to do: deduplicate, reconcile, update, or keep separate.
  • Buffers — aggregate across multiple pipeline runs, flushing on message count, topic, 24-hour interval, or idle timer. Useful for debouncing noisy inputs or building daily rollups.
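The transform step above can be sketched in miniature. This is an illustrative stand-in, not Engram's implementation: the real pipeline delegates the dedupe/reconcile/update/keep choice to an LLM over semantically retrieved memories, while this toy uses exact-match and word-overlap heuristics; all names and thresholds here are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Memory:
    id: str
    text: str

def word_overlap(a: str, b: str) -> float:
    # Crude lexical similarity as a stand-in for the LLM's judgment.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)

def transform_decision(new_fact: str, related: list[Memory]) -> dict:
    # 1. An exact duplicate is deduplicated against the existing memory.
    for mem in related:
        if mem.text.strip().lower() == new_fact.strip().lower():
            return {"action": "dedupe", "target": mem.id}
    # 2. Heavy topical overlap suggests the fact supersedes an old one.
    for mem in related:
        if word_overlap(mem.text, new_fact) > 0.5:
            return {"action": "update", "target": mem.id}
    # 3. Otherwise the fact is kept as a new, separate memory.
    return {"action": "keep"}
```

The point of the sketch is the control flow, not the heuristics: every incoming fact is judged against its semantic neighbors before anything is committed.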

The API surface is small on purpose: client.memories.add() to ingest, client.memories.search() to retrieve. Templates for personalization and continual learning cover common cases out of the box.
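To show the call shape of that two-method surface, here is a minimal in-memory stand-in. Only the method names `memories.add()` and `memories.search()` come from the announcement; the parameter names, keyword arguments, and return shape are assumptions, and the keyword search is a placeholder for Engram's semantic retrieval.

```python
class Memories:
    """In-memory stand-in mirroring the two documented calls."""

    def __init__(self):
        self._store = []

    def add(self, text: str, topic: str = "default") -> None:
        # Real Engram runs this through the extract → transform →
        # commit pipeline asynchronously; the stub just appends.
        self._store.append({"text": text, "topic": topic})

    def search(self, query: str, limit: int = 5) -> list[dict]:
        # Real Engram does semantic search; the stub keyword-matches.
        hits = [m for m in self._store
                if any(w in m["text"].lower() for w in query.lower().split())]
        return hits[:limit]

class Client:
    def __init__(self):
        self.memories = Memories()

client = Client()
client.memories.add("User prefers filtering by genre", topic="preferences")
print(client.memories.search("genre"))
```

The small surface is the design choice: ingestion complexity lives in the pipeline, not in the caller's code.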

Why it matters

Most agentic apps treat memory as storage: append every turn to a vector index, retrieve top-k at inference, hope for the best. That works in demos. In production it breaks down — the agent surfaces contradictory facts from three months apart, quotes preferences the user has since changed, or drowns in noise from irrelevant chat history.

Weaviate's argument is that memory is infrastructure, not a feature bolt-on. Real memory systems need write control, deduplication, reconciliation, amendment, and purposeful forgetting. Files like MEMORY.md give you zero-latency always-in-context recall, but cap out around 200 lines of stable facts. Engram picks up where files stop: reasoning chains, rejected alternatives, decisions that evolved across sessions.

Technical facts

Component | What it does
--------- | ------------
Extract step | Handles conversation (role/content), plain string events, or pre-extracted memories
Transform step | Semantic search over existing memories + LLM decision: dedupe, reconcile, update, or keep
Commit step | Persists final state; intermediate values never leak into retrieval
Topic scopes | Project-wide, user-scoped (multi-tenant), property-scoped (key/value soft isolation)
Bounded topics | Cap one memory per scope — good for user profiles and preference records
Buffer triggers | Message count, topic type, 24h interval, idle timer
Interface | REST + Python client, templates for personalization & continual learning
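The buffer triggers can be modeled with a few lines of Python. This toy covers two of the four triggers (message count and idle timeout); the real service also flushes on topic and a 24-hour interval, and the thresholds below are illustrative defaults, not Engram's.

```python
import time

class Buffer:
    """Toy aggregation buffer with count and idle-timeout flush triggers."""

    def __init__(self, max_messages: int = 5, idle_seconds: float = 300.0):
        self.max_messages = max_messages
        self.idle_seconds = idle_seconds
        self.items = []
        self.last_add = None

    def add(self, item) -> None:
        self.items.append(item)
        self.last_add = time.monotonic()

    def should_flush(self) -> bool:
        # Trigger 1: enough messages have accumulated.
        if len(self.items) >= self.max_messages:
            return True
        # Trigger 2: the buffer has sat idle past the timeout.
        if self.items and self.last_add is not None and \
                time.monotonic() - self.last_add >= self.idle_seconds:
            return True
        return False

    def flush(self) -> list:
        batch, self.items = self.items, []
        return batch
```

Debouncing noisy inputs falls out of this shape for free: rapid-fire messages accumulate until a trigger fires, and only the flushed batch enters the pipeline.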

Comparison

Capability | Naive vector store | File-based (MEMORY.md) | Engram
---------- | ------------------ | ---------------------- | ------
Semantic search | Yes | No | Yes
Deduplication | No | Manual | Automatic
Contradiction reconciliation | No | Manual | LLM-driven
Fact updates over time | No | Manual edit | Rewrite on commit
Purposeful forgetting | No | Manual delete | Built-in
Multi-tenant isolation | DIY | N/A | Native
Read latency | ~tool call | Zero | ~tool call

Use cases

Personalization with evolving facts. A user previously told the agent they are an ML engineer working from home. Next week: “I just got promoted to CEO!” Engram extracts “User has been promoted to CEO”, retrieves the two related memories, and lets the LLM rewrite: memory_1 becomes “User used to work as an ML engineer, but has now been promoted to CEO”; memory_2 (works from home) stays untouched; the new extraction is deleted to prevent duplication.
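The promotion scenario reduces to a small commit diff. In Engram the rewritten text would come from the LLM in the transform step; here it is hard-coded purely to show the commit outcome: one memory rewritten, one untouched, and the new extraction dropped rather than stored as a duplicate.

```python
# State before the new message arrives.
memories = {
    "memory_1": "User works as an ML engineer",
    "memory_2": "User works from home",
}
new_extraction = "User has been promoted to CEO"

# Simulated LLM decision: update memory_1, leave memory_2 alone,
# and discard new_extraction so no duplicate fact is committed.
memories["memory_1"] = ("User used to work as an ML engineer, "
                        "but has now been promoted to CEO")
committed = memories  # new_extraction is intentionally not stored

print(committed)
```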

Multi-agent continual learning. A main agent talks to the user while a search subagent runs queries. User feedback like “I prefer filtering by genre over text search” gets extracted across separate context windows, buffered, and combined into one reusable experience memory — available to both agents next run.

Decision archaeology. Weaviate's own PM tested Engram across two weeks of Claude Code sessions spanning product strategy, spec writing, and design. Pulling up prior product-vision work felt, in the PM's words, “like picking up a conversation with someone who had actually been there” — about 30% faster than reconstructing from notes.

Limitations & pricing

  • Silent failures: Claude sometimes ignored Engram on planning tasks even with explicit retrieval instructions — no signal that context was skipped.
  • ~10% session overhead in Weaviate internal tests, with 19-second startup costs in some flows.
  • Team vs personal scopes are not yet cleanly distinguished — a flagged gap.
  • Cold starts from existing docs are harder than incremental capture — bootstrapping a filled memory is a known rough edge.
  • Hallucination risk remains: without enough grounded context, agents still fabricate plausible details.
  • Pricing: not published. Preview access is signup-only; no GA date or SLA yet.

What is next

Weaviate is pushing Engram as infrastructure — “memory you maintain, not memory you store” — with continual-learning and multi-agent patterns as the highlighted next frontier. Expect more templates, clearer team/personal scoping, and a GA announcement down the line. For anyone shipping agents today: the thesis is simple — if your memory layer cannot deduplicate, reconcile, update, and forget, you do not have memory, you have a growing pile of chat logs with a search bar.

Sources: Engram Deep Dive, Oh Memories, Where did You Go, The Limit in the Loop.