Engram · Give your AI a brain that remembers | open-source long-term memory engine

Why it matters

AI forgets. And "remember everything" is a trap.

Every new session, the AI forgets who you are. The naive fix is to replay the whole history, but that is slow and expensive, and the "lost in the middle" effect makes the model misread what it was given.

🐢 The naive way: replay all history

50 sessions ≈ 79k tokens, re-read every time
Slow and expensive; tokens grow without bound
Too much context → "lost in the middle" → wrong answers
Only 73.2% accurate

⚡ Engram: retrieve only the relevant slice

Distills structured memory; retrieves precisely at query time
Only ~9.6k tokens (about 8× leaner)
Drops the noise, keeps the signal → more focused
83.6% accurate, and it holds up as history grows

How it works · dual-process, brain-inspired

Writes never block; structuring happens in the background

Two paths, like the brain: System-1 jots things down as you chat (<50ms, never blocks you); System-2, like sleep, distills memory, builds the graph, and resolves contradictions in the background.

💬 You chat (<50ms) · raw turn stored losslessly (timestamped)

→

⚗️ Distill facts · atomic facts (subject·predicate·object)

→

🕸️ Bi-temporal graph · entities + relations; resolve conflicts, never overwrite

→

🎯 Precise recall (<100ms) · question → relevant slice → answer
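
The split between the fast write path and the background path can be sketched roughly as follows. This is a minimal illustration, not Engram's actual code: the class `DualProcessMemory`, the helper `extract_facts`, and the regex-based "extractor" are all hypothetical stand-ins, and a real System-2 would call an LLM and build a graph rather than match one sentence pattern.

```python
import queue
import re
import threading
import time


def extract_facts(text):
    """Toy stand-in for the background extractor: only understands 'I work at X'."""
    return [("user", "works_at", name.strip())
            for name in re.findall(r"I work at ([^.;,]+)", text)]


class DualProcessMemory:
    """Hypothetical illustration of the two paths; not Engram's real classes."""

    def __init__(self):
        self.raw_turns = []            # lossless, timestamped log of what was said
        self.facts = []                # atomic (subject, predicate, object) triples
        self._pending = queue.Queue()
        self._worker = threading.Thread(target=self._consolidate, daemon=True)
        self._worker.start()

    def write(self, text):
        """System-1: store the raw turn and return at once; no extraction here."""
        turn = {"text": text, "ts": time.time()}
        self.raw_turns.append(turn)
        self._pending.put(turn)

    def _consolidate(self):
        """System-2: drain the queue in the background and distill facts."""
        while True:
            turn = self._pending.get()
            if turn is None:           # sentinel pushed by close()
                return
            self.facts.extend(extract_facts(turn["text"]))

    def close(self):
        """Let the worker finish the pending turns, then stop it."""
        self._pending.put(None)
        self._worker.join()


memory = DualProcessMemory()
memory.write("I work at Tencent.")     # returns immediately
memory.close()
print(memory.facts)
```

The point of the design is that `write` does nothing slower than a list append and a queue put, so any expensive structuring stays off the chat path.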

Five memory types, each with its own job (like brain regions)

📔 Episodic · raw turns & events

🕸️ Semantic · facts + bi-temporal graph

👤 Profile · preferences / habits / identity

📐 Procedural · your rules & instructions

⚡ Working · the active slice for the current question

A real example · bi-temporal

Changed jobs? It neither confuses nor forgets

Most systems either cross wires or overwrite the old value. Engram marks the old fact "past" and the new one "current", so it answers "now" correctly and still remembers "then".

2023-05

"I work at Tencent."

past · works at Tencent · valid 2023-05 → 2024-03

2024-03

"I switched jobs. Now at Moonshot AI."

current · works at Moonshot AI · valid 2024-03 → now

Ask

"Where do I work now?" → Moonshot AI ✓

"Where did I work last year?" → Tencent ✓ (as of that time)

With provenance: every fact can answer "where did this come from?" and "what did it replace?", so the memory is auditable rather than a black box.
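
The job-change example can be reproduced with a small fact store. The sketch below is an assumption-laden illustration rather than Engram's schema: `Fact`, `FactStore`, `assert_fact`, and `as_of` are hypothetical names, `works_at` is treated as a single-valued slot, the day-of-month values are invented to make the dates concrete, and only valid time is queried (transaction time is recorded but not used).

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Fact:
    subject: str
    predicate: str
    obj: str
    valid_from: date                      # valid time: when it became true
    recorded_at: date                     # transaction time: when it was learned
    valid_to: Optional[date] = None       # None means "still current"
    source: str = ""                      # provenance: where it came from
    replaced_by: Optional["Fact"] = None  # provenance: what superseded it


class FactStore:
    """Hypothetical store that invalidates old facts instead of deleting them."""

    def __init__(self):
        self.facts = []

    def assert_fact(self, subject, predicate, obj, when, source):
        new = Fact(subject, predicate, obj, valid_from=when,
                   recorded_at=when, source=source)
        for old in self.facts:
            same_slot = (old.subject, old.predicate) == (subject, predicate)
            if same_slot and old.valid_to is None:
                old.valid_to = when       # close the interval, keep the record
                old.replaced_by = new
        self.facts.append(new)
        return new

    def as_of(self, subject, predicate, when):
        """Return the fact that was valid at `when`, or None."""
        for f in self.facts:
            if ((f.subject, f.predicate) == (subject, predicate)
                    and f.valid_from <= when
                    and (f.valid_to is None or when < f.valid_to)):
                return f
        return None


store = FactStore()
store.assert_fact("user", "works_at", "Tencent", date(2023, 5, 1),
                  source='"I work at Tencent."')
store.assert_fact("user", "works_at", "Moonshot AI", date(2024, 3, 1),
                  source='"I switched jobs. Now at Moonshot AI."')

print(store.as_of("user", "works_at", date(2024, 6, 1)).obj)   # current employer
print(store.as_of("user", "works_at", date(2023, 12, 1)).obj)  # employer back then
```

Because the old record is kept with a closed validity interval and a `replaced_by` link, both "now" and "then" questions resolve from the same store, and each answer can be traced to its source utterance.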

The core edge

1/8 the context, yet more accurate

Asked "where do I work now?", the naive way re-reads all 50 sessions; Engram pulls just the relevant facts plus a few raw snippets. Less noise, so the model stays focused.

🐢 Full-context

79k tokens · read it all

❌ lost in the middle · 73.2%

⚡ Engram lean retrieval

9.6k tokens · only the relevant slice

✓ sharper · 83.6% · cost stays flat as history grows
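
As a quick sanity check, the "8×" and "+10.4" figures follow directly from the four numbers quoted on this page. The snippet only redoes that arithmetic; the variable names are illustrative, and the per-session average assumes the 79k tokens are spread evenly over the 50 sessions.

```python
full_tokens = 79_000      # full-context prompt size quoted above
lean_tokens = 9_600       # Engram's retrieved context quoted above
full_acc, lean_acc = 73.2, 83.6
sessions = 50

ratio = full_tokens / lean_tokens          # how many times leaner
gain = lean_acc - full_acc                 # accuracy difference in points
tokens_per_session = full_tokens / sessions

print(f"{ratio:.1f}x fewer tokens, +{gain:.1f} points")
print(f"about {tokens_per_session:.0f} tokens added per replayed session")
```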

Real results · no cherry-picking

Beats the full-context baseline, under a strict judge

LongMemEval_S · 500 questions · graded by a standard, strict DeepSeek judge, so the number is a fair one, not the product of a lenient grader.

73.2 · Full-context · 79k tokens

83.6 · Engram · 9.6k tokens · 8× leaner

System	Overall	Avg tokens	Open source?
Engram (this project)	83.6%	9.6k	✅ AGPL-3.0
Full-context (same backbone)	73.2%	79k	n/a

Where it stands: Engram beats the full-context baseline by 10.4 points while using about 8× fewer tokens. It is fully open and reproducible, and its cost stays flat as history grows. It scores 83.6% on the 500-question benchmark under the official judge, with every per-question log published.

# Reproduce every number yourself:
git clone https://github.com/ly-wang19/engram && cd engram
pytest # 80+ unit tests, zero deps, no API key
python eval/bench.py --data s --limit 500 --systems engram_lean,full_context ...

Full architecture · every claim has code + tests

Not a single trick, but a whole memory engine

From write to consolidation to retrieval: layered, reproducible, cost stays flat as history grows; every capability has code and unit tests behind it.

Dual-process write / consolidate · System-1 hot write (no LLM, <50ms) + System-2 async extract & graph

Bi-temporal facts · valid-time + transaction-time, as-of queries, knowledge updates first-class

Non-destructive conflict resolution · slot + semantic + subsumption, LLM only when ambiguous; invalidate (not delete), full provenance

Hybrid retrieval · semantic + BM25 lexical + graph n-hop + recency/salience, fused with RRF (optional rerank)

Typed memory · episodic / semantic graph / profile / procedural / working, each with its own store & policy

Salience decay + Reflector · reinforce & forget, summary refresh, answer-verification loop

Everything pluggable: LLM / embedder / vector store / graph store sit behind interfaces with zero-dep offline fallbacks, so pytest passes with no API key and no external services.
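
The hybrid-retrieval step fuses several ranked lists with RRF (reciprocal rank fusion). A minimal sketch of that fusion is below. The function name `rrf_fuse`, the fact IDs, and the three example rankings are made up for illustration, and `k=60` is the constant commonly used with RRF, not a value confirmed for Engram.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: each list contributes 1 / (k + rank) per item."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID to keep the order stable.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical ranked results from three retrievers for one query.
semantic = ["f2", "f1", "f3"]
lexical = ["f1", "f4", "f2"]     # e.g. BM25
graph = ["f1", "f2"]             # e.g. n-hop neighbours

print(rrf_fuse([semantic, lexical, graph]))
```

RRF only looks at ranks, never raw scores, which is why it can combine retrievers whose scores are on unrelated scales.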

Three lines to use it · zero setup

How to call it

Pick any Bearer key; it becomes your private memory namespace. Or pip install and self-host, keeping all data on your own machine.

# Call the hosted API (any key = your isolated namespace):
B=http://42.193.220.197:8456 ; K=my-app
# Remember (auto-extracts facts; records in your input's language):
curl -X POST $B/v1/remember -H "Authorization: Bearer $K" -d '{"content":"I do backend at ByteDance; favorite singer is Jay Chou"}'
# Recall (lean context + an answer + token savings):
curl -X POST $B/v1/recall -H "Authorization: Bearer $K" -d '{"query":"who is my favorite singer?"}'
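
The same two calls can be made from Python with only the standard library. This is a sketch under assumptions: the paths `/v1/remember` and `/v1/recall`, the Bearer scheme, and the JSON bodies come from the curl examples above, while the `Content-Type` header, the JSON-decoded response, and the helper names `build_request` and `call` are additions not confirmed by this page.

```python
import json
import urllib.request

BASE_URL = "http://42.193.220.197:8456"   # hosted address from the curl example


def build_request(path, payload, key, base_url=BASE_URL):
    """Build a POST request carrying a JSON body and a Bearer key."""
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {key}",
            # Assumption: the curl example sends no explicit content type.
            "Content-Type": "application/json",
        },
    )


def call(path, payload, key):
    """Send the request; assumes the server answers with a JSON document."""
    request = build_request(path, payload, key)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage would mirror the curl lines: `call("/v1/remember", {"content": "I do backend at ByteDance; favorite singer is Jay Chou"}, "my-app")` followed by `call("/v1/recall", {"query": "who is my favorite singer?"}, "my-app")`.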

MCP (persistent memory for Claude Desktop / Cursor): pip install "engram-memory[mcp]" && python -m engram.mcp · SDK / OpenAI-compatible: point your base_url at the address above · full API in the repo's API.md.

▶ Open the console (demo key: 1) · ★ GitHub / docs