Open source · long-term memory engine

Give your AI a
brain that remembers

Like a human brain, it quietly distills facts as you chat, updates them over time without forgetting history,
and at question time pulls only the relevant slice, answering more accurately on 1/8 the context.

83.6%
LongMemEval_S accuracy
~8×
fewer tokens than full-context
5
typed memory stores
100%
open source · reproducible

Real numbers · strict DeepSeek judge · you can reproduce every figure yourself

Why it matters

AI forgets. And "remember everything" is a trap.

Every new session, the AI forgets who you are. The naive fix is to replay the whole history, but that is slow and expensive, and the "lost in the middle" effect makes the model misread what it is given.

🐢 The naive way: replay all history

  • 50 sessions ≈ 79k tokens, re-read every time
  • Slow and expensive; tokens grow without bound
  • Too much context → "lost in the middle" → wrong answers
  • Only 73.2% accurate

⚡ Engram: retrieve only the relevant slice

  • Distills structured memory; retrieves precisely at query time
  • Only ~9.6k tokens (8× leaner)
  • Drops the noise, keeps the signal → more focused
  • 83.6% accurate, and it holds up as history grows

How it works · dual-process, brain-inspired

Writes never block; structuring happens in the background

Two paths, like the brain: System-1 jots things down as you chat (<50ms, never blocks you); System-2, like sleep, distills memory, builds the graph, and resolves contradictions in the background.

<50ms
💬
You chat
raw turn stored losslessly (timestamped)
⚗️
Distill facts
atomic facts (subject · predicate · object)
🕸️
Bi-temporal graph
entities + relations; resolve conflicts, never overwrite
<100ms
🎯
Precise recall
question → relevant slice → answer
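The split between a fast write path and background structuring can be sketched as follows. This is a minimal illustration, not Engram's actual code: `MemoryEngine`, `remember`, `consolidate_forever`, and the rule-based `extract_fact` (standing in for LLM extraction) are all hypothetical names.

```python
import queue
import threading
import time
from dataclasses import dataclass, field


def extract_fact(text: str) -> tuple:
    # Hypothetical stand-in for LLM extraction: "X works at Y." -> triple.
    subject, _, obj = text.partition(" works at ")
    return (subject.strip(), "works_at", obj.strip(" ."))


@dataclass
class MemoryEngine:
    """Toy dual-process store: fast append now, structuring later."""
    raw_turns: list = field(default_factory=list)   # System-1 output
    facts: list = field(default_factory=list)       # System-2 output
    _pending: queue.Queue = field(default_factory=queue.Queue)

    def remember(self, text: str) -> None:
        # System-1: store the raw turn with a timestamp and return at once.
        turn = {"text": text, "ts": time.time()}
        self.raw_turns.append(turn)
        self._pending.put(turn)

    def consolidate_forever(self) -> None:
        # System-2: drain the queue off the caller's thread.
        while True:
            turn = self._pending.get()
            if turn is None:          # sentinel: stop the worker
                break
            self.facts.append(extract_fact(turn["text"]))


engine = MemoryEngine()
worker = threading.Thread(target=engine.consolidate_forever, daemon=True)
worker.start()
engine.remember("Alice works at Tencent.")   # returns without waiting
engine._pending.put(None)                    # ask the worker to finish
worker.join()
print(engine.facts)
```

The caller only pays for a list append and a queue put; extraction cost lands on the worker thread.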

Five memory types, each with its own job (like brain regions)

📔
Episodic
raw turns & events
🕸️
Semantic
facts + bi-temporal graph
👤
Profile
preferences / habits / identity
📐
Procedural
your rules & instructions
Working
the active slice for the current question

A real example · bi-temporal

Changed jobs? It neither confuses nor forgets

Most systems either cross wires or overwrite the old value. Engram marks the old fact "past" and the new one "current", so it answers "now" correctly and still remembers "then".

2023-05
"I work at Tencent."
past · works at Tencent · valid 2023-05 → 2024-03
2024-03
"I switched jobs; I'm now at Moonshot AI."
current · works at Moonshot AI · valid 2024-03 → now
Ask
"Where do I work now?" → Moonshot AI
"Where did I work last year?" → Tencent ✓ (as-of that time)

With provenance: every fact answers "where did this come from?" and "what did it replace?", so the store is auditable, not a black box.
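The job-change example can be reproduced with a small validity-interval store. This is a sketch under stated assumptions: `Fact`, `FactStore`, `assert_fact`, and `as_of` are hypothetical names, and only valid time is modelled here (a full bi-temporal store would also record transaction time, i.e. when each fact was written).

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Fact:
    subject: str
    predicate: str
    obj: str
    valid_from: date
    valid_to: Optional[date] = None   # None means "still current"


class FactStore:
    def __init__(self) -> None:
        self.facts: list[Fact] = []

    def assert_fact(self, subject: str, predicate: str, obj: str, when: date) -> None:
        # Close the open interval of the current fact in the same slot
        # instead of deleting it, then append the new fact.
        for f in self.facts:
            if (f.subject, f.predicate) == (subject, predicate) and f.valid_to is None:
                f.valid_to = when
        self.facts.append(Fact(subject, predicate, obj, when))

    def as_of(self, subject: str, predicate: str, when: date) -> Optional[str]:
        # Return the value whose validity interval contains `when`.
        for f in self.facts:
            if (f.subject, f.predicate) != (subject, predicate):
                continue
            if f.valid_from <= when and (f.valid_to is None or when < f.valid_to):
                return f.obj
        return None


store = FactStore()
store.assert_fact("user", "works_at", "Tencent", date(2023, 5, 1))
store.assert_fact("user", "works_at", "Moonshot AI", date(2024, 3, 1))
print(store.as_of("user", "works_at", date(2024, 6, 1)))   # Moonshot AI
print(store.as_of("user", "works_at", date(2023, 9, 1)))   # Tencent
```

Because the old fact is invalidated rather than removed, both the "now" and the "last year" questions resolve from the same store.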

The core edge

1/8 the context, yet more accurate

Asked "where do I work now?", the naive way re-reads all 50 sessions; Engram pulls just the relevant facts plus a few raw snippets. Less noise, sharper model.

🐢 Full-context

79k tokens · read it all
❌ lost in the middle · 73.2%

⚡ Engram lean retrieval

9.6k tokens · only the relevant slice
✓ sharper · 83.6% · cost stays flat as history grows

Real results · no cherry-picking

Beats the full-context baseline, under a strict judge

LongMemEval_S · 500 questions · graded by a standard, strict DeepSeek judge: a fair number, not a friendly one.

Full-context: 73.2 · 79k tokens
Engram: 83.6 · 9.6k tokens · 8× leaner

System | Overall | Avg tokens | Open source?
Engram (this project) | 83.6% | 9.6k | ✅ AGPL-3.0
Full-context (same backbone) | 73.2% | 79k | -

Where it stands: Engram beats full-context by +10.4 points at ~8× fewer tokens. It is fully open and reproducible, and cost stays flat as history grows. It scores 83.6% on the 500-question set under the official judge, with every per-question log published.
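The two headline figures follow directly from the reported averages. A quick check of the arithmetic, using only the numbers quoted above:

```python
full_tokens, engram_tokens = 79_000, 9_600   # average tokens per question
full_acc, engram_acc = 73.2, 83.6            # accuracy in percent

ratio = full_tokens / engram_tokens          # how many times leaner
gain = engram_acc - full_acc                 # percentage-point difference

summary = f"{ratio:.1f}x fewer tokens, {gain:+.1f} points"
print(summary)   # 8.2x fewer tokens, +10.4 points
```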

# Reproduce every number yourself:
git clone https://github.com/ly-wang19/engram && cd engram
pytest  # 80+ unit tests, zero deps, no API key
python eval/bench.py --data s --limit 500 --systems engram_lean,full_context ...

Full architecture · every claim has code + tests

Not a single trick: a whole memory engine

From write to consolidation to retrieval: layered, reproducible, with cost that stays flat as history grows. Every capability has code and unit tests behind it.

Dual-process write / consolidate · System-1 hot write (no LLM, <50ms) + System-2 async extract & graph
Bi-temporal facts · valid-time + transaction-time, as-of queries, knowledge-updates first-class
Non-destructive conflict resolution · slot + semantic + subsumption, LLM only when ambiguous; invalidate (not delete), full provenance
Hybrid retrieval · semantic + BM25 lexical + graph n-hop + recency/salience, fused with RRF (optional rerank)
Typed memory · episodic / semantic graph / profile / procedural / working, each with its own store & policy
Salience decay + Reflector · reinforce & forget, summary refresh, answer-verification loop
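The hybrid-retrieval item fuses several ranked lists with RRF (reciprocal rank fusion). A minimal sketch of that fusion step is below; the function name `rrf_fuse`, the fact ids, and the constant `k = 60` (a commonly used default, not a value confirmed for Engram) are assumptions for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked id lists: each list contributes 1 / (k + rank) per id."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical per-retriever results for one query.
semantic = ["f2", "f1", "f3"]
lexical = ["f1", "f2", "f4"]
graph = ["f1", "f5"]

fused = rrf_fuse([semantic, lexical, graph])
print(fused)   # ['f1', 'f2', 'f5', 'f3', 'f4']
```

RRF only needs ranks, not scores, which is why it can combine retrievers whose raw scores are not comparable (cosine similarity, BM25, graph distance).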

Everything pluggable: LLM / embedder / vector store / graph store sit behind interfaces with zero-dep offline fallbacks, so pytest passes green with no key and no services.
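One way to picture "behind interfaces with offline fallbacks" is a structural interface plus a dependency-free default. The names below (`Embedder`, `HashEmbedder`, `pick_embedder`) are hypothetical and do not reflect Engram's real class names; the hashing trick is just a simple stand-in for an embedding model.

```python
import zlib
from typing import Optional, Protocol


class Embedder(Protocol):
    def embed(self, text: str) -> list[float]: ...


class HashEmbedder:
    """Offline fallback: deterministic bag-of-words hashing, no model or key."""

    def __init__(self, dim: int = 64) -> None:
        self.dim = dim

    def embed(self, text: str) -> list[float]:
        vec = [0.0] * self.dim
        for token in text.lower().split():
            # Each token increments one bucket chosen by its CRC32 hash.
            vec[zlib.crc32(token.encode("utf-8")) % self.dim] += 1.0
        return vec


def pick_embedder(remote: Optional[Embedder] = None) -> Embedder:
    # Use the configured backend when there is one, else the offline fallback.
    return remote if remote is not None else HashEmbedder()


embedder = pick_embedder()
vector = embedder.embed("favorite singer is Jay Chou")
print(len(vector), sum(vector))
```

Code that depends only on `Embedder` runs unchanged in tests with the fallback and in production with a real backend.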

Three lines to use it · zero setup

How to call it

Pick any Bearer key: that's your private memory namespace. Or pip install and self-host, with data entirely on your own machine.

# Call the hosted API (any key = your isolated namespace):
B=http://42.193.220.197:8456 ; K=my-app
# Remember (auto-extracts facts; records in your input's language):
curl -X POST $B/v1/remember -H "Authorization: Bearer $K" -d '{"content":"I do backend at ByteDance; favorite singer is Jay Chou"}'
# Recall (lean context + an answer + token savings):
curl -X POST $B/v1/recall -H "Authorization: Bearer $K" -d '{"query":"who is my favorite singer?"}'
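The same two calls can be made from Python with only the standard library. This is a sketch mirroring the curl commands above: the helper names `build_request`, `call`, and `main` are hypothetical, the `Content-Type` header is an assumption (the curl examples do not set it), and the response is returned as raw text because its schema is not shown here.

```python
import json
import urllib.request

BASE_URL = "http://42.193.220.197:8456"   # hosted endpoint from the curl example
API_KEY = "my-app"                        # any string; acts as your namespace


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build a POST request with the Bearer key and a JSON body."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",   # assumption, see lead-in
        },
        method="POST",
    )


def call(path: str, payload: dict) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(build_request(path, payload), timeout=30) as resp:
        return resp.read().decode("utf-8")


def main() -> None:
    # Performs real network calls; run it only when the service is reachable.
    print(call("/v1/remember", {"content": "I do backend at ByteDance; favorite singer is Jay Chou"}))
    print(call("/v1/recall", {"query": "who is my favorite singer?"}))
```

Call `main()` to run the remember-then-recall round trip; importing the module alone sends nothing.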

MCP (persistent memory for Claude Desktop / Cursor): pip install "engram-memory[mcp]" && python -m engram.mcp  ·  SDK / OpenAI-compatible: point your base_url at the address above  ·  full API in the repo's API.md.