Agent engineer · verification systems

I build the trust layer for AI agents.

The demo is not the finish line. I build replay harnesses, black-box ledgers, eval gates, and proof receipts that make autonomous agents dependable after the human stops watching.

Inspect work Trust Report v0

agent.run: attested · trace · eval · receipt · proof: PASS

Scroll-driven verification

Trust is a sequence, not a slogan.

The page moves like the system I build: a run starts as a claim, passes through independent records and replay, then exits as a signed verdict.

01 Demo works: The agent completes a real task.
02 Trace recorded: Every event lands in an append-only ledger.
03 Dual witness: The agent's receipt meets an independent black box.
04 Eval gate passed: Claims are replayed instead of trusted.
05 Proof receipt signed: One verdict format ships to CI.
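
The dual-witness step above can be sketched as a small cross-check between what the agent claims and what an independent recorder saw. This is an illustration only, not the actual trace-vault or NightWatch implementation: the `Event` shape, the `cross_check` function, and the verdict fields are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One recorded action (hypothetical shape: step index, tool name, result digest)."""
    step: int
    tool: str
    result_digest: str


def cross_check(agent_receipt: list[Event], black_box: list[Event]) -> dict:
    """Compare what the agent claims against what the independent recorder saw."""
    claimed = {e.step: e for e in agent_receipt}
    witnessed = {e.step: e for e in black_box}
    mismatches = []
    # Walk the union of steps so that missing entries on either side also count.
    for step in sorted(claimed.keys() | witnessed.keys()):
        if claimed.get(step) != witnessed.get(step):
            mismatches.append(step)
    return {"verdict": "PASS" if not mismatches else "FAIL", "mismatched_steps": mismatches}


honest = [Event(1, "read_file", "a1"), Event(2, "run_tests", "b2")]
# The agent reports a different result for step 2 than the black box recorded.
embellished = [Event(1, "read_file", "a1"), Event(2, "run_tests", "ffff")]

print(cross_check(honest, honest))
print(cross_check(embellished, honest))
```

The point of the second witness is that a mismatch is detectable without asking the agent to explain itself.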

trust-report

$ trace-vault replay agent-run.json
· reading ledger ........ idle
· witness cross-check ... idle
· eval gate ............. idle
· proof receipt ......... pending

sha256 0000-0000

One ecosystem

Not isolated projects. One agent trust stack.

RUN · Alfred: machine-gated autonomy, signed receipts

RECORD · NightWatch: independent black box, claims re-verified

GATE · trace-vault: offline replay, determinism ≠ faithfulness

Two contracts make them one system: Claude Code-compatible hooks, and one Agent Trust Report v0 verdict that CI can read.
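
As a rough illustration of the second contract, a CI step could read a verdict file and turn it into an exit code. The report shape below (`format`, `gates`, `verdict`, `reason`) is a made-up stand-in, since the actual Agent Trust Report v0 fields are not given here, and `ci_exit_code` is a hypothetical helper.

```python
import json

# Hypothetical report shape; the real Agent Trust Report v0 fields may differ.
SAMPLE_REPORT = json.dumps({
    "format": "agent-trust-report/v0",
    "gates": [
        {"name": "run", "verdict": "PASS"},
        {"name": "record", "verdict": "PASS"},
        {"name": "gate", "verdict": "FAIL", "reason": "replay diverged"},
    ],
})


def ci_exit_code(report_json: str) -> int:
    """Return 0 only if every gate in the report passed, else 1."""
    report = json.loads(report_json)
    gates = report.get("gates", [])
    if not gates:
        return 1  # No evidence is treated as a failure, not a pass.
    failed = [g["name"] for g in gates if g.get("verdict") != "PASS"]
    for name in failed:
        print(f"gate failed: {name}")
    return 1 if failed else 0


print(ci_exit_code(SAMPLE_REPORT))
```

Treating an empty report as a failure is a design choice in this sketch: a gate that emitted nothing should not read as a pass.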

Specimen cards

Systems that leave evidence.

View all repositories ↗

live GATE

trace-vault

Snapshot testing & replay for AI agents — CI for non-deterministic systems.

Core insight: determinism ≠ faithfulness. A reproducible run still has to prove it did the right thing.
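
A toy example makes the distinction concrete. The sketch below assumes a stand-in `run_agent` with a deliberate bug; none of these names come from trace-vault itself. The run produces the same snapshot digest every time, yet it still fails an independent check of what the task required.

```python
import hashlib
import json


def run_agent(task: dict) -> dict:
    """Stand-in for a replayed agent run: deterministic, but buggy (it ignores 'b')."""
    return {"answer": task["a"]}  # Should be task["a"] + task["b"].


def snapshot_digest(output: dict) -> str:
    """Stable digest of an output, used to detect drift between replays."""
    canonical = json.dumps(output, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_deterministic(task: dict, runs: int = 3) -> bool:
    """True if repeated runs all produce the same snapshot digest."""
    digests = {snapshot_digest(run_agent(task)) for _ in range(runs)}
    return len(digests) == 1


def is_faithful(task: dict) -> bool:
    """Independent check of the result against what the task actually requires."""
    return run_agent(task)["answer"] == task["a"] + task["b"]


task = {"a": 2, "b": 3}
print(is_deterministic(task), is_faithful(task))  # True False
```

A snapshot test alone would stay green here forever; only the separate faithfulness check catches the wrong answer.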

Live demo ↗ GitHub ↗

npm RUN

Alfred

A verifiable autonomous coding agent in your terminal.

“Done” is a machine-enforced verify gate; every hands-off run leaves a signed, replayable ledger.
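
One way to picture a machine-enforced gate: the run counts as done only when a verification command exits with status 0. This is a minimal sketch, not Alfred's actual mechanism; `verified_done` and the stand-in commands are hypothetical.

```python
import subprocess
import sys


def verified_done(verify_cmd: list[str], timeout_s: float = 60.0) -> bool:
    """Accept "done" only if the verification command exits with status 0."""
    try:
        result = subprocess.run(verify_cmd, capture_output=True, timeout=timeout_s)
    except (subprocess.TimeoutExpired, OSError):
        return False  # A gate that cannot run is a gate that did not pass.
    return result.returncode == 0


# Stand-in verify commands; a real gate might run a test suite or a linter.
passing = [sys.executable, "-c", "raise SystemExit(0)"]
failing = [sys.executable, "-c", "raise SystemExit(1)"]

print(verified_done(passing), verified_done(failing))
```

The agent's own statement that it is finished never enters the decision; only the exit status does.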

Docs ↗ npm ↗ GitHub ↗

site RECORD

nightwatch

The black box recorder for overnight AI agents.

Every session event lands in a hash-chained append-only ledger, with morning debriefs that verify the agent’s claims.
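
A hash-chained append-only ledger can be sketched in a few lines. The entry layout and function names below are assumptions for illustration, not nightwatch's actual on-disk format: each entry stores the hash of the previous one, so rewriting history invalidates every later link.

```python
import hashlib
import json

GENESIS = "0" * 64  # Placeholder "previous hash" for the first entry.


def _entry_hash(prev_hash: str, event: dict) -> str:
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(ledger: list[dict], event: dict) -> None:
    """Append an event, binding it to the hash of the previous entry."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({"prev": prev_hash, "event": event, "hash": _entry_hash(prev_hash, event)})


def verify(ledger: list[dict]) -> bool:
    """Recompute every link; an edit or a removal before the last entry breaks the chain."""
    prev_hash = GENESIS
    for entry in ledger:
        if entry["prev"] != prev_hash or entry["hash"] != _entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


ledger: list[dict] = []
append(ledger, {"type": "tool_call", "name": "run_tests"})
append(ledger, {"type": "claim", "text": "all tests pass"})
print(verify(ledger))  # True

ledger[0]["event"]["name"] = "skip_tests"  # Tamper with history.
print(verify(ledger))  # False
```

Note that this sketch does not detect truncation of the tail; catching that would need the latest hash to be stored or signed somewhere outside the ledger.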

Site ↗ npm ↗ GitHub ↗

wip PROVE

provenant

Glass-box bill auditing with HMAC-signed proof receipts.

A provenance lens for cost, usage, and claims — built around receipts that can be checked without trusting the narrator.
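
For readers unfamiliar with HMAC-signed receipts, here is a minimal sketch using Python's standard `hmac` module. The receipt fields, the key, and the function names are made up for illustration and are not provenant's actual format.

```python
import hashlib
import hmac
import json


def sign_receipt(receipt: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over a canonical JSON encoding of the receipt."""
    canonical = json.dumps(receipt, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, canonical, hashlib.sha256).hexdigest()


def verify_receipt(receipt: dict, tag: str, key: bytes) -> bool:
    """Constant-time comparison avoids leaking how much of the tag matched."""
    return hmac.compare_digest(sign_receipt(receipt, key), tag)


key = b"demo-key-not-for-production"  # Hypothetical; a real key would come from a secret store.
receipt = {"run_id": "example-run", "tokens": 1200, "cost_usd": "0.42"}  # Made-up fields.
tag = sign_receipt(receipt, key)

print(verify_receipt(receipt, tag, key))                    # True
print(verify_receipt({**receipt, "tokens": 12}, tag, key))  # False
```

HMAC is symmetric, so whoever can verify a receipt can also forge one; checking receipts without sharing the signing key would call for an asymmetric signature scheme instead.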

GitHub ↗ Spec ↗

About

Hi, I'm Beamus.

I'm an agent engineer focused on what happens after the demo works: making AI agents reliable, verifiable, and trustworthy enough to run without a human watching.

My bet for the loop-engineering era: when models write the code, the engineering that still matters is the loop around them — memory, replay, eval, provenance, and receipts.

Open to Agent Engineer & AI Engineer roles

Email ↗ GitHub ↗

Writing & guides

Notes from the loop.

Agent Trust Report v0: the spec. One verdict format that three gates emit, plus a real dual-witness run with raw ledgers committed. SPEC ↗

RAG, from zero to production. A Chinese-language learning path: LangChain basics → GraphRAG → Agentic RAG → enterprise. GUIDE ↗

Mastra for TypeScript developers. Build the mental model first, then Memory, RAG, MCP, eval, and deploy. ZH ↗