Agent Engineer

I build the trust layer for AI agents.

As agents move from demos to unsupervised work, the bottleneck shifts from "can it do the task?" to "can you trust it to?" I build the reliability, verification, and provenance infrastructure that makes autonomous agents safe to ship.

No live demo here: these are recorded runs, replayed. Logs are claims; replays are proofs.

The through-line

One bet, three angles.

Software taught us that capability comes first; testing, monitoring, and CI then become mandatory infrastructure. AI agents are walking the same path. My work sits in that emerging trust layer: the picks and shovels of the agent economy.

01  Reliability · 02  Verification · 03  Provenance

Selected work

Projects

trace-vault

LIVE
Snapshot testing and replay for AI agents: CI for non-deterministic systems.
Records real agent runs and replays them to catch silent regressions when a prompt or model changes. Built on one core insight: determinism ≠ faithfulness. A reproducible run isn't necessarily a correct one.
↳ Field-tested on mastra#17737: CI diagnosis adopted upstream, credited in the commit.
record / replay · regression · eval
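
To make the record/replay idea concrete, here is a minimal sketch of a step-by-step comparison between a recorded run and a fresh one. The `Step`, `Recording`, and `diffRuns` names are hypothetical, chosen for illustration; they are not trace-vault's actual API.

```typescript
// Hypothetical shapes for illustration; not trace-vault's actual API.
type Step = { tool: string; input: string; output: string };
type Recording = { id: string; steps: Step[] };
type Diff = { index: number; expected?: Step; actual?: Step };

// Compare a fresh run against a recorded one, step by step.
// Any missing, extra, or changed step is reported as a diff.
function diffRuns(recorded: Recording, fresh: Recording): Diff[] {
  const diffs: Diff[] = [];
  const n = Math.max(recorded.steps.length, fresh.steps.length);
  for (let i = 0; i < n; i++) {
    const expected = recorded.steps[i];
    const actual = fresh.steps[i];
    const same =
      expected !== undefined &&
      actual !== undefined &&
      expected.tool === actual.tool &&
      expected.input === actual.input &&
      expected.output === actual.output;
    if (!same) diffs.push({ index: i, expected, actual });
  }
  return diffs;
}
```

An empty diff only shows that the run was reproduced. Whether the recorded behaviour was right in the first place still needs a separate check, which is the determinism ≠ faithfulness point above.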

alfred

OPEN SOURCE
A verifiable autonomous coding agent in your terminal.
A TypeScript / Bun CLI agent where the harness itself is executable: "done" is a machine-enforced verify gate, memory is agent-curated but inspectable, and every hands-off run leaves a signed, replayable ledger. 773 tests, strict tsc clean.
TypeScript · Bun · done-gates · signed ledger
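
A machine-enforced "done" can be sketched as a gate that ignores the agent's own claim until every check passes. This is an illustrative sketch only: `Check`, `GateResult`, and `verifyGate` are hypothetical names, and the checks are plain synchronous functions here, where a real harness would presumably run things like the test suite or the type checker.

```typescript
// Hypothetical shapes for illustration; not alfred's actual gate.
type Check = { name: string; run: () => boolean };
type GateResult = { done: boolean; failed: string[] };

// The agent's claim is necessary but never sufficient:
// "done" is granted only when every check passes.
function verifyGate(agentClaimsDone: boolean, checks: Check[]): GateResult {
  if (!agentClaimsDone) return { done: false, failed: [] };
  const failed = checks.filter((c) => !c.run()).map((c) => c.name);
  return { done: failed.length === 0, failed };
}
```

Returning the names of the failed checks, rather than a bare boolean, gives the agent something specific to act on in its next turn.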

nightwatch

LIVE · npm
The black box recorder for overnight AI agents.
Built for the Fable-5 era of multi-day autonomous runs: every session event lands in a hash-chained, append-only ledger with worktree checkpoints, and the morning debrief independently verifies the agent's claims instead of trusting its summary. 30-second demo: npm i -g nightwatch-agent
TypeScript · hash-chain ledger · checkpoints · morning debrief
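
The hash-chained, append-only ledger can be illustrated with a short sketch: each entry commits to the hash of the previous one, so editing any earlier event breaks every later link. The `Entry` shape, the `GENESIS` value, and the choice of SHA-256 are assumptions made for this example, not nightwatch's actual on-disk format.

```typescript
import { createHash } from "node:crypto";

// Hypothetical entry shape for illustration; not nightwatch's actual format.
type Entry = { seq: number; event: string; prevHash: string; hash: string };

// Assumed placeholder hash for the first entry's predecessor.
const GENESIS = "0".repeat(64);

// Hash covers the sequence number, the previous hash, and the event payload.
function hashEntry(seq: number, event: string, prevHash: string): string {
  return createHash("sha256").update(`${seq}\n${prevHash}\n${event}`).digest("hex");
}

// Append returns a new array; existing entries are never modified.
function append(ledger: readonly Entry[], event: string): Entry[] {
  const last = ledger[ledger.length - 1];
  const prevHash = last ? last.hash : GENESIS;
  const seq = ledger.length;
  return [...ledger, { seq, event, prevHash, hash: hashEntry(seq, event, prevHash) }];
}

// Recompute every link from the start; any edit, reorder, or gap fails.
function verify(ledger: readonly Entry[]): boolean {
  let prevHash = GENESIS;
  for (let i = 0; i < ledger.length; i++) {
    const e = ledger[i];
    if (!e || e.seq !== i || e.prevHash !== prevHash) return false;
    if (e.hash !== hashEntry(e.seq, e.event, e.prevHash)) return false;
    prevHash = e.hash;
  }
  return true;
}
```

A chain like this detects tampering but does not by itself prove who wrote the entries; that would need a signature or a key held outside the agent's reach.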

More: provenant (glass-box bill auditing, HMAC-signed proof receipts) · simp-skill ⭐240+ · RAG-learning

About

Hi, I'm Beamus.

I'm an agent engineer focused on what happens after the demo works: making AI agents reliable, verifiable, and trustworthy enough to run without a human watching. I like the unglamorous infrastructure (replay harnesses, proof receipts, eval gates) that turns a clever prototype into something you can actually depend on.

Open to Agent Engineer roles