Ken Osumi

Reproducing ESMC’s SAE features: I lost half a day to one wrong repo

Ken Osumi — Sun, 21 Jun 2026 15:00:00 GMT

What you’ll get from this

If you want to reproduce ESMC’s SAE features yourself, here are the snags and what was on the other side of them. By the end you’ll know:

- there are two similarly named SAEs, and following the README quietly lands you on the wrong one (and it’s hard to notice);
- what to suspect first when the numbers look off;
- the reproduction result itself: a model that only saw sequence had bundled residues that are close in 3D into a single feature.

One aside first. The only reason I can do this solo, with no funding and no big team, is that EvolutionaryScale and the Chan Zuckerberg Biohub put all of ESMC in the open: the weights, the SAEs, the ESM Atlas, the annotations, even training byproducts. Honestly, I’m just grateful for it.

Two SAEs with almost the same name

ESMC has two SAEs whose names look almost identical:

- the right one: biohub/ESMC-6B-sae-layer60-k64-codebook16384. The paper and the ESM Atlas use it, and it’s the only one that ships normalization stats.
- the mix-up: biohub/ESMC-6B-sae-k64-codebook16384 (an all-layer pack).

And the sample code in the esm README uses the second one. It’s just the first code you see.

It’s not really a trap, though — more like a byproduct. Grep the repo and the all-layer pack shows up in exactly one place: that README snippet. No tutorial, no script, not the paper. The actual feature tutorial uses -layer60-. My guess for why the all-layer pack exists at all: you need per-layer SAEs to work out which layer separates cleanest, and layer 60 (about 75% of the way up) won. They shipped the byproducts instead of throwing them out. It’s just them being generous.
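A cheap guard against this mix-up is to pin the repo ID in one place and fail fast if it isn’t the layer-60 one. The sketch below is plain string checking. The helper name check_sae_repo is made up for illustration, and it does not call the esm package or download anything.

```python
# Repo IDs as quoted above; the helper itself is hypothetical.
EXPECTED_SAE_REPO = "biohub/ESMC-6B-sae-layer60-k64-codebook16384"
ALL_LAYER_PACK = "biohub/ESMC-6B-sae-k64-codebook16384"


def check_sae_repo(repo_id: str) -> str:
    """Return repo_id unchanged if it is the layer-60 SAE, else raise."""
    if repo_id == ALL_LAYER_PACK:
        raise ValueError(
            f"{repo_id} is the all-layer pack from the README snippet; "
            f"the paper and the ESM Atlas use {EXPECTED_SAE_REPO}"
        )
    if repo_id != EXPECTED_SAE_REPO:
        raise ValueError(f"unexpected SAE repo: {repo_id}")
    return repo_id


# Call this once, right before whatever loader you use.
sae_repo = check_sae_repo(EXPECTED_SAE_REPO)
```

Because the two names differ only by the -layer60- segment, an exact string comparison is safer than eyeballing the ID in a config file.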

“Almost everything matches” is the dangerous part

The annoying thing about the mix-up is that the two SAEs share feature indices.

Most features match either way. F278 lines up at K275 (the P-loop), F4787 at A296. The values agree, so you assume you’re set up right. But a few were quietly different:

- F10351: should be 1.28 at I453, came out 0.
- F8957: should be 1.68 at H242, came out 0.
- F1635: peak should be at G409, showed up at G2.

If everything broke, you’d notice. The trouble is when most of it matches and one thing is off. That’s the hardest case to catch.
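A spot check that would have caught this early is to compare a handful of reference activations against the local ones and list only the disagreements. This is a minimal sketch: the two reference values are the ones listed above, while the dictionary layout, the function name find_mismatches, and the 0.05 tolerance are assumptions for illustration.

```python
# (feature, residue) -> reference activation, taken from the values above.
REFERENCE = {
    ("F10351", "I453"): 1.28,
    ("F8957", "H242"): 1.68,
}


def find_mismatches(local, reference=REFERENCE, tol=0.05):
    """Return reference entries whose local value differs by more than tol.

    A missing local entry is treated as 0.0, i.e. the feature did not fire.
    The tolerance is an arbitrary choice for this sketch.
    """
    bad = {}
    for key, expected in reference.items():
        got = local.get(key, 0.0)
        if abs(got - expected) > tol:
            bad[key] = (expected, got)
    return bad


# The values the wrong SAE produced at these two sites.
wrong_sae = {("F10351", "I453"): 0.0, ("F8957", "H242"): 0.0}
for (feature, site), (expected, got) in find_mismatches(wrong_sae).items():
    print(f"{feature} at {site}: expected {expected}, got {got}")
```

The point of returning only the mismatches is that a long list of agreeing features can’t drown out the one or two that are off. The F1635 case (a peak at the wrong residue) would need a separate check on the peak position rather than on a value.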

Four reasonable hypotheses, all wrong

Here’s the honest part. F10351 wouldn’t light up. I built four plausible hypotheses around that one point and knocked them down one by one:

1. A weak feature dropping out of the Top-K=64? The API raw value was 1.28. Not weak. No.
2. A normalization thing? The API raw and the local raw values already disagreed, before any normalization. No.
3. Missing fused kernels drifting the residual stream? I added xformers, forced fp32, and got as far as trying to rebuild transformer_engine. Nothing changed. No.
4. A Top-K rank-boundary flip? The local pre-activation was 0.39 at rank 297, against the 1.26 it needed to make the cut. Nowhere near the boundary. No.

All wrong. The cause was the SAE repo mix-up, a one-line fix. Everything I did around the kernels for the third hypothesis was wasted.
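The rank-boundary hypothesis is the easiest one to test directly. Assuming the usual Top-K rule (keep the k largest pre-activations at each position, zero the rest), you can ask for a feature’s rank and the cutoff it would have to beat. The sketch below uses plain Python lists and a toy vector; topk_report and survives_topk are illustrative names, not functions from the esm package.

```python
def topk_report(pre_acts, feature, k=64):
    """Return (rank, value, cutoff) for one feature under a Top-K rule.

    rank is 1-based among all pre-activations (largest value is rank 1);
    cutoff is the k-th largest pre-activation, the smallest value kept.
    """
    order = sorted(range(len(pre_acts)), key=lambda i: pre_acts[i], reverse=True)
    rank = order.index(feature) + 1
    cutoff = pre_acts[order[k - 1]]
    return rank, pre_acts[feature], cutoff


def survives_topk(pre_acts, feature, k=64):
    """True if the feature is among the k largest pre-activations."""
    rank, _, _ = topk_report(pre_acts, feature, k)
    return rank <= k


# Toy vector with 100 strictly decreasing values, not real SAE output.
toy = [100 - i for i in range(100)]
rank, value, cutoff = topk_report(toy, feature=70)
print(f"rank={rank} value={value} cutoff={cutoff}")
```

A feature sitting at rank 65 with a value just under the cutoff could plausibly flip with numerical noise. A feature at rank 297 with 0.39 against a needed 1.26 cannot, which is what ruled this hypothesis out.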

Honestly, partway through I could feel I was over-building a story. Floating point, kernels — a cause hidden deep down is a clean, technical-sounding explanation. But “explains cleanly” and “is actually the cause” are two different things.

What got me out: re-asking two questions

What got me out wasn’t a new experiment. I just backed up and re-asked two questions.

First: how does the official side actually run this SAE analysis — did I really look at the repo and the notebook? When I did, I found the real tutorial uses -layer60-.

Second: is the weight the API serves actually the paper’s model? It turned out there were two model checkpoints too: biohub/ESMC-6B (2026-05, the paper) and esmc-6b-2024-12 (the older 2024-12 version the API still serves). Treat the API as ground truth and you’ll decide you’re the one who’s off. I did, for a while.

The lesson is simple. When most things match and one thing is cleanly broken, check whether you’re looking at a different artifact, version, or repo before you dig into floating point. Don’t build a tidy mechanism story before you’ve isolated the cause.
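One way to make that check routine is to write down which artifacts a run used and diff that record against the reference setup before debugging anything else. This is a small sketch: the identifiers are the ones quoted in this post, but the dictionary layout and the name diff_manifests are my own convention, not something the esm package provides.

```python
def diff_manifests(mine, reference):
    """Return {key: (mine_value, reference_value)} for every differing key."""
    keys = sorted(set(mine) | set(reference))
    return {
        k: (mine.get(k), reference.get(k))
        for k in keys
        if mine.get(k) != reference.get(k)
    }


# The setup the paper uses, as described in this post.
paper_setup = {
    "model": "biohub/ESMC-6B",
    "sae": "biohub/ESMC-6B-sae-layer60-k64-codebook16384",
    "layer": 60,
    "top_k": 64,
}
# Same setup, but with the SAE repo from the README snippet.
readme_setup = dict(paper_setup, sae="biohub/ESMC-6B-sae-k64-codebook16384")

for key, (mine, ref) in diff_manifests(readme_setup, paper_setup).items():
    print(f"{key}: mine={mine!r} reference={ref!r}")
```

A one-line difference like this is easy to miss by eye and trivial to spot in a diff, and it costs far less than rebuilding kernels.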

What was on the other side

Once I swapped in the right layer-60 SAE, F10351 reproduced fine. And the feature the wrong SAE had been hiding turned out to be the most interesting one.

I scanned all 4,750 mutations across the Src kinase domain (270–519). The mutations that fully break F10351 (sensitivity ) are scattered far apart in sequence. The paper reports up to 79 residues away, median Cα 11 Å. My scan had mutations as far as 104 residues away still breaking it, and A374 (79 residues away in sequence, 12.8 Å in 3D) matched the paper’s example. The median 3D distance of the breaking positions is about 11 Å.

So: a model that only ever saw sequence had grouped residues that sit close together in 3D, and work together, into a single feature. Far apart on the chain, but folded up next to each other — and the feature just picked that up. With the wrong SAE, the whole thing was gone. Glad I caught it.
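The numbers in this section come from two simple computations. Positions 270 to 519 are 250 residues, and each has 19 possible substitutions, which gives 250 × 19 = 4,750 mutations. “Far in sequence, close in 3D” compares the residue-number gap with the Cα distance: for A374 against I453 the gap is 453 − 374 = 79. The sketch below shows both pieces with a placeholder sequence and made-up coordinates. It is not the scan code, and the function names are illustrative.

```python
import math
from statistics import median

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_mutants(seq, start, end):
    """Yield (pos, wt, mut) for every substitution at 1-based start..end."""
    for pos in range(start, end + 1):
        wt = seq[pos - 1]
        for mut in AMINO_ACIDS:
            if mut != wt:
                yield pos, wt, mut


def separations(anchor, positions, ca_xyz):
    """Sequence separation and C-alpha distance from anchor to each position.

    ca_xyz maps 1-based residue number -> (x, y, z) in angstroms.
    """
    return [
        (abs(p - anchor), math.dist(ca_xyz[anchor], ca_xyz[p]))
        for p in positions
    ]


# Placeholder sequence of sufficient length, NOT the real Src sequence.
toy_seq = AMINO_ACIDS * 26
print(sum(1 for _ in single_mutants(toy_seq, 270, 519)))  # 250 * 19 = 4750

# Made-up coordinates, only to show the two distance notions side by side.
toy_xyz = {453: (0.0, 0.0, 0.0), 374: (3.0, 4.0, 12.0), 400: (6.0, 8.0, 0.0)}
pairs = separations(453, [374, 400], toy_xyz)
print("max sequence separation:", max(s for s, _ in pairs))
print("median C-alpha distance:", median(d for _, d in pairs))
```

With real data, ca_xyz would come from the Cα atoms of the structure model and positions would be the residues whose mutations break the feature.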

Note

One honest note on the number. F278’s within-position mutational sensitivity correlates with measured kinase activity (Ahler 2019 DMS, ProteinGym SRC_HUMAN_Ahler_2019) at Spearman ρ = 0.692. The paper’s headline is a 108-feature ridge fit; my 0.692 is a single-feature, within-position correlation. That is a different measurement: I reproduced the figure’s behavior, not the exact statistic.
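For readers who want to try a similar statistic, here is a minimal standard-library sketch of a Spearman correlation (Pearson correlation of tie-averaged ranks). It is not the exact code behind the number above. In particular, center_within_position is only one possible reading of “within-position” (subtract each position’s mean before correlating), and that reading is an assumption.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for idx in order[i : j + 1]:
            ranks[idx] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def center_within_position(values, positions):
    """Subtract each position's mean so only within-position spread is left."""
    groups = {}
    for v, p in zip(values, positions):
        groups.setdefault(p, []).append(v)
    means = {p: mean(vs) for p, vs in groups.items()}
    return [v - means[p] for v, p in zip(values, positions)]
```

Under that reading, the statistic would be spearman(center_within_position(sensitivity, pos), center_within_position(activity, pos)), where sensitivity, activity, and pos are parallel lists with one entry per mutation.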

Setup

- Model: biohub/ESMC-6B (2026-05-19), layer 60. SAE: biohub/ESMC-6B-sae-layer60-k64-codebook16384 (2026-05-26), bf16, Top-K = 64.
- Compute: one H100 on Modal. I reused an existing Modal image with the ESMC weights, the esm package, and CUDA baked in, and only pulled the SAE weights at startup.
- Data: SRC_HUMAN_Ahler_2019 (ProteinGym). 3D distances from the AlphaFold model AF-P12931-F1.
- The API isn’t built for scans (100–500 s per call, connections drop under parallel calls, failed calls are still billed). A local GPU runs 4,750 mutations in minutes.

Reproduction code and data are coming soon: a public repo with modal run to regenerate every figure on your own GPU, plus precomputed feature activations so you can reproduce the plots without a GPU.

Wrapping up

The feature that cost me the most time and the feature I found most interesting were the same one — F10351. The one thing that was quietly broken turned out to be the one worth having.

I could only check this solo because the weights, the SAEs, and even the byproducts are all out in the open. Thanks again to EvolutionaryScale and the Chan Zuckerberg Biohub.
