Reproducing ESMC’s SAE features: I lost half a day to one wrong repo

reproduction

esmc

interpretability

Reproducing Figure S33 of the ESMC paper, solo. I picked the wrong one of two similarly-named SAEs, lost half a day, and came out the other side at the most interesting feature in the figure.

Author

Ken Osumi

Published

June 22, 2026

What you’ll get from this

If you want to reproduce ESMC’s SAE features yourself, here are the snags and what was on the other side of them. By the end you’ll know:

- there are two similarly-named SAEs, and following the README quietly lands you on the wrong one (and it’s hard to notice);
- what to suspect first when the numbers look off;
- the reproduction result itself: a model that only saw sequence had bundled residues that are close in 3D into a single feature.

One aside first. The only reason I can do this solo, with no funding and no big team, is that EvolutionaryScale and the Chan Zuckerberg Biohub put all of ESMC in the open: the weights, the SAEs, the ESM Atlas, the annotations, even training byproducts. Honestly, I’m just grateful for it.

Two SAEs with almost the same name

ESMC has two SAEs whose names look almost identical:

- the right one: biohub/ESMC-6B-sae-layer60-k64-codebook16384. The paper and the ESM Atlas use it, and it’s the only one that ships normalization stats.
- the mix-up: biohub/ESMC-6B-sae-k64-codebook16384 (an all-layer pack).

And the sample code in the esm README, the first code most people will see, uses the second one.

It’s not really a trap, though — more like a byproduct. Grep the repo and the all-layer pack shows up in exactly one place: that README snippet. No tutorial, no script, not the paper. The actual feature tutorial uses -layer60-. My guess for why the all-layer pack exists at all: you need per-layer SAEs to work out which layer separates cleanest, and layer 60 (about 75% of the way up) won. They shipped the byproducts instead of throwing them out. It’s just them being generous.

“Almost everything matches” is the dangerous part

The annoying thing about the mix-up is that the two SAEs share feature indices.

Most features match either way. F278 lines up at K275 (the P-loop), F4787 at A296. The values agree, so you assume you’re set up right. But a few were quietly different:

- F10351: should be 1.28 at I453, came out 0.
- F8957: should be 1.68 at H242, came out 0.
- F1635: peak should be at G409, showed up at G2.

If everything broke, you’d notice. The trouble is when most of it matches and one thing is off. That’s the hardest case to catch.
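A spot check on two or three features is exactly what let this slip through, so it helps to diff every reference peak you have. Below is a minimal sketch of such a check. The `Peak` tuple and `diff_peaks` function are hypothetical names made up for illustration, not part of the esm package, and a feature that "came out 0" is represented here as having no peak position.

```python
from typing import Dict, List, NamedTuple, Optional


class Peak(NamedTuple):
    position: Optional[str]  # residue label of the strongest activation, e.g. "I453"
    value: Optional[float]   # activation at that residue; None if it was not recorded


def diff_peaks(expected: Dict[int, Peak], observed: Dict[int, Peak],
               rel_tol: float = 0.05) -> List[str]:
    """Return one message per feature whose peak moved or changed size."""
    problems = []
    for feature in sorted(expected):
        exp = expected[feature]
        obs = observed.get(feature, Peak(None, None))
        if obs.position != exp.position:
            problems.append(
                f"F{feature}: peak at {obs.position}, expected {exp.position}")
        elif (exp.value is not None and obs.value is not None
              and abs(obs.value - exp.value) > rel_tol * abs(exp.value)):
            problems.append(
                f"F{feature}: value {obs.value}, expected {exp.value}")
    return problems


# Reference peaks quoted in this post (F1635's value is not quoted, so it is None).
expected = {
    10351: Peak("I453", 1.28),
    8957: Peak("H242", 1.68),
    1635: Peak("G409", None),
}
# What the wrong SAE gave locally: two features silent, one peak displaced.
observed = {
    10351: Peak(None, 0.0),
    8957: Peak(None, 0.0),
    1635: Peak("G2", None),
}

for message in diff_peaks(expected, observed):
    print(message)
```

The 5% tolerance is an arbitrary placeholder. The point is only that the check runs over the whole reference set, so one silent feature cannot hide behind the ones that agree.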

Four reasonable hypotheses, all wrong

Here’s the honest part. F10351 wouldn’t light up. I built four plausible hypotheses around that one point and knocked them down one by one:

1. A weak feature dropping out of the Top-K=64? The API raw was 1.28. Not weak. No.
2. A normalization thing? The API raw and the local raw already disagreed. No.
3. Missing fused kernels drifting the residual stream? I added xformers, forced fp32, got as far as trying to rebuild transformer_engine, and nothing changed. No.
4. A Top-K rank-boundary flip? Local pre-activation was 0.39, at rank 297; it needed 1.26 to make the top 64. No.

All wrong. The cause was the SAE repo mix-up — one line. Everything I did around the kernels was wasted.

Honestly, partway through I could feel I was over-building a story. Floating point, kernels — a cause hidden deep down is a clean, technical-sounding explanation. But “explains cleanly” and “is actually the cause” are two different things.
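The rank-boundary hypothesis is at least cheap to rule out with numbers. Here is a minimal sketch, assuming the SAE keeps the K largest pre-activations per token and that you can get the full pre-activation vector as a plain list. The function names are made up for illustration, and the vector is a toy, not real ESMC output.

```python
from typing import Sequence


def rank_of(pre_acts: Sequence[float], feature: int) -> int:
    """1-based rank of `feature` when pre-activations are sorted largest first."""
    target = pre_acts[feature]
    return 1 + sum(1 for value in pre_acts if value > target)


def topk_cutoff(pre_acts: Sequence[float], k: int) -> float:
    """The k-th largest pre-activation: the smallest value that survives Top-K."""
    return sorted(pre_acts, reverse=True)[k - 1]


def is_boundary_case(pre_acts: Sequence[float], feature: int, k: int,
                     rel_margin: float = 0.05) -> bool:
    """True if `feature` misses Top-K by less than rel_margin of the cutoff."""
    cutoff = topk_cutoff(pre_acts, k)
    gap = cutoff - pre_acts[feature]
    return 0 < gap <= rel_margin * abs(cutoff)


# Toy vector with 5 features and K = 2 (illustrative values only).
toy = [0.10, 0.90, 0.39, 0.70, 0.68]
print(rank_of(toy, 2), topk_cutoff(toy, 2), is_boundary_case(toy, 2, k=2))
print(rank_of(toy, 4), is_boundary_case(toy, 4, k=2))
```

With the numbers from hypothesis 4, the feature sat at rank 297 against K = 64 and was short of the cutoff by 1.26 − 0.39 = 0.87. That is nowhere near a boundary, which is what killed the hypothesis.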

What got me out: re-asking two questions

What got me out wasn’t a new experiment. I just backed up and re-asked two questions.

First: how does the official side actually run this SAE analysis — did I really look at the repo and the notebook? When I did, I found the real tutorial uses -layer60-.

Second: is the weight the API serves actually the paper’s model? It turned out there were two model checkpoints too: biohub/ESMC-6B (2026-05, the paper) and esmc-6b-2024-12 (the 2024-12 version the API still serves). Treat the API as ground truth and you’ll decide you’re the one who’s off. I did, for a while.

The lesson is simple. When most things match and one thing is flatly broken, check whether you’re looking at a different artifact, version, or repo before you dig into floating point. Don’t build a tidy mechanism story before you’ve isolated the cause.
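One way to make the "which artifact am I actually using" question routine is to pin the identifiers in one place and compare them before any analysis runs. This is a minimal sketch: the repo names are the ones from this post, while `check_artifacts` and the shape of the `actual` dict are assumptions about how your own loader records what it fetched.

```python
from typing import Dict, List

# The artifacts the paper's analysis uses, as named in this post.
EXPECTED: Dict[str, str] = {
    "model": "biohub/ESMC-6B",
    "sae": "biohub/ESMC-6B-sae-layer60-k64-codebook16384",
}


def check_artifacts(actual: Dict[str, str],
                    expected: Dict[str, str] = EXPECTED) -> List[str]:
    """Return a message for every artifact that differs from the pinned one."""
    mismatches = []
    for key, want in expected.items():
        got = actual.get(key)
        if got != want:
            mismatches.append(f"{key}: got {got!r}, expected {want!r}")
    return mismatches


# The setup I started with: API-era model plus the all-layer SAE pack.
wrong_setup = {
    "model": "esmc-6b-2024-12",
    "sae": "biohub/ESMC-6B-sae-k64-codebook16384",
}
for message in check_artifacts(wrong_setup):
    print(message)
```

An exact string comparison is deliberately dumb. The two SAE names differ by a single `-layer60-` segment, which is easy for a human to skim past and impossible for `!=` to miss.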

What was on the other side

Once I swapped in the right layer-60 SAE, F10351 reproduced fine. And the feature the wrong SAE had been hiding turned out to be the most interesting one.

I scanned all 4,750 single substitutions (250 positions × 19 alternative residues) across the Src kinase domain (270–519). The mutations that fully break F10351 (sensitivity $s = -1.0$) are scattered far apart in sequence. The paper reports breaking mutations up to 79 residues away in sequence, at a median Cα distance of 11 Å. My scan had mutations as far as 104 residues away still breaking it, and A374 (79 residues away in sequence, 12.8 Å in 3D) matched the paper’s example. In my scan, the median 3D distance of the breaking positions is also about 11 Å.

So: a model that only ever saw sequence had grouped residues that sit close together in 3D, and work together, into a single feature. Far apart on the chain, but folded up next to each other — and the feature just picked that up. With the wrong SAE, the whole thing was gone. Glad I caught it.
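For anyone who wants to set up the same kind of scan, here is a minimal sketch of enumerating the substitutions and scoring them. Two things in it are assumptions rather than anything confirmed by the paper: the function names are made up, and the sensitivity is taken to be the relative change in activation, $s = (a_\text{mut} - a_\text{wt}) / a_\text{wt}$, chosen only because it gives $s = -1$ when a mutation switches the feature off completely.

```python
from typing import Iterator, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def single_substitutions(sequence: str, start: int,
                         end: int) -> Iterator[Tuple[str, int, str]]:
    """Yield (wild_type, position, mutant) for 1-based positions start..end inclusive."""
    for pos in range(start, end + 1):
        wild_type = sequence[pos - 1]
        for residue in AMINO_ACIDS:
            if residue != wild_type:
                yield wild_type, pos, residue


def relative_sensitivity(wt_activation: float, mut_activation: float) -> float:
    """ASSUMED definition: relative change in feature activation after mutation."""
    if wt_activation == 0:
        raise ValueError("feature is silent on the wild type; sensitivity undefined")
    return (mut_activation - wt_activation) / wt_activation


# Toy sequence standing in for the real protein; only its length matters here.
toy_sequence = "A" * 519
scan = list(single_substitutions(toy_sequence, 270, 519))
print(len(scan))                         # 250 positions x 19 substitutions
print(relative_sensitivity(1.28, 0.0))   # feature fully switched off
```

Each tuple would then be turned into a mutated sequence, run through the model and SAE, and scored at the residue where the feature peaks on the wild type (I453 for F10351).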

Note

One honest note on the number. F278’s within-position mutational sensitivity correlates with measured kinase activity (Ahler 2019 DMS, ProteinGym SRC_HUMAN_Ahler_2019) at Spearman $\rho = 0.692$. The paper’s headline $\rho \approx 0.74$ is a 108-feature ridge fit; mine is a single-feature, within-position correlation. Different measurement — I reproduced the figure’s behavior, not the exact statistic.
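For reference, Spearman's $\rho$ is just the Pearson correlation of the ranks, with tied values sharing their average rank. The sketch below is a dependency-free version for small checks; it says nothing about how scores are grouped within positions, which depends on your own data layout. `scipy.stats.spearmanr` computes the same coefficient if SciPy is available.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values all receive the mean of their rank range."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation; raises if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation of two equal-length score lists."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


# Toy scores, not the Ahler 2019 data.
print(spearman([0.1, 0.4, 0.2, 0.9], [1.0, 3.0, 2.5, 7.0]))
```

Because it works on ranks, the coefficient only asks whether the feature and the assay order the mutations the same way, not whether they agree in magnitude.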

Setup

- Model: biohub/ESMC-6B (2026-05-19), layer 60. SAE: biohub/ESMC-6B-sae-layer60-k64-codebook16384 (2026-05-26), bf16, Top-K = 64.
- Compute: one H100 on Modal. I reused an existing Modal image with the ESMC weights, the esm package, and CUDA baked in, and only pulled the SAE weights at startup.
- Data: SRC_HUMAN_Ahler_2019 (ProteinGym). 3D distances from AlphaFold AF-P12931-F1.
- The API isn’t built for scans (100–500 s per call, drops requests under parallel load, bills failed calls). A local GPU runs 4,750 mutations in minutes.
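The 3D distances in the Data line are Cα to Cα distances read off the predicted structure. A minimal sketch of that calculation, assuming the AlphaFold model has been downloaded in PDB format and relying on the fixed-column layout of PDB `ATOM` records. The function names are made up for illustration.

```python
import math
from statistics import median
from typing import Dict, Iterable, Sequence, Tuple

Coord = Tuple[float, float, float]


def ca_coordinates(pdb_lines: Iterable[str]) -> Dict[int, Coord]:
    """Map residue number -> C-alpha coordinates from PDB ATOM records."""
    coords: Dict[int, Coord] = {}
    for line in pdb_lines:
        # Fixed columns: atom name 13-16, residue number 23-26, x/y/z 31-54.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            residue = int(line[22:26])
            coords[residue] = (float(line[30:38]),
                               float(line[38:46]),
                               float(line[46:54]))
    return coords


def ca_distance(coords: Dict[int, Coord], i: int, j: int) -> float:
    """Euclidean distance in angstroms between the C-alphas of residues i and j."""
    return math.dist(coords[i], coords[j])


def summarize(coords: Dict[int, Coord], anchor: int,
              positions: Sequence[int]) -> Tuple[int, float]:
    """Largest sequence separation and median 3D distance from `anchor`."""
    separations = [abs(p - anchor) for p in positions]
    distances = [ca_distance(coords, anchor, p) for p in positions]
    return max(separations), median(distances)
```

Usage would look like `coords = ca_coordinates(open(path))` with `path` pointing at your local copy of the structure file, followed by `summarize(coords, 453, breaking_positions)` for the positions whose mutations break F10351. The sequence side needs no structure at all: A374 to I453 is 453 − 374 = 79 residues.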

Reproduction code and data are coming soon: a public repo with a modal run command that regenerates every figure on your own GPU, plus precomputed feature activations so you can reproduce the plots without a GPU.

Wrapping up

The feature that cost me the most time and the feature I found most interesting were the same one — F10351. The one thing that was quietly broken turned out to be the one worth having.

I could only check this solo because the weights, the SAEs, and even the byproducts are all out in the open. Thanks again to EvolutionaryScale and the Chan Zuckerberg Biohub.
