Item: sparse-autoencoder-training
Rating: 4.8
Author: Implexa

sparse-autoencoder-training

Provides guidance for training and analyzing Sparse Autoencoders (SAEs) using SAELens to decompose neural network activations into interpretable features. Use…

SkillRank score: 4.8/10 (evaluated by implexa, claude-haiku-4-5 · 2026-06-27)

sparse-autoencoder-training covers SAE training with SAELens for decomposing neural network activations into interpretable sparse features. It includes background on polysemanticity and superposition, but its procedural guidance is incomplete.

structure: 4.0
trigger phrases: 3.0
procedure: 4.0
edge cases: 3.0
documentation: 6.0

SKILL.md

SAELens: Sparse Autoencoders for Mechanistic Interpretability

SAELens is an open-source library for training and analyzing Sparse Autoencoders (SAEs), a technique for decomposing polysemantic neural network activations into sparse, interpretable features. It builds on Anthropic's research on monosemanticity.

GitHub: jbloomAus/SAELens (1,100+ stars)

The Problem: Polysemanticity & Superposition

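Individual neurons in neural networks are often polysemantic: they activate in multiple, semantically distinct contexts. This is attributed to superposition, where a model represents more features than it has neurons by storing them along overlapping, non-orthogonal directions in activation space. As a result, inspecting single neurons is a poor guide to what the model represents.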
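Superposition can be made concrete with a toy example. The sketch below is a hypothetical illustration, not taken from SAELens: three made-up features are stored as unit directions 120 degrees apart in a 2-dimensional activation space. The first coordinate ("neuron 0") takes a nonzero value for every feature, yet a ReLU readout along the feature directions still isolates a single active feature. When two features are active at once, the readout shows interference.

```python
import math

# Hypothetical toy: 3 features stored in a 2-d activation space, each as a
# unit direction 120 degrees apart (more features than dimensions).
N_FEATURES = 3
DIRECTIONS = [
    (math.cos(2 * math.pi * k / N_FEATURES), math.sin(2 * math.pi * k / N_FEATURES))
    for k in range(N_FEATURES)
]

def embed(feature_values):
    """Write features into activation space: sum of value * direction."""
    x = sum(v * d[0] for v, d in zip(feature_values, DIRECTIONS))
    y = sum(v * d[1] for v, d in zip(feature_values, DIRECTIONS))
    return (x, y)

def read_out(activation):
    """Recover features with ReLU(dot(activation, direction))."""
    ax, ay = activation
    return [max(0.0, ax * dx + ay * dy) for dx, dy in DIRECTIONS]

for k in range(N_FEATURES):
    one_hot = [1.0 if i == k else 0.0 for i in range(N_FEATURES)]
    act = embed(one_hot)
    # act[0] ("neuron 0") is nonzero for every feature, so on its own it is
    # polysemantic, while the ReLU readout still isolates the active feature.
    print(f"feature {k}: neuron0={act[0]:+.2f}, readout={[round(v, 2) for v in read_out(act)]}")

# Two features active at once: each is recovered at only half its true value.
print("features 0+1:", [round(v, 2) for v in read_out(embed([1.0, 1.0, 0.0]))])
```

The readout is exact here only because at most one feature is active at a time; the two-feature case shows why packing features into superposition depends on them being sparse.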

SAEs address this by decomposing dense activations into sparse features: only a small number of features activate for any given input, and each feature is intended to correspond to a single interpretable (monosemantic) concept.
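To make the decomposition concrete, here is a minimal NumPy sketch of an SAE trained from scratch on synthetic data. It assumes the common dictionary-learning formulation (ReLU encoder, linear decoder, squared-error reconstruction plus an L1 penalty on feature activations). It is not SAELens's training code, and the sizes, `l1_coeff`, learning rate, and the `sample_batch` data generator are hypothetical choices for illustration. The mean L0 (number of nonzero features per input) is a standard way to check how sparse the learned code is.

```python
import numpy as np

# Assumed SAE formulation (a sketch, not SAELens's implementation):
#   f     = ReLU((x - b_dec) @ W_enc.T + b_enc)   sparse feature activations
#   x_hat = f @ W_dec.T + b_dec                   reconstruction of x
#   loss  = mean ||x - x_hat||^2 + l1_coeff * mean ||f||_1
rng = np.random.default_rng(0)
d_model, d_sae, n_true = 8, 32, 16   # hypothetical sizes; d_sae > d_model
l1_coeff, lr = 5e-3, 0.05            # hypothetical hyperparameters

params = {
    "W_enc": rng.normal(0.0, 0.1, (d_sae, d_model)),
    "b_enc": np.zeros(d_sae),
    "W_dec": rng.normal(0.0, 0.1, (d_model, d_sae)),
    "b_dec": np.zeros(d_model),
}

def forward(p, x):
    """x: (batch, d_model) -> pre-activations, features f, reconstruction x_hat."""
    pre = (x - p["b_dec"]) @ p["W_enc"].T + p["b_enc"]
    f = np.maximum(pre, 0.0)
    x_hat = f @ p["W_dec"].T + p["b_dec"]
    return pre, f, x_hat

def loss_and_grads(p, x):
    """Return loss, features, and hand-derived gradients for every parameter."""
    pre, f, x_hat = forward(p, x)
    n = x.shape[0]
    err = x_hat - x
    loss = np.sum(err**2) / n + l1_coeff * np.sum(f) / n  # f >= 0, so |f| = f
    d_xhat = 2.0 * err / n
    d_f = d_xhat @ p["W_dec"] + l1_coeff / n * (f > 0)   # L1 gradient on active units
    d_pre = d_f * (pre > 0)                                # ReLU gradient
    grads = {
        "W_dec": d_xhat.T @ f,
        "W_enc": d_pre.T @ (x - p["b_dec"]),
        "b_enc": d_pre.sum(axis=0),
        # b_dec enters the output (+) and the encoder input (-)
        "b_dec": d_xhat.sum(axis=0) - (d_pre @ p["W_enc"]).sum(axis=0),
    }
    return loss, f, grads

# Synthetic "activations": sparse nonnegative mixes of n_true hidden directions.
true_dirs = rng.normal(size=(n_true, d_model))
true_dirs /= np.linalg.norm(true_dirs, axis=1, keepdims=True)

def sample_batch(n=256, p_active=0.1):
    coeffs = rng.random((n, n_true)) * (rng.random((n, n_true)) < p_active)
    return coeffs @ true_dirs

losses = []
for step in range(2000):
    loss, f, grads = loss_and_grads(params, sample_batch())
    for name in params:
        params[name] -= lr * grads[name]   # plain gradient descent
    losses.append(loss)

mean_l0 = (f > 0).sum(axis=1).mean()       # average active features per input
print(f"loss {losses[0]:.4f} -> {losses[-1]:.4f}, mean L0 {mean_l0:.1f}")
```

Real training setups add details this sketch omits, for example constraining or normalizing decoder columns so the L1 penalty cannot be sidestepped by shrinking `f` while growing `W_dec`, using an adaptive optimizer, and training on real model activations rather than synthetic data.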

When to Use SAELens
