For the OpenMOSS Mechanistic Interpretability Team's Sparse Autoencoder (SAE) research.
Finding the direction of arrival (DOA) of small UAVs using sparse denoising autoencoders and deep neural networks.
[ICLR 2025] Monet: Mixture of Monosemantic Experts for Transformers
Code for the paper: Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery. ECCV 2024.
Sparse autoencoders trained on the FashionMNIST dataset (see the sketch after this list).
Hyperspectral Band Selection using Self-Representation Learning with Sparse 1D-Operational Autoencoder (SRL-SOA)
Answers the question "How do you do patching on all available SAEs on GPT-2?" Official implementation of the paper "Evaluating Open-Source Sparse Autoencoders on Disentangling Factual Knowledge in GPT-2 Small".
Using evolutionary methods with sparse autoencoders.
Implementation and analysis of Sparse Autoencoders for neural network interpretability research. Features interactive visualization dashboard and W&B integration.
Performing mechanistic interpretability on InceptionV1, from linear probes and sparse direction maximization to adversarial and circuit patching & ablation.
Studying (self-)supervised representations of Euclid galaxy imaging via SAEs.
A framework for conducting interpretability research and for developing an LLM from a synthetic dataset.
Official code release for the paper: "Mammo-SAE: Interpreting Breast Cancer Concept Learning with Sparse Autoencoders"
Unified SAE and transcoder training using the EleutherAI/sparsify library for neural network interpretability research.
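Several of the repositories above train sparse autoencoders on simple image datasets such as FashionMNIST. The sketch below is a minimal, generic illustration of that setup, not code from any listed repository; it assumes PyTorch and torchvision, uses arbitrary hyperparameters, and encourages sparsity with the standard recipe of reconstruction loss plus an L1 penalty on the hidden activations.

```python
# Minimal sparse autoencoder on FashionMNIST (illustrative sketch only).
# Assumes PyTorch and torchvision are installed; hyperparameters are arbitrary.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

class SparseAutoencoder(nn.Module):
    def __init__(self, d_in=784, d_hidden=1024):
        super().__init__()
        self.encoder = nn.Linear(d_in, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_in)

    def forward(self, x):
        z = torch.relu(self.encoder(x))   # non-negative latent activations
        return self.decoder(z), z

def train(epochs=1, l1_coeff=1e-3, device="cpu"):
    data = datasets.FashionMNIST("data", train=True, download=True,
                                 transform=transforms.ToTensor())
    loader = DataLoader(data, batch_size=256, shuffle=True)
    model = SparseAutoencoder().to(device)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        for x, _ in loader:
            x = x.view(x.size(0), -1).to(device)  # flatten 28x28 images
            recon, z = model(x)
            # Reconstruction error plus L1 sparsity penalty on latents.
            loss = ((recon - x) ** 2).mean() + l1_coeff * z.abs().mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model

if __name__ == "__main__":
    train()
```

The L1 coefficient trades off reconstruction fidelity against how few latent units fire per input; the interpretability-focused repositories listed above typically use the same objective but train on language-model activations instead of raw images.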