For the OpenMOSS Mechanistic Interpretability Team's Sparse Autoencoder (SAE) research.
Finding the direction of arrival (DOA) of small UAVs using sparse denoising autoencoders and deep neural networks.
[ICLR 2025] Monet: Mixture of Monosemantic Experts for Transformers
Code for the paper: Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery. ECCV 2024.
Sparse autoencoders trained on the FashionMNIST dataset (see the sketch after this list).
Hyperspectral Band Selection using Self-Representation Learning with Sparse 1D-Operational Autoencoder (SRL-SOA)
Answers the question "How do you do patching on all available SAEs on GPT-2?" Official implementation of the paper "Evaluating Open-Source Sparse Autoencoders on Disentangling Factual Knowledge in GPT-2 Small".
Using evolutionary methods with sparse autoencoders.
Implementation and analysis of Sparse Autoencoders for neural network interpretability research. Features interactive visualization dashboard and W&B integration.
Performing mechanistic interpretability on InceptionV1, from linear probes and sparse direction maximization to adversarial and circuit patching & ablation.
Studying (self-)supervised representations of Euclid galaxy imaging via SAEs.
A framework for conducting interpretability research and for developing an LLM from a synthetic dataset.
Official code release for the paper: "Mammo-SAE: Interpreting Breast Cancer Concept Learning with Sparse Autoencoders"
Unified SAE and transcoder training using the EleutherAI/sparsify library for neural network interpretability research.
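Several of the repositories above train sparse autoencoders on simple image datasets such as FashionMNIST. The sketch below is a minimal, generic illustration of that setup, not code from any listed repository; it assumes PyTorch and torchvision, uses arbitrary hyperparameters, and encourages sparsity with the standard recipe of reconstruction loss plus an L1 penalty on the hidden activations.

```python
# Minimal sparse autoencoder on FashionMNIST (illustrative sketch only).
# Assumes PyTorch and torchvision are installed; hyperparameters are arbitrary.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

class SparseAutoencoder(nn.Module):
    def __init__(self, d_in=784, d_hidden=1024):
        super().__init__()
        self.encoder = nn.Linear(d_in, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_in)

    def forward(self, x):
        z = torch.relu(self.encoder(x))   # non-negative latent activations
        return self.decoder(z), z

def train(epochs=1, l1_coeff=1e-3, device="cpu"):
    data = datasets.FashionMNIST("data", train=True, download=True,
                                 transform=transforms.ToTensor())
    loader = DataLoader(data, batch_size=256, shuffle=True)
    model = SparseAutoencoder().to(device)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        for x, _ in loader:
            x = x.view(x.size(0), -1).to(device)  # flatten 28x28 images
            recon, z = model(x)
            # Reconstruction error plus L1 sparsity penalty on latents.
            loss = ((recon - x) ** 2).mean() + l1_coeff * z.abs().mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model

if __name__ == "__main__":
    train()
```

The L1 coefficient trades off reconstruction fidelity against how few latent units fire per input; the interpretability-focused repositories listed above typically use the same objective but train on language-model activations instead of raw images.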