Sparse Autoencoders
PulseAugur coverage of Sparse Autoencoders — every cluster mentioning Sparse Autoencoders across labs, papers, and developer communities, ranked by signal.
3 days with sentiment data
-
AI interpretability advances with Sparse Autoencoders for ASR and functional operators
Researchers are exploring advanced techniques for interpreting the internal workings of complex AI models. One paper details the application of Sparse Autoencoders (SAEs) to Automatic Speech Recognition (ASR) systems li…
-
Tree SAE model learns hierarchical features in sparse autoencoders
Researchers have developed a new method called Tree SAE to improve how Sparse Autoencoders learn hierarchical features. This approach combines activation and reconstruction conditions to ensure a stronger functional lin…
-
New SAEgis framework detects adversarial attacks on vision-language models
Researchers have developed a new framework called SAEgis to detect adversarial attacks on vision-language models (VLMs). This method utilizes sparse autoencoders (SAEs) as a plug-and-play module, requiring no additional…
-
New Diff-SAE method excels at detecting language model backdoors
Researchers have developed a new method using Sparse Autoencoders (SAEs) to detect backdoor attacks in language models. Their Differential SAE (Diff-SAE) architecture proved significantly more effective than Crosscoders…
-
New paper reveals geometric limits on feature composition in AI models
A new paper explores the theoretical limitations of feature composition in transformer models, specifically focusing on Sparse Autoencoders (SAEs). Researchers developed a geometric framework to analyze how non-linear i…
-
SoftSAE introduces dynamic sparsity for adaptive neural network interpretability
Researchers have introduced SoftSAE, a novel adaptive sparse autoencoder designed to improve the interpretability of neural networks. Unlike traditional methods that use a fixed number of features, SoftSAE dynamically a…
-
New AEN-SAE architecture tackles feature starvation in LLM interpretability
Researchers have introduced Adaptive Elastic Net Sparse Autoencoders (AEN-SAEs) to address feature starvation in sparse autoencoders used for interpreting LLM representations. Traditional methods struggle with dead neur…
-
New methods enhance sparse autoencoder interpretability and stability
Researchers have developed new methods to address limitations in sparse autoencoders (SAEs), which are used to interpret the internal representations of large language models. One paper introduces adaptive elastic net S…
-
CorrSteer method enhances LLM steering using correlated sparse autoencoder features
Researchers have developed CorrSteer, a novel method for steering large language models (LLMs) during generation using features extracted from Sparse Autoencoders (SAEs). This technique correlates sample correctness wit…
-
Qwen releases interpretability toolkit; GPT-5.5 and Claude Mythos tie in cyber attack tests
Qwen AI has released Qwen-Scope, an open-source toolkit for interpretability that integrates Sparse Autoencoders with their Qwen3.5-27B model. This tool exposes 81,000 features across 64 layers, enabling developers to p…
-
AI interprets protein models to detect biological risks
Researchers have developed a new method called SAEBER, utilizing Sparse Autoencoders (SAEs) to analyze protein design models like RFDiffusion3 and RoseTTAFold3. This technique identifies features within the models that …
-
Researchers develop new methods for out-of-distribution detection in AI models
Researchers have developed a novel framework using Sparse Autoencoders (SAEs) to analyze Vision Transformers (ViTs) for out-of-distribution (OOD) detection. This approach disentangles dense features into a structured la…
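The clusters above all build on the same core mechanism: a sparse autoencoder maps a model activation to an overcomplete set of features, keeps only a few of them active, and reconstructs the activation from that sparse code. A minimal sketch of the encode/decode step with a hard top-k sparsity constraint (as in TopK-style SAEs) is below; the sizes and randomly initialized weights are illustrative stand-ins, since a real SAE is trained to minimize reconstruction error under the sparsity constraint.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes; production SAEs use thousands of features per layer.
d_model, d_sae, k = 16, 64, 4

# Hypothetical weights for illustration (a trained SAE learns these).
W_enc = rng.normal(scale=0.1, size=(d_sae, d_model))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(scale=0.1, size=(d_model, d_sae))
b_dec = np.zeros(d_model)

def encode(x):
    """ReLU encoder followed by a hard top-k mask: at most k features fire."""
    pre = np.maximum(W_enc @ x + b_enc, 0.0)
    mask = np.zeros_like(pre)
    mask[np.argsort(pre)[-k:]] = 1.0  # indices of the k largest activations
    return pre * mask

def decode(f):
    """Reconstruct the activation as a sparse combination of decoder columns."""
    return W_dec @ f + b_dec

x = rng.normal(size=d_model)  # stand-in for a residual-stream activation
f = encode(x)
x_hat = decode(f)
assert int((f > 0).sum()) <= k  # sparsity constraint holds
```

Each nonzero entry of `f` corresponds to one learned feature direction (a column of `W_dec`), which is what makes the decomposition interpretable and what the Tree SAE, SoftSAE, and AEN-SAE variants above modify: the structure, adaptivity, or regularization of this sparsity constraint.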
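Steering methods like CorrSteer intervene by adding a chosen SAE feature direction back into the model's activations during generation. A hedged sketch of that intervention step, with a hypothetical decoder matrix standing in for a trained SAE:

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, d_sae = 16, 64

# Hypothetical trained SAE decoder; its columns are feature directions.
W_dec = rng.normal(scale=0.1, size=(d_model, d_sae))

def steer(activation, feature_idx, alpha):
    """Shift an activation along one unit-normalized SAE feature direction."""
    direction = W_dec[:, feature_idx]
    direction = direction / np.linalg.norm(direction)
    return activation + alpha * direction

x = rng.normal(size=d_model)          # stand-in for a residual-stream activation
x_steered = steer(x, feature_idx=3, alpha=2.0)
# The intervention moves the activation by exactly |alpha| along the direction.
print(round(float(np.linalg.norm(x_steered - x)), 1))  # → 2.0
```

The feature index and scale are the method-specific choices: CorrSteer's contribution, per the summary above, is selecting features by correlating their activations with sample correctness rather than picking them by hand.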