Sparse Autoencoders
PulseAugur coverage of Sparse Autoencoders — every cluster mentioning Sparse Autoencoders across labs, papers, and developer communities, ranked by signal.
3 days with sentiment data
-
AI interpretability advances with Sparse Autoencoders for ASR and functional operators
Researchers are exploring advanced techniques for interpreting the internal workings of complex AI models. One paper details the application of Sparse Autoencoders (SAEs) to Automatic Speech Recognition (ASR) systems li…
-
Tree SAE model learns hierarchical features in sparse autoencoders
Researchers have developed a new method called Tree SAE to improve how Sparse Autoencoders learn hierarchical features. This approach combines activation and reconstruction conditions to ensure a stronger functional lin…
-
New SAEgis framework detects adversarial attacks on vision-language models
Researchers have developed a new framework called SAEgis to detect adversarial attacks on vision-language models (VLMs). This method utilizes sparse autoencoders (SAEs) as a plug-and-play module, requiring no additional…
-
New Diff-SAE method excels at detecting language model backdoors
Researchers have developed a new method using Sparse Autoencoders (SAEs) to detect backdoor attacks in language models. Their Differential SAE (Diff-SAE) architecture proved significantly more effective than Crosscoders…
-
New paper reveals geometric limits on feature composition in AI models
A new paper explores the theoretical limitations of feature composition in transformer models, specifically focusing on Sparse Autoencoders (SAEs). Researchers developed a geometric framework to analyze how non-linear i…
-
SoftSAE introduces dynamic sparsity for adaptive neural network interpretability
Researchers have introduced SoftSAE, a novel adaptive sparse autoencoder designed to improve the interpretability of neural networks. Unlike traditional methods that use a fixed number of features, SoftSAE dynamically a…
-
New AEN-SAE architecture tackles feature starvation in LLM interpretability
Researchers have introduced Adaptive Elastic Net Sparse Autoencoders (AEN-SAEs) to address feature starvation in sparse autoencoders used for interpreting LLM representations. Traditional methods struggle with dead neur…
-
New methods enhance sparse autoencoder interpretability and stability
Researchers have developed new methods to address limitations in sparse autoencoders (SAEs), which are used to interpret the internal representations of large language models. One paper introduces adaptive elastic net S…
-
CorrSteer method enhances LLM steering using correlated sparse autoencoder features
Researchers have developed CorrSteer, a novel method for steering large language models (LLMs) during generation using features extracted from Sparse Autoencoders (SAEs). This technique correlates sample correctness wit…
-
Qwen releases interpretability toolkit; GPT-5.5 and Claude Mythos tie in cyber attack tests
Qwen AI has released Qwen-Scope, an open-source toolkit for interpretability that integrates Sparse Autoencoders with their Qwen3.5-27B model. This tool exposes 81,000 features across 64 layers, enabling developers to p…
-
AI interprets protein models to detect biological risks
Researchers have developed a new method called SAEBER, utilizing Sparse Autoencoders (SAEs) to analyze protein design models like RFDiffusion3 and RoseTTAFold3. This technique identifies features within the models that …
-
Researchers develop new methods for out-of-distribution detection in AI models
Researchers have developed a novel framework using Sparse Autoencoders (SAEs) to analyze Vision Transformers (ViTs) for out-of-distribution (OOD) detection. This approach disentangles dense features into a structured la…
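The clusters above all build on the same core mechanism: a sparse autoencoder maps a model activation to an overcomplete set of features, keeps only a few of them active, and reconstructs the activation from that sparse code. A minimal sketch of the encode/decode step with a hard top-k sparsity constraint (as in TopK-style SAEs) is below; the sizes and randomly initialized weights are illustrative stand-ins, since a real SAE is trained to minimize reconstruction error under the sparsity constraint.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes; production SAEs use thousands of features per layer.
d_model, d_sae, k = 16, 64, 4

# Hypothetical weights for illustration (a trained SAE learns these).
W_enc = rng.normal(scale=0.1, size=(d_sae, d_model))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(scale=0.1, size=(d_model, d_sae))
b_dec = np.zeros(d_model)

def encode(x):
    """ReLU encoder followed by a hard top-k mask: at most k features fire."""
    pre = np.maximum(W_enc @ x + b_enc, 0.0)
    mask = np.zeros_like(pre)
    mask[np.argsort(pre)[-k:]] = 1.0  # indices of the k largest activations
    return pre * mask

def decode(f):
    """Reconstruct the activation as a sparse combination of decoder columns."""
    return W_dec @ f + b_dec

x = rng.normal(size=d_model)  # stand-in for a residual-stream activation
f = encode(x)
x_hat = decode(f)
assert int((f > 0).sum()) <= k  # sparsity constraint holds
```

Each nonzero entry of `f` corresponds to one learned feature direction (a column of `W_dec`), which is what makes the decomposition interpretable and what the Tree SAE, SoftSAE, and AEN-SAE variants above modify: the structure, adaptivity, or regularization of this sparsity constraint.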
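Steering methods like CorrSteer intervene by adding a chosen SAE feature direction back into the model's activations during generation. A hedged sketch of that intervention step, with a hypothetical decoder matrix standing in for a trained SAE:

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, d_sae = 16, 64

# Hypothetical trained SAE decoder; its columns are feature directions.
W_dec = rng.normal(scale=0.1, size=(d_model, d_sae))

def steer(activation, feature_idx, alpha):
    """Shift an activation along one unit-normalized SAE feature direction."""
    direction = W_dec[:, feature_idx]
    direction = direction / np.linalg.norm(direction)
    return activation + alpha * direction

x = rng.normal(size=d_model)          # stand-in for a residual-stream activation
x_steered = steer(x, feature_idx=3, alpha=2.0)
# The intervention moves the activation by exactly |alpha| along the direction.
print(round(float(np.linalg.norm(x_steered - x)), 1))  # → 2.0
```

The feature index and scale are the method-specific choices: CorrSteer's contribution, per the summary above, is selecting features by correlating their activations with sample correctness rather than picking them by hand.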