New methods enhance LLM control without sacrificing performance or reasoning

By PulseAugur Editorial · [5 sources] · 2026-05-07 09:03

Researchers have proposed three new methods for steering large language model (LLM) behavior at inference time without sacrificing generation quality. The first, Prompt-only SV (PrOSV), applies a steering vector only to prompt tokens and outperforms traditional full-sequence steering vectors on benchmarks such as AxBench. The second, FLAS (Flow-based Activation Steering), learns a concept-conditioned velocity field that transports activations, and consistently outperforms prompting on Gemma models. The third, SKOP (Steering via Key-Orthogonal Projections), constrains attention rerouting to preserve reasoning and retrieval performance, achieving a better trade-off between utility and steering efficacy.
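To make the prompt-only versus full-sequence distinction concrete, here is a minimal NumPy sketch. It is an illustration only, not the PrOSV training procedure: the function `apply_steering`, the `scale` parameter, and the all-ones vector standing in for a trained steering vector are assumptions made for this example.

```python
import numpy as np

def apply_steering(hidden, steering_vector, prompt_len, prompt_only=True, scale=1.0):
    """Add a steering vector to per-token activations (hypothetical helper).

    hidden: array of shape (seq_len, d_model), one row per token position.
    prompt_len: number of leading positions that belong to the prompt.
    """
    steered = hidden.copy()
    # Prompt-only: touch just the prompt positions; otherwise touch every position.
    end = prompt_len if prompt_only else hidden.shape[0]
    steered[:end] += scale * steering_vector
    return steered

rng = np.random.default_rng(0)
hidden = rng.normal(size=(6, 4))   # 4 prompt tokens followed by 2 generated tokens
vector = np.ones(4)                # stand-in for a trained steering vector (assumption)

prompt_only = apply_steering(hidden, vector, prompt_len=4, prompt_only=True)
full_sequence = apply_steering(hidden, vector, prompt_len=4, prompt_only=False)
```

In this toy, the generated positions are left untouched in the prompt-only case. In a real transformer they would still be influenced indirectly, because later tokens attend to the steered prompt positions.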

IMPACT New techniques for inference-time LLM control could enable more nuanced and reliable AI applications by improving steering accuracy and reducing performance degradation.

RANK_REASON Three new arXiv papers introduce novel methods for controlling LLM behavior at inference time.

AI-generated summary · Google Gemini · from 5 sources.

COVERAGE [5]

arXiv cs.LG TIER_1 English(EN) · Yuntai Bao, Qinfeng Li, Xinyan Yu, Xuhong Zhang, Ge Su, Wenqi Zhang, Liu Yan, Haiqin Weng, Jianwei Yin · 2026-05-08 04:00

Towards Steering without Sacrifice: Principled Training of Steering Vectors for Prompt-only Interventions

arXiv:2605.05983v1 · Announce Type: new · Abstract: Recently, steering vectors (SVs) have emerged as an effective and lightweight approach to steer behaviors of large language models (LLMs), among which fine-tuned SVs are more effective than optimization-free ones. However, current a…
arXiv cs.LG TIER_1 English(EN) · Zehao Jin, Ruixuan Deng, Junran Wang, Xinjie Shen, Chao Zhang · 2026-05-08 04:00

Beyond Steering Vector: Flow-based Activation Steering for Inference-Time Intervention

arXiv:2605.05892v1 · Announce Type: cross · Abstract: Activation steering has emerged as a promising alternative for controlling language-model behavior at inference time by modifying intermediate representations while keeping model parameters frozen. However, large-scale evaluations…
arXiv cs.CL TIER_1 English(EN) · Haoyan Luo, Mateo Espinosa Zarlenga, Mateja Jamnik · 2026-05-08 04:00

Don't Lose Focus: Activation Steering via Key-Orthogonal Projections

arXiv:2605.06342v1 · Announce Type: new · Abstract: Activation steering controls LLM behaviour towards target behaviour by intervening in internal representations, yet it often degrades reasoning and retrieval performance. We argue that a primary cause of this trade-off is attention …
arXiv cs.CL TIER_1 English(EN) · Mateja Jamnik · 2026-05-07 14:29

Don't Lose Focus: Activation Steering via Key-Orthogonal Projections

Activation steering controls LLM behaviour towards target behaviour by intervening in internal representations, yet it often degrades reasoning and retrieval performance. We argue that a primary cause of this trade-off is attention rerouting: steering vectors alter query-key matc…
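The intuition behind a key-orthogonal projection can be sketched with plain linear algebra. If a shift δ is orthogonal to every key k, then (q + δ)·k = q·k, so the dot products with those keys do not change. The snippet below is a hedged illustration of that projection only, not the SKOP algorithm itself: the helper `project_orthogonal_to_keys`, the random keys, and the simplification that the steering shift acts directly in query space are all assumptions.

```python
import numpy as np

def project_orthogonal_to_keys(vector, keys):
    """Remove the component of `vector` lying in the span of the rows of `keys`.

    Hypothetical helper: keys has shape (num_keys, d), vector has shape (d,).
    """
    # Orthonormal basis of the key span via SVD; drop numerically zero directions.
    _, s, vt = np.linalg.svd(keys, full_matrices=False)
    basis = vt[s > 1e-10 * s.max()]
    # Subtract the projection onto the key span.
    return vector - basis.T @ (basis @ vector)

rng = np.random.default_rng(0)
keys = rng.normal(size=(3, 8))    # toy key vectors (assumption)
steer = rng.normal(size=8)        # toy steering shift (assumption)
steer_orth = project_orthogonal_to_keys(steer, keys)

query = rng.normal(size=8)
scores_before = keys @ query
scores_after = keys @ (query + steer_orth)
```

Under these assumptions `scores_after` matches `scores_before` up to floating-point error, whereas adding the unprojected `steer` would generally change the scores and therefore the attention pattern.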
arXiv cs.CL TIER_1 English(EN) · Chao Zhang · 2026-05-07 09:03

Beyond Steering Vector: Flow-based Activation Steering for Inference-Time Intervention

Activation steering has emerged as a promising alternative for controlling language-model behavior at inference time by modifying intermediate representations while keeping model parameters frozen. However, large-scale evaluations such as AxBench show that existing steering metho…
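As a rough picture of what transporting activations along a velocity field means, the sketch below integrates a field with fixed Euler steps. It is not the FLAS model: `toy_velocity` is a hand-written, hypothetical stand-in for a learned concept-conditioned network, and `concept_target`, `transport`, and `num_steps` are names invented for this example.

```python
import numpy as np

def toy_velocity(h, t, concept_target):
    """Hypothetical stand-in for a learned, concept-conditioned velocity field.

    It points from the current activation toward a concept-specific target,
    scaled so that a straight-line path arrives at the target at t = 1.
    """
    return (concept_target - h) / (1.0 - t)

def transport(h0, concept_target, velocity=toy_velocity, num_steps=10):
    """Move an activation from t = 0 to t = 1 with explicit Euler steps."""
    h = np.asarray(h0, dtype=float).copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt                      # t stays below 1, so no division by zero
        h = h + dt * velocity(h, t, concept_target)
    return h

h0 = np.array([0.0, 0.0, 0.0])          # toy starting activation
target = np.array([1.0, -2.0, 0.5])     # toy concept-specific target (assumption)
h1 = transport(h0, target)
```

The contrast with a single steering vector is that the update here depends on the current activation and on the time step, rather than adding one fixed direction to every input.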
