research · [5 sources] · 2026-05-20 02:55

New LLM vulnerabilities found in compilation and trigger strength

By PulseAugur Editorial · Summary by gemini-2.5-flash-lite from 5 sources

Researchers have identified new vulnerabilities in large language models (LLMs) tied to optimization techniques applied at deployment. One study shows that compilation, which is meant to improve efficiency, can be exploited to implant hidden backdoors that activate only under specific compiled conditions, bypassing standard safety checks and achieving high attack success rates on open-source LLMs. A separate theoretical paper finds that, counter-intuitively, stronger triggers in backdoor attacks can help defenders in high-dimensional settings, with attack success peaking at a finite trigger strength.

IMPACT New research highlights critical security vulnerabilities in LLM deployment pipelines, potentially impacting the safety and reliability of AI systems.

RANK_REASON Multiple academic papers published on arXiv detailing new research into LLM vulnerabilities and theoretical aspects of backdoor attacks.

paper
safety

COVERAGE [5]

arXiv cs.AI TIER_1 · Yifei Wang, Tianlin Li, Xiaohan Zhang, Yida Yang, Xiaoyu Zhang, Li Pan · 2026-05-22 04:00

Trusted Weights, Treacherous Optimizations? Optimization-Triggered Backdoor Attacks on LLMs

arXiv:2605.20641v1. Abstract: Inference optimization is a vital technique for deploying LLMs at scale. Compilation is the most widely adopted optimization technique for LLMs. While it assumes semantic equivalence between the original and compiled graphs, we fi…

arXiv cs.LG TIER_1 · Aman Saxena, Jan Schuchardt, Yan Scholten, Stephan Günnemann · 2026-05-22 04:00

Provable Robustness against Backdoor Attacks via the Primal-Dual Perspective on Differential Privacy

arXiv:2605.21780v1. Abstract: Randomized smoothing is a powerful tool for certifying robustness to adversarial perturbations, including poisoning attacks via randomized training and evasion attacks via randomized inference. Extending these guarantees to backdoor…

arXiv cs.LG TIER_1 · Donald Flynn, Hadas Yaron Goldhirsh, Jonathan P. Keating, Inbar Seroussi · 2026-05-22 04:00

When Stronger Triggers Backfire: A High-Dimensional Theory of Backdoor Attacks

arXiv:2605.22481v1. Abstract: Backdoor poisoning attacks behave counter-intuitively in high dimensions: stronger training triggers can help the defender. We study regularised generalised linear models on Gaussian-mixture data in the proportional regime ($p/n \to…

arXiv cs.LG TIER_1 · Inbar Seroussi · 2026-05-21 13:39

When Stronger Triggers Backfire: A High-Dimensional Theory of Backdoor Attacks

Backdoor poisoning attacks behave counter-intuitively in high dimensions: stronger training triggers can help the defender. We study regularised generalised linear models on Gaussian-mixture data in the proportional regime ($p/n \to κ$), varying the training trigger strength $α$ …

arXiv cs.AI TIER_1 · Li Pan · 2026-05-20 02:55

Trusted Weights, Treacherous Optimizations? Optimization-Triggered Backdoor Attacks on LLMs

Inference optimization is a vital technique for deploying LLMs at scale. Compilation is the most widely adopted optimization technique for LLMs. While it assumes semantic equivalence between the original and compiled graphs, we first uncover its numerical side effects can be mali…
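
As a rough illustration of why "semantic equivalence" between an original and a compiled graph can fail numerically, the sketch below sums the same values in two orders. It is not the paper's attack and uses no compiler; the function names are hypothetical, and the pairwise grouping merely stands in for the kind of reordering an optimizer might apply to a reduction. It relies only on the fact that floating-point addition is not associative.

```python
# Illustration only (not the method from the paper): floating-point addition
# is not associative, so regrouping a reduction can change the result.

def sum_left_to_right(values):
    """Accumulate in the order written, like an unoptimized loop."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_pairwise(values):
    """Tree-style reduction; a stand-in for a reordered, optimized sum."""
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return sum_pairwise(values[:mid]) + sum_pairwise(values[mid:])


values = [0.1, 0.2, 0.3]
eager = sum_left_to_right(values)   # (0.1 + 0.2) + 0.3
regrouped = sum_pairwise(values)    # 0.1 + (0.2 + 0.3)

print(eager == regrouped)           # False: the two groupings disagree
print(abs(eager - regrouped))       # tiny, but nonzero
```

The gap here is on the order of one unit in the last place. The abstract's point is that such side effects exist between original and compiled graphs; how they are turned into a backdoor is described in the paper itself and is not reproduced here.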
