PulseAugur
LIVE 09:17:59
research · [2 sources] ·
0
research

Stealthy Backdoor Attacks against LLMs Based on Natural Style Triggers

Researchers have developed a new defense mechanism called Tail-risk Intrinsic Geometric Smoothing (TIGS) to protect large language models from backdoor attacks. TIGS operates during inference without requiring model updates or external data, identifying and disrupting malicious attention patterns. Separately, a new attack framework named BadStyle has been introduced, which uses natural style triggers to create stealthy poisoned samples for LLMs. BadStyle aims to overcome limitations of previous attacks by ensuring naturalness, stabilizing payload injection, and operating under a realistic threat model. AI

Summary written by None from 2 sources. How we write summaries →

IMPACT New defense and attack methods highlight ongoing security challenges for LLMs, potentially impacting deployment strategies and the need for robust security evaluations.

RANK_REASON The cluster contains two academic papers detailing new methods for attacking and defending large language models against backdoor threats.

Read on arXiv cs.CL →

Stealthy Backdoor Attacks against LLMs Based on Natural Style Triggers

COVERAGE [2]

  1. arXiv cs.AI TIER_1 · Kaisheng Fan, Weizhe Zhang, Yishu Gao, Tegawend\'e F. Bissyand\'e, Xunzhu Tang ·

    Defusing the Trigger: Plug-and-Play Defense for Backdoored LLMs via Tail-Risk Intrinsic Geometric Smoothing

    arXiv:2604.24162v1 Announce Type: cross Abstract: Defending against backdoor attacks in large language models remains a critical practical challenge. Existing defenses mitigate these threats but typically incur high preparation costs and degrade utility via offline purification, …

  2. arXiv cs.CL TIER_1 · Ting Liu ·

    Stealthy Backdoor Attacks against LLMs Based on Natural Style Triggers

    The growing application of large language models (LLMs) in safety-critical domains has raised urgent concerns about their security. Many recent studies have demonstrated the feasibility of backdoor attacks against LLMs. However, existing methods suffer from three key shortcomings…