PulseAugur

AdvBench

PulseAugur coverage of AdvBench: every cluster that mentions AdvBench across labs, papers, and developer communities, ranked by signal.

Total · 30d: 2 (2 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 2 (2 over 90d)
RECENT · PAGE 1/1 · 2 TOTAL
  1. TOOL · CL_15984

    New Logit-Gap Steering method efficiently measures AI alignment robustness

    Researchers have developed a new metric called the refusal-affirmation logit gap to quantify the safety margin of aligned language models. This metric, which measures the difference between refusal and affirmation token…

  2. RESEARCH · CL_11458

    New diagnostic tool probes LLM circuits for safety and behavior insights

    A new research paper introduces "Perturbation Probing," a diagnostic method for understanding the internal workings of large language models. This technique uses two forward passes per prompt to identify and analyze "be…