ENTITY AdvBench

AdvBench

PulseAugur coverage of AdvBench — every cluster mentioning AdvBench across labs, papers, and developer communities, ranked by signal.

Total · 30d

2

2 over 90d

Releases · 30d

0

0 over 90d

Papers · 30d

2

2 over 90d

TIER MIX · 90D

RECENT · PAGE 1/1 · 2 TOTAL

TOOL · CL_15984 · May 5 · 04:00

New Logit-Gap Steering method efficiently measures AI alignment robustness

Researchers have developed a new metric called the refusal-affirmation logit gap to quantify the safety margin of aligned language models. This metric, which measures the difference between refusal and affirmation token…
RESEARCH · CL_11458 · Apr 30 · 04:13

New diagnostic tool probes LLM circuits for safety and behavior insights

A new research paper introduces "Perturbation Probing," a diagnostic method for understanding the internal workings of large language models. This technique uses two forward passes per prompt to identify and analyze "be…