PulseAugur

LLM judges

PulseAugur coverage of LLM judges: every cluster mentioning LLM judges across labs, papers, and developer communities, ranked by signal.

Total · 30d: 5 (5 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 5 (5 over 90d)
Sentiment · 30d: 3 day(s) with sentiment data

RECENT · 5 TOTAL
  1. TOOL · CL_62713

    New framework audits LLM judge rubrics for reliability and robustness

    Researchers have developed PReMISE, a framework designed to evaluate the effectiveness of rubrics used by Large Language Model (LLM) judges. The framework treats rubrics as measurement specifications, analyzing their st…

  2. TOOL · CL_51221

    LLM judges show rationalization bias, new framework reveals

    Researchers have developed a causal framework to analyze rationalization bias in large language models (LLMs) when they act as judges for text evaluation. The study introduces new metrics and cue interventions to test i…

  3. TOOL · CL_51073

    New framework tackles preference cycles in AI feedback

    Researchers have developed a new framework called Topological Consensus Rewards (TCR) to improve the stability of Reinforcement Learning from AI Feedback (RLAIF). This method addresses the issue of preference cycles, wh…

  4. TOOL · CL_40852

    New benchmark reveals LLM judges unreliable for research agents

    Researchers have developed a new benchmark called REFLECT to evaluate the reliability of Large Language Models (LLMs) when used as judges for deep research agents. These agents automate complex information-seeking tasks…

  5. TOOL · CL_21933

    LLM judges evaluate agentic stock predictors, improving accuracy via reinforcement learning

    Researchers have developed a framework that uses large language models as judges to evaluate agentic stock prediction systems. This system breaks down performance into six specific dimensions, including regi…