PulseAugur

LLM-as-a-Judge

PulseAugur coverage of LLM-as-a-Judge — every cluster mentioning LLM-as-a-Judge across labs, papers, and developer communities, ranked by signal.

Total · 30d: 8 (8 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 8 (8 over 90d)
SENTIMENT · 30D: 3 days with sentiment data

RECENT · PAGE 1/1 · 9 TOTAL
  1. TOOL · CL_29410 ·

    AI predicts human rater disagreement in LLM-generated difficulty scores

    Researchers have developed a new method to predict when AI-generated difficulty ratings for educational materials might disagree with human assessments. This approach uses a separate embedding model, such as ModernBERT, to…

  2. TOOL · CL_27695 ·

    New routing method optimizes LLM judges for cost and accuracy

    A new research paper introduces a method called RACER (Robust Adaptive Cost-Efficient Routing) to optimize the use of large language models (LLMs) as judges. The study found that while explicit reasoning in LLMs signifi…

  3. TOOL · CL_25635 ·

    New framework efficiently selects data for multimodal models

    Researchers have developed a new framework called One-Step-Train (OST) to efficiently select high-quality synthetic data for training large multimodal models (LMMs). OST reframes data selection as an incremental optimiz…

  4. TOOL · CL_22500 ·

    AI researchers introduce Joint Consistency for improved test-time reasoning aggregation

    Researchers have introduced Joint Consistency (JC), a novel framework for test-time aggregation that improves reasoning trace aggregation by considering comparative interactions between candidate answers. Unlike previou…

  5. RESEARCH · CL_21818 ·

    Pest-Thinker uses RL to help MLLMs reason like entomologists

    Researchers have developed Pest-Thinker, a novel reinforcement learning framework designed to enhance the reasoning capabilities of multimodal large language models (MLLMs) for agricultural pest identification. This sys…

  6. RESEARCH · CL_10999 ·

    Amazon Nova models use LLM-as-a-judge for reinforcement fine-tuning

    Amazon's AWS ML blog details Reinforcement Learning from AI Feedback (RLAIF), a method for fine-tuning large language models. This technique uses an LLM as a judge to provide feedback, guiding the model's learning proce…

  7. RESEARCH · CL_10085 ·

    LLM-as-a-judge in healthcare faces safety and bias concerns

    A scoping review of Large Language Model-as-a-Judge (LaaJ) applications in healthcare identified significant gaps in validation rigor and safety assessments. The review, which screened over 11,000 studies, found that wh…

  8. COMMENTARY · CL_04666 ·

    Eugene Yan: LLM-as-judge won't fix AI product evals; focus on process

    Eugene Yan argues that relying solely on tools like LLM-as-judge will not fix product evaluation issues. Instead, he emphasizes that a robust evaluation process, akin to the scientific method, is crucial for improving A…

  9. RESEARCH · CL_00195 ·

    AI code review bots show limits in automated evaluation, GitHub COO discusses ambient AI

    A new paper explores the limitations of automated evaluation for AI code review bots, finding that current automated methods like G-Eval and LLM-as-a-Judge show only moderate alignment with human developer labels. The s…