PulseAugur

LLMs struggle with social alignment, generating biased responses and missing social cues

A new paper reveals that current large language models often fail to align with socially desirable preferences, frequently favoring undesirable responses in domains such as bias, safety, and ethics. The researchers developed a framework to evaluate reward models across these social dimensions, finding significant variation and a trade-off between bias avoidance and contextual faithfulness. A second study shows that LLMs can generate text that triggers social comparison in human readers, yet struggle to detect those same triggers themselves, demonstrating a disconnect between the generation and comprehension of social cues.

Summary written by gemini-2.5-flash-lite from 3 sources.

IMPACT Highlights the limitations of current LLM alignment techniques and the need for more nuanced evaluation methods to ensure socially responsible AI behavior.

RANK_REASON The cluster contains two academic papers published on arXiv detailing research into LLM alignment and social cue detection.


COVERAGE [3]

  1. arXiv cs.CL TIER_1 · Gayane Ghazaryan, Esra Dönmez ·

    Misaligned by Reward: Socially Undesirable Preferences in LLMs

    arXiv:2605.05003v1 Announce Type: new Abstract: Reward models are a key component of large language model alignment, serving as proxies for human preferences during training. However, existing evaluations focus primarily on broad instruction-following benchmarks, providing limite…

  2. arXiv cs.CL TIER_1 · Esra Dönmez ·

    Misaligned by Reward: Socially Undesirable Preferences in LLMs

    Reward models are a key component of large language model alignment, serving as proxies for human preferences during training. However, existing evaluations focus primarily on broad instruction-following benchmarks, providing limited insight into whether these models capture soci…

  3. arXiv cs.CL TIER_1 · Hua Zhao, Jiapei Gu, Michelle Mingyue Gu ·

    Psychologically Potent, Computationally Invisible: LLMs Generate Social-Comparison Triggers They Fail to Detect

    arXiv:2605.01017v1 Announce Type: new Abstract: We introduce Xiaohongshu Social Comparison Reader Elicitation (XHS-SCoRE), a reader-grounded benchmark for detecting if a text-only Xiaohongshu (RedNote) post elicits UPWARD, DOWNWARD, or NEUTRAL/no clear social comparison from a fi…