PulseAugur

New MEDS dataset maps LLM math reasoning, bias, and attitudes

Researchers have introduced MEDS (Math Education Digital Shadows), a new dataset designed to evaluate how large language models perform in mathematics and to identify their potential biases. MEDS comprises 28,000 personas across 14 LLMs, covering both simulated humans and AI assistants. It goes beyond traditional benchmarks by pairing proficiency scores with measures of self-efficacy, math anxiety, and cognitive networks.

Summary written by gemini-2.5-flash-lite from 2 sources.

IMPACT Provides a new dataset for evaluating LLM math capabilities and biases, aiding the development of safer AI tutors.

RANK_REASON The cluster describes a new dataset and research paper released on arXiv.

Read on arXiv cs.LG →

COVERAGE [2]

  1. arXiv cs.AI TIER_1 · Naomi Esposito, Anthony Tricarico, Luisa Porzio, Ali Aghazadeh Ardebili, Massimo Stella

    Math Education Digital Shadows for facilitating learning with LLMs: Math performance, anxiety and confidence in simulated students and AIs

    arXiv:2604.27618v1. Abstract: To enhance LLMs' impact on math education, we need data on their mathematical prowess and biases across prompts. To fill this gap, we introduce MEDS (Math Education Digital Shadows) as a dataset mapping how large language models rea…

  2. arXiv cs.LG TIER_1 · Massimo Stella

    Math Education Digital Shadows for facilitating learning with LLMs: Math performance, anxiety and confidence in simulated students and AIs

    To enhance LLMs' impact on math education, we need data on their mathematical prowess and biases across prompts. To fill this gap, we introduce MEDS (Math Education Digital Shadows) as a dataset mapping how large language models reason about and report mathematics across human- a…