A new paper analyzes annotation variation in NLP datasets, focusing on harmful language detection. The research combines annotator characteristics with linguistic properties of the data to explain labeling disagreements. The findings indicate that interactions between annotator traits and item features, particularly lexical cues and annotator attitudes, are crucial. These patterns vary significantly across datasets, however, so the authors caution against overgeneralization.
Summary written by gemini-2.5-flash-lite from 2 sources.
IMPACT Highlights the importance of considering both annotator and data characteristics for reliable NLP model training.
RANK_REASON The cluster contains an academic paper published on arXiv.