PulseAugur

TildeOpen LLM boosts low-resource European languages with curriculum learning

Researchers have introduced TildeOpen LLM, a 30-billion-parameter open-weight model designed to improve performance across 34 European languages. The model addresses data imbalance by employing dataset upsampling and a curriculum-based training schedule that shifts between uniform and natural language distributions. Evaluations indicate TildeOpen outperforms existing open-weight multilingual models, especially for Baltic, Finno-Ugric, and Slavic languages, with human assessments showing a significant reduction in linguistic errors.
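The shift between uniform and natural language distributions can be pictured as interpolating per-language sampling probabilities. The sketch below is illustrative only; the mixing parameter `alpha`, the function names, and the token counts are assumptions, not the authors' actual schedule.

```python
# Illustrative curriculum-sampling sketch (assumed, not the paper's exact recipe):
# interpolate each language's sampling probability between the natural
# (size-proportional) distribution and a uniform one.

def sampling_probs(corpus_sizes, alpha):
    """alpha = 0.0 -> natural distribution; alpha = 1.0 -> uniform."""
    total = sum(corpus_sizes.values())
    n = len(corpus_sizes)
    return {
        lang: (1 - alpha) * size / total + alpha / n
        for lang, size in corpus_sizes.items()
    }

# Hypothetical corpus sizes (e.g., GB of text) for four languages.
sizes = {"en": 900, "lv": 50, "et": 30, "fi": 20}

natural = sampling_probs(sizes, alpha=0.0)  # mirrors raw data imbalance
uniform = sampling_probs(sizes, alpha=1.0)  # every language equally likely
mid = sampling_probs(sizes, alpha=0.5)      # curriculum midpoint
```

A training loop would anneal `alpha` over steps, so low-resource languages such as Latvian are seen far more often than their raw share of the data would allow.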

Summary written by gemini-2.5-flash-lite from 1 source.

IMPACT Enhances multilingual AI capabilities, particularly for underrepresented European languages, potentially lowering barriers for non-English content generation and comprehension.

RANK_REASON This is a research paper detailing the release of a new open-weight multilingual language model.

Read on arXiv cs.CL →

COVERAGE [1]

  1. arXiv cs.CL TIER_1 · Toms Bergmanis, Martins Kronis, Ingus Jānis Pretkalniņš, Dāvis Nicmanis, Jeļizaveta Jelinska, Roberts Rozis, Rinalds Vīksna, Mārcis Pinnis ·

    TildeOpen LLM: Leveraging Curriculum Learning to Achieve Equitable Language Representation

    arXiv:2603.08182v2 Announce Type: replace Abstract: Large language models often underperform in many European languages due to the dominance of English and a few high-resource languages in training data. This paper presents TildeOpen LLM, a 30-billion-parameter open-weight founda…