Researchers have introduced Data Language Models (DLMs), a new class of foundation models designed to understand tabular data natively, without preprocessing. The first DLM, Schema-1, is a 140M-parameter model trained on more than 2.3 million datasets; it outperforms existing methods on row-level prediction benchmarks. Schema-1 also excels at missing-value reconstruction and can identify industry sectors from raw cell values alone, which suggests a deeper structural understanding of tabular data than general-purpose language models have.
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Establishes a new foundation model class for tabular data, potentially streamlining AI development and decision-making in data-intensive industries.
RANK_REASON Introduces a new class of foundation models for tabular data in an academic paper.