Researchers have developed ORiGAMi, an autoregressive transformer architecture that synthesizes sparse, semi-structured JSON data without flattening it first. Traditional methods convert JSON records into wide, sparse tables; ORiGAMi instead preserves each record's structure. It serializes JSON into key, value, and structural tokens, encodes each token's position within the document tree, and enforces grammar and schema constraints. Across six datasets, ORiGAMi outperformed existing baselines in 17 of 18 comparisons on fidelity, detection, and utility metrics, while maintaining high privacy scores.
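To make the serialization idea concrete, here is a minimal Python sketch of turning a nested JSON record into key, value, and structural tokens, each tagged with its path in the document tree. It is an illustration only, not ORiGAMi's actual tokenizer: the `tokenize` function, the `STRUCT`/`KEY`/`VALUE` labels, and the path-tuple encoding are all assumptions made for this example.

```python
import json
from typing import Any, Iterator, Tuple

# Hypothetical token layout: (kind, payload, path). The path is the token's
# position in the document tree. None of these names come from ORiGAMi.
Token = Tuple[str, Any, Tuple[Any, ...]]


def tokenize(node: Any, path: Tuple[Any, ...] = ()) -> Iterator[Token]:
    """Walk a parsed JSON value depth-first and emit one token per element."""
    if isinstance(node, dict):
        yield ("STRUCT", "OBJ_START", path)
        for key, value in node.items():
            # The key token and its value share the same tree position.
            yield ("KEY", key, path + (key,))
            yield from tokenize(value, path + (key,))
        yield ("STRUCT", "OBJ_END", path)
    elif isinstance(node, list):
        yield ("STRUCT", "ARR_START", path)
        for index, item in enumerate(node):
            yield from tokenize(item, path + (index,))
        yield ("STRUCT", "ARR_END", path)
    else:
        # Scalars (str, number, bool, null) become value tokens.
        yield ("VALUE", node, path)


record = json.loads(
    '{"name": "Ada", "tags": ["x", "y"], "address": {"city": "Oslo"}}'
)
tokens = list(tokenize(record))

for kind, payload, tree_path in tokens:
    print(kind, payload, tree_path)
```

Because a key that is absent from a record simply produces no tokens, this kind of representation avoids the mostly empty columns that a flattened table would need for the same data.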
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Introduces a new method for generating realistic synthetic data from complex JSON structures, potentially improving privacy and testing for AI systems.
RANK_REASON This is a research paper introducing a new model architecture for synthetic data generation.