Apple researches diffusion model generalization; Hugging Face details Stable Diffusion tuning
By PulseAugur Editorial·
Summary by gemini-2.5-flash-lite
from 152 sources
Apple's research paper explores the mechanisms behind compositional generalization in conditional diffusion models, specifically focusing on how they handle combinations of conditions not seen during training. The study validates that models exhibiting local conditional scores are better at generalizing, and that enforcing this locality can improve performance. Separately, Hugging Face has released several blog posts detailing various methods for fine-tuning and optimizing Stable Diffusion models, including techniques like DDPO, LoRA, and optimizations for Intel CPUs, as well as instruction-tuning and Japanese language support.
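To make the LoRA workflow from the Hugging Face posts concrete, here is a minimal inference sketch assuming the `diffusers` library conventions; the base checkpoint id and the LoRA weights path are placeholders, not artifacts from the posts themselves.

```python
# Minimal sketch: load a Stable Diffusion pipeline, attach LoRA weights, generate.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # base model; substitute your own
    torch_dtype=torch.float16,
).to("cuda")

# Attach LoRA weights produced by a fine-tuning run (placeholder path).
pipe.load_lora_weights("path/to/lora_weights")

image = pipe(
    "a watercolor painting of a lighthouse at dawn",
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]
image.save("lighthouse.png")
```

Because LoRA touches only small adapter matrices, the same base pipeline can swap between many styles by loading different weight files.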
AI
Conditional diffusion models appear capable of compositional generalization, i.e., generating convincing samples for out-of-distribution combinations of conditioners, but the mechanisms underlying this ability remain unclear. To make this concrete, we study length generalization,…
Drifting Models have emerged as a new paradigm for one-step generative modeling, achieving strong image quality without iterative inference. The premise is to replace the iterative denoising process in diffusion models with a single evaluation of a generator. However, this create…
Diffusion Language Models (DLMs) have recently emerged as a promising alternative to autoregressive language models, offering stronger global awareness and highly parallel generation. However, post-training DLMs with standard Negative Evidence Lower Bound (NELBO)-based supervised…
Diffusion models have achieved remarkable success, yet their training remains inefficient due to a severe optimization bottleneck, which we term Representation Degradation. As noise levels increase, the outputs of the trained model exhibit progressive structural distortion, which…
We propose kernel-gradient drifting, a one-step generative modeling framework that replaces the fixed Euclidean displacement direction in drifting models with directions induced by the kernel itself. Standard drifting is attractive because it enables fast, high-quality generation…
Pretrained diffusion models provide powerful learned priors, but in scientific sampling the target distribution often depends on physical context that is not fully represented by one generative model. We introduce Generative Gibbs for Physics-Aware Sampling (GG-PA), a training-fr…
Diffusion large language models (dLLMs) offer a promising route to parallel and efficient text generation, but improving their reasoning ability requires effective post-training. Reinforcement learning with verifiable rewards (RLVR) is a natural choice for this purpose, yet its a…
Erasing specific concepts from text-to-image diffusion models is essential for avoiding the generation of copyrighted and explicit content. Closed-form concept erasure methods offer a fast alternative to backpropagation-based techniques, but they become less effective when scalin…
Masked diffusion language models enable parallel token generation and offer improved decoding efficiency over autoregressive models. However, their performance degrades significantly when generating multiple tokens simultaneously, due to a mismatch between token-level training ob…
Diffusion models perform remarkably well on high-dimensional data such as images, often using only a modest number of reverse-time steps. Despite this practical success, existing convergence theory does not fully explain why such samplers remain efficient in high dimensions. Many…
Latent diffusion models offer an attractive alternative to discrete diffusion for non-autoregressive text generation by operating on continuous text representations and denoising entire sequences in parallel. The major challenge in latent diffusion modeling is constructing a suit…
Classifier-Free Guidance (CFG) is a widely used mechanism for controlling diffusion-based generative models, yet its guidance scale is typically treated as a fixed hyperparameter throughout generation. This static design yields a suboptimal controllability and quality tradeoff, a…
Inference-time controllable generation is essential for real-world applications of unconditional diffusion models. However, most existing techniques focus on individual samples, struggling in applications that require the sample population to follow specific attribute distributio…
arXiv cs.LG
TIER_1·Sankarshana Venugopal (Seoul National University), Mohammad Mostafavi (Seoul National University), Jonghyun Choi (Seoul National University)·
arXiv:2605.05889v1 Announce Type: cross Abstract: Diffusion-based image-to-image (I2I) translation excels in high-fidelity generation but suffers from slow sampling in state-of-the-art Diffusion Bridge Models (DBMs), often requiring dozens of function evaluations (NFEs). We intro…
arXiv:2605.05387v1 Announce Type: new Abstract: We study zero-shot conditional sampling with pretrained diffusion models for linear inverse problems, including inpainting and super-resolution. In these problems, the observation determines only part of the unknown signal. The rema…
arXiv:2605.06077v1 Announce Type: new Abstract: This position paper argues that understanding generalization in diffusion models requires fundamentally new theoretical frameworks that go beyond both classical statistical learning theory and the benign overfitting paradigm develop…
arXiv:2605.06507v1 Announce Type: cross Abstract: Reinforcement learning fine-tuning has become the dominant approach for aligning diffusion models with human preferences. However, assessing images is intrinsically a multi-dimensional task, and multiple evaluation criteria need t…
arXiv:2605.06169v1 Announce Type: new Abstract: Scaling Diffusion Transformers (DiTs) to hundreds of layers introduces a structural vulnerability: networks can enter a silent, mean-dominated collapse state that homogenizes token representations and suppresses centered variation. …
arXiv cs.LG
TIER_1·Benjamin Sterling, Yousef El-Laham, Mónica F. Bugallo·
arXiv:2509.14225v3 Announce Type: replace Abstract: Recent advances in generative artificial intelligence applications have raised new data security concerns. This paper focuses on defending diffusion models against membership inference attacks. This type of attack occurs when th…
arXiv:2602.00175v2 Announce Type: replace Abstract: Text-to-image diffusion models (DMs) are frequently abused to produce harmful or copyrighted content, violating public interests. Concept erasure (unlearning) is a promising paradigm to alleviate this issue. However, there exist…
arXiv cs.LG
TIER_1·Tongda Xu, Mingwei He, Shady Abu-Hussein, José Miguel Hernández-Lobato, Chunhang Zheng, Kai Zhao, Chao Zhou, Ya-Qin Zhang, Yan Wang·
arXiv:2603.05630v2 Announce Type: replace-cross Abstract: It is well known that the reconstruction FID (rFID) of a VAE is poorly correlated with the generation FID (gFID) of a latent diffusion model. We propose interpolated FID (iFID), a simple variant of rFID that exhibits a str…
arXiv:2605.06548v1 Announce Type: new Abstract: Large language models have achieved remarkable success under the autoregressive paradigm, yet high-quality text generation need not be tied to a fixed left-to-right order. Existing alternatives still struggle to jointly achieve gene…
arXiv:2605.06553v1 Announce Type: new Abstract: We present EDDY (Exact-marginal Diversification via Divergence-free dYnamics), a guidance mechanism for diffusion and flow matching models that promotes diversity among samples generated while maintaining quality. EDDY exploits symm…
arXiv:2605.06172v1 Announce Type: cross Abstract: Many normalizing flow architectures impose regularity constraints, yet their distributional approximation properties are not fully characterized. We study the expressivity of bi-Lipschitz normalizing flows through the lens of scor…
arXiv cs.LG
TIER_1·Eugenio Lomurno, Filippo Balzarini, Francesco Benelle, Francesca Pia Panaccione, Matteo Matteucci·
arXiv:2605.06261v1 Announce Type: new Abstract: Diffusion-based generators set the current state of the art for synthetic tabular data. These methods approach but rarely exceed real-data utility, and closing this synthetic-real gap has so far been pursued exclusively at training …
arXiv cs.LG
TIER_1·Alexander Conzelmann, Albert Catalan-Tatjer, Shiwei Liu·
arXiv:2605.06366v1 Announce Type: new Abstract: Diffusion language models (DLMs) have recently emerged as competitive alternatives to autoregressive (AR) language models, yet differences in their activation dynamics remain poorly understood. We characterize these dynamics in LLaD…
arXiv cs.LG
TIER_1·Matias G. Delgadino, Sebastien Motsch, Advait Parulekar, William Porteous, Sanjay Shakkottai·
arXiv:2605.06538v1 Announce Type: new Abstract: Diffusion-based posterior samplers use pretrained diffusion priors to sample from measurement- or reward-conditioned posteriors, and are widely used for inverse problems. Yet their theoretical behavior remains poorly understood: eve…
We present EDDY (Exact-marginal Diversification via Divergence-free dYnamics), a guidance mechanism for diffusion and flow matching models that promotes diversity among samples generated while maintaining quality. EDDY exploits symmetries of the Fokker-Planck equation, using drif…
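For intuition, the marginal-preservation identity that divergence-free guidance of this kind relies on follows from the continuity equation alone (a standard fact; EDDY's specific drift construction is not reproduced here). If samples follow a velocity field $v_t$ with marginals $p_t$, any extra drift $u_t$ satisfying $\nabla\cdot(p_t u_t) = 0$ leaves the marginals exactly unchanged:

$$\partial_t p_t + \nabla\cdot\big(p_t\,(v_t + u_t)\big) \;=\; \partial_t p_t + \nabla\cdot(p_t v_t) + \underbrace{\nabla\cdot(p_t u_t)}_{=\,0} \;=\; 0.$$

The extra drift can therefore push individual samples apart without biasing the distribution they collectively follow.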
Diffusion-based posterior samplers use pretrained diffusion priors to sample from measurement- or reward-conditioned posteriors, and are widely used for inverse problems. Yet their theoretical behavior remains poorly understood: even with exact prior scores, their outputs are bia…
arXiv:2603.16368v2 Announce Type: replace-cross Abstract: Striking a balance between efficiency and transparent motion is a core challenge in human-robot collaboration, as highly expressive movements often incur unnecessary time and energy costs. In collaborative environments, le…
arXiv:2605.04215v1 Announce Type: new Abstract: Diffusion-based Large Language Models (D-LLMs) represent a promising frontier in generative AI, offering fully parallel token generation that can lead to significant throughput advantages and superior GPU utilization over traditiona…
arXiv cs.LG
TIER_1·Arthur Gretton, Li Kevin Wenliang, Alexandre Galashov, James Thornton, Valentin De Bortoli, Arnaud Doucet·
arXiv:2605.05118v1 Announce Type: new Abstract: Recently, Deng et al. (2026) proposed Generative Modeling via Drifting (GMD), a novel framework for generative tasks. This note presents an analysis of GMD through the lens of Wasserstein Gradient Flows (WGF), i.e., the path of stee…
arXiv:2605.05024v1 Announce Type: cross Abstract: Hypergraphs model higher-order interactions, but realistic hypergraph generation remains difficult because incidence, hyperedge-size heterogeneity, and overlap structure are not faithfully captured by pairwise reductions. We propo…
arXiv:2605.05206v1 Announce Type: cross Abstract: We study outlier tokens in Diffusion Transformers (DiTs) for image generation. Prior work has shown that Vision Transformers (ViTs) can produce a small number of high-norm tokens that attract disproportionate attention while carry…
arXiv:2510.08431v3 Announce Type: replace-cross Abstract: Although continuous-time consistency models (e.g., sCM, MeanFlow) are theoretically principled and empirically powerful for fast academic-scale diffusion, their applicability to large-scale text-to-image and video tasks rema…
arXiv cs.LG
TIER_1·Riccardo de Lutio, Tobias Fischer, Yen-Yu Chang, Yuxuan Zhang, Jay Zhangjie Wu, Xuanchi Ren, Tianchang Shen, Katarina Tothova, Zan Gojcic, Haithem Turki·
arXiv:2603.00492v2 Announce Type: replace-cross Abstract: Per-scene optimization methods such as 3D Gaussian Splatting provide state-of-the-art novel view synthesis quality but extrapolate poorly to under-observed areas. Methods that leverage generative priors to correct artifact…
arXiv:2601.19312v2 Announce Type: replace Abstract: The Schrodinger Bridge and Bass (SBB) formulation, which jointly controls drift and volatility, is an established extension of the classical Schrodinger Bridge (SB). Building on this framework, we introduce LightSBB-M, an algori…
arXiv cs.LG
TIER_1·James Rowbottom, Elizabeth L. Baker, Nick Huang, Ben Adcock, Carola-Bibiane Schönlieb, Alexander Denker·
arXiv:2605.03497v1 Announce Type: new Abstract: Score-based diffusion models in infinite-dimensional function spaces provide a mathematically principled framework for modelling function-valued data, offering key advantages such as resolution invariance and the ability to handle i…
arXiv:2605.02973v1 Announce Type: new Abstract: Modality translation is inherently under-constrained, as multiple cross-modal mappings may yield the same marginals. Recent work has shown that diffusion bridges are effective for this task. However, most existing approaches rely on…
arXiv cs.LG
TIER_1·Francisco M. Castro-Macías, Pablo Morales-Álvarez, Saifuddin Syed, Daniel Hernández-Lobato, Rafael Molina, José Miguel Hernández-Lobato·
arXiv:2605.04013v1 Announce Type: cross Abstract: Sampling from unnormalized multimodal distributions with limited density evaluations remains a fundamental challenge in machine learning and natural sciences. Successful approaches construct a bridge between a tractable reference …
arXiv cs.LG
TIER_1·Andreas Makris, Paul Fearnhead, Chris Nemeth·
arXiv:2605.03712v1 Announce Type: cross Abstract: Training-free conditional diffusion provides a flexible alternative to task-specific conditional model training, but existing samplers often allocate computation inefficiently: independent guided trajectories can vary widely in qu…
arXiv cs.LG
TIER_1·Aaron Havens, Brian Karrer, Neta Shaul·
arXiv:2605.03984v1 Announce Type: new Abstract: Sampling from unnormalized densities is analogous to the generative modeling problem, but the target distribution is defined by a known energy function instead of data samples. Because evaluating the energy function is often costly,…
arXiv:2507.04330v2 Announce Type: replace-cross Abstract: We consider the problem of sampling from a probability distribution $\pi$ which admits a density w.r.t. a dominating measure. It is well known that this can be written as an optimisation problem over the space of probabili…
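The optimisation view referenced here is the standard one: the target $\pi$ is the unique minimiser of the KL divergence over probability measures,

$$\pi \;=\; \operatorname*{arg\,min}_{\mu}\ \mathrm{KL}(\mu\,\|\,\pi), \qquad \mathrm{KL}(\mu\,\|\,\pi) = \int \log\frac{\mathrm{d}\mu}{\mathrm{d}\pi}\,\mathrm{d}\mu \;\ge\; 0,$$

with equality iff $\mu = \pi$, so sampling algorithms can be derived as descent schemes on this functional.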
Sampling from unnormalized multimodal distributions with limited density evaluations remains a fundamental challenge in machine learning and natural sciences. Successful approaches construct a bridge between a tractable reference and the target distribution. Parallel Tempering (P…
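For context, the textbook Parallel Tempering swap rule (a standard fact, not specific to this paper): chains at inverse temperatures $\beta_i$ and $\beta_j$ exchange states $x_i, x_j$ with acceptance probability

$$A \;=\; \min\Big\{1,\ \exp\big[(\beta_i - \beta_j)\,\big(E(x_i) - E(x_j)\big)\big]\Big\},$$

which leaves the joint product distribution invariant while letting hot, flattened chains ferry mass between modes of the cold target.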
Sampling from unnormalized densities is analogous to the generative modeling problem, but the target distribution is defined by a known energy function instead of data samples. Because evaluating the energy function is often costly, a primary challenge is to learn an efficient sa…
Dataset distillation enables efficient training by distilling the information of large-scale datasets into significantly smaller synthetic datasets. Diffusion-based paradigms have emerged in recent years, offering novel perspectives for dataset distillation. However, they typical…
Training-free conditional diffusion provides a flexible alternative to task-specific conditional model training, but existing samplers often allocate computation inefficiently: independent guided trajectories can vary widely in quality, and additional function evaluations along a…
Score-based diffusion models in infinite-dimensional function spaces provide a mathematically principled framework for modelling function-valued data, offering key advantages such as resolution invariance and the ability to handle irregular discretisations. However, practical imp…
arXiv cs.LG
TIER_1·Tongzhen Dang, Weiyang Ding, Michael K. Ng·
arXiv:2605.01691v1 Announce Type: new Abstract: In this paper, we propose Complex Diffusion Maps (CDM), a novel diffusion mapping framework that aims to reveal the dominant complex harmonics of high-dimensional data. Inspired by the local Gaussian kernel relevant to the heat equa…
arXiv cs.LG
TIER_1·Phil Sidney Ostheimer, Mayank Nagda, Andriy Balinskyy, Gabriel Vicente Rodrigues, Jean Radig, Carl Herrmann, Stephan Mandt, Marius Kloft, Sophie Fellenz·
arXiv:2605.01817v1 Announce Type: new Abstract: Diffusion models (DMs) excel on dense continuous data, but are not designed for sparse continuous data. They do not model exact zeros that represent the deliberate absence of a signal. As a result, they erase sparsity patterns and p…
arXiv:2605.00161v1 Announce Type: new Abstract: Diffusion language models (DLMs) are an attractive alternative to autoregressive models because they promise sublinear-time, parallel generation, yet practical gains remain elusive as high-quality samples still demand hundreds of re…
arXiv cs.LG
TIER_1·Carles Domingo-Enrich, Yuanqi Du, Michael S. Albergo·
arXiv:2605.00229v1 Announce Type: cross Abstract: We study the problem of training diffusion and flow generative models to sample from target distributions defined by an exponential tilting of a base density; a formulation that subsumes both sampling from unnormalized densities a…
arXiv:2509.20098v2 Announce Type: replace Abstract: Learning physical dynamics from data is a fundamental challenge in machine learning and scientific modeling. Real-world observational data are inherently incomplete and irregularly sampled, posing significant challenges for exis…
arXiv cs.AI
TIER_1·Yonggan Fu, Lexington Whalen, Zhifan Ye, Xin Dong, Shizhe Diao, Jingyu Liu, Chengyue Wu, Hao Zhang, Enze Xie, Song Han, Maksim Khadkevich, Jan Kautz, Yingyan Celine Lin, Pavlo Molchanov·
arXiv:2512.14067v2 Announce Type: replace-cross Abstract: Diffusion language models (dLMs) have emerged as a promising paradigm that enables parallel, non-autoregressive generation, but their learning efficiency lags behind that of autoregressive (AR) language models when trained…
arXiv cs.AI
TIER_1·Michael Cardei, Huu Binh Ta, Ferdinando Fioretto·
arXiv:2604.26985v1 Announce Type: cross Abstract: Masked diffusion models (MDMs) generate discrete sequences by iterative denoising under an absorbing masking process. In standard masked diffusion, if a token remains masked after a reverse update, the model discards its clean-sta…
arXiv:2604.27443v1 Announce Type: cross Abstract: Generating continuous-time, continuous-space stochastic processes (e.g., videos, weather forecasts) conditioned on partial observations (e.g., first and last frames) is a fundamental challenge. Existing approaches, (e.g., diffusio…
arXiv:2510.18165v3 Announce Type: replace-cross Abstract: Diffusion language models (DLMs) are emerging as a compelling alternative to the dominant autoregressive paradigm, offering inherent advantages in parallel generation and bidirectional context modeling. However, for the ta…
arXiv cs.LG
TIER_1·Hyukjun Lim, Soojung Yang, Lucas Pinède, Miguel Steiner, Yuanqi Du, Rafael Gómez-Bombarelli·
arXiv:2603.25980v2 Announce Type: replace-cross Abstract: Transition states, the first-order saddle points on the potential energy surfaces, govern the kinetics and mechanisms of chemical reactions and conformational changes. Locating them is challenging because transition pathwa…
arXiv:2604.24832v1 Announce Type: new Abstract: Masked diffusion language models (MDMs) have recently emerged as a promising alternative to standard autoregressive large language models (AR-LLMs), yet their optimization can be substantially less stable. We study blockwise MDMs an…
arXiv:2603.20645v2 Announce Type: replace Abstract: Diffusion models have become a leading framework in generative modeling, yet their theoretical understanding, especially for high-dimensional data concentrated on low-dimensional structures, remains incomplete. This paper in…
arXiv:2604.18471v2 Announce Type: replace Abstract: Discrete diffusion language models (dLLMs) have recently emerged as a promising alternative to traditional autoregressive approaches, offering the flexibility to generate tokens in arbitrary orders and the potential of parallel …
arXiv:2604.22901v1 Announce Type: new Abstract: Diffusion models achieve remarkable success in time series generation. However, slow inference limits their practical deployment. We propose E$^2$-CRF (Error-Feedback Event-Driven Cumulative Residual Feature caching) to accelerate f…
arXiv cs.LG
TIER_1·Yiming Zhang, Sitong Liu, Ke Li, Zhihong Wu, Alex Cloninger, Melvin Leok·
arXiv:2604.24238v1 Announce Type: new Abstract: Diffusion models are a leading paradigm for data generation, but training-free editing typically re-runs the full denoising trajectory for every edit strength, making iterative refinement expensive. To address this issue, we instead…
arXiv:2604.23806v1 Announce Type: new Abstract: The reverse process in score-based diffusion models is formally equivalent to overdamped Langevin dynamics in a time-dependent energy landscape. In our prior work we showed that a bilinearly-coupled analog substrate can physically r…
arXiv:2604.24357v1 Announce Type: new Abstract: Diffusion language models generate without a fixed left-to-right order, making token ordering a central algorithmic choice: which tokens should be revealed, retained, revised or verified at each step? Existing systems mainly use ran…
arXiv:2505.16024v2 Announce Type: replace Abstract: Diffusion trajectory distillation accelerates sampling by training a student model to approximate the multi-step denoising trajectories of a pretrained teacher model using far fewer steps. Despite strong empirical results, the t…
arXiv:2603.19670v3 Announce Type: replace Abstract: Nonasymptotic diffusion analyses often decompose sampling error into score estimation, continuous reverse-time propagation, discretization, and terminal conversion. We isolate the propagation module on certified scalar-isotropic…
Diffusion language models generate without a fixed left-to-right order, making token ordering a central algorithmic choice: which tokens should be revealed, retained, revised or verified at each step? Existing systems mainly use random masking or confidence-driven ordering. Rando…
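As a toy illustration of confidence-driven ordering (the interface below is a hypothetical stand-in, not any paper's API), each step fills in the k masked positions where the current prediction is most confident:

```python
import numpy as np

def decode_step(logits, tokens, mask_id, k=4):
    """Unmask the k masked positions where the model is most confident.

    logits: [seq_len, vocab] array of current predictions (hypothetical model output).
    tokens: [seq_len] int array; masked slots hold mask_id.
    """
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    conf = probs.max(axis=-1)              # per-position confidence
    pred = probs.argmax(axis=-1)           # per-position argmax token
    masked = np.where(tokens == mask_id)[0]
    if masked.size == 0:
        return tokens                      # fully decoded
    top = masked[np.argsort(conf[masked])[::-1][:k]]
    out = tokens.copy()
    out[top] = pred[top]                   # reveal only the most confident slots
    return out
```

Random masking would instead draw `top` uniformly from `masked`; the abstracts above study what better orderings buy beyond these two baselines.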
Diffusion models are a leading paradigm for data generation, but training-free editing typically re-runs the full denoising trajectory for every edit strength, making iterative refinement expensive. To address this issue, we instead edit near the data manifold, where small local …
arXiv:2603.20092v4 Announce Type: replace Abstract: Diffusion models generate structure by progressively transforming noise into data, yet the mechanisms underlying this transition remain poorly understood. In this work, we show that pattern formation in trained diffusion models …
arXiv:2605.11311v1 Announce Type: cross Abstract: Diffusion models typically generate image batches from independent Gaussian initial noises. We argue that this independence assumption is only one choice within a broader class of valid joint noise designs. Instead, one can specif…
arXiv stat.ML
TIER_1·Hao Chen, Renzheng Zhang, Scott S. Howard·
arXiv:2511.17038v3 Announce Type: replace-cross Abstract: From a Bayesian perspective, score-based diffusion solves inverse problems through joint inference, embedding the likelihood with the prior to guide the sampling process. However, this formulation fails to explain its prac…
arXiv:2506.23619v2 Announce Type: replace-cross Abstract: This paper investigates the impact of posterior drift on out-of-sample forecasting accuracy in overparametrized machine learning models. We document the loss in performance when the loadings of the data generating process …
Unlearning specific concepts in text-to-image diffusion models has become increasingly important for preventing undesirable content generation. Among prior approaches, sparse autoencoder (SAE)-based methods have attracted attention due to their ability to suppress target concepts…
Class imbalance is a persistent challenge in visual recognition, particularly in safety-critical domains where collecting positive examples is expensive and rare events are inherently underrepresented. We propose a lightweight synthetic data augmentation pipeline that fine-tunes …
Diffusion models typically generate image batches from independent Gaussian initial noises. We argue that this independence assumption is only one choice within a broader class of valid joint noise designs. Instead, one can specify a coupling of the initial noises: each noise rem…
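A minimal sketch of one such coupling, assuming nothing beyond NumPy: each sample's noise stays exactly $\mathcal{N}(0, I)$, but a shared component correlates the batch with strength rho.

```python
import numpy as np

def coupled_batch_noise(batch, shape, rho=0.5, seed=0):
    """Batch of initial noises: each marginally N(0, I), pairwise correlation rho."""
    rng = np.random.default_rng(seed)
    z_shared = rng.standard_normal(shape)            # common component
    z_indep = rng.standard_normal((batch, *shape))   # per-sample components
    # Per coordinate, Var = rho + (1 - rho) = 1, so marginals stay standard normal,
    # while any two samples share covariance rho through z_shared.
    return np.sqrt(rho) * z_shared + np.sqrt(1.0 - rho) * z_indep
```

Setting `rho=0` recovers the usual independent initialization; larger values trade batch diversity for coherence across the generated set.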
Low-Rank Adaptation (LoRA) has become a widely used mechanism for customizing diffusion models, enabling users to inject new visual concepts or styles through lightweight parameter updates. However, LoRAs can memorize training images, causing generated outputs to reproduce copyri…
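For reference, the low-rank update that makes LoRA "lightweight" (standard LoRA notation, not this paper's): a frozen weight $W_0$ is adapted as

$$W' \;=\; W_0 + \tfrac{\alpha}{r}\,BA, \qquad B\in\mathbb{R}^{d\times r},\ A\in\mathbb{R}^{r\times k},\ r\ll\min(d,k),$$

so only the small factors $B$ and $A$ are trained, and it is this trainable low-rank residual that can end up memorizing training images.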
arXiv stat.ML
TIER_1·Enrico Ventura, Beatrice Achilli, Luca Ambrogioni, Carlo Lucibello·
arXiv:2602.00716v4 Announce Type: replace Abstract: Classifier-free guidance (CFG) is the de facto standard for conditional sampling in diffusion models, yet it often reduces sample diversity. Using tools from statistical physics, we analyze the emergence of generative distortion…
arXiv stat.ML
TIER_1·Dongqing Li, Geoff K. Nicholls, Shiyi Sun, You Luo·
arXiv:2605.06976v1 Announce Type: new Abstract: Many ranking and agent trace datasets are recorded as linear orders even though their latent structure is only partially ordered. This is especially common in agent and workflow traces, where observed order may reflect arbitrary lin…
arXiv stat.ML
TIER_1·James Matthew Young, Paula Cordero-Encinar, Sebastian Reich, Andrew Duncan, O. Deniz Akyildiz·
arXiv:2601.21951v2 Announce Type: replace Abstract: We develop diffusion-based samplers for target distributions known up to a normalising constant. To this end, we rely on the well-known diffusion path that smoothly interpolates between a simple base distribution and the target,…
arXiv:2605.07625v1 Announce Type: cross Abstract: Denoising diffusion models have evolved into a state-of-the-art method for tasks in various fields, such as denoising and generation of images, text generation, or generation of synthetic data for training of other machine learnin…
arXiv:2605.07818v1 Announce Type: new Abstract: The expectation–maximization (EM) algorithm combines global monotonicity, local linear convergence, and strong practical robustness, but these features are usually analyzed separately. Global descent is nonlinear, whereas local con…
Sampling from score-based diffusion models incurs bias due to both time discretisation and the approximation of the score function. A common strategy for reducing this bias is to apply corrector steps based on the unadjusted Langevin algorithm (ULA) at each noise level within a p…
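The corrector in question is the standard unadjusted Langevin step run at a fixed noise level $t$, with the learned score $s_\theta \approx \nabla_x \log p_t$ in place of the exact one:

$$x_{k+1} \;=\; x_k + \eta\, s_\theta(x_k, t) + \sqrt{2\eta}\,\xi_k, \qquad \xi_k \sim \mathcal{N}(0, I).$$

ULA's lack of a Metropolis accept/reject step is what introduces the step-size bias the corrector analysis has to account for.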
Text-to-image diffusion models can generate visually stunning images, yet controlling what appears and how it appears remains surprisingly difficult, especially when operating solely within the constraints of the text-conditioning space. For example, changing a subject or adjus…
Tokenizers are a crucial component of latent diffusion models, as they define the latent space in which diffusion models operate. However, existing tokenizers are primarily designed to improve reconstruction fidelity or inherit pretrained representations, leaving unclear what kin…
The expectation–maximization (EM) algorithm combines global monotonicity, local linear convergence, and strong practical robustness, but these features are usually analyzed separately. Global descent is nonlinear, whereas local convergence is governed by the spectrum of the line…
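For readers wanting the two features side by side, the standard EM iteration alternates

$$Q(\theta\mid\theta^{(k)}) = \mathbb{E}_{z\sim p(z\mid x,\,\theta^{(k)})}\big[\log p(x, z\mid\theta)\big], \qquad \theta^{(k+1)} = \operatorname*{arg\,max}_{\theta}\ Q(\theta\mid\theta^{(k)}),$$

and Jensen's inequality gives the global monotonicity $\ell(\theta^{(k+1)}) \ge \ell(\theta^{(k)})$ of the log-likelihood, while the local linear rate is governed by the spectrum the abstract refers to.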
Recent video diffusion models (VDMs) synthesize visually convincing clips, yet still drop entities, mis-bind attributes, and weaken the interactions specified in the prompt. Representation-alignment objectives such as VideoREPA and MoAlign improve fine-grained text following by d…
Denoising diffusion models have evolved into a state-of-the-art method for tasks in various fields, such as denoising and generation of images, text generation, or generation of synthetic data for training of other machine learning models. First hitting diffusion models (FHDM) ar…
Many ranking and agent trace datasets are recorded as linear orders even though their latent structure is only partially ordered. This is especially common in agent and workflow traces, where observed order may reflect arbitrary linearization rather than true prerequisites. We in…
Large language models have achieved remarkable success under the autoregressive paradigm, yet high-quality text generation need not be tied to a fixed left-to-right order. Existing alternatives still struggle to jointly achieve generation efficiency, scalable representation learn…
Reinforcement learning fine-tuning has become the dominant approach for aligning diffusion models with human preferences. However, assessing images is intrinsically a multi-dimensional task, and multiple evaluation criteria need to be optimized simultaneously. Existing practice d…
Real-world datasets are inherently heterogeneous, yet how per-class structural differences and sampling imbalance shape the training dynamics of diffusion models (and potentially exacerbate disparities) remains poorly understood. While models typically transition from an initial ph…
Many normalizing flow architectures impose regularity constraints, yet their distributional approximation properties are not fully characterized. We study the expressivity of bi-Lipschitz normalizing flows through the lens of score-based diffusion models. For the probability flow…
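The probability-flow construction invoked here is the standard one from score-based diffusion: the deterministic ODE

$$\frac{\mathrm{d}x_t}{\mathrm{d}t} \;=\; f(x_t, t) - \tfrac{1}{2}\,g(t)^2\,\nabla_x \log p_t(x_t)$$

shares its marginals $p_t$ with the forward SDE $\mathrm{d}x = f\,\mathrm{d}t + g\,\mathrm{d}W$, which is what lets the regularity of a flow be analyzed through properties of the score.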
arXiv cs.CV
TIER_1·Bartlomiej Sobieski, Matthew Tivnan, Dawid Płudowski, Michał Jan Włodarczyk, Pengfei Jin, Przemyslaw Biecek, Quanzheng Li·
arXiv:2605.05026v1 Announce Type: new Abstract: Diffusion models are prone to generating structural hallucinations - samples that match the statistical properties of the training data yet defy underlying structural rules, resulting in anomalies like hands with more than five fing…
arXiv:2605.04412v1 Announce Type: new Abstract: 3D asset generation plays a pivotal role in fields such as gaming and virtual reality, enabling the rapid synthesis of high-fidelity 3D objects from a single or multiple images. Building on this capability, enabling style-controllab…
We study outlier tokens in Diffusion Transformers (DiTs) for image generation. Prior work has shown that Vision Transformers (ViTs) can produce a small number of high-norm tokens that attract disproportionate attention while carrying limited local information, but their role in g…
Recently, Deng et al. (2026) proposed Generative Modeling via Drifting (GMD), a novel framework for generative tasks. This note presents an analysis of GMD through the lens of Wasserstein Gradient Flows (WGF), i.e., the path of steepest descent for a functional in the space of pr…
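For orientation, the WGF of the KL functional is the classical example: with $\mathcal{F}(\mu) = \mathrm{KL}(\mu\,\|\,\pi)$, the steepest-descent path

$$\partial_t \mu_t \;=\; \nabla\cdot\Big(\mu_t\,\nabla\frac{\delta\mathcal{F}}{\delta\mu}(\mu_t)\Big) \;=\; \nabla\cdot\big(\mu_t\,\nabla\log(\mu_t/\pi)\big)$$

is exactly the Fokker-Planck equation of Langevin dynamics targeting $\pi$, the template against which a framework like GMD can be compared.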
Diffusion models are prone to generating structural hallucinations - samples that match the statistical properties of the training data yet defy underlying structural rules, resulting in anomalies like hands with more than five fingers. Recent research studied this failure mode f…
Hypergraphs model higher-order interactions, but realistic hypergraph generation remains difficult because incidence, hyperedge-size heterogeneity, and overlap structure are not faithfully captured by pairwise reductions. We propose HEDGE, a generative model defined directly on …
arXiv:2605.03317v1 Announce Type: new Abstract: Representation alignment has recently emerged as an effective paradigm for accelerating Diffusion Transformer training. Despite their success, existing alignment methods typically impose a fixed supervision target or a fixed alignme…
arXiv:2605.03877v1 Announce Type: new Abstract: Dataset distillation enables efficient training by distilling the information of large-scale datasets into significantly smaller synthetic datasets. Diffusion-based paradigms have emerged in recent years, offering novel perspectives…
arXiv:2605.00935v1 Announce Type: cross Abstract: Diffusion models have become the foundation of modern generative systems, with most research focusing primarily on improving generation efficiency and output quality. The timestep embedding component is a crucial part of the diffu…
arXiv:2605.01653v1 Announce Type: new Abstract: We introduce SteeringDiffusion, a bottlenecked activation-level control interface for diffusion models that exposes a smooth, monotonic, and runtime-adjustable control surface over the content--style trade-off. Our method keeps the …
arXiv:2605.01107v1 Announce Type: cross Abstract: Neural networks transform data through learned representations whose geometry affects separation, contraction, and generalization. Recent work studies this geometry using discrete curvature on neighborhood graphs, suggesting Ricci…
arXiv:2510.23633v2 Announce Type: replace-cross Abstract: Pretrained diffusion models have demonstrated strong capabilities in zero-shot inverse problem solving by incorporating observation information into the generation process of the diffusion models. However, this presents an…
Representation alignment has recently emerged as an effective paradigm for accelerating Diffusion Transformer training. Despite their success, existing alignment methods typically impose a fixed supervision target or a fixed alignment granularity throughout the entire denoising t…
arXiv cs.CV
TIER_1·Anne Harrington, A. Sophia Koepke, Shyamgopal Karthik, Trevor Darrell, Alexei A. Efros·
arXiv:2601.00090v2 Announce Type: replace Abstract: Contemporary text-to-image models exhibit a surprising degree of mode collapse, as can be seen when sampling several images given the same text prompt. Previous work has attempted to address this issue by steering the model usin…
arXiv:2406.14429v3 Announce Type: replace-cross Abstract: In the landscape of generative artificial intelligence, diffusion-based models have emerged as a promising method for generating synthetic images. However, the application of diffusion models poses numerous challenges, par…
arXiv:2511.07756v4 Announce Type: replace Abstract: Diffusion models initialize generation from an isotropic Gaussian latent, yet changing only the random seed can substantially alter prompt faithfulness, composition, and visual quality. We explain this gap by distinguishing the …
arXiv cs.CV
TIER_1·Saeed Mohseni-Sehdeh, Walid Saad, Kei Sakaguchi, Tao Yu·
arXiv:2507.18654v2 Announce Type: replace-cross Abstract: Diffusion models are powerful tools for sampling from high-dimensional distributions by progressively transforming pure noise into structured data through a denoising process. When equipped with a guidance mechanism, these…
Neural networks transform data through learned representations whose geometry affects separation, contraction, and generalization. Recent work studies this geometry using discrete curvature on neighborhood graphs, suggesting Ricci-flow-like behavior across layers. We develop a sm…
We study the problem of training diffusion and flow generative models to sample from target distributions defined by an exponential tilting of a base density; a formulation that subsumes both sampling from unnormalized densities and reward fine-tuning of pre-trained models. This …
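The tilted target being referenced has the standard form

$$p_\lambda(x) \;\propto\; p_{\mathrm{base}}(x)\,\exp\!\big(r(x)/\lambda\big),$$

so taking $p_{\mathrm{base}}$ to be a pretrained model's density and $r$ a reward recovers reward fine-tuning, while taking $p_{\mathrm{base}}$ flat and $r(x) = -E(x)$ recovers sampling from the unnormalized density $e^{-E(x)/\lambda}$.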
arXiv:2604.26365v1 Announce Type: new Abstract: To address the high sampling cost of Diffusion Transformers (DiTs), feature caching offers a training-free acceleration method. However, existing methods rely on hand-crafted forecasting formulas that fail under aggressive skipping.…
arXiv cs.CV
TIER_1·Yang Yang, Feifan Meng, Han Fang, Weiming Zhang·
arXiv:2604.26348v1 Announce Type: new Abstract: Diffusion models have achieved remarkable success in image generation, yet their training is predominantly driven by full-reference objectives that enforce pixel-wise similarity to ground-truth images. Such supervision, while effecti…
arXiv:2405.13729v3 Announce Type: replace-cross Abstract: In this paper, we study an under-explored but important factor of diffusion generative models, i.e., the combinatorial complexity. Data samples are generally high-dimensional, and for various structured generation tasks, a…
arXiv cs.CV
TIER_1·Miguel Espinosa, Eva Gmelich Meijling, Valerio Marsocci, Elliot J. Crowley, Mikolaj Czerkawski·
arXiv:2603.03239v2 Announce Type: replace Abstract: Earth observation applications increasingly rely on data from multiple sensors, including optical, radar, elevation, and land-cover. Relationships between modalities are fundamental for data integration but are inherently non-in…
arXiv:2604.26503v1 Announce Type: new Abstract: Diffusion models have achieved remarkable success in synthesizing complex static and temporal visuals, a breakthrough largely driven by Classifier-Free Guidance (CFG). However, despite its pivotal role in aligning generated content …
Diffusion models have achieved remarkable success in synthesizing complex static and temporal visuals, a breakthrough largely driven by Classifier-Free Guidance (CFG). However, despite its pivotal role in aligning generated content with textual prompts, standard CFG relies on a g…
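For reference, the static rule these abstracts take as their starting point is the standard CFG combination of conditional and unconditional noise predictions with a single global scale $w$:

$$\tilde{\epsilon}_\theta(x_t, c) \;=\; \epsilon_\theta(x_t, \varnothing) \;+\; w\,\big(\epsilon_\theta(x_t, c) - \epsilon_\theta(x_t, \varnothing)\big),$$

where $w = 1$ recovers the conditional model and $w > 1$ extrapolates toward the condition, with $w$ conventionally held fixed across all timesteps.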
To address the high sampling cost of Diffusion Transformers (DiTs), feature caching offers a training-free acceleration method. However, existing methods rely on hand-crafted forecasting formulas that fail under aggressive skipping. We propose L2P (Learnable Linear Predictor), a …
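To illustrate the contrast the abstract draws (a toy sketch only; L2P's actual learned predictor is not reproduced here), a hand-crafted forecaster extrapolates cached features linearly, while a learnable one fits its extrapolation weights from a short feature history:

```python
import numpy as np

def handcrafted_forecast(f_prev, f_prev2):
    # First-order extrapolation: assume the feature keeps its last velocity.
    return f_prev + (f_prev - f_prev2)

def fit_linear_forecaster(history):
    """Least-squares fit predicting f[t] from (f[t-1], f[t-2]).

    history: [T, d] array of cached features along the denoising trajectory.
    Returns W of shape [2d, d]; predict via np.concatenate([f_prev, f_prev2]) @ W.
    """
    X = np.concatenate([history[1:-1], history[:-2]], axis=1)  # stacked inputs
    Y = history[2:]                                            # targets
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W
```

The fitted variant can capture feature dynamics that a fixed extrapolation formula misses, which is the gap under aggressive step skipping.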
Diffusion models have achieved remarkable success in image generation, yet their training is predominantly driven by full-reference objectives that enforce pixel-wise similarity to ground-truth images. Such supervision, while effective for fidelity, may be insufficient in terms of su…
arXiv:2604.25289v1 Announce Type: cross Abstract: Practically, training diffusion models typically requires explicit time conditioning to guide the network through the denoising sampling process. Especially in deterministic methods like DDIM, the absence of time conditioning lead…
arXiv cs.CV
TIER_1·Nishit Anand, Manan Suri, Christopher Metzler, Dinesh Manocha, Ramani Duraiswami·
arXiv:2604.24877v1 Announce Type: new Abstract: Controlling illumination in images is essential for photography and visual content creation. While closed-source models have demonstrated impressive illumination control, open-source alternatives either require heavy control inputs …
Practically, training diffusion models typically requires explicit time conditioning to guide the network through the denoising sampling process. Especially in deterministic methods like DDIM, the absence of time conditioning leads to significant performance degradation. However,…
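Time conditioning enters the sampler explicitly: the deterministic DDIM update evaluates $\epsilon_\theta(x_t, t)$ at every step,

$$\hat{x}_0 = \frac{x_t - \sqrt{1-\bar\alpha_t}\;\epsilon_\theta(x_t, t)}{\sqrt{\bar\alpha_t}}, \qquad x_{t-1} = \sqrt{\bar\alpha_{t-1}}\,\hat{x}_0 + \sqrt{1-\bar\alpha_{t-1}}\;\epsilon_\theta(x_t, t),$$

so dropping the $t$ argument perturbs the entire trajectory, which is the degradation the abstract describes.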
arXiv stat.ML
TIER_1·Fan Chen, Sinho Chewi, Constantinos Daskalakis, Alexander Rakhlin·
arXiv:2602.01338v2 Announce Type: replace-cross Abstract: We present algorithms for diffusion model sampling which obtain $\delta$-error in $\mathrm{polylog}(1/\delta)$ steps, given access to $\widetilde O(\delta)$-accurate score estimates in $L^2$. This is an exponential improve…
arXiv:2604.23536v1 Announce Type: new Abstract: Diffusion models have achieved unprecedented success in text-aligned generation, largely driven by Classifier-Free Guidance (CFG). However, standard CFG operates strictly on instantaneous gradients, omitting the intrinsic curvature …
arXiv:2604.23552v1 Announce Type: cross Abstract: Diffusion models are central to modern generative modeling, and understanding how they balance memorization and generalization is critical for reliable deployment. Recent work has shown that memorization in diffusion models is sha…
arXiv:2602.04749v3 Announce Type: replace Abstract: Long-tailed class imbalance remains a fundamental obstacle in semantic segmentation of high-resolution remote-sensing imagery, where dominant classes shape learned representations and rare classes are systematically under-segmen…
arXiv cs.CV
TIER_1·Zhongjie Duan, Hong Zhang, Yingda Chen·
arXiv:2604.24351v1 Announce Type: cross Abstract: Controllable diffusion methods have substantially expanded the practical utility of diffusion models, but they are typically developed as isolated, backbone-specific systems with incompatible training pipelines, parameter formats,…
Controlling illumination in images is essential for photography and visual content creation. While closed-source models have demonstrated impressive illumination control, open-source alternatives either require heavy control inputs like depth maps or do not release their data and…
Controllable diffusion methods have substantially expanded the practical utility of diffusion models, but they are typically developed as isolated, backbone-specific systems with incompatible training pipelines, parameter formats, and runtime hooks. This fragmentation makes it di…
arXiv:2604.22379v1 Announce Type: new Abstract: Recent advances in distilling expensive diffusion models into efficient few-step generators show significant promise. However, these methods typically demand substantial computational resources and extended training periods, limitin…
arXiv cs.CV
TIER_1·Mingxing Rao, Bowen Qu, Daniel Moyer·
arXiv:2509.25003v2 Announce Type: replace-cross Abstract: Membership inference attacks (MIAs) against Diffusion Models (DMs) raise pressing privacy concerns by revealing whether a sample was part of the training set. While existing methods typically rely on measuring reconstructi…
Diffusion models are central to modern generative modeling, and understanding how they balance memorization and generalization is critical for reliable deployment. Recent work has shown that memorization in diffusion models is shaped by training dynamics, with generalization and …
Recent advances in distilling expensive diffusion models into efficient few-step generators show significant promise. However, these methods typically demand substantial computational resources and extended training periods, limiting accessibility for resource-constrained researc…
Diffusion-based generative models have transformed generative AI, and have enabled new capabilities in the science domain, for example, generating 3D structures of molecules. Due to the intrinsic problem structure of certain tasks, there is often a symmetry in the system, which iden…
Selecting an appropriate kernel is a central challenge in kernel-based spectral methods. In \emph{Kernelized Diffusion Maps} (KDM), the kernel determines the accuracy of the RKHS estimator of a diffusion-type operator and hence the quality and stability of the recovered eigenfunc…
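For concreteness, here is classical diffusion maps with the Gaussian kernel, the construction KDM builds on (the paper's RKHS estimator is not reproduced); the bandwidth `eps` is exactly the kernel choice at issue:

```python
import numpy as np

def diffusion_map(X, eps, n_coords=2):
    """Classical diffusion maps: Gaussian affinities -> Markov matrix -> spectrum."""
    # Pairwise squared distances and Gaussian affinities.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / eps)
    # Row-normalize to a Markov transition matrix.
    P = W / W.sum(axis=1, keepdims=True)
    # Leading nontrivial eigenpairs give the diffusion coordinates
    # (the top eigenvalue 1 with constant eigenvector is skipped).
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    idx = order[1:n_coords + 1]
    return vecs[:, idx].real * vals[idx].real
```

A bandwidth that is too small fragments the neighborhood graph while one too large blurs the geometry, which is why kernel selection dominates the quality of the recovered eigenfunctions.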
Generative modeling within constrained sets is essential for scientific and engineering applications involving physical, geometric, or safety requirements (e.g., molecular generation, robotics). We present a unified framework for constrained diffusion models on generic nonconvex …
**Stability AI** launched **Stable Diffusion 3 Medium** with models ranging from **450M to 8B parameters**, featuring the MMDiT architecture and T5 text encoder for image text rendering. The community has shown mixed reactions following the departure of key researchers like Emad …
Over 2500 new community members joined following Soumith Chintala's shoutout, highlighting growing interest in SOTA LLM-based summarization. The major highlight is the detailed paper release of **Stable Diffusion 3 (SD3)**, showcasing advanced text-in-image control and complex …
Hacker News — AI stories ≥50 points
TIER_1·benanne·
The new stable diffusion model is everywhere! Of course you can use this model to quickly and easily create amazing, dream-like images to post on twitter, reddit, discord, etc., but this technology is also poised to be used in very pragmatic ways across industry. In this episo…