PulseAugur / Brief

Brief

last 24h
[50/124] 185 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

  1. RESEARCH · Mastodon — sigmoid.social · · [2 sources]

    How can you measure security in #ML systems? Maybe similarly to the way we measure security in software systems. #swsec #appsec BIML wrote about this in a ne

    The Berryville Institute of Machine Learning (BIML) has released a new report detailing methods for measuring security in machine learning systems, drawing parallels to established software security practices. The report, available free under a Creative Commons license, aims to provide actionable guidance for applied ML security. AI

    IMPACT Provides a framework for assessing and improving the security posture of machine learning systems.

  2. RESEARCH · Mastodon — fosstodon.org · · [2 sources]

    "The developers I talked to agreed that LLMs will stick around and play a role in programming in the future in some fashion, but worried about how the industry

    Frontier AI models are rapidly improving at complex tasks, with their measured reliability doubling every 4.7 months, a rate that has accelerated since late 2024. Recent models such as Claude Mythos Preview and GPT-5.5 are running ahead of this trend, though their exact capabilities are still being measured because they score near-perfectly on current benchmarks. This pace strains existing testing methodologies: models are pushing the limits of token capacity and agent scaffolding, which makes it hard to assess their performance accurately or to detect any deterioration at scale. AI

    IMPACT Rapid advancements in frontier models may necessitate new evaluation methods and could accelerate the adoption of AI in complex domains.
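
    To make the quoted rate concrete: if the measured quantity doubles every 4.7 months and that rate were to hold constant (an extrapolation, not a claim from the source), growth compounds as 2^(months / 4.7), roughly 5.9x per year. The helper `capability_growth` is a hypothetical illustration of that arithmetic only.

```python
def capability_growth(months: float, doubling_months: float = 4.7) -> float:
    """Multiplicative growth after `months`, assuming a constant doubling time.

    Hypothetical helper: the 4.7-month default is the doubling time quoted in
    the summary; holding it constant is an assumption, not a forecast.
    """
    if doubling_months <= 0:
        raise ValueError("doubling_months must be positive")
    return 2.0 ** (months / doubling_months)


if __name__ == "__main__":
    for months in (4.7, 12.0, 24.0):
        print(f"{months:>5} months -> x{capability_growth(months):.2f}")
```

    Exponential extrapolation is very sensitive to the doubling time, so small changes in the 4.7-month estimate shift multi-year projections substantially.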

  3. RESEARCH · arXiv stat.ML · · [2 sources]

    Bayesian Surrogate Training on Multiple Data Sources: A Hybrid Modeling Strategy

    Researchers have developed new strategies for training surrogate models by integrating data from multiple sources, including simulations and real-world measurements. One approach involves training separate models for each data type and then combining their predictions, while another trains a single model incorporating both data types. These hybrid methods aim to improve predictive accuracy and coverage, and to identify potential issues within existing simulation models, ultimately aiding in system understanding and future development. AI

    IMPACT Enhances AI model training by enabling more accurate predictions and better diagnostics through multi-source data integration.

  4. RESEARCH · MarkTechPost · · [2 sources]

    Meet AntAngelMed: A 103B-Parameter Open-Source Medical Language Model Built on a 1/32 Activation-Ratio MoE Architecture

    Researchers have introduced AntAngelMed, a 103-billion-parameter open-source medical language model. Its Mixture-of-Experts (MoE) architecture activates only 6.1 billion parameters per query, which lets it match the performance of a 40-billion-parameter dense model while exceeding 200 tokens per second on NVIDIA H20 GPUs. The model supports a 128K-token context window and was trained in three stages: pre-training on medical corpora, supervised fine-tuning, and reinforcement learning. AI

    IMPACT Provides a highly efficient, open-source LLM for medical applications, potentially accelerating research and development in the healthcare sector.
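
    A Mixture-of-Experts layer runs only a few experts per token, which is how a 103B-parameter model can activate about 6.1B parameters (6.1 / 103, roughly 5.9% of its weights) at a time. The sketch below shows generic top-k softmax routing over toy scalar experts; the function names and routing scheme are illustrative assumptions, not AntAngelMed's actual router, and the headline's 1/32 activation ratio is presumably defined differently from this simple parameter share, which the summary does not explain.

```python
import math
from typing import Callable, Dict, List


def top_k_route(logits: List[float], k: int) -> Dict[int, float]:
    """Select the k highest-scoring experts and renormalise their softmax weights."""
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - m) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def moe_forward(x: float, experts: List[Callable[[float], float]],
                logits: List[float], k: int) -> float:
    """Weighted sum of the selected experts' outputs; unselected experts never run."""
    weights = top_k_route(logits, k)
    return sum(w * experts[i](x) for i, w in weights.items())


if __name__ == "__main__":
    # Eight toy "experts": expert s just scales its input by s.
    experts = [lambda x, s=s: s * x for s in range(8)]
    logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.3, 0.2]
    print(top_k_route(logits, k=2))  # only experts 1 and 4 are selected
    print(moe_forward(1.0, experts, logits, k=2))
    print(f"active share of AntAngelMed parameters: {6.1 / 103:.3f}")
```

    In a real MoE model the router logits come from a learned gating layer and the experts are feed-forward blocks, but the compute saving has the same source: experts outside the top k are skipped.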

  5. RESEARCH · arXiv stat.ML · · [2 sources]

    Pion: A Spectrum-Preserving Optimizer via Orthogonal Equivalence Transformation

    Researchers have introduced Pion, a novel spectrum-preserving optimizer designed for training large language models. Unlike traditional additive optimizers like Adam, Pion utilizes orthogonal transformations to update weight matrices, maintaining their singular values and spectral norm. This approach offers a stable and competitive alternative for both LLM pretraining and finetuning, as demonstrated by empirical results. AI

    IMPACT Introduces a new optimization method that could improve LLM training stability and performance.
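
    Why an orthogonal update preserves the spectrum: for orthogonal U and V, the matrix U W V^T has exactly the same singular values as W, whereas an additive step such as W - eta * G generally changes them. The minimal pure-Python sketch below checks this for a 2x2 matrix using rotation matrices. It illustrates the invariance only and is not Pion's update rule; how Pion chooses U and V is not described in the summary.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def rotation(theta: float) -> Matrix:
    """2x2 rotation matrix, a simple example of an orthogonal matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def singular_values_2x2(w: Matrix) -> List[float]:
    """Singular values (descending) of a 2x2 matrix via the eigenvalues of W^T W."""
    g = matmul(transpose(w), w)  # symmetric [[a, b], [b, d]]
    a, b, d = g[0][0], g[0][1], g[1][1]
    mid = (a + d) / 2
    rad = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return [math.sqrt(mid + rad), math.sqrt(max(mid - rad, 0.0))]


def orthogonal_equivalence(w: Matrix, theta_u: float, theta_v: float) -> Matrix:
    """W -> U W V^T with U, V orthogonal: the entries change, the spectrum does not."""
    u, v = rotation(theta_u), rotation(theta_v)
    return matmul(matmul(u, w), transpose(v))


if __name__ == "__main__":
    w = [[3.0, 1.0], [0.0, 2.0]]
    w_rot = orthogonal_equivalence(w, 0.3, -1.1)
    w_add = [[x - 0.5 for x in row] for row in w]  # naive additive step, for contrast
    print(singular_values_2x2(w), singular_values_2x2(w_rot), singular_values_2x2(w_add))
```

    The first two printed lists agree up to rounding; the third does not, which is the contrast with additive optimizers such as Adam that the summary draws.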

  6. RESEARCH · arXiv stat.ML · · [2 sources]

    A proximal gradient algorithm for composite log-concave sampling

    Researchers have developed a new proximal gradient algorithm designed to sample from composite log-concave distributions. This algorithm assumes access to gradient evaluations for one part of the distribution and a restricted Gaussian oracle for the other. The proposed method achieves state-of-the-art iteration counts for sampling, matching previous results for simpler cases and extending to non-log-concave distributions and non-smooth functions. AI

    IMPACT Introduces a novel sampling technique that could improve efficiency in statistical modeling and machine learning applications.

  7. RESEARCH · MarkTechPost · · [3 sources]

    Mira Murati’s Thinking Machines Lab Introduces Interaction Models: A Native Multimodal Architecture for Real-Time Human-AI Collaboration

    Thinking Machines Lab, the AI research lab led by Mira Murati, has introduced interaction models, a new class of systems designed to overcome the limitations of traditional turn-based AI. Their native multimodal architecture processes audio, video, and text inputs and outputs in continuous 200ms micro-turns, enabling real-time human-AI collaboration. This lets the model listen, interrupt, and react proactively, moving beyond static chat interfaces toward more dynamic, integrated interaction. AI

    IMPACT Moves AI interaction beyond static chat interfaces to real-time, multimodal collaboration.

  8. RESEARCH · The Register — AI · · [2 sources]

    Microsoft researchers find AI models and agents can't handle long-running tasks

    Microsoft researchers have identified a significant limitation in current AI models and agents: their inability to effectively manage long-running tasks. These systems struggle with tasks that require sustained operation or memory over extended periods. This deficiency impacts their potential for complex, multi-stage operations and highlights an area for future AI development. AI

    IMPACT Highlights a current limitation in AI capabilities, suggesting that complex, long-term operations are not yet feasible for current models and agents.

  9. RESEARCH · Alignment Forum · · [2 sources]

    Clarifying the role of the behavioral selection model

    This post clarifies the behavioral selection model, emphasizing why distinguishing between AI motivations is crucial for predicting deployment outcomes. While the model is useful for short-to-medium term predictions, it omits significant factors like reflection and deliberation, which could be dominant drivers of AI motivations. The author presents an updated causal graph to illustrate how cognitive patterns that ensure their own influence during training are more likely to persist in deployment. AI

    IMPACT Clarifies theoretical frameworks for understanding AI behavior, potentially aiding in the development of safer AI systems.

  10. RESEARCH · arXiv stat.ML · · [2 sources]

    Model-based Bootstrap of Controlled Markov Chains

    Researchers have developed a new model-based bootstrap method for controlled Markov chains, particularly useful in offline reinforcement learning scenarios where the data-generating policy is unknown. This technique establishes distributional consistency for transition estimators and extends to policy evaluation and recovery, providing asymptotically valid confidence intervals for value and Q-functions. Experimental results on the RiverSwim problem demonstrate that the proposed confidence intervals offer improved calibration and coverage compared to existing methods, especially with limited data. AI

    IMPACT Improves confidence interval calibration for offline reinforcement learning, aiding in more reliable policy evaluation and recovery.
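
    A model-based bootstrap resamples from a fitted model rather than from the raw data. The sketch below is a simplified illustration under stated assumptions, not the paper's procedure: it fits empirical transition probabilities P_hat(s' | s, a), regenerates next states from P_hat while holding the observed (s, a) visit counts fixed, refits each replicate, and reads off a plain percentile interval. All function names are hypothetical, and extending this to value or Q-function intervals, as the paper does, is not shown.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Tuple

Transition = Tuple[Hashable, Hashable, Hashable]  # (state, action, next_state)
Model = Dict[Tuple[Hashable, Hashable], Dict[Hashable, float]]


def estimate_transitions(data: List[Transition]) -> Model:
    """Empirical P_hat(s' | s, a) from observed (s, a, s') tuples."""
    counts = defaultdict(Counter)
    for s, a, s_next in data:
        counts[(s, a)][s_next] += 1
    return {sa: {s2: c / sum(ctr.values()) for s2, c in ctr.items()}
            for sa, ctr in counts.items()}


def model_based_bootstrap(data: List[Transition], n_boot: int = 200, seed: int = 0):
    """Resample next states from the fitted model (not the raw data), keeping
    the observed (s, a) visit counts fixed, and refit on each replicate."""
    rng = random.Random(seed)
    p_hat = estimate_transitions(data)
    visits = Counter((s, a) for s, a, _ in data)
    replicates = []
    for _ in range(n_boot):
        synthetic = []
        for (s, a), n in visits.items():
            next_states, probs = zip(*p_hat[(s, a)].items())
            synthetic += [(s, a, s2) for s2 in rng.choices(next_states, weights=probs, k=n)]
        replicates.append(estimate_transitions(synthetic))
    return p_hat, replicates


def percentile_interval(values: List[float], level: float = 0.9) -> Tuple[float, float]:
    """Simple percentile bootstrap interval (no bias correction)."""
    vals = sorted(values)
    lo = vals[int((1 - level) / 2 * (len(vals) - 1))]
    hi = vals[int((1 + level) / 2 * (len(vals) - 1))]
    return lo, hi


if __name__ == "__main__":
    data = [(0, "right", 1)] * 7 + [(0, "right", 0)] * 3
    p_hat, reps = model_based_bootstrap(data)
    boot = [r[(0, "right")].get(1, 0.0) for r in reps]
    print(p_hat[(0, "right")][1], percentile_interval(boot))
```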

  11. RESEARCH · arXiv cs.CL · · [2 sources]

    Overview of the MedHopQA track at BioCreative IX: track description, participation and evaluation of systems for multi-hop medical question answering

    Researchers have introduced MedHopQA, a new benchmark designed to evaluate the multi-hop reasoning capabilities of large language models in the biomedical domain. This benchmark consists of 1,000 expert-curated question-answer pairs, each requiring information synthesis from two distinct Wikipedia articles, with answers provided in free text. The MedHopQA dataset was presented as a shared task at BioCreative IX, attracting 48 submissions from 13 teams, and highlighted the effectiveness of retrieval-augmented generation strategies for improved performance. AI

    IMPACT Establishes a new standard for evaluating complex biomedical reasoning in LLMs, pushing for more robust and contamination-resistant benchmarks.

  12. RESEARCH · arXiv stat.ML · · [2 sources]

    Online Learning-to-Defer with Varying Experts

    Researchers have developed a new online algorithm for Learning-to-Defer (L2D) methods, designed to handle streaming data and dynamic expert availability. This algorithm is the first of its kind for multiclass classification with bandit feedback and a varying pool of experts. It offers theoretical regret guarantees and has demonstrated effectiveness in experiments on both synthetic and real-world datasets, extending L2D capabilities to more complex, dynamic environments. AI

    IMPACT Introduces a novel algorithmic approach for dynamic expert selection in machine learning, potentially improving efficiency in real-time decision-making systems.

  13. RESEARCH · arXiv cs.CL · · [2 sources]

    Safety-Oriented Evaluation of Language Understanding Systems for Air Traffic Control

    Researchers are exploring the use of large language models (LLMs) for enhancing safety in air traffic control (ATC) and around non-towered airports. One study proposes a vision-language model approach to analyze radio communications, weather data, and flight trajectories for safety assessments, achieving high F1 scores with open-source models. Another paper introduces a safety-oriented evaluation framework that highlights the critical need for consequence-aware metrics, as standard accuracy measures can mask severe risks in ATC operations. AI

    IMPACT LLM analysis could improve safety and efficiency in critical air traffic control operations.

  14. RESEARCH · arXiv stat.ML · · [3 sources]

    Multi-Variable Conformal Prediction: Optimizing Prediction Sets without Data Splitting

    Two new research papers introduce advanced conformal prediction techniques to improve the accuracy and efficiency of prediction sets. The first paper, "Multi-Variable Conformal Prediction (MCP)," extends conformal prediction to handle vector-valued score functions, allowing for more flexible prediction set shapes without sacrificing coverage guarantees and eliminating the need for data splitting. The second paper, "Shape-Adaptive Conditional Calibration for Conformal Prediction via Minimax Optimization," presents the Minimax Optimization Predictive Inference (MOPI) framework, which optimizes over a flexible class of set-valued mappings to achieve superior shape adaptivity and more efficient prediction sets, even for complex conditional distributions. AI

    IMPACT These new methods could lead to more reliable and efficient predictive models in machine learning by improving the calibration of prediction sets.
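
    For context, the standard scalar split-conformal recipe that these papers move beyond works as follows: score a held-out calibration set, take the ceil((n + 1)(1 - alpha))-th smallest score, and widen each new prediction by that threshold. The sketch uses absolute residuals as the nonconformity score, an assumption made for illustration; MCP's vector-valued scores, its split-free construction, and MOPI's minimax optimization are not shown.

```python
import math
from typing import Sequence, Tuple


def conformal_quantile(scores: Sequence[float], alpha: float) -> float:
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest
    calibration score, or +inf when that rank exceeds n."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(scores)[rank - 1]


def split_conformal_interval(y_pred: float, calib_preds: Sequence[float],
                             calib_true: Sequence[float],
                             alpha: float = 0.1) -> Tuple[float, float]:
    """Interval with marginal coverage >= 1 - alpha under exchangeability,
    using absolute residuals |y - y_hat| as nonconformity scores."""
    scores = [abs(y - p) for y, p in zip(calib_true, calib_preds)]
    q = conformal_quantile(scores, alpha)
    return y_pred - q, y_pred + q
```

    The calibration split is exactly the cost that split-free methods such as MCP aim to remove: here the calibration points are used only for scoring, never for fitting the predictor.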

  15. RESEARCH · arXiv stat.ML · · [2 sources]

    Optimal Policy Learning under Budget and Coverage Constraints

    Researchers have developed a new framework for optimal policy learning that addresses combined budget and minimum coverage constraints. The study reveals a knapsack-type structure within the problem, allowing the optimal policy to be defined by an affine threshold rule. Two algorithms, Greedy-Lagrangian (GLC) and rank-and-cut (RC), are proposed to implement this approach: GLC closely approximates the optimal policy, and RC is near-optimal under specific conditions. AI

    IMPACT Introduces a novel algorithmic approach for optimizing resource allocation in policy learning scenarios.

  16. RESEARCH · arXiv stat.ML · · [2 sources]

    Self-Supervised Laplace Approximation for Bayesian Uncertainty Quantification

    Researchers have developed a new method called Self-Supervised Laplace Approximation (SSLA) to directly approximate the posterior predictive distribution in Bayesian models. This approach draws inspiration from self-training techniques and quantifies predictive uncertainty by refitting the model on its own predictions. The SSLA method offers a deterministic, sampling-free approximation that outperforms classical Laplace approximations in predictive calibration for regression tasks, including Bayesian neural networks, while maintaining computational efficiency. AI

    IMPACT Offers a more computationally efficient and accurate method for assessing uncertainty in Bayesian models, potentially improving reliability in AI applications.

  17. RESEARCH · arXiv stat.ML · · [2 sources]

    Keeping Score: Efficiency Improvements in Neural Likelihood Surrogate Training via Score-Augmented Loss Functions

    Researchers have developed a new method to improve the efficiency of training neural likelihood surrogates for stochastic process models. By augmenting the standard loss function with exact score information and adaptive weighting, the approach significantly reduces the computational cost associated with parameter inference. This technique demonstrates improved surrogate quality and can achieve performance comparable to a tenfold increase in training data with only a marginal increase in training time. AI

    IMPACT Reduces computational cost for parameter inference in stochastic process models, potentially accelerating research and development in fields relying on such models.

  18. RESEARCH · arXiv cs.CV · · [2 sources]

    EgoEV-HandPose: Egocentric 3D Hand Pose Estimation and Gesture Recognition with Stereo Event Cameras

    Researchers have developed two new frameworks for improving 3D hand pose estimation from egocentric camera views. EgoForce utilizes a differentiable forearm representation and a unified transformer to achieve state-of-the-art accuracy across various camera types, reducing MPJPE by up to 28%. EgoEV-HandPose, on the other hand, employs stereo event cameras and a novel KeypointBEV fusion module to jointly estimate bimanual hand poses and recognize gestures, achieving an MPJPE of 30.54mm and 86.87% gesture recognition accuracy. Both methods aim to enhance applications in AR/VR and human-computer interaction by providing more robust and accurate hand tracking. AI

    IMPACT These advancements in egocentric hand tracking could significantly improve the realism and interactivity of AR/VR experiences and human-computer interfaces.
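
    MPJPE (mean per-joint position error) is the average Euclidean distance between predicted and ground-truth 3D joint positions, so EgoEV-HandPose's 30.54mm means predicted joints land about 3 cm from ground truth on average. A minimal sketch of the metric and of the relative reduction behind figures like EgoForce's 28%; benchmark protocols often align a root joint or apply Procrustes alignment first, which these hypothetical helpers skip.

```python
import math
from typing import Sequence

Point3D = Sequence[float]


def mpjpe(pred: Sequence[Point3D], gt: Sequence[Point3D]) -> float:
    """Mean per-joint position error: average Euclidean distance between predicted
    and ground-truth joints, in the input's units (e.g. millimetres)."""
    if not pred or len(pred) != len(gt):
        raise ValueError("pred and gt must be non-empty and have the same number of joints")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def relative_reduction(baseline: float, new: float) -> float:
    """Fractional error reduction, e.g. 0.28 for an MPJPE that is 28% lower."""
    return (baseline - new) / baseline
```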

  19. RESEARCH · arXiv stat.ML · · [2 sources]

    Approximation Theory of Laplacian-Based Neural Operators for Reaction-Diffusion System

    Researchers have developed a new theoretical framework for neural operators, a type of AI model used to learn solutions for complex systems like partial differential equations. This work specifically addresses the approximation analysis for nonlinear reaction-diffusion systems, which are crucial for modeling pattern formation. The study establishes explicit error bounds and demonstrates that their proposed Laplacian eigenfunction-based architecture can significantly reduce the parameter complexity required for accurate predictions. AI

    IMPACT Provides a theoretical foundation for using neural operators to model complex physical systems more efficiently.

  20. RESEARCH · Mastodon — sigmoid.social · · [5 sources]

    BIML is proud to release a new study today: No Security Meter for AI #AI #ML #MLsec #security #infosec #swsec #appsec #LLM #AgenticAI https:// berryvil

    The Berryville Institute of Machine Learning (BIML) has published a new study, "No Security Meter for AI", highlighting the lack of security metrics for AI systems. The research indicates that current security practices are insufficient to address the unique risks posed by artificial intelligence, and that this measurement gap could hinder the safe and responsible development and deployment of AI technologies. AI

    IMPACT Highlights a critical gap in AI security, potentially slowing responsible adoption.

  21. RESEARCH · arXiv stat.ML · · [2 sources]

    Random-Set Graph Neural Networks

    Researchers have introduced Random-Set Graph Neural Networks (RS-GNNs) to address uncertainty quantification in graph learning. This new framework models node-level epistemic uncertainty using a belief function formalism. Experiments on nine datasets, including autonomous driving benchmarks, show RS-GNNs offer improved uncertainty estimation capabilities. AI

    IMPACT Improves reliability of graph-based AI systems by quantifying uncertainty in predictions.

  22. RESEARCH · arXiv stat.ML · · [2 sources]

    QDSB: Quantized Diffusion Schrödinger Bridges

    Researchers have introduced Quantized Diffusion Schrödinger Bridges (QDSB), a novel method for learning generative models from unpaired data. QDSB addresses the computational challenges of traditional Schrödinger bridges by quantizing endpoint distributions and using cell-wise sampling to reconstruct the data plan. This approach significantly reduces training time while maintaining sample quality comparable to existing methods. AI

    IMPACT Accelerates generative model training by reducing computational costs and time.

  23. RESEARCH · arXiv stat.ML · · [2 sources]

    LOFT: Low-Rank Orthogonal Fine-Tuning via Task-Aware Support Selection

    Researchers have introduced LOFT, a novel framework for low-rank orthogonal parameter-efficient fine-tuning (PEFT). This method explicitly separates the adaptation subspace from the transformation applied within it, offering a unified approach that encompasses existing orthogonal PEFT techniques. LOFT's key innovation lies in its task-aware support selection strategy, informed by downstream training signals, which improves the efficiency-performance trade-off. AI

    IMPACT Introduces a new method to improve the efficiency and performance of fine-tuning large models, potentially reducing computational costs for adaptation.

  24. RESEARCH · arXiv stat.ML · · [2 sources]

    Variance-aware Reward Modeling with Anchor Guidance

    Researchers have developed a new framework called Anchor-guided Variance-aware Reward Modeling to address limitations in standard reward models when dealing with diverse human preferences. This method enhances existing Gaussian reward models by introducing two response-level anchor labels, resolving a fundamental non-identifiability issue. The framework has demonstrated improved performance in reward modeling and downstream Reinforcement Learning from Human Feedback (RLHF) tasks across simulations and real-world datasets. AI

    IMPACT Enhances reward modeling for RLHF, potentially improving the alignment and performance of AI systems trained on diverse human feedback.

  25. RESEARCH · arXiv stat.ML · · [2 sources]

    Minimax Rates and Spectral Distillation for Tree Ensembles

    Researchers have developed a new spectral perspective to better understand tree ensemble algorithms like random forests and gradient boosting machines. This approach reveals that the decay rate of eigenvalues in the induced kernel operator dictates the statistical convergence for random forest regression. The findings also enable the creation of compressed tree ensembles, yielding significantly smaller models that retain competitive predictive accuracy, outperforming current methods for forest pruning and rule extraction. AI

    IMPACT Advances understanding of widely used tree ensemble models and enables more efficient model compression for resource-constrained environments.

  26. RESEARCH · arXiv cs.CL · · [2 sources]

    Caraman at SemEval-2026 Task 8: Three-Stage Multi-Turn Retrieval with Query Rewriting, Hybrid Search, and Cross-Encoder Reranking

    Researchers have developed a three-stage retrieval system for multi-turn conversations that improves retrieval accuracy. The system first rewrites context-dependent queries into standalone questions using a fine-tuned Qwen 2.5 7B model. It then runs a hybrid search that combines BM25 and dense vector retrieval, fused with Reciprocal Rank Fusion, before a cross-encoder model reranks the results for higher precision. The approach achieved a strong nDCG@5 score in SemEval-2026 Task 8, outperforming many other systems. AI

    IMPACT Improves multi-turn conversational search accuracy by combining advanced query rewriting, hybrid search, and cross-encoder reranking.
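
    Reciprocal Rank Fusion merges ranked lists by summing 1 / (k + rank) for each document across lists, so documents ranked well by both BM25 and dense retrieval rise to the top. A minimal sketch follows; k = 60 is the constant commonly used in the RRF literature, and the value used by the Caraman system is not stated in the summary.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[Hashable]], k: int = 60) -> List[Hashable]:
    """Fuse several ranked lists of document ids.

    Each list contributes 1 / (k + rank) to a document's fused score, with ranks
    starting at 1; a document missing from a list gets nothing from that list.
    """
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order because sorted() is stable.
    return sorted(scores, key=lambda d: scores[d], reverse=True)
```

    For example, fusing a BM25 list `["a", "b", "c"]` with a dense list `["b", "c", "d"]` yields `["b", "c", "a", "d"]`: "b" and "c" appear in both lists and outrank "a", which only BM25 returned. Because RRF uses ranks rather than raw scores, BM25 and cosine-similarity scores never need to be put on a common scale.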

  27. RESEARCH · arXiv stat.ML · · [2 sources]

    Posterior Contraction Rates for Sparse Kolmogorov-Arnold Networks in Anisotropic Besov Spaces

    Researchers have developed a theoretical framework for sparse Bayesian Kolmogorov-Arnold Networks (KANs). Their work establishes statistical foundations for KANs, demonstrating that these networks can achieve near-minimax posterior contraction rates. The analysis shows that KANs can adapt to unknown function smoothness and avoid the curse of dimensionality by controlling approximation complexity through width and parameter sparsity, rather than depth. AI

    IMPACT Provides theoretical grounding for KANs, potentially influencing future neural network architectures and their statistical analysis.

  28. RESEARCH · arXiv stat.ML · · [2 sources]

    Learning U-Statistics with Active Inference

    Researchers have developed a new active inference framework for U-statistics, aiming to improve estimation efficiency when labeling data is expensive. This approach selectively queries informative labels within a fixed budget, building upon augmented inverse probability weighting U-statistics. The framework is also extended to U-statistic-based empirical risk minimization, showing significant gains in efficiency and maintaining target coverage in experiments. AI

    IMPACT This research could lead to more efficient data labeling strategies in machine learning applications where data acquisition is costly.
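
    A U-statistic averages a symmetric kernel over all subsets of the sample; with the pairwise kernel h(x, y) = (x - y)^2 / 2 it reproduces the unbiased sample variance. The sketch shows only this plain, fully labeled estimator that the framework builds on; the active-inference version, which uses augmented inverse probability weighting over selectively queried labels, is not implemented here, and the helper names are hypothetical.

```python
from itertools import combinations
from typing import Callable, Sequence


def u_statistic(xs: Sequence[float], kernel: Callable[[float, float], float]) -> float:
    """Order-2 U-statistic: mean of a symmetric kernel over all unordered pairs."""
    pairs = list(combinations(xs, 2))
    if not pairs:
        raise ValueError("need at least two observations")
    return sum(kernel(a, b) for a, b in pairs) / len(pairs)


def half_squared_difference(a: float, b: float) -> float:
    """Kernel whose U-statistic equals the unbiased sample variance."""
    return (a - b) ** 2 / 2
```

    For instance, `u_statistic([1, 2, 3, 4], half_squared_difference)` averages six pairwise terms summing to 10 and returns 5/3, the same value as the n - 1 sample variance. Every pair needs both labels, which is why labeling cost grows quickly and selective querying becomes attractive.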

  29. RESEARCH · arXiv stat.ML · · [2 sources]

    A Composite Activation Function for Learning Stable Binary Representations

    Researchers have developed a new activation function called Heavy Tailed Activation Function (HTAF) to address the challenges of training neural networks with binary representations. HTAF is a smooth approximation of the Heaviside function, designed to maintain a large gradient mass for stable optimization. This new function enables the stable training of various neural network types, including Spiking Neural Networks and Binary Neural Networks, using gradient-based methods. The researchers also introduced Implicit Concept Bottleneck Models (ICBMs), which utilize HTAF to create interpretable image models with discrete feature representations, achieving performance comparable to or better than existing models. AI

    IMPACT Enables more efficient and interpretable neural network training for specific applications.

  30. RESEARCH · MarkTechPost · · [2 sources]

    Tilde Research Introduces Aurora: A Leverage-Aware Optimizer That Fixes a Hidden Neuron Death Problem in Muon

    Tilde Research has introduced Aurora, a novel optimizer designed to train neural networks more effectively. Aurora addresses a flaw in the popular Muon optimizer that leaves a significant number of neurons permanently inactive during training. Demonstrated in a 1.1B-parameter pretraining experiment, Aurora achieves state-of-the-art performance on the modded-nanoGPT speedrun benchmark, and its code has been released publicly. AI

    IMPACT Fixes a critical flaw in a widely-used optimizer, potentially improving training efficiency and model performance for large-scale models.

  31. RESEARCH · arXiv stat.ML · · [2 sources]

    Post-ADC Inference: Valid Inference After Active Data Collection

    Researchers have introduced a new framework called post-ADC inference to address the challenges of statistical validity when data collected through active data collection (ADC) is reused for subsequent inferential tasks. This method accounts for biases introduced by both the data collection process and data-dependent target construction. The framework aims to provide valid p-values and confidence intervals, applicable to various ADC processes without strict assumptions on the underlying black-box function or surrogate models. AI

    IMPACT Enables more reliable statistical analysis in machine learning workflows that use active data collection.

  32. RESEARCH · arXiv stat.ML · · [2 sources]

    Adaptive Calibration in Non-Stationary Environments

    Researchers have developed new online prediction algorithms designed to adapt their calibration error based on the degree of non-stationarity in the environment. These algorithms aim to perform optimally across a spectrum from stable, i.i.d. settings to highly adversarial ones. The proposed methods achieve adaptive calibration guarantees, matching optimal rates in stationary cases and recovering known bounds for adversarial regimes. AI

    IMPACT Introduces adaptive algorithms for online predictions, potentially improving AI system performance in dynamic environments.

  33. RESEARCH · arXiv stat.ML · · [2 sources]

    FibQuant: Universal Vector Quantization for Random-Access KV-Cache Compression

    Researchers have developed FibQuant, a novel vector quantization method designed to significantly compress the key-value (KV) cache used in large language models. The technique aims to reduce the memory traffic of long-context inference by replacing scalar quantization with a more efficient vector-based approach. Experiments show FibQuant reaching substantial compression ratios, such as 34x on GPT-2 small KV caches while maintaining high fidelity, and achieving lower perplexity than existing methods on models such as TinyLlama-1.1B. AI

    IMPACT Enables more efficient long-context inference by reducing KV-cache memory requirements, potentially lowering operational costs and increasing model accessibility.

  34. RESEARCH · arXiv stat.ML · · [2 sources]

    Spatial Adapter: Structured Spatial Decomposition and Closed-Form Covariance for Frozen Predictors

    Researchers have developed a "Spatial Adapter," a novel post-hoc layer designed to enhance frozen predictive models. This adapter efficiently learns a structured spatial representation of a model's residual field and its covariance without altering the original model's parameters. The technique utilizes a spatially regularized orthonormal basis and per-sample scores, enabling kriging-style spatial prediction and uncertainty quantification for downstream applications. AI

    IMPACT Introduces a parameter-efficient method to improve spatial prediction and uncertainty quantification in existing models.

  35. RESEARCH · arXiv stat.ML · · [2 sources]

    Causal Algorithmic Recourse: Foundations and Methods

    Researchers have developed a new causal framework for algorithmic recourse, addressing the limitations of existing methods that treat recourse outcomes as static counterfactuals. This novel approach models recourse as a dynamic process, accounting for repeated decisions and potential changes in latent conditions for an individual. The framework introduces post-recourse stability conditions, enabling recourse inference from observational data alone, and proposes copula-based and distribution-free algorithms for practical application. AI

    IMPACT Enhances AI system trustworthiness by providing more robust methods for individuals to understand and potentially reverse adverse decisions.

  36. RESEARCH · arXiv stat.ML · · [2 sources]

    Causal Bias Detection in Generative Artificial Intelligence

    Researchers have developed a new framework for detecting causal bias in generative AI systems. This methodology extends causal inference principles to address the unique complexities of generative models, which differ from standard machine learning by implicitly constructing their own causal mechanisms. The approach allows for a granular quantification of fairness impacts across various causal pathways and the model's replacement of real-world mechanisms. The paper demonstrates its utility by analyzing race and gender bias in large language models using diverse datasets. AI

    IMPACT Provides a new theoretical framework and practical tools for identifying and quantifying bias in generative AI, crucial for fair and ethical deployment.

  37. RESEARCH · arXiv stat.ML · · [2 sources]

    Causal Fairness for Survival Analysis

    Researchers have developed a new causal framework to analyze fairness in time-to-event (TTE) analysis, a type of statistical modeling often used in healthcare and other high-stakes domains. This framework allows for the decomposition of survival disparities into direct, indirect, and spurious pathways, offering a more understandable explanation for why and how these disparities emerge over time. The non-parametric approach involves formalizing assumptions with graphical models, recovering survival functions, and applying causal reduction theorems for efficient estimation. The method was applied to study racial disparities in intensive care unit (ICU) outcomes. AI

    IMPACT Provides a novel method for understanding and mitigating bias in temporal AI models, crucial for equitable decision-making in sensitive applications.

  38. RESEARCH · arXiv stat.ML · · [2 sources]

    $\varepsilon$-Good Action Identification in Fixed-Budget Monte Carlo Tree Search

    Researchers have developed a new algorithm for identifying $\varepsilon$-good actions in fixed-budget Monte Carlo Tree Search (MCTS). This algorithm is $\varepsilon$-agnostic, meaning it does not require the error tolerance $\varepsilon$ as an input but still provides instance-dependent error bounds. The misidentification probability decays exponentially with the budget, and the analysis offers new guarantees for specific MCTS methods while highlighting differences in hardness compared to standard K-armed bandits. AI

    IMPACT Introduces a novel algorithmic approach for decision-making under uncertainty in search algorithms, potentially improving planning efficiency in AI systems.

  39. RESEARCH · arXiv stat.ML · · [2 sources]

    Extending Kernel Trick to Influence Functions

    Researchers have developed a new dual representation for influence functions, which can efficiently estimate changes in model parameters and outputs. This method scales with dataset size rather than model size, offering an advantage for large models where traditional influence function evaluation is infeasible. However, the approach is currently limited to linearizable models and requires substantial matrix materialization. AI

    IMPACT Introduces a more efficient method for analyzing model behavior, potentially aiding in debugging and understanding large-scale machine learning models.
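
    The general idea of trading a model-size computation for a dataset-size one can be illustrated on the simplest linearizable case, ridge regression, using the Woodbury identity x_t^T (X^T X + λI)^-1 x_i = (k(x_t, x_i) - k_t^T (K + λI)^-1 k_i) / λ, where K is the n×n Gram matrix and k_t holds the kernel values between x_t and the training points. This is a sketch under that assumption, not the paper's exact dual representation, and all function names are hypothetical. The dual version solves an n×n system built only from kernel evaluations; the primal check solves a d×d system.

```python
def solve(a, b):
    """Solve the linear system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [b[r]] for r, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def dual_influence(xs, ys, x_test, i, lam, kernel=dot):
    """d f(x_test) / d eps when training point i is up-weighted by eps in
    0.5 * sum_j (y_j - f(x_j))**2 + 0.5 * lam * ||w||**2.
    Uses only kernel evaluations: one n x n system, independent of feature dimension."""
    n = len(xs)
    k_reg = [[kernel(xs[p], xs[q]) + (lam if p == q else 0.0) for q in range(n)] for p in range(n)]
    alpha = solve(k_reg, ys)  # dual weights: f(x) = sum_j alpha_j k(x_j, x)
    k_i = [kernel(x, xs[i]) for x in xs]
    k_t = [kernel(x, x_test) for x in xs]
    residual_i = ys[i] - dot(alpha, k_i)  # y_i - f(x_i)
    # Woodbury: x_t^T (X^T X + lam I)^-1 x_i = (k(x_t, x_i) - k_t^T (K + lam I)^-1 k_i) / lam
    cross = (kernel(x_test, xs[i]) - dot(k_t, solve(k_reg, k_i))) / lam
    return residual_i * cross


def primal_influence(xs, ys, x_test, i, lam):
    """Same quantity via the d x d Hessian H = X^T X + lam I (linear kernel only)."""
    d = len(xs[0])
    h = [[sum(x[p] * x[q] for x in xs) + (lam if p == q else 0.0) for q in range(d)] for p in range(d)]
    w = solve(h, [sum(x[p] * y for x, y in zip(xs, ys)) for p in range(d)])
    return (ys[i] - dot(w, xs[i])) * dot(x_test, solve(h, xs[i]))
```

    On small examples the two functions agree, which is a sanity check of the identity. The dual path pays for an n×n solve instead of a d×d one, the trade-off described above for large models, and it still has to materialize the n×n matrix.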

  40. RESEARCH · arXiv stat.ML · · [2 sources]

    Exact Stiefel Optimization for Probabilistic PLS: Closed-Form Updates, Error Bounds, and Calibrated Uncertainty

    Researchers have developed a new framework for Probabilistic Partial Least Squares (PPLS) that addresses practical limitations of existing fitting pipelines. It combines noise pre-estimation, constrained likelihood optimization, and prediction calibration into an end-to-end solution. The method uses exact Stiefel-manifold optimization and noise-subspace estimation, and it achieves improved accuracy and calibrated uncertainty across various benchmarks, including multi-omics datasets. AI

    IMPACT Introduces a novel statistical method for two-view learning, potentially improving accuracy and uncertainty calibration in multi-omics data analysis.
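
    The Stiefel manifold is the set of d×k matrices W with orthonormal columns (W^T W = I). As a generic illustration of working on it, not the paper's closed-form updates, the sketch below uses a QR-type retraction (modified Gram-Schmidt) that maps an arbitrary full-rank matrix, for example the result of an unconstrained gradient step, back onto the manifold. The function names and the example matrix are hypothetical.

```python
import math


def qr_retract(columns):
    """Return orthonormal columns spanning the same space as `columns`
    (the Q factor of a thin QR decomposition with positive diagonal),
    computed with modified Gram-Schmidt. Each column is a list of d floats."""
    q_cols = []
    for col in columns:
        u = list(col)
        for q in q_cols:
            # Remove the component along each previously accepted direction.
            proj = sum(a * b for a, b in zip(u, q))
            u = [a - proj * b for a, b in zip(u, q)]
        norm = math.sqrt(sum(a * a for a in u))
        if norm < 1e-12:
            raise ValueError("columns are numerically linearly dependent")
        q_cols.append([a / norm for a in u])
    return q_cols


def gram(columns):
    """W^T W for a matrix stored as a list of columns."""
    return [[sum(a * b for a, b in zip(p, q)) for q in columns] for p in columns]


if __name__ == "__main__":
    # Hypothetical 4 x 2 matrix, e.g. loadings after an unconstrained update step.
    w = qr_retract([[1.0, 2.0, 0.0, 1.0], [0.0, 1.0, 1.0, 3.0]])
    print(gram(w))  # approximately the 2 x 2 identity
```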

  41. RESEARCH · arXiv stat.ML · · [2 sources]

    Interpretable Machine Learning for Spatial Science: A Lie-Algebraic Kernel for Rotationally Anisotropic Gaussian Processes

    Researchers have developed a new interpretable kernel for Gaussian Processes that models rotational anisotropy in 3D spatial fields. The kernel explicitly parameterizes the principal length-scales and orientation, which is more intuitive than standard axis-aligned kernels or generic symmetric positive-definite (SPD) metrics. Tested on synthetic data and a material-density dataset, the method improved predictive performance and revealed complex anisotropy that existing techniques did not capture. AI

    IMPACT Introduces a more interpretable method for modeling complex spatial data, potentially improving applications in fields requiring precise directional analysis.
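
    A common way to build such a kernel, assumed here rather than taken from the paper, is to store the orientation as an axis-angle vector ω in the Lie algebra so(3), map it to a rotation R with the exponential map (Rodrigues' formula), and evaluate k(x, x') = σ² exp(-½ (x - x')^T R Λ^-2 R^T (x - x')), where Λ = diag(ℓ1, ℓ2, ℓ3) holds the principal length-scales. Because any vector ω maps to a valid rotation, it can be optimized like an ordinary unconstrained hyperparameter. Function names are hypothetical.

```python
import math


def rotation_from_axis_angle(omega):
    """Exponential map so(3) -> SO(3) via Rodrigues' formula:
    R = I + sin(theta) K + (1 - cos(theta)) K^2, K = skew(unit axis)."""
    theta = math.sqrt(sum(w * w for w in omega))
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (w / theta for w in omega)
    k = [[0.0, -kz, ky], [kz, 0.0, -kx], [-ky, kx, 0.0]]  # skew-symmetric generator
    k2 = [[sum(k[i][m] * k[m][j] for m in range(3)) for j in range(3)] for i in range(3)]
    s, c = math.sin(theta), 1.0 - math.cos(theta)
    return [[(1.0 if i == j else 0.0) + s * k[i][j] + c * k2[i][j] for j in range(3)]
            for i in range(3)]


def anisotropic_rbf(x, y, omega, lengthscales, variance=1.0):
    """Squared-exponential kernel whose principal axes are the columns of
    R = exp([omega]_x), with length-scale lengthscales[k] along axis k."""
    r = rotation_from_axis_angle(omega)
    d = [a - b for a, b in zip(x, y)]
    quad = 0.0
    for axis in range(3):
        proj = sum(r[i][axis] * d[i] for i in range(3))  # component along principal axis
        quad += (proj / lengthscales[axis]) ** 2
    return variance * math.exp(-0.5 * quad)


if __name__ == "__main__":
    # Hypothetical hyperparameters: 90-degree rotation about z, length-scales (1, 2, 3).
    omega = [0.0, 0.0, math.pi / 2]
    print(anisotropic_rbf([1.0, 0.0, 0.0], [0.0, 0.0, 0.0], omega, [1.0, 2.0, 3.0]))
```

    In the demo the rotation turns the first principal axis onto y, so a unit displacement along x is governed by the second length-scale (2.0) and the kernel value is exp(-0.125). With ω = 0 and equal length-scales the kernel reduces to the usual isotropic RBF.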

  42. RESEARCH · arXiv stat.ML · · [2 sources]

    Variational predictive resampling

    Researchers have introduced Variational Predictive Resampling (VPR), a new method designed to improve the accuracy of Bayesian posterior sampling. VPR exploits the predictive capabilities of variational inference within a resampling framework to better approximate the true posterior distribution. The approach targets a known weakness of standard variational inference, which can produce overly concentrated approximations that miss important posterior dependencies. In experiments, VPR markedly improved uncertainty quantification and recovered these missed dependencies while remaining more computationally efficient than traditional MCMC methods. AI

    IMPACT Improves uncertainty quantification in Bayesian models, potentially leading to more reliable AI systems that require robust uncertainty estimates.

  43. RESEARCH · arXiv stat.ML · · [2 sources]

    Uniform Scaling Limits in AdamW-Trained Transformers

    Researchers have published a paper on uniform scaling limits in transformers trained with the AdamW optimizer. The study models hidden-state dynamics as an interacting particle system and shows convergence to a forward-backward system of ODEs. The convergence rate depends on the transformer's depth and number of heads, and the derived bounds are independent of the token count and embedding dimension. AI

    IMPACT Provides theoretical insights into transformer scaling, potentially informing future model design and training strategies.

  44. RESEARCH · 36氪 (36Kr) Chinese (ZH) · · [2 sources]

    Shanghai AI Laboratory Joint Team Overcomes Challenges in the Stable Preparation of Photoresist, a Core Chip Material

    The Shanghai Artificial Intelligence Laboratory, in collaboration with other institutions, has developed a new method for producing high-purity KrF photoresist resin, a critical material for chip manufacturing. The AI-driven approach, which uses the "Sheng" scientific large model and discovery platform, breaks the dependence on foreign suppliers and offers a standardized, rapidly iterable path for producing advanced photoresist materials. The breakthrough is part of a national initiative to advance China's capabilities in producing core chip materials. AI

    IMPACT Establishes a new AI-driven pathway for critical chip material production, reducing foreign dependency and enabling faster iteration.

  45. RESEARCH · arXiv cs.CL · · [2 sources]

    Infinite Mask Diffusion for Few-Step Distillation

    Researchers have developed new techniques for improving how large language models (LLMs) are trained. One method, Step Rejection Fine-Tuning (SRFT), makes use of unsuccessful training trajectories by assessing the correctness of each step, so that models learn from errors without repeating them. This approach improved resolution rates on SWE-bench tasks by 3.7%. Another development, the Infinite Mask Diffusion Model (IMDM), addresses factorization errors in Masked Diffusion Models (MDMs) by introducing a stochastic infinite-state mask. IMDM shows superior few-step generation and, when combined with distillation, surpasses existing methods on the LM1B and OpenWebText datasets. AI

    IMPACT These new training techniques could lead to more capable and efficient LLMs, improving performance on complex tasks and reducing training costs.

  46. RESEARCH · Hugging Face Daily Papers · · [2 sources]

    Is Your Driving World Model an All-Around Player?

    Researchers have introduced WorldLens, a new benchmark designed to evaluate the realism and behavioral fidelity of driving world models. Current models often excel in either visual realism or physical consistency but not both, creating a gap in how their performance is assessed. WorldLens addresses this by measuring aspects like pixel quality, 4D geometry, closed-loop driving, and human perceptual alignment across 24 dimensions. Evaluations using WorldLens revealed that no single model performs optimally across all criteria, highlighting the need for more comprehensive assessment tools. AI

    IMPACT Establishes a new standard for evaluating driving world models, pushing for improvements in both visual and behavioral realism.

  47. RESEARCH · Hugging Face Daily Papers · · [2 sources]

    Quantifying Concentration Phenomena of Mean-Field Transformers in the Low-Temperature Regime

    Researchers have published a paper on concentration phenomena in mean-field transformers, analyzing their behavior at low temperature during inference. The study models token evolution with a mean-field continuity equation and shows that token distributions rapidly concentrate under a projection map induced by the transformer's matrices. This concentration remains metastable for moderate times, and the paper quantifies the Wasserstein distance as a function of temperature and inference time. AI

    IMPACT Provides theoretical insights into transformer behavior, potentially informing future model design and optimization.
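
    The interacting-particle view behind this kind of analysis can be illustrated with a deliberately simplified toy model, which is an assumption and not the paper's setup: tokens are unit vectors, the query, key and value matrices are all the identity, and each token moves toward its attention-weighted average, projected onto the sphere's tangent space, with inverse temperature beta (low temperature means large beta). The sketch omits the learned matrices that induce the projection map in the paper; function names are hypothetical.

```python
import math


def attention_step(tokens, beta, dt):
    """One explicit Euler step of a toy self-attention particle model:
    dx_i/dt = P_{x_i}( sum_j softmax_j(beta * <x_i, x_j>) x_j ),
    with query, key and value matrices all equal to the identity.
    beta is the inverse temperature; tokens stay on the unit sphere."""
    new_tokens = []
    for xi in tokens:
        scores = [beta * sum(a * b for a, b in zip(xi, xj)) for xj in tokens]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(weights)
        drift = [sum(w * xj[k] for w, xj in zip(weights, tokens)) / total for k in range(len(xi))]
        radial = sum(a * b for a, b in zip(drift, xi))
        tangent = [d - radial * a for d, a in zip(drift, xi)]  # projection P_{x_i}
        moved = [a + dt * t for a, t in zip(xi, tangent)]
        norm = math.sqrt(sum(a * a for a in moved))
        new_tokens.append([a / norm for a in moved])  # map back onto the sphere
    return new_tokens


def diameter(tokens):
    """Largest Euclidean distance between any two tokens."""
    return max(math.dist(a, b) for a in tokens for b in tokens)


if __name__ == "__main__":
    # Hypothetical tokens on the unit circle, all within an open half-circle.
    tokens = [[math.cos(t), math.sin(t)] for t in (0.0, 0.3, 0.6, 1.0)]
    print("initial diameter:", diameter(tokens))
    for _ in range(200):
        tokens = attention_step(tokens, beta=1.0, dt=0.1)
    print("final diameter:", diameter(tokens))
```

    Each step keeps the tokens on the unit circle, and in this example the diameter shrinks as the tokens cluster, a simple instance of the concentration behavior the paper quantifies.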

  48. RESEARCH · Hugging Face Daily Papers · · [2 sources]

    Optimal and Scalable MAPF via Multi-Marginal Optimal Transport and Schrödinger Bridges

    Researchers have developed a novel approach to solve multi-agent path finding (MAPF) problems by reformulating them as a specific type of multi-marginal optimal transport (MMOT) problem. This method leverages a Markovian structure to reduce the computational complexity of MMOT to a polynomial-sized linear program. For large-scale applications, the approach is further adapted using Schrödinger bridges, which provide an iterative, Sinkhorn-type solution that significantly reduces complexity while maintaining near-optimal results. AI

    IMPACT Introduces a more efficient method for multi-robot coordination, potentially impacting logistics and autonomous systems.
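
    Sinkhorn iterations are the building block behind Sinkhorn-type solvers. The sketch below is the standard two-marginal entropic optimal transport version, not the paper's multi-marginal Schrödinger-bridge algorithm: it alternately rescales the rows and columns of K = exp(-C/ε) until the transport plan's marginals match the start and goal distributions. The three agents on a line, their goals and ε are hypothetical.

```python
import math


def sinkhorn(cost, a, b, eps=0.1, iters=500):
    """Entropic optimal transport between histograms a and b (two-marginal Sinkhorn).
    Returns the plan P = diag(u) K diag(v) with K = exp(-cost / eps)."""
    n, m = len(a), len(b)
    kern = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Rescale rows to match a, then columns to match b.
        u = [a[i] / sum(kern[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kern[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kern[i][j] * v[j] for j in range(m)] for i in range(n)]


if __name__ == "__main__":
    starts = [0.0, 1.0, 2.0]  # hypothetical agent positions on a line
    goals = [2.1, 0.1, 1.1]   # hypothetical goal positions
    cost = [[(s - g) ** 2 for g in goals] for s in starts]
    plan = sinkhorn(cost, [1 / 3] * 3, [1 / 3] * 3, eps=1.0)
    for row in plan:
        print([round(p, 3) for p in row])
```

    With ε = 1.0 each agent's row of the plan puts its largest share of mass on the nearest goal under the quadratic cost. Smaller ε sharpens the plan toward a hard assignment but slows convergence.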

  49. RESEARCH · Hugging Face Daily Papers · · [2 sources]

    Variational Inference for Lévy Process-Driven SDEs via Neural Tilting

    Researchers have developed a new neural exponential tilting framework for variational inference in Lévy-driven stochastic differential equations. This method addresses the intractability of Bayesian inference for processes with heavy tails and discontinuities, which are crucial for modeling extreme events in fields like finance and AI safety. The framework uses neural networks to reweight the Lévy measure, preserving jump structures while remaining computationally efficient and enabling more reliable posterior inference than Gaussian-based methods. AI

    IMPACT Enables more reliable modeling of extreme events and heavy tails, crucial for safety-critical AI systems.
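
    Exponential tilting reweights a Lévy measure ν by a positive factor, ν_g(dz) = exp(g(z)) ν(dz). A minimal sketch for a finite, discrete Lévy measure (a compound Poisson process) shows why this preserves the jump structure: exp(g(z)) is never zero, so every jump size that was possible stays possible and only the rates change. In the paper g is a neural network; here a linear g(z) = θz (the classical Esscher transform) stands in for it, and the jump sizes, rates and θ are hypothetical.

```python
import math


def tilt(levy_rates, g):
    """Exponentially tilt a discrete Levy measure {jump size: rate} by exp(g(z))."""
    return {z: math.exp(g(z)) * rate for z, rate in levy_rates.items()}


def intensity_and_jump_law(levy_rates):
    """Total jump intensity and normalized jump-size distribution of a finite Levy measure."""
    total = sum(levy_rates.values())
    return total, {z: rate / total for z, rate in levy_rates.items()}


if __name__ == "__main__":
    base = {-2.0: 0.1, -0.5: 1.0, 0.5: 1.0, 3.0: 0.05}  # hypothetical jump sizes and rates
    theta = 0.4
    tilted = tilt(base, lambda z: theta * z)  # Esscher-type stand-in for a neural g
    print(intensity_and_jump_law(base))
    print(intensity_and_jump_law(tilted))
```

    With θ > 0 the tilt shifts rate toward positive jumps while keeping the same set of jump sizes; a learned g can reshape the rates far more flexibly under the same positivity guarantee.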

  50. RESEARCH · Hugging Face Daily Papers · · [2 sources]

    V4FinBench: Benchmarking Tabular Foundation Models, LLMs, and Standard Methods on Corporate Bankruptcy Prediction

    Researchers have introduced V4FinBench, a new benchmark dataset for evaluating AI models on corporate bankruptcy prediction. The dataset contains over one million company-year records from Visegrád Group economies, with 131 financial and non-financial features across six prediction horizons. Initial evaluations show that fine-tuned TabPFN models perform comparably to or better than gradient boosting methods, while Llama-3-8B models lag behind on key metrics. AI

    IMPACT Provides a large-scale, realistic dataset for advancing AI in financial risk assessment and bankruptcy prediction.