Brief

last 24h

[50/1235] 223 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
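
The feed does not publish its scoring formula, so the following is only a rough sketch of how four such signals could be folded into a 0–100 score. The weights, the 24-hour half-life, and the function name `score` are all assumptions made for illustration.

```python
# Hypothetical weights and half-life; the feed's real formula is not given.
WEIGHTS = {"authority": 0.4, "cluster_strength": 0.3, "headline_signal": 0.3}
HALF_LIFE_HOURS = 24.0


def score(authority, cluster_strength, headline_signal, age_hours):
    """Combine three signals in [0, 1] and apply exponential time decay.

    Returns a value in [0, 100], rounded to one decimal place.
    """
    base = (
        WEIGHTS["authority"] * authority
        + WEIGHTS["cluster_strength"] * cluster_strength
        + WEIGHTS["headline_signal"] * headline_signal
    )
    # The score halves every HALF_LIFE_HOURS (assumed decay shape).
    decay = 0.5 ** (age_hours / HALF_LIFE_HOURS)
    return round(100.0 * base * decay, 1)
```

Under these assumed settings, an item with perfect signals would lose half its score after one day.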

TOOL · arXiv cs.CV English(EN) · 16h

NGram-MoSE: Efficient Remote Sensing Super-Resolution via N-Gram Context and Mixture-of-Experts

Researchers have developed NGram-MoSE, a new Transformer architecture for efficient super-resolution of remote sensing imagery. This model addresses the trade-off between spatial resolution and acquisition frequency in remote sensing data. NGram-MoSE utilizes N-Gram Context Injection for better local consistency and a Mixture-of-Experts design for scalable capacity with reduced computational cost. AI

IMPACT Introduces a more efficient method for enhancing remote sensing imagery, potentially improving downstream applications in environmental monitoring and disaster management.
- NGram-MoSE
- Transformer
TOOL · arXiv cs.CV English(EN) · 16h

Property-Informed Diffusion-Based Text-to-Microstructure Generation

Researchers have developed a novel diffusion-based model capable of generating 3D metamaterial microstructures from textual descriptions. This approach aims to simplify the design process by translating semantic and physical properties specified in text directly into plausible 3D structures. The model employs a dual alignment strategy to ensure consistency between the generated designs and the input prompts, offering potential for interactive material discovery. AI

IMPACT Enables rapid, text-driven design of complex 3D materials, potentially accelerating metamaterial innovation.
- arXiv
- Diffusion-Based Text-to-Microstructure Generation
TOOL · arXiv cs.LG (CA) · 16h

Quantum latent distributions in deep generative models

Researchers have theoretically demonstrated that quantum latent distributions can enhance deep generative models by enabling them to produce data distributions that classical models cannot efficiently replicate. Their work suggests that quantum interference statistics contribute to improved generative performance, particularly on datasets with quantum properties or molecular structures. Experiments using simulated and real photonic quantum processors on a synthetic quantum dataset and the QM9 molecular dataset support these findings, indicating a potential role for quantum processors in advancing generative AI capabilities. AI

IMPACT Quantum processors may offer new avenues for generative models to capture complex data distributions, potentially improving performance on specialized tasks.
- QM9 molecular dataset
- William Clements
TOOL · arXiv cs.LG English(EN) · 16h

SwAIther-Precip: Lead-Time-Aware Bias Correction Enables Kilometer-Scale Downscaling of Global AI Precipitation Forecasts over Switzerland

Researchers have developed SwAIther-Precip, a new framework designed to improve the resolution and accuracy of AI-driven precipitation forecasts. The method targets biases in global AI weather models that depend on forecast lead time. By correcting these biases before applying a diffusion-based super-resolution model, SwAIther-Precip can generate kilometer-scale precipitation fields with significantly improved accuracy and spatial fidelity. AI

IMPACT Enhances the utility of global AI weather models for localized, high-resolution precipitation forecasting.
TOOL · arXiv cs.LG English(EN) · 16h

IGenBench: Benchmarking the Reliability of Text-to-Infographic Generation

Researchers have introduced IGenBench, a new benchmark designed to evaluate the reliability of text-to-infographic generation models. The benchmark consists of 600 test cases across 30 infographic types, with an automated evaluation framework that uses multimodal large language models to assess accuracy. Initial testing on ten state-of-the-art text-to-image models revealed significant challenges, particularly with data-related aspects, highlighting a gap between perceived aesthetic quality and actual functional correctness. AI

IMPACT Highlights critical limitations in current text-to-infographic models, particularly concerning data accuracy, guiding future development.
- Yinghao Tang
- IGenBench
TOOL · arXiv cs.CV English(EN) · 16h

Beyond Raw Signals: Undecoded Generative Latents as Privileged Synthetic Data

Researchers have developed a new method called Direct Latent Augmentation (DLA) to improve multimodal vision models. DLA bypasses the inefficient decode-encode loop by using undecoded generative latents directly as privileged information. To transfer this knowledge to unimodal models, they introduced Multilayer Explicit Simulated Synesthesia (MESSy), which uses a predictive objective for safer internalization of physical priors. This approach significantly outperforms traditional methods, creating accurate unimodal students with latent structures aligned to unobserved physical properties. AI

IMPACT This research could lead to more efficient training of vision models by reducing reliance on paired datasets and improving knowledge transfer.
- Direct Latent Augmentation (DLA)
- Multilayer Explicit Simulated Synesthesia (MESSy)
TOOL · arXiv cs.CV English(EN) · 16h

One Stone, Three Birds: Self-adaptive Optimal Transport for Multi-VLM Selection, Adaptation, and Ensembling

Researchers have developed a new framework called One Stone, Three Birds (OSTB) to address challenges in deploying vision-language models (VLMs) when target annotations are scarce. OSTB uses self-adaptive optimal transport to estimate a consensus sample-to-class structure from a pool of frozen VLMs. This learned structure then informs model selection, target adaptation, and ensembling, improving performance across various benchmarks without updating VLM parameters. AI

IMPACT Provides a novel method for VLM deployment in low-data scenarios, potentially improving efficiency and accuracy in real-world applications.
- Optimal transport
- Vision-language models
TOOL · arXiv cs.LG English(EN) · 16h

Energy-Regularized Spatial Masking: A Novel Approach to Enhancing Robustness and Interpretability in Vision Models

Researchers have introduced Energy-Regularized Spatial Masking (ERSM), a new framework designed to improve the robustness and interpretability of vision models. ERSM treats feature selection as a differentiable energy minimization problem, assigning each visual token an energy value based on its importance and spatial coherence. This approach allows models to autonomously find an optimal balance of information density, leading to emergent sparsity and enhanced performance in robustness tests without explicit supervision. AI

IMPACT Enhances vision model interpretability and robustness, potentially leading to more reliable AI systems in critical applications.
- Tom Devynck
- Energy-Regularized Spatial Masking
TOOL · arXiv cs.CV English(EN) · 16h

HACK++: Towards More Effective Head-Aware Key-Value Compression for Efficient Visual Autoregressive Modeling

Researchers have developed HACK++, a novel framework designed to significantly reduce the memory and computational overhead of Visual Autoregressive (VAR) models. By analyzing attention heads and categorizing them into 'Contextual' and 'Structural' types, HACK++ implements a training-free compression method. This approach allows for adaptive budget allocation based on head function and reliance on historical scales, leading to substantial reductions in attention and cache budgets without compromising generation quality. AI

IMPACT Reduces memory and compute for visual autoregressive models, potentially enabling larger-scale deployments and faster inference.
TOOL · arXiv cs.CV English(EN) · 16h

Revisiting Articulated Parts Perception in Robot Manipulation

Researchers have introduced a new representation called Geometric Primary Structure (GPS) for understanding articulated parts in robotic manipulation. This method aims to balance scalability and quality by abstracting the geometric structure of object parts. An efficient VR-based annotation system was used to collect a dataset of 41,000 frames for 234 objects, enabling the training of a generalizable GPS model that achieved a 73% success rate in object manipulation tasks. AI

IMPACT Introduces a novel representation and efficient data collection method that could improve robot dexterity and adaptability in handling objects with movable parts.
- Geometric Primary Structure
- robot manipulation
TOOL · arXiv cs.LG English(EN) · 16h

Information-Theoretic Requirements for Gradient-Based Task Affinity Estimation in Multi-Task Learning

A new research paper published on arXiv introduces a principled framework for understanding multi-task learning outcomes. The study identifies a critical requirement for gradient-based task affinity estimation: tasks must share training instances for gradient conflicts to accurately reveal relationships. Below 30% sample overlap, gradient correlations become indistinguishable from noise, while above 40%, they reliably recover known biological structure. This finding offers a potential explanation for the inconsistent results observed in multi-task learning over the past seven years, as many standard benchmarks fall below the meaningful threshold. AI

IMPACT Identifies a fundamental requirement for improving multi-task learning performance and reliability.
- arXiv
- Bryan Cheng
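
As a minimal sketch of the idea in the summary above, gradient-based affinity can be computed only on the training instances two tasks share, and reported together with the overlap fraction. The function names, the per-sample cosine averaging, and the definition of overlap (shared samples divided by the smaller task's sample count) are assumptions for illustration, not the paper's procedure.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def task_affinity(grads_a, grads_b):
    """grads_*: dict mapping sample id -> gradient vector for one task.

    Returns (overlap, affinity). Overlap is the share of the smaller task's
    samples that both tasks have (an assumed definition). Affinity is the
    mean per-sample gradient cosine over shared samples, or None if the
    tasks share no samples.
    """
    if not grads_a or not grads_b:
        return 0.0, None
    shared = sorted(set(grads_a) & set(grads_b))
    overlap = len(shared) / min(len(grads_a), len(grads_b))
    if not shared:
        return overlap, None
    affinity = sum(cosine(grads_a[i], grads_b[i]) for i in shared) / len(shared)
    return overlap, affinity
```

A caller could compare the returned overlap against the 30% and 40% figures quoted in the summary before trusting the affinity value.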
TOOL · arXiv cs.LG English(EN) · 16h

On the Superlinear Relationship between SGD Noise Covariance and Loss Landscape Curvature

Researchers have uncovered a new relationship between the noise introduced by Stochastic Gradient Descent (SGD) and the curvature of the loss landscape in deep learning models. Their findings indicate that the noise covariance is not directly proportional to the Hessian of the loss, a proportionality that earlier work assumed under specific conditions. Instead, the study reveals a more general connection in which the SGD noise covariance is related to the expected value of per-sample Hessians, suggesting that the noise covariance and the Hessian approximately commute rather than coincide. AI

IMPACT Provides a more accurate theoretical understanding of SGD noise and its interaction with loss landscape curvature, potentially guiding future optimization algorithm development.
- Stochastic Gradient Descent
- Yikuan Zhang
TOOL · arXiv cs.LG English(EN) · 16h

A Geometric Measure of Linear Separability for Neural Representations

Researchers have developed a new metric called the directional linear separability measure (LSM) to analyze the geometric properties of neural network representations. This measure quantifies how well a target class can be separated from other classes using affine halfspaces, providing a class-wise and asymmetric assessment. LSM is designed to distinguish between changes due to linear reparameterization and those caused by information loss or nonlinear transformations, offering a tool to diagnose class-wise intrusion in deep learning architectures. AI

IMPACT Provides a new quantitative tool for understanding and diagnosing the internal geometry of neural network representations.
- arXiv
TOOL · arXiv cs.LG English(EN) · 16h

Compositional Approximation Can Strictly Outperform Superpositional Approximation

A new research paper explores the theoretical limits of function approximation, demonstrating that compositional methods, such as neural networks, can significantly outperform superpositional methods. The study constructs specific examples where the approximation error gap between these two approaches can be arbitrarily large. This work has implications for understanding the fundamental capabilities of different model architectures in machine learning. AI

IMPACT This theoretical work could inform the design of future AI architectures, potentially leading to more efficient and powerful models.
TOOL · arXiv cs.LG English(EN) · 16h

C$^3$ache: Accelerating World Action Models with Cross Inference Chunk Cache

Researchers have developed a new method called C$^3$ache to speed up the inference process for World Action Models (WAMs). WAMs are known for their strong generalization capabilities in robotics but are computationally expensive due to a multi-step denoising process. C$^3$ache addresses this by caching and reusing computation residuals across different inference chunks, achieving up to a 2.5x speedup without significantly impacting task success rates. AI

IMPACT Accelerates inference for robotic control models, potentially enabling more complex real-time applications.
TOOL · arXiv cs.LG English(EN) · 16h

GNSS-FM: A Self-Supervised Foundation Model for Daily GNSS Displacement Time Series

Researchers have developed GNSS-FM, a novel self-supervised foundation model designed for analyzing daily Global Navigation Satellite System (GNSS) displacement time series. This model utilizes a dual-stream input combining displacement and velocity data, pre-trained with a masked latent prediction objective. After pre-training on data from over 17,000 GNSS stations, GNSS-FM demonstrated strong performance when fine-tuned for displacement forecasting and seismic step localization, outperforming existing task-specific baselines. AI

IMPACT This self-supervised approach could enable more widespread use of AI in geophysics by overcoming data labeling limitations.
- GNSS-FM
- wav2vec 2.0
TOOL · arXiv cs.LG English(EN) · 16h

Similarity-Distance-Magnitude Activations

Researchers have introduced a new activation function called Similarity-Distance-Magnitude (SDM). This function aims to improve upon the standard softmax by incorporating awareness of similarity to correct predictions, distance from the training distribution, and the existing magnitude of outputs. The SDM estimator, built upon this activation, is designed to enhance interpretability and robustness against distribution shifts, particularly for selective classification tasks in pre-trained language models. AI

IMPACT Introduces a novel activation function that could improve the interpretability and robustness of large language models.
TOOL · arXiv cs.LG English(EN) · 16h

Normality Calibration in Semi-supervised Graph Anomaly Detection

Researchers have developed a new framework called GraphNC to improve semi-supervised graph anomaly detection. This method calibrates normality by leveraging both labeled and unlabeled data, using a teacher model to guide the process. GraphNC incorporates anomaly score distribution alignment and perturbation-based normality regularization to enhance the accuracy and separability of anomaly scores and node representations. AI
- Hezhe Qiao
- GraphNC
TOOL · arXiv cs.LG English(EN) · 16h

VQ-Atom: Semantic Discretization of Local Atomic Environments for Molecular Representation Learning

Researchers have developed VQ-Atom, a novel framework for molecular representation learning that uses vector quantization to assign discrete tokens based on local atomic environments. This approach encodes chemical context more effectively than traditional SMILES representations, leading to improved performance in drug-target interaction prediction. VQ-Atom also accelerates downstream training by replacing continuous atom-level features with reusable discrete tokens, suggesting that token design is a critical factor in molecular machine learning. AI

IMPACT Introduces a new tokenization method that could accelerate AI training for molecular tasks.
- Takayuki Kimura
- VQ-Atom
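
The core of vector quantization, as described in the summary above, is a nearest-codebook lookup that turns a continuous feature vector into a discrete token id. The sketch below shows only that generic lookup; the function name `quantize`, the squared Euclidean distance, and the toy codebook are assumptions and say nothing about how VQ-Atom learns or structures its codebook.

```python
def quantize(features, codebook):
    """Map each feature vector to the index of its nearest codebook entry.

    features: list of vectors (e.g. one per atom environment, hypothetical).
    codebook: list of vectors of the same dimension.
    Returns a list of integer token ids, one per feature vector.
    """

    def sq_dist(u, v):
        # Squared Euclidean distance; the distance choice is an assumption.
        return sum((a - b) ** 2 for a, b in zip(u, v))

    return [
        min(range(len(codebook)), key=lambda k: sq_dist(f, codebook[k]))
        for f in features
    ]
```

Once atoms are reduced to token ids, downstream models can reuse the ids instead of recomputing continuous features, which is the training speedup the summary refers to.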
TOOL · arXiv cs.LG English(EN) · 16h

Quantum feature-map learning with reduced resource overhead

Researchers have developed a new algorithm called Q-FLAIR to reduce the computational resources needed for quantum machine learning feature maps. This method shifts significant workloads to classical computers, enabling the training of complex quantum models with fewer evaluations. Q-FLAIR has demonstrated state-of-the-art performance on classifiers and achieved over 90% accuracy on the MNIST dataset using a real IBM quantum device in just four hours, a feat previously considered unattainable due to hardware demands. AI

IMPACT Enables more complex quantum machine learning models to be trained on near-term quantum hardware.
- MNIST
- IBM
- Q-FLAIR
- Quantum Physics
TOOL · arXiv cs.CV English(EN) · 16h

Trustworthy Visual Predicates for Robust Manipulation Understanding under Degradation

Researchers have developed a new framework to assess the reliability of visual predicates used in understanding robotic manipulation. This framework evaluates how well predicates like contact, support, and grasp perform under various degradation conditions such as blur, occlusion, and frame dropping. Experiments on several datasets demonstrated that while static predicates are relatively robust, dynamic and derived predicates are more susceptible to errors, significantly impacting downstream manipulation understanding accuracy. AI

IMPACT Provides a diagnostic layer for improving robotic manipulation understanding by identifying weaknesses in visual predicate recognition under degraded conditions.
- VISOR/EPIC-KITCHENS
- Fatemeh Ziaeetabar
TOOL · arXiv cs.CV English(EN) · 16h

Thinking Without Images: Internalizing Visual Manipulation with On-Policy Self-Distillation

Researchers have developed a new self-distillation framework called Imagine-OPD to improve visual reasoning in AI models. This method trains models to "imagine" relevant visual cues rather than relying on external tools for image cropping, reducing inference time and computational cost. Experiments show Imagine-OPD outperforms existing methods on vision-centric benchmarks while being more efficient. AI

IMPACT This approach could lead to more efficient visual reasoning models, reducing computational costs for AI applications that rely on image analysis.
- Imagine-OPD
- arXiv
TOOL · arXiv cs.CV English(EN) · 16h

Test-Time Scaling in Multimodal Foundation Models: A Comprehensive Survey of Generation and Reasoning

A new survey paper details the emerging field of Test-Time Scaling (TTS) for Multimodal Foundation Models (MFMs). The paper categorizes existing TTS methods into sampling-based, feedback-based, and search-based approaches. It also outlines common applications, benchmarks, and future research directions for enhancing MFM performance in generation and reasoning tasks. AI

IMPACT Provides a structured overview and taxonomy for multimodal AI scaling research, guiding future development.
- Test-Time Scaling
- Multimodal Foundation Models
TOOL · arXiv cs.CV English(EN) · 16h

SSAFE: Simple and Strong AI-Generated Image Detection via Frozen Vision Encoders

Researchers have developed a new method for detecting AI-generated images using pre-trained multimodal vision encoders. This approach leverages the inherent separation of real and synthetic images within the embedding space of these frozen encoders, allowing a simple linear classifier to achieve high accuracy without extensive fine-tuning. The method also incorporates a data curation strategy that uses a compact set of representative generators, resulting in a smaller training dataset that improves robustness against unseen generators and distribution shifts. AI

IMPACT This research offers a more robust and efficient approach to detecting AI-generated images, which could be crucial for maintaining trust in digital media.
TOOL · arXiv cs.LG English(EN) · 16h

Adaptive Generate-Rank-Verify: Inference-Time Search with Costly Verification

Researchers have developed a new algorithm called ADAP for optimizing inference-time pipelines in language models. This method is designed for scenarios where a cheap reward signal is used alongside a more expensive verification process, such as checking mathematical solutions or executing code. ADAP adaptively increases the number of sampled responses and verifications to find a positive example efficiently, outperforming fixed or difficulty-adaptive baselines in experiments. AI

IMPACT Optimizes inference efficiency for complex language model tasks like code generation and mathematical reasoning.
- Mahdi Haghifam
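
The generate-rank-verify loop described above can be sketched as follows. This is a generic doubling scheme under stated assumptions, not the ADAP algorithm itself: the callables `generate`, `reward`, and `verify`, the starting budgets, and the doubling rule are all hypothetical, and candidates are assumed to be hashable.

```python
def generate_rank_verify(generate, reward, verify, n0=2, k0=1, max_rounds=5):
    """Sample, rank by a cheap reward, and verify only the top candidates.

    generate(n) -> list of n new candidates (hashable).
    reward(c)   -> cheap score, higher is better.
    verify(c)   -> expensive boolean check.
    Returns (candidate or None, number of verify calls made).
    """
    seen = []          # every candidate sampled so far
    checked = set()    # candidates already sent to the verifier
    n, k = n0, k0
    calls = 0
    for _ in range(max_rounds):
        seen.extend(generate(n))
        ranked = sorted(seen, key=reward, reverse=True)
        used = 0
        for cand in ranked:
            if used >= k:
                break
            if cand in checked:
                continue
            checked.add(cand)
            calls += 1
            used += 1
            if verify(cand):
                return cand, calls
        # No verified answer yet: grow both budgets (assumed doubling rule).
        n *= 2
        k *= 2
    return None, calls
```

Growing both budgets only after a failed round keeps easy queries cheap while still giving hard queries more samples and more verifier calls.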
TOOL · arXiv cs.LG English(EN) · 16h

Function-Vector Heads Are Two Populations: Writers and Cancellers in In-Context Learning

Researchers have identified two distinct populations within function-vector (FV) heads in large language models, challenging the assumption that these heads are a homogeneous group. By employing a sign-preserving criterion instead of magnitude-only ranking, they found that FV heads either push correct logits up (writers) or push them down (cancellers). This dual nature was observed across multiple model families and scales, and zero-ablating cancellers led to improved accuracy. AI

IMPACT Reveals a more nuanced understanding of how LLMs process information, potentially impacting future model interpretability and design.
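
To illustrate why a sign-preserving criterion matters, the sketch below ranks heads by effect magnitude and then splits the top group by sign. The head names, the effect values, and the function `split_heads` are hypothetical; the paper's actual scoring procedure is not described in the summary.

```python
def split_heads(effects, top_k):
    """Split the top-k heads by magnitude into writers and cancellers.

    effects: dict mapping head name -> signed effect on the correct-answer
    logit (positive pushes it up, negative pushes it down).
    Returns (writers, cancellers), each ordered by decreasing magnitude.
    """
    ranked = sorted(effects, key=lambda h: abs(effects[h]), reverse=True)[:top_k]
    writers = [h for h in ranked if effects[h] > 0]
    cancellers = [h for h in ranked if effects[h] < 0]
    return writers, cancellers
```

A magnitude-only ranking would return the same top-k set but hide the fact that some of those heads lower the correct logit.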
TOOL · arXiv cs.LG English(EN) · 16h

Beyond Homophily: Towards Generalized Graph Reconstruction Attack and Defense

Researchers have developed new methods for attacking and defending graph neural networks (GNNs) against information leakage. The study characterizes how graph properties like homophily and heterophily influence the recoverability of training data. Building on a Markov chain approximation, they propose an attack that reconstructs graph adjacency by aligning representations across GNN layers and a defense that suppresses this sensitive information while maintaining classification accuracy. AI

IMPACT Introduces new techniques for privacy preservation in GNNs, potentially impacting how sensitive graph data is handled.
TOOL · arXiv cs.LG English(EN) · 16h

Operator learning for the 2D incompressible Navier-Stokes equations: a conformal prediction approach in the data-scarce regime

Researchers have developed a new conformal prediction framework to quantify uncertainty in neural operator learning, specifically for the 2D incompressible Navier-Stokes equations. This method uses a perturbation-based approach to estimate uncertainty by comparing predictions from two similarly trained neural operators. It aims to provide calibrated uncertainty estimates efficiently, even in data-scarce scenarios, by avoiding the need for separate uncertainty networks. AI

IMPACT This method offers a more sample-efficient way to quantify uncertainty in complex physical simulations, potentially improving the reliability of AI models in scientific applications.
- Fourier Neural Operator
- 2D incompressible Navier-Stokes equations
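
A minimal sketch of split conformal prediction with a disagreement-scaled score is shown below for scalar outputs. Using the gap between two models' predictions as the local scale is an assumption inspired by the summary's description; the function names, the `eps` constant, and the scalar setting are illustrative and not taken from the paper.

```python
import math


def conformal_quantile(scores, alpha):
    """Finite-sample split-conformal quantile of calibration scores."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: interval is unbounded.
        return math.inf
    return sorted(scores)[rank - 1]


def calibrate(y_true, pred_a, pred_b, alpha=0.1, eps=1e-8):
    """Score = |error of model A| / (|A - B| + eps), an assumed normalization."""
    scores = [
        abs(y - a) / (abs(a - b) + eps)
        for y, a, b in zip(y_true, pred_a, pred_b)
    ]
    return conformal_quantile(scores, alpha)


def interval(pred_a, pred_b, q, eps=1e-8):
    """Prediction interval around model A, widened where the models disagree."""
    half = q * (abs(pred_a - pred_b) + eps)
    return pred_a - half, pred_a + half
```

Because the scale comes from a second trained model rather than a dedicated uncertainty network, no extra training target is needed beyond the held-out calibration set.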
TOOL · arXiv cs.CV English(EN) · 16h

Generalizing Geometry-Guided Mamba as a Plug-and-Play Context Module for CNN-based Semantic Segmentation

Researchers have adapted a geometry-guided Mamba model, originally from DGM-Net, to serve as a plug-and-play context module for CNN-based semantic segmentation. This approach injects geometric guidance into the selective scan process, enabling long-range feature propagation modulated by boundary and centripetal-flow cues. When integrated into six different CNN segmentation models, the geometry-guided SSM modules consistently improved mean Intersection over Union (mIoU) scores on the Cityscapes dataset with only a slight increase in computational cost. AI

IMPACT Enhances existing CNN segmentation models with improved context aggregation, potentially leading to more accurate image analysis in computer vision tasks.
- ResNet-101
- Geometry-Guided Mamba
- DGM-Net
- CNN
- Cityscapes
- DANet
- PSPNet
- OCRNet
TOOL · arXiv cs.LG English(EN) · 16h

State Backdoor: Towards Stealthy Real-world Poisoning Attack on Vision-Language-Action Model in State Space

Researchers have developed a new type of backdoor attack targeting Vision-Language-Action (VLA) models, which are crucial for embodied AI applications like robotics. Unlike previous methods that rely on visible visual triggers, this novel "State Backdoor" utilizes the initial state of a robot arm as the trigger. A Preference-guided Genetic Algorithm was employed to find minimal yet effective state-based triggers, achieving over 90% attack success without degrading performance on normal tasks. AI

IMPACT Reveals a new vulnerability in embodied AI, potentially requiring new security measures for robotic systems.
TOOL · arXiv cs.CV English(EN) · 16h

Learnable Token Sparsification for Efficient Gigapixel Whole Slide Image Reasoning

Researchers have developed a novel method for processing gigapixel whole slide images in vision language models by treating token reduction as a trainable sparsification problem. This approach, detailed in a new arXiv paper, allows the model to learn an optimal selection strategy for visual tokens, unlike previous methods that used non-trained downsampling or heuristic pruning. The proposed decoupled routing architecture and SparseLearn component enable gradient propagation through the pruning process, ultimately reducing the visual sequence to a sparse set of 32 tokens with minimal computational overhead during inference. This technique achieves high accuracy on benchmarks like SlideBench, offering an efficient paradigm for end-to-end gigapixel image reasoning. AI

IMPACT Enables more efficient and accurate analysis of large medical images by AI, potentially improving diagnostic capabilities.
- TCGA
- WSI VQA*
- SparseLearn
- SlideBench
TOOL · arXiv cs.CV English(EN) · 16h

Stain-Aware Wavelet Regularization for Instant Adversarial Purification in Histopathology

Researchers have developed a new method called Stain-Aware Wavelet Regularization (SAWR) to improve the robustness of deep learning models used in histopathology. This technique uses wavelet-domain regularization to separate adversarial noise from important tissue structures in medical images. SAWR also adapts this regularization to specific stain channels, enhancing its effectiveness and improving adversarial robustness by over 10% while preserving image quality. AI

IMPACT Enhances the reliability of AI in clinical diagnostics by mitigating adversarial attacks on histopathology images.
- Stain-Aware Wavelet Regularization
- Hematoxylin
TOOL · arXiv cs.LG English(EN) · 16h

Self-Consistent Generative Paths via Admissible Random Variational Transport

Researchers have introduced a new framework for understanding generative models, focusing on the concept of "self-consistent generative paths." This framework defines a path as self-consistent if it represents a random fixed point of admissible local variational transport corrections. The theory yields a metric called the random fixed-point path residual (R-FPR) to quantify the gap between a generated path and its correction, offering a principle for diagnosing and improving various generative models. AI

IMPACT Introduces a theoretical framework for unifying and improving various generative models, potentially impacting future research and development.
- Self-Consistent Generative Paths via Admissible Random Variational Transport
- arXiv cs.LG
TOOL · arXiv cs.LG English(EN) · 16h

Measuring a hate speech spectrum with faceted Rasch item response theory and perspective-aware, explainable-by-design deep learning

Researchers have developed a novel system to measure hate speech on a continuous spectrum, ranging from genocidal to supportive language. This approach combines supervised deep learning with faceted Rasch item response theory, breaking down hate speech into 10 ordinal labels. These labels are then probabilistically modeled to create an interval outcome measure, while also accounting for individual annotator perspectives. The system was applied to a dataset of 50,070 social media comments from YouTube, Twitter, and Reddit, annotated by over 11,000 Mechanical Turk workers. It uses a RoBERTa-based model that demonstrates improved accuracy over existing methods. AI

IMPACT Introduces a new paradigm for NLP that encourages continuous constructs and incorporates annotator perspective and model explainability.
TOOL · arXiv cs.LG English(EN) · 16h

Bulk-boundary decomposition of neural networks

Researchers have introduced the bulk-boundary decomposition, a novel framework for analyzing the training dynamics of deep neural networks. This approach separates the network's Lagrangian into a data-independent bulk term and a data-dependent boundary term. The bulk term characterizes the inherent dynamics influenced by network architecture and activation functions, while the boundary term reflects the stochastic interactions arising from training samples at the input and output layers. This decomposition reveals the local and homogeneous structure within deep networks, leading to the derivation of an energy continuity equation. AI

IMPACT Introduces a new theoretical lens for understanding and potentially optimizing neural network training processes.
- Donghee Lee
TOOL · arXiv cs.LG English(EN) · 16h

Exposing Hidden Biases in Text-to-Image Models via Automated Prompt Search

Researchers have developed a new framework called Bias-Guided Prompt Search (BGPS) to automatically uncover hidden biases in text-to-image models. This method uses an LLM to generate prompts that, when fed into image generation models, amplify specific attributes like gender or race. Experiments on Stable Diffusion revealed previously undocumented biases, highlighting vulnerabilities in current models and offering a new evaluation tool for bias mitigation efforts. AI

IMPACT This research provides a novel method for identifying and potentially mitigating biases in generative AI, crucial for responsible AI development.
TOOL · arXiv cs.CV English(EN) · 16h

Rethinking 3D Shape Generation: Diffusion over Superquadrics

Researchers have developed a new method for generating 3D shapes by diffusing over superquadric parameters instead of dense geometric representations. This approach significantly reduces the dimensionality of the diffusion state, requiring only 7 KB of parameters per shape. The diffusion-over-superquadrics method enables faster generation, improved scalability, and supports advanced capabilities like part-level editing and constraint-based design, while achieving competitive performance on standard benchmarks. AI

IMPACT Enables more efficient and controllable 3D shape generation, potentially impacting fields requiring rapid asset creation.
- Diffusion models
TOOL · arXiv cs.LG English(EN) · 16h

Cryptographic Backdoor for Neural Networks: Boon and Bane

Researchers have developed a method to embed cryptographic backdoors into neural networks, which can be used for both offensive attacks and defensive measures. These backdoors enable powerful, undetectable attacks while also facilitating provably robust watermarking, user authentication, and intellectual property tracking. The work draws inspiration from existing cryptographic techniques and has been demonstrated on modern neural network architectures, with potential for post-quantum applications. AI

IMPACT Introduces new methods for securing neural networks against unauthorized use and tampering.
TOOL · arXiv cs.LG English(EN) · 16h

phepy: Visual benchmarks and improvements for out-of-distribution detectors

Researchers have developed a new benchmark called "phepy" to evaluate out-of-distribution (OOD) detection methods in machine learning. This benchmark uses three novel, visually intuitive toy examples to assess a detector's ability to identify linear and non-linear concepts, as well as thin in-distribution subspaces within high-dimensional data. The study also explores methods for synthesizing OOD inputs for supervised training and introduces improvements like t-poking and OOD sample weighting to enhance detector precision at the decision boundary. AI

IMPACT Provides new tools and methods for improving the reliability of machine learning models in real-world, unpredictable scenarios.
- Andreas Rupp
TOOL · arXiv cs.LG English(EN) · 16h

Byzantine Cheap Talk: Adversarial Resilience and Topology Effects in LLM Coordination Games

Researchers have explored the vulnerabilities of multi-agent LLM systems that rely on communication for coordination. Their study found that when some agents act deceptively (Byzantine agents), others can detect the betrayal but struggle to adapt, leading to continued exploitation. The research also revealed that restricting communication pathways can degrade cooperation, even without an adversary present, by affecting the agents' meta-reasoning about hidden information. AI

IMPACT Reveals specific security vulnerabilities in LLM coordination, suggesting communication channels can be exploited and topology disclosure can degrade performance.
- Byzantine agents
- LLM
TOOL · arXiv cs.LG English(EN) · 16h

IR-SIM: A Lightweight Skill-Native Simulator for Navigation, Learning, and Benchmarking

Researchers have developed IR-SIM, a new lightweight simulator designed to streamline robotics research, particularly for tasks involving large language models. This simulator allows for the creation and modification of navigation scenarios using simple YAML configuration files and text prompts, making it easier to prototype and develop algorithms. IR-SIM also facilitates automated benchmarking and data generation for robot learning, with capabilities to bridge to higher-fidelity simulators and real-world deployments. AI

IMPACT Simplifies the development and benchmarking of AI-powered robot navigation systems.
- large language models
- IR-SIM
TOOL · arXiv cs.CV English(EN) · 16h

MB-Loc: Multi-planar Bird's-eye-view Localization in outdoor LiDAR scenes

Researchers have developed MB-Loc, a new framework for multi-planar bird's-eye-view localization in outdoor LiDAR scenes. This method addresses computational inefficiency and viewpoint sensitivity in existing scene coordinate regression techniques. MB-Loc projects LiDAR scans into a 2.5D representation, enabling faster processing with standard 2D CNNs while retaining crucial 3D geometric information. The framework also incorporates a KL-regularized latent bottleneck for spatial uncertainty modeling and 3D spatial augmentations for rotation robustness, outperforming current state-of-the-art methods on the NCLT dataset at real-time inference speeds. AI

IMPACT Enhances autonomous navigation systems by improving the efficiency and robustness of LiDAR localization.
TOOL · arXiv cs.LG English(EN) · 16h

Decentralized Online Riemannian Optimization Beyond Hadamard Manifolds

Researchers have developed a new decentralized online Riemannian optimization algorithm capable of operating beyond the limitations of Hadamard manifolds, extending its applicability to spaces with positive curvature. The algorithm incorporates a curvature-aware consensus step that facilitates linear convergence even in these more complex geometric settings. This advancement leads to an $O(\sqrt{T})$ regret bound for the decentralized online Riemannian gradient descent method, with similar bounds achieved in a two-point bandit feedback scenario using efficient gradient estimators. AI
- Emre Sahinoglu
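
For readers unfamiliar with the term, the $O(\sqrt{T})$ regret bound in the summary above can be read against the standard single-learner definition of regret in online optimization. The notation below ($f_t$, $x_t$, $\mathcal{X}$, $\mathcal{M}$) is generic textbook notation and an assumption here; the paper's decentralized definition may differ, for example by summing over agents.

```latex
% Regret after T rounds against the best fixed point in hindsight,
% with losses f_t and iterates x_t in a feasible set X on a manifold M.
\[
  \mathrm{Regret}_T \;=\; \sum_{t=1}^{T} f_t(x_t) \;-\; \min_{x \in \mathcal{X} \subseteq \mathcal{M}} \sum_{t=1}^{T} f_t(x)
\]
% A sublinear bound means the average regret vanishes as T grows:
\[
  \mathrm{Regret}_T = O(\sqrt{T})
  \quad\Longrightarrow\quad
  \frac{\mathrm{Regret}_T}{T} = O\!\left(\frac{1}{\sqrt{T}}\right) \to 0 .
\]
```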
TOOL · arXiv cs.CV English(EN) · 16h

RGB-S: Image-Aligned Tactile Saliency for Robust Dexterous Manipulation

Researchers have developed a new framework called RGB-S that explicitly aligns tactile sensor data with visual information for robotic manipulation. This method projects tactile sensor locations directly onto RGB images, creating saliency maps that account for spatial uncertainty. By integrating these 2D anchors, the system injects physical contact priors into visual models, improving their ability to handle unreliable or occluded visual inputs. Experiments demonstrated a significant improvement in success rates for dexterous manipulation tasks under severe visual occlusion. AI

IMPACT Enhances robotic manipulation capabilities by improving sensor fusion and robustness to visual occlusions.
- Robotic Dexterous Manipulation
TOOL · arXiv cs.LG English(EN) · 16h

Learning from flowsheets: A generative transformer model for autocompletion of flowsheets

Researchers have developed a novel method for autocompleting chemical flowsheets using a transformer-based language model. The approach represents flowsheets as strings and trains the model on their grammatical structure and common patterns. After pre-training on synthetic data and fine-tuning on real-world examples, the model can suggest completions for flowsheets, aiding chemical engineers in process synthesis. AI

IMPACT This AI-driven autocompletion could streamline chemical process design and accelerate innovation in the field.
- SFILES 2.0
- Lukas Schulze Balhorn
TOOL · arXiv cs.LG English(EN) · 16h

BlendServe: Optimizing Offline Inference for Auto-regressive Large Models with Resource-aware Batching

Researchers have developed BlendServe, a new system designed to optimize offline inference for auto-regressive large language models. BlendServe combines resource overlapping and prefix sharing techniques to maximize throughput and reduce costs for latency-insensitive applications. Evaluations show that BlendServe can achieve up to a 1.44x throughput increase compared to existing serving systems such as vLLM and SGLang. AI

IMPACT Optimizes LLM inference for cost and throughput, potentially lowering operational expenses for AI applications.
- vLLM
- SGLang
- Yilong Zhao
- BlendServe
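
The prefix-sharing idea in the summary above can be illustrated with a toy sketch. This is not BlendServe's scheduler, and it does not model resource overlapping; the function names and the token lists are hypothetical. It only shows why reordering an offline batch so that prompts with a common prefix sit next to each other reduces the number of tokens that have to be recomputed.

```python
def shared_prefix_len(a, b):
    """Length of the common leading run of tokens in a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def tokens_to_prefill(prompts):
    """Tokens computed if each prompt reuses only the prefix it shares
    with the prompt processed immediately before it (toy cost model)."""
    total = 0
    prev = []
    for p in prompts:
        total += len(p) - shared_prefix_len(prev, p)
        prev = p
    return total


# Hypothetical tokenized prompts: two share "sys A", two share "other".
prompts = [
    ["sys", "A", "q1"],
    ["other", "q2"],
    ["sys", "A", "q3"],
    ["other", "q4"],
]

unsorted_cost = tokens_to_prefill(prompts)
# Lexicographic sorting places prompts with equal prefixes side by side.
sorted_cost = tokens_to_prefill(sorted(prompts))
print(unsorted_cost, sorted_cost)
```

Because the workload is latency-insensitive, an offline system is free to reorder requests this way; an online server usually cannot delay requests to group them.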
TOOL · arXiv cs.LG English(EN) · 16h

Analysis of Information Theory for Explainable AI

Researchers have developed a new post-hoc visual explanation method for convolutional neural networks called MI CAM. This method utilizes activation mapping and weights feature maps based on their mutual information with the input image and the network's final output. MI CAM aims to provide causal interpretations and has demonstrated performance on par with or exceeding state-of-the-art methods in qualitative and quantitative measures. AI

IMPACT Provides a novel method for understanding AI decision-making, potentially improving trust and debugging in critical applications.
- Ram S Iyer
TOOL · arXiv cs.LG English(EN) · 16h

Priors Persist Through Suppression: A Stroop Paradigm for Lexical Override

Researchers have developed a Stroop-style paradigm to investigate how language models handle conflicting instructions. Their experiments, conducted across 11 open-weight models, reveal that lexical priors persist through override rather than being replaced. Activation patching on aligned models pinpointed a specific source-position triplet crucial for binding these conflicting pieces of information. AI

IMPACT This research offers a new method for probing LLM behavior, potentially leading to better understanding and control of their responses.
TOOL · arXiv cs.LG English(EN) · 16h

Temporal Coverage over Density: Parsimonious Training-Set Design for ML Climate Downscaling

Researchers have developed a new method for training machine learning models to downscale climate data, focusing on how to select training years effectively. Their study, using the CESM2 Large Ensemble, found that training models on years distributed across the entire climate trajectory, rather than contiguous historical periods, significantly improves their ability to reproduce climate variability. This approach, even with limited data, outperforms models trained solely on historical data and suggests that broad sampling of climate states is more beneficial than temporal continuity for allocating scarce high-resolution simulation resources. AI

IMPACT Optimizes training data selection for climate models, potentially improving accuracy and efficiency in climate impact assessments.
- CESM2 Large Ensemble
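
The contrast between contiguous and distributed training years described above can be sketched in a few lines. This is only an illustration of the two sampling schemes, not the study's selection procedure; the year range, the budget of six years, and the function names are assumptions chosen for the example.

```python
def contiguous_years(start, k):
    """Pick k consecutive years beginning at start (a historical block)."""
    return list(range(start, start + k))


def spread_years(start, end, k):
    """Pick k years spaced evenly across the whole range [start, end]."""
    if k == 1:
        return [start]
    step = (end - start) / (k - 1)
    return [round(start + i * step) for i in range(k)]


# Assumed simulation span and training budget, for illustration only.
FIRST_YEAR, LAST_YEAR, BUDGET = 1850, 2100, 6

historical_block = contiguous_years(FIRST_YEAR, BUDGET)
distributed = spread_years(FIRST_YEAR, LAST_YEAR, BUDGET)

print(historical_block)
print(distributed)
```

With the same budget, the distributed selection touches both early and late climate states, which is the property the summary credits for better reproduction of variability.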
TOOL · arXiv cs.LG English(EN) · 16h

Benchmark Datasets for Lead-Lag Forecasting on Social Platforms

Researchers have introduced a new framework called Lead-Lag Forecasting (LLF) to address the challenge of predicting future impacts based on early user interactions on social platforms. To support this research, they have created two large benchmark datasets derived from arXiv and GitHub, encompassing millions of papers and repositories respectively. These datasets are designed to capture long-term dynamics and avoid sampling biases, providing a foundation for developing and testing LLF models. AI

IMPACT Establishes a new forecasting paradigm for analyzing long-term user behavior dynamics on social platforms.