Brief

last 24h

[50/962] 223 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
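
As a rough illustration of how four such signals could be folded into a single 0–100 score, here is a minimal sketch. The feed does not publish its formula, so the weights, the 12-hour half-life, and the name `score_item` are all assumptions made for this example.

```python
def score_item(authority, cluster_strength, headline_signal, age_hours,
               half_life_hours=12.0, weights=(0.4, 0.35, 0.25)):
    """Combine three signals in [0, 1] and apply exponential time decay.

    The weights and half-life are hypothetical; the feed's real formula
    is not published.
    """
    for value in (authority, cluster_strength, headline_signal):
        if not 0.0 <= value <= 1.0:
            raise ValueError("signals must be in [0, 1]")
    w_auth, w_cluster, w_headline = weights
    base = (w_auth * authority
            + w_cluster * cluster_strength
            + w_headline * headline_signal)
    # The score halves every half_life_hours.
    decay = 0.5 ** (age_hours / half_life_hours)
    return round(100.0 * base * decay, 1)
```

With these assumed defaults, an item that maxes out every signal scores 100 when fresh and 50 after 12 hours.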

TOOL · 36氪 (36Kr) 中文(ZH) · 11h

Zhuque-2 Improved Version 6 successful launch

Apple has unveiled an updated Siri with enhanced AI capabilities, aiming to improve user interaction and functionality. In other tech news, Rokid has addressed allegations of its smart glasses being used for illicit recording. Meanwhile, OpenAI is reportedly preparing for an Initial Public Offering (IPO) by submitting confidential filings. AI

IMPACT Apple's Siri AI update may improve user experience, while OpenAI's IPO filing signals potential market shifts.
- Rokid
- OpenAI
- Apple
- Siri
TOOL · Medium — Claude tag English(EN) · 10h

The Claude Code Productivity Hack: Automating Client-Ready HTML Reports

This article details a method for using Anthropic's Claude AI to automate the creation of client-ready HTML reports. It focuses on a technique to eliminate the "Translation Tax" by enabling an autonomous agent to manage its own project workflow through custom skills. The guide provides a practical approach for developers to streamline report generation. AI

IMPACT Provides a practical guide for developers to leverage AI for automated report generation, potentially improving workflow efficiency.
- Claude
- Anthropic
TOOL · Towards AI English(EN) · 14h

When Your Documents Aren’t Just Text: Training Vision Models for Document Understanding

Training AI models on technical documents often overlooks crucial visual information like diagrams and charts, leading to incomplete understanding. Standard text extraction methods discard these elements, resulting in models trained on data with significant meaning gaps. To address this, a computer vision approach using YOLO was employed to detect, classify, and extract these visual components, enabling their integration with textual data for more comprehensive document understanding. AI

IMPACT Improves AI model training by enabling the capture of visual data, leading to better understanding of complex technical documents.
- YOLO
- AI
TOOL · dev.to — MCP tag English(EN) · 10h

4 New MCP Servers Closing East Africa's Coordination Gap

Four new open-source MCP servers were released this week, each designed to address specific coordination failures in East Africa. These servers cover alternative credit scoring using M-PESA data, commodity price intelligence for farmers, insurance solutions including crop risk scoring, and a portable reputation and skills passport for workers. The projects aim to build essential economic infrastructure by leveraging technology to overcome information asymmetry and improve access to financial services and markets. AI

IMPACT These tools leverage AI for credit scoring and decision-making, potentially improving economic participation and access to financial services in the region.
- bima-mcp
- sifa-mcp
- gabrielmahia
- MCP
- East Africa
- mkopo-mcp
- M-PESA
- soko-mcp
TOOL · dev.to — MCP tag English(EN) · 11h

MCPEasy: Stop Rewriting MCP Integration Boilerplate

MCPEasy is a new framework designed to streamline the integration of various MCP tools by eliminating repetitive boilerplate code. It offers unified authentication, a tool registry for automatic discovery, and standardized error handling with features like automatic retries and logging. This framework significantly reduces the amount of code required for tool integration, improving security and maintainability. AI

IMPACT Simplifies developer workflows for integrating AI tools, potentially accelerating adoption.
- MCPEasy
- MCP
TOOL · Tom's Hardware English(EN) · 10h

Linux developers are using AI vibe coding to keep vintage AMD GPUs alive — R600 driver cleaned up with GitHub Copilot gives HD 2000 to HD 6000 series a new lease of life

Linux developers are leveraging AI coding assistants like GitHub Copilot to maintain legacy graphics drivers for older AMD GPUs. This AI-assisted approach helps a small team of maintainers update code for hardware dating back to the late 2000s, ensuring continued functionality for users of these vintage cards. The Linux kernel community has adopted a policy to tag AI-assisted code, with the ultimate responsibility for testing and publishing changes resting with the human developer. AI

IMPACT AI coding assistants are proving useful for maintaining legacy hardware drivers, potentially extending the life of older computing equipment.
- HD 6000
- Gallium3D
- Gert Wollny
- Linus Torvalds
- Mesa
- R600
- Linux
- AMD
- GitHub Copilot
TOOL · 36氪 (36Kr) 中文(ZH) · 11h

Securities Association of China Launches Call for AI Application Cases in Securities Industry, Covering Six Major Categories of Scenarios

The Securities Association of China is launching a call for AI application case studies within the securities industry. These cases will focus on six key areas: operational efficiency, customer experience, decision support, innovation, R&D, and risk compliance, with an additional category for emerging applications like intelligent agents. The initiative aims to identify and promote effective and replicable AI implementations across securities firms. AI

IMPACT This initiative could accelerate AI adoption and best practice sharing within China's securities sector, potentially leading to improved efficiency and innovation.
TOOL · Medium — Claude tag English(EN) · 10h

I Built a Skill That Got Merged Into a 211,000-Star GitHub Repo.

A developer successfully contributed a new skill to the popular ECC GitHub repository, which functions as an agent harness system. This repository has garnered over 211,000 stars, indicating significant community interest and adoption. The integration of the new skill highlights the collaborative and open-source nature of AI development, allowing for community contributions to enhance agent capabilities. AI

IMPACT Demonstrates community-driven enhancement of AI agent systems, potentially leading to broader adoption of specialized skills.
TOOL · dev.to — LLM tag English(EN) · 12h

LLM Spend Audit: The 45-Minute Diagnostic for Startups

This article outlines a 45-minute diagnostic process for startups to audit and control their spending on large language models (LLMs). It emphasizes that LLM costs often escalate due to numerous small, unmonitored calls across various functions like retries, background jobs, and internal tools, rather than single expensive prompts. The audit involves mapping all LLM call paths, attaching costs to specific units of value, identifying waste from retries and tool calls, strategically assigning tasks to cheaper models where appropriate, and implementing budget guardrails with clear ownership. AI

IMPACT Provides a structured approach for AI operators to identify and reduce unnecessary LLM operational costs.
- LLM
- startups
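
The step of attaching costs to call paths can be sketched as a small aggregation over call logs. This is a minimal sketch, not the article's tooling: the log fields (`path`, `model`, `input_tokens`, `output_tokens`), the model names, and the prices in `PRICES` are all hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical prices in USD per 1M tokens; replace with your provider's real rates.
PRICES = {
    "large": {"input": 3.00, "output": 15.00},
    "small": {"input": 0.25, "output": 1.25},
}


def cost_by_path(calls):
    """Sum estimated spend per call path from a list of call-log dicts."""
    totals = defaultdict(float)
    for call in calls:
        price = PRICES[call["model"]]
        cost = (call["input_tokens"] * price["input"]
                + call["output_tokens"] * price["output"]) / 1_000_000
        totals[call["path"]] += cost
    # Most expensive paths first, so spend from retries or background jobs stands out.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Sorting by total spend rather than per-call cost reflects the article's point that many small calls, not single expensive prompts, tend to drive the bill.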
TOOL · Medium — Claude tag English(EN) · 10h

7 Claude Code Slash Commands That Saved Me 10+ Hours Every Month

This article highlights seven specific slash commands within Claude Code that can significantly boost developer productivity. The author claims these commands have saved them over 10 hours per month by streamlining common coding tasks. The piece suggests that many users are not fully leveraging Claude Code's capabilities, leading to wasted time. AI

IMPACT Offers practical tips for users of an AI coding assistant to improve efficiency.
- Medium
- Claude Code
TOOL · Medium — Claude tag (CA) · 12h

Claude Code Error 429 Fix: Rate Limit Exceeded (2026)

This article addresses the "Claude Code Error 429: Rate Limit Exceeded," a common issue encountered when using Anthropic's AI models. It explains that this error signifies that too many requests have been made to the API within a given timeframe. The piece offers guidance on how to resolve this by implementing strategies such as exponential backoff, request queuing, and optimizing API calls to manage usage and avoid hitting rate limits. AI

IMPACT Helps developers manage API usage and avoid errors when integrating Claude models into their applications.
- Claude
- Anthropic
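
Exponential backoff, one of the strategies the article names, can be sketched in a few lines. This is a minimal sketch under stated assumptions: `RateLimitError` is a hypothetical stand-in for whatever exception your client raises on HTTP 429, and the delay values are illustrative defaults, not documented limits.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the error a client raises on HTTP 429."""


def call_with_backoff(request, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call request() and retry on RateLimitError with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # The delay cap doubles each attempt (base, 2*base, 4*base, ...) up to max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(random.uniform(0, cap))
```

The `sleep` parameter exists only so the wait can be replaced in tests; in normal use the default `time.sleep` applies.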
TOOL · arXiv cs.AI English(EN) · 16h

Customer Churn Prediction on Structured Data Using FT-Transformer and Stacking Ensembles

Researchers have developed a new hybrid model for predicting customer churn on structured data, combining a feature-tokenized transformer (FT-Transformer) with XGBoost. This approach aims to capture complex feature interactions and improve probability calibration, addressing challenges like class imbalance and nonlinear relationships. Tested on a public bank churn dataset, the model achieved an F1 score of 62.10% and an AUC-ROC of 0.861, outperforming a standard Multi-Layer Perceptron baseline. AI

IMPACT Introduces a novel hybrid architecture for structured data prediction, potentially improving accuracy in business applications like customer retention.
TOOL · arXiv cs.LG English(EN) · 16h

TriHead-GAN: A Generative Adversarial Network with Triple-Head Discriminator for Carbon Emission Time Series Generation

Researchers have developed TriHead-GAN, a novel generative adversarial network designed to create synthetic carbon emission time series data. This model addresses the scarcity of high-frequency monitoring data, which hinders deep learning applications in climate policy and regulation. TriHead-GAN's unique triple-head discriminator ensures the generated data accurately reflects cross-variable correlations and realistic temporal variability, outperforming existing methods in experiments. AI

IMPACT Enables more robust AI models for climate monitoring and policy by addressing data scarcity.
TOOL · arXiv cs.AI English(EN) · 16h

SatIR: Scalable High-Recall Constraint-Satisfaction-Based Information Retrieval for Clinical Trials Matching

Researchers have developed SatIR, a novel retrieval system designed to improve the matching of patients to clinical trials. This system goes beyond simple semantic similarity by treating trial eligibility criteria as formal constraints that must be satisfied. SatIR integrates Satisfiability Modulo Theories (SMT), relational algebra, medical ontologies, and LLMs to convert complex clinical information into executable constraints, enabling more accurate and efficient trial matching. AI

IMPACT This approach could significantly improve patient access to relevant clinical trials by overcoming limitations of traditional similarity-based search.
- Zikai Zhou
- SMT
- LLMs
- SatIR
TOOL · arXiv cs.AI English(EN) · 16h

Crop Recommendation and Agricultural Query Answering System Using Spatio-Temporal Graph Neural Networks and Hybrid Retrieval Augmentation

Researchers have developed a system for precision agriculture that uses Spatio-Temporal Graph Neural Networks (STGCN) and a Transformer-based model to forecast weather for the next 30 days across 1,359 locations in Nepal. The STGCN model demonstrated superior accuracy in predicting weather patterns. This system combines weather forecasts with soil data to provide localized crop recommendations and includes a Retrieval-Augmented Generation chatbot to answer farmers' questions in natural language, all accessible via a mobile application. AI

IMPACT Enhances agricultural decision-making with AI-driven weather forecasts and crop recommendations, potentially improving yields and resilience.
TOOL · arXiv cs.AI English(EN) · 16h

DIYHealth Suite: Dataset, Model, and Benchmark for Health Management at Home

Researchers have introduced the DIYHealth Suite, a new framework aimed at advancing AI-powered health management within home settings. This suite includes a large-scale multimodal dataset called DIYHealth-900K, designed to capture diverse real-world home care scenarios. It also features DIYHealthGPT, an adaptive foundation model utilizing a novel Hybrid Hyper Low-Rank Adaptation technique, and DIYHealthBench, the first benchmark specifically for evaluating foundation models on home care tasks. Experiments show DIYHealthGPT achieving state-of-the-art performance across 11 home care tasks. AI

IMPACT This framework could enable more accessible and personalized AI-driven health monitoring and management outside of clinical settings.
TOOL · arXiv cs.AI English(EN) · 16h

PolyBuild: An End-to-End Method for Polygonal Building Contour Extraction from High-Resolution Remote Sensing Images

Researchers have developed PolyBuild, a novel end-to-end method for extracting building polygon contours directly from high-resolution remote sensing images. This approach bypasses the need for computationally intensive post-processing steps common in existing methods. PolyBuild utilizes an Initial Contour Generation Module for initial extraction and a Contour Optimization Module, incorporating CNN and Transformer features, to refine the contours, achieving superior performance on multiple datasets. AI

IMPACT This method could streamline mapping applications by automating building contour extraction from remote sensing data.
- PolyBuild
- Yaoteng Zhang
TOOL · arXiv cs.AI English(EN) · 16h

Anchor-Conditioned Compositional Control for Landscape Image Generation

Researchers have developed a new framework for fine-tuning diffusion models to enhance compositional control in landscape image generation. This method uses a four-dimensional compositional anchor vector, integrated via a decoupled cross-attention mechanism, to guide image creation. Evaluations show significant improvements in horizon detection and adherence to the rule of thirds, with precision found to be category-dependent. AI

IMPACT Introduces a novel technique for fine-grained control over AI image generation, potentially improving artistic and photographic applications.
TOOL · arXiv cs.AI English(EN) · 16h

Closing the Sim-to-Real Gap: An Evaluation Framework for Autonomous Cyber Defense Configuration of Commercial EDR

Researchers have developed a new framework to evaluate autonomous cyber defense agents that configure commercial Endpoint Detection and Response (EDR) systems. This framework addresses the challenge of a "sim-to-real" gap, where autonomous agents interact with complex, black-box EDR tools like Microsoft Defender XDR. The evaluation, conducted in a simulated Active Directory environment, revealed that commercial EDR telemetry is not optimized for benchmarking, and the autonomous EDR behavior can fluctuate during testing. AI

IMPACT This framework could improve the reliability and safety of AI-driven cybersecurity tools by addressing the sim-to-real gap.
TOOL · arXiv cs.AI English(EN) · 16h

Comparative evaluation of training strategies using partially labelled datasets for segmentation of white matter hyperintensities and stroke lesions in FLAIR MRI

Researchers have developed and evaluated six strategies for training deep learning models to segment white matter hyperintensities and stroke lesions in MRI scans, particularly when dealing with partially labeled datasets. Their analysis, conducted on a large cohort of 2,052 MRI volumes, found that pseudolabeling was the most effective method for improving model performance. This approach demonstrates the potential for creating reliable automated segmentation tools to aid in monitoring cerebral small vessel disease and extracting biomarkers for clinical research. AI

IMPACT Demonstrates a viable method for training AI models on limited labeled data, potentially accelerating clinical research and disease monitoring.
TOOL · arXiv cs.AI English(EN) · 16h

CURE: Curriculum-guided Multi-task Training for Reliable Anatomy Grounded Report Generation

Researchers have developed CURE, a new framework designed to improve the accuracy and reliability of AI-generated radiology reports. This error-aware curriculum learning approach enhances visual grounding and factual consistency without requiring additional data. By dynamically adjusting training to focus on more challenging samples, CURE significantly improves grounding accuracy and report quality while reducing hallucinations. AI

IMPACT Enhances AI's ability to generate reliable medical reports, potentially improving diagnostic efficiency and accuracy.
- Pablo Messina
TOOL · arXiv cs.AI English(EN) · 16h

Robust Renal Mass Segmentation on CT: A Validation Study of an AI-Based Framework

Researchers have developed Renal-Net, an AI framework for segmenting renal masses on CT scans, aiming to improve objective assessment of kidney volume and lesions. The algorithm, built using the nnU-Net framework and trained on public data, demonstrated strong generalization and outperformed existing state-of-the-art models. Validation across various patient subgroups and CT contrast phases confirmed the algorithm's robustness and reliability, with the code made publicly available. AI

IMPACT Enhances objective assessment of kidney volume and lesions, potentially improving clinical workflows for renal disease diagnosis and monitoring.
- nnU-Net
- Sarah De Boer
TOOL · arXiv cs.AI English(EN) · 16h

From Statute to Control Flow: Span-Grounded Deontic Trees for Defeasible Scope Parsing

Researchers have introduced NormBench, a new benchmark designed to evaluate how well AI models can understand and parse legal and policy documents, specifically focusing on identifying nested exceptions and counter-exceptions. The benchmark uses Span-Grounded Deontic Trees (SG-DT) to represent rules and their exceptions, allowing for more precise scope parsing. Evaluations of current large language models revealed issues like "Recursion Decay" and an "Auditability Trap," indicating difficulties in handling complex rule structures and exceptions, though SG-DT showed promise in improving performance on these specific challenges. AI

IMPACT Highlights limitations in current LLMs for precise legal and policy interpretation, suggesting a need for improved reasoning and auditability in rule-following agents.
TOOL · arXiv cs.AI English(EN) · 16h

Bidirectional Semantic Complementary Tool Retrieval for Remote Sensing Agents

Researchers have developed a new method for improving how AI agents retrieve specialized tools for processing remote sensing data. The approach addresses the challenge of semantic asymmetry between general user intentions and specific tool documentation. By enhancing queries with functional semantics and enriching tool descriptions with contextual information, the system aims to improve retrieval accuracy for complex tasks. AI

IMPACT Enhances AI agent capabilities in specialized domains like remote sensing, potentially improving efficiency and accuracy.
TOOL · arXiv cs.AI English(EN) · 16h

Graph2Idea: Retrieval-Augmented Scientific Idea Generation with Graph-Structured Contexts

Researchers have developed Graph2Idea, a new framework designed to enhance the generation of scientific research ideas. This system utilizes knowledge graphs to structure retrieved literature, moving beyond the limitations of flat text contexts. By transforming papers into knowledge triples and constructing a target-centered graph, Graph2Idea extracts relevant relational evidence while reducing noise, ultimately guiding LLMs to synthesize more novel, high-quality, and feasible research concepts. AI

IMPACT This framework could improve the efficiency and creativity of scientific research by leveraging structured knowledge graphs for idea generation.
TOOL · arXiv cs.AI English(EN) · 16h

Syll: Open-Source Personal Automation with Cross-Surface Execution

Researchers have introduced Syll, an open-source personal automation system designed to operate across various interfaces including APIs, command lines, and graphical user interfaces. Syll allows users to teach agents by direct demonstration, which are then compiled into reusable skills. The system provides multimodal evidence of agent execution, such as logs and checkpoints, for user inspection and control. Syll externalizes memory, skills, and routines as editable local artifacts, aiming to provide a practical foundation for extensible personal automation. AI

IMPACT Provides a foundation for teachable, inspectable personal AI agents across diverse computing interfaces.
TOOL · arXiv cs.AI English(EN) · 16h

Baichuan-M4: A Clinical-Grade Medical Agent System for Continuous Care

Baichuan Intelligence has introduced Baichuan-M4, a medical large model designed for continuous patient care. This system integrates a unified runtime for consistent training and deployment, a core reasoning model trained with reinforcement learning for long-term patient memory and multi-agent coordination, and a clinical tool layer for evidence retrieval and multimodal understanding. Baichuan-M4 demonstrates leading performance across various medical evaluations, including static knowledge, dynamic consultations, and image analysis, while significantly reducing hallucination rates. AI

IMPACT This advanced medical AI system could set new benchmarks for continuous patient care and diagnostic accuracy in healthcare.
TOOL · arXiv cs.LG English(EN) · 16h

QDSP: An Interpretable Structured Learning Framework for Predicting Death or Cerebral Palsy in Very Low Birth Weight Infants

Researchers have developed QDSP, a novel interpretable structured learning framework designed to predict mortality or cerebral palsy in very low birth weight infants. The framework integrates Quota-guided Subspace Sampling (QSS) and Differentiable-decision-guided Structure Perception (DSP) to model complex clinical interactions and identify key predictors. QDSP demonstrated high accuracy and AUC on a real-world cohort and public datasets, outperforming existing machine learning models and providing clinically relevant insights. AI

IMPACT Provides a more accurate and interpretable tool for high-risk infant prognostication, potentially improving clinical decision-making.
TOOL · arXiv cs.AI English(EN) · 16h

Rule-based autocorrection of Piping and Instrumentation Diagrams (P&IDs) on graphs

Researchers have developed a novel rule-based method to automatically detect and correct errors in Piping and Instrumentation Diagrams (P&IDs), which are crucial documents in chemical process engineering. The system represents P&IDs as graphs and applies rule graphs to identify and fix discrepancies, significantly reducing the manual workload associated with reviewing hundreds or thousands of pages. A case study demonstrated the method's reliability and effectiveness, utilizing 33 developed rules and the pyDEXPI Python package for P&ID graph generation. AI

IMPACT Automates a critical, labor-intensive task in chemical engineering, potentially speeding up design and review cycles.
TOOL · arXiv cs.AI English(EN) · 16h

How Small Can You Go? LoRA Fine-Tuning 270M-8B Models for Merchant Information Extraction in Financial Transactions

Researchers explored fine-tuning smaller language models for financial transaction merchant information extraction, aiming to reduce the costs associated with larger models. Their study evaluated 24 variants across four model families, including Gemma, Qwen, Aya, and LLaMA, focusing on accuracy, throughput, and training cost. Findings indicate that models like Qwen 3.5 4B and even the 0.8B version offer competitive performance with significantly fewer parameters and better latency, making them viable alternatives for production deployment. AI

IMPACT Demonstrates that smaller, more efficient models can achieve comparable performance to larger ones for specific tasks, potentially lowering operational costs and increasing accessibility.
- Qwen 3.5
- Aya
- Cohere2
- Databricks
- LLaMA 3.1-8B
- Gemma 3
TOOL · arXiv cs.AI English(EN) · 16h

AI-Integrated Learning Management System for Middle School: A Longitudinal Study of Learning Outcomes Through High School and Beyond

Researchers have developed an AI-integrated Learning Management System (LMS) designed for middle school students to provide timely and targeted support. This system aims to offer formative feedback, recommend practice based on mastery, and alert teachers to persistent struggles, addressing the common issue of students receiving help too late. The platform prioritizes privacy with a data minimization approach and auditable logs, and its effectiveness will be studied longitudinally through high school and beyond to assess its impact on learning trajectories. AI

IMPACT This system could improve educational outcomes by providing personalized, timely support to students, potentially altering long-term learning trajectories.
TOOL · arXiv cs.AI English(EN) · 16h

A large-scale nanocrystal database with aligned synthesis and properties enabling generative inverse design

Researchers have developed a new method for designing nanocrystal synthesis using AI, addressing the historical trial-and-error approach. They created NanoExtractor, an LLM-enhanced tool that extracts structured synthesis data from literature, achieving high accuracy compared to other models. This data forms the basis of the Nanocrystal Synthesis-Property (NSP) database, which contains nearly 160,000 entries and powers NanoDesigner, an LLM capable of inverse synthesis design. NanoDesigner has successfully proposed viable synthesis routes for known and novel nanocrystals, demonstrating a powerful human-AI collaboration for accelerating materials discovery. AI

IMPACT Enables AI-driven discovery of new materials and synthesis processes, accelerating scientific research.
TOOL · arXiv cs.AI English(EN) · 16h

Web Agents Should Use Typed Actions Instead of Click-Based Browsing

A new position paper proposes a shift from low-level, click-based interactions to typed actions for web agents. This approach, termed 'web verbs,' would expose web operations as typed functions with structured inputs and outputs, enhancing reliability and auditability for long-horizon tasks. The authors argue that this semantic layer is crucial for building trustworthy and scalable agentic web systems. AI

IMPACT This proposal could lead to more reliable and auditable web agents, improving their ability to perform complex, long-horizon tasks.
- Web Agents
- Linxi Jiang
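
To make the contrast with click-based browsing concrete, here is a minimal sketch of what a typed action could look like. It is not taken from the paper: `add_to_cart`, `AddToCartInput`, and `AddToCartResult` are hypothetical names, and the in-memory `cart` dict stands in for a real site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddToCartInput:
    product_id: str
    quantity: int


@dataclass(frozen=True)
class AddToCartResult:
    cart_size: int


def add_to_cart(cart, request):
    """Hypothetical typed 'verb': validated structured input, structured output."""
    if request.quantity < 1:
        raise ValueError("quantity must be at least 1")
    cart[request.product_id] = cart.get(request.product_id, 0) + request.quantity
    return AddToCartResult(cart_size=sum(cart.values()))
```

An agent calling such a function produces an auditable record of the call and its arguments, rather than a sequence of coordinates and clicks.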
TOOL · arXiv cs.AI English(EN) · 16h

Page image classifier fine-tuned on century-spanning archives of scanned documents for further content-specific processing

Researchers have developed a highly accurate image classification system for historical documents, capable of distinguishing between text, tables, and graphics. Fine-tuned deep learning models, specifically RegNetY-16GF and ViT-large, achieved over 99% accuracy on a dataset of over 48,000 scanned pages. This system is designed to facilitate content-specific processing in large-scale digitization projects, with the models, dataset, and software made publicly available under open-source licenses. AI

IMPACT Enables efficient content-specific processing for large historical document archives, accelerating digitization efforts.
TOOL · arXiv cs.AI English(EN) · 16h

MedicalRec: Medical recommender system for image classification without retraining

Researchers have developed a transformer-based recommender system called MedicalRec to help select optimal machine learning models for medical image classification tasks. This system aims to reduce the energy consumption and waste associated with the trial-and-error process of model selection. MedicalRec was evaluated on a new dataset, MedicalRec-Bench, which contains over 5,000 records of models tested across various medical imaging categories, achieving a HitRate@100 of 75.5%. The dataset and code are publicly available. AI

IMPACT Reduces computational waste in AI model selection for medical imaging, potentially accelerating research and deployment.
- MedicalRec-Bench
- MedicalRec
TOOL · arXiv cs.AI English(EN) · 16h

Test-Time Adaptive Composition for Machine Learning as a Service (MLaaS) in IoT Environments

Researchers have developed a new Test-Time Adaptive (TTA) composition framework designed to improve the effectiveness of Machine Learning as a Service (MLaaS) in dynamic Internet of Things (IoT) environments. This framework addresses challenges with existing adaptive methods by introducing a TTA-aware composability model to ensure service compatibility and a service-level adaptation model to adjust individual services during inference. Experiments show this approach significantly reduces computational time compared to traditional methods. AI

IMPACT Enhances the reliability and efficiency of ML services in dynamic IoT settings, potentially enabling more robust real-time applications.
- Sai Krishna Deepak Kanneganti
- Machine Learning as a Service (MLaaS)
TOOL · arXiv cs.AI English(EN) · 16h

Larch: Learned Query Optimization for Semantic Predicates

Researchers have developed Larch, a new framework designed to optimize the execution of semantic filters within AI SQL queries. Larch addresses the high inference costs and latencies associated with semantic operators, which treat AI-generated filters as black boxes, hindering traditional optimization. The framework utilizes embedding-augmented neural networks and supervised learning models to predict filter selectivities and determine optimal evaluation orders, significantly reducing token usage. AI

IMPACT Optimizes AI-driven database queries, potentially reducing costs and improving performance for AI-powered data analysis.
- Palimpzest
- Quest
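
Why predicted selectivities matter for evaluation order can be shown with the classic ordering rule for independent filters. This is a minimal sketch, not Larch's learned method: the selectivities are supplied as inputs rather than predicted, and the function names and tuple layout are assumptions.

```python
def order_filters(filters):
    """Order (name, cost, selectivity) filters by ascending cost / (1 - selectivity).

    Selectivity is the fraction of rows that pass the filter. For independent
    filters this ordering minimizes the expected per-row cost.
    """
    def rank(f):
        _, cost, selectivity = f
        drop_rate = 1.0 - selectivity
        # A filter that drops nothing is never worth running early.
        return float("inf") if drop_rate <= 0 else cost / drop_rate
    return sorted(filters, key=rank)


def expected_cost(filters):
    """Expected per-row cost when each filter only sees rows that passed earlier ones."""
    total, surviving = 0.0, 1.0
    for _, cost, selectivity in filters:
        total += surviving * cost
        surviving *= selectivity
    return total
```

When cost is measured in tokens per row, running a cheap, highly selective filter first means later LLM-backed filters see fewer rows.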
TOOL · arXiv cs.AI English(EN) · 16h

Explainable AML Triage with LLMs: Evidence Retrieval and Counterfactual Checks

Researchers have developed a new framework for anti-money laundering (AML) transaction monitoring that leverages large language models (LLMs) for improved explainability and accuracy. This system treats triage as an evidence-constrained decision process, combining retrieval-augmented evidence bundling with LLMs that provide structured outputs and explicit citations. The framework also incorporates counterfactual checks to validate decisions and rationales against plausible perturbations, aiming to reduce hallucinations and enhance auditability in regulated workflows. AI

IMPACT Governed LLM systems can provide practical decision support for AML triage without sacrificing compliance requirements for traceability and defensibility.
- LLMs
- Dorothy Torres
TOOL · arXiv cs.AI English(EN) · 16h

MEnvAgent: Scalable Polyglot Environment Construction for Verifiable Software Engineering

Researchers have developed MEnvAgent, a framework designed to automate the creation of executable software engineering environments across multiple programming languages. This system addresses the scarcity of verifiable datasets for training AI agents by employing a Planning-Execution-Verification architecture and an environment reuse mechanism to reduce computational costs. Evaluations on the MEnvBench benchmark showed MEnvAgent improved task completion rates by 8.6% and reduced time costs by 43%, also enabling the creation of the largest open-source polyglot dataset for verifiable Docker environments. AI

IMPACT Enables creation of larger, more realistic datasets for training AI agents in software engineering, potentially improving their capabilities across diverse programming languages.
- Chuanzhe Guo
- Docker
- MEnvBench
- LLM
- MEnvAgent
TOOL · arXiv cs.AI English(EN) · 16h

DIVERGE: Diversity-Enhanced RAG for Open-Ended Information Seeking

Researchers have introduced DIVERGE, a new retrieval-augmented generation (RAG) framework designed to enhance diversity in responses for open-ended information-seeking tasks. Unlike traditional RAG systems that assume single correct answers, DIVERGE iteratively explores diverse viewpoints and uses diversity-aware retrieval to improve the quality-diversity trade-off. Experiments show DIVERGE can double response diversity without sacrificing quality, addressing a key limitation in current RAG systems. AI

IMPACT Enhances RAG systems for open-ended queries, potentially improving creative and inclusive information access.
- Tianyi Hu
TOOL · arXiv cs.AI English(EN) · 16h

MM-Matryoshka: Towards Budget-Elastic Visual Document Retrieval via a 2D Multimodal Matryoshka Training Framework

Researchers have introduced MM-Matryoshka, a 2D training framework that makes visual document retrieval budget-elastic. A single model can trade retrieval accuracy against available compute by choosing a budget along two axes, vector width and encoder depth. Experiments show that MM-Matryoshka significantly reduces storage and computational overhead compared to existing methods while maintaining high-quality retrieval. AI

IMPACT Enables more efficient deployment of visual document retrieval systems by allowing dynamic adjustment of computational resources.
- Visual Document Retrieval
- MM-Matryoshka
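The vector-width half of such a budget can be illustrated with the generic Matryoshka-style trick of keeping only the first k dimensions of an embedding and re-normalizing. The following is a minimal sketch under that assumption, not MM-Matryoshka's implementation: the function names and toy vectors are hypothetical, and the encoder-depth axis is not modeled.

```python
import math

def truncate_embedding(vec, width):
    """Keep the first `width` dimensions and re-normalize to unit length."""
    if not 0 < width <= len(vec):
        raise ValueError("width must be between 1 and len(vec)")
    head = vec[:width]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # nothing to normalize
    return [x / norm for x in head]

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rank_documents(query, docs, width):
    """Rank document ids by similarity at the chosen vector width.

    `docs` maps a (hypothetical) document id to its full-width embedding.
    """
    q = truncate_embedding(query, width)
    scored = [
        (cosine(q, truncate_embedding(d, width)), doc_id)
        for doc_id, d in docs.items()
    ]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]
```

A smaller `width` means less storage per document and cheaper similarity computation, at the cost of a coarser ranking.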
TOOL · arXiv cs.AI English(EN) · 16h

Toward autocorrection of chemical process flowsheets using large language models

Researchers have developed a new AI method to automatically identify and correct errors in chemical process flowsheets, which are critical diagrams used in engineering. This approach, inspired by large language models used for text correction, aims to reduce safety hazards and inefficiencies caused by flawed diagrams. The model achieved 80% top-1 accuracy on a synthetic dataset, suggesting its potential as a tool for chemical engineers. AI

IMPACT Potential to improve safety and efficiency in chemical engineering workflows through automated error detection.
TOOL · arXiv cs.AI English(EN) · 16h

DSFNet: Learning Dual-Domain Spectral Operators for Multi-Modality Spatio-Temporal Forecasting in Urban Transportation Systems

Researchers have developed DSFNet, a novel framework designed to improve multi-modality spatio-temporal forecasting in urban transportation systems. This network explicitly models the complex relationships between different traffic data types and their temporal dynamics. By employing dual-domain spectral filtering, DSFNet captures heterogeneous spatial patterns and cross-modality couplings more effectively than existing methods, leading to significant accuracy improvements. AI

IMPACT Improves accuracy in urban traffic forecasting by explicitly modeling cross-modality couplings and temporal dynamics.
TOOL · arXiv cs.AI English(EN) · 16h

AI-Augmented Closed-Loop Quality Engineering: A Reference Architecture for Continuous Software Quality Intelligence

A new research paper introduces a reference architecture for AI-augmented closed-loop quality engineering in software development. This architecture aims to improve software quality by integrating feedback from production incidents into the development cycle. The proposed system synthesizes requirement analysis, test prioritization, and defect prediction, using a feedback learning model to enhance stability and efficiency across releases. Experiments show a significant reduction in defect leakage and test execution time compared to traditional methods. AI

IMPACT Introduces a novel framework for adaptive quality engineering, potentially improving software release stability and efficiency.
- software engineering
- AI
TOOL · arXiv cs.AI English(EN) · 16h

Efficient Onboard Vision-Language Inference in UAV-Enabled Low-Altitude Economy Networks via LLM-Enhanced Optimization

Researchers have developed a new framework to optimize vision-language inference for drones operating in low-altitude economy networks. The system aims to reduce task latency and power consumption while meeting accuracy requirements. It employs an alternating optimization algorithm for resource allocation and a large language model-enhanced reinforcement learning approach for trajectory planning. AI

IMPACT This research could enable more efficient and accurate real-time multimodal data processing by drones in various applications.
- UAV-enabled Low-Altitude Economy Networks
- Large Language Model
TOOL · arXiv cs.AI English(EN) · 16h

MASS: Deep Research for Social Sciences with Memory-Augmented Social Simulation

Researchers have introduced MASS, a novel framework for enhancing AI-generated social science research. MASS integrates realistic social simulations with LLMs to foster creativity and provide empirical grounding, moving beyond simple literature retrieval. The system features dynamic goal-path planning, a multi-disciplinary dataset for agent memory, and a structured forgetting mechanism. Experiments show MASS improves overall generation quality by 6.81% and insight by 17.19% compared to baseline LLMs. AI

IMPACT This framework could lead to more insightful and empirically grounded AI-generated research in social sciences.
- LLMs
TOOL · arXiv cs.AI English(EN) · 16h

LogNEO: A GPT-Neo Reinforcement Learning Framework for Accurate Real-Time Log Anomaly Detection

Researchers have developed LogNEO, a new framework for detecting anomalies in system logs using EleutherAI's GPT-Neo model. This system employs a novel reinforcement learning approach with a position-aware reward scheme and cross-entropy regularization. LogNEO achieves high F1 scores on standard benchmarks, outperforming prior state-of-the-art methods in recall, and has been demonstrated in a production environment with low latency and high throughput. AI

IMPACT This framework enhances real-time log anomaly detection capabilities, potentially improving system reliability and security in production environments.
- EleutherAI
- Thunderbird
- LogGPT
- Apache Kafka
- Redis
- TensorRT
- LogNEO
- GPT-Neo
TOOL · arXiv cs.AI English(EN) · 16h

Beyond Item IDs: Scaling Short-Form-Video Recommendation via Semantic-Native Long Sequence Modeling

Researchers have developed a new framework for modeling extremely long user behavior sequences in short-form video recommendation systems. The system uses content-native Semantic IDs instead of traditional item IDs to reduce embedding table size and improve generalization to new content. Additionally, a Global-Aware Compression Transformer condenses user sequences, significantly lowering memory and computational requirements. AI

IMPACT Enables more effective personalization in short-form video platforms by handling longer user histories.
TOOL · arXiv cs.AI English(EN) · 16h

Modeling the Diachronic Evolution of Legal Norms: An LRMoo-Based, Component-Level, Event-Centric Approach to Legal Knowledge Graphs

Researchers have developed a new method for modeling how legal norms change over time, a capability that AI applications need when they depend on precise historical legal data. The approach uses the LRMoo ontology to define a structured pattern for versioning legal texts at the component level. By formalizing legislative amendments as events, the system allows the exact reconstruction of any legal document as it existed on a specific date, providing a verifiable foundation for legal knowledge graphs and trustworthy AI in the legal domain. AI

IMPACT Provides a deterministic foundation for trustworthy legal AI by enabling precise historical reconstruction of legal texts.
- Hudson De Martim
- LRMoo
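Point-in-time reconstruction from amendment events can be sketched as a replay over a dated event log. This is a minimal illustration of the general idea, not the paper's LRMoo model: the `AmendmentEvent` record, the component ids, and the repeal convention are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class AmendmentEvent:
    effective: date          # date the change takes effect
    component_id: str        # hypothetical id, e.g. an article or paragraph
    new_text: Optional[str]  # None means the component is repealed

def reconstruct(events, as_of):
    """Return {component_id: text} as the norm stood on `as_of`."""
    state = {}
    # Replay events in chronological order, stopping at the target date.
    for ev in sorted(events, key=lambda e: e.effective):
        if ev.effective > as_of:
            break
        if ev.new_text is None:
            state.pop(ev.component_id, None)
        else:
            state[ev.component_id] = ev.new_text
    return state
```

Because each version is derived by replaying the same log, two queries for the same date return the same text.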
TOOL · arXiv cs.AI English(EN) · 16h

MemoVAD: Resource-Efficient Video Anomaly Detection via Dynamic Semantic Memory in Edge Computing Scenarios

Researchers have developed MemoVAD, a novel framework for resource-efficient video anomaly detection on edge devices. This system uses a combination of edge and cloud processing, with a unique uncertainty-aware gating policy that only sends high-uncertainty clips to a cloud-based Vision-Language Model. A dynamic semantic memory stores VLM-verified prototypes, allowing the edge model to progressively learn richer semantics and significantly reduce communication overhead while maintaining high performance. AI

IMPACT Introduces a method to integrate advanced VLM semantics into edge devices for anomaly detection, reducing latency and communication costs.
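The gating idea can be sketched with predictive entropy as the uncertainty measure. That choice is an assumption made for illustration: the summary does not say how MemoVAD measures uncertainty, and the threshold value, function names, and score format below are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def route_clip(probs, threshold=0.5):
    """Return "cloud" when the edge model is uncertain, else "edge".

    `probs` is the edge model's class distribution for one clip;
    `threshold` is an illustrative value, not one from the paper.
    """
    return "cloud" if entropy(probs) > threshold else "edge"
```

Only clips routed to "cloud" would incur communication cost, which is the saving the summary attributes to the gating policy.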