What-Where Transformer: A Slot-Centric Visual Backbone for Concurrent Representation and Localization
Researchers have introduced the What-Where Transformer (WWT), a visual backbone designed to separate object appearance from spatial location. The architecture uses a slot-based design in which tokens represent 'what' an object is and attention maps represent 'where' it is located. The WWT shows emergent multi-object discovery directly from its attention maps, even when trained with standard classification supervision, and improves performance on zero-shot object discovery and weakly supervised semantic segmentation.
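The summary does not specify the WWT's exact equations, but the core idea it describes (slot tokens as 'what', attention maps as 'where') resembles a cross-attention readout. As a minimal, hypothetical sketch (the function and variable names below are illustrative, not from the paper), one step might look like this in NumPy: K slot queries attend over N patch features, so each slot's attention row is its spatial 'where' map and its attention-weighted feature sum is its 'what' vector.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def slot_cross_attention(slots, patches):
    """One illustrative cross-attention step (not the paper's exact method).

    slots:   (K, D) slot query vectors
    patches: (N, D) flattened image patch features
    Returns:
      what:  (K, D) per-slot appearance vectors
      where: (K, N) per-slot spatial attention maps (rows sum to 1)
    """
    d = slots.shape[-1]
    scores = slots @ patches.T / np.sqrt(d)  # (K, N) similarity logits
    where = softmax(scores, axis=-1)         # each slot's 'where' map
    what = where @ patches                   # each slot's 'what' vector
    return what, where

# Toy example: 4 slots over an 8x8 grid of 16-dim patch features.
rng = np.random.default_rng(0)
slots = rng.normal(size=(4, 16))
patches = rng.normal(size=(64, 16))
what, where = slot_cross_attention(slots, patches)
print(what.shape, where.shape)  # (4, 16) (4, 64)
```

Reshaping a slot's 64-entry 'where' row back to the 8x8 grid would give the kind of spatial map from which, per the summary, objects can be read off directly.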
IMPACT: Introduces a new architectural bias for visual models that could improve localization tasks and emergent object discovery.