Mixture of Experts
PulseAugur coverage of Mixture of Experts: every cluster mentioning the topic across labs, papers, and developer communities, ranked by signal.
- 2026-05-11 research_milestone A new paper proposes an enhanced Mixture-of-Experts framework for faster time series forecasting model training.
- MoE architectures are workarounds for LLM training instability, not ideal solutions
Mixture-of-Experts (MoE) architectures are often presented as an efficient solution for scaling large language models, but this analysis argues they are primarily a workaround for training instability in dense transform…
- New research optimizes Sparse Mixture-of-Experts for efficient LLM scaling
Researchers are exploring new methods to optimize Sparse Mixture-of-Experts (SMoE) models, which are crucial for scaling large language models efficiently. One paper reveals a geometric coupling between routers and expe…
- New MoE framework speeds up time series forecasting training
Researchers have developed a new Mixture-of-Experts (MoE) framework designed to accelerate the training of time series forecasting models. This method integrates expert-specific loss information directly into the traini…
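The summary is cut off before it describes the paper's mechanism. For context, the standard way existing MoE training recipes fold per-expert signal into the objective is an auxiliary load-balancing loss added to the task loss; a minimal sketch of that standard technique (not the paper's method — the function name and pure-Python setup are illustrative) looks like this:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def load_balancing_loss(router_logits):
    """Switch-Transformer-style auxiliary loss penalizing uneven expert use.

    router_logits: one list of per-expert logits for each token.
    Returns num_experts * sum_e(fraction_of_tokens_routed_to_e * mean_router_prob_e),
    which equals 1.0 under perfectly balanced routing and grows as routing skews.
    """
    num_tokens = len(router_logits)
    num_experts = len(router_logits[0])
    probs = [softmax(t) for t in router_logits]
    # Fraction of tokens whose top-1 choice is each expert.
    counts = [0] * num_experts
    for p in probs:
        counts[p.index(max(p))] += 1
    frac = [c / num_tokens for c in counts]
    # Mean router probability assigned to each expert.
    mean_p = [sum(p[e] for p in probs) / num_tokens for e in range(num_experts)]
    return num_experts * sum(f * m for f, m in zip(frac, mean_p))
```

Under balanced routing the loss sits at its minimum of 1.0, so any value above that signals experts are being starved or overloaded.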
- UC Berkeley and AI2 propose EMO for emergent modularity in MoE models
Researchers from UC Berkeley and the Allen Institute for AI have introduced EMO, a method that encourages emergent modularity in Mixture of Experts (MoE) models through pre-training. This approach investigates how struc…
- DeepSeek releases open-source coding model matching GPT-4o
DeepSeek has released V3-0324, an open-source coding model that matches or surpasses leading models like GPT-4o and Claude 3.5 Sonnet in coding performance. This Mixture-of-Experts model, with 671 billion total paramete…
- MoE models misroute tokens on complex reasoning tasks, study finds
Researchers have identified a significant issue in Mixture-of-Experts (MoE) language models where the routing mechanism, which directs tokens to specific experts, often selects suboptimal paths. While the standard route…
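The "standard router" this item refers to is top-k softmax gating: score every expert, keep the k highest, renormalize their weights, and combine only those experts' outputs. A minimal sketch of that standard mechanism (illustrative names, pure Python) shows why a bad score means the token never reaches the right expert:

```python
import math

def top_k_route(logits, k=2):
    """Standard top-k gating: select the k highest-scoring experts for a
    token and renormalize their softmax weights. The MoE layer combines
    only these experts' outputs, so a misrouted token simply never sees
    the expert that would have handled it best.
    """
    order = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)
    chosen = order[:k]
    m = max(logits[e] for e in chosen)
    exps = {e: math.exp(logits[e] - m) for e in chosen}
    z = sum(exps.values())
    return [(e, exps[e] / z) for e in chosen]
```

The study's finding is about the quality of the logits feeding this selection, not the selection arithmetic itself.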
- Graph Normalization offers differentiable approximation for NP-hard MWIS problem
Researchers have developed Graph Normalization (GN), a novel dynamical system that approximates the NP-hard Maximum Weight Independent Set (MWIS) problem. GN offers a principled and differentiable approach, converging t…
- EMO model enables modularity in large language models with selective expert use
Researchers have developed EMO, a novel Mixture-of-Experts (MoE) model designed for emergent modularity. Unlike traditional monolithic large language models, EMO activates only specific subsets of its parameters for dif…
- New MoE inference design uses pooled HBM to cut communication latency on Ascend
Researchers have developed a new communication design for Mixture-of-Experts (MoE) inference on Ascend systems, aiming to reduce bottlenecks in token exchange. This approach eliminates intermediate relay and reordering …
- New SAMoE-C method improves CSI-based HAR with scene-adaptive experts
Researchers have developed a new method called Scene-Adaptive Mixture of Experts with Clustered Specialists (SAMoE-C) to improve human activity recognition using channel state information (CSI). This approach addresses …
- New research explores finite expert banks for communication-efficient MoE architectures
Researchers have developed a new framework for analyzing sparse Mixture-of-Experts (MoE) architectures, focusing on communication efficiency. They propose treating the MoE gate as a stochastic channel and quantifying ro…
- New parameter E predicts Mixture-of-Experts model health, preventing dead experts
Researchers have introduced a new dimensionless control parameter, E = T*H/(O+B), to predict the health of expert ecologies in Mixture-of-Experts (MoE) models. This parameter, derived from four hyperparameters, can prev…
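The summary gives the formula but is truncated before it says what T, H, O, and B stand for, so the sketch below treats them as opaque hyperparameters and only encodes the stated relation E = T*H/(O+B); the function name and threshold usage are illustrative, not the paper's:

```python
def expert_ecology_health(T, H, O, B):
    """Dimensionless control parameter E = T*H/(O+B) from the paper's
    abstract. The feed summary truncates the definitions of T, H, O, and
    B, so they are treated here as opaque hyperparameters; the paper's
    claim is that E predicts whether experts stay healthy or go dead.
    """
    if O + B == 0:
        raise ValueError("O + B must be nonzero")
    return T * H / (O + B)
```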
- Zyphra's ZAYA1-8B MoE model trained on AMD hardware outperforms larger rivals
Zyphra AI has released ZAYA1-8B, a Mixture of Experts (MoE) language model with 760 million active parameters and 8.4 billion total parameters. Trained on AMD hardware, this model demonstrates competitive performance ag…
- LAWS architecture offers self-certifying inference caching for LLMs and robotics
Researchers have introduced LAWS, a novel caching architecture designed to improve the efficiency of neural inference, robotics, and edge deployments. This system builds a library of certified expert functions by observ…
- Tropical geometry reveals sparsity is combinatorial depth in MoE models
A new paper introduces a theoretical framework for understanding Mixture-of-Experts (MoE) models using tropical geometry. The research establishes that the routing mechanism in MoE architectures is equivalent to a speci…
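The general correspondence the paper builds on is that a hard top-1 router computes max_e(w_e·x + b_e), and in the tropical (max-plus) semiring, where "addition" is max and "multiplication" is +, that is exactly the evaluation of a tropical polynomial. A sketch of that general idea (not the paper's construction):

```python
def tropical_poly(x, weights, biases):
    """Evaluate a tropical (max-plus) polynomial: the max over monomials
    of (w . x + b). A hard top-1 MoE router that picks
    argmax_e(score_e(x)) computes exactly this kind of piecewise-linear
    max, which is the routing/tropical-geometry correspondence in sketch
    form. Returns (max value, index of the winning monomial/expert).
    """
    scores = [sum(wi * xi for wi, xi in zip(w, x)) + b
              for w, b in zip(weights, biases)]
    best = max(scores)
    return best, scores.index(best)
```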
- MoLF model predicts pan-cancer gene expression from histology images
Researchers have developed MoLF, a novel generative model designed for predicting pan-cancer spatial gene expression from histology images. This model utilizes a conditional Flow Matching objective and a Mixture-of-Expe…
- Geometry-aware model advances whole-slide image analysis in computational pathology
Researchers have developed BatMIL, a novel framework for analyzing whole-slide histopathological images. This approach utilizes a hybrid hyperbolic-Euclidean representation to better capture hierarchical tissue structur…
- NVIDIA open-sources cuDNN kernels after 12 years, including MoE and sparse attention
NVIDIA has open-sourced parts of its cuDNN library, a significant move after 12 years of it being closed-source. This release includes over 20 Mixture-of-Experts (MoE) kernels and NSA sparse attention kernels. The codeb…
- SMoE paper proposes expert substitution for efficient edge MoE deployment
Researchers have developed SMoE, a novel algorithm-system co-design aimed at enabling Mixture of Experts (MoE) models to run on edge devices. This approach tackles memory limitations by dynamically offloading experts an…
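The general shape of "dynamically offloading experts" on a memory-limited device is a residency cache: keep a few experts in fast memory, load the rest on demand, and evict by recency. A minimal LRU sketch of that general idea (not SMoE's actual co-design; the class and loader are illustrative):

```python
from collections import OrderedDict

class ExpertCache:
    """Minimal LRU cache of expert weights for memory-limited inference:
    keep at most `capacity` experts resident; on a miss, evict the
    least-recently-used expert and load the requested one via `loader`.
    A sketch of generic dynamic offloading, not the paper's system.
    """
    def __init__(self, capacity, loader):
        self.capacity = capacity
        self.loader = loader            # expert_id -> weights
        self.resident = OrderedDict()   # expert_id -> weights, LRU order
        self.misses = 0

    def get(self, expert_id):
        if expert_id in self.resident:
            self.resident.move_to_end(expert_id)   # mark most recent
        else:
            self.misses += 1
            if len(self.resident) >= self.capacity:
                self.resident.popitem(last=False)  # evict LRU expert
            self.resident[expert_id] = self.loader(expert_id)
        return self.resident[expert_id]
```

Every miss costs a slow load from host memory or flash, which is exactly the bottleneck algorithm-system co-designs in this space try to hide or avoid.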
- Apple researchers unveil SpecMD for faster MoE model inference
Apple's machine learning research team has published a paper detailing SpecMD, a new framework for evaluating Mixture-of-Experts (MoE) model caching policies. Their experiments show that traditional caching assumptions …