Mixture of Experts
PulseAugur coverage of Mixture of Experts: every cluster mentioning the topic across labs, papers, and developer communities, ranked by signal.
- 2026-05-11 research_milestone A new paper proposes an enhanced Mixture-of-Experts framework for faster time series forecasting model training.
- MoE architectures are workarounds for LLM training instability, not ideal solutions
Mixture-of-Experts (MoE) architectures are often presented as an efficient solution for scaling large language models, but this analysis argues they are primarily a workaround for training instability in dense transform…
- New research optimizes Sparse Mixture-of-Experts for efficient LLM scaling
Researchers are exploring new methods to optimize Sparse Mixture-of-Experts (SMoE) models, which are crucial for scaling large language models efficiently. One paper reveals a geometric coupling between routers and expe…
- New MoE framework speeds up time series forecasting training
Researchers have developed a new Mixture-of-Experts (MoE) framework designed to accelerate the training of time series forecasting models. This method integrates expert-specific loss information directly into the traini…
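The summary is cut off before it describes the paper's mechanism. For context, the standard way existing MoE training recipes fold per-expert signal into the objective is an auxiliary load-balancing loss added to the task loss; a minimal sketch of that standard technique (not the paper's method — the function name and pure-Python setup are illustrative) looks like this:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def load_balancing_loss(router_logits):
    """Switch-Transformer-style auxiliary loss penalizing uneven expert use.

    router_logits: one list of per-expert logits for each token.
    Returns num_experts * sum_e(fraction_of_tokens_routed_to_e * mean_router_prob_e),
    which equals 1.0 under perfectly balanced routing and grows as routing skews.
    """
    num_tokens = len(router_logits)
    num_experts = len(router_logits[0])
    probs = [softmax(t) for t in router_logits]
    # Fraction of tokens whose top-1 choice is each expert.
    counts = [0] * num_experts
    for p in probs:
        counts[p.index(max(p))] += 1
    frac = [c / num_tokens for c in counts]
    # Mean router probability assigned to each expert.
    mean_p = [sum(p[e] for p in probs) / num_tokens for e in range(num_experts)]
    return num_experts * sum(f * m for f, m in zip(frac, mean_p))
```

Under balanced routing the loss sits at its minimum of 1.0, so any value above that signals experts are being starved or overloaded.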
- UC Berkeley and AI2 propose EMO for emergent modularity in MoE models
Researchers from UC Berkeley and the Allen Institute for AI have introduced EMO, a method that encourages emergent modularity in Mixture of Experts (MoE) models through pre-training. This approach investigates how struc…
- DeepSeek releases open-source coding model matching GPT-4o
DeepSeek has released V3-0324, an open-source coding model that matches or surpasses leading models like GPT-4o and Claude 3.5 Sonnet in coding performance. This Mixture-of-Experts model, with 671 billion total paramete…
- MoE models misroute tokens on complex reasoning tasks, study finds
Researchers have identified a significant issue in Mixture-of-Experts (MoE) language models where the routing mechanism, which directs tokens to specific experts, often selects suboptimal paths. While the standard route…
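The "standard router" this item refers to is top-k softmax gating: score every expert, keep the k highest, renormalize their weights, and combine only those experts' outputs. A minimal sketch of that standard mechanism (illustrative names, pure Python) shows why a bad score means the token never reaches the right expert:

```python
import math

def top_k_route(logits, k=2):
    """Standard top-k gating: select the k highest-scoring experts for a
    token and renormalize their softmax weights. The MoE layer combines
    only these experts' outputs, so a misrouted token simply never sees
    the expert that would have handled it best.
    """
    order = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)
    chosen = order[:k]
    m = max(logits[e] for e in chosen)
    exps = {e: math.exp(logits[e] - m) for e in chosen}
    z = sum(exps.values())
    return [(e, exps[e] / z) for e in chosen]
```

The study's finding is about the quality of the logits feeding this selection, not the selection arithmetic itself.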
- Graph Normalization offers differentiable approximation for NP-hard MWIS problem
Researchers have developed Graph Normalization (GN), a novel dynamical system that approximates the NP-hard Maximum Weight Independent Set (MWIS) problem. GN offers a principled and differentiable approach, converging t…
- EMO model enables modularity in large language models with selective expert use
Researchers have developed EMO, a novel Mixture-of-Experts (MoE) model designed for emergent modularity. Unlike traditional monolithic large language models, EMO activates only specific subsets of its parameters for dif…
- New MoE inference design uses pooled HBM to cut communication latency on Ascend
Researchers have developed a new communication design for Mixture-of-Experts (MoE) inference on Ascend systems, aiming to reduce bottlenecks in token exchange. This approach eliminates intermediate relay and reordering …
- New SAMoE-C method improves CSI-based HAR with scene-adaptive experts
Researchers have developed a new method called Scene-Adaptive Mixture of Experts with Clustered Specialists (SAMoE-C) to improve human activity recognition using channel state information (CSI). This approach addresses …
- New research explores finite expert banks for communication-efficient MoE architectures
Researchers have developed a new framework for analyzing sparse Mixture-of-Experts (MoE) architectures, focusing on communication efficiency. They propose treating the MoE gate as a stochastic channel and quantifying ro…
- New parameter E predicts Mixture-of-Experts model health, preventing dead experts
Researchers have introduced a new dimensionless control parameter, E = T*H/(O+B), to predict the health of expert ecologies in Mixture-of-Experts (MoE) models. This parameter, derived from four hyperparameters, can prev…
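The summary gives the formula but is truncated before it says what T, H, O, and B stand for, so the sketch below treats them as opaque hyperparameters and only encodes the stated relation E = T*H/(O+B); the function name and threshold usage are illustrative, not the paper's:

```python
def expert_ecology_health(T, H, O, B):
    """Dimensionless control parameter E = T*H/(O+B) from the paper's
    abstract. The feed summary truncates the definitions of T, H, O, and
    B, so they are treated here as opaque hyperparameters; the paper's
    claim is that E predicts whether experts stay healthy or go dead.
    """
    if O + B == 0:
        raise ValueError("O + B must be nonzero")
    return T * H / (O + B)
```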
- Zyphra's ZAYA1-8B MoE model trained on AMD hardware outperforms larger rivals
Zyphra AI has released ZAYA1-8B, a Mixture of Experts (MoE) language model with 760 million active parameters and 8.4 billion total parameters. Trained on AMD hardware, this model demonstrates competitive performance ag…
- LAWS architecture offers self-certifying inference caching for LLMs and robotics
Researchers have introduced LAWS, a novel caching architecture designed to improve the efficiency of neural inference, robotics, and edge deployments. This system builds a library of certified expert functions by observ…
- Tropical geometry reveals sparsity is combinatorial depth in MoE models
A new paper introduces a theoretical framework for understanding Mixture-of-Experts (MoE) models using tropical geometry. The research establishes that the routing mechanism in MoE architectures is equivalent to a speci…
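The general correspondence the paper builds on is that a hard top-1 router computes max_e(w_e·x + b_e), and in the tropical (max-plus) semiring, where "addition" is max and "multiplication" is +, that is exactly the evaluation of a tropical polynomial. A sketch of that general idea (not the paper's construction):

```python
def tropical_poly(x, weights, biases):
    """Evaluate a tropical (max-plus) polynomial: the max over monomials
    of (w . x + b). A hard top-1 MoE router that picks
    argmax_e(score_e(x)) computes exactly this kind of piecewise-linear
    max, which is the routing/tropical-geometry correspondence in sketch
    form. Returns (max value, index of the winning monomial/expert).
    """
    scores = [sum(wi * xi for wi, xi in zip(w, x)) + b
              for w, b in zip(weights, biases)]
    best = max(scores)
    return best, scores.index(best)
```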
- MoLF model predicts pan-cancer gene expression from histology images
Researchers have developed MoLF, a novel generative model designed for predicting pan-cancer spatial gene expression from histology images. This model utilizes a conditional Flow Matching objective and a Mixture-of-Expe…
- Geometry-aware model advances whole-slide image analysis in computational pathology
Researchers have developed BatMIL, a novel framework for analyzing whole-slide histopathological images. This approach utilizes a hybrid hyperbolic-Euclidean representation to better capture hierarchical tissue structur…
- NVIDIA open-sources cuDNN kernels after 12 years, including MoE and sparse attention
NVIDIA has open-sourced parts of its cuDNN library, a significant move after 12 years of it being closed-source. This release includes over 20 Mixture-of-Experts (MoE) kernels and NSA sparse attention kernels. The codeb…
- SMoE paper proposes expert substitution for efficient edge MoE deployment
Researchers have developed SMoE, a novel algorithm-system co-design aimed at enabling Mixture of Experts (MoE) models to run on edge devices. This approach tackles memory limitations by dynamically offloading experts an…
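The general shape of "dynamically offloading experts" on a memory-limited device is a residency cache: keep a few experts in fast memory, load the rest on demand, and evict by recency. A minimal LRU sketch of that general idea (not SMoE's actual co-design; the class and loader are illustrative):

```python
from collections import OrderedDict

class ExpertCache:
    """Minimal LRU cache of expert weights for memory-limited inference:
    keep at most `capacity` experts resident; on a miss, evict the
    least-recently-used expert and load the requested one via `loader`.
    A sketch of generic dynamic offloading, not the paper's system.
    """
    def __init__(self, capacity, loader):
        self.capacity = capacity
        self.loader = loader            # expert_id -> weights
        self.resident = OrderedDict()   # expert_id -> weights, LRU order
        self.misses = 0

    def get(self, expert_id):
        if expert_id in self.resident:
            self.resident.move_to_end(expert_id)   # mark most recent
        else:
            self.misses += 1
            if len(self.resident) >= self.capacity:
                self.resident.popitem(last=False)  # evict LRU expert
            self.resident[expert_id] = self.loader(expert_id)
        return self.resident[expert_id]
```

Every miss costs a slow load from host memory or flash, which is exactly the bottleneck algorithm-system co-designs in this space try to hide or avoid.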
- Apple researchers unveil SpecMD for faster MoE model inference
Apple's machine learning research team has published a paper detailing SpecMD, a new framework for evaluating Mixture-of-Experts (MoE) model caching policies. Their experiments show that traditional caching assumptions …