ENTITY mixture of experts

PulseAugur coverage of mixture of experts — every cluster mentioning mixture of experts across labs, papers, and developer communities, ranked by signal.

Total · 30d: 100 (100 over 90d)
Releases · 30d: 0 over 90d
Papers · 30d: 76 over 90d

TIER MIX · 90D

frontier release 8
significant 3
research 36
tool 49
commentary 4

TOPICS

paper 76
model release 56
infra 32
product 19
other 16
safety 7
funding 1

RELATIONSHIPS

instance of Mixture of Experts (MoE) 95%
instance of Emo 95%
instance of arXiv 90%
used by large-language models 90%
instance of Innu-aimun 90%
used by SGLang 90%
instance of DeepSeek V4-Flash 90%
uses large-language models 80%
instance of large-language models 70%
instance of transformers 70%
instance of LLM 70%
used by LLM 70%

TIMELINE

2026-05-11 · research_milestone · A new paper proposes an enhanced Mixture-of-Experts framework for faster time series forecasting model training.

SENTIMENT · 30D

19 days with sentiment data

RECENT · PAGE 3/5 · 100 TOTAL

RESEARCH · CL_41759 · May 20 · 10:14

New tool DODOCO reveals flaws in MoE model dispatch benchmarks

A new research paper introduces DODOCO, a tool designed to diagnose overhead in dispatch operations for Mixture-of-Experts (MoE) models. The study found that common assumptions about workload representation in benchmark…

TOOL · CL_41905 · May 20 · 08:31

New HDMoE framework enhances cancer survival prediction with multimodal data

Researchers have developed a new framework called HDMoE to improve multimodal cancer survival prediction. This hierarchical decoupling-fusion mixture-of-experts approach aims to better integrate data from sources like w…

RESEARCH · CL_41793 · May 20 · 03:45

Dynamic TMoE framework improves time series forecasting with adaptive experts

Researchers have developed Dynamic TMoE, a novel framework designed to improve non-stationary time series forecasting. This approach addresses the limitations of existing Mixture-of-Experts (MoE) models by dynamically a…

RESEARCH · CL_41804 · May 20 · 01:55

Vision MoE models show stable animate-inanimate expert specialization

Researchers have developed new methods to analyze the internal workings of Mixture-of-Experts (MoE) models in computer vision. Their work moves beyond simply examining how data is routed to specific "experts" within the…

TOOL · CL_41191 · May 19 · 02:53

New MoE framework enhances brain decoding with network-aware experts

Researchers have developed FPED, a novel Mixture-of-Experts (MoE) framework designed for interpretable brain decoding using fMRI data. This approach explicitly models different functional brain networks as specialized e…

FRONTIER RELEASE · CL_33854 · May 15 · 23:00

DeepSeek V4 debuts with MegaMoE optimizations for efficient MoE

DeepSeek has released its V4 model, featuring significant optimizations through a new system called MegaMoE. This system utilizes a 1400-line fused CUDA kernel to enhance performance by fine-grained pipelining of commun…

RESEARCH · CL_36345 · May 14 · 20:39

New $\phi$-balancing framework improves MoE model training

Researchers have introduced a new framework called $\phi$-balancing to improve the training of Mixture-of-Experts (MoE) models. This method aims to achieve better expert utilization by directly targeting population-leve…

RESEARCH · CL_32718 · May 14 · 02:48

MetaMoE unifies private MoE models using public proxy data

Researchers have introduced MetaMoE, a novel framework designed to unify independently trained Mixture-of-Experts (MoE) models without requiring access to private client data. The system utilizes public proxy data to ap…

COMMENTARY · CL_29758 · May 13 · 09:03

MoE architectures are workarounds for LLM training instability, not ideal solutions

Mixture-of-Experts (MoE) architectures are often presented as an efficient solution for scaling large language models, but this analysis argues they are primarily a workaround for training instability in dense transform…

RESEARCH · CL_28307 · May 11 · 17:58

New research optimizes Sparse Mixture-of-Experts for efficient LLM scaling

Researchers are exploring new methods to optimize Sparse Mixture-of-Experts (SMoE) models, which are crucial for scaling large language models efficiently. One paper reveals a geometric coupling between routers and expe…

TOOL · CL_27710 · May 11 · 10:33

New MoE framework speeds up time series forecasting training

Researchers have developed a new Mixture-of-Experts (MoE) framework designed to accelerate the training of time series forecasting models. This method integrates expert-specific loss information directly into the traini…

RESEARCH · CL_25314 · May 10 · 18:50

EMO AI model achieves high performance with minimal experts

Researchers from the Allen Institute for AI and UC Berkeley have developed a new Mixture-of-Experts (MoE) model architecture named EMO. This model achieves nearly full performance while utilizing only 12.5% of its avail…

SIGNIFICANT · CL_23645 · May 9 · 00:10

DeepSeek releases open-source coding model matching GPT-4o

DeepSeek has released V3-0324, an open-source coding model that matches or surpasses leading models like GPT-4o and Claude 3.5 Sonnet in coding performance. This Mixture-of-Experts model, with 671 billion total paramete…

RESEARCH · CL_25612 · May 8 · 13:08

New research explores speculative decoding for faster LLM inference

Multiple research papers published on arXiv explore advancements in speculative decoding for Large Language Models (LLMs). These studies focus on improving inference speed and efficiency by using a smaller "draft" model…

TOOL · CL_25610 · May 8 · 05:26

MoE models misroute tokens on complex reasoning tasks, study finds

Researchers have identified a significant issue in Mixture-of-Experts (MoE) language models where the routing mechanism, which directs tokens to specific experts, often selects suboptimal paths. While the standard route…

TOOL · CL_22046 · May 8 · 04:00

New MoE inference design uses pooled HBM to cut communication latency on Ascend

Researchers have developed a new communication design for Mixture-of-Experts (MoE) inference on Ascend systems, aiming to reduce bottlenecks in token exchange. This approach eliminates intermediate relay and reordering …

TOOL · CL_21909 · May 8 · 04:00

Graph Normalization offers differentiable approximation for NP-hard MWIS problem

Researchers have developed Graph Normalization (GN), a novel dynamical system that approximates the NP-hard Maximum Weight Independent Set (MWIS) problem. GN offers a principled and differentiable approach, converging t…

TOOL · CL_21907 · May 8 · 04:00

New research explores finite expert banks for communication-efficient MoE architectures

Researchers have developed a new framework for analyzing sparse Mixture-of-Experts (MoE) architectures, focusing on communication efficiency. They propose treating the MoE gate as a stochastic channel and quantifying ro…

RESEARCH · CL_22189 · May 7 · 17:59

EMO model enables modularity in large language models with selective expert use

Researchers have developed EMO, a novel Mixture-of-Experts (MoE) model designed for emergent modularity. Unlike traditional monolithic large language models, EMO activates only specific subsets of its parameters for dif…

RESEARCH · CL_21995 · May 7 · 15:45

New SAMoE-C method improves CSI-based HAR with scene-adaptive experts

Researchers have developed a new method called Scene-Adaptive Mixture of Experts with Clustered Specialists (SAMoE-C) to improve human activity recognition using channel state information (CSI). This approach addresses …
