GPT-2
PulseAugur coverage of GPT-2 — every cluster mentioning GPT-2 across labs, papers, and developer communities, ranked by signal.
4 days with sentiment data
-
FibQuant method offers significant KV-cache compression for LLMs
Researchers have developed FibQuant, a novel vector quantization method designed to significantly compress the key-value (KV) cache used in large language models. This technique aims to reduce the memory traffic associa…
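The teaser does not describe FibQuant's codebook construction, so the sketch below only illustrates the general idea of vector-quantizing a KV cache: cluster cached key vectors into a small codebook and store one index per vector. The shapes, codebook size, and k-means routine are illustrative assumptions, not the paper's method.

```python
import numpy as np

def nearest(x, centers):
    # Squared-distance argmin without materializing the full (N, K, D) tensor.
    d = (x * x).sum(1, keepdims=True) - 2.0 * x @ centers.T + (centers * centers).sum(1)
    return d.argmin(axis=1)

def build_codebook(vectors, k=256, iters=10, seed=0):
    """Toy k-means codebook; a stand-in for whatever FibQuant actually uses."""
    rng = np.random.default_rng(seed)
    centers = vectors[rng.choice(len(vectors), size=k, replace=False)].copy()
    for _ in range(iters):
        assign = nearest(vectors, centers)
        for j in range(k):
            members = vectors[assign == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

# Fake KV cache for one head: (num_tokens, head_dim) float32 keys.
keys = np.random.randn(4096, 64).astype(np.float32)
codebook = build_codebook(keys, k=256)

idx = nearest(keys, codebook).astype(np.uint8)   # 1 byte per cached vector
keys_hat = codebook[idx]                         # dequantized keys used by attention
print("compression ratio ~", round(keys.nbytes / (idx.nbytes + codebook.nbytes), 1))
```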
-
New method offers formal guarantees for LLM safety classifiers
Researchers have developed a new method to formally verify the safety of Large Language Model (LLM) guardrail classifiers, moving beyond traditional red-teaming. This approach shifts verification from the discrete input…
-
New research links optimizers to mode connectivity in neural networks
Researchers have explored the role of optimizers in mode connectivity within neural networks, an aspect that had previously received little attention. Their work demonstrates that solutions generated by a single optimizer, such as AdamW or …
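A linear mode-connectivity check is typically run by interpolating between two trained solutions and measuring the loss barrier along the path; the sketch below does this for a toy network trained twice with AdamW. The toy task, architecture, and hyperparameters are assumptions for illustration only, not the paper's setup.

```python
import torch

torch.manual_seed(0)
X = torch.randn(512, 10)
y = (X[:, 0] * X[:, 1] > 0).float()           # toy nonlinear labels

def make_net():
    return torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))

def train(seed):
    torch.manual_seed(seed)
    net = make_net()
    opt = torch.optim.AdamW(net.parameters(), lr=1e-2)
    for _ in range(500):
        opt.zero_grad()
        loss = torch.nn.functional.binary_cross_entropy_with_logits(net(X).squeeze(-1), y)
        loss.backward()
        opt.step()
    return net

net_a, net_b = train(1), train(2)

# Scan the linear path theta(alpha) = (1 - alpha) * theta_a + alpha * theta_b
# and report the loss barrier relative to the endpoints.
losses = []
probe = make_net()
for alpha in torch.linspace(0, 1, 11):
    with torch.no_grad():
        for p, pa, pb in zip(probe.parameters(), net_a.parameters(), net_b.parameters()):
            p.copy_((1 - alpha) * pa + alpha * pb)
        l = torch.nn.functional.binary_cross_entropy_with_logits(probe(X).squeeze(-1), y)
    losses.append(l.item())
print("endpoint losses:", losses[0], losses[-1])
print("barrier height :", max(losses) - max(losses[0], losses[-1]))
```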
-
New theory tackles bandwidth limits for distributed language models
Researchers have developed new theoretical frameworks for training and calibrating language models in distributed settings with limited bandwidth. The Federated Probe-Logit Distillation (FPLD) protocol offers a statisti…
-
New theory reveals optimal learning rate schedules for deep learning
Researchers have developed a theoretical framework for optimal learning rate schedules in deep learning, specifically analyzing a random feature model trained with stochastic gradient descent. The study identifies two d…
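As a rough illustration of how a learning-rate schedule changes SGD behavior on a random feature model, the sketch below trains a tanh random-feature regressor under polynomial decay lr_t = lr_0 / (1 + t)^a for a few exponents. The model, data, and schedule family are assumptions; the paper's specific regimes are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_features, n_samples = 20, 200, 2000
W = rng.normal(size=(n_features, d)) / np.sqrt(d)      # fixed random features
beta_true = rng.normal(size=n_features)

def features(x):
    return np.tanh(x @ W.T)                            # phi(x)

X = rng.normal(size=(n_samples, d))
y = features(X) @ beta_true + 0.1 * rng.normal(size=n_samples)

def sgd(decay_exponent, lr0=0.01, steps=20000):
    beta = np.zeros(n_features)
    for t in range(steps):
        i = rng.integers(n_samples)
        phi = features(X[i:i + 1])[0]
        grad = (phi @ beta - y[i]) * phi               # squared-loss gradient
        lr = lr0 / (1 + t) ** decay_exponent           # polynomial decay schedule
        beta -= lr * grad
    return np.mean((features(X) @ beta - y) ** 2)

for a in (0.0, 0.5, 1.0):
    print(f"decay exponent {a}: final train MSE = {sgd(a):.4f}")
```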
-
LLMs show mixed results on loneliness, but excel at detecting attraction
Large Language Models (LLMs) show mixed results in combating human loneliness, with some research being misinterpreted by media headlines. While LLMs like ChatGPT and Claude can offer accessible, 24/7 mental health supp…
-
Pro-KLShampoo optimizer improves LLM pre-training with spectral structure analysis
Researchers have developed Pro-KLShampoo, an optimization technique that combines gradient preconditioning with orthogonalization for more efficient LLM pre-training. This method leverages the observed spike-and-flat ei…
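The teaser only names the two ingredients, gradient preconditioning and orthogonalization, so the sketch below combines a Shampoo-style Kronecker preconditioner with an SVD-based orthogonalization as a stand-in; it is not Pro-KLShampoo's actual update rule.

```python
import torch

def inv_quarter_root(M, eps=1e-6):
    # Symmetric PSD matrix raised to the power -1/4 via eigendecomposition.
    vals, vecs = torch.linalg.eigh(M)
    return vecs @ torch.diag(vals.clamp_min(eps) ** -0.25) @ vecs.T

def precondition_and_orthogonalize(G, L, R):
    """One illustrative step: Shampoo-style preconditioning, then orthogonalization."""
    P = inv_quarter_root(L) @ G @ inv_quarter_root(R)   # preconditioned gradient
    U, _, Vh = torch.linalg.svd(P, full_matrices=False)
    return U @ Vh                                        # nearest semi-orthogonal direction

# Toy weight-matrix gradient and accumulated second-moment factors.
torch.manual_seed(0)
G = torch.randn(64, 32)
L = G @ G.T + 1e-3 * torch.eye(64)    # left factor  (accumulated over steps in practice)
R = G.T @ G + 1e-3 * torch.eye(32)    # right factor (accumulated over steps in practice)

update = precondition_and_orthogonalize(G, L, R)
# Columns of the resulting update direction are orthonormal.
print(update.shape, (update.T @ update - torch.eye(32)).abs().max().item())
```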
-
Small-scale models show bilingualism poses no challenge for language acquisition
Researchers have developed a method using language models to simulate multilingual language acquisition in children. By training GPT-2 models on controlled monolingual and bilingual datasets, they investigated how diffe…
-
SignSGD and Muon optimizers' performance gains theoretically explained
Researchers have theoretically analyzed why sign-based optimization algorithms like SignSGD and Muon can outperform standard SGD in training large models. A new study suggests that SignSGD's advantage stems from its eff…
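For reference, the SignSGD update itself is tiny: each coordinate moves by a fixed step in the direction of its gradient's sign, regardless of the gradient's magnitude. The ill-conditioned toy quadratic below is an assumed example to show that scale-insensitivity, not the paper's setting.

```python
import torch

def sign_sgd_step(params, lr):
    """Plain SignSGD: step each coordinate by lr in the direction of its gradient's sign."""
    with torch.no_grad():
        for p in params:
            if p.grad is not None:
                p -= lr * p.grad.sign()

# Toy quadratic with badly scaled coordinates; sign-based steps ignore the
# per-coordinate gradient magnitudes entirely.
x = torch.tensor([5.0, 5.0], requires_grad=True)
scales = torch.tensor([100.0, 0.01])
for step in range(200):
    loss = 0.5 * (scales * x * x).sum()
    loss.backward()
    sign_sgd_step([x], lr=0.05)
    x.grad.zero_()
print(x.detach())   # both coordinates shrink at the same rate despite the 10^4 conditioning gap
```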
-
New Polar Express method accelerates matrix decomposition for deep learning
Researchers have developed a new GPU-friendly algorithm called Polar Express for computing matrix decompositions, which is crucial for the Muon optimizer used in training deep neural networks. This method optimizes for …
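Muon needs an approximate orthogonalization (polar factor) of each gradient matrix; the sketch below shows the textbook cubic Newton-Schulz iteration for that step. Polar Express's own polynomial coefficients and GPU-friendly scheduling are not reproduced here.

```python
import torch

def newton_schulz_orthogonalize(G, steps=15):
    """Approximate the polar factor of G (the semi-orthogonal matrix nearest to it)
    with the classical cubic Newton-Schulz iteration. This is only the textbook
    version; Polar Express and Muon use more carefully chosen polynomials."""
    X = G / (G.norm() + 1e-7)               # Frobenius normalization keeps spectral norm <= 1
    transpose = X.shape[0] > X.shape[1]
    if transpose:
        X = X.T                              # iterate on the short-and-wide orientation
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X.T if transpose else X

torch.manual_seed(0)
G = torch.randn(128, 64)
Q = newton_schulz_orthogonalize(G)
# Columns of Q are close to orthonormal after a handful of iterations.
print("max deviation from orthonormality:", (Q.T @ Q - torch.eye(64)).abs().max().item())
```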
-
Researchers explore weight decay, in-context learning, and acceleration for Transformer models
Researchers have developed several new methods to improve the efficiency and theoretical understanding of Transformer models. One paper provides a functional-analytic characterization of weight decay, demonstrating its …
-
Researchers develop parametric memory network for efficient token communication in wireless transmission
Researchers have developed an evolving semantic token communication system using a parametric memory network designed for MIMO fading channels. This system transmits only a prefix of each semantic token to reduce overhe…
-
LLMs achieve real-time text transmission via entropy coding
Researchers have explored the connection between learning, prediction, and compression for real-time text transmission using LLM-based entropy coding. They analyzed the trade-off between compression efficiency and trans…
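The prediction-compression link is easy to see numerically: an ideal entropy coder spends about -log2 p(token) bits per token under the predictor's distribution. The sketch below uses a toy adaptive count-based predictor in place of an LLM; the bit accounting is the same either way.

```python
import math
from collections import defaultdict

def predictive_code_length(tokens, vocab_size):
    """Bits needed by an ideal entropy coder driven by a toy adaptive predictor.
    A real system would replace the count-based predictor with an LLM's next-token
    distribution; the cost per token, -log2 p(token), is computed identically."""
    counts = defaultdict(lambda: 1)          # Laplace-smoothed unigram counts
    total = vocab_size
    bits = 0.0
    for t in tokens:
        p = counts[t] / total                # predictor's probability for the next token
        bits += -math.log2(p)                # ideal arithmetic-coding cost of this token
        counts[t] += 1                       # update the predictor after "transmitting" t
        total += 1
    return bits

text = "the better the model predicts the text the fewer bits the coder needs " * 20
tokens = text.split()
vocab = sorted(set(tokens))
bits_model = predictive_code_length(tokens, vocab_size=len(vocab))
bits_fixed = len(tokens) * math.ceil(math.log2(len(vocab)))   # fixed-width baseline
print(f"{bits_model:.0f} bits with the adaptive predictor vs {bits_fixed} bits fixed-width")
```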
-
Researchers develop SNMF for interpretable LLM feature analysis
Researchers have developed a new method for understanding the internal workings of large language models by decomposing MLP activations. This technique, semi-nonnegative matrix factorization (SNMF), identifies interpret…
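A minimal semi-NMF sketch, assuming the multiplicative-update formulation of Ding, Li & Jordan (2010): the activation matrix is factored as X ~ F G^T with unconstrained parts F and nonnegative coefficients G. The synthetic "activations" below are an assumption; the paper's exact variant may differ.

```python
import numpy as np

def semi_nmf(X, k, iters=200, seed=0, eps=1e-9):
    """Semi-NMF: X ~ F @ G.T with F unconstrained and G >= 0
    (multiplicative updates in the style of Ding, Li & Jordan, 2010)."""
    rng = np.random.default_rng(seed)
    G = np.abs(rng.normal(size=(X.shape[1], k)))
    pos = lambda M: (np.abs(M) + M) / 2
    neg = lambda M: (np.abs(M) - M) / 2
    for _ in range(iters):
        F = X @ G @ np.linalg.pinv(G.T @ G)            # unconstrained factor: least squares
        XtF, FtF = X.T @ F, F.T @ F
        G *= np.sqrt((pos(XtF) + G @ neg(FtF) + eps) /
                     (neg(XtF) + G @ pos(FtF) + eps))  # keeps G nonnegative
    return F, G

# Synthetic "activations" built from 16 parts mixed with nonnegative weights,
# standing in for a real MLP activation matrix (hidden_dim x num_examples).
rng = np.random.default_rng(1)
parts = rng.normal(size=(512, 16))
weights = np.abs(rng.normal(size=(1000, 16)))
acts = parts @ weights.T + 0.05 * rng.normal(size=(512, 1000))

F, G = semi_nmf(acts, k=16)
recon_err = np.linalg.norm(acts - F @ G.T) / np.linalg.norm(acts)
print(f"relative reconstruction error with 16 parts: {recon_err:.3f}")
```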
-
What is Tokenization Drift and How to Fix It?
Tokenization drift occurs when minor formatting changes in input text, such as spacing or line breaks, lead to different token IDs being produced by the tokenizer. This can cause unpredictable shifts in model behavior becaus…
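A quick way to see the effect with GPT-2's BPE (using tiktoken here; the article does not say which tokenizer it uses):

```python
import tiktoken

enc = tiktoken.get_encoding("gpt2")

# The same words tokenize differently depending on surrounding whitespace,
# so a formatting change upstream silently changes the IDs the model sees.
variants = ["hello world", "hello  world", "hello world ", "hello\nworld"]
for text in variants:
    print(repr(text), "->", enc.encode(text))
```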
-
Researchers explore efficient transformers via attention control and algorithmic capture
Researchers are exploring methods to enhance transformer efficiency and understanding. One paper introduces Budgeted Attention Allocation, a head-gating mechanism that allows for cost-quality trade-offs. Another study d…
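Budgeted Attention Allocation's exact gating is not described in the teaser; the sketch below shows a generic learned gate per attention head plus a budget penalty, purely to illustrate the cost-quality knob such a mechanism exposes. The architecture and penalty are assumptions, not the paper's formulation.

```python
import torch

class GatedSelfAttention(torch.nn.Module):
    """Self-attention whose heads are scaled by learned gates before the output
    projection, plus a budget penalty on the gates. A generic head-gating sketch,
    not the paper's exact Budgeted Attention Allocation mechanism."""
    def __init__(self, d_model=64, n_heads=8):
        super().__init__()
        self.qkv = torch.nn.Linear(d_model, 3 * d_model)
        self.out = torch.nn.Linear(d_model, d_model)
        self.head_logits = torch.nn.Parameter(torch.zeros(n_heads))
        self.n_heads, self.d_head = n_heads, d_model // n_heads

    def forward(self, x):
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        shape = (b, t, self.n_heads, self.d_head)
        q, k, v = (z.view(shape).transpose(1, 2) for z in (q, k, v))   # (b, h, t, d_head)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        heads = attn @ v                                               # per-head outputs
        gates = torch.sigmoid(self.head_logits)
        heads = heads * gates.view(1, -1, 1, 1)                        # gate each head
        return self.out(heads.transpose(1, 2).reshape(b, t, d)), gates

    def budget_penalty(self):
        return torch.sigmoid(self.head_logits).mean()   # pushes heads toward being switched off

x = torch.randn(2, 16, 64)
layer = GatedSelfAttention()
y, gates = layer(x)
(y.pow(2).mean() + 0.1 * layer.budget_penalty()).backward()   # toy task loss + budget term
print(y.shape, gates.detach())
```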
-
NetNomos framework integrates logic rules into generative ML for networking
Researchers have developed NetNomos, a novel framework designed to integrate explicit network knowledge into generative machine learning models for networking tasks. This approach addresses limitations in current models…
-
Porting microgpt to Futhark, Part I
The author details their experience porting Andrej Karpathy's microgpt, a concise Python implementation of a GPT-2-like neural network, to the data-parallel language Futhark. The goal was to improve scalability beyond P…
-
AI firms accused of fear-mongering to boost stock and influence regulation
AI companies, including Anthropic and OpenAI, are increasingly highlighting the potential dangers of their own creations, a strategy critics argue serves to distract from current harms and inflate stock prices. This tac…
-
Galaxy General LDA-1B model unifies diverse data for embodied AI's GPT-2 moment
Galaxy General LDA has introduced LDA-1B, a 1.6 billion parameter model designed to unify the utilization of diverse data sources for embodied AI. This model employs a novel World-Action Fusion approach, enabling it to …