GPT-2
PulseAugur coverage of GPT-2 — every cluster mentioning GPT-2 across labs, papers, and developer communities, ranked by signal.
4 days with sentiment data
-
Latent reasoning models may offer safer, more interpretable AI
A LessWrong post explores the potential benefits of latent reasoning models (LRMs) for AI safety and interpretability. These models, which perform Chain-of-Thought (CoT) reasoning within their internal activations rathe…
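A rough, schematic sketch of the idea: instead of emitting chain-of-thought tokens, the model iterates a latent state through a recurrent block before decoding. This toy is illustrative only, not any specific LRM architecture, and every module name below is invented.

```python
import torch
import torch.nn as nn

class LatentReasoner(nn.Module):
    """Toy model that 'thinks' by updating a hidden state, not by emitting text."""
    def __init__(self, d: int, n_steps: int):
        super().__init__()
        self.step = nn.Sequential(nn.Linear(d, d), nn.Tanh())
        self.readout = nn.Linear(d, 10)
        self.n_steps = n_steps

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        for _ in range(self.n_steps):
            h = h + self.step(h)  # reasoning happens in activations, not tokens
        return self.readout(h)

model = LatentReasoner(d=64, n_steps=8)
print(model(torch.randn(2, 64)).shape)  # torch.Size([2, 10])
```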
-
Transformer research probes security flaws, training dynamics, and in-context learning limits
Researchers have identified vulnerabilities in the shuffling defense mechanism used to secure Transformer models during inference, demonstrating an attack that can extract model weights by aligning permuted activations.…
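The core alignment trick can be shown in a few lines of NumPy. This is a hedged toy that assumes the defense applies a secret permutation to a layer's output units; the paper's actual attack is more involved.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 64, 256
W = rng.normal(size=(d, d))   # stand-in for a layer's (secret) weights
P = rng.permutation(d)        # the defense's hidden-unit shuffle

X = rng.normal(size=(n, d))   # attacker-chosen probe inputs
clean = X @ W                 # activations the attacker can model
shuffled = clean[:, P]        # activations the defended model emits

# Correlate every clean column with every shuffled column; the best match
# for shuffled column j is clean column P[j], which exposes the permutation.
corr = np.corrcoef(clean.T, shuffled.T)[:d, d:]
recovered = corr.argmax(axis=0)
assert np.array_equal(recovered, P)

# With P known, the shuffle can be inverted and weight extraction proceeds
# as if the defense were absent.
unshuffled = shuffled[:, np.argsort(recovered)]
assert np.allclose(unshuffled, clean)
```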
-
Removing LayerNorm in LLMs acts as an implicit regularizer, with impact depending on training-data size
Researchers have investigated the impact of removing Layer Normalization (LayerNorm) from neural network architectures, particularly in models like GPT-2 and Llama. Their findings indicate that replacing LayerNorm with …
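A minimal sketch of the ablation, assuming the study compares a standard pre-norm GPT-2-style block against one whose LayerNorms are swapped for identity maps. The block below is illustrative, not the authors' code.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, d_model: int, n_heads: int, use_layernorm: bool = True):
        super().__init__()
        # nn.Identity ignores constructor args, so the swap is a one-liner.
        norm = nn.LayerNorm if use_layernorm else nn.Identity
        self.ln1, self.ln2 = norm(d_model), norm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):
        h = self.ln1(x)            # pre-norm residual layout, as in GPT-2
        a, _ = self.attn(h, h, h)
        x = x + a
        return x + self.mlp(self.ln2(x))

block = Block(d_model=128, n_heads=4, use_layernorm=False)  # LayerNorm removed
print(block(torch.randn(2, 16, 128)).shape)  # torch.Size([2, 16, 128])
```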
-
Talkie-1930: New 13B AI model trained on pre-1931 text explores historical knowledge
A new project called Talkie has released a 13-billion-parameter language model trained exclusively on English text from before 1931. This "vintage" model aims to explore AI's ability to predict the future and generate n…
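Mechanically, the corpus comes down to an aggressive date cutoff. A trivial sketch, assuming each document carries publication-year metadata (the field names here are invented):

```python
docs = [
    {"text": "A Study in Scarlet ...", "year": 1887},
    {"text": "stock ticker commentary ...", "year": 1929},
    {"text": "transistor handbook ...", "year": 1955},
]
# Keep only pre-1931 text, mirroring the model's training cutoff.
vintage = [d for d in docs if d["year"] <= 1930]
print(len(vintage))  # 2
```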
-
New GEM activation functions offer smoother, rational alternatives to ReLU
Researchers have introduced Geometric Monomial (GEM), a new family of activation functions designed for deep neural networks. These functions utilize purely rational arithmetic and offer $C^{2N}$-smoothness, aiming to i…
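The excerpt doesn't give GEM's actual definition, so the snippet below is a hypothetical stand-in that only illustrates the general recipe: an activation built from rational arithmetic whose denominator never vanishes, making it smooth everywhere, in contrast to ReLU's kink at zero.

```python
import numpy as np

def rational_activation(x: np.ndarray, N: int = 2) -> np.ndarray:
    """Hypothetical example, NOT the paper's GEM definition.

    f_N(x) = x^(2N+1) / (1 + x^(2N)) uses only multiplies, adds, and one
    divide. The denominator is >= 1 everywhere, so f_N is C-infinity
    (in particular C^(2N)). It is flat near 0 and approaches the identity
    for large |x|.
    """
    p = x ** (2 * N)
    return x * p / (1.0 + p)

x = np.linspace(-3, 3, 7)
print(rational_activation(x))
```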
-
Researchers find variance doesn't equal importance in transformer compression
Researchers have conducted a systematic study on transformer compression, analyzing over 40 experiments across GPT-2 and Mistral 7B models. Their findings indicate that variance in activation directions does not correla…
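A toy example makes the distinction concrete. Here "importance" is taken to be the loss increase when a coordinate is ablated (an assumption; the paper's protocol may differ), and the two rankings visibly disagree.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 8
scales = np.array([5.0, 0.1] + [1.0] * (d - 2))
acts = rng.normal(size=(n, d)) * scales  # dim 0: high variance, dim 1: low
w = np.zeros(d)
w[1] = 10.0                              # downstream head reads the LOW-variance dim
target = acts @ w

def loss_after_ablation(k: int) -> float:
    ablated = acts.copy()
    ablated[:, k] = 0.0                  # zero out (ablate) coordinate k
    return float(np.mean((ablated @ w - target) ** 2))

importance = np.array([loss_after_ablation(k) for k in range(d)])
print("by variance:  ", acts.var(axis=0).argsort()[::-1])  # dim 0 first
print("by importance:", importance.argsort()[::-1])        # dim 1 first
# Pruning by variance alone would keep dim 0 and delete dim 1, exactly the
# direction the downstream computation actually relies on.
```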
-
Anthropic reveals dangerous Claude Mythos model and $30B ARR amid OpenAI rivalry
Anthropic has announced significant growth, reaching $30 billion in ARR, a substantial increase from $19 billion in March. Concurrently, they unveiled Claude Mythos, a powerful new model deemed too dangerous for public …
-
OpenAI launches affordable GPT-4o mini and open-weight gpt-oss models
OpenAI has released GPT-4o mini, a new, highly cost-efficient small model designed to broaden AI accessibility and application development. This model demonstrates superior performance on benchmarks like MMLU, MGSM, and…
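Calling the model follows the usual Chat Completions pattern in the official `openai` Python SDK (v1.x); this sketch assumes `OPENAI_API_KEY` is set in the environment.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",  # the cost-efficient small model
    messages=[{"role": "user", "content": "Summarize GPT-2 in one sentence."}],
)
print(response.choices[0].message.content)
```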
-
EleutherAI releases open-source tool for interpreting AI model features
EleutherAI has released an open-source library for automatically interpreting features within sparse autoencoders, a method used to decompose model activations. This tool leverages large language models like Llama 3.1 a…
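For context, a sparse autoencoder in this setting is a one-hidden-layer map trained to reconstruct model activations through a sparse feature bottleneck. The sketch below is a generic SAE, not the library's own implementation or API.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, n_features: int):
        super().__init__()
        self.enc = nn.Linear(d_model, n_features)
        self.dec = nn.Linear(n_features, d_model)

    def forward(self, acts):
        f = torch.relu(self.enc(acts))  # sparse feature activations
        return self.dec(f), f

sae = SparseAutoencoder(d_model=768, n_features=16384)  # GPT-2-sized residual
acts = torch.randn(32, 768)
recon, feats = sae(acts)
# Training objective: reconstruction plus an L1 sparsity penalty on features.
loss = torch.mean((recon - acts) ** 2) + 1e-3 * feats.abs().mean()
# An explainer LLM is then shown each feature's top-activating text snippets
# and asked to label what the feature represents.
```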
-
OpenAI explores weak-to-strong generalization for AI alignment
OpenAI has introduced a new research direction called weak-to-strong generalization, aiming to address the challenge of aligning future superintelligent AI systems with human supervision. Their initial experiments show …
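The setup is easy to mimic at toy scale: train a small "weak supervisor" on ground truth, train a larger "strong student" only on the weak model's labels, then ask whether the student exceeds its supervisor on held-out truth. A hedged sklearn stand-in, not the paper's models or data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=6000, n_features=20,
                           n_informative=10, random_state=0)
X_sup, X_rest, y_sup, y_rest = train_test_split(X, y, train_size=2000, random_state=0)
X_stu, X_test, _, y_test = train_test_split(X_rest, y_rest, test_size=2000, random_state=0)

weak = LogisticRegression(max_iter=1000).fit(X_sup, y_sup)  # weak supervisor
# The strong student never sees ground truth, only the weak model's labels.
strong = GradientBoostingClassifier().fit(X_stu, weak.predict(X_stu))

print("weak supervisor acc:", weak.score(X_test, y_test))
print("strong student acc: ", strong.score(X_test, y_test))
# Weak-to-strong generalization asks whether the student recovers accuracy
# beyond the supervisor that labeled its training data.
```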
-
OpenAI names Mira Murati interim CEO amid Sam Altman's departure
OpenAI has announced a significant leadership transition, with CEO Sam Altman departing the company. Chief Technology Officer Mira Murati has been appointed interim CEO, effective immediately, while the board initiates …
-
Anthropic withholds Claude Mythos from public release over exploit-finding risks
Anthropic has withheld its new Claude Mythos model from public release due to its advanced capabilities in finding and exploiting software vulnerabilities. The company is instead providing access to select cybersecurity…