miniF2F
PulseAugur coverage of miniF2F: every cluster mentioning miniF2F across labs, papers, and developer communities, ranked by signal.
-
LLMs evaluated for formal math proofs in Lean 4
A new research paper evaluates the performance of various Large Language Models (LLMs) in generating formal mathematical proofs using the Lean 4 theorem prover. The study employed pass@k and refine@k metrics on subsets …
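The summary does not say which pass@k estimator the study used, and it does not define refine@k, so only pass@k is sketched here. The sketch assumes the common unbiased estimator from the Codex evaluation (Chen et al., 2021). Draw n proof attempts per theorem and count the c attempts that the Lean 4 checker accepts. pass@k is then the probability that a random subset of k of those attempts contains at least one accepted proof. The function name `pass_at_k` and the sample numbers are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one theorem.

    n: number of sampled proof attempts (hypothetical setup)
    c: number of attempts the proof checker accepted
    k: attempt budget being scored
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k attempts failed, every size-k subset contains a success.
    if n - c < k:
        return 1.0
    # 1 minus the probability that a random size-k subset has no accepted proof.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average the per-theorem estimates over a benchmark subset of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical example: 10 attempts per theorem on three theorems.
results = [(10, 3), (10, 0), (10, 10)]
print(benchmark_pass_at_k(results, 1))
print(benchmark_pass_at_k(results, 5))
```

Computing this in closed form is preferable to literally resampling k attempts, because the estimate has no Monte Carlo noise. A benchmark-level score is usually the mean over theorems, which is what `benchmark_pass_at_k` assumes.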
-
LLM autoformalization struggles with paraphrased inputs
Researchers have investigated the robustness of large language models (LLMs) in autoformalization tasks, specifically their ability to generate formal proofs from natural language statements. The study found that LLMs e…
-
New AI method achieves 100% formal validity in theorem autoformalization
Researchers have developed a novel reference-free iterative refinement process for autoformalizing entire mathematical theorems. This method utilizes feedback from theorem provers and LLM-based judges to enhance formal …
-
Lean 4 autoformalization sensitive to surface phrasing, not semantics
Researchers have investigated the impact of natural language variations on Lean 4 autoformalization, finding that semantically equivalent paraphrases can lead to different formal outputs. Their study, using GPT-family m…
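The following is a hypothetical illustration, not an example taken from the study. It shows how two paraphrases of the same claim can yield different Lean 4 formalizations that are logically equivalent. The sketch assumes a Lean 4 project with Mathlib, and the theorem names are made up for illustration. Paraphrase A uses Mathlib's `Even` predicate, while paraphrase B states divisibility by 2 with explicit quantifiers.

```lean
import Mathlib

-- Paraphrase A: "The sum of two even natural numbers is even."
theorem sum_even_a (a b : ℕ) (ha : Even a) (hb : Even b) : Even (a + b) :=
  ha.add hb

-- Paraphrase B: "If 2 divides m and 2 divides n, then 2 divides m + n."
theorem sum_even_b : ∀ m n : ℕ, 2 ∣ m → 2 ∣ n → 2 ∣ m + n :=
  fun _ _ hm hn => dvd_add hm hn
```

Both statements are true and say the same thing mathematically. They differ in surface form, though: `Even` versus `2 ∣ _`, and hypotheses as arguments versus a `∀` binder. This kind of divergence complicates exact-match comparison of autoformalization outputs.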