A new study assessed the reliability of Large Language Models (LLMs) generating code for construction safety, a practice termed "vibe coding." The research found that while LLMs can produce syntactically correct code, they often introduce silent failures due to flawed mathematical logic and a lack of defensive programming. Across the tested models, including Claude 3.5 Haiku, GPT-4o-Mini, and Gemini 2.5 Flash, a significant portion of generated code exhibited logic deficits, with GPT-4o-Mini producing inaccurate outputs in over half of its functional code.
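To illustrate the failure mode the study describes, here is a minimal Python sketch; the function names, the safe-working-load formula, and the inputs are illustrative assumptions, not taken from the paper. The naive version is syntactically correct but returns a plausible-looking, dangerously wrong number for bad inputs, while the defensive version rejects them outright.

```python
# Hypothetical example of a "silent failure" vs. defensive programming.
# Names, formula, and values are illustrative, not from the study.

def safe_working_load(breaking_strength_kn: float, safety_factor: float) -> float:
    """Naive LLM-style version: no input validation.

    A negative safety_factor silently yields a nonsensical load limit,
    with no error raised -- the silent failure described above.
    """
    return breaking_strength_kn / safety_factor


def safe_working_load_defensive(breaking_strength_kn: float, safety_factor: float) -> float:
    """Defensive version: validates inputs before computing."""
    if breaking_strength_kn <= 0:
        raise ValueError(f"breaking strength must be positive, got {breaking_strength_kn}")
    if safety_factor < 1.0:
        raise ValueError(f"safety factor must be >= 1, got {safety_factor}")
    return breaking_strength_kn / safety_factor


# The naive version happily returns a dangerous answer:
print(safe_working_load(50.0, -5.0))       # -10.0 kN, no warning raised
# The defensive version surfaces the bad input instead:
# safe_working_load_defensive(50.0, -5.0)  # raises ValueError
```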
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Current LLMs lack the deterministic rigor required for standalone safety engineering in construction, necessitating wrappers and governance around AI-generated code.
RANK_REASON Academic paper assessing LLM-generated code reliability.