The first machine-identified logical flaw in a published physics paper raises two questions the scientific community has not fully addressed: how many published arguments contain structural errors that calculation-based review would not catch, and why have formal verification tools not been deployed systematically before now?