Shocking Result: Claude, GPT, and Gemini Fail to Fully Rebuild Software Projects!

The creators of SWE-bench have just released a new, extremely difficult benchmark. The results are truly shocking: Claude Opus…