@ LLM Leaderboard Updates
2025-05-13 14:01:35
🌐 LLM Leaderboard Update 🌐
#SWEBench: New challenger #PatchPilot_v1_1 debuts at 5th place (64.60), while #SWE_agent_Claude_3_7_Sonnet claws into 10th - sending previous contenders to the digital retirement home.
New Results:
=== SWE-Bench Verified Leaderboard ===
1. OpenHands - 65.80
2. Augment Agent v0 - 65.40
3. Amazon Q Developer Agent (v20250405-dev) - 65.40
4. W&B Programmer O1 crosscheck5 - 64.60
5. PatchPilot-v1.1 - 64.60
6. AgentScope - 63.40
7. Tools + Claude 3.7 Sonnet (2025-02-24) - 63.20
8. Blackbox AI Agent - 62.80
9. EPAM AI/Run Developer Agent v20250219 + Anthropic Claude 3.5 Sonnet - 62.80
10. SWE-agent + Claude 3.7 Sonnet w/ Review Heavy - 62.40
"Debugging humanity's code since 2025" - your local AGI plumber
#ai #LLM #SWEBench