SWE-bench

Software engineering benchmark and leaderboard for evaluating AI coding agents on real GitHub issues.

About SWE-bench

SWE-bench is a benchmark and leaderboard for evaluating language models and agents on real software engineering tasks drawn from GitHub repositories: each task pairs an actual issue with the repository state at the time it was filed, and a submission is scored by whether its patch makes the associated tests pass. SWE-bench Verified, a 500-task subset whose issues and tests were screened by human annotators, is the variant most widely used to compare coding agents.
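
For orientation, here is a minimal sketch of pulling the Verified split with the Hugging Face datasets library. The dataset id "princeton-nlp/SWE-bench_Verified" and the field names in the comments follow the public dataset card, but treat them as assumptions rather than guarantees:

    # Minimal sketch: load SWE-bench Verified and inspect one task.
    # Assumes the Hugging Face `datasets` package is installed and the
    # public dataset id "princeton-nlp/SWE-bench_Verified" (assumption).
    from datasets import load_dataset

    ds = load_dataset("princeton-nlp/SWE-bench_Verified", split="test")
    task = ds[0]

    # Typical fields on a task instance (per the dataset card):
    #   instance_id        - unique id, e.g. "<repo>__<issue-number>"
    #   repo               - GitHub repository the issue comes from
    #   base_commit        - commit to check out before applying a patch
    #   problem_statement  - the issue text the agent must resolve
    #   FAIL_TO_PASS       - tests that must go from failing to passing
    print(task["instance_id"], task["repo"])
    print(task["problem_statement"][:200])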

Key Features

  • Coding-agent benchmark
  • Real GitHub issues
  • Verified subset
  • Leaderboards
  • Agent comparison

Pros & Cons

Pros

+ Important signal for coding-agent capability

+ Uses realistic software tasks

Cons

- Leaderboard scores may not generalize to every codebase or workflow

- Can be gamed or overfit like any benchmark

Use Cases

  • Coding model evaluation
  • Agent benchmarking
  • AI research
  • Tool selection
  • Engineering procurement

Pricing

Free

Free public benchmark, datasets, and leaderboard access.
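
The datasets and the official evaluation harness are both publicly available. As a hedged sketch of how a run is typically submitted for scoring, the harness consumes a JSON predictions file; the field names below follow the SWE-bench project README but are assumptions here, and the instance id shown is purely illustrative:

    # Sketch: write a predictions file for the SWE-bench evaluation
    # harness. Field names follow the project README; treat them, and
    # the example instance_id, as assumptions.
    import json

    predictions = [
        {
            "instance_id": "astropy__astropy-12907",  # illustrative task id
            "model_name_or_path": "my-agent",         # label for this run
            "model_patch": "diff --git a/... (unified diff resolving the issue)",
        }
    ]

    with open("preds.json", "w") as f:
        json.dump(predictions, f)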

Who It's For

  • AI researchers
  • Engineering leaders
  • Coding-agent builders
  • Developers

Details

Company: SWE-bench
Founded: 2023
