SWE-bench

Verified

Software engineering benchmark and leaderboard for evaluating AI coding agents on real GitHub issues.

Pricing

Free

Company

SWE-bench

Founded

2023

Free public benchmark, datasets, and leaderboard access.

Who It's For

  • AI researchers
  • Engineering leaders
  • Coding-agent builders
  • Developers

About SWE-bench

SWE-bench is a benchmark and leaderboard for evaluating language models and agents on real software engineering tasks drawn from GitHub repositories. SWE-bench Verified is widely used to compare coding-agent ability on human-filtered tasks.
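
As a rough illustration of how a leaderboard score on such tasks can be derived, the sketch below computes a resolved rate. It assumes the convention used in SWE-bench evaluations that a task counts as resolved only if the tests that failed before the agent's patch now pass and the tests that passed before still pass. The `TaskResult` type, its field names, and the sample task IDs are hypothetical and are not part of any SWE-bench API.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Hypothetical per-task outcome after applying an agent's patch."""
    task_id: str
    fail_to_pass: list[bool]  # tests that failed before the patch: does each pass now?
    pass_to_pass: list[bool]  # tests that passed before the patch: does each still pass?


def is_resolved(result: TaskResult) -> bool:
    # Resolved only if there is at least one target test and every test passes.
    return (
        bool(result.fail_to_pass)
        and all(result.fail_to_pass)
        and all(result.pass_to_pass)
    )


def resolved_rate(results: list[TaskResult]) -> float:
    """Fraction of tasks resolved; 0.0 for an empty run."""
    if not results:
        return 0.0
    return sum(is_resolved(r) for r in results) / len(results)


# Illustrative data only; these IDs and outcomes are made up.
sample = [
    TaskResult("example-1", [True, True], [True, True]),
    TaskResult("example-2", [True, False], [True]),  # target test still failing
    TaskResult("example-3", [True], [True, False]),  # patch broke an existing test
    TaskResult("example-4", [True], []),
]
print(f"Resolved: {resolved_rate(sample):.0%}")
```

A score computed this way summarizes one fixed set of tasks, so it is best read alongside an evaluation on the codebase you actually care about.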

Key Features

  • Coding-agent benchmark
  • Real GitHub issues
  • Verified subset
  • Leaderboards
  • Agent comparison

Pros & Cons

Pros

+ Important signal for coding-agent capability

+ Uses realistic software tasks

Cons

- Leaderboard scores may not predict performance on your own codebase

- Can be gamed or overfit like any benchmark

Use Cases

  • Coding model evaluation
  • Agent benchmarking
  • AI research
  • Tool selection
  • Engineering procurement
