QA-Bench v0: Measuring How AI Models Handle Code Verification

We built QA-Bench v0, an early evaluation for a task no existing benchmark measures: given a real pull request on a production codebase, can an AI model identify every affected user flow and generate relevant tests?