New: We published QA-Bench v0 - Measuring how AI models handle code verification