Is it agentic enough? Benchmarking open models on your own tooling | Pasteblog