Testing & Review / vcanonici/mahout-bench

mahout-bench

CLI benchmark for measuring and mitigating sycophancy in LLMs. Supports multi-provider execution, configurable judges, and long-running evaluation campaigns.

Claude Code, Codex, Cursor

Suggested install command

npx skills add vcanonici/mahout-bench/mahout-bench

Always inspect the linked repository and skill instructions before running commands. Skills are instructions; permissions and execution still matter.

Compatibility

Agent support matrix

3 supported

Agent	Status
Claude Code	Supported
OpenCode	Not listed
Cursor	Supported
MCP	Not listed
GitHub Copilot	Not listed
Windsurf