Frontend
NahuelGiudizi/llm-evaluation
llm-benchmark-toolkit
Benchmark LLMs with 10 benchmarks and 132K+ questions across 8 providers: OpenAI, Anthropic, Groq, Together, Fireworks, DeepSeek, Ollama, and HuggingFace. Includes a unified CLI and a web dashboard.
Suggested install command
npx skills add NahuelGiudizi/llm-evaluation/llm-benchmark-toolkit
Always inspect the linked repository and skill instructions before running commands. Skills are instructions; permissions and execution still matter.
Compatibility
Agent support matrix
3 supported
| Agent | Status |
|---|---|
| Claude Code | Supported |
| OpenCode | Not listed |
| Cursor | Supported |
| MCP | Not listed |
| GitHub Copilot | Not listed |
| Windsurf |