NahuelGiudizi/llm-evaluation

llm-benchmark-toolkit

Benchmark LLMs across 10 benchmarks and 132K+ questions. Supports 8 providers: OpenAI, Anthropic, Groq, Together, Fireworks, DeepSeek, Ollama, and HuggingFace. Includes a unified CLI and a web dashboard.

Claude Code, Codex, Cursor

Suggested install command

npx skills add NahuelGiudizi/llm-evaluation/llm-benchmark-toolkit

Always inspect the linked repository and skill instructions before running commands. Skills are instructions; permissions and execution still matter.

Install in 1 click

Compatibility

Agent support matrix

3 supported

Agent	Status
Claude Code	Supported
OpenCode	Not listed
Cursor	Supported
MCP	Not listed
GitHub Copilot	Not listed
Windsurf