Awesome Gen AI Tools: How to evaluate a summarization task | OpenAI Cookbook
Browse agent skills
no-code batch compute platform for LLM evaluation and tuning workloads
Awesome Gen AI Tools: Amazon will offer human benchmarking teams to test AI models - The Verge
Awesome Gen AI Tools: OpenAI Cookbook: Evaluating RAG systems | by Ravi Theja | Nov, 2023 | LlamaIndex Blog
Awesome Gen AI Tools: A Survey on Evaluation of Large Language Models | ACM Transactions on Intelligent Systems and Technology
Awesome Gen AI Tools: The Crucial Role of Model Evaluation in LLM and AI Integrations
Awesome Gen AI Tools: Criteria Evaluation | 🦜️🔗 LangChain
Awesome Gen AI Tools: LLM Evaluation Metrics: Everything You Need for LLM Evaluation - Confident AI
Awesome Gen AI Tools: Large Language Model Evaluation in 2024: 5 Methods
Awesome Gen AI Tools: The Ultimate Guide to LLM Evaluation | Deci
Awesome Gen AI Tools: How to Evaluate Large Language Model Outputs: Current Best Practices | FinetuneDB
Awesome Gen AI Tools: AI Evaluation Metrics | Microsoft Learn
Awesome Gen AI Tools: How to Evaluate LLM Applications: The Complete Guide - Confident AI
Awesome Gen AI Tools: How to Evaluate, Compare, and Optimize LLM Systems
Awesome Gen AI Tools: The Ultimate Guide to LLM Product Evaluation
Awesome Gen AI Tools: LLM Evaluation: Everything You Need To Run, Benchmark Evals
"An Open Source Language Model Specialized in Evaluating Other Language Models."
Methods, Best Practices & Tools | Lakera – Protecting AI teams that disrupt the world
Awesome Gen AI Tools: Reward Bench Leaderboard - a Hugging Face Space by allenai
Awesome Gen AI Tools: LLM Benchmarks: MMLU, HellaSwag, BBH, and Beyond - Confident AI
Awesome Gen AI Tools: Cleanlab Trustworthy Language Model: Score the trustworthiness of any LLM response
multi-agent & multi-LLM client with RAG, multi-modality, automation, code interpreter, and sandboxed file system