| | LLM Evaluation Metrics: The Ultimate LLM Evaluation Guide (confident-ai.com) |
| 2 points by vismit2000 75 days ago | past |
|
| | AI Agent Evaluation: The Definitive Guide to Testing AI Agents (confident-ai.com) |
| 2 points by dustfinger 7 months ago | past |
|
| | The Complete LLM Evaluation Playbook: How To Run LLM Evals That Matter (confident-ai.com) |
| 2 points by jeffreyip 11 months ago | past |
|
| | YC helped us raise our seed round in 5 days (confident-ai.com) |
| 4 points by jeffreyip on March 20, 2025 | past |
|
| | LLM Evaluation Metrics (confident-ai.com) |
| 1 point by tmlee on Nov 10, 2024 | past | 1 comment |
|
| | How to evaluate multi-turn LLM chatbots (confident-ai.com) |
| 3 points by 3d27 on Oct 8, 2024 | past |
|
| | LLM Evaluation Metrics: The Ultimate LLM Evaluation Guide (confident-ai.com) |
| 1 point by dphuang2 on Oct 2, 2024 | past |
|
| | We wrote a comprehensive guide on LLM security (confident-ai.com) |
| 1 point by 3d27 on Aug 20, 2024 | past |
|
| | How to generate synthetic data using SOTA data evolution methods (confident-ai.com) |
| 1 point by 3d27 on May 21, 2024 | past |
|
| | How to build your own LLM evaluation framework (confident-ai.com) |
| 2 points by 3d27 on April 15, 2024 | past |
|
| | Overview of All Major LLM Benchmarks (confident-ai.com) |
| 1 point by 3d27 on March 22, 2024 | past |
|
| | I wrote an article about everything I know about LLM metrics (confident-ai.com) |
| 2 points by 3d27 on March 12, 2024 | past | 1 comment |
|
| | Best practices I learnt from helping health tech enterprise test LLMs (confident-ai.com) |
| 1 point by 3d27 on Feb 27, 2024 | past |
|
| | Best Practices for Unit Testing RAG Systems in Prod (confident-ai.com) |
| 4 points by 3d27 on Feb 6, 2024 | past |
|
| | Everything I know about LLM evaluation metrics (confident-ai.com) |
| 7 points by 3d27 on Jan 24, 2024 | past |
|
| | I used QAG to implement an LLM text summarization evals (confident-ai.com) |
| 3 points by 3d27 on Dec 19, 2023 | past |
|
| | What Is RAG? (With Examples) (confident-ai.com) |
| 1 point by 3d27 on Dec 1, 2023 | past |
|
| | We Replaced Pinecone with PGVector (confident-ai.com) |
| 3 points by shaburn on Nov 1, 2023 | past | 1 comment |
|
| | Testing for Image Similarity with DeepEval (confident-ai.com) |
| 1 point by jacky2wong on Oct 2, 2023 | past |
|
| | DeepEval GuardRails – AI Alignment (confident-ai.com) |
| 2 points by jacky2wong on Sept 30, 2023 | past |
|
| | Evaluating LLMs for Lawyers (confident-ai.com) |
| 1 point by jacky2wong on Sept 25, 2023 | past |
|
| | How to Evaluate LangChain QA Retrieval (confident-ai.com) |
| 1 point by jacky2wong on Sept 23, 2023 | past |
|
| | Automatic Unit Testing for LLMs (confident-ai.com) |
| 2 points by jacky2wong on Sept 12, 2023 | past |
|
| | Framework for Evaluating Rag (confident-ai.com) |
| 1 point by jacky2wong on Sept 11, 2023 | past |
|
| | PDB Support for DeepEval (confident-ai.com) |
| 1 point by jacky2wong on Sept 7, 2023 | past |
|
| | Test for Bias After Finetuning LLMs (confident-ai.com) |
| 1 point by jacky2wong on Sept 2, 2023 | past |
|
| | Measure Answer Relevancy of LLMs (confident-ai.com) |
| 1 point by jacky2wong on Sept 2, 2023 | past |
|
| | Test LLMs for Toxicness (confident-ai.com) |
| 1 point by jacky2wong on Sept 2, 2023 | past |
|
| | Auto-Evaluation of LLMs with DeepEval (confident-ai.com) |
| 2 points by jacky2wong on Sept 1, 2023 | past | 1 comment |
|