Hacker News
LLM Evaluation Metrics: The Ultimate LLM Evaluation Guide (confident-ai.com)
2 points by vismit2000 75 days ago | past
AI Agent Evaluation: The Definitive Guide to Testing AI Agents (confident-ai.com)
2 points by dustfinger 7 months ago | past
The Complete LLM Evaluation Playbook: How To Run LLM Evals That Matter (confident-ai.com)
2 points by jeffreyip 11 months ago | past
YC helped us raise our seed round in 5 days (confident-ai.com)
4 points by jeffreyip on March 20, 2025 | past
LLM Evaluation Metrics (confident-ai.com)
1 point by tmlee on Nov 10, 2024 | past | 1 comment
How to evaluate multi-turn LLM chatbots (confident-ai.com)
3 points by 3d27 on Oct 8, 2024 | past
LLM Evaluation Metrics: The Ultimate LLM Evaluation Guide (confident-ai.com)
1 point by dphuang2 on Oct 2, 2024 | past
We wrote a comprehensive guide on LLM security (confident-ai.com)
1 point by 3d27 on Aug 20, 2024 | past
How to generate synthetic data using SOTA data evolution methods (confident-ai.com)
1 point by 3d27 on May 21, 2024 | past
How to build your own LLM evaluation framework (confident-ai.com)
2 points by 3d27 on April 15, 2024 | past
Overview of All Major LLM Benchmarks (confident-ai.com)
1 point by 3d27 on March 22, 2024 | past
I wrote an article about everything I know about LLM metrics (confident-ai.com)
2 points by 3d27 on March 12, 2024 | past | 1 comment
Best practices I learnt from helping health tech enterprise test LLMs (confident-ai.com)
1 point by 3d27 on Feb 27, 2024 | past
Best Practices for Unit Testing RAG Systems in Prod (confident-ai.com)
4 points by 3d27 on Feb 6, 2024 | past
Everything I know about LLM evaluation metrics (confident-ai.com)
7 points by 3d27 on Jan 24, 2024 | past
I used QAG to implement an LLM text summarization evals (confident-ai.com)
3 points by 3d27 on Dec 19, 2023 | past
What Is RAG? (With Examples) (confident-ai.com)
1 point by 3d27 on Dec 1, 2023 | past
We Replaced Pinecone with PGVector (confident-ai.com)
3 points by shaburn on Nov 1, 2023 | past | 1 comment
Testing for Image Similarity with DeepEval (confident-ai.com)
1 point by jacky2wong on Oct 2, 2023 | past
DeepEval GuardRails – AI Alignment (confident-ai.com)
2 points by jacky2wong on Sept 30, 2023 | past
Evaluating LLMs for Lawyers (confident-ai.com)
1 point by jacky2wong on Sept 25, 2023 | past
How to Evaluate LangChain QA Retrieval (confident-ai.com)
1 point by jacky2wong on Sept 23, 2023 | past
Automatic Unit Testing for LLMs (confident-ai.com)
2 points by jacky2wong on Sept 12, 2023 | past
Framework for Evaluating Rag (confident-ai.com)
1 point by jacky2wong on Sept 11, 2023 | past
PDB Support for DeepEval (confident-ai.com)
1 point by jacky2wong on Sept 7, 2023 | past
Test for Bias After Finetuning LLMs (confident-ai.com)
1 point by jacky2wong on Sept 2, 2023 | past
Measure Answer Relevancy of LLMs (confident-ai.com)
1 point by jacky2wong on Sept 2, 2023 | past
Test LLMs for Toxicness (confident-ai.com)
1 point by jacky2wong on Sept 2, 2023 | past
Auto-Evaluation of LLMs with DeepEval (confident-ai.com)
2 points by jacky2wong on Sept 1, 2023 | past | 1 comment
