Submissions from confident-ai.com

		LLM Evaluation Metrics: The Ultimate LLM Evaluation Guide (confident-ai.com)
		2 points by vismit2000 75 days ago \| past
		AI Agent Evaluation: The Definitive Guide to Testing AI Agents (confident-ai.com)
		2 points by dustfinger 7 months ago \| past
		The Complete LLM Evaluation Playbook: How To Run LLM Evals That Matter (confident-ai.com)
		2 points by jeffreyip 11 months ago \| past
		YC helped us raise our seed round in 5 days (confident-ai.com)
		4 points by jeffreyip on March 20, 2025 \| past
		LLM Evaluation Metrics (confident-ai.com)
		1 point by tmlee on Nov 10, 2024 \| past \| 1 comment
		How to evaluate multi-turn LLM chatbots (confident-ai.com)
		3 points by 3d27 on Oct 8, 2024 \| past
		LLM Evaluation Metrics: The Ultimate LLM Evaluation Guide (confident-ai.com)
		1 point by dphuang2 on Oct 2, 2024 \| past
		We wrote a comprehensive guide on LLM security (confident-ai.com)
		1 point by 3d27 on Aug 20, 2024 \| past
		How to generate synthetic data using SOTA data evolution methods (confident-ai.com)
		1 point by 3d27 on May 21, 2024 \| past
		How to build your own LLM evaluation framework (confident-ai.com)
		2 points by 3d27 on April 15, 2024 \| past
		Overview of All Major LLM Benchmarks (confident-ai.com)
		1 point by 3d27 on March 22, 2024 \| past
		I wrote an article about everything I know about LLM metrics (confident-ai.com)
		2 points by 3d27 on March 12, 2024 \| past \| 1 comment
		Best practices I learnt from helping health tech enterprise test LLMs (confident-ai.com)
		1 point by 3d27 on Feb 27, 2024 \| past
		Best Practices for Unit Testing RAG Systems in Prod (confident-ai.com)
		4 points by 3d27 on Feb 6, 2024 \| past
		Everything I know about LLM evaluation metrics (confident-ai.com)
		7 points by 3d27 on Jan 24, 2024 \| past
		I used QAG to implement an LLM text summarization evals (confident-ai.com)
		3 points by 3d27 on Dec 19, 2023 \| past
		What Is RAG? (With Examples) (confident-ai.com)
		1 point by 3d27 on Dec 1, 2023 \| past
		We Replaced Pinecone with PGVector (confident-ai.com)
		3 points by shaburn on Nov 1, 2023 \| past \| 1 comment
		Testing for Image Similarity with DeepEval (confident-ai.com)
		1 point by jacky2wong on Oct 2, 2023 \| past
		DeepEval GuardRails – AI Alignment (confident-ai.com)
		2 points by jacky2wong on Sept 30, 2023 \| past
		Evaluating LLMs for Lawyers (confident-ai.com)
		1 point by jacky2wong on Sept 25, 2023 \| past
		How to Evaluate LangChain QA Retrieval (confident-ai.com)
		1 point by jacky2wong on Sept 23, 2023 \| past
		Automatic Unit Testing for LLMs (confident-ai.com)
		2 points by jacky2wong on Sept 12, 2023 \| past
		Framework for Evaluating Rag (confident-ai.com)
		1 point by jacky2wong on Sept 11, 2023 \| past
		PDB Support for DeepEval (confident-ai.com)
		1 point by jacky2wong on Sept 7, 2023 \| past
		Test for Bias After Finetuning LLMs (confident-ai.com)
		1 point by jacky2wong on Sept 2, 2023 \| past
		Measure Answer Relevancy of LLMs (confident-ai.com)
		1 point by jacky2wong on Sept 2, 2023 \| past
		Test LLMs for Toxicness (confident-ai.com)
		1 point by jacky2wong on Sept 2, 2023 \| past
		Auto-Evaluation of LLMs with DeepEval (confident-ai.com)
		2 points by jacky2wong on Sept 1, 2023 \| past \| 1 comment