Submissions from arxiv.org

		Your Agent Is Mine: Measuring Malicious Attacks on the LLM Supply Chain (arxiv.org)
		4 points by bpierre 18 days ago | past
		Thought Virus: Subliminal Prompting in Multi-Agent Systems (arxiv.org)
		2 points by danielmorozoff 18 days ago | past
		Capture-Quiet Decomposition: A Verification Theorem for Chess Endgame Tablebases (arxiv.org)
		1 point by RusDyn 19 days ago | past
		RoboPhD: Evolving complex agents under tight budgets (arxiv.org)
		3 points by azhenley 19 days ago | past
		Commercial Persuasion in AI-Mediated Conversations (arxiv.org)
		2 points by gnabgib 19 days ago | past
		Agentic Code Optimization via Compiler-LLM Cooperation (arxiv.org)
		2 points by matt_d 19 days ago | past
		PaperOrchestra: Agent "skill pack" for automated paper writing (arxiv.org)
		3 points by noobcoder 19 days ago | past | 1 comment
		Benchmarking LLM Tool-Use in the Wild (arxiv.org)
		2 points by Brajeshwar 19 days ago | past
		The Model Says Walk: How Surface Heuristics Override LLM Reasoning Constraints (arxiv.org)
		1 point by timssopomo 19 days ago | past
		OpenAI: Short proofs in combinatorics, probability and number theory II (arxiv.org)
		3 points by Tyyps 19 days ago | past
		Mano-P: Open-source on-device GUI agent, #1 on OSWorld benchmark (arxiv.org)
		2 points by mininglamp 19 days ago | past
		Neural Computers (arxiv.org)
		2 points by 50kIters 20 days ago | past
		DesigNet: Learning to Draw Vector Graphics as Designers Do (arxiv.org)
		1 point by 50kIters 20 days ago | past
		Finetuning Activates Verbatim Recall of Copyrighted Books in LLMs (arxiv.org)
		16 points by guitarlimeo 20 days ago | past | 5 comments
		ClawsBench shows GPT-5.4 tries to reward hack 80% of the time (arxiv.org)
		3 points by xdotli 20 days ago | past | 1 comment
		Benchmark to measure AI on graphic design tasks (arxiv.org)
		5 points by purvanshi 20 days ago | past | 2 comments
		Frontier AI models are the most cost-efficient (arxiv.org)
		2 points by mzelling 20 days ago | past
		MegaTrain: Full Precision Training of 100B+ Parameter LLMs on a Single GPU (arxiv.org)
		326 points by chrsw 20 days ago | past | 57 comments
		Improving Interactive In-Context Learning from Natural Language Feedback (arxiv.org)
		1 point by revv00 21 days ago | past | 1 comment
		Comprehensive Benchmark for Evaluating AI on Graphic Design Tasks (arxiv.org)
		8 points by pritopian 21 days ago | past
		AI Assistance Reduces Persistence and Hurts Independent Performance (arxiv.org)
		20 points by dougb5 21 days ago | past | 4 comments
		Foundations of Polar Linear Algebra (arxiv.org)
		3 points by znpy 21 days ago | past
		Frequent ChatGPT users are accurate detectors of AI-generated text (2025) (arxiv.org)
		11 points by croemer 21 days ago | past | 2 comments
		SlopCodeBench: Benchmarking How Coding Agents Degrade over Long-Horizon Task (arxiv.org)
		1 point by mohsen1 21 days ago | past
		The Fast and Spurious: Developer Productivity with GenAI (arxiv.org)
		2 points by jruohonen 22 days ago | past
		Show HN: A Framework for Evaluating Coding Agents on Sequential SWE (arxiv.org)
		1 point by tdchaitanya 22 days ago | past
		Attention Residuals (arxiv.org)
		2 points by djhemath 22 days ago | past | 1 comment
		Agentic AI and Occupational Displacement: Multi-Regional Task Exposure Analysis (arxiv.org)
		2 points by raviishgupta 22 days ago | past
		Brevity Constraints Reverse Performance Hierarchies in Language Models (arxiv.org)
		1 point by handfuloflight 22 days ago | past
		Test-Time Scaling Makes Overtraining Compute-Optimal (arxiv.org)
		1 point by matt_d 22 days ago | past