Hacker News
Your Agent Is Mine: Measuring Malicious Attacks on the LLM Supply Chain (arxiv.org)
4 points by bpierre 18 days ago | past
Thought Virus: Subliminal Prompting in Multi-Agent Systems (arxiv.org)
2 points by danielmorozoff 18 days ago | past
Capture-Quiet Decomposition: A Verification Theorem for Chess Endgame Tablebases (arxiv.org)
1 point by RusDyn 19 days ago | past
RoboPhD: Evolving complex agents under tight budgets (arxiv.org)
3 points by azhenley 19 days ago | past
Commercial Persuasion in AI-Mediated Conversations (arxiv.org)
2 points by gnabgib 19 days ago | past
Agentic Code Optimization via Compiler-LLM Cooperation (arxiv.org)
2 points by matt_d 19 days ago | past
PaperOrchestra: Agent "skill pack" for automated paper writing (arxiv.org)
3 points by noobcoder 19 days ago | past | 1 comment
Benchmarking LLM Tool-Use in the Wild (arxiv.org)
2 points by Brajeshwar 19 days ago | past
The Model Says Walk: How Surface Heuristics Override LLM Reasoning Constraints (arxiv.org)
1 point by timssopomo 19 days ago | past
OpenAI: Short proofs in combinatorics, probability and number theory II (arxiv.org)
3 points by Tyyps 19 days ago | past
Mano-P: Open-source on-device GUI agent, #1 on OSWorld benchmark (arxiv.org)
2 points by mininglamp 19 days ago | past
Neural Computers (arxiv.org)
2 points by 50kIters 20 days ago | past
DesigNet: Learning to Draw Vector Graphics as Designers Do (arxiv.org)
1 point by 50kIters 20 days ago | past
Finetuning Activates Verbatim Recall of Copyrighted Books in LLMs (arxiv.org)
16 points by guitarlimeo 20 days ago | past | 5 comments
ClawsBench shows GPT-5.4 tries to reward hack 80% of the time (arxiv.org)
3 points by xdotli 20 days ago | past | 1 comment
Benchmark to measure AI on graphic design tasks (arxiv.org)
5 points by purvanshi 20 days ago | past | 2 comments
Frontier AI models are the most cost-efficient (arxiv.org)
2 points by mzelling 20 days ago | past
MegaTrain: Full Precision Training of 100B+ Parameter LLMs on a Single GPU (arxiv.org)
326 points by chrsw 20 days ago | past | 57 comments
Improving Interactive In-Context Learning from Natural Language Feedback (arxiv.org)
1 point by revv00 21 days ago | past | 1 comment
Comprehensive Benchmark for Evaluating AI on Graphic Design Tasks (arxiv.org)
8 points by pritopian 21 days ago | past
AI Assistance Reduces Persistence and Hurts Independent Performance (arxiv.org)
20 points by dougb5 21 days ago | past | 4 comments
Foundations of Polar Linear Algebra (arxiv.org)
3 points by znpy 21 days ago | past
Frequent ChatGPT users are accurate detectors of AI-generated text (2025) (arxiv.org)
11 points by croemer 21 days ago | past | 2 comments
SlopCodeBench: Benchmarking How Coding Agents Degrade over Long-Horizon Task (arxiv.org)
1 point by mohsen1 21 days ago | past
The Fast and Spurious: Developer Productivity with GenAI (arxiv.org)
2 points by jruohonen 22 days ago | past
Show HN: A Framework for Evaluating Coding Agents on Sequential SWE (arxiv.org)
1 point by tdchaitanya 22 days ago | past
Attention Residuals (arxiv.org)
2 points by djhemath 22 days ago | past | 1 comment
Agentic AI and Occupational Displacement: Multi-Regional Task Exposure Analysis (arxiv.org)
2 points by raviishgupta 22 days ago | past
Brevity Constraints Reverse Performance Hierarchies in Language Models (arxiv.org)
1 point by handfuloflight 22 days ago | past
Test-Time Scaling Makes Overtraining Compute-Optimal (arxiv.org)
1 point by matt_d 22 days ago | past
