> FrontierMath is a secret dataset of “hundreds” of hard maths questions, curated by Epoch AI, and announced last month.
The database stopped being secret when it was fed to proprietary LLMs running in the cloud. If anyone thinks OpenAI has not trained and tuned o3 on the "secret" problems people fed to GPT-4o, I have a bridge to sell you.
It's perfectly possible for OpenAI to run the model (or provide others the means to run it) without storing queries/outputs for future use. I expect Epoch AI would insist on this. Perhaps OpenAI would lie about it, but that would open it up to serious charges.
What evidence do we need that AI companies are exploiting every bit of information they can use to get ahead in the benchmarks to generate more hype? Ignoring terms/agreements, violating copyright, and otherwise exploiting information for personal gain is the foundation of that entire industry for crying out loud.
Some people are also forgetting who is the CEO of OpenAI.
Sam Altman has long talked about believing in the "move fast and break things" way of doing business, which is just a nicer way of saying do whatever dodgy things you can get away with.
OpenAI's also in the position of having to compete against other LLM trainers - including the open-weights Llama models and their community derivatives, which have been able to do extremely well with a tiny fraction of OpenAI's resources - and to justify their astronomical valuation. The economic incentive to cheat is extreme; I think that cheating has to be the default presumption.