par1970 · 2026-06-03T22:51:24 1780527084

> AI (in this form) will never be able to solve things we truly cannot solve yet.

Argument?

rad_val · 2026-06-03T23:36:28 1780529788

The strongest argument for this is structural: what LLMs are.

Put in a brutally simplistic way: each token is represented as a high-dimensional vector, and LLMs operate on those vectors. They are the true, underlying meaning of the token for the LLM. Think of it as 1000+ ways to think of that word/token. Those meanings are baked in at training time. So, LLMs might be able to cross-reference them and solve a class of problems that flew under our radar, but can't come up with revolutionary theories that were never in the training set.

Of course, they will help win a Nobel in the years to come, no doubt, but they can't speak mathematics we can't understand (beyond simple obfuscation) and won't discover anything substantial on their own.

resident423 · 2026-06-04T01:58:36 1780538316

> but can't come up with revolutionary theories that were never in the training set.

Can you elaborate? I don't think the solution to the unit distance problem was in the training set, but I'm guessing you mean there's some higher bar for revolutionary theories that LLMs can't reach? If so, where do you expect the limit will be?

redox99 · 2026-06-04T00:30:14 1780533014

Instead of going into a long technical argument of why your description of LLMs is flawed, I'll go straight to the point, because people keep moving the goal posts.

What exact problem would need to be solved by LLMs to convince you that they DO discover novel solutions?

rad_val · 2026-06-04T01:27:26 1780536446

I'm honestly more interested in why you think my understanding is flawed. I thought I distilled it decently well in two sentences. The bottom line is: in this hyperdimensional space you can find relationships that are not easily distinguished by human minds, but the corpus is still fixed. An LLM can't truly know anything beyond its training data.

redox99 · 2026-06-04T03:32:01 1780543921

> Think of it as 1000+ ways to think of that word/token

I assume you used 1000 because that's in the ballpark of the vector size. But these are not independent scalars that each store a certain property. Just like in 2D you can have 4 quadrants (or subdivide further), with a vector of size 1000 you can encode an insane amount of meaning.
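
To put rough numbers on the quadrant analogy, here is a minimal Python sketch. It assumes the coarsest possible subdivision (one sign per coordinate) and random directions rather than real embeddings; `orthant_count`, `random_unit_vector`, and `cosine` are hypothetical helper names, not anything from an actual model.

```python
import math
import random


def orthant_count(dim: int) -> int:
    """Count the sign patterns (orthants) of a dim-dimensional space.

    In 2D this is the 4 quadrants; each extra dimension doubles the count.
    """
    return 2 ** dim


def random_unit_vector(dim: int, rng: random.Random) -> list[float]:
    """Draw a random direction by normalizing independent Gaussian samples."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two unit vectors (just their dot product)."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    print(orthant_count(2))               # the 4 quadrants
    print(len(str(orthant_count(1000))))  # number of decimal digits in 2**1000
    rng = random.Random(0)
    a = random_unit_vector(1000, rng)
    b = random_unit_vector(1000, rng)
    print(round(cosine(a, b), 3))         # near 0: random directions are almost orthogonal
```

Even with only a sign per coordinate there are 2**1000 regions, and two random directions in 1000 dimensions are almost orthogonal, so there is room for a huge number of nearly independent directions.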

> Those meanings are baked in at training time. So, LLMs might be able to cross-reference them and solve a class of problems that flew under our radar, but can't come up with revolutionary theories that were never in the training set.

There's a lot of jumping to conclusions here, but I'll try to answer more generally.

This idea of how LLMs work is mostly a way to build intuition, like saying of a CNN "imagine one layer does edge detection," and so on. To some degree you can detect those kinds of behavior, but a NN is a VERY general architecture. It needn't work the way you say: it can approximate essentially any function, and running in a loop with a scratchpad (basically an agent) it is Turing complete.

Even ignoring that, this part is misleading:

> Those meanings are baked in at training time.

Being baked in at training time does not mean it didn't build novel meanings at training time.

This is even more significant when you take into account post training RL.

A simple proof that transformers can generate novel, superhuman solutions is that you can build a transformer-based chess bot, feed it 0 human games, and train it with RL until it can beat any human. Its play would be completely novel and unconstrained by human gameplay (because it would never have seen any).

You can do that with any task that's verifiable, like coding or math.

(Also, as a separate fact: as long as a task is easier to verify than to solve (basically always), you have something of a million-monkeys-with-typewriters setup, and with temperature sampling the model might eventually stumble its way onto a solution.)
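
The verify-versus-solve asymmetry can be sketched as a guess-and-check loop. This is a toy under stated assumptions: integer factoring stands in for the verifiable task, and a uniform random guesser stands in for a model sampled at nonzero temperature. `verify` and `solve_by_sampling` are hypothetical names.

```python
import random
from typing import Optional


def verify(n: int, candidate: int) -> bool:
    """Cheap check: is candidate a nontrivial factor of n?"""
    return 1 < candidate < n and n % candidate == 0


def solve_by_sampling(n: int, rng: random.Random, max_tries: int = 10_000) -> Optional[int]:
    """Guess blindly and return the first guess the verifier accepts.

    The uniform guesser is an assumption standing in for temperature
    sampling; a real model would propose far better candidates.
    """
    for _ in range(max_tries):
        guess = rng.randint(2, n - 1)
        if verify(n, guess):
            return guess
    return None  # budget exhausted without an accepted guess


if __name__ == "__main__":
    factor = solve_by_sampling(91, random.Random(0))  # 91 = 7 * 13
    print(factor, factor is not None and verify(91, factor))
```

The guesser never needs to know how to factor; the verifier alone decides what counts as a solution. That is also why this only works when a cheap, trustworthy verifier exists.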

dehsge · 2026-06-04T02:09:42 1780538982

Unify general relativity with quantum mechanics. The continuum hypothesis. The traveling salesman problem in polynomial time.

redox99 · 2026-06-04T03:49:52 1780544992

I think it's cool how in a decade we went from

"Neural networks will never be able to understand this sentence that's obvious to humans"

to

"LLMs must be able to solve problems that humanity hasn't been able to after almost a century, and that might even be unsolvable"

dehsge · 2026-06-04T12:08:50 1780574930

So that is kind of the point of studying maths, right?

Why something is unsolvable or undecidable can be as important as the output of a theorem.

Questions like these, Fields Medal-level problems, or Karp’s 21 NP-complete problems are what working mathematicians are interested in.

Will LLMs help as a human's assistant in the future? Probably.

Will LLMs answer these questions themselves, provide insights and bounds to these new mathematics and teach other mathematicians why this new math they create is true?

Will these models have PhDs and take on candidates, teaching them how to apply and think about the maths problems they are interested in?

roywiggins · 2026-06-04T04:21:30 1780546890

it can operate at the level of a mere mathematics professor, who everyone knows are barely conscious, basically automatons. wake me up when it's Einstein

3uruiueijjj · 2026-06-04T07:25:36 1780557936

The continuum hypothesis was proven independent of ZFC over sixty years ago. I think even GPT-2 could have told you that much.

int_19h · 2026-06-04T03:42:38 1780544558

I don't see how any of this follows. Yes, the LLMs will learn the "meaning" (here narrowly defined as relative configuration in the embedding space) of vectors that correspond to tokens in whatever tokenizer is used to feed into them. But that vector space is not discrete, and nothing precludes the model from internally operating on other vectors that it never saw in training, based on how they relate to those vectors which it did see.
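
A toy illustration of that last point, assuming an invented three-token embedding table with 3-d vectors (real models use learned vectors with far more dimensions). `EMBEDDINGS`, `lerp`, and `cosine` are hypothetical names used only for this sketch.

```python
import math

# Toy embedding table; the tokens and vectors are invented for illustration.
EMBEDDINGS = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
    "car": [0.0, 0.0, 1.0],
}


def lerp(a, b, t):
    """Linear interpolation: the point a fraction t of the way from a to b."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def cosine(a, b):
    """Cosine similarity between two nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # A vector that corresponds to no token in the table...
    mid = lerp(EMBEDDINGS["cat"], EMBEDDINGS["dog"], 0.5)
    print(mid in EMBEDDINGS.values())
    # ...but that still has a well-defined relation to every vector that is.
    for token, vec in EMBEDDINGS.items():
        print(token, round(cosine(mid, vec), 3))
```

The midpoint is not any token's embedding, yet its similarity to each known vector is perfectly well defined, which is all that later computation needs.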

fc417fc802 · 2026-06-03T23:07:29 1780528049

We have yet to see evidence of proper generalization AFAIK. Examples such as this proof are the closest I'm aware of. I haven't read this one in detail yet but the other examples I've seen have been (upon examination) much closer to an (absurdly) deep literature search than to novel thought.

Obviously that doesn't mean we won't eventually achieve novel thought, or even that the current form is fundamentally incapable of it, merely that we've yet to see evidence of it and thus the default assumption is that we aren't there yet.

rienbdj · 2026-06-03T23:47:04 1780530424

The burden of proof is the other way

par1970 · 2026-05-16T13:46:38 1778939198

> LLMs are nothing close to AGI and not going to lead to it, they can’t distinguish right from wrong, they can’t count, they can’t reason, they generate plausible text from a vast databank of connected text.

Argument?

Are LLMs close to being able to significantly help AGI researchers?

par1970 · 2026-05-16T13:41:53 1778938913

Which doomer argument have you found a problem with, and what is the problem?

par1970 · 2026-04-20T21:40:04 1776721204

Do you deny the reported bug finding capabilities, or do you deny that they are dangerous?

par1970 · 2026-04-17T03:58:49 1776398329

I think this is roughly solved. Tell the agent to do all of its calculations in python.

delfinom · 2026-04-17T14:47:31 1776437251

...OR if you are developing a PCB, you have the design data, and pick and place data and the gerber data.

Any combination of such gives you positions of everything within micrometers.

This is not a new problem. Testing of PCBs has been solved a billion times over, and the world has had bed-of-nails testers and flying probe testers for four decades.

We have an 8-finger flying probe machine at our facility where literally all we do is load the board in and load in the design data. It identifies points of interest, learns the fiducials, and we let it do a characterization run. We then have engineering review the resulting data and just let it fly afterwards.

None of this requires AI.

But nowadays any linear regression qualifies as AI so imma go slap a label on it.

par1970 · 2026-04-16T11:29:01 1776338941

Do you have a defense of why human-hammer-nail is a good analogy for human-chatgpt5.4-pwndsamsung?

BLKNSLVR · 2026-04-16T11:44:59 1776339899

AI without a suitably well-crafted prompt is like a firework tube held by a 3-year-old.

AI without a prompt is a hammer sitting in a drawer.

par1970 · 2026-04-04T19:57:02 1775332622

But the service also tells criminals and adversaries about the bomb locations.

tptacek · 2026-04-04T19:58:36 1775332716

And? So do a variety of other services. Was it your impression that the criminals and adversaries were behind the 8 ball on this?

AI is reviving debates about vulnerability research that we thought we killed off in the 1990s.

tosti · 2026-04-06T13:52:11 1775483531

Perhaps the argument isn't about the ethics of security research, but rather the divide between those who can afford non-free software licenses and those who ethically or circumstantially can't.

tptacek · 2026-04-06T14:05:34 1775484334

You'd see the same thing in 1990s full-disclosure debates, where people trying to create a social/cultural argument against vulnerability research would throw this kind of stuff against the wall just to see what would stick. It's either good to know about vulnerabilities in the code you rely on or it isn't.

tosti · 2026-04-06T18:49:11 1775501351

Yes, of course. It's a bloody shame some of those tools are inaccessible to the poor, the not poor but f* your stupid payment system that doesn't connect to my bank, the software freedom enthusiasts, and possibly others.

For myself, software freedom isn't just an ethical issue but also a practical necessity.

par1970 · 2026-04-02T18:42:34 1775155354

> What do you mean "a priori understanding codebases"?

I took him to be distinguishing between (1) just reading the code/docs and reasoning about it, and (2) that + crafting and running tests.

tptacek · 2026-04-02T19:25:11 1775157911

I don't think that's it; both reading the code and running tests are a posteriori capabilities.

sigbottle · 2026-04-02T20:52:09 1775163129

No, you're right. I initially thought you were wrong but it is sus.

My intuition for the a priori cut was something along the lines of, "Even if you had the entire source code in your head at once, there are limits to reasoning about it". Computability is one hard result. You also have to interact with the real world on a wide variety of hardware systems, or even just a wide variety of systems if you create an API. How do you reason past the abstraction boundary reliably without actually having tests, interacting with systems, and getting feedback? Not really possible unless LLMs control everything. For more philosophical questions (such as "is our 'correct' actually the right thing?") we can grant the easy case where everybody's in consensus; the "easier" problems show up either way.

But getting to the point of "understanding in principle every piece of linux" is pretty undefined and practically doesn't seem possible for a single LLM or a human. It also makes it really easy to smuggle in whatever implicit premises you want, to swing the issue either way.

But personally I (and many other people) have seen late 2025 models get extremely good, and that is precisely because they started doing deep tooling and, like, actually running and testing their code. I was not getting nearly as much value out of them (still a decent amount of value!) prior to the tooling explosion; not even MCPs were good. It was when they actually started aggressively spawning subshells and executing live tests. But I guess using a priori/a posteriori isn't really a useful split here?

par1970 · 2026-04-02T20:41:54 1775162514

Yeah, maybe you are right. But is doing math and reasoning about Turing machines a priori? If so, then it seems plausible to me that reasoning about a codebase (without running it) is also ‘a priori’.

par1970 · 2026-03-31T19:55:36 1774986936

par1970 · 2026-03-19T22:12:35 1773958355

How much domain experience do you have? Is it helping you solve problems for paying customers?

yibers · 2026-03-20T04:52:49 1773982369

I have plenty of domain experience, but I wouldn't describe myself as an expert. It helped me solve real business problems.