More

emil-lp · 2026-02-17T22:31:31 1771367491

How is the question nonsensical? It's a perfectly valid question.

dugidugout · 2026-02-17T23:11:15 1771369875

Because validity doesn't depend on meaning. Take the classic example: "What is north of the North Pole?". This is a valid phrasing of a question, but is meaningless without extra context about spherical geometry. The trick question in reference is similar in that its intended meaning is contained entirely in the LLM output.

simondotau · 2026-02-18T00:41:50 1771375310

There's nothing syntactically meaningless about wanting your car washed.

dugidugout · 2026-02-18T04:58:34 1771390714

I wasn't under the impression anyone was discussing car washing.

simondotau · 2026-02-18T06:26:05 1771395965

>>>>>>> Still fails the car wash question

>>>>>> Remarkable, since the goal is clearly stated

>>>>> Well it is...non-sensical...the car is already at the car wash

>>>> How is the [car wash] question nonsensical?

>>> Because validity doesn't depend on meaning.

>> There's nothing syntactically meaningless about wanting your car washed.

> I wasn't under the impression anyone was discussing car washing.

Maybe you replied to the wrong post by mistake?

dugidugout · 2026-02-18T23:29:55 1771457395

I was not replying to your remark, but rather, a later comment regarding the "validity" vs "sensibility". I don't see where I made any distinction concerning wanting to wash cars.

But now I suppose I'll engage your remark. The question is clearly a trick in any interpretive frame I can imagine. You are treating the prompt as a coherent reality which it isn't. The query is essentially a logical null-set. Any answer the AI provides is merely an attempt to bridge that void through hallucinated context and certainly has nothing to do with a genuine desire to wash your car.

simondotau · 2026-02-22T00:26:52 1771720012

I wasn't under the impression anyone was discussing "genuine desire".

jatari · 2026-02-17T22:56:44 1771369004

I agree that it doesn't break any rules of the English language, that doesn't make it a valid question in everyday contexts though.

Ask a human that question randomly and see how they respond.

mvdtnz · 2026-02-17T23:16:33 1771370193

Can you explain yourself? I can't see how this question doesn't make sense in any way.

methyl · 2026-02-18T16:09:39 1771430979

Because to 99.9% people it’s obvious and fair to assume that person asking this question knows that you need a car to wash it. No one ever could ask this question not knowing this, so it implies some trick layer.

emil-lp · 2026-02-17T09:16:50 1771319810

It's all encrypted.

emil-lp · 2026-02-15T13:10:58 1771161058

Someone has been sipping from Tegmark's cool-aid.

emil-lp · 2026-02-14T03:14:09 1771038849

Mathematica guarantees correctness. It should be safe for a while.

krackers · 2026-02-14T05:31:33 1771047093

Tell that to the various confirmed computational bugs in mathematica :)

https://mathematica.stackexchange.com/questions/tagged/bugs

emil-lp · 2026-02-13T19:42:16 1771011736

"GPT did this". Authored by Guevara (Institute for Advanced Study), Lupsasca (Vanderbilt University), Skinner (University of Cambridge), and Strominger (Harvard University).

Probably not something that the average GI Joe would be able to prompt their way to...

I am skeptical until they show the chat log leading up to the conjecture and proof.

Sharlin · 2026-02-13T20:07:30 1771013250

I'm a big LLM sceptic but that's… moving the goalposts a little too far. How could an average Joe even understand the conjecture enough to write the initial prompt? Or do you mean that experts would give him the prompt to copy-paste, and hope that the proverbial monkey can come up with a Henry V? At the very least posit someone like a grad student in particle physics as the human user.

buttered_toast · 2026-02-13T20:23:21 1771014201

I would interpret it as implying that the result was due to a lot more hand-holding that what is let on.

Was the initial conjecture based on leading info from the other authors or was it simply the authors presenting all information and asking for a conjecture?

Did the authors know that there was a simpler means of expressing the conjecture and lead GPT to its conclusion, or did it spontaneously do so on its own after seeing the hand-written expressions.

These aren't my personal views, but there is some handwaving about the process in such a way that reads as if this was all spontaneous involvement on GPTs end.

But regardless, a result is a result so I'm content with it.

lupsasca · 2026-02-13T21:43:23 1771019003

Hi I am an author of the paper. We believed that a simple formula should exist but had not been able to find it despite significant effort. It was a collaborative effort but GPT definitely solved the problem for us.

buttered_toast · 2026-02-13T21:48:47 1771019327

Oh that's really cool, I am not versed in physics by any means, can you explain how you believed there to be a simple formula but were unable to find it? What would lead you to believe that instead of just accepting it at face value?

lupsasca · 2026-02-13T21:57:50 1771019870

There are closely related "MHV amplitudes" which naively obey a really complicated formula, but for which there famously also exists a much simpler "Parke-Taylor formula". Alfredo had derived a complicated expression for these new "single-minus amplitudes" and we were hoping we could find an analogue of the simpler "Parke-Taylor formula" for them.

buttered_toast · 2026-02-13T22:00:49 1771020049

Thank you for taking the time to reply, I see you might have already answered this elsewhere so it's much appreciated.

lupsasca · 2026-02-13T23:26:11 1771025171

My pleasure---thank you for your interest!

etraql · 2026-02-13T22:41:27 1771022487

Do you also work at OpenAI? A comment pointing that out was flagged by the LLM marketers.

buttered_toast · 2026-02-13T23:32:22 1771025542

I think it says in the paper that he does, but it's also public knowledge.

https://www.linkedin.com/in/alex-lupsasca-9096a214/

lupsasca · 2026-02-14T05:52:22 1771048342

Correct, on both counts!

lamontcg · 2026-02-13T20:43:17 1771015397

That's kinda the whole point.

SpaceX can use an optimization algorithm to hoverslam a rocket booster, but the optimization algorithm didn't really figure it out on its own.

The optimization algorithm was used by human experts to solve the problem.

Sharlin · 2026-02-13T20:55:23 1771016123

In this case there certainly were experts doing hand-holding. But simply being able to ask the right question isn't too much to ask, is it? If it had been merely a grad student or even a PhD student who had asked ChatGPT to figure out the result, and ChatGPT had done that, even interactively with the student, this would be huge news. But an average person? Expecting LLMs to transcend the GIGO principle is a bit too much.

slopusila · 2026-02-13T20:20:40 1771014040

hey, GPT, solve this tough conjecture I've read about on Quanta. make no mistakes

2026-02-13T20:32:18 1771014738

[dead]

terminalbraid · 2026-02-13T20:42:25 1771015345

"Hey GPT thanks for the result. But is it actually true?"

jmalicki · 2026-02-13T21:38:38 1771018718

"Grad Student did this". Co-authored by <Famous advisor 1>, <Famous advisor 2>, <Famous advisor 3>.

Is this so different?

sejje · 2026-02-14T16:28:04 1771086484

The Average Joe reads at an 8th grade level. 21% are illiterate in the US.

LLMs surpassed the average human a long time ago IMO. When LLMs fail to measure up to humans, it's that they fail to measure up against human experts in a given field, not the Average Joe.

We are surrounded by NPCs.

famouswaffles · 2026-02-13T19:48:04 1771012084

The paper has all those prominent institutions who acknowledge the contribution so realistically, why would you be skeptical ?

kristopolous · 2026-02-13T19:54:22 1771012462

they probably also acknowledge pytorch, numpy, R ... but we don't attribute those tools as the agent who did the work.

I know we've been primed by sci-fi movies and comic books, but like pytorch, gpt-5.2 is just a piece of software running on a computer instrumented by humans.

famouswaffles · 2026-02-13T19:56:45 1771012605

I don't see the authors of those libraries getting a credit on the paper, do you ?

>I know we've been primed by sci-fi movies and comic books, but like pytorch, gpt-5.2 is just a piece of software running on a computer instrumented by humans.

Sure

what · 2026-02-14T04:02:55 1771041775

At least one of the authors works for OpenAI. It’s a puff piece.

name_taken_duh · 2026-02-13T19:59:53 1771012793

And we are just a system running on carbon-based biology in our physics computer run by whomever. What makes us special, to say that we are different than GPT-5.2?

palmotea · 2026-02-13T20:19:08 1771013948

> And we are just a system running on carbon-based biology in our physics computer run by whomever. What makes us special, to say that we are different than GPT-5.2?

Do you really want to be treated like an old PC (dismembered, stripped for parts, and discarded) when your boss is done with you (i.e. not treated specially compared to a computer system)?

But I think if you want a fuller answer, you've got a lot of reading to do. It's not like you're the first person in the world to ask that question.

name_taken_duh · 2026-02-14T00:30:15 1771029015

You misunderstood, I am prohumanism. My comment was about challenging the believe that models cant be as intelligent as we are, which cant be answered definitely, though a lot of empirical evidence seems to point to the fact, that we are not fundamentally different intelligence wise. Just closing our eyes will not help in preserving humanism, so we have to shape the world with models in a human friendly way, aka alignment.

kristopolous · 2026-02-13T20:34:33 1771014873

It's always a value decision. You can say shiny rocks are more important than people and worth murdering over.

Not an uncommon belief.

Here you are saying you personally value a computer program more than people

It exposes a value that you personally hold and that's it

That is separate from the material reality that all this AI stuff is ultimately just computer software... It's an epistemological tautology in the same way that say, a plane, car and refrigerator are all just machines - they can break, need maintenance, take expertise, can be dangerous...

LLMs haven't broken the categorical constraints - you've just been primed to think such a thing is supposed to be different through movies and entertainment.

I hate to tell you but most movie AIs are just allegories for institutional power. They're narrative devices about how callous and indifferent power structures are to our underlying shared humanity

Refreeze5224 · 2026-02-13T19:55:29 1771012529

Their point is, would you be able to prompt your way to this result? No. Already trained physicists working at world-leading institutions could. So what progress have we really made here?

famouswaffles · 2026-02-13T20:00:04 1771012804

It's a stupid point then. Are you able to work with a world leading physicist to any significant degree? No

emil-lp · 2026-02-13T20:37:24 1771015044

It's like saying: calculator drives new result in theoretical physics

(In the hands of leading experts.)

famouswaffles · 2026-02-13T20:45:47 1771015547

No it's not like saying that at all, which is why Open AI have a credit on the paper.

camdenreslink · 2026-02-13T21:21:24 1771017684

Open AI have a credit on the paper because it is marketing.

famouswaffles · 2026-02-13T21:53:40 1771019620

Lol Okay

not_kurt_godel · 2026-02-13T22:02:32 1771020152

And even if it were, calculators (computers) were world-changing technology when they were new.

Nevermark · 2026-02-13T23:15:53 1771024553

No it’s like saying: New expert drives new results with existing experts.

The humans put in significant effort and couldn’t do it. They didn’t then crank it out with some search/match algorithm.

They tried a new technology, modeled (literally) on us as reasoners, that is only just being able to reason at their level and it did what they couldn’t.

The fact that the experts were a critical context for the model, doesn’t make the models performance any less significant. Collaborators always provide important context for each other.

emil-lp · 2026-02-13T19:35:42 1771011342

It didn't solve it, it simply found that it had been solved in a publication and that the list of open problems wasn't updated.

Davidzheng · 2026-02-13T19:38:12 1771011492

My understanding is there's been around 10 erdos problems solved by GPT by now. Most of them have been found to be either in literature or a very similar problem was solved in literature. But one or two solutions are quite novel.

https://github.com/teorth/erdosproblems/wiki/AI-contribution... may be useful

carefree-bob · 2026-02-13T21:44:55 1771019095

I am not aware of any unsolved Erdos problem that was solved via an LLM. I am aware of LLMs contributing to variations on known proofs of previously solved Erdos problems. But the issue with having an LLM combine existing solutions or modify existing published solutions is that the previous solutions are in the training data of the LLM, and in general there are many options to make variations on known proofs. Most proofs go through many iterations and simplifications over time, most of which are not sufficiently novel to even warrant publication. The proof you read in a textbook is likely a highly revised and simplified proof of what was first published.

If I'm wrong, please let me know which previously unsolved problem was solved, I would be genuinely curious to see an example of that.

Davidzheng · 2026-02-13T22:01:30 1771020090

It's in the link above, but you can look at #1051 or #851 on the erdosproblems website.

carefree-bob · 2026-02-13T22:35:36 1771022136

The erdosproblems website shows 851 was proved in 1934. https://www.erdosproblems.com/851

I guess 1051 qualifies - from the paper: "Semi-autonomous mathematical discovery with gemini" https://arxiv.org/pdf/2601.22401

"We tentatively believe Aletheia’s solution to Erdős-1051 represents an early example of an AI system autonomously resolving a slightly non-trivial open Erdős problem of somewhat broader (mild) mathematical interest, for which there exists past literature on closely-related problems [KN16], but none fully resolves Erdős-1051. Moreover, it does not appear to us that Aletheia’s solution is directly inspired by any previous human argument (unlike in many previously discussed cases), but it does appear to involve a classical idea of moving to the series tail and applying Mahler’s criterion. The solution to Erdős-1051 was generalized further, in a collaborative effort by Aletheia together with human mathematicians and Gemini Deep Think, to produce the research paper [BKK+26]."

Davidzheng · 2026-02-13T23:00:18 1771023618

"The erdosproblems website shows 851 was proved in 1934." I disagree with this characterization of the Erdos problem. The statement proven in 1934 was weaker. As evidence for this, you can see that Erdos posed this problem after 1934.

carefree-bob · 2026-02-15T04:30:54 1771129854

You recommended I look at the erdosproblems website.

But evidence that it was posed after 1934 is not really evidence it was not solved, because one of the things we learned from LLMs was that many of these problems were already solved in the literature, or are relatively straightforward applications of known, yet obscure, results. Particularly in the world of Erdos problems, the majority of which can be described as "off the beaten path" and are basically musings in papers that Erdos was asking -- many of these are in fact solved in more obscure articles and no one made the connection until LLMs allowed us to do systematic literature searches. This was the primary source of "solutions" of these problems by LLMs in the cited paper.

Davidzheng · 2026-02-15T04:45:44 1771130744

The Erdos Problem site also does not say it was solved in 1934. If you read the full sentence there, it refers to a different statement proven which is related.

emp17344 · 2026-02-13T20:29:35 1771014575

Some of these were initially hyped as novel solutions, and then were quietly downgraded after it was discovered the solutions weren’t actually novel.

Insanity · 2026-02-13T21:02:37 1771016557

Yeah that was also my take-away when I was following the developments on it. But then again I don't follow it very closely so _maybe_ some novel solutions are discovered. But given how LLMs work, I'm skeptical about that.

compass_copium · 2026-02-13T22:06:37 1771020397

...am I wrong in thinking that 1(a) is the relevant section here, and shows a lot of red?

Davidzheng · 2026-02-13T22:20:01 1771021201

I honestly don't see the point of the red data points. By now all the erdos problems have been attempted by AIs--so every unsolved one can be a red data point.

compass_copium · 2026-02-14T02:16:14 1771035374

The post's author points that out as well

emil-lp · 2026-02-13T19:23:14 1771010594

This is news to me. Source?

michaelt · 2026-02-14T00:47:25 1771030045

That's because it's not true.

https://www.merriam-webster.com/dictionary/ordering

Order - transitive verb - 1. to put in order : arrange - "The books are ordered alphabetically by author."

noun - 4. b(1) the arrangement, organization, or sequence of objects or of events - "alphabetical/chronological/historical order" "listed the items in order of importance"

https://www.merriam-webster.com/dictionary/sorting

Sort - transitive verb - 1. to put in a certain place or rank according to characteristics - "sort the mail" "sorted the winners from the losers" "sorting the data alphabetically"

noun - 5. an instance of sorting - "a numeric sort of a data file"

rkomorn · 2026-02-14T03:36:10 1771040170

I don't get how this disagrees with what GGP wrote.

aGHz · 2026-02-14T03:28:29 1771039709

https://dictionary.cambridge.org/dictionary/english/sort

to put a number of things in an order or to separate them into groups: Paper, plastic, and cans are sorted for recycling.

sort something into something I'm going to sort these old books into those to be kept and those to be thrown away.

sort something by something You can use the computer to sort the newspaper articles alphabetically, by date, or by subject.

sort (through) She found the ring while sorting (through) some clothes.

nh23423fefe · 2026-02-13T21:00:06 1771016406

"The children were sorted in to two lines by gender then ordered by height"

You might substitute "sorted by height" but its certainly not a correction. While "ordered into lines" would be an error.

thom · 2026-02-13T20:55:38 1771016138

What do you do when you sort your washing?

ninalanyon · 2026-02-13T21:47:29 1771019249

The sorting office for a postal service.

emil-lp · 2026-02-13T19:22:34 1771010554

W is the span or range.

emil-lp · 2026-02-10T17:18:24 1770743904

He's also very active at Stack Exchange

– https://stackexchange.com/users/510234/joel-david-hamkins

– https://mathoverflow.net/users/1946/joel-david-hamkins

emil-lp · 2026-02-10T17:15:24 1770743724

https://web.archive.org/web/20251015174117/https://www.infin...

FillMaths · 2026-02-10T17:38:42 1770745122

This one has the paywall, but the main site has no paywall currently.