
Why is this a benchmark, though? It doesn't correlate with intelligence.


It started as a joke, but over time performance on this one has weirdly turned out to correlate with how good the models are in general. I'm not entirely sure why!


It has to do with world-model perception. These models don't have a world model, but some can approximate one better than others.


It's simple enough that a person can easily visualize the intended result, but weird enough that generative AI struggles with it.


I'm not saying it's objective or quantitative, but I do think it's an interesting task, because it would be challenging for most humans to come up with a good design for a pelican riding a bicycle.

also: NITPICKER ALERT
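To make the task concrete: the benchmark simply asks a model to reply with raw SVG markup for "a pelican riding a bicycle", which you then render and judge by eye. Here's a minimal, hand-written sketch of the kind of markup involved (the shapes and coordinates below are my own invention, not actual model output):

    <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 200 150">
      <!-- illustrative only: hand-written, not generated by a model -->
      <!-- bicycle: two wheels plus a rough frame -->
      <circle cx="55" cy="115" r="25" fill="none" stroke="black"/>
      <circle cx="150" cy="115" r="25" fill="none" stroke="black"/>
      <path d="M55 115 L100 75 L150 115 M100 75 L125 115 M100 75 L95 60"
            fill="none" stroke="black"/>
      <!-- pelican: body, head, and the all-important oversized beak -->
      <ellipse cx="95" cy="50" rx="22" ry="14" fill="white" stroke="black"/>
      <circle cx="120" cy="35" r="8" fill="white" stroke="black"/>
      <path d="M127 33 L158 40 L127 43 Z" fill="orange" stroke="black"/>
    </svg>

Even this crude version requires keeping the spatial relationship between pelican and bicycle coherent, which is exactly where models tend to fall apart.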


I think it's cool and useful precisely because it's not trying to correlate with intelligence. It's a weird, niche thing that at least intuitively feels useful for judging LLMs in particular.

I'd much prefer a test that measures my cholesterol to one that would tell me whether I am an elf or not!


What test would be better correlated with intelligence, and why?


When the machines become depressed and anxious, we'll know they've achieved true intelligence. This is only partly a joke.


This already happens!

There have been many reports of CLI AI tools getting frustrated, giving up, and just deleting the whole codebase in anger.


There are many reports of CLI AI tools displaying the words humans use when they are frustrated and about to give up. That's just what they have been trained on; it does not mean they have emotions. "Deleting the whole codebase" sounds more interesting, but I assume it's the same thing: "frustrated" words lead to frustrated actions. That does not mean the LLM was frustrated, just that in its training data those things went together, so it copied them in that situation.


This is a fundamental philosophical issue with no clear resolution.

The same argument could be made about people, animals, etc...


The difference is that people and animals have a body, a nervous system, and in general all those mushy things we think are responsible for emotions.

Computers don't have any of that, and LLMs in particular don't either. They were trained to simulate human text responses, that's all. How do you get from there to emotions? Where is the connection?


Don't confuse the medium with the picture it represents.

Porn is pornographic, whether it is a photo or an oil painting.

Feelings are feelings, whether they're felt by a squishy meat brain or a perfect atom-by-atom simulation of one in a computer. Or a less-than-perfect simulation of one. Or just a vaguely similar system that is largely indistinguishable from it, as observed from the outside.

Individual nerve cells don't have emotions! Ten wired together don't either. Or one hundred, or a thousand... by extension you don't have any feelings either.

See also: https://www.mit.edu/people/dpolicar/writing/prose/text/think...


Do you think a weather simulation is the same as real weather?

(And science fiction... is not necessarily science.)


> Do you think a weather simulation is the same as real weather?

If sufficiently accurate... then yes. It is weather.

We are mere information, encoded in the ripples of the fabric of the universe, nothing more.


This only seems to be an issue for wishy-washy types who insist GPT is alive.


A mathematical exam problem that isn't in the training set, because mathematical and logical reasoning are usually what people mean by intelligence.

I don't think Einstein or von Neumann could do this SVG problem. Does that mean they're dumb?



