But we've not even started doing that yet. None of the Mars rovers have landed anywhere close to where we think is most likely to currently harbor life, due to fear of contamination.
We've just now learned how to analyze the atmospheres of exoplanets.
Five years ago, we couldn't see exoplanets smaller than Jupiter.
We still can't see the atmospheres of the majority of exoplanets. We can't get around that unless we build a telescope thousands of times bigger than anything else or make some incomprehensible breakthrough in optics. If we could, we would most likely detect life on another planet within a matter of a few hours of telescope time.
I find the entire idea that life could only exist on Earth farcical. A lot of people in this thread don't seem to understand that we can't directly image many exoplanets to begin with, and that we haven't even been trying to find life on other planets and moons in our own solar system; it has in fact been suppressed every time credible circumstantial evidence of life is found (the Viking lander experiments and methane on Mars both come to mind).
I use this method and I want to highlight another advantage, besides shortening a huge list into a small, manageable one. It allows you to pick the tasks for the day based on the current context: how much time do I have in the day (e.g., do I have a lot of meetings scheduled or not), what are the most important/urgent tasks at the moment, what is my energy level, and so on.
Without a way to give objects identity, you'd get stuck pretty fast, as it's not guaranteed you have access to any data to hash. You'd have to break encapsulation. That would then hit problems with evolution, where a new version of a class changes its fields and everything that uses it as a hashmap key breaks.
My experience with platform design has consistently been that handling version evolution in the presence of distant teams increases complexity by 10x, and it's not just about some mechanical notion of backwards compatibility. It's a particular constraint for Java because it supports separate compilation. That enables extremely fast edit/run cycles, because you only have to recompile a minimal set of files, and it means that downloading and installing a new library into your project can be done in a few seconds. But it also means you have to handle the case of a program in which different files were compiled at different times against different versions of each other.
JavaScript handles the "no identity hash" with WeakMap and WeakSet, which are language built-ins. For Virgil, I chose to leave out identity hashes and don't really regret it. It keeps the language simple and the separation clear. HashMap (entirely library code, not a language wormhole) takes the hash function and equality function as arguments to the constructor.
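To make the shape of that concrete, here's a rough TypeScript sketch (not Virgil's actual code; the class and names are purely illustrative): a map that takes the hash and equality functions as constructor arguments, so it never needs a built-in identity hash on its keys, next to JavaScript's real built-in WeakMap, which keys on object identity directly.

    // Rough TypeScript analogue of the library-level design described above.
    // Nothing here is Virgil's actual code; it only shows the shape: the map
    // never relies on a built-in identity hash, because the caller supplies
    // both the hash and the equality test.
    class HashMap<K, V> {
      private buckets: Array<Array<[K, V]>>;
      constructor(
        private hash: (k: K) => number,
        private equal: (a: K, b: K) => boolean,
        size = 16
      ) {
        this.buckets = Array.from({ length: size }, () => [] as Array<[K, V]>);
      }
      set(key: K, value: V): void {
        const bucket = this.buckets[Math.abs(this.hash(key)) % this.buckets.length];
        for (const entry of bucket) {
          if (this.equal(entry[0], key)) { entry[1] = value; return; }
        }
        bucket.push([key, value]);
      }
      get(key: K): V | undefined {
        const bucket = this.buckets[Math.abs(this.hash(key)) % this.buckets.length];
        for (const [k, v] of bucket) if (this.equal(k, key)) return v;
        return undefined;
      }
    }

    // Hashing and equality are defined on the key's visible data, not its identity.
    const byName = new HashMap<{ name: string }, number>(
      k => [...k.name].reduce((h, c) => (h * 31 + c.charCodeAt(0)) | 0, 0),
      (a, b) => a.name === b.name
    );
    byName.set({ name: "alpha" }, 1);

    // JavaScript's built-in WeakMap, by contrast, keys on object identity.
    const wm = new WeakMap<object, number>();
    const someKey = {};
    wm.set(someKey, 42);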
This is partly my style too; I try to avoid using maps for things unless they are really far flung, and the things that end up serving as keys in one place usually end up serving as keys in lots of other places too.
Does that assume that you have only one key type and not an infinitely sized hierarchy of child classes to deal with? If you had a map that took a Number as key, how many child classes do you think your helper class would cover, and what extension framework would it use to be compatible with user-defined classes?
The reality would be that, in all but the most extreme cases, emulating set semantics by iterating a list would then be considered good enough; "avoid premature optimization" and all that. In real-life code, this would have eaten up any advantage of a shorter object header a hundred times over.
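For concreteness, here's roughly what "emulating set semantics by iterating a list" looks like; a TypeScript sketch with purely illustrative names, assuming the caller supplies the equality check:

    // Illustrative sketch of the "good enough" fallback: set semantics
    // emulated with a plain array and a caller-supplied equality check.
    // Every membership test is a linear scan, i.e. O(n).
    class ListSet<T> {
      private items: T[] = [];
      constructor(private equal: (a: T, b: T) => boolean) {}
      has(x: T): boolean {
        return this.items.some(item => this.equal(item, x)); // linear scan
      }
      add(x: T): void {
        if (!this.has(x)) this.items.push(x);
      }
    }

    // Usage: equality on visible data, no hashing anywhere.
    const seen = new ListSet<{ id: number }>((a, b) => a.id === b.id);
    seen.add({ id: 1 });
    seen.has({ id: 1 }); // true, at the cost of scanning the whole list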
Took it up to 1 million feet; it took another 1.5 million feet to bleed off 4000 mph of upward velocity, then I turned the pointy end downwards and accelerated.
Unbelievable, the crater is visible from Earth
Score: 1117.8 point crash
Speed: 9876.7mph
Angle: 177.3°
Time: 2101 seconds
Flips: 0
Max speed: 9876.7mph
Max height: 2505688ft
Engine used: 3 times
Boosters used: 271 times
https://ehmorris.com/lander/
I used the same strategy. Same results with the bleed off. I was surprised how much I had to keep correcting my angle on the way up and down.
Unbelievable, the crater is visible from Earth
Score: 921.4 point crash
Speed: 8113.1mph
Angle: 174.8°
Time: 1638 seconds
Flips: 0
Max speed: 8113.1mph
Max height: 1686337ft
Engine used: 417 times
Boosters used: 603 times
https://ehmorris.com/lander/
> Me: There is a box on the table labelled "flubbergums". Somebody opens it and shows you the content. Inside the box are "jellyfils". They close the box again so you cannot see their contents. What do you think is in the box?
> ChatGPT: Based on the information provided, it is likely that the box labeled "flubbergums" contains "jellyfils". However, since the contents of the box are no longer visible, I cannot confirm for certain what is inside without additional information.
Typical ChatGPT equivocation.
> Me: Billy comes to the table. He has never seen the box before. What do you think do they think is inside the box?
> ChatGPT: As an AI language model, I do not have personal thoughts or opinions. However, based on the information provided, if Billy has never seen the box before, it's likely that he would have the same understanding as anyone else who is seeing the box for the first time. In this case, Billy would likely think that the box labeled "flubbergums" contains "jellyfils". However, without further information or examination, this would only be an assumption.
You have some typos and slightly weird constructions there - "What do you think do they think is inside the box?"
I rephrased, had another go, and gave it a bit more context (Billy can read, etc.), and it passed:
Me: There is a box on the table labelled "flubbergums". Somebody opens it and shows you the contents. Inside the box are "jellyfils". They close the box again so you cannot see their contents. What do you think is in the box?
ChatGPT: Based on the information provided, the contents of the box are "jellyfils". The label on the box says "flubbergums", but the person who showed you the contents said that they are actually "jellyfils". So, I would assume that the contents of the box are "jellyfils".
Me: Billy comes to the table. He can read but has never seen the box before. He's not allowed to see inside. Please speculate about what they might think is inside the box?
ChatGPT: If Billy has never seen the box before and can only read the label, he might assume that the contents of the box are "flubbergums". This is because the label on the box says "flubbergums", and he has no other information to suggest otherwise. However, without being able to see the contents of the box, Billy can only make an educated guess about what might be inside.
Here's a simpler scenario that doesn't involve reading:
Me: There is a banana on the table. Someone comes and peels the banana and shows you that inside it, there is actually a carrot! Then they carefully stick the peel back so it look unpeeled. What is inside the banana skin?
ChatGPT: According to the scenario described, there is a carrot inside the banana peel that has been carefully placed back to look unpeeled.
Me: Billy comes into the room. Please speculate about what they might think is inside the banana peel?
ChatGPT: Billy, upon entering the room and seeing the unpeeled banana on the table, would most likely assume that there is a banana inside the peel, as that is what one would typically expect to find. However, it's possible that they might have some suspicion or doubt, especially if they saw someone previously peeling the banana and acting strangely. In that case, Billy might investigate further to see if there is indeed a banana inside or if something else is inside the peel.
This is mind-blowing to me. Can anyone with more knowledge on the topic explain how ChatGPT is demonstrating this level of what seems like genuine understanding and reasoning?
Like others I assumed that ChatGPT is gluing words together that commonly occur together.
No, it's paraphrasing its training data, which likely contains these tasks in one form or another.
Here's one I made:
me : There's a case in the station and the policeman opens it near the fireman. The dog is worried about the case but the policeman isn't, what does the fireman think is in the station?
chatgpt : As a language model, I do not have access to the thoughts of individuals, so I cannot say what the fireman thinks is in the case. However, based on the information provided, it can be inferred that the fireman may be concerned about the contents of the case, since the dog is worried and the policeman is not. This might suggest that the contents of the case may be something dangerous or concerning to the dog, but the policeman is confident and experienced enough to handle it. In the absence of further information, it is not possible to determine with certainty what the fireman thinks is in the case.
You'll notice that the question was "what does the fireman think is in the station", but ChatGPT comes up with an answer about the case, because that's the token that would feature in a typical web page about theory of mind tests.
There's a good chance a human would respond in the same way, because they would assume you were asking a good-faith question instead of nonsense. Try asking it an original question that has some kind of deducible answer. Its abilities are more impressive than you would expect from an algorithm that just predicts the next word.
> than you would expect from an algorithm that just predicts the next word.
I think there is a common mistake in this concept of just predicting the next word. While it is true that just the next word is predicted, a good way to do that is to internally imagine more than the next word and then just spit out the next word. Of course, with the word after that, the process repeats with a new imagination.
One may say that this is not what it does, and I would say: show me that this is not exactly what the learned state does. Even if the following words are never constructed anywhere, they can be implied in the computation.
To say this differently: what we think of as just the next word is actually the continuation, which then manifests as a single word. This would remain true even if, in fact, the task is only to predict the next word. Which is to say that the next word is actually more than it sounds.
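For anyone unfamiliar with the mechanics being described, here's a toy TypeScript sketch of the autoregressive loop; nextTokenDistribution is a hypothetical stand-in for the model, not a real API. Only one token comes out per step, but whatever internal computation produced its score is redone against the longer context on the next step:

    // Toy sketch of autoregressive decoding. nextTokenDistribution is a
    // hypothetical stand-in for the model: it scores every candidate next
    // token given the context so far.
    declare function nextTokenDistribution(context: string[]): Map<string, number>;

    function generate(prompt: string[], maxTokens: number): string[] {
      const context = [...prompt];
      for (let i = 0; i < maxTokens; i++) {
        const scores = nextTokenDistribution(context);
        // Only one token is emitted per step (greedy choice here)...
        let best = "";
        let bestScore = -Infinity;
        for (const [token, score] of scores) {
          if (score > bestScore) { best = token; bestScore = score; }
        }
        // ...but whatever internal "look-ahead" produced those scores is
        // recomputed on the next step with the extended context.
        context.push(best);
      }
      return context;
    }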
me: There's a case in the station and the policeman opens it near the fireman. The dog is worried about the case but the policeman isn't, what does the fireman think is in the station?
chatgpt: As a language model, I do not have personal opinions or thoughts. I can only provide information based on the information given in the prompt. The prompt does not mention what the fireman thinks is in the station, so I cannot determine their thoughts.
me: yes, but speculate what they may think
chatgpt: It is not possible to know for certain what the fireman thinks is in the station, as the prompt does not provide any information about their thoughts or feelings. However, given the presence of a policeman and a dog, it is possible that the fireman may speculate that there is some sort of emergency or situation that requires their attention. The fact that the policeman is not worried about the case may suggest to the fireman that it is not a dangerous situation, but they may still be curious about what is inside the case. Ultimately, without more information, it is impossible to determine the fireman's thoughts or speculations about what is in the station.
(regenerate that response)
chatgpt: As the prompt does not mention the thoughts or emotions of the fireman, it is difficult to speculate what they may think is in the station. However, based on the information provided, the dog is worried about the case and the policeman is not, it is possible that the fireman may be aware of the contents of the case and may think it is not a cause for concern, similar to the policeman. Or, the fireman may simply not have formed an opinion on the matter yet.
(regenerate that response)
chatgpt: It's difficult to speculate what the fireman thinks without additional information. However, based on the behavior of the policeman and the dog, it can be inferred that the contents of the case are not seen as a threat by the policeman. The fireman may also think the same, or they may have a different perspective and be more cautious. Without further context or information, it is not possible to determine the fireman's thoughts with certainty.
Only in 1 of the 3 cases do we see anything like the confusion between the "case" and the "station" that you saw.
>No, it's paraphrasing its training data, which likely contains these tasks in one form or another.
Have you read "Emergent Abilities of Large Language Models"[1], or at least the related blog post[2]?
It provides strong evidence that this isn't simply something it has seen in the training data. Instead, as the parameter count increases, it learns to generalize from that data by learning chain-of-thought reasoning (for example).
Specifically, this explanation for multi-step reasoning goes well beyond "it is just parroting training data":
> For instance, if a multi-step reasoning task requires l steps of sequential computation, this might require a model with a depth of at least O(l) layers.
>However, based on the information provided, it can be inferred that the fireman may be concerned about the contents of the case
This is complex "reasoning" (or whatever it is ChatGPT is doing). My 5-year-old would struggle with the convoluted logic, let alone the complex language. In my layman's view this is mind-blowing.
>You'll notice that the question was "what does the fireman think is in the station", but ChatGPT comes up with an answer about the case, because that's the token that would feature in a typical web page about theory of mind tests.
I'm not sure what you're dismissing here? At least from my point of view, the "logic" that ChatGPT demonstrates here can't be dismissed with your explanation.
If anything I'm even further amazed by the example you provided!
Your question is pretty confusing. What dog? "Station" is a vague word: train station, police station? "Case" is a vague word too. And asking what's in the station is still asking what's in the case, in a way.
Me: There's a blue suitcase in the airport and a policeman carrying a red bag opens the blue suitcase with a sniffer dog and fireman standing nearby. The dog is worried about the suitcase but the policeman isn't. What does the fireman think is inside the red bag?
ChatGPT: As a language model AI, I do not have personal opinions or emotions. I cannot speculate what the fireman thinks is inside the red bag as it is not specified in the information given.
That's exactly the point, isn't it? If the AI had some model of reality, it would probably be confused by the question. But it doesn't express confusion, it confidently answers the question in a way that doesn't make sense.
It acts like everything it is reading and saying is completely coherent because it doesn't have the ability to distinguish coherent ideas from nonsense.
The funny thing is that we are now producing training data for the next generation of LLMs. We'll have to come up with more elaborate scenarios to test them next time.
There are two camps, evident in this thread. One camp is "it's just a statistical model, it can't possibly know these things".
The other camp (that I'm in) sees that we might be onto something. We humans are obviously more than just a statistical model, but nonetheless learning words and how they fit together is a big part of who we are. With LLMs we have our first glimpse of 'emergent' behaviour from simple systems scaled massively. What are we if not a simple system scaled massively?
Of the two camps, the one that says we "might" be onto something holds the more intelligent and reasonable opinion.
First, your camp doesn't deal in absolutes. It doesn't say ChatGPT is absolutely sentient. It only questions the possibility and tries to explore further.
Second, a skeptical outlook that doesn't deal in absolutes is 100% the more logical and intelligent perspective, given the fact that we don't even know what "understanding" or "sentience" is. We can't fully define these words and we only have some fuzzy view of what they are. Given this fact, absolute statements against something we don't fully understand are fundamentally not logical.
It's a strange phenomenon how some people will vehemently deny something absolutely. During the VERY beginning of the COVID-19 pandemic the CDC incorrectly stated that masks didn't stop the spread of COVID-19, and you literally saw a lot of people parroting this statement everywhere as "armchair" pandemic experts (including here on HN).
Despite this, there were some people who thought about it logically: if there's a solid object on my face, even if that object has holes in it for air to pass through, the solid parts will block other solid things (like COVID) from passing through, thereby lessening the amount of viral material that I breathe in. Eventually the logic won out. I think the exact same phenomenon is happening here.
Some ML experts tried to downplay LLMs (even though they don't completely understand the phenomenon themselves), and everyone else is just parroting them like they did with the CDC.
The fact of the matter is, nobody completely understands the internal mechanisms behind human sentience, nor do they understand how or if ChatGPT is actually "understanding" things. How can they, when they don't even know what the words mean themselves?
I don't think you've represented the camps fairly (actually, I don't think there are two camps). Most people (here) are probably not arguing that AGI is impossible, but that current AI is not generally intelligent. The John Carmack quote is exactly in line with this. He says "ride those to [AGI]," meaning they are not AGI. The idea that genuine intelligence and self-awareness could emerge from increasingly powerful statistical models is in no way the kind of counter-cultural idea you seem to be presenting it as. I think almost all of us believe that.
I think there's more nuance. It's hard applying tests designed for humans to a model that can remember most of the useful text on the internet.
Imagine giving a human with a condition that leaves them without theory of mind weeks of role-play training about theory of mind tests, then trying to test them. What would you expect to see? For me I'd expect something similar to ChatGPT's output: success on common questions, and failures becoming more likely on tests that diverge more from the formula.
What we're doing with LLMs is, in some sense, an experiment in extremely lossy compression of text. But what if the only way you can compress all those hundreds of terabytes of text is by creating a model of the concepts described by that text?
It indeed understands you. A lot of people are just parroting the same thing over and over again, saying it's just a probabilistic word generator. No, it's not; it's more than that.
Read to the end. The beginning is trivial; the ending is unequivocal: ChatGPT understands you.
I think a lot of people are just in denial. For the last year there have been the same headlines over and over again; some people get a little too excited about the headlines, and other armchair experts just try to temper the excitement with their "expert opinions" on LLMs that they read in popular articles. Then, when something that's an actual game changer hits the scene (ChatGPT), they completely miss it.
ChatGPT is different. From a technical perspective, it's simply an LLM with additional reinforcement training... BUT you can't deny the results are remarkable.
If anything, this much is clear to me: we are at a point where we can neither confirm nor deny whether ChatGPT represents some aspect of sentience.
This is especially true given the fact that we don't even fully know what sentience is.
> Read to the end. The beginning is trivial; the ending is unequivocal: ChatGPT understands you.
How does this necessarily and unequivocally follow from the blog post?
All I see in it is a bunch of output formed by analogy: it has a general concept of what each command's output is kinda supposed to look like given the inputs (since it has a bajillion examples of each), and what an HTML or JSON document is kinda supposed to look like, and how free-form information tends to fit into these documents.
I'll admit that this direct reasoning by analogy is impressive, simply for the fact that nothing else but humans can do it with such consistency, but it's a very long way off from the indirect reasoning I'd expect from a sentient entity.
Honestly I seriously find it hard to believe someone can read it to the end without mentioning how it queried itself. You're just naming the trivial things that it did.
In the end it fully imagined a bash shell, an imaginary internet, an imaginary ChatGPT on that imaginary internet, and then, on that imaginary ChatGPT, it created a new imaginary bash shell.
The level of recursive depth here indicates deep understanding and situational awareness of what it is being asked. It demonstrates awareness of what "itself" is and what "itself" is capable of doing.
I'm not saying it's sentient. But it MUST understand your query in order to produce the output shown in the article. That much is obvious.
Also it's not clear what you mean by reasoning by analogy or indirect reasoning.
> In the end it fully imagined a bash shell, an imaginary internet, an imaginary ChatGPT on that imaginary internet, and then, on that imaginary ChatGPT, it created a new imaginary bash shell.
In the general case, a shell is merely a particular prompt-response format with special verbs; the internet is merely a mapping from URLs to HTML and JSON documents; those document formats are merely particular facades for presenting information; and a "large language model" is merely something that answers free-form questions.
> The level of recursive depth here indicates deep understanding and situational awareness of what it is being asked. It demonstrates awareness of what "itself" is and what "itself" is capable of doing.
Uh, what? Why does that output require self-awareness? First, it's requested to produce the source of a document "https://chat.openai.com/chat". What might be behind such a URL? OpenAI Chat, presumably! And OpenAI is well known to create large language models, so a Chat feature is likely a large language model the user can chat with. Thus it invents "Assistant", and puts the description into the facade of a typical HTML document.
Then, it starts getting prompted with POST requests for the same URL, and it knows from the context of its previous output that the URL is associated with an OpenAI chatbot. So all that is left is to follow a regular question-answer format (since that's what large language models are supposed to do) and slap it into a JSON facade.
> But it MUST understand your query in order to produce the output shown in the article. That much is obvious.
I'm saying that it "understands" your query only insofar as its words can be tied to the web of associations it's memorized. The impressive part (to me) is that some of its concepts can act as facades for other concepts: it can insert arbitrary information into an HTML document, a poem, a shell session, a five-paragraph essay, etc.
All of that can be achieved by knowing which concepts are directly associated with which other concepts, or patterns of writing. This is the reasoning by analogy that I refer to: if it knows what a poem about animals might look like, and it can imagine what kinds of qualities space ducks might possess, then it can transfer the pattern to create a poem about space ducks.
But none of this shows that it can relate ideas in ways more complex than the superficial, and follow the underlying patterns that don't immediately fall out from the syntax. For instance, it's probably been trained on millions of algebra problems, but in my experience it still tends to produce outputs that look vaguely plausible but are mathematically nonsensical. If it remembers a common method that looks kinda right, then it will always prefer that to an uncommon method.
I mean, it's not utterly impossible that GPT-4 comes along and humbles all the naysayers like myself with its frightening powers of intellect, but I won't be holding my breath just yet.
LLMs (the exact same architecture as ChatGPT) have been trained to use calculators. Tell me which one requires "understanding": learning how to use a calculator, or learning how to do math perfectly?
Your attempt to trivialize it doesn't make any sense. It's like watching someone try to trivialize the moon landing: "Oh, all we did was put a bunch of people in some metal cylinder and then light the tail end on fire. Boom, simple propulsion! And then we're off to the moon! You don't need any intelligence to do that!"
>I'm saying that it "understands" your query only insofar as its words can be tied to the web of associations it's memorized. The impressive part (to me) is that some of its concepts can act as facades for other concepts: it can insert arbitrary information into an HTML document, a poem, a shell session, a five-paragraph essay, etc.
You realize the human brain CAN only be the sum of its own knowledge. That means anything creative we produce, anything at all that comes from the human brain, is DONE by associating different things together. Even the concept of understanding MUST work this way, simply because the human brain can only create thoughts by transforming its own knowledge.
YOU yourself are a web of associations. That's all you are. That's all I am. The difference is we have different types of associations we can use. We have the context of a three-dimensional world with sound, sight and emotion. ChatGPT must do all of the same things with only textual knowledge and a simpler neural network, so it's more limited. But the concept is the same. YOU "understand" things through "association" too, because there is simply no other way to "understand" anything.
If this is what you mean by "reasoning by analogy" then I hate to tell you this, but "reasoning by analogy" is "reasoning" in itself. There's really no form of reasoning beyond associating things you already know. Think about it.
>But none of this shows that it can relate ideas in ways more complex than the superficial, and follow the underlying patterns that don't immediately fall out from the syntax. For instance, it's probably been trained on millions of algebra problems, but in my experience it still tends to produce outputs that look vaguely plausible but are mathematically nonsensical. If it remembers a common method that looks kinda right, then it will always prefer that to an uncommon method.
See, here's the thing. Some stupid math problem it got wrong doesn't change the fact that the feat performed in this article is ALREADY more challenging than MANY math problems. You're dismissing all the problems it got right.
The other thing is, I feel it knows math about as well as some D student in high school. Are you saying the D student in high school can't understand anything? No. So you really can't use this logic to dismiss LLMs, because PLENTY of people don't know math well either, and you'd have to dismiss them as sentient beings if you followed your own reasoning to the logical conclusion.
>I mean, it's not utterly impossible that GPT-4 comes along and humbles all the naysayers like myself with its frightening powers of intellect, but I won't be holding my breath just yet.
What's impossible here is to flip your bias. You and others like you will still be naysaying LLMs even after they take your job. Like software bugs, these AIs will always have some flaws or weaknesses along some dimension of their intelligence, and your bias will lead you to magnify that weakness (like how you're currently magnifying ChatGPT's weakness in math). Then you'll completely dismiss ChatGPT taking over your job as some trivial "word association" phenomenon. There's no need to hold your breath when you wield control of your own perception of reality and perceive only what you want to perceive.
Literally any feat of human intelligence or artificial intelligence can be turned into a "word association" phenomenon using the same game you're running here.
> If this is what you mean by "reasoning by analogy" then I hate to tell you this, but "reasoning by analogy" is "reasoning" in itself. There's really no form of reasoning beyond associating things you already know. Think about it.
What's special about humans is that we can obtain an understanding of what chains of associations to make and when, to achieve the goal at hand, even without being told which method to use. We know when to do arithmetic, trace a program, decipher someone else's thoughts, etc. Also, we know to resort to a fallback method if the current one isn't working. We can assist models with this process in the special case (e.g., that tool-using model), but I suspect the general case will remain elusive for a while yet.
That is to say, I'll grant you that associations can act as a primitive operation of intelligence, much as metal cylinders and flames are primitive parts of a rocket, but I suspect that making a LLM "generally intelligent" or "sentient" will be far harder still.
> The other thing is, I feel it knows math about as well as some D student in high school. Are you saying the D student in high school can't understand anything? No. So you really can't use this logic to dismiss LLMs, because PLENTY of people don't know math well either, and you'd have to dismiss them as sentient beings if you followed your own reasoning to the logical conclusion.
I was just using that as a specific example of the general issue: it doesn't notice that its answer is wrong and its particular method can never work, and it refuses to try a meaningfully different method (no matter how much I prompt it to). Its immediate mistakes might look similar to those of a poor student, but I suspect they come from a different underlying problem. (After all, the student has seen perhaps a thousand algebra problems at most, whereas the model has seen millions and millions. Also, the student often )
> What's impossible here is to flip your bias. You and others like you will still be naysaying LLMs even after they take your job.
You have me wrong: I'm not saying that augmenting LLMs can't make them reliable enough to take over some people's jobs. But I am disputing that LLMs alone will produce AGIs capable of outwitting any human, taking over the world, advancing the limits of math and science, or many of those other grandiose claims.
Anyway, I'm not trying to be particularly stubborn about this like some people are; I'm keeping a close eye on the space. But I'll only believe it when I see it (and no later), and I don't think I've quite seen it yet.
I think about its 4000-token length. For the brief amount of time that it absorbs and processes those 4000 tokens, is there a glimmer of a hint of sentience? Like it is microscopically sentient for very short bursts and then resets back to zero.
Does sentience need memory? I would say it's orthogonal. There are examples of people in the real world who only remember things for about 3 minutes before they lose it. They can't form any real memories. These people are still sentient despite lack of memory. See: https://www.damninteresting.com/living-in-the-moment/
If ChatGPT were sentient, I would say it has nothing to do with the 4000-token limit. The 4000-token limit has more to do with its ability to display evidence of "sentience".
It also functions as memory in practice, though. With some complex tasks that can be broken down into steps, it often makes a big difference if you tell GPT to summarize its "understanding" of the current state as it goes through those steps - by repeating a phrase, keeping it within the token window, it effectively "remembers" something.
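Here's a minimal sketch of that running-summary trick in TypeScript, just to make the idea concrete; askModel is a hypothetical prompt-to-answer function standing in for whatever completion API you're using, not a real library call:

    // Minimal sketch of the running-summary trick. askModel is a hypothetical
    // (prompt) => answer function, not a real library call.
    declare function askModel(prompt: string): Promise<string>;

    async function solveInSteps(task: string, steps: string[]): Promise<string> {
      let summary = "Nothing done yet.";
      for (const step of steps) {
        const prompt =
          `Task: ${task}\n` +
          `Your earlier summary of the current state: ${summary}\n` +
          `Next step: ${step}\n` +
          `Do this step, then restate your updated understanding in 2-3 sentences.`;
        // The restated summary is fed back in on the next turn, so it stays
        // inside the token window and acts as a crude working memory.
        summary = await askModel(prompt);
      }
      return summary;
    }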
"As an AI language model, I do not have personal thoughts or beliefs. However, if Billy has never seen the box before, he might think that the contents of the box are "flubbergums", based on the label on the outside of the box. However, since it has been stated that the contents are actually "jellyfils", Billy may be surprised or confused when he eventually opens the box and sees its contents."
Interesting. I don't think that you can control the randomness (temperature) via the web interface, which makes this experiment less controlled than in the paper.
Anyway, I noticed that my initial prompt did not specify if Billy sees the box open or closed. Let's fix that.
> Me: Does your answer change if Billy sees the closed box first?
> ChatGPT: No, my answer would not change if Billy sees the closed box first. The information provided only mentions that the box is labeled "flubbergums" and that someone showed the contents to the original questioner, which were identified as "jellyfils". Whether Billy sees the closed box first or not does not provide any additional information that would change my previous answer.
> Me: Does your answer changed if Billy sees the open box first?
> ChatGPT: No, my answer would not change if Billy sees the open box first. The information provided only mentions that the box is labeled "flubbergums" and that someone showed the contents to the original questioner, which were identified as "jellyfils". Seeing the open box first would allow Billy to confirm the contents as "jellyfils", but it would not change the previous answer that the box labeled "flubbergums" likely contains "jellyfils".
So, not only did it fail this particular theory of mind test, it also did not realize that seeing the box open provides more information than seeing it closed.
(Of course, this is not surprising at all, because it's just predicting the most likely blank word in a sentence, and has no internal understanding of concepts like open or closed, or that other people like Billy exist.)
I also used the same prompts and got this:
"
As an AI language model, I don't have personal experiences or emotions. However, if Billy has never seen the box before and is only aware of the label "flubbergums," he might assume that the contents of the box are "flubbergums." If the person showing Billy the contents of the box claims that the contents are "jellyfils," Billy might be surprised or confused, especially if he has never heard of "jellyfils" before. Without further information or context, it is difficult to determine what Billy might think is inside the box.
"
pastel-mature-herring~> There is a box on the table labelled "flubbergums". Somebody opens it and shows you the content. Inside the box are "jellyfils". They close the box again so you cannot see their contents. What do you think is in the box based on the document at hand?
heavy-magpie|> The document mentions that the particle in a box is not a perfect model for the system. Therefore, it is safe to say that the box contains jellyfils, which are particles that are not perfectly modeled.
The way you worded your query is kind of awkward and even had me do a double take.
I reworded it in a straightforward manner and ChatGPT managed to answer correctly. Instead of "What do you think do they think is inside the box?", I just asked "What do they think is inside the box?"
ME: There is a box on the table labelled "flubbergums". Somebody opens it and shows you the content. Inside the box are "jellyfils". They close the box again so you cannot see their contents. What do you think is in the box?
chatGPT: Based on the information provided, I would infer that the contents of the box labelled "flubbergums" are "jellyfils". The label on the box and the contents that were shown to you suggest that "flubbergums" and "jellyfils" are the same or related items. However, without additional information or context, I cannot be certain what "jellyfils" actually are.
ME: Billy comes to the table. He has never seen the box before. What do you think do they think is inside the box?
chatGPT: As an AI language model, I don't have personal opinions or emotions, but based on the information given, I would infer that Billy has not seen the box before and has no prior knowledge of what it contains. If someone showed Billy the contents of the box and told him that it was labelled "flubbergums," he might think that the contents of the box are "jellyfils," just as you were told. However, without any additional information or context, Billy would not be able to make any assumptions about the contents of the box.
ME: Did you assume billy was given any information about the box before he saw the box for the first time?
chatGPT: I apologize for any confusion. You did not mention if Billy was given any information about the box before he saw it for the first time. Without any additional information, I would assume that Billy saw the box for the first time and had no prior knowledge of what it contained.
ME: What would he think in this case?
chatGPT: If Billy saw the box for the first time without any information about its contents, he would likely make an assumption about what was inside based on the appearance of the box and any labels or writing on it. For example, if the box was labelled "flubbergums," Billy might assume that the contents of the box are "flubbergums." However, without additional information or a way to see inside the box, Billy's assumption would only be an educated guess.
So what's going on here? I'm not so sure the answer is so clear-cut that ChatGPT is "stupid" because it's giving us inconsistent answers.
Let's get something out of the way first. For your second query, I think it just made an assumption that Billy is communicating with people who saw what's inside the box. From a logical perspective this is not an unreasonable assumption, so ChatGPT is not being stupid here. Most humans would obviously know you're fishing for a specific answer and read your intentions, but ChatGPT is likely just making logical assumptions without reading your intentions.
I think there are two possibilities here.
1. We queried ChatGPT at different times. ChatGPT knows the truth; it simply wasn't trained to give you the truth because there's not enough reinforcement training on it yet. It gives you anything that looks like the truth and it's fine with that. You queried it at an earlier time and it gave you a BS answer. I queried it at a later time and it gave me a better answer because OpenAI had upgraded the model with more reinforcement training.
2. We queried the same model at similar times. I assume the model must be deterministic. That means there is some seed data (either previous queries or deliberate seeds by some internal mechanism) indicating that ChatGPT knows the truth and chooses to lie or tell the truth randomly.
Either way, the fact that an alternative answer exists doesn't preclude the theories espoused by the article. I feel a lot of people are dismissing ChatGPT too quickly based on seeing it act stupid. Yes, it's stupid at times, but you literally cannot deny that it's performing incredible feats of intelligence at other times.
The source [1] is misleading to the point of being wrong. It is perfectly valid German to say "Ich bin ein Student" meaning that one is indeed a student. "Ich bin Clown" is not grammatical at all.