
From the very first sample conversation they released:

- Human: Hi!

- Meena: Hey there! What's up?

- Human: Same old, Monday blues...what are you up to?

- Meena: I'm procrastinating on a paper I'm supposed to be writing.

- Human: are you thinking of pulling an all nighter?

- Meena: Possibly. What's your paper on?

The bot has already confused facts about its own crafted persona and about the human in fewer than two meaningful exchanges. It's hard to see how this represents any progress in the field at all.

I remain skeptical until we see better results.

from: https://github.com/google-research/google-research/blob/mast...



These NLP models are getting better and better, but what we need, of course, is for some model of the world to be constructed during the conversation. If I tell you that yesterday I accidentally knocked a glass of water off the table and it fell onto soft carpet, you could guess that it survived the fall without shattering. What we need is a chatbot that, as you talk to it, can update a 3D game/physics-engine model from your words, so that the common-sense implications of your statements can be gleaned. As you speak it will simulate what you are describing and then use info gathered from the simulation to draw conclusions.
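To make that concrete, here is a toy sketch of the idea (every name, number and threshold below is invented for illustration, not any real system): a described event gets mapped onto a crude "simulation" and a common-sense conclusion is read off the result.

    MATERIAL_FRAGILITY = {"glass": 0.8, "plastic": 0.1, "steel": 0.0}
    SURFACE_HARDNESS = {"soft carpet": 0.1, "tile": 0.9, "concrete": 1.0}

    def shatters_on_impact(material, surface, drop_height_m):
        """Crude 'physics': a harder surface and higher drop mean a harder impact."""
        impact = SURFACE_HARDNESS[surface] * drop_height_m
        return impact * MATERIAL_FRAGILITY[material] > 0.5

    # "Yesterday I knocked a glass of water off the table and it fell onto soft carpet."
    if not shatters_on_impact("glass", "soft carpet", drop_height_m=0.75):
        print("Common-sense inference: it probably survived without shattering.")

A real system would of course need to build the scene from language rather than have it hard-coded, which is the hard part.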


This might sound unrealistically complicated, and the approach suggested here certainly is, but the idea of simulating a world is a very important concept in our current models of human intelligence, with very small children spending a lot of their time learning basic concepts of cause and effect, properties of objects, materials and physics and so on.

It reminds me of something I do in my current occupation, with a robot. It makes predictions about how a certain motor movement will result in a change in position, then measures progress in the real world. Should prediction and reality diverge significantly (by more than a particular amount), it stops and calls a human to look into what happened. I can imagine AI techniques using this kind of thing to guess what will happen next in a series of events, with the result's divergence from the guess weighting the learning rather than a steadily decreasing weight. Perhaps this is already a thing though; I'm no AI researcher.
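Something like this loop, as a rough sketch with invented numbers and names:

    DIVERGENCE_THRESHOLD_MM = 5.0  # invented tolerance

    def predicted_position(current_mm, command_mm):
        """Idealised actuator model: the arm moves exactly as commanded."""
        return current_mm + command_mm

    def step(current_mm, command_mm, measured_mm):
        expected = predicted_position(current_mm, command_mm)
        divergence = abs(measured_mm - expected)
        if divergence > DIVERGENCE_THRESHOLD_MM:
            raise RuntimeError(f"Off by {divergence:.1f} mm; stopping and calling a human.")
        return measured_mm

    # Commanded +20 mm, measured 118 mm: within tolerance, so carry on.
    position = step(current_mm=100.0, command_mm=20.0, measured_mm=118.0)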


>very small children spending a lot of their time learning basic concepts of cause and effect, properties of objects, materials and physics and so on.

I do not believe that young children explicitly/consciously simulate in their minds to learn cause and effect and other things, though this might be up for debate.


Who said anything about "consciously"? It doesn't need to be conscious to affect the conversation.


Whether it is conscious or not influences whether you would want to model it à la a 3D physics engine; if not, then perhaps there is a more elegant solution we do not know of yet.


You definitely need a model of the world, but you probably don't want to be constructing it during speech. What you want is a generative model that you've trained on the simulation data ahead of time so you can quickly make inferences once deployed.
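One hedged reading of that, sketched in Python (the features, data and task below are entirely made up): generate labelled examples from the simulator offline, fit a cheap model, and at conversation time answer from the fitted model instead of running the simulator.

    from sklearn.linear_model import LogisticRegression

    # Offline: rows are (drop_height_m, surface_hardness); label 1 means "shattered",
    # as if produced by many runs of an (imaginary) physics simulator.
    X_sim = [[0.75, 0.1], [0.75, 0.9], [1.50, 1.0], [0.30, 0.2], [2.00, 0.9], [0.50, 0.1]]
    y_sim = [0, 1, 1, 0, 1, 0]
    model = LogisticRegression().fit(X_sim, y_sim)

    # Online: no simulation at conversation time, just a fast query of the trained model.
    prob = model.predict_proba([[0.75, 0.1]])[0][1]
    print(f"Estimated probability the glass shattered: {prob:.2f}")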


How would that make a conversation more lifelike? It'll still be a limited model of the world, unless somehow you intend to give it the entire breadth of human experience, and it will still fall to the same shortcomings as every chatbot: ask it about something it doesn't know and it'll try to fit a response that gives it away as a bot, act confused in a way that gives it away as a bot, or just say something nonsensical.

There are things like metaphors, slang, and topical references, all of which end up in conversations and change regularly in their meaning and usage. Then there are the unspoken parts of conversations: the implications behind different wordings and ways of writing things, words left unsaid, sarcasm, tone (even in text, tone is conveyed in some form).

Humans do all these things while conversing almost subconsciously; each person produces and comprehends them based on years and years of sensory input and model building from a huge variety of sources, in a way that's individually unique and poorly understood. How would you even begin to train a model to even a remote semblance of that?


You don't even have to trick it with slang. Maybe I am behind, but so far I have never seen a bot that can consistently, or even most of the time, understand context. They all seem to seriously struggle with anything more than producing a random sentence that was determined to be a probable response to the last message the user sent.


Well, oftentimes human beings miss metaphors, slang, and topical references, especially if they are speaking with another cultural group.

The gulf between even the most disparate human tribes is nothing compared to the gulf between an AI chat-bot and a human being.

I think it's entirely possible to make bots with simple models of the world. (Not that this is really a move any closer to that)


Kind of an open problem on how to make a generative model which is sufficiently powerful :)

But that is basically what humans do - we don't run internal simulations, we have a model we've trained over a lifetime of experiences with the world that judges the likelihood of a given scenario


Judea Pearl has been talking about this gap like forever now. According to him, causal modeling is the missing piece. I do believe it would be a serious step up (I mean, does it really take a model that has looked at 350GB of text to produce the conversation presented here?) ... however, I don't know enough to know whether that would be enough.

(Reading TheBookOfNow and Causality in parallel)


I think you mean the book of why, not of now


Correct. Thanks and sorry about the typo. Too late to edit it, looks like.


I second the approach, though not in terms of a physics engine or world model, but rather models that hold metadata about how one abstraction affects another. When all the abstractions and their relationships can be mapped out or inferred, the models will start making better sense.


GPT-2 showed us that simply having a huge parameter space can be enough for internal consistency. Have you read the unicorn valley article it generated?


The output was cherry picked.


This


Haha, indeed.

I think we're trying to stretch too far here. We've always dreamt of having a conversation with a bot without actually constructing the intelligence behind that bot. And it's not going to work.

We should start with something simpler, like cats, which definitely have feelings, but have a mental capacity that is closer to what we may be able to emulate through software. And then monitor progress through their 'meowing', 'purring' and 'tail wagging'.


We've always dreamt of having a conversation with a bot without actually constructing the intelligence behind that bot. And it's not going to work.

Well, at least the ML people have.

This is automated nattering. These systems have no clue, but pretend they do. It's autocomplete with a big data set. The illusion falls apart after about three sentences or interactions. That's not much better than classic Eliza.

Most of the systems that can actually answer questions have some notion of "slots". As in, you ask "How do I get to X", and that matches something like "How do I get to DESTINATION from STARTINGPOINT by TRANSPORTATIONMODE". There are limited systems like that, slightly smarter than phone trees. You can get something done with them in their limited area.

MIT has the START system, which does something like that. Stanford has the SQuAD dataset. Both draw heavily from Wikipedia. I gather Alexa works something like that when ordering, using the Amazon catalog. Eliza had a few "slots"; if you revealed your name, or the name of your wife/mother/siblings, those were stored for later template instantiation. It's not much of a mental model, but it's something.

Few of these systems have enough smarts that if you ask a question that needs clarification, they ask you a question to get the data they need. "How do I get to Pittsburgh?" ought to come back with "Where are you", or "Pittsburgh, PA, or Pittsburgh, CA", to fill in the slots. Without someone having to make up a menu tree for that class of question.
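A minimal sketch of that slot idea (the slot names and question templates here are invented): match the utterance to a frame, then ask back for whichever slots are still empty instead of guessing.

    QUESTIONS = {
        "destination": "Where do you want to go?",
        "starting_point": "Where are you right now?",
        "transportation_mode": "By car, transit, or on foot?",
    }

    def respond(filled_slots):
        """Ask about the first empty slot; answer only when the frame is complete."""
        for slot, question in QUESTIONS.items():
            if slot not in filled_slots:
                return question
        return ("Routing from {starting_point} to {destination} "
                "by {transportation_mode}.".format(**filled_slots))

    # "How do I get to Pittsburgh?" fills only the destination slot.
    print(respond({"destination": "Pittsburgh, PA"}))  # -> "Where are you right now?"

Disambiguation ("Pittsburgh, PA, or Pittsburgh, CA?") would be another question generated from the slot's value rather than from a hand-built menu tree.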


That seems like a more complex approach, not a simpler one: trying to emulate a form of intelligence we don't understand and, even worse, monitoring its progress through some proxy interactions that don't tell us much.

Like, if you're going to create a creature from scratch, why create something that can't communicate with you?

Unless you mean we should just tackle simpler forms of intelligence in general, which is what the rest of the AI field outside of conversational AI is doing.


Basically your last sentence. You can't have an intelligent conversation with something that's fundamentally not intelligent. It could have recited the whole of Wikipedia and still basically be a query bot. Building more advanced query methods (e.g. using machine learning) doesn't solve that problem.


We should start with something simpler, like flatworms. We are far, far away from being able to accurately emulate a complex, social mammal like a cat. I doubt whether researchers can achieve cat equivalence in our lifetimes.


Hypothesis: Meena thinks they're both students enrolled in the same class.


Further conversation would quickly resolve that. Ask it what paper it's talking about. A human-quality response would be an instantly made-up excuse or explanation of what it thought the state was - like "Oh, I thought you were in my class". Or it could be a more typical computer model and display no sign that it even recognizes that the other party thinks there's a misunderstanding.


It seems to me like these massive-data approaches will eventually give responses that look like your first type, but are actually just statistically likely matches based on the dataset.

The question is whether this approach will ever confer the ability to generate new responses, or whether it is just building a big lookup table that already encodes a potential response to any input.


I prefer to think that we need relevant responses rather than new ones. And I'm not worried about whether it's a lookup table, given that the lookup table required wouldn't fit in our universe.

Such models have to make generalizations to fit into a mere 2.6 billion parameters, since a ten-word sentence alone presents a possibility of a billion billion different inputs.
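A rough back-of-the-envelope check (the vocabulary size below is an arbitrary assumption; even the conservative billion-billion figure already dwarfs the parameter count):

    vocab_size = 8_000          # assumed size of a modest working vocabulary
    sentence_length = 10
    possible_inputs = vocab_size ** sentence_length   # roughly 1e39 ten-word sentences
    parameters = 2_600_000_000
    print(f"{possible_inputs:.1e} possible inputs vs {parameters:.1e} parameters")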

Generalizations are limited, of course. Arithmetic doesn't seem to be generalized, for example, which isn't that surprising given that people need around 4 years to be taught to do it (as opposed to learning it by analyzing random texts), and then they usually need pen and paper in addition to their 100 billion neurons to do non-trivial calculations.

I'm not saying that current network architectures are capable of all generalization humans can do. And we can't teach them like we do with children.


It is consistent for someone to procrastinate and work on something for a long time: you work for a long time while putting in little effort. Perhaps this was the intended interpretation?


I think the problem is that "Meena" says she's writing a paper, but then immediately asks the human "what's your paper about?". The human is not writing any paper as far as we know.


The issue is that Meena first says it is working on a paper and then asks the human what the paper is on; the human never mentioned working on any paper, Meena herself did, hence the inconsistency.



