Title of the Metamers paper (which sounds really interesting) referenced in the text: "Model metamers illuminate divergences between biological and artificial neural networks".
This is literally the opposite of the main editorial push in this article and headline. Misrepresenting researchers in this way to beef up a silly mainstream narrative about ML/DL is definitely uncritical STEM worship-type territory.
I don't think it's misrepresenting researchers. The article chiefly focuses on Richards' research, which shows a correlation between a self-supervised model and parts of the human brain, hence the title, but it also takes the space to talk about metamers. Pretty sure two things can be similar in some ways and different in others, so the two conclusions are not necessarily mutually exclusive.
It doesn't misrepresent the researchers. It mentions the "metamers" paper specifically to say that "Not everyone is convinced", and it has a subsection titled "Uncured Pathologies" to cover research whose conclusions oppose the main work discussed in the article.
The headline declares that "self-taught" (digital) AI shows similarities to how the brain works, and in doing so misrepresents research in this field, because that is not scientific fact. It's a metaphor, literally. Overstating results misrepresents the balance of opinion in the field, and therefore the value of the metamers research. The metamers work is treated fairly in its own right, but these issues remain because the headline and most of the copy promote one narrative.
Let’s not forget that “self-taught AI” is something of a misnomer, for multiple reasons. First of all, although it belongs to the class of “self-supervised” models, it is still aimed by some person or organization at some purpose. Also, although it is called “AI”, it is really ML, which means it consumes data (often user data). Even this unlabeled data is still data.
This misleading terminology can be used deliberately by corporations (the major funders of ML research) to distract people from what they’re really up to. Not to say it’s not good research… just that it’s not necessarily used for the best of purposes.
Classically, AI as a field was all about how to encode human knowledge in a form accessible to logical inference techniques. This involved a lot of manual entering of rules, logical propositions, fact databases, etc.
The rise of statistical learning, which can benefit from a lot more un-preprocessed data, is comparatively new.
It's not that AI didn't use data before; it's that the classical models relied on a lot of data being meticulously encoded by hand. As you can imagine, this was fine for academic pursuits, but real applications quickly hit the limits of what was feasible to tackle this way, due to the amount of (expert) labour needed. The success of the machine learning approach is largely due to the massive scaling afforded by not requiring human intervention for every individual data point.
To expand on this, it wasn't a consensus until comparatively recently that artificial intelligence required learning techniques.
Now it seems obvious, but for a long time the prevailing idea was that logic programming and knowledge engineering with large ontologies was the route to intelligence.
I doubt anyone in the AI field now makes any distinction between AI and ML. This of course implies the use of data. There's no attempt to hide this by anyone I'm aware of - it's just how it works.
The question was more about the distinction wrt AI I think.
I understood the original comment to argue that it can't be sentient because it's only created by ingesting insane amounts of human interactions.
Though I doubt we'll see artificial sentience during my lifetime, I strongly disagree with that take. From my point of view, humans also need huge amounts of human interaction to become smart, learn language, etc., so that shouldn't be an argument either for or against.
> To some neuroscientists, it seems as if the artificial networks are beginning to reveal some of the actual methods our brains use to learn.
I'm probably pinning a lot of hope on "some" here, because I still tend to think that at BEST we have plausible models of the outcome. But if biology teaches us anything, it's that multiple pathways to an outcome are not just possible, they're normal.
I'd say that when the next-order model built on top of this one arrives at the same place a biological system does, in a way that exposes a theory of mind, I'll be less sceptical. Even then it could be an analogous but distinct path to the same outcome, and so useful and informative, but not necessarily telling us how the brain "does it".
The other part of the "some" is that we're really talking about some portion of the visual system here. A model that can infer the geometry of a bus from a partial image of a bus, having been shown lots of partial images of buses, is a potentially very useful model (potentially one vastly superior to humans at drawing buses), but it's some way from a theory of mind, and frankly doesn't sound that similar to how human brains conceptually model buses, where the basic shape is something operating at a very low level of our subconscious...
1. > Self-supervised learning allows a neural network to figure out for itself what matters.
2. > Such “supervised” training requires data laboriously labeled by humans, and the neural networks often take shortcuts, [...]. For example, a neural network might use the presence of grass to recognize a photo of a cow, because cows are typically photographed in fields.
I don't really agree with that distinction. In that regard, self-supervised and supervised learning are strictly equivalent.
For 1., this is true of classic supervised models too. E.g., with an "old school" Inception model trained on labeled ImageNet data, the bottom layers have learned more fundamental, generally useful concepts such as edge detection, colours, textures and so on, that were never labeled or specifically required by the task. That's why a classic pattern is to freeze those layers when retraining the model for a new task.
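To make that "freeze the bottom layers" pattern concrete, here is a minimal sketch assuming PyTorch/torchvision, with a ResNet backbone standing in for the Inception example and `num_new_classes` as a placeholder for your own dataset:

```python
import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained backbone whose early layers already encode
# generic features (edges, colours, textures).
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)

# Freeze all pretrained parameters so those generic features stay fixed.
for param in backbone.parameters():
    param.requires_grad = False

# Replace the classification head with one sized for the new task.
num_new_classes = 10  # placeholder for the new dataset
backbone.fc = nn.Linear(backbone.fc.in_features, num_new_classes)

# Only the new head's parameters are handed to the optimizer.
optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
```

The frozen layers are exactly the ones that learned those unlabeled, general-purpose concepts as a by-product of the labeled task.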
For 2., the exact same thing happens in self-supervised learning, and in any ML task for that matter: gradient descent will always try to find the shortest path (after all, that's why it works). One common self-supervised learning task is to take a patch from an image and then ask the model to figure out which image that patch came from among a large set of images (contrastive loss). If you are not careful with how the images are transformed, the model can rely on effective but ultimately useless tricks to achieve this, a typical example being detecting artifacts linked to the type of camera and lens used to take the picture, allowing it to massively filter the candidate images to consider in the target batch.
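A rough sketch of that kind of contrastive objective, assuming PyTorch (the function name and temperature are illustrative, not from the article): two augmented views of each image must be matched within a batch, and it is the strength of the augmentations that blocks shortcuts like camera/lens artifacts.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z1, z2, temperature=0.1):
    """z1, z2: [batch, dim] embeddings of two augmented views of the same images."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature   # pairwise similarities across the batch
    targets = torch.arange(z1.size(0))   # the matching view sits on the diagonal
    return F.cross_entropy(logits, targets)

# Toy usage, with random embeddings standing in for an encoder's output:
z_a, z_b = torch.randn(32, 128), torch.randn(32, 128)
loss = info_nce_loss(z_a, z_b)
```

If the augmentations are too weak, the easiest way to place the positives on the diagonal is to match low-level sensor fingerprints rather than image content.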
After all, self-supervised learning is just about procedurally generating the labels for a task instead of depending on humans to do so, but it is still supervised.
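As a toy illustration of "procedurally generating the labels", here is a rotation-prediction pretext task, a common SSL example (not necessarily the one from the article), again assuming PyTorch:

```python
import torch

def make_rotation_batch(images):
    """images: [batch, C, H, W]. Returns rotated images plus machine-generated labels."""
    labels = torch.randint(0, 4, (images.size(0),))   # 0/90/180/270 degrees
    rotated = torch.stack([
        torch.rot90(img, k=int(k), dims=(1, 2))       # rotate the spatial dims
        for img, k in zip(images, labels)
    ])
    return rotated, labels  # a classifier is then trained to predict `labels`
```

No human ever wrote those labels down, yet the resulting training loop is plain supervised classification, which is the point.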
Besides being able to leverage massive amounts of data that would take the entire population of Earth working for Amazon Mechanical Turk to label, the real difference is that with SSL we can easily try many different tasks and find the ones that lend themselves better (e.g. because they have the right level of difficulty) to teaching the network generalizable concepts that yield better performance on downstream tasks. But this is strictly the result of having a much shorter iteration/experiment loop, not of some qualitatively superior learning caused by the labels not being human-made.