I've occasionally seen my Tesla on autopilot slow down for like a shadow or something. You definitely need to pay attention to it.
The article quotes: “With only one sensor type, it’s harder to be sure because you do not have the cross-check from a different type of sensor,” he said.
This is presented as though it's a clear win though. We know for example from the Uber fatality that you can also get false negatives because of failures of sensor fusion, which is a very hard problem. Requiring multiple sensors means that all of your sensor systems need to notice a danger, and you need to be able to integrate that data properly, before you can react.
So I can definitely see that there are major advantages to vision-only. But it's clear that the problem has not yet been solved by Tesla, notwithstanding Elon's inflated claims.
>Requiring multiple sensors means that all of your sensor systems need to notice a danger, and you need to be able to integrate that data properly, before you can react.
This isn't true. The general principle is that you want to detect dangerous situations while minimizing false positives and false negatives, with your payoff function probably treating false negatives as much worse than false positives. With more sensors, the worst case is that the extra data doesn't improve the payoff function and you simply don't use it. But in the best case the extra sensor can massively increase your payoff: your camera gets confused by a shadow, for example, but your radar doesn't detect shadows, so you can eliminate those false positives. Since the general approach to self-driving has been to train AI to process sensor input and make decisions, the worst case is you end up with a neural net that literally just doesn't look at the other sensor data (assuming you're training your AI reasonably; if you aren't, then maybe you shouldn't be working on an incredibly difficult problem with safety-critical applications).
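To make the asymmetric-payoff point concrete, here's a minimal sketch (all costs and probabilities are made up, and this is not how Tesla or anyone else actually implements it): treat braking as a decision that minimizes expected cost, where a missed obstacle costs vastly more than a phantom brake. An extra sensor that pulls down the probability of a phantom obstacle directly improves the decision.

```python
# Toy expected-cost decision: brake if the expected cost of ignoring an
# apparent obstacle exceeds the expected cost of braking.
# All numbers below are invented for illustration.

COST_FALSE_NEGATIVE = 1000.0  # missing a real obstacle is very bad
COST_FALSE_POSITIVE = 1.0     # an unnecessary (phantom) brake is merely annoying

def should_brake(p_obstacle: float) -> bool:
    """Brake when the expected cost of ignoring exceeds the expected cost of braking."""
    cost_if_ignore = p_obstacle * COST_FALSE_NEGATIVE
    cost_if_brake = (1.0 - p_obstacle) * COST_FALSE_POSITIVE
    return cost_if_ignore > cost_if_brake

# Camera alone: a shadow looks like an obstacle with moderate probability.
print(should_brake(0.10))    # True  -- phantom brake

# Adding radar: radar sees nothing solid, so the fused probability drops.
print(should_brake(0.0005))  # False -- the extra sensor removed the false positive
```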
Tesla has essentially bet that they can get their optical systems so good that other sensors like lidar hit that situation where they offer nothing in terms of improving the payoff, but every piece of evidence we've seen so far indicates that they haven't achieved that, and this is many years after Tesla was confident they would achieve it.
I don't know what point you're trying to make. It's logically clear that you don't need unanimity in what the sensors are detecting to make a decision.
> Requiring multiple sensors means that all of your sensor systems need to notice a danger
This is the opposite of true. In well-designed sensor fusion algorithms, every new piece of sensor data, however noisy, helps to overconstrain the estimate. Each sensor reading helps to inform the interpretation of the other readings. If your system is designed such that more information worsens your inference, you have designed a terrible system.
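As a concrete illustration of "more data overconstrains the estimate", here's the textbook inverse-variance fusion of two independent, noisy range measurements: the fused variance is always smaller than either input's, so even a noisy second sensor tightens the estimate rather than loosening it. The numbers are illustrative only.

```python
# Fuse two independent noisy measurements of the same distance.
# The fused variance is smaller than either input variance, so a noisy
# second sensor still tightens (never loosens) the estimate.

def fuse(m1: float, var1: float, m2: float, var2: float) -> tuple[float, float]:
    w1, w2 = 1.0 / var1, 1.0 / var2            # weight each reading by its confidence
    fused_var = 1.0 / (w1 + w2)                # always < min(var1, var2)
    fused_mean = fused_var * (w1 * m1 + w2 * m2)
    return fused_mean, fused_var

camera = (20.0, 4.0)    # camera thinks 20 m, variance 4
lidar = (18.0, 0.25)    # lidar thinks 18 m, variance 0.25

print(fuse(*camera, *lidar))  # ~(18.12, 0.235): pulled toward lidar, tighter than both
```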
I massively disagree with your reasoning as to why vision-only has advantages, or at least I disagree that they're really advantages.
What you are saying (and I understand you're not saying this is the way to go, merely giving a point of view, but hear me out) is that if you have two inputs instead of one, the two might disagree, which makes it harder to figure out which one is right. But if you have only one input, it can't disagree with itself, so it's easier to decide.
But if the one input you kept is the one of the two that would have been wrong, you haven't gained anything, besides being confidently wrong. At least with two inputs you're able to know something is off. And usually you add more inputs so you can decide, and pick a strategy depending on context (absolute majority, 50%+1, ...).
And sorry, but if the system gets confused when it has multiple inputs, or needs time to correlate all the data, or whatever, that is still useful: again, it lets you know something is wrong. Danger, Will Robinson.
I think the field of engineering, with planes and spaceships and whatever, has proven that the "simplicity" advantage of a single sensor is a scam: it avoids fixing the hard integration issues, and it will cost lives.
Maybe systems that use both vision and lidar or radar have this problem, but it seems unlikely that taking away the r/lidar will solve it. I don't think it is so much that multi-mode systems have problems because they have more information, but that they have problems despite having more information.
Are you saying that the lack of redundancy is a "major advantage to vision-only"? Because integrating those redundant systems (sensor fusion) is hard?
> all of your sensor systems need to notice a danger...before you can react
I'd be really surprised if they implemented it this way. I thought the whole pitch behind lidar was that if vision doesn't see the person walking in front of you, lidar can still see the moving person shaped thing and you can still act.
You're still left with the two-sensor problem though: if lidar says "pedestrian" but vision says "clear," you don't know if it's a false positive in lidar or a false negative in vision.
If your safety reaction is consequence-free, no problem. If it isn't (like slamming on brakes in traffic), you really want a 3rd sensor so you can do 2oo3 voting.
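A minimal sketch of the 2oo3 voting mentioned above (the sensor names are placeholders, not any particular vendor's setup): act only when at least two of the three independent detectors agree, so a single faulty sensor can neither trigger the reaction on its own nor veto it.

```python
# 2oo3 (two-out-of-three) voting: a single disagreeing sensor can neither
# force a brake on its own nor suppress one the other two agree on.

def vote_2oo3(camera_sees_pedestrian: bool,
              lidar_sees_pedestrian: bool,
              radar_sees_pedestrian: bool) -> bool:
    votes = sum([camera_sees_pedestrian, lidar_sees_pedestrian, radar_sees_pedestrian])
    return votes >= 2

print(vote_2oo3(False, True, True))   # True: lidar + radar outvote the camera miss
print(vote_2oo3(False, True, False))  # False: a lone lidar false positive is ignored
```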
Idk why people have the preconception of two (or more) sensors each deciding on an outcome independently...
Suppose you have a color sensor and a form sensor and you have to identify an orange (the fruit). Clearly, in conjunction the two sensors will be way better than calling everything orange-colored an orange and everything round an orange.
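A toy version of that orange example (all thresholds invented): neither cue alone is a good classifier, but the conjunction of "orange-colored" and "roughly spherical" does much better than either test by itself.

```python
# Toy fruit classifier: conjunction of a color cue and a shape cue.
# Either cue alone mislabels lots of objects; together they do much better.
# Thresholds are made up for illustration.

def is_orange_fruit(hue_deg: float, roundness: float) -> bool:
    looks_orange = 20.0 <= hue_deg <= 45.0   # orange-ish hue band
    looks_round = roundness > 0.9            # 1.0 = perfect sphere
    return looks_orange and looks_round

print(is_orange_fruit(30.0, 0.95))   # True: orange and round -> probably an orange
print(is_orange_fruit(30.0, 0.30))   # False: orange-colored traffic cone shape
print(is_orange_fruit(120.0, 0.95))  # False: round but green -> maybe an apple
```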
It's for safety reasons so that the system doesn't enter an undefined state if one of those sensors fails, which it eventually will.
If the signal goes from "black nothing" to "orange round" you know you went from no object to an orange, but what if your form sensor breaks and it goes from "black nothing" to "orange nothing?"
You just made a point for redundancy, i.e. more sensors...
Sensor failures are independent events and need to be correctly identified no matter what.
To take Lidar and Vision, both say something about distance and form of objects. Together they can achieve better performance. If one of them fails (and failure is identified) it defaults to the other sensor only and will be somewhat worse, but should at least allow it to pull over and warn you about sensor failure.
Also failures are much easier to identify when you have a baseline reference of one or more other sensors, in isolation much harder.
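One hedged sketch of what "a baseline reference makes failures easier to identify" could look like (a generic cross-check pattern, not any particular vendor's implementation): compare each sensor's range estimate against the consensus of the others and flag it when the residual stays large for several consecutive frames.

```python
# Flag a sensor as suspect when its reading keeps disagreeing with the
# consensus of the other sensors for several consecutive frames.
# Thresholds are illustrative.

from collections import deque

class SensorHealthMonitor:
    def __init__(self, max_residual_m: float = 2.0, bad_frames_to_fail: int = 5):
        self.max_residual_m = max_residual_m
        self.recent_bad = deque(maxlen=bad_frames_to_fail)

    def update(self, sensor_range_m: float, consensus_range_m: float) -> bool:
        """Return True once the sensor should be considered failed."""
        residual = abs(sensor_range_m - consensus_range_m)
        self.recent_bad.append(residual > self.max_residual_m)
        return len(self.recent_bad) == self.recent_bad.maxlen and all(self.recent_bad)

monitor = SensorHealthMonitor()
for _ in range(6):
    failed = monitor.update(sensor_range_m=3.0, consensus_range_m=25.0)
print(failed)  # True: the sensor has disagreed with consensus for 5+ frames
```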
What if you have multiple cameras and multiple lidar sensors? You get into statistical modeling immediately and I imagine that’s how the production systems like these work. If 3 cameras and 2 lidars detect a person, the model says X, which translates to such and such FPR and FNR.
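A sketch of one way such statistical fusion could look (the sensor reliabilities are invented, and production systems are certainly more sophisticated): treat each detector as a noisy, independent vote and combine them in log-odds space, naive-Bayes style, so "3 cameras and 2 lidars said pedestrian" becomes a single probability you can threshold against your target FPR/FNR.

```python
# Naive-Bayes-style fusion: combine independent detections in log-odds space.
# Sensor hit rates / false alarm rates below are invented for illustration.

import math

def fused_probability(readings: list[tuple[bool, float, float]],
                      prior: float = 0.01) -> float:
    """readings: (detected, P(detect | pedestrian), P(detect | nothing))."""
    log_odds = math.log(prior / (1.0 - prior))
    for detected, p_hit, p_false_alarm in readings:
        if detected:
            log_odds += math.log(p_hit / p_false_alarm)
        else:
            log_odds += math.log((1.0 - p_hit) / (1.0 - p_false_alarm))
    return 1.0 / (1.0 + math.exp(-log_odds))

# 3 cameras (hit rate 0.90, false alarm 0.05) and 2 lidars (0.95, 0.02),
# all reporting a detection this frame:
readings = [(True, 0.90, 0.05)] * 3 + [(True, 0.95, 0.02)] * 2
print(fused_probability(readings))  # ~1.0: five independent hits overwhelm the 1% prior
```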
Would be cool to know what approaches they take. Stereo, then perspective scaling (smaller things are farther away), and then probably a sprinkle of ML for recognizing things / knowing expected scale on the fly... idk, I'm arm-chair skeptical of vision-only, but humans do it.
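For what it's worth, here's a sketch of the two geometric cues mentioned above under a pinhole camera model, with placeholder focal length, baseline, and object size; the ML part would be what supplies the "expected scale" (e.g. a typical car width).

```python
# Two classical depth cues under a pinhole camera model.
# Focal length, baseline, and object sizes are placeholders.

def depth_from_stereo(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Stereo: depth = focal_length * baseline / disparity."""
    return focal_px * baseline_m / disparity_px

def depth_from_known_size(focal_px: float, real_width_m: float, pixel_width: float) -> float:
    """Perspective scaling: a known-size object appears smaller the farther away it is."""
    return focal_px * real_width_m / pixel_width

print(depth_from_stereo(focal_px=1000.0, baseline_m=0.3, disparity_px=15.0))       # 20.0 m
print(depth_from_known_size(focal_px=1000.0, real_width_m=1.8, pixel_width=90.0))  # 20.0 m
```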
Actually, on the small downtown streets of my home city they allow parking so close to the intersection that the view from my Civic doesn't let me see particularly far down a relatively busy one-way street. To compensate, I roll down the window to listen for tire noise. Maybe not strictly necessary, but I'm not sure it's quite true to say that people drive by vision 100 percent of the time. Of course they could just add a fisheye mirror on each intersection's street or roll back the parking, but still. Another example that comes to mind is those old one-lane tunnels where you have to stop at the entrance, listen for a honk for a few seconds, honk, and then go.
Parking too close to the intersection is the bane of my existence in my home city. If someone decides to park their lifted 4Runner anywhere on the block, say goodbye to seeing oncoming traffic -- it just becomes a game of Russian roulette trying to cross the street. And I drive a (small) SUV!
An increasingly complex set of if / else statements.
Is it a left turn? Is it a stationary emergency vehicle? Is it a small child? Don't worry, it's all software and we'll keep sending OTA updates until we catch every edge case.
Yeah, it's just amazing to have this real-time world system chewing through data. Particularly if, say, they're taking a picture, pulling points from it, and doing that at some refresh rate... idk, just cool.
And the loops checking things like "I accelerated, it has been this much time, did I actually move 10 meters?" or whatever. Then you check whether something stationary changed in apparent size by the right amount from the camera's perspective...
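A rough sketch of that kind of cross-check (all numbers invented): under a pinhole model, a stationary object at range z should grow in the image by a factor z / (z - Δz) after you've moved Δz toward it, so you can compare the predicted scale change against what the camera actually observed.

```python
# Cross-check ego-motion against the camera: after moving delta_z_m metres toward
# a stationary object at range_m, its apparent size should scale by
# range_m / (range_m - delta_z_m). Tolerance and numbers are illustrative.

def motion_consistent(range_m: float, delta_z_m: float,
                      observed_scale_change: float, tolerance: float = 0.05) -> bool:
    predicted_scale_change = range_m / (range_m - delta_z_m)
    return abs(observed_scale_change - predicted_scale_change) <= tolerance

# We believe we moved 10 m toward an object that was 50 m away:
print(motion_consistent(50.0, 10.0, observed_scale_change=1.25))  # True: 50/40 = 1.25
print(motion_consistent(50.0, 10.0, observed_scale_change=1.00))  # False: image didn't change, something is off
```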
Side tangent, but some drones use this cool technology called "visual inertial odometry".