Click on start. Yeah that’s in the bottom left. Yup that one. Then look for settings. No the word settings. Has a little arrow next to it. Yeah hit that. It didn’t do anything? Wait, click with the mouse button on the left. Yeah it brought up a little menu to the right? That’s good. Now look for… ok let’s start over and remember not to click anywhere besides where I tell you to. And keep the mouse where it is. Ok find the start button again. No it’s in the bottom left of your screen…
Same here. With call forwarding, our 24/7 support line usually rang at my house. Night was the 'drunk shift', mostly login problems. One user was particularly edgy about his password and would not say it even to me; they were stored as crypts, so I could only reset it, not look it up. He said he had pasted it from another place (which probably means he forgot it and was too arrogant to admit it). Round and round until I checked the logs: he was trying to sign on with a password of '********', which is how it had gone into the clipboard. Instead of engaging with him further, I set his password to that. Problem solved.
My greatest win was adding a few lines to our RADIUS server to retry a failed login once with the case flipped: if 'mYpASSWORD123' failed, it would try 'MyPassword123' and let the user in if that worked. Logs showed thousands of fixed logins per month, and tech support calls dropped to less than a third. We declared victory over CAPS LOCK.
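The case-flip retry described above is simple to sketch. This is a minimal illustration, not the actual RADIUS patch: the user table, hash scheme, and function names are all hypothetical stand-ins (real RADIUS servers compare against whatever credential store they're configured with).

```python
import hashlib

# Hypothetical credential store; a real RADIUS server would use its own
# hash scheme (the original comment mentions crypt-style hashes).
USERS = {"alice": hashlib.sha256(b"MyPassword123").hexdigest()}

def check_password(user: str, password: str) -> bool:
    """Compare the supplied password against the stored hash."""
    stored = USERS.get(user)
    if stored is None:
        return False
    return stored == hashlib.sha256(password.encode()).hexdigest()

def login_with_capslock_fix(user: str, password: str) -> bool:
    """Try the password as typed; on failure, retry once with the case
    of every letter flipped -- the CAPS LOCK scenario."""
    if check_password(user, password):
        return True
    return check_password(user, password.swapcase())
```

With this in place, `login_with_capslock_fix("alice", "mYpASSWORD123")` succeeds even though the literal string fails, since its swapcase is the stored password.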
Being one of those gamers, I do question why a lot of the games targeted at us (looking at you, Path of Exile) have to have so much goddamned pointless busy-work clicking. I play a league, then skip a league, which means I skip buying a supporter pack, just because my arthritis can't put up with their bullshit interface year-round.
I doubt the guts of something like an Echo would be capable of containing any kind of speech-to-text model. It'll probably be a lot more of a concern in a few years, though, when speech models like that are getting baked into cheap chips for consumer devices.
I'm not as doubtful: iPhones have had on-device speech recognition since at least 2019, with iOS 13[1]. Amazon might have legal or policy reasons for not doing offline speech recognition, but the technology has been there for a while.
This is a many-years-old dev board with Wi-Fi, Bluetooth, a multi-mic ring array, RGB LEDs, a speaker amplifier, an SD slot, etc. Even this old version supports wake word with a ring buffer, etc. That said, this is also a (fun!) dev board - not a finished product. Newer versions based on the S3 are much more powerful.
Teardowns of Echo devices show much more capable hardware, but at Amazon’s scale, with their engineering, supply chain, and manufacturing resources, they’re probably not losing all that much on Echo devices. That said, the Echo unit at Amazon has been posting serious losses, though I’ve never seen a breakdown of where those losses come from.
$29 is the sale price, not the BOM price. The entire point of "smart" home devices is to reduce consumer friction around revenue-driving actions; it would be surprising if Amazon were to prioritize making a profit on a $29 device rather than the sales and subscriptions it facilitates.
I don't know if it's every smart home vendor's goal to drive revenue. Apple sells their smart speakers for $100 or $300, which doesn't seem like a "loss leader" price, and it doesn't ask you to buy anything else. Their marketing page mentions what it can integrate with (smart locks, smart lights, etc.) but those things are notably not subscriptions. If you buy a smart light switch and a smart speaker, then you can just say "Hey Siri turn off the lights in the kitchen" and nobody ever gets any more money, and you didn't have to go downstairs and turn off the lights in the kitchen.
This seems like a very justified smart home. I am skeptical of all things proprietary technology, but Apple's stuff bothers me the least.
I’m a big fan of Whisper and whisper.cpp but doing reliable wake word detection with any kind of reasonable latency on a Raspberry Pi is likely to be a poor fit and very bad experience.
The Whisper model operates on 30-second speech chunks; input audio has to be padded to that length. So you’re constantly recording audio, padding it, looking for the wake word, and then activating full recording upon detection - all on padded 30-second chunks, looking back…
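The chunking constraint above can be sketched with a few lines of plain Python. This is an illustrative stand-in, not whisper.cpp's actual code: the constants match Whisper's documented defaults (16 kHz mono input, 30-second windows), but the function here only shows the buffer bookkeeping, with the model call left out entirely.

```python
SAMPLE_RATE = 16_000                # Whisper expects 16 kHz mono audio
WINDOW_SAMPLES = 30 * SAMPLE_RATE   # fixed 30-second input window

def pad_to_window(samples: list) -> list:
    """Return exactly one 30-second window of samples.

    A wake-word listener keeps a rolling buffer of recent audio; before
    every model pass it must either zero-pad a short buffer up to 30 s
    or keep only the most recent 30 s of a long one.
    """
    if len(samples) >= WINDOW_SAMPLES:
        return samples[-WINDOW_SAMPLES:]        # most recent 30 s only
    return samples + [0.0] * (WINDOW_SAMPLES - len(samples))
```

The cost implication is the point of the comment: even if only half a second of new audio arrived, the model still processes a full 480,000-sample window, which is why this is a heavy way to do wake-word detection.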
Then there is model size and availability. Whisper base or maybe even tiny could potentially give decent results for wake word detection but I’m skeptical. Wake words can be surprisingly tricky.
That’s just for the wake word, assuming you’re going to stream the audio elsewhere afterward; reliably doing ASR and NLP to figure out speaker intent is far too challenging and time consuming to be done on Raspberry Pi-class hardware in anything approaching acceptable response times. Whisper does pretty well with relatively high-noise/low-quality speech, and far-field microphones are amazing, but I doubt that's enough to provide anything approaching Echo/Alexa quality in the real world.
This is a carefully scripted demo[0] showing it takes a whopping 15 seconds to wake, ASR the speech, and return the result. The average person could easily take their phone out of their pocket, unlock it, look for the weather app, and read the weather in less time.
This demo[1] claims "real time" but looking at the example videos it clearly isn't and the accuracy leaves a lot to be desired. This is with three threads on a Raspberry Pi 4.
I just tried asking Echo what the weather is like and it was so fast I had trouble timing it - somewhere around one second.
Recently I made some progress on efficiently detecting short voice commands (wake words) on RPi4 [0]. Check out the "command" example in whisper.cpp and its "Guided mode" operation. There are additional improvements on the way too.
As I said, I really appreciate and respect all of the work you’re doing on whisper.cpp, but when it comes to things like wake words and commands I have to think Whisper is just fundamentally the wrong tool for the job. Tiny is a 39M-parameter model with fairly poor accuracy and high latency (without a GPU) that just about maxes out a Raspberry Pi - all for a few very carefully pronounced words under ideal conditions (in this case).
That said, there isn’t much in the open source space (that I’ve found) that’s even remotely competitive with Alexa/Echo so I’m all for any efforts and attention in this area. Perfect is the enemy of good but this thread started off with people wondering if Whisper or anything based on it was close to Alexa/Echo for wake word activated assistant tasks. I think it’s very safe to say it isn’t.
Again, I really appreciate your work on making Whisper more accessible to the masses for local ASR - please don’t take this as criticism of your efforts. If anything, I’ve been involved in open source projects myself, and I know how frustrating it is when people try to jam a square peg into a round hole, only to come back and complain that your labor of love didn’t work for them.
True, but note that they're using the tiny model. In my experimentation, you need at least the small model to get transcription I'd call "good", which is still a bit slower than you'd like on a moderately fast laptop from 2019.
That said, Whisper is incredible, and the era of very good local speech-to-text on moderate hardware is basically here, or will be in the next year.
Dragon NaturallySpeaking required extensive training on the user's voice, and the result wasn't as accurate as modern machine-learning speech-to-text models. You had to watch it and actively correct its results.
IMO, the accuracy of modern speech-to-text models is still nowhere near good enough. Maybe they should bring back the per-user training.
The issue is settled now. Germany paid huge war reparations after WWII until the 1990s.
Apart from that, many of these explosives are malfunctioning bombs that were supposed to go off when firefighters and rescue personnel were at the site. Such devices are pure malice; the defense argument can't justify them.
You can't randomly levy all sorts of minor nuisance onto somebody after they've conceded and signed a bunch of treaties largely drafted by the winning side meant to settle this. That's just a great way to self-sabotage your diplomatic reputation.
But there is (or rather was) a greater argument: not wasting any more time refighting historic battles over and over again, and instead moving on and looking forward.
All the great conflicts get settled that way, not with bean counting.
i really do not like being the guy that goes "but chatgpt" in threads about other topics, but i have found it to be pretty good at generating simplified recipes.