There was a post on here recently about how you should build your own agent, and I completely agree. I'd say most competent developers should be building even more complex projects than an agent. Once you do you quickly realize how it's a constant uphill battle, and it quickly becomes apparent that the data you're working with is the primary issue.
I don't know if that is what gp and above is talking about. "Agents" are the kind of thing/word that helps to paper over the very fact that these things only work because of huge amount of humans in-the-loop in the outset (that is, you know, labor). Agents help us believe that LLM's can do everything for us, even bootstrap themselves, but, what the above thread is about is that, really, what you get out correlates only to what you put in in the first place.
I've been building tools for stuff I don't want to do. Any task where I need to take some amount of data, structured or unstructured, and need a specific outcome is perfect. That way I can spend more time on the thing I do want to do (including building these little tools).
I appreciate this thinking. This gives me the vibes of "let me draw, paint, sing for fun, while AI takes care of my chores". I agree with that, but I can’t help but wonder if the agent ever considers whether things you enjoy should be left to you, but takes everything it can.
Honestly, I've gotten really far simply by transcribing audio with whisper, having a cheap model clean up the output to make it make sense (especially in a coding context), and copying the result to the clipboard. My goal is less about speed and more about not touching the keyboard, though.
Thanks. Could you share more? I'm about to reinvent this wheel right now. (Add a bunch of manual find-replace strings to my setup...)
Here's my current setup:
vt.py (mine) - voice type - uses pyqt to make a status icon and use global hotkeys for start/stop/cancel recording. Formerly used 3rd party APIs, now uses parakeet_py (patent pending).
parakeet_py (mine): A Python binding for transcribe-rs, which is what Handy (see below) uses internally (just a wrapper for Parakeet V3). Claude Code made this one.
(Previously I was using voxtral-small-latest (Mistral API), which is very good except that sometimes it will output its own answer to my question instead of transcribing it.)
In other words, I'm running Parakeet V3 on my CPU, on a ten year old laptop, and it works great. I just have it set up in a slightly convoluted way...
I didn't expect the "generate me some rust bindings" thing to work, or I would have probably gone with a simpler option! (Unexpected downside of Claude is really smart: you end up with a Rube Goldberg machine to maintain!)
For the record, Handy - https://github.com/cjpais/Handy/issues - does 80% of what I want. Gives a nice UI for Parakeet. But I didn't like the hotkey design, didn't like the lack of flexibility for autocorrect etc... already had the muscle memory from my vt.py ;)
My use case is pretty specific - I have a 6 week old baby. So, I've been walking on my walking pad with her in the carrier. Typing in that situation is really not pleasant for anyone, especially the baby. Speed isn't my concern, I just want to keep my momentum in these moments.
My setup is as follow:
- Simple hotkey to kick off shell script to record
- Simple python script that uses ionotify to watch directory where audio is saved. Uses whisper. This same script runs the transcription through Haiku 4.5 to clean it up. I tell it not to modify the contents, but it's haiku, so sometimes it just does it anyway. The original transcript and the ai cleaner versions are dumped into a directory
- The cleaned up version is run through another script to decide if it's code, a project brief, an email. I usually start the recording "this is code", "this is a project brief" to make it easy. Then, depending on what it is the original, the transcribed, and the context get run through different prompts with different output formats.
It's not fancy, but it works really well. I could probably vibe code this into a more robust workflow system all using ionotify and do some more advanced things. Integrating more sophisticated tool calling could be really neat.
The technical capacity of a CTO matters less then the CTOs ability to stay in their lane (for a lack of a better term).
I once worked for a company with a self taught CTO (and not the good kind). They had a number of star players, and this CTO would frequently lash out at them. All because he was getting in the way of them doing their jobs, doing work he wasn't qualified to do, trying forcing them to clean up after him, and then yelling at them for it. It was insanely toxic. I only lasted a few months. It was so bad I back channelled patches and project briefs to people he liked to get them approved.
Had this CTO remained people, project and product focused everything would have been fine.
> this CTO would frequently lash out at them [...] doing work he wasn't qualified to do, trying forcing them to clean up after him [...] and then yelling at them for it
Was that a Fintech in Germany, by any chance? :)
I once witnessed a meeting between a CTO and a Tech Lead. The CTO was attending from his laptop in an open office, and he was yelling in Russian for one hour straight at another Tech Lead because he wanted the tech lead to finish his work. It was a pathetic display, with the whole company watching and wondering what was going on.
Eventually he was "phased out" by having a few people promoted to VP of engineering who would deal directly with the CEO instead of him.
Last I heard he tried to rewrite the financial core in Golang by himself, but he failed since nobody wanted to work together with him and he doesn't really knew the language.
Self taught in the programming sense, or the people management sense? Because I feel like the letter is much more common than not in software. Just curious in case there's an expected background you're thinking of when you say that. I have no point of reference for CTO backgrounds beyond generic MBAs or senior devs that either gave themselves the titles as founders or failed upwards.
reply