After analyzing thousands of agent trajectories, we’ve seen all sorts of bizarre LLM quirks, this being one of them. Would love to hear other war stories also!
That was an insane time. The pace was unreal. I remember Netscape 2.0 had at least 6-7 beta releases prior to the full release. And each one just dropped something massive and fundamental to the Internet - JavaScript (then called LiveScript IIRC) being one of those things. Just casually dropping what would dominate the entire industry in a browser beta.
The only other period I have experienced that comes close is what is happening now. What an incredible time to build.
Super interesting how this arc has played out for Microsoft. They went from having this massive advantage in being an early OpenAI partner with early access to their models to largely losing the consumer AI space: Copilot is almost never mentioned in the same breath as Claude and ChatGPT. Though I guess their huge stake in OpenAI will still pay out massively from a valuation perspective.
Microsoft seems to be actively discarding the consumer PC market for Windows. It's gamers and enterprise, it seems. Enterprise users don't get a lot of say in what's on their desktop.
I'm not even sure if it's gamers anymore, given how they're throttling local compute with buggy updates. Though maybe that's a strategy to increase demand for cloud gaming... shoot the left foot so that the right foot can hop
The biggest omission that immediately stands out to me is: "provides a clear sense of direction".
I've seen so many examples of teams and organizations that experience a lack of clarity, with all sorts of negative downstream consequences - muddled strategies, moving goalposts, fatigue/low morale. Having a leader that can provide that clarity is so important.
But beyond that, we wanted a playful, accessible brand. We think dev tools (particularly ones like Tonkotsu) are consumer products and we didn't want the staid/corporate branding many tools have.
Forgot to mention that an interesting behavior we see emerging is “human as editor”: let the agents make several commits in a branch, and then the human does a single refinement pass over it before raising a PR. Curious if others use this workflow or something else?
Cool, but at the same time, it feels overwhelming: so many different CLI or IDE tools, so many extension points. It will be fascinating to see how this all shakes out.
It's more the agentic stuff, and things specific to the gemini-cli, which is behind the alternatives in features and capabilities. It was also making an insane number of requests (>1000 in a day, which is about the same as my Copilot usage for a month), but I'm sure they are doing their accounting differently. Google has tried to do AI accounting differently, but has acquiesced to counting tokens instead of chars. Fingers crossed they do the same here too, instead of being a snowflake that takes more effort to align in comparisons.
I can't stand the cutesiness they've embedded into it either. I don't want that in a work tool
My general sense is that AI CLIs will lose out to IDE integrations. I'd prefer a single tool and experience over having to context switch. Putting my AI partner in the same tool and env I use is better than having it separate, with hacks to make it seem like it's in there too, only sorta, not quite.
I'd add that, looking across their portfolio, my general impression is that all the good designers who were at Google are gone.
Take even the Gemini Web App... what set Google apart in the early days? Search was just an input box, no clutter or calls to action. They have recently decided to break from this (they did have it clean beforehand) and try to get me to use image generation and other calls to action. Please get rid of the slop before I can even make my own slop!
And don't get me started on Google Cloud... its speed now feels like Jira's used to, and Jira, to many people's surprise, has changed course and is quite fast now.
It still doesn't have an explicit planning mode, which is IMO table stakes.
I always need to remember to tell it not to touch the code, just read it, or it'll take any question like "could we add library X to this" as "immediately add library X to this"...
Really feels like computer use models may be vertical agent killers once they get good enough. Many knowledge work domains boil down to: use a web app, send an email. (e.g. recruiting, sales outreach)
Why do you need an agent to use a web app through the UI? Can't the agent be integrated into the web app natively? IMO, for the verticals you mentioned, the missing piece is for an agent to be able to make phone calls.
Fascinatingly deep study. It shows the hyperoptimization needed to build these businesses: from all the work needed to calibrate pricing for each country, to technical safeguards like the fingerprinting. A lot of work had to be done here.