derekcheng08 · 2025-12-09T16:31:55 1765297915

After analyzing thousands of agent trajectories, we’ve seen all sorts of bizarre LLM quirks, and this is one of them. Would love to hear other war stories too!

derekcheng08 · 2025-12-04T20:47:42 1764881262

Pretty cool! Always struck me as odd that the IDE has largely remained single-player for 40 years.

derekcheng08 · 2025-12-04T17:21:04 1764868864

That was an insane time. The pace was unreal. I remember Netscape 2.0 had at least 6-7 beta releases prior to the full release. And each one just dropped something massive and fundamental to the Internet - JavaScript (then called LiveScript IIRC) being one of those things. Just casually dropping what would dominate the entire industry in a browser beta.

The only other period I have experienced that comes close is what is happening now. What an incredible time to build.

pookha · 2025-12-04T18:24:45 1764872685

LOL. And Netscape 1.1 just casually birthed SSL (2.0) for the world and future generations.

derekcheng08 · 2025-12-04T17:18:32 1764868712

Super interesting how this arc has played out for Microsoft. They went from having this massive advantage in being an early OpenAI partner with early access to their models to largely losing the consumer AI space: Copilot is almost never mentioned in the same breath as Claude and ChatGPT. Though I guess their huge stake in OpenAI will still pay out massively from a valuation perspective.

xnorswap · 2025-12-04T17:49:26 1764870566

It's because Copilot isn't (just) a model, it's a brand that's been slapped on any old rubbish.

If Clippy were still around, that'd have been rebranded as Copilot by now.

blitzar · 2025-12-04T19:06:57 1764875217

If they resurrected Clippy and made it the face of their AI I would switch in a heartbeat.

int_19h · 2025-12-04T22:34:35 1764887675

https://felixrieseberg.github.io/clippy/

blitzar · 2025-12-04T22:46:38 1764888398

That is impressive! I really want Clippy to chime in and tell me it looks like I am writing a letter and offer to help.

downrightmike · 2025-12-04T20:09:18 1764878958

They made Copilot the term for AI and smeared it everywhere to the point that it has no meaning and therefore no usage when talking about AI.

Zigurd · 2025-12-04T17:31:32 1764869492

Microsoft seems to be actively discarding the consumer PC market for Windows. It's gamers and enterprise, it seems. Enterprise users don't get a lot of say in what's on their desktop.

tetris11 · 2025-12-05T18:52:14 1764960734

I'm not even sure if it's gamers anymore, given how they're throttling local compute with buggy updates. Though maybe that's a strategy to increase demand for cloud gaming... shoot the left foot so that the right foot can hop

derekcheng08 · 2025-12-04T17:08:24 1764868104

The biggest omission that immediately stands out to me is: "provides a clear sense of direction".

I've seen so many examples of teams and organizations that experience a lack of clarity, with all sorts of negative downstream consequences - muddled strategies, moving goalposts, fatigue/low morale. Having a leader that can provide that clarity is so important.

derekcheng08 · 2025-10-09T15:42:23 1760024543

I do really like Tonkotsu :)

But beyond that, we wanted a playful, accessible brand. We think dev tools (particularly ones like Tonkotsu) are consumer products and we didn't want the staid/corporate branding many tools have.

rolph · 2025-10-09T16:02:44 1760025764

I'm thinking about the properties of Tonkotsu [broth], in comparison/contrast to Tonkotsu [app], such as traditional, frugal, ubiquitous, vital, etc.

It's one of the first things I consider when I see a duet like this.

I get a visual of a barren dev cubicle, with a young dev sitting at the terminal, slouched over a bowl of ramen, while coding.

-----------------------------------------------------

On the technical side, you may find this interesting: you are creating a CAS.

https://en.wikipedia.org/wiki/Complex_adaptive_system

derekcheng08 · 2025-10-09T15:24:19 1760023459

Forgot to mention that an interesting behavior we see emerging is “human as editor”: let the agents make several commits in a branch, and then the human does a single refinement pass over it before raising a PR. Curious if others use this workflow or something else?

derekcheng08 · 2025-10-08T16:51:07 1759942267

Cool, but at the same time, it feels overwhelming: so many different CLI or IDE tools, so many extension points. It will be fascinating to see how this all shakes out.

verdverm · 2025-10-08T17:30:21 1759944621

not in gemini-cli's flavor, it's one of the worst tools I've used

derekcheng08 · 2025-10-08T18:15:32 1759947332

Is it model quality or the CLI itself?

verdverm · 2025-10-08T22:02:45 1759960965

I like the Gemini / Gemma family of models

It's more the agentic stuff, and things specific to the gemini-cli, which is behind the alternatives in features and capabilities. It was also making an insane number of requests (>1000 in a day, which is about the same as my Copilot usage for a month), but I'm sure they are doing their accounting differently. Google has tried to do AI accounting differently, but has acquiesced to counting tokens instead of chars. Fingers crossed they do the same here too, instead of being a snowflake that takes more effort to align in comparisons.

I can't stand the cutesiness they've embedded into it either. I don't want that in a work tool

My general sense is that ai-clis will lose out to IDE integrations. I'd prefer a single tool and experience over having to context switch. Putting my AI partner in the same tool and env I use is better than having it separate with hacks to make it seem like it can be in there too, only sorta not quite

verdverm · 2025-10-08T22:08:56 1759961336

I'd add my general impression is that all the good designers that were at Google are gone, looking across their portfolio

Take even the Gemini Web App... what set Google apart in the early days? Search was just an input box, no clutter or calls to action. They have recently decided to break from this (they did have it clean beforehand) and try to get me to use image generation and other calls to action. Please get rid of the slop before I can even make my own slop!

and don't egg me on about Google Cloud... its speed now feels like Jira, which to many people's surprise has changed course and is quite fast now

theshrike79 · 2025-10-13T14:25:39 1760365539

It still doesn't have an explicit planning mode, which is IMO table stakes.

I always need to remember to tell it not to touch the code, just read it, or it'll take any question like "could we add library X to this" as "immediately add library X to this"...

derekcheng08 · 2025-10-08T02:40:09 1759891209

Really feels like computer use models may be vertical agent killers once they get good enough. Many knowledge work domains boil down to: use a web app, send an email. (e.g. recruiting, sales outreach)

loandbehold · 2025-10-08T05:30:01 1759901401

Why do you need an agent to use a web app through the UI? Can't the agent be integrated into the web app natively? IMO, for the verticals you mentioned, the missing piece is for an agent to be able to make phone calls.

tgsovlerkhgsel · 2025-10-08T12:40:26 1759927226

Native integration, APIs etc. require the web app author to do something. A computer use agent using the UI doesn't.

derekcheng08 · 2025-10-07T14:45:02 1759848302

Fascinatingly deep study. It shows the hyperoptimization needed to build these businesses: from all the work needed to calibrate pricing for each country, to technical safeguards like the fingerprinting. A lot of work had to be done here.