Heh, I felt the same. I'm a web dev, but I don't want an Electron app. We can do better. I used to write Electron apps because I couldn't build a proper native app. Now I can!
I've been building a native macOS/iOS app that lets me manage my agents: both fully controlling/chatting with them from the app and just monitoring your existing CLI sessions (and/or taking 'em over in the app).
It also has a Rust server backing it, so I can throw it anywhere (container, Pi, etc.) and then connect to it. Here it is, if anyone wants to see it (though I've seen at least 4 other people doing something similar): https://github.com/Robdel12/OrbitDock
I really want to use Google’s models, but they have the classic Google product problem that we all like to complain about.
I am legit scared to log in and use Gemini CLI, because last time I thought I was using my “free” account allowance via Google Workspace. I ended up spending $10 before realizing it was API billing, and the UI was so hard to figure out that I gave up. I’m sure I could spend 20-40 more minutes to sort this out, but ugh, I don’t want to.
With alllll that said... is Gemini 3.1 more agentic now? That’s usually where it failed: very smart and capable models, but hard to apply. Just me?
May be very silly of me, but I avoid using Gemini on my personal Google account. I use it at work, because my employer provides it.
I am scared some automated system may just decide I am doing something bad and terminate my account. I have been moving important things to Proton, but there's some stuff I couldn't change that would cause me a lot of annoyance. It's not trivial to set up an alternative account just for Gemini, because my Google account is basically on every device I use.
I mostly use LLMs as coding assistant, learning assistant, and general queries (e.g.: It helped me set up a server for self hosting), so nothing weird.
For what it's worth, there was an (unfortunately unsuccessful) HN submission from a guy who got his Gemini account banned, apparently without losing his whole Google account: https://news.ycombinator.com/item?id=47007906
Comforting to know that they may ban you from only some of their services, I guess?
I really regret relying so much on my Google account for so long. Untangling myself from it is really hard. Some places treat your email as a login, not simply as a way to contact you. This is doubly concerning for government websites, where setting up a new account may just not be a possibility.
At some point I suppose Gemini will be the only viable option for LLMs, so oh well.
100% agreed. I wish someone would make a test for how reliably the LLMs follow tool-use instructions, etc. The pelicans are nice but not useful for me to judge how well a model will slot into a production stack.
When I first got started with LLMs I read and analyzed benchmarks, looked at what example prompts people used, and so on. But many times a new model does best on the benchmarks, you think it'll be better, and then in real work it completely drops the ball. Since then I've stopped even reading benchmarks; I don't care an iota about them. They always seem more misleading than helpful.
Today I have my own private benchmarks, with tests I run myself and private test cases I refuse to share publicly. I've built these up over the last 12-18 months: whenever I find something my current model struggles with, it becomes a new test case in the benchmark.
Nowadays it's as easy as `just bench $provider $model`: it runs my benchmarks against the model and gives me a score that actually reflects what I use these models for, and it more or less matches my experience of actually using them. I recommend that people who use LLMs for serious work try the same approach and stop relying on public benchmarks, which (seemingly) are all gamed by now.
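For anyone wondering what such a harness could look like: here's a minimal sketch of a private benchmark runner, the kind of thing a `just bench` recipe might wrap. The names (`TestCase`, `run_bench`), the pass/fail scoring, and the stand-in model are my own assumptions, not the commenter's actual setup.

```python
# Hypothetical private-benchmark harness: run each test case against a
# model callable and return the fraction of cases that pass.
from dataclasses import dataclass
from typing import Callable

@dataclass
class TestCase:
    name: str
    prompt: str
    # Checker returns True if the answer is acceptable; keeping checks as
    # code rather than string equality tolerates phrasing drift.
    check: Callable[[str], bool]

def run_bench(model: Callable[[str], str], cases: list[TestCase]) -> float:
    """Score a model on private test cases; returns a value in [0, 1]."""
    passed = sum(1 for case in cases if case.check(model(case.prompt)))
    return passed / len(cases)

# Example: two toy cases and a trivial stand-in "model".
cases = [
    TestCase("arith", "What is 2+2? Answer with just the number.",
             lambda a: a.strip() == "4"),
    TestCase("json", "Reply with a JSON object with key 'ok' set to true.",
             lambda a: '"ok"' in a and "true" in a),
]
fake_model = lambda p: "4" if "2+2" in p else '{"ok": true}'
print(run_bench(fake_model, cases))  # 1.0 for this stand-in model
```

In a real setup `model` would be a thin wrapper around each provider's API, which is what makes the score comparable across `$provider $model` pairs.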
Would you be willing to give a rough outline of one or a few test cases? I am having a bit of a hard time imagining what and how you are testing. Is it like "change the signature of function X in file @Y to take parameter Z" and then comparing the result with what you expect?
> For those building with a mix of bash and custom tools, Gemini 3.1 Pro Preview comes with a separate endpoint available via the API called gemini-3.1-pro-preview-customtools. This endpoint is better at prioritizing your custom tools (for example view_file or search_code).
It sounds like there was at least a deliberate attempt to improve it.
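Out of curiosity about what "prioritizing your custom tools" means in practice, here's a sketch of a `generateContent` request body declaring a custom tool like `view_file` against the endpoint name from the quote. The payload shape follows my understanding of the public Gemini REST API; treat the field names as assumptions and check the official docs before relying on them.

```python
# Sketch (not verified against docs) of a request body for the
# custom-tools model variant quoted above.
import json

MODEL = "gemini-3.1-pro-preview-customtools"  # endpoint name from the quote

payload = {
    "contents": [
        {"role": "user", "parts": [{"text": "Show me src/main.rs"}]}
    ],
    "tools": [
        {
            "functionDeclarations": [
                {
                    "name": "view_file",
                    "description": "Read a file from the workspace.",
                    "parameters": {
                        "type": "object",
                        "properties": {"path": {"type": "string"}},
                        "required": ["path"],
                    },
                }
            ]
        }
    ],
}

# You'd POST this (with an API key) to something like:
# https://generativelanguage.googleapis.com/v1beta/models/{MODEL}:generateContent
print(json.dumps(payload)[:40])
```

The interesting question is whether this variant actually calls `view_file` instead of falling back to bash when both could do the job.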
It's absolutely amazing how hostile Google is to releasing billing options that are reasonable, controllable, or even fucking understandable.
I want to do relatively simple things like:
1. Buy shit from you
2. For a controllable amount (e.g., let me pick a limit on costs)
3. Without spending literally HOURS trying to understand 17 different fucking products, all overlapping, with myriad project configs, api keys that should work, then don't actually work, even though the billing links to the same damn api key page, and says it should work.
And frankly - you can't do any of it. No controls (at best delayed alerts). No clear access. No real product differentiation pages. No guides or onboarding pages to simplify the matter. No support. SHIT LOADS of completely incorrect and outdated docs, that link to dead pages, or say incorrect things.
I've used all 3 major providers - AWS, GCP, Azure.
AWS is no gem... it has its own byzantine processes for signing up and paying for things, and it also doesn't support any real, reasonable way to stop spend when you hit limits (an abusive practice).
But at least I can generally sign up for and consume a new service without hours and hours of debugging.
For context: Google's own Gemini 3 utterly fails to figure out how to do something as simple as "access the image doodle feature" proudly marketed here: https://gemini.google/overview/image-generation/
It can't figure out how to do it. Honestly, I still can't figure out how to do it, despite signing up for about 5 different products and trying 4 different UIs. The closest I got was their inpainting/outpainting UI on the legacy models in their image creation studio.
And none of that involved creating a billing account, since I already had one; it was required for 3 of the signups.
As far as I'm concerned, this feature is fake marketing. It doesn't exist. That's the "quality" level of GCP.
Yep, MacRumors is one of many bottom-feeders regurgitating rumors and half-truths from a fatter bottom-feeder, trading Apple hysteria for ad impressions. It's another small signal that the tech industry is now mostly eating itself.
I’m a heavy Claude Code user, and it’s pretty clear they’re starting to bend under their own vibe coding. Each Claude Code update breaks a ton of stuff, has perf issues, etc.
And then this. They want to own your dev workflow, and for some reason believe Claude Code is special enough to be closed source. The React TUI is kind of a nightmare to deal with, I bet.
I will say, very happy with the improvements made to Codex 5.3. I’ve been spending A LOT more time with codex and the entire agent toolchain is OSS.
Not sure what anthropic’s plan is, but I haven’t been a fan of their moves in the past month and a half.
I switched to Codex 5.3 too; it's cheaper anyway, and as dumb as it sounds, Scam Altman is actually the less annoying CEO compared to Amodei, which is kind of an achievement. Amodei is really looking more and more like some huckster, giving these idiotic predictions to the press.
>Of all tyrannies, a tyranny sincerely exercised for the good of its victims may be the most oppressive. It would be better to live under robber barons than under omnipotent moral busybodies. The robber baron's cruelty may sometimes sleep, his cupidity may at some point be satiated; but those who torment us for our own good will torment us without end for they do so with the approval of their own conscience. They may be more likely to go to Heaven yet at the same time likelier to make a Hell of earth. This very kindness stings with intolerable insult. To be "cured" against one's will and cured of states which we may not regard as disease is to be put on a level of those who have not yet reached the age of reason or those who never will; to be classed with infants, imbeciles, and domestic animals.
I'm not sure why I'm getting downvoted, but the VS Code integration really does stink. Often it will simply not send the API request and just say "reconnecting", and I've had the VS Code OpenAI Codex plugin freeze outright while all the other plugins, like Cline or Roo, were working perfectly fine. So the VS Code integration is almost unusable in my experience.
> But my experience is they’re not really even close to the closed paid models.
They are usually as good as a flagship model from 12-18 months ago. That may sound like a massive difference, and in some ways it is, but it's also fairly reasonable: you don't need to live on the bleeding edge.
And it's worth pointing out that Claude Code now dispatches "subagents" from Opus->Sonnet and Opus->Haiku ... all the time, depending on the problem.
Running this thing locally on my Spark with a 4-bit quant, I'm getting 30-35 tokens/sec in opencode, and it doesn't feel any "stupider" than Haiku, that's for sure. Haiku can be dumb as a post. This thing is smarter than that.
It feels somewhere around Sonnet 4 level, and I am finding it genuinely useful even at 4-bit. Though I have paid subscriptions elsewhere, so I doubt I'll actually use it much.
I could see configuring OpenCode somehow to use paid Kimi 2.5 or Gemini for the planning/analysis and compaction, and this for task execution. It seems entirely competent.
I worked for Percy for 4 years. We were “stuck” with ImageMagick for diffing (I suspect they still are). I was able to build my own differ with Claude/LLM help.
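To give a flavor of what a visual differ does at its core, here's a toy pixel-diff sketch: compare two same-sized RGB buffers and report the fraction of pixels whose channels differ beyond a tolerance. This is my own illustrative sketch, not Percy's pipeline or the commenter's actual differ (real ones add anti-aliasing detection, perceptual color distance, region clustering, etc.).

```python
# Toy visual differ: fraction of pixels that changed beyond a per-channel
# tolerance, given two equal-sized lists of (r, g, b) tuples.
def diff_ratio(a, b, tolerance=0):
    """Return the changed-pixel ratio in [0.0, 1.0]."""
    if len(a) != len(b):
        raise ValueError("images must have the same dimensions")
    changed = sum(
        1 for pa, pb in zip(a, b)
        if any(abs(ca - cb) > tolerance for ca, cb in zip(pa, pb))
    )
    return changed / len(a)

# 2x2 "images": one pixel changed out of four.
base = [(255, 255, 255)] * 4
new = [(255, 255, 255)] * 3 + [(250, 10, 10)]
print(diff_ratio(base, new))       # 0.25
print(diff_ratio(base, new, 250))  # 0.0 with a very loose tolerance
```

The tolerance knob is the interesting part in practice: too tight and you flag font anti-aliasing noise, too loose and you miss real regressions.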
I looked at the source. It seems most of the code is not included in the GitHub repo, which itself contains a bit of JS glue. The .tgz uploaded to npm has various prebuilt binaries. Can I take a look at the Rust code?
I'm not trying to imply LLMs aren't useful. I just want more info from GP so that I can evaluate their claims.
Terrible little demo as I work on it right now w/claude: https://i.imgur.com/ght1g3t.mp4
iOS app w/codex: https://i.imgur.com/YNhlu4q.mp4