Hacker News | Robdel12's comments

Heh, I felt the same. I'm a web dev, but I do not want an Electron app. We can do better. I used to write Electron apps because I wasn't able to build a proper native app. Now I can!

I've been building a native macOS/iOS app that lets me manage my agents. You can control and chat fully from the app, or just monitor your existing CLI sessions (and/or take 'em over in the app).

A rough little demo, recorded as I work on it right now w/ Claude: https://i.imgur.com/ght1g3t.mp4

iOS app w/codex: https://i.imgur.com/YNhlu4q.mp4

It's also backed by a Rust server, so I can throw it anywhere (container, Pi, etc.) and then connect to it. I've seen at least four other people doing something similar, but if anyone wants to see it: https://github.com/Robdel12/OrbitDock


I really want to use google’s models but they have the classic Google product problem that we all like to complain about.

I am legit scared to login and use Gemini CLI because the last time I thought I was using my “free” account allowance via Google workspace. Ended up spending $10 before realizing it was API billing and the UI was so hard to figure out I gave up. I’m sure I can spend 20-40 more mins to sort this out, but ugh, I don’t want to.

With alllll that said... is Gemini 3.1 more agentic now? That's usually where it failed for me. Very smart and capable models, but hard to apply. Just me?


May be very silly of me, but I avoid using Gemini on my personal Google account. I use it at work, because my employer provides it.

I am scared some automated system may just decide I'm doing something bad and terminate my account. I have been moving important things to Proton, but there is some stuff I couldn't move that would cause me a lot of annoyance. It's not trivial to set up an alternative account just for Gemini, because my Google account is on basically every device I use.

I mostly use LLMs as a coding assistant, a learning assistant, and for general queries (e.g., it helped me set up a server for self-hosting), so nothing weird.


For what it's worth, there was an (unfortunately unsuccessful) HN submission from a guy who got his Gemini account banned, apparently without losing his whole Google account: https://news.ycombinator.com/item?id=47007906

Comforting to know that they may ban you from only some of their services, I guess?

I really regret relying so much on my Google account for so long. Untangling myself from it is really hard. Some places treat your email as a login, not simply as a way to contact you. This is doubly concerning for government websites, where setting up a new account may just not be possible.

At some point I suppose Gemini will be the only viable option for LLMs, so oh well.


Same feeling here, if it makes you feel any better (it certainly made me feel better to see I'm not alone in this).

100% agreed. I wish someone would make a test for how reliably LLMs follow tool-use instructions and the like. The pelicans are nice, but not useful for me to judge how well a model will slot into a production stack.

When I first started using LLMs, I read and analyzed benchmarks, looked at what example prompts people used, and so on. But time and again a new model does best on the benchmark, you think it'll be better, and then in real work it completely drops the ball. Since then I've stopped even reading benchmarks; I don't care an iota about them. They always seem more misleading than helpful.

Today I have my own private benchmarks, with tests I run myself and test cases I refuse to share publicly. I've built these up over the last year or year and a half: whenever I find something my current model struggles with, it becomes a new test case in the benchmark.

Nowadays it's as easy as `just bench $provider $model`: it runs my benchmarks against the model, and I get a score that actually reflects what I use the models for. It more or less matches my experience of actually using the models, too. I recommend that people who use LLMs for serious work try the same approach and stop relying on public benchmarks, which (seemingly) are all gamed by now.
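To make the approach concrete, here is a minimal sketch of what such a private harness could look like. Everything in it is hypothetical: `call_model` is a deterministic stub standing in for a real provider API call, and the two cases stand in for the private test cases the commenter keeps unpublished.

```python
# Hypothetical sketch of a tiny private LLM benchmark harness.
# `call_model` is a stub; in a real harness it would call a provider's API.

from dataclasses import dataclass
from typing import Callable


@dataclass
class Case:
    name: str
    prompt: str
    check: Callable[[str], bool]  # pass/fail judgment on the raw completion


def call_model(provider: str, model: str, prompt: str) -> str:
    """Stub: replace with a real API call for the given provider/model."""
    # Deterministic stand-in so the sketch runs offline.
    return "4" if "2 + 2" in prompt else "unknown"


def bench(provider: str, model: str, cases: list[Case]) -> float:
    """Run every case against the model and return the pass rate."""
    passed = sum(1 for c in cases if c.check(call_model(provider, model, c.prompt)))
    return passed / len(cases)


CASES = [
    Case("arith", "What is 2 + 2? Answer with just the number.",
         lambda r: r.strip() == "4"),
    Case("follow", "Reply with the single word 'unknown'.",
         lambda r: r.strip() == "unknown"),
]

if __name__ == "__main__":
    print(f"score: {bench('stub', 'stub-model', CASES):.0%}")
```

The `just bench` recipe would just wrap an invocation like this, passing the provider and model through to `call_model`; the value is entirely in the case list, which is why keeping it private matters.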



The harness? Trivial to build yourself; ask your LLM for help. It's ~1000 LOC you could hack together in 10-15 minutes.

As for the test cases themselves, that would obviously defeat the purpose, so no :)


Would you be willing to give a rough outline of one or a few test cases? I am having a bit of a hard time imagining what and how you are testing. Is it like "change the signature of function X in file @Y to take parameter Z" and then comparing the result with what you expect?

> For those building with a mix of bash and custom tools, Gemini 3.1 Pro Preview comes with a separate endpoint available via the API called gemini-3.1-pro-preview-customtools. This endpoint is better at prioritizing your custom tools (for example view_file or search_code).

It sounds like there was at least a deliberate attempt to improve it.


You can delete the billing from a given API key

You could always use it through Copilot. The credits-based billing is pretty simple, with no surprise charges.

So much this.

It's absolutely amazing how hostile Google is to releasing billing options that are reasonable, controllable, or even fucking understandable.

I want to do relatively simple things like:

1. Buy shit from you

2. For a controllable amount (ex - let me pick a limit on costs)

3. Without spending literally HOURS trying to understand 17 different fucking products, all overlapping, with myriad project configs, api keys that should work, then don't actually work, even though the billing links to the same damn api key page, and says it should work.

And frankly - you can't do any of it. No controls (at best delayed alerts). No clear access. No real product differentiation pages. No guides or onboarding pages to simplify the matter. No support. SHIT LOADS of completely incorrect and outdated docs, that link to dead pages, or say incorrect things.

So I won't buy shit from them. Period.


You think AWS is better?

Scarily - yes, although not by much.

I've used all 3 major providers - AWS, GCP, Azure.

AWS is no gem... it also has its own byzantine processes to sign up and pay for things. And it also doesn't support any real and reasonable way to stop spending when you hit limits (abusive practices).

But at least I can generally sign up for and consume a new service without hours and hours of debugging.

For context: Google's own Gemini 3 utterly fails to figure out how to do something as simple as "access the image doodle feature" proudly marketed here: https://gemini.google/overview/image-generation/

It can't figure out how to do it. Honestly, I still can't figure it out either, despite signing up for about 5 different products and trying 4 different UIs. The closest I got was their inpainting/outpainting UI on the legacy models in their image creation studio.

And none of that involved creating a billing account, which I already had, and was required for 3 of the signups.

As far as I'm concerned, this feature is fake marketing. It doesn't exist. That's the "quality" level of GCP.


This is the exact reason I've never used any of these platforms for my personal projects.

Who is comparing to AWS and why? They can both be terrible at the same time, you know.

I've been using it lately with OpenCode and it's working pretty well (except for API reliability issues).

use openrouter instead

This is actually an excellent idea, I’ll give this a shot tonight!

> according to Bloomberg's Mark Gurman.

The entire Apple rumor world is based off this guy's Sunday email.


Yep, MacRumors is one of many bottom-feeders regurgitating rumors and half-truths from a fatter bottom-feeder, trading Apple hysteria for ad impressions. It's another small signal that the tech industry is now mostly eating itself.

I’m a heavy Claude Code user, and it’s pretty clear they’re starting to bend under their own vibe coding. Each Claude Code update breaks a ton of stuff, has perf issues, etc.

And then this. They want to own your dev workflow, and for some reason believe Claude Code is special enough to be closed source. The React TUI is kind of a nightmare to deal with, I bet.

I will say, very happy with the improvements made to Codex 5.3. I’ve been spending A LOT more time with codex and the entire agent toolchain is OSS.

Not sure what anthropic’s plan is, but I haven’t been a fan of their moves in the past month and a half.


Same. Codex 5.3 was able to solve a problem that I personally had been stuck on for the last 2 weeks, even with help from Claude.

Yeah, I can feel it too. It _mostly_ works, but... it feels like it needs a rewrite.

For example, Amp "feels" much better. I also like how in Amp I can just send a message whenever and it doesn't get queued.

* I know, lots of "feels" in there..


I switched to Codex 5.3 too; it's also cheaper anyway. And as dumb as it sounds, Scam Altman is actually the less annoying CEO compared to Amodei, which is kind of an achievement. Amodei really looks more and more like some huckster, giving these idiotic predictions to the press.


So he's been accused of various crimes and has not been found guilty?

Not like Epstein at all then.


OpenAI’s president is a Trump mega-donor

https://news.ycombinator.com/item?id=46771231


>Of all tyrannies, a tyranny sincerely exercised for the good of its victims may be the most oppressive. It would be better to live under robber barons than under omnipotent moral busybodies. The robber baron's cruelty may sometimes sleep, his cupidity may at some point be satiated; but those who torment us for our own good will torment us without end for they do so with the approval of their own conscience. They may be more likely to go to Heaven yet at the same time likelier to make a Hell of earth. This very kindness stings with intolerable insult. To be "cured" against one's will and cured of states which we may not regard as disease is to be put on a level of those who have not yet reached the age of reason or those who never will; to be classed with infants, imbeciles, and domestic animals.

Sam wants money. Dario wants to be your dad.

I'm going with Sam.


The article is about Greg Brockman, president of OpenAI.

Codex has been useless for me on the standard Plus plan, unfortunately. I'm actually thoroughly disappointed. And the VS Code integration is totally broken.

I'm not sure why I'm getting downvoted, but the VS Code integration really does stink. Often it will simply not send the API request and just say "reconnecting," and I've had the OpenAI Codex plugin freeze entirely while all my other plugins, like Cline or Roo, kept working perfectly fine. So the VS Code integration is almost unusable in my experience.

Salesforce is the worst, lol


I really, really want local or self-hosted models to work. But my experience is they're not really even close to the closed paid models.

Does anyone have any experience with these? Is this release actually workable in practice?


> But my experience is they’re not really even close to the closed paid models.

They are usually about as good as the flagship models from 12-18 months ago. That may sound like a massive difference, and in some ways it is, but it's also fairly reasonable; you don't need to live on the bleeding edge.


And it's worth pointing out that Claude Code now dispatches "subagents" from Opus->Sonnet and Opus->Haiku ... all the time, depending on the problem.

Running this thing locally on my Spark with a 4-bit quant, I'm getting 30-35 tokens/sec in opencode, and it doesn't feel any "stupider" than Haiku, that's for sure. Haiku can be dumb as a post; this thing is smarter than that.

It feels somewhere around Sonnet 4 level, and I'm finding it genuinely useful even at 4-bit. Though I have paid subscriptions elsewhere, so I doubt I'll actually use it much.

I could see configuring OpenCode somehow to use paid Kimi 2.5 or Gemini for the planning/analysis and compaction, and this for the task execution. It seems entirely competent.


The effort put into it is not "just a joke." The creator knows exactly what he did, and the joke excuse is weak.


This is realllly cool. I have a rabbit hole to go down tonight.


I’ve always enjoyed creating working things for people to use. Which is why I love LLM coding agents.

Is it wild the thing I took a while to learn is now done basically by a GPU? Yes. But hand writing code is not my identity. Never has been.


https://www.npmjs.com/package/@vizzly-testing/honeydiff

I worked for Percy for 4 years. We were "stuck" with ImageMagick for diffing (I'm sure they still might be). I was able to build my own differ with Claude/LLM help.

That special enough for you? Or?
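For readers unfamiliar with what a visual differ does at its core: the simplest version just counts pixels whose color changed by more than some tolerance. The sketch below is a naive illustration of that idea, not honeydiff's actual algorithm (real tools add perceptual color distance, anti-aliasing detection, diff images, and so on). Images here are plain rows of `(r, g, b)` tuples so the example stays dependency-free.

```python
# Naive pixel-diff sketch (illustrative only, not honeydiff's algorithm).
# An "image" is a list of rows, each row a list of (r, g, b) tuples.

def diff_ratio(a, b, tolerance=10):
    """Fraction of pixels whose max per-channel delta exceeds `tolerance`."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        raise ValueError("images must share dimensions")
    total = len(a) * len(a[0])
    changed = sum(
        1
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
        if max(abs(x - y) for x, y in zip(pa, pb)) > tolerance
    )
    return changed / total


# A 4x4 white baseline and a variant with one changed pixel out of 16.
base = [[(255, 255, 255)] * 4 for _ in range(4)]
variant = [row[:] for row in base]
variant[0][0] = (200, 0, 0)

print(diff_ratio(base, variant))  # 0.0625, i.e. 1 of 16 pixels changed
```

A production differ would typically run a pass like this per pixel but with a perceptual distance metric, then decide pass/fail against a configurable threshold on the ratio.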


I looked at the source. Most of the code doesn't seem to be in the GitHub repo, which itself contains a bit of JS glue; the .tgz uploaded to npm has various prebuilt binaries. Can I take a look at the Rust code?

I'm not trying to imply LLMs aren't useful. I just want more info from GP so that I can evaluate their claims.

