Looks very interesting, I fully agree that running CI locally is viable.
But what I didn't pick up from a quick scan of the README is the best pattern for integrating with git. Do you expect users to run (a script calling) selfci manually, or is it hooked up to git or similar? When do the merge hooks come into play? Do you ask selfci to merge?
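(To make the question concrete, the kind of wiring I'm imagining is a git pre-push hook that shells out to selfci and blocks the push on failure. The `selfci run` subcommand below is purely a guess at the CLI, not something taken from the README:)

```
#!/usr/bin/env python3
# Hypothetical .git/hooks/pre-push -- the `selfci run` invocation is a guess
# at the CLI, not taken from the project's README.
import subprocess
import sys

result = subprocess.run(["selfci", "run"])  # assumed subcommand
if result.returncode != 0:
    print("selfci failed; aborting push", file=sys.stderr)
    sys.exit(1)
```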
Not an AI researcher and I don't really know, but intuitively it makes a lot of sense to me.
To do well as an LLM you want to end up with the weights that get furthest in the direction of "reasoning".
So assume that with just one language there's a possibility of getting stuck in a local optimum: weights that do well on the English test set but don't reason well.
If you then take the same model size but make it learn several languages with the same number of weights, that would eliminate a lot of those local optima: unless the weights land in a regime where real reasoning / deeper concepts are "understood", it's not possible to do well across several languages with that weight budget.
And if you speak several languages, that naturally brings in more abstraction: the concept of "cat" is different from the word for "cat" in any given language, and so on.
Asking because I was looking at both Cloudflare and Bunny literally this week...and I feel like I don't know anything about it. Googling for it, with "hackernews" as keyword to avoid all the blogspam, didn't bring up all that much.
(I ended up with Cloudflare and am sure that for my purposes it doesn't matter at all which I choose.)
- The free CDN is basically unusable with my ISP, Telekom Germany, due to a long-running and well-documented peering dispute. This is not necessarily an issue with Cloudflare itself, but it means that I have to pay for the Pro plan for every domain if I want a functioning site in my home country. The $25 per domain/project adds up.
- Cloudflare recently had repeated, long outages that took down my projects for hours at a time.
- Their database offering (D1) had some unpredictable latency spikes that I never managed to fully track down.
- As a European, I'm trying to minimize the money I spend on US cloud services and am actively looking for European alternatives.
You don't have to get the Pro plan to solve the Deutsche Telekom issue. You can also use their Argo product for $5/month, but it only makes sense if your egress costs wouldn't exceed the Pro plan's pricing.
The reverse. Argo gives better peering than any paid plan. It's the reason for the product's existence. They can use more costly peering that they couldn't use with their free egress model.
Thanks for the pointer, not doubting that is true. My egress is unfortunately too large for it to make financial sense.
However, at the time I did plenty of traceroutes to confirm that the Pro plan's peering is at least better than the Free plan's for the Telekom problem. The Free plan would route traffic to NYC and back, while Pro plan traffic terminates in Frankfurt.
> from copying and pasting code into ChatGPT, to Copilot auto-completions [...], to Cursor, and finally the new breed of coding agent harnesses like Claude Code, Codex, Amp, Droid, and opencode
Reading HN I feel a bit out of touch since I seem to be "stuck" on Cursor. Tried to make the jump further to Claude Code like everyone tells me to, but it just doesn't feel right...
It may be due to the size of my codebase -- I'm 6 months into a solo-developer bootstrapped startup, so there isn't all that much there, and I can iterate very quickly with Cursor. And it's mostly SPA browser click-tested stuff. Comparatively, it feels like Claude Code spends an eternity to do something.
(That said, Cursor's UI does drive me crazy sometimes. In particular the extra layer of diff review of AI changes (red/green), which is not integrated into git -- I would have preferred it to actively use something integrated in git instead (staged vs. unstaged hunks). It's more important to have a good code review experience than to remember which changes I made vs. which changes the AI made.)
For me Cursor provides a much tighter feedback loop than Claude Code. I can review, revert, iterate, and change models to get what I need. It sometimes feels like Claude Code is presented more as a yolo option where you put more trust in the agent about what it will produce.
I think the ability to change models is critical. Some models are better at designing frontend than others. Some are better at different programming languages, writing copy, blogs, etc.
I feel sabotaged if I can’t switch the models easily to try the same prompt and context across all the frontier options
Same. For actual production apps I'm typically reviewing the thinking messages and code changes as they happen to ensure it stays on the rails. I heavily use "revert" to a previous state so I can update the prompt with more accurate info that might have come out of the agent's trial and error. I find that if I don't do this, the agent makes a mess that often doesn't get cleaned up on its way to the actual solution. Maybe a similar workflow is possible with Claude Code...
You can ask Claude to work with you step by step and use /rewind. It only shows the diff though, which hides some of the problem: diffs can seem fine in isolation but have obvious issues when viewed in context.
Ya I guess if you have the IDE open and monitor unstaged git changes, it's a similar workflow. The other Cursor feature I use heavily is the ability to add specific lines and ranges of a file to the context. It feels like in the CLI this would just be pasted text, and Claude would have to work a lot harder to resolve the source file and range.
Probably an ideal compromise solution for you would be to install the official Claude Code extension for VS Code, so you have an IDE for navigating large, complex codebases while still having CC integration.
Bootstrapped solo dev here. I enjoyed using Claude to get little things done which I had on my TODO list below the important stuff, like updating a landing page, or in your case perhaps adding automated testing for the frontend stuff (so you don't have to click through yourself). It's just nice having someone come up with a proposal on how to implement something; even if it's not the perfect way, it's good as a starter.
Also I have one Claude instance running to implement the main feature, in a tight feedback loop so that I know exactly what it's doing.
Yes, sometimes it takes a bit longer, but I use the time checking what the other Claudes are doing...
Claude Code spends most of its time poking around the files. It doesn't have any knowledge of the project by default (no file index etc), unless they changed it recently.
When I was using it a lot, I created a startup hook that just dumped a file listing into the context, or the actual full code on very small repos.
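Roughly, the hook script itself was nothing more than something like this (a sketch; how its output gets wired into the agent's startup is setup-specific and omitted here):

```
#!/usr/bin/env python3
# Sketch of a "dump a file listing into the context" startup hook: prints
# the git-tracked file listing, and inlines full file contents when the
# repo is very small. Treat the wiring into the agent as an assumption.
import subprocess
from pathlib import Path

files = subprocess.run(
    ["git", "ls-files"], capture_output=True, text=True, check=True
).stdout.splitlines()

print("Project files:")
for f in files:
    print(f"  {f}")

# On very small repos, include the actual code as well.
if len(files) <= 20:
    for f in files:
        print(f"\n--- {f} ---")
        print(Path(f).read_text(errors="replace"))
```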
I also got some gains from using a custom edit tool I made which can edit multiple chunks in multiple files simultaneously. It was about 3x faster. I had some edge cases where it broke though, so I ended up disabling it.
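The edit tool, reduced to a sketch (not the actual tool -- batching is the whole point: one call applies many chunks across many files):

```
# Simplified sketch of a batched edit tool: apply many (old, new) text
# replacements across many files in a single call. The real tool surely
# handled more edge cases; this is just the core idea.
from pathlib import Path

def apply_edits(edits: list[dict]) -> None:
    """edits: [{"path": ..., "old": ..., "new": ...}, ...]"""
    by_file: dict[str, list[dict]] = {}
    for e in edits:
        by_file.setdefault(e["path"], []).append(e)

    for path, file_edits in by_file.items():
        text = Path(path).read_text()
        for e in file_edits:
            if e["old"] not in text:
                raise ValueError(f"chunk not found in {path}: {e['old']!r}")
            text = text.replace(e["old"], e["new"], 1)
        Path(path).write_text(text)
```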
I see in your public issue tracker that a lot of people are desperate simply for an option to turn that thing off ("Automatically accept all LLM changes"). Then we could really use any kind of plugin for reviews with git.
Seems like there's a speed/autonomy spectrum where Cursor is the fastest, Codex is the best for long-running jobs, and Claude is somewhere in the middle.
Personally, I found Cursor to be too inaccurate to be useful (possibly because I use Julia, which is relatively obscure) – Opus has been roughly the right level for my "pair programming" workflow.
I mainly use Opus as well; Cursor isn't tied to any AI model, and Opus, Sonnet, and a lot of others are available. Of course there are differences in how the context is managed, but Opus is usually amazing in Cursor at least.
I will very quickly @- the parts of the code that are relevant to get the context up and running right away. Seems like in Claude that's harder...
(They also have their own model, "Composer 1", which is just lightning fast compared to the others... and sometimes feels as smart as Opus, but now and then it doesn't find the solution if it's too complicated and I have to ask Opus to clean it up. But for simple stuff I switch to it.)
> remember which changes I made vs which changes AI made..
They are improving this use case too with their enhanced blame. I think it was mentioned in their latest update blog.
You'll be able to hover over a line to see whether you wrote it or an AI did. If it was an AI, it will show which model and a reference to the prompt that generated it.
Others have mentioned SVG AI tools... I've tried 3-4 over the past few days and eventually ended up with svgai.org (after using Google Gemini for the bitmap).
You can instruct it to make edits, or say "Use SVG gradients for the windows" and so on and you can further iterate on the SVG.
It can be frustrating at times, but the end result was worth it for me.
Though for some images I've done 2-3 round trips of manual editing, Nano Banana, svgai.org...
The advantage is that it produces sane output paths that I can edit easily for final manual touches in Inkscape.
Some of the other "AI" tools are often simply algorithms for bitmap->vector conversion, and the paths/curves they produce are harder to work with and also give a specific feel to the vector art...
I honestly don't think humans fit your definition of intelligent. Or at least not that much better than LLMs.
Look at human technology history...it is all people doing minor tweaks on what other people did. Innovation isn't the result of individual humans so much as it is the result of the collective of humanity over history.
If humans were truly innovative, shouldn't we have invented, for instance, at least one stable way of organizing society and economics by now? If anything surprises me about humans, it is how "stuck" we are in the mold of what other humans do.
Circulate all the knowledge we have over and over, throw in some chance, some reasoning skills of the kind LLMs demonstrate every day in coding, millions of instances most of which never innovate anything but some do, and a feedback mechanism -- that seems like human innovation history to me, and it doesn't seem to demonstrate anything LLMs clearly lack. Except, of course, being plugged into history and the world the way humans are.
I just long for DBs to evolve from "stateful" to "stateless". CQRS at the DB level.
* All writes are inserts into append-only tables. ("UserCreatedByEnrollment", "UserDeletedBySupport" instead of INSERT vs UPDATE on a stateful CRUD table)
* Declare views on these tables in the DB that present the data you want to query -- including automatically maintained materialized indices on multiple columns resulting from joins. So your "User" view is an expression involving those event tables (or "UserForApp" and "UserForSupport"), and the DB takes care of maintaining indices on these which are consistent with the insert-only tables. (See the sketch after this list.)
* Put in archival policies saying to delete / archive events that do not affect the given subset of views. ("Delete everything in UserCreatedByEnrollment that isn't shown through UserForApp or UserForSupport")
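To make the shape concrete, here's a minimal sketch using Python's stdlib sqlite3. The table and view names are the hypothetical ones from the list above, and sqlite only gives you plain (non-materialized, non-indexed) views, so this shows the data model rather than the automatic index maintenance I'm wishing for:

```
# Minimal sketch of the append-only pattern with stdlib sqlite3.
# sqlite has only plain views, so this shows the shape of the schema,
# not the incrementally maintained materialized views/indices I'd want.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE UserCreatedByEnrollment (
    user_id TEXT NOT NULL,
    email   TEXT NOT NULL,
    at      TEXT NOT NULL   -- event timestamp
);
CREATE TABLE UserDeletedBySupport (
    user_id TEXT NOT NULL,
    reason  TEXT,
    at      TEXT NOT NULL
);

-- The "User" view is an expression over the event tables:
-- current users = created users minus deleted users.
CREATE VIEW UserForApp AS
SELECT c.user_id, c.email
FROM UserCreatedByEnrollment c
LEFT JOIN UserDeletedBySupport d ON d.user_id = c.user_id
WHERE d.user_id IS NULL;
""")

con.execute("INSERT INTO UserCreatedByEnrollment VALUES (?, ?, ?)",
            ("u1", "a@example.com", "2024-01-01T00:00:00Z"))
con.execute("INSERT INTO UserDeletedBySupport VALUES (?, ?, ?)",
            ("u1", "account closure request", "2024-02-01T00:00:00Z"))

print(con.execute("SELECT * FROM UserForApp").fetchall())  # -> []
```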
I tend to structure my code and DB schemas like this anyway, but the lack of smoother DB support means it's currently only for people who are especially interested in it.
Some bleeding-edge DBs let you do at least some of this efficiently and in a user-friendly way. I.e. they will maintain powerful materialized views and you don't have to write triggers etc. manually. But I long for the day we get more OLTP focus in this area, not just OLAP.
My point is that event sourcing would have been a lot less painful if popular DBs had builtin support for it in the way I describe.
If you go with event sourcing today, you end up having to do a lot of things that the DB could have handled automatically, but there's an abstraction mismatch.
(I've worked with 3-4 different strategies for doing event sourcing in SQL DBs in my career)
Why use a nailgun instead of a hammer, if the nailgun still requires supervision and handholding?
Example: Say I discover a problem in the SPA design that can be fixed by tuning some CSS.
Without LLM: Dig around the code until I find the right spot. If it's been some months since I was there this can easily cost five minutes.
With LLM: Explain what is wrong. Perhaps the description is vague ("cancel button is too invisible, I need another solution") or specific ("1px more margin here please"). The LLM makes a best-effort fix within 30 secs. The diff points to just the right location so you can fine-tune it.