
If it's because of that, then honestly it's as insane as the DeepSeek thing, where all the info was released weeks before but the market only got nervous when they released an app. I mean, info about Gemini 3 has been out for quite a while now, and of course they trained it using TPUs; I didn't even think that was in question.


I didn't know they only used TPUs.


If it’s on par in code quality, it would be a way better model for coding because of its huge context window.


Sonnet can also work with 1M context. Its extreme speed is the only thing Gemini has on others.


Can it now in Claude Code and Claude Desktop? When I was using it a couple of months ago it seemed only the API had 1M


I don't know. I'm always a bit appalled that getting privacy in Firefox requires you to disable so many flags in user.js or use something like arkenfox. It feels kind of dishonest of them that they don't surface those settings when they're enabled by default. Of course there is Librefox, but still I feel like there shouldn't even have to be a reason for an extra fork like that.


You mean the features that Google adds to Chrome and then turns into standards?


From a non-American perspective it seems to me that in the US the problem of homelessness often gets mistaken for a problem with the homeless. Maybe changing that narrative is a starting point?


With drugs in the mix things get complicated. Many cities tried giving these people free homes/rooms but because drug laws were strictly enforced there the homeless chose to stay on the streets. You’re not going to get rid of the fentanyl and meth addicted homeless unless you do it by force.


Are they? I mean, I wouldn't say they are strictly deterministic, but with a temperature and top-k of 0 and a top-p of 1 you can at least get them to be deterministic, if I'm correct. In my experience, if you need a temperature higher than 0 in a prompt that is supposed to be part of a pipeline, you need to optimize your prompt rather than introduce non-determinism. Still, of course, that doesn't mean some inputs won't give unexpected outputs.
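For illustration, here is roughly why temperature 0 collapses sampling into a deterministic choice (a minimal sketch, not any particular inference engine; the logits are made up):

```typescript
// Sketch of temperature scaling before sampling. As temperature approaches 0,
// the softmax collapses onto the argmax token, so "sampling" degenerates into
// greedy decoding, which is repeatable. (Engines special-case temperature == 0
// as plain argmax, since dividing by zero is undefined.)
function softmax(logits: number[], temperature: number): number[] {
  const scaled = logits.map((l) => l / temperature);
  const max = Math.max(...scaled); // subtract max for numerical stability
  const exps = scaled.map((s) => Math.exp(s - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

const logits = [2.0, 1.9, 0.5]; // made-up next-token logits

console.log(softmax(logits, 1.0));  // ~[0.47, 0.42, 0.11] -- real spread, sampling varies
console.log(softmax(logits, 0.01)); // ~[1.00, 0.00, 0.00] -- effectively argmax, deterministic
```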


In the hard, logically rigorous sense of the word, yes they are deterministic. Computers are deterministic machines. Everything that runs on a computer is deterministic. If that wasn’t the case, computers wouldn’t work. Of course I am considering the idealized version of a computer that is immune to environmental disturbances (a stray cosmic ray striking just the right spot and flipping a bit, somebody yanking out a RAM card, etc etc).

LLMs are computation, they are very complex, but they are deterministic. If you run one on the same device, in the same state, with exactly the same input parameters multiple times, you will always get the same result. This is the case for every possible program. Most of the time, we don’t run them with exactly the same input parameters, or we run them on different devices, or some part of the state of the system has changed between runs, which could all potentially result in a different outcome (which, incidentally, is also the case for every possible program).


> Computers are deterministic machines. Everything that runs on a computer is deterministic. If that wasn’t the case, computers wouldn’t work.

GPU operations on floating point are generally not deterministic and are subject to the whims of the scheduler


If the state of the system is the same, the scheduler will execute the same way. Usually, the state of the system is different between runs. But yeah that’s why I qualified it with the hard, logically rigorous sense of the word.


> Are they? I mean, I wouldn't say they are strictly deterministic, but with a temperature and top-k of 0 and a top-p of 1 you can at least get them to be deterministic, if I'm correct.

the mathematics might be

but not on a GPU, because floating point numbers are an approximation, and their operations are not associative

if the GPU's internal scheduler reorders the operations you will get a different outcome

remember, GPUs were designed to render Quake, where drawing pixels slightly off is imperceptible
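A quick demonstration of that non-associativity, runnable in any JS/TS runtime since both use IEEE 754 doubles:

```typescript
// IEEE 754 doubles: addition is commutative, but not associative.
// Reordering a reduction changes which rounding errors accumulate.
const a = 0.1, b = 0.2, c = 0.3;
console.log((a + b) + c); // 0.6000000000000001
console.log(a + (b + c)); // 0.6

// More dramatic with mixed magnitudes:
console.log((1e16 + 1) + 1); // 10000000000000000 -- each 1 is absorbed and lost
console.log(1e16 + (1 + 1)); // 10000000000000002
```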


I'd consider DSPy to be one. While the prompts it uses are not the most elaborate, they are well tested and reliable.


In my opinion the author of this article makes the same error as Nietzsche did: seeing it as a book about psychology that revolves around theological themes, when it is a book about theology that revolves around psychological themes.

What I find quite problematic is how the article fully inverts what the Grand Inquisitor is about through its oversimplification. It's not an indictment of Christ, it's about an indictment of Christ, and in fact it's Christ who exposes the institutional church, as Ivan views it, with the only thing it ironically doesn't expect of him: a deeply Christian act.

Alyosha is quite literally introduced as the hero of the novel, so saying his presence pales in comparison to Dmitri and Ivan is kind of weak. I'd agree that his spectacle pales in comparison to his brothers', but his presence not at all.


> Seeing it as a book about psychology that revolves around theological themes, when it is a book about theology that revolves around psychological themes.

I don't think Dostoevsky's Christianity is genuine. It feels about as genuine as Hegel's Christianity. To both of them it's just a convenient prop where their actual ideas take center stage.

Every nihilist main character that he writes follows this pattern where they do something really bad, then destroy themselves as some kind of act of penance. This is the only way that conversion happens in his books. But in this case, it's obviously just a way of processing guilt (and reenacting the author's trauma from his near-execution, most likely).

Maybe I'm psychologizing religion too much, but I don't think religious belief is genuine if it's rooted in some kind of (obvious) psychological trauma

The one thing that stands out in Dostoevsky is the psychological depth of the characters, especially in Demons


That is very interesting to read, as I view him as very genuinely Christian. I would not agree with your description of the nihilist characters. I see it rather that the (self-)destructive acts are mostly earlier nihilist actions, and only then, after facing and having accepted the consequences, does the repentance emerge, as in Crime and Punishment. The ones that stay nihilistic and don't repent at all still face destruction.


I tend to agree with bayareapsycho that Dostoyevsky doesn't feel genuinely Christian. In his books God is derived out of ultimate necessity; he's the only sane way to survive. If you follow that line of thought, you can come to the conclusion that if God didn't exist, people would have to invent him. And if that's the case, then maybe they DID invent him after all. It's just that it doesn't matter.

Dostoyevsky himself never goes that far in his books, but I feel that the direction is set pretty clearly. It could be that I'm reading my own thoughts into his works, though.


So what counts as a genuine Christian in your view?

I can agree that his books often suggest that God is the only sane way to survive, but I don't agree that this reduces him to merely a useful necessity.

Ironically, the conclusion you are drawing aligns very closely with what the Grand Inquisitor preaches to Christ. And as the Grand Inquisitor is Ivan's story, and not part of the book's plot, I feel like Dostoyevsky is tackling that exact topic very prominently in the book, on multiple levels. Especially through the response of the kiss.


> So what counts as a genuine Christian in your view?

I assumed that among other things it requires just accepting that God exists whether we need him or not. The practicality of having God around seems off to me, but in the end I'm not a genuine Christian myself so it's hard to judge.


Can you maybe give an example you’ve encountered of an algorithm or a data structure that LLMs cannot handle well?

In my experience implementing algorithms from a good comprehensive description and keeping track of data models is where they shine the most.


Example (expanding on [1]): I want to design a strongly typed fluent API interface to some role/permissions-based authorization engine functionality. Even knowing how to shape the fluent interface so that it is powerful but intuitive, as strongly typed as possible but also maintainable, is a deep art.

One reason I know an LLM can't come close to my design is this: I've written something that works (what a typical senior engineer might write), but this is not enough. I have evaluated it critically (drawing on my experience with long-lived software), rewritten it to better meet the targets above, and repeated this process several times. I don't know what would make an LLM go: now that kind of works, but is this the most intuitive, well-typed, and maintainable design there could be?

1. https://news.ycombinator.com/item?id=45728183
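For a flavor of that design space (a hypothetical sketch, not the parent's actual code; every name here is invented), TypeScript's type system can force a fluent permissions builder to be called in a valid order:

```typescript
// Hypothetical fluent rule builder. Each step returns an interface that
// only exposes the next legal call, so misordered chains fail to compile.
type Action = "read" | "write" | "delete";

interface Rule {
  readonly role: string;
  readonly action: Action;
  readonly resource: string;
}
interface NeedsResource {
  on(resource: string): Rule;
}
interface NeedsAction {
  can(action: Action): NeedsResource;
}
interface NeedsRole {
  role(name: string): NeedsAction;
}

function grant(): NeedsRole {
  return {
    role: (role) => ({
      can: (action) => ({
        on: (resource) => ({ role, action, resource }),
      }),
    }),
  };
}

// Reads like a sentence; illegal orderings are type errors:
const rule: Rule = grant().role("editor").can("write").on("articles");
// grant().can("write"); // error: 'can' does not exist on type 'NeedsRole'
console.log(rule);
```

Whether this shape is the most intuitive one is exactly the kind of judgment call the parent is describing; the types only enforce ordering, not good taste.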


Funny you should use role/permissions as an example here. I spent the weekend using Claude Code to rewrite my own permissions engine to a new design that uses SQL queries to solve the problem "list all of the resources that this actor can perform this action on".

My previous design required looping through all known resources asking "can actor X perform action Y on this?". The new design generates a very complex but thoroughly tested SQL query instead.

Applying that new design and updating the hundreds of related tests would have taken me weeks. I got it done in two days.

Here's a diff that captures most of the work: https://github.com/simonw/datasette/compare/e951f7e81f038e43...
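The shape of that change, roughly (a toy sketch with an invented schema, not Datasette's actual implementation): instead of an O(n) loop of point checks, one query computes the allowed set directly.

```typescript
// Toy, self-contained version of the idea using Node's built-in node:sqlite
// module (experimental in some Node 22.x versions). Schema and names are
// invented; the real design is in the linked diff.
import { DatabaseSync } from "node:sqlite";

const db = new DatabaseSync(":memory:");
db.exec(`
  CREATE TABLE resources (name TEXT PRIMARY KEY);
  CREATE TABLE permissions (actor TEXT, action TEXT, resource TEXT);
  INSERT INTO resources VALUES ('logs'), ('users'), ('sales');
  INSERT INTO permissions VALUES
    ('alice', 'view-table', 'logs'),
    ('alice', 'view-table', 'sales');
`);

// Old shape: loop over every resource and point-check "can alice view this?".
// New shape: one query that returns the whole allowed set at once.
const visible = db
  .prepare(
    `SELECT r.name
       FROM resources r
       JOIN permissions p ON p.resource = r.name
      WHERE p.actor = ? AND p.action = ?`
  )
  .all("alice", "view-table");

console.log(visible); // [ { name: 'logs' }, { name: 'sales' } ]
```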


Working on permissions for a large SaaS app, I can also confirm that the best LLMs on the market have maybe a 10% success rate writing code in this area.


What % of the total amount of software in the world (let's say, by lines of code or time invested) is like that?


Converting an algorithm implementation from recursive to iterative: it got the concept broadly right, but was quite bad at making the logic actually match up, often refusing to fix mistakes or reverting fixes two edits later. Still a positive experience though, since the issues were fixable and it spared me a lot of tedious copying.
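For concreteness, the standard mechanical transformation replaces the call stack with an explicit one (a generic example, not the parent's algorithm):

```typescript
interface TreeNode { value: number; children: TreeNode[]; }

// Recursive: the call stack tracks where we are.
function sumRecursive(node: TreeNode): number {
  return node.value + node.children.reduce((acc, c) => acc + sumRecursive(c), 0);
}

// Iterative: an explicit stack replaces the call stack. This bookkeeping
// (push order, visited state, partial results) is where logic tends to slip.
function sumIterative(root: TreeNode): number {
  let total = 0;
  const stack: TreeNode[] = [root];
  while (stack.length > 0) {
    const node = stack.pop()!;
    total += node.value;
    stack.push(...node.children);
  }
  return total;
}

const tree: TreeNode = {
  value: 1,
  children: [{ value: 2, children: [] }, { value: 3, children: [] }],
};
console.log(sumRecursive(tree), sumIterative(tree)); // 6 6
```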


Did you have it write tests and give it the ability to iterate & validate its implementation without you in the loop?

Anything less is setting it up for failure...


Yes, but it got 99% of those, then got stuck on why the others made no sense to it.


It's important to personally understand the tests it's written.

If you’d like some help I’d be glad to, just drop me an email.

My email’s in my profile.


Claude added a self-rescheduling timeout to my TypeScript game loop to track time, manually adding 1000ms every time it fires.

I removed it and it later just added it again.

It's these small weird things where it can mess up a lot of code.
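To make that anti-pattern concrete (a reconstruction of the described bug, not the actual code): a self-rescheduling timeout that adds a fixed 1000ms per tick drifts, because setTimeout only guarantees a minimum delay; reading a monotonic clock doesn't.

```typescript
// Anti-pattern (reconstruction): track game time by assuming each tick is
// exactly 1000ms. setTimeout can fire late but never early, so this counter
// falls behind real time and the drift compounds.
let fakeElapsedMs = 0;
function badTick(): void {
  fakeElapsedMs += 1000;
  setTimeout(badTick, 1000);
}

// Fix: derive elapsed time from a monotonic clock; the tick schedule no
// longer matters.
const startMs = performance.now();
function goodTick(): void {
  const elapsedMs = performance.now() - startMs; // true elapsed time
  console.log(`elapsed: ${Math.round(elapsedMs)}ms`);
  setTimeout(goodTick, 1000);
}
goodTick();
```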


There are severe edge cases. Here are some from the last few days.

E.g. just updating Bootstrap to Angular Bootstrap: it didn't transfer how I placed the dropdowns (basically using dropdown-end), so everything was out of view on desktop and mobile.

It forgot the Transloco I used everywhere and just used default English (happens a lot).

It suggested code that fixed one bug (expression property recursion), but then LINQ to SQL was broken.

Upgrading to Angular 17 in an ASP.NET Core app: I knew it used Vite now, but it also required a browser folder to deploy. Twenty changes down the road, I noticed something in my UI wasn't updated in dev (fast commits for my side project, I don't build locally); it no longer deployed anything related to Angular...

I had two files named ApplicationDbContext and it took the one from the wrong monolith module.

It adds files in the wrong directory sometimes, e.g. some modules were made with feature folders.

It sometimes forgets to update my Ocelot gateway, or updates the compressed version. ...

Note: I documented my architecture in e.g. Cline, but I use multiple agents to experiment with.

TL;DR: it's an expert beginner programmer.


Do you have any automated tests for that project?

I'm beginning to suspect a lot of my great experiences with coding agents come from the fact that they can run tests to confirm they haven't broken anything.


The test loop is integral.

It's kind of annoying hearing all this skepticism from people putting in the least effort to use the tool optimally. There is a learning curve. Every month I've gotten better results than the last, because I'm constantly context-building and refining, understanding how, what, and when to prompt.

It's like hearing someone say databases suck when they haven't bothered to learn about or use indexes or foreign keys.


Most of the mentioned issues wouldn't be caught by a test loop unless you have 100% automated test coverage (unit tests, ...).

Which isn't always feasible (time). The AI makes different mistakes than humans, ones that are sometimes harder to catch.


It's a lot more feasible now that you can get LLMs to help write those tests in the first place.


Most of these examples built (my LLM runs a shallow test at the end of the task) and produced new edge cases.


Too little.

Things moved as fast as possible to migrate from .NET Framework to .NET Core 8, Angular 8 to 18, and Bootstrap 4.5 to 5.x.


I mean I totally get what you are saying about pull requests that are secretly AI-generated.

But otherwise, writing code with LLMs is more than just the prompt. You have to feed it the right context, maybe discuss things with it first so it gets it, and then you iterate with it.

So if someone has done the effort and verified the result like it's their own code, and if it actually works like they intended, what's wrong with sending a PR?

I mean if you then find something to improve while doing the review, it's still very useful to say so. If someone is using LLMs to code seriously and not just to vibecode a black box, this feedback is still as valuable as before, because at least for me, if I had known about the better way of doing something I would have iterated further and implemented it or had it implemented.

So I don't see how suddenly the experience transfer is gone. Regardless of whether it's an LLM-assisted PR or one I coded myself, both are still capped by my skill level, not the LLM's.


Nice in theory, hard in practice.

I've noticed in empirical studies of informal code review that most humans tend to have only a weak effect on error rates, one which disappears after reading more than a certain amount of code per hour.

Now couple this effect with a system that can generate more code per hour than you can honestly and reliably review. It’s not a good combination.


Took me a minute to realize this is not about Chonkie. I would be interested in how this compares to Chonkie's semantic chunking approach.


You can read the labels: this one (-y) uses ModernBERT and even has an eval comparison to the other (-ie) in its GitHub repo, so you can see the improvement as tested -- although if you want vanilla rules-based chunking, for whatever reason your data needs it, then (-ie) is still good.
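For reference, "vanilla rules-based chunking" means something like this (a generic sketch, not either library's implementation):

```typescript
// Naive rules-based chunker: split on sentence boundaries, then pack
// sentences into chunks up to a size budget. No model involved, which is
// why it's fast and predictable but blind to topic shifts.
function chunkBySentences(text: string, maxChars = 200): string[] {
  const sentences = text.split(/(?<=[.!?])\s+/);
  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    if (current && current.length + sentence.length + 1 > maxChars) {
      chunks.push(current);
      current = sentence;
    } else {
      current = current ? `${current} ${sentence}` : sentence;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

console.log(chunkBySentences("First sentence. Second one! A third? Fourth here.", 30));
// [ 'First sentence. Second one!', 'A third? Fourth here.' ]
```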

