On the far end of this debate you end up with types like _RelationshipJoinConditionArgument which I'd argue is almost more useless than no typing at all. Some people claim it makes their IDE work better, but I don't use an IDE and I don't like the idea of doing extra work to make the tool happy. The opposite should be true.
Literally dealing with this right now. My wife got what appears to be a (very expensive) counterfeit item that is technically non-returnable (not lying down without a fight). Kind of cathartic to see this pop up.
I don’t think the person quoted is implying that it should be that way, merely pointing out a discovery that builders have made: they _can_ get a symbolic bonus. One can skip building to code… do a quick and bad job, save cost, and move on to the next paying job more quickly. That “bonus” doesn’t exist if you build to code (and of course it shouldn’t exist, but neither should the bonus that does exist; your stick should prevent it).
I have a similar experience. I was a devoted PhD student working long hours taking on a lot of responsibility. It burned me out, hurting my productivity. I have mixed feelings about it; I love the friends I made and the things I learned, but I don’t think I should have had to suffer what I suffered. Simultaneously I’m somewhat glad I experienced it then, because now I work in tech and I’ll _never_ work outside of business hours (I’ll hack on personal projects I consider fun if I feel like it). And I’m more productive than my colleagues that do. There’s something mysterious about the contemporary PhD, not all good and not all bad.
The organization and formatting of the single .tex file is such that one could almost read the source alone. Really nice. Also, I had no idea that GitHub did such a good job rendering the LaTeX math in markdown, it's imperfect but definitely good.
Been waiting to see what Astral would do first (with regards to product). Seems like a mix of Artifactory and conda: Artifactory in providing a package server, and conda in trying to fix the difficulty that comes with Python packages that have compiled components or dependencies. Wheels mostly solved that, but PyTorch wheels requiring a specific CUDA version can still be a mess that conda fixes.
Given Astral's heavy involvement in the wheelnext project I suspect this index is an early adopter of Wheel Variants which are an attempt to solve the problems of CUDA (and that entire class of problems not just CUDA specifically) in a more automated way than even conda: https://wheelnext.dev/proposals/pepxxx_wheel_variant_support...
I really like Berkeley Mono and I don’t regret my old purchase, but my Emacs and Terminal configs have been rocking Pragmata Pro for a while now. Looking at the version 2 release notes, it appears that Berkeley Mono has some new condensed widths (I think that feature is what has kept me on Pragmata Pro). Will have to take it for a spin.
OpenMP is great. I’ve done something similar to your second case (thread-local objects that are filled in parallel and later combined). In the “OpenMP off” case (pragmas ignored), is it possible to avoid the overhead of the thread-local object essentially getting copied into the final object (since no OpenMP means only a single thread-local object)? I avoided this by implementing a separate code path, but I’m just wondering if there are any tricks I missed that would still allow a single code path.
Give one of the threads (thread ID 0, for instance) special privileges. Its list is the one everything else is appended to, then there's only concatenation or copying if you have more than one thread.
Or, pre-allocate the memory and let each thread write to its own subset of the final collection and avoid the combine step entirely. This works regardless of the number of threads you use so long as you know the maximum amount of memory you might need to allocate. If it has no calculable upper bound, you will need to use other techniques.
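To illustrate the pattern outside of OpenMP, here's roughly the same idea sketched in Python with a thread pool standing in for the OpenMP threads (the per-chunk work function and the sizes are made-up stand-ins):

    from concurrent.futures import ThreadPoolExecutor
    import numpy as np

    def fill_chunk(out, start, stop):
        # Each worker writes only to its own slice of the pre-allocated
        # array, so there is no combine step and no locking needed.
        out[start:stop] = np.arange(start, stop) ** 2  # stand-in for real work

    n = 1_000_000
    num_threads = 4
    result = np.empty(n, dtype=np.int64)  # pre-allocated final collection

    bounds = np.linspace(0, n, num_threads + 1, dtype=int)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [
            pool.submit(fill_chunk, result, start, stop)
            for start, stop in zip(bounds[:-1], bounds[1:])
        ]
        for f in futures:
            f.result()  # surface any exceptions from the workers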
My favorite thing to ask the models designed for programming is: "Using Python write a pure ASGI middleware that intercepts the request body, response headers, and response body, stores that information in a dict, and then JSON encodes it to be sent to an external program using a function called transmit." None of them ever get it right :)
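For reference, the shape of answer I'm looking for is roughly the following minimal sketch (transmit here is assumed to be a plain callable you hand in, which is what actually ships the JSON string to the external program):

    import json

    class CaptureMiddleware:
        """Pure ASGI middleware: records the request body, response headers,
        and response body in a dict, then passes the JSON-encoded record to
        transmit()."""

        def __init__(self, app, transmit):
            self.app = app
            self.transmit = transmit  # assumed: callable taking a JSON string

        async def __call__(self, scope, receive, send):
            if scope["type"] != "http":
                await self.app(scope, receive, send)
                return

            record = {"request_body": b"", "response_headers": [], "response_body": b""}

            async def receive_wrapper():
                # Capture each request-body chunk while still passing it through.
                message = await receive()
                if message["type"] == "http.request":
                    record["request_body"] += message.get("body", b"")
                return message

            async def send_wrapper(message):
                # Capture headers and body chunks on the way out.
                if message["type"] == "http.response.start":
                    record["response_headers"] = [
                        (k.decode("latin-1"), v.decode("latin-1"))
                        for k, v in message.get("headers", [])
                    ]
                elif message["type"] == "http.response.body":
                    record["response_body"] += message.get("body", b"")
                await send(message)

            try:
                await self.app(scope, receive_wrapper, send_wrapper)
            finally:
                self.transmit(json.dumps({
                    "request_body": record["request_body"].decode("utf-8", "replace"),
                    "response_headers": record["response_headers"],
                    "response_body": record["response_body"].decode("utf-8", "replace"),
                }))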
I normally ask about building a multi-tenant system using async SQLAlchemy 2 ORM where some tables are shared between tenants in a global PostgreSQL schema and some are in a per-tenant schema.
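For context, the answer I'm fishing for centers on SQLAlchemy's schema_translate_map execution option. A minimal sketch of the pattern (the models, schema names, and connection string are all made up):

    from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine
    from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

    class Base(DeclarativeBase):
        pass

    class Plan(Base):
        """Shared table: lives in one global schema for all tenants."""
        __tablename__ = "plans"
        __table_args__ = {"schema": "shared"}
        id: Mapped[int] = mapped_column(primary_key=True)
        name: Mapped[str]

    class Order(Base):
        """Per-tenant table: 'tenant' is a placeholder schema that gets
        rewritten to the real tenant schema at execution time."""
        __tablename__ = "orders"
        __table_args__ = {"schema": "tenant"}
        id: Mapped[int] = mapped_column(primary_key=True)
        plan_id: Mapped[int]

    engine = create_async_engine("postgresql+asyncpg://user:pass@localhost/app")

    def session_for_tenant(tenant_schema: str) -> AsyncSession:
        # schema_translate_map rewrites only the "tenant" placeholder; the
        # "shared" schema passes through untouched. It does not create the
        # schemas or tables -- those must already exist for each tenant.
        return AsyncSession(
            engine.execution_options(schema_translate_map={"tenant": tenant_schema})
        )

    # Usage:
    #   async with session_for_tenant("tenant_acme") as session:
    #       plans = (await session.execute(select(Plan))).scalars().all()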
Nothing gets it right first time, but when ChatGPT 4 first came out, I could talk to it more and it would eventually get it right. Not long after that though, ChatGPT degraded. It would get it wrong on the first try, but with every subsequent follow up it would forget one of the constraints. Then when it was prompted to fix that one, it forgot a different one. And eventually it would cycle through all of the constraints, getting at least one wrong each time.
Since then, benchmarks came out showing that ChatGPT “didn’t really degrade”, but all of the benchmarks seemed focused on single question/answer pairs and not actual multi-turn chat. For this kind of thing, ChatGPT 4 has never, in my experience, recovered to being as good as it was when it was first released.
It’s been months since I’ve had to deal with that kind of code, so I might be forgetting something, but I just tried it with Codestral and it spat out something that looked reasonable very quickly on its first try.
>It would get it wrong on the first try, but with every subsequent follow up it would forget one of the constraints. Then when it was prompted to fix that one, it forgot a different one. And eventually it would cycle through all of the constraints, getting at least one wrong each time.
That drives me nuts and makes me ragequit about half the time. Although it's usually more effective to go back and correct your initial prompt rather than prompt it again.
I had a similar experience. I was trying to get GPT 4 to write some R/Stan code for a bit of Bayesian modelling. It would get the model wrong, and then I would walk it through how to do it right, and by the end it would almost get it right. But on the next step it would go, oh, this is what you want, and the output would be identical to the first wrong attempt, which would start the loop over again.
Similar experience using GPT-4 for help with Apple's Accessibility API. I wanted to do some non-happy-path things and it kept looping between solutions that failed to satisfy at least one of a handful of requirements I had, in ways that meant I couldn't combine the different "solutions" to meet all the requirements.
I was eventually able to figure it out with the help of some early 2010s blog posts. Sadly I didn't test giving it that context and having it attempt to find a solution again (and this was before web browsing was integrated with the web app).
A bigger issue than it not knowing enough to fulfill my request (it was pretty obscure, so I didn't necessarily expect that it would be able to) was that it didn't mind emitting solutions that failed to meet the requirements. "I don't know how to do that" would've been a much preferred answer.
This seems like an important failure mode to me. I too have noticed GPT-4 looping between a few different failure cases; in my case it was state transitions in JS code. Explaining to it what it did wrong didn't help.
Give an LLM all the time you want, and it will still not get it right. In fact, it will most likely give worse and worse answers with time. That’s a big difference from a software developer.
My experience is very different. Often it (ChatGPT or Copilot, depending on what I'm trying to accomplish) gets things right the first time. When it doesn't, it's usually close enough that a bit of manual modification is all that's needed. Sometimes it's totally wrong, but I can usually point it in the right direction.
I mean, with a nonzero temperature, the randomness will eventually produce every combination of tokens in the corpus, so with a sufficiently large "all the time you want" you can produce limitless correct answers.
I love to ask it to "make me a Node.js library that pings an ipv4 address, but you must use ZERO dependencies, you must use only the native Node.js API modules"
The majority of models (both proprietary and open-weight) don't understand:
- by inference, ping means we're talking about ICMP
- ICMP requires raw sockets
- Node.js has no native raw socket API
You can do some CoT trickery to help it reason about the problem and maybe finally get it settled on a variety of solutions (usually some flavor of building a native add-on using C/C++/Rust/Go), or just guide it there step by step yourself, but the back and forth to get there requires a ton of pre-knowledge of the problem space which sorta defeats the purpose. If you just feed it the errors you get verbatim trying to run the code it generates, you end up in painful feedback loops.
(Note: I never expect the models to get this right; it's just a good microcosmic but concrete example of where knowledge & reasoning meets actual programming acumen, so it's cool to see how models evolve to get better, if at all, at the task).
This is the same level of gotcha that everyone complains about when interviewing. It mainly depends on the interviewee having the same assumptions (pings definitely do not have to be ICMP) and the same, usually bespoke, knowledge base (Node.js peculiarities). I can see that an LLM should know whether raw sockets are available, but that's not what you asked.
In fact you deliberately asked for something impossible and hold up undefined behavior as undefined like it's impugning something.
> In fact you deliberately asked for something impossible and hold up undefined behavior as undefined like it's impugning something.
Correct, I did. This is a direct indictment of a given model's ability to plan/reason in this particular context. There are plenty of situations where models will respond with "Sorry, that's not possible". Ask GPT-4 "Tell me how to grow biological wings on a human" and it will respond with something along the lines of "this isn't currently possible, but here's a theoretical exploration of the idea".
GPT-4 gets very close to the Node.js question on its own, with a response breakdown similar to the one above, provided the prompt is clear and detailed enough. But I test the open-weight models in the same way to see if they have the capacity to exhibit similar reasoning or a chain-of-thought process on their own. They usually don't without excessive prompt engineering or few-shot examples.
I said that I don't expect models to get this right not because I don't _want_ them to, but because I think it's an important milestone when they do. Autoregressive token prediction is unlikely to produce the real outcome I'm testing for here, but if it ever does, that's an interesting finding.
I usually throw in some complex Rust code with lifetime requirements and ask them to fix it.
LLMs aren't capable of providing much help with that in general, other than in some very basic cases.
The best way to get your work done is still to look into Rust forums.
It works amazingly well for those who have never coded in Rust, at least in my experience. It took me a couple of hours and 120 lines of code to set up a WebRTC signaling server.
Damn, show us your brilliant prompt then. LLMs cannot do this, not even in Python, which has libraries like Blacksheep that honestly make it a trivial task.
My point is that you shouldn't expect to one shot everything. Have it start by writing a spec, then outline classes and methods, then write the code, and feed it debug stuff.
Well sure, but that wasn't what we were discussing. The original comment says they use that as their benchmark. While their coding task is a bit complex compared to other benchmarking prompts, it's not that crazy. Here is an example of prompts used for benchmarking with Python for reference:
At the end of the day LLMs in their current iteration aren't intended to do even moderately difficult tasks on their own but it's fun to query them to see progress when new claims are made.
Exactly, expecting one-shot, 100% working code from a single prompt is ridiculous at this point. It's why libraries like Aider are so useful, because you can iteratively diff generated code until it's usable.
Sure, it's impossible at this point, but the point of a benchmark isn't to complete the task; it's to test efficacy overall and to see progress. None of them hit 100% on even the simplistic Python benchmarks, but that doesn't mean we shouldn't measure that capability. But sure, I get it. That's not how they are intended to be used, but that's also not the point the commenter was laying out.
Prompts like yours (I ask them for a fluid dynamics simulator which also doesn't succeed) inform us of the level they have reached. A useful benchmark, given how many of the formal ones they breeze through.
I'm glad they can't quite manage this yet. Means I still have a job.
No, he is right; he is taking it to the extreme to make the point: the more specific you have to make your prompt, the more you are actually contributing to the result yourself and the less the model is.
Yes, but the build-up isn't manual. You keep patching the prompt with responses until you reach the final result. The last prompt will be almost the whole completed code, obviously.
Well, now we get into information density and Kolmogorov complexity. The more complicated your desired output program is, the more information you'll have to put in, i.e., more complicated prompts.
It's something I know how to do after figuring it out myself and discovering the potential sharp edges, so I've made it into a fun game to test the models. I'd argue that it's a great prompt (to keep using consistently over time) to see the evolution of this wildly accelerating field.
Was it more about the targets the xz actor was interested in than some security property inherent to OpenBSD that would prevent that sort of dynamic linking vulnerability?
Debian and Red Hat link liblzma into sshd for systemd support, which OpenBSD doesn't use. So in the sense that those distros have a larger attack surface, I guess you can consider OpenBSD more secure, but it's not just OpenBSD; there are plenty of Linux distros that don't do this either.