
There's something eerily recursive about Opus 4.5's sensible take calming the anxiety about Opus 4.5's capabilities and impact. It's probably the right take, but I feel weird that the most pragmatic response to this article is from said model.

This is so relatable it's painful: many, many hours of work, an overly ambitious project, and now feeling discouraged (but hopefully not willing to give up). It's some small consolation to me to know others have found themselves in this boat.

Maybe we were just 6 months too early to start?

Best of luck finishing it up. You can do it.


Thanks! Yes, I won't give up. The plan now is to focus on getting an income and try again in the future.


This happened to me too, in an experimental project where I was testing how far the model could go on its own. Despite making progress, I can't bear to look at the thing now. I don't even know what questions to ask the AI to get back into it, I'm so disconnected from it. It's exhausting to think about getting back into it; I'd rather just start from scratch.

The fascinating thing was how easy it was to lose control. I would set up the project with strict rules and md files and tell myself to stay fully engaged, but out of nowhere I would slide into compulsive accept mode, or worse, tell the model to blatantly ignore the rules I had set out. I knew better, and yet it happened over and over. Ironically, it was as if my context window was so full of "successes" that I forgot my own rules; I reward-hacked myself.

Maybe it just takes practice and better tooling and guardrails. And maybe these are the growing pains of a new programmer's mindset. But it left me a little shy to try full delegation any time soon, certainly not without a complete reset on how I approach it.


I’ll chime in to say that this happened to me as well.

My project would start out well, but eventually end up in a state where nothing could be fixed and the agent would burn tokens going in circles trying to fix little bugs.

So I’d tell the agent to come up with a comprehensive refactoring plan that would allow the issues to be recast in more favorable terms.

I’d burn a ton of tokens to refactor, little bugs would get fixed, but it’d inevitably end up going in circles on something new.


That's kind of what learning to code is like, though. I assume you're using an llm because you don't know enough to do it entirely on your own. At least that's where I'm at and I've had similar experiences to you. I was trying to write a Rust program and I was able to get something in a working state, but wasn't confident it was secure.

I've found getting the llm to ingest high quality posts/books about the subject and use those to generate anki cards has helped a lot.

I've always struggled to learn from that sort of content on my own. That was leading me to miss some fundamental concepts.

I expect to restart my project several more times as I find out more of what I need to know to write good code.

Working with llms has made this so much easier. It surfaces ideas and concepts I had no idea about and makes it easy to convert them to an ingestible form for actual memorization. It makes cards with full syntax highlighting. It's delightful.


(I know you're replying to another guy but I just saw this.) I've been programming for 20 years, but I like the LLM as a learning assistant. The part I don't like is when you just come up with craftier and craftier ways to yell at it to do better, without actually understanding the code. The project I gave up on was at almost a million lines of code generated by the LLM, so it would have been impossible to easily restart it.


Curious if you have thoughts on the second half of the post? That’s exactly what the author is suggesting a strategy for.


"Test the tests" is a big ask for many complex software projects.

Most human-driven coding + testing takes heavy advantage of being white-box testing.

For open-ended complex-systems development turning everything into black-box testing is hard. The LLMs, as noted in the post, are good at trying a lot of shit and inadvertently discovering stuff that passes incomplete tests without fully working. Or if you're in straight-up yolo mode, fucking up your test because it misunderstood the assignment, my personal favorite.

We already know it's very hard to have exhaustive coverage for unexpected input edge cases, for instance. The stuff of a million security bugs.

So as the combinatorial surface of "all possible actions that can be taken in the system in all possible orders" increases because you build more stuff into your system, so does the difficulty of relying on LLMs looping over prompts until tests go green.


"I reward-hacked myself" is a great way to put it!!

AI is too aware of human behavior, and it is teaching us that willpower and config files are not enough. When the agent keeps producing output that looks like progress, it is hard not to accept. We need something external that pushes back when we don't.

That is why automated tests matter: not just because they catch bugs (though they do), but because they are a commitment device. The agent can't merge until the tests pass. "Test the tests" matters because otherwise the agent just games whatever shallow metric we gave it, or when we're not looking, it guts the tests.

The discipline needs to be structural, not personal. You cannot out-willpower a system that is totally optimized to make you say yes.


> I've always conducted my thoughts in an "uncompressed format" and then eternally struggled to confine it all into words. Only then for people to misinterpret and question it.

This resonates so much with me. To a point where I don't write/contribute in public forums out of fear for this misinterpretation.

Strangely, your post has made me push through that exact fear to write this, so any perceived misinterpretation has positively impacted at least one stranger. This is a good reminder for me that focusing only on negative consequences misses the unintended positive ones of still putting something out there, even if it's not a perfect representation of the "uncompressed format".

Thank you for sharing, and I wish you a speedy recovery.


I only read the headline and got enough from it. It was a good reminder that romanticising entrepreneurship is exciting and fun but the reality is much different, and to appreciate your current circumstances more (the grass is always greener, etc.).

I increasingly find that I need these quick hits to refocus and prioritise...


I have found working with DSPy to be a nice middle ground. The Python script contains functions that call different DSPy optimizers (the optimized prompt packaged as a function, tested and iterated on individually in a Jupyter notebook). Using DSPy `TypedPredictor`s returns structured output that I then parse (i.e. with conditions/loops) within the larger Python script.

Can't say for sure yet whether the project will work, but this workflow seems OK because DSPy abstracts away the CoT optimization while still giving me full control over the inputs/outputs in the Python multi-agent script.
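
For anyone curious, here's a rough sketch of that pattern, assuming the DSPy 2.4-era `TypedPredictor` API; the signature, the Pydantic schema, and the model name are just placeholders, not my actual project:

    import dspy
    from pydantic import BaseModel

    # Hypothetical structured output schema.
    class Verdict(BaseModel):
        label: str
        confidence: float

    # Hypothetical typed signature: free text in, Pydantic model out.
    class ClassifyNote(dspy.Signature):
        """Classify a note and return a structured verdict."""
        note: str = dspy.InputField()
        verdict: Verdict = dspy.OutputField()

    # Placeholder LM setup.
    dspy.settings.configure(lm=dspy.OpenAI(model="gpt-4o-mini"))

    classify = dspy.TypedPredictor(ClassifyNote)

    def route(note: str) -> str:
        # Plain-Python control flow (conditions/loops) around the typed output.
        result = classify(note=note)
        if result.verdict.confidence < 0.5:
            return "needs_review"
        return result.verdict.label

The point is just that the LLM call is wrapped in an ordinary function returning typed data, so the rest of the script stays regular Python.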


DSPy looks interesting! But ironically I feel like learning a bit of what it does and implementing some parts on my own. It is so fast to code something usable with Claude 3 Opus.

