bird0861's comments

You seem like the type of coworker I would accept less pay to work with. I'm actually at a crossroads right now; I did my research on my prospects and have narrowed it down to the two places where I most expect to be surrounded by good coworkers and managers. Cheers.

I've been asking around for the last week about Go vs. Elixir vs. Zig, and I'd love to get feedback here too. I only have time for one, and I'm looking for something that can replace a lot of the stuff I do with Python. I don't have time to wait for Mojo.

I fully agree with this POV except for one detail: there is a problem with sunsetting frontier models. As we adopt these tools and build workflows with them, they become pieces of our toolkit. We depend on them; we take them for granted, even. Then the model changes (new checkpoints, maybe alignment gets fiddled with) and all of a sudden prompts no longer yield the results we expected from them after working on them for quite some time. I think the term for this is "prompt instability".

I felt this with Gemini 3 (and some people had a less pronounced but similar experience with Sonnet releases after 3.7): for certain tasks that 2.5 Pro excelled at, it's just unusable now. I was already a local model advocate before this, but now I'm a local model zealot, and I've stopped using Gemini 3 over it. Last night I used Qwen3 VL on my 4090, and although it was not perfect (sycophancy, overuse of certain cliches... nothing I can't get rid of later with some custom promptsets and a few hours in Heretic), it did a decent enough job of helping me work through my blind spots in the UI/UX for a project that I got what I needed.

If we have to re-tune our prompts ("skills", agents.md/claude.md, all of the stuff a coding assistant packs context with) with every model release, then I see new model releases becoming more of a liability than a boon.


That study is garbo and I suspect you didn't even read the abstract. Am I right?


I've heard this mentioned a few times. Here is a summarized version of the abstract:

    > ... We conduct a randomized controlled trial (RCT) ... AI
    > tools ... affect the productivity of experienced open-source
    > developers. 16 developers with moderate AI experience complete
    > 246 tasks in mature projects on which they have an average of
    > 5 years of prior experience. Each task is randomly assigned to
    > allow or disallow usage of early-2025 AI tools. ... developers
    > primarily use Cursor Pro ... and Claude 3.5/3.7 Sonnet. Before
    > starting tasks, developers forecast that allowing AI will
    > reduce completion time by 24%. After completing the study,
    > developers estimate that allowing AI reduced completion time
    > by 20%. Surprisingly, we find that allowing AI actually
    > increases completion time by 19%—AI tooling slowed developers
    > down. This slowdown also contradicts predictions from experts
    > in economics (39% shorter) and ML (38% shorter). To understand
    > this result, we collect and evaluate evidence for 21 properties
    > of our setting that a priori could contribute to the observed
    > slowdown effect—for example, the size and quality standards of
    > projects, or prior developer experience with AI tooling.
    > Although the influence of experimental artifacts cannot be
    > entirely ruled out, the robustness of the slowdown effect
    > across our analyses suggests it is unlikely to primarily be a
    > function of our experimental design.

So what we can gather:

1. 16 people were randomly given tasks to do

2. They knew the codebase they worked on pretty well

3. They said AI would help them work 24% faster (before starting tasks)

4. They said AI made them ~20% faster (after completion of tasks)

5. ML experts predicted that programmers would be ~38% faster

6. Economists predicted ~39% faster

7. The study measured that people were actually 19% slower

This seems to have been done on Cursor, with big models, on codebases people already knew. There are definitely problems with industry-wide statements like this, but I feel like the biggest area where AI tools help me is when I'm working on something I know nothing about. For example: I am really bad at web development, so CSS/HTML is easier to edit through prompts. I don't have trouble believing that I would be slower trying to make an edit to code that I already know how to make.

Maybe they would have seen speedups if they'd let the engineers choose when to use AI assistance and when not to.


it doesn't control for skill or experience using models. this looks VERY different at hour 1,000 and hour 5,000 than at hour 100.


Lazy of me not to check whether I'm remembering this right, but the one dev who got productivity gains was a regular Cursor user.

I can't emphasize this enough: it doesn't matter how good the model is or what CLI I'm using, use git and chroot at the least (a container is easier, though).
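
A minimal sketch of the container version, for what it's worth (assumes Docker is installed; the image name and agent command are placeholders, the point is just the bind-mounted git checkout and no network):

    import pathlib
    import subprocess

    # Hypothetical image and agent CLI; the shape of the sandbox is what
    # matters: the agent only sees a bind-mounted git checkout, no network.
    repo = pathlib.Path.cwd()
    subprocess.run(
        [
            "docker", "run", "--rm",
            "--network", "none",             # no surprise egress
            "-v", f"{repo}:/work",           # worst case: git reset --hard
            "-w", "/work",
            "my-agent-image",                # placeholder image
            "my-agent", "--plan", "plan.md", # placeholder agent command
        ],
        check=True,
    )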

Always make the agent write a plan first and save it to something like plan.md, then tell it to update the list of finished tasks in status.md as it finishes each task from plan.md, and to let you review the change before proceeding to the next task.
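
Concretely, something like this in the system prompt or agents.md does it (the wording here is mine; adapt it):

    Before touching any code, write a step-by-step plan to plan.md.
    Work through plan.md one task at a time. After finishing a task,
    append it to the done list in status.md, then stop and wait for
    my review before starting the next task.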


Check out Mask Banana; you might have better luck using masks to get image models to pay attention to what you want edited.
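
If you're rolling it yourself, the mask is just a grayscale image where white marks the region to edit. A minimal PIL sketch (coordinates and filenames made up):

    from PIL import Image, ImageDraw

    base = Image.open("photo.png")
    mask = Image.new("L", base.size, 0)  # all black = leave everything alone
    draw = ImageDraw.Draw(mask)
    draw.rectangle((120, 80, 360, 240), fill=255)  # white = "edit here"
    mask.save("mask.png")  # pass alongside the image and prompt when inpainting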


SDXL and FLUX models with LoRAs can and do vastly outperform the big monolithic models at tons of things those can't or won't do right now. Various subreddits and civitAI blogs describe ComfyUI workflows and details on how to maximize LoRA effectiveness, and they're probably all you need for a guided tour of that space.
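
For a taste of what loading a LoRA looks like with diffusers (the base model ID is the standard SDXL checkpoint; the LoRA path is a placeholder for weights you'd grab from somewhere like civitAI):

    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,
    ).to("cuda")
    pipe.load_lora_weights("path/to/your_style_lora.safetensors")  # placeholder
    image = pipe("your prompt here", num_inference_steps=30).images[0]
    image.save("out.png")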

This is not my special interest, but the DIY space is much more interesting than the SaaS offerings. That holds for generative AI more generally: the DIY scene is going to be more interesting.


Agreed. People can do things at home that couldn't be done with any of the overcapitalized-but-thoroughly-nerfed commercial models.


Pretending 16 samples is authoritative is absolutely hilarious and wild, copium this pure could kill someone. Also, working on a codebase you already know biases results in the first place -- they missed what has become a cornerstone of this stuff for AISWE people like me: repo tours. tree-sitter feeds the codebase to the LLM, and I get to find all the stuff in the code I care about either with a single well-formatted meta prompt or by just asking questions when I need to.
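
To make "repo tour" concrete: the tree-sitter version is language-agnostic, but the idea fits in a few lines. Here's a stand-in sketch using Python's ast module on a Python-only repo (function name is mine):

    import ast
    import pathlib

    def repo_digest(root: str) -> str:
        """One line per function/class: file, line, name, first docstring line."""
        lines = []
        for path in sorted(pathlib.Path(root).rglob("*.py")):
            try:
                tree = ast.parse(path.read_text(encoding="utf-8"))
            except (SyntaxError, UnicodeDecodeError):
                continue
            for node in ast.walk(tree):
                if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                    doc = (ast.get_docstring(node) or "").splitlines()
                    first = doc[0] if doc else ""
                    lines.append(f"{path}:{node.lineno} {node.name} {first}")
        return "\n".join(lines)

    # Paste repo_digest("src/") into the model's context, then ask questions.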

I'll concede one thing to the authors of the study: Claude Code is not that great. Everyone I know has moved on since before July. I personally am hacking on my own fork of Qwen CLI (which is itself a Gemini CLI fork), and it does most of what I want with the models of my choice, which I swap out depending on what I'm doing. Sometimes they're local on my 4090, and sometimes I use a frontier or larger open-weights model hosted somewhere else. If you're expecting a code assistant to drop into your lap so you can immediately experience all of its benefits, you'll be disappointed. This is not something anyone can offer without just prescribing a stack or workflow. You need to make it your own.
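
The swapping part is mundane once everything speaks the OpenAI-compatible API; the endpoint and model name below are placeholders for whatever is serving that day (a local llama.cpp/vLLM server on the 4090, or a hosted endpoint):

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused-locally")
    resp = client.chat.completions.create(
        model="local-coder",  # placeholder; swap per task
        messages=[{"role": "user", "content": "Walk me through this diff."}],
    )
    print(resp.choices[0].message.content)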

The study amounts to dropping just 16 people into tooling they're unfamiliar with, have no mechanical sympathy for, and aren't likely to shape and mold to their own needs.

If you want conclusive evidence, go make friends with people who hack their own tooling. Basically everyone I hang out with has extended BMAD, written their own agents.md for specific tasks, made their own slash commands and "skills" (a convenient name and PR hijacking of a common practice, but whatever, thanks for MCP I guess). Literally, what kind of dev are you if you're not hacking your own tools???

You've got four ingredients to keep in mind when thinking about this stuff: the model, the context, the prompt, and the tooling. If you're not intervening to set up the best combination of each for each workflow you're doing, then you're just letting someone else determine how that workflow goes.
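
If it helps to see the four knobs written down, here's one way to sketch a per-workflow bundle (all the names here are made up; the point is that each field is a deliberate choice, not a default):

    from dataclasses import dataclass

    @dataclass
    class Workflow:
        model: str    # which weights answer (local or hosted)
        context: str  # what gets packed in (repo digest, docs, diffs)
        prompt: str   # the instructions layered on top
        tooling: str  # the harness: CLI fork, slash commands, skills

    refactor = Workflow(
        model="local-coder-32b",
        context="repo_digest.md",
        prompt="prompts/refactor.md",
        tooling="qwen-cli-fork",
    )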

"Universal function approximators that can speak English got invented, and nobody wants to talk to them" is not the sci-fi future I was hoping for when I was longing for statistical language modeling to lead to code generation back in 2014, as a young NLP practitioner learning Python for the first time.

If you can't make it work, fine, maybe it's not for you, but I would probably turn violent if you tried to take this stuff from me.


I have no idea why people still think it's a viable platform.


It seems highly unlikely that the line from BERT to ASI threads through anyone responsible for Llama 4, especially almost back to back.

