
I find it funny that for something that is pretty much a commodity at this point, adoption seems to be the most important metric.

Yes, there are differences between the models, and yes some may work better.

But picking the model at this point is just picking the cheapest option. For most use cases any model will do.



That is not my experience at all.

Models are still leapfrogging each other every month in e.g. coding or research capability, or even in more mundane tasks such as summarizing long multi-topic texts.

Depending on which side of the issue you fall, you're hoping this will go on for a long time to come, or praying that it will end asap.

I'm not using the cheapest model, either for my own support or in my production systems.


From what I've seen, the "leapfrogging" is very, very incremental.

They all seem to be racing to the plateau... It doesn't look like there will ever be a "stand out" leader, and the product that each company presents to the market appears to be essentially the same product that everyone else presents. Maybe with some slight twist to it that is easily replicable or exceedable within a few months.

This is the issue, really. At some point the investors are all going to realize that none of their investments are going to be market leaders. When they get to that stage the bubble will well and truly pop.


> They all seem to be racing to the plateau.

To me it feels like there is no plateau, and the models are already very useful and impactful.

I believe there is no plateau because there is nothing objectively special or magical about the human mind, and it all can and eventually will be solved, one hack at a time.


Some part of the LLMs' capabilities seems to be getting lost to benchmark hacking.

Claude 3.7 is a great example of a model clearly beating 3.5 in all benchmarks, but slowly destroying my code base by adding lots of extra lines or hacking around my instructions (adding "if" statements when I want it to change the code to handle a case, instead of understanding what change really needs to be made).

I still prefer o1 pro, and a lot of that leapfrogging in benchmarks doesn't translate to being smarter anymore.


There are a ton of users that just want help generating emails, or adding some stock photo-like images to a blog post.

If the choice is between something that costs $10 a month or $20 a month, and both solve those use cases, it's rational to pick the cheap one.


And there will probably also be a choice of something free and something with a trial... which means even less money spent.


Adoption is critical for these LLM corporations because, unlike in other industries, free tier users here incur almost the same costs as paid tier users. They really can't degrade the free tier experience too much, or their customers will flee to competitors. I've read one guy's calculation of these corpos' expenses, and they are truly insane by now and constantly rising.


> and yes some may work better.

Isn't that where the cost lies? Data, annotation, and model generation all have mostly linear responses to changes in spending.

> For most use cases any model will do.

They'll operate. They will not produce reliable results. Adoption is one metric, but intentional avoidance should be another.


Fully agree!

When you're close to the edge of AI usage, it's important to realize that most AI use cases are not "fully autonomous AI software engineer" or "deep research into a niche topic" but far more innocuous: improve my blog post, what's the capital of France, what are some nice tourist sites to see around my next vacation destination.

For those non-edge use cases, costs are an issue, but so are inertia and switching costs. A big reason OpenAI and ChatGPT are so huge is that ChatGPT is still people's go-to for all of these non-edge use cases, as it's well known, well adopted, and quite frankly very efficiently priced.


How do you compose the cheapest models to create a software engineer?


You don't have to create a real software engineer, you just have to create one that looks close enough to get some executive his bonus and won't fall over before he's moved on to another company...



