This might be the most annoying habit of corporate AI: it might be one of the few industries that goes around demanding that everyone else provide clear use cases and proof of efficacy for it.
1. If the benchmarks are just testing the ability to get the answers from history then something is clearly wrong with the benchmark.
2. If that's even a possibility then that's going to lower confidence in the ability to deal with the vast majority of problems where you don't already have the answer written down.
3. That's not the customer's problem to solve on behalf of the vendor.
No, the point of the comment is that there is no meaningful difference in model performance improvements before and after this news of a benchmark weakness (spoiler alert: almost all of the benchmarks contain serious problems). The models are improving every quarter whether HN likes it or not.
The first two statements would upset a lot of people, but I think you'd find they're arguably true. Most software products are various flavours of configuration. Unless you're genuinely leveraging some novel algorithm, hardware, etc., it's very hard to argue it's R&D if it's just branding on a collected bag of software that various OSS/commercial companies developed. Claiming all software is R&D because you leverage OSS and put a known algorithm on top of some components would be like a supermarket claiming to be a research company because it has a different mix of products and customer experience to its rivals.
I think the third statement is a bit personal, so I'll leave that alone.
So, it seems that you and I are definitely in agreement about the rough proportion of novel research done to "figure out how to fit preexisting things together and patch over the areas where the parts mate poorly" in the software field. I'd argue that the latter activity does qualify for inclusion in the development half of R&D, but because I don't know [0] the relevant legal definition of the term, I won't strongly argue for the position.
The unfortunate thing that kicked off this discussion was that you talked about "...[selling] software IP...". Thanks to active work by copyright maximalists [1] over the past 20+ years, the term "Intellectual Property" applies just as well to plug-and-chug Enterprise CRUD software that sells for megabucks as it does to leading-edge research projects that, like, actually have Key Personnel and die if those folks go away. Anyone who is capable of paying attention and has been in the industry for more than a year or three is quite aware that plug-and-chug CRUD is far more valuable than the overwhelming majority of the people who make it.
So, yeah. From that arose the confusion.
[0] ...and because I can't be arsed to go look it up...
[1] My position in brief: copyright and patents are absolutely essential, copyright terms are insanely long, and patents are frequently granted when they should not be.
It's the only logical next step after multi-billion-dollar corporations needed to be provided with other people's stuff for free to make their business models viable, in the name of the free market.
The only actual hard data cited in this article supports the opposite conclusion to the headline.
>Still, language-learning app Duolingo and fintech app Klarna have recently walked back aggressive stances on replacing humans with AI.
>Some studies have also shown AI isn’t panning out as much as hoped, so far. An IBM survey found that 3 in 4 AI initiatives fail to deliver their promised ROI. And a National Bureau of Economic Research study of workers in AI-exposed industries found that the technology had next to no impact on earnings or hours worked.
The data in favour of the article's conclusion is "some LinkedIn influencer said so" and
>But Indeed’s findings show that “for about two-thirds of all jobs, 50% or more of those skills are things that today’s generative AI can do reasonably well, or very well.”
And if you read THAT linked article, it's some MBAs speculating again at a conference. Which isn't inherently a bad thing; not everything in the world can or should be quantified in a clear statistical conclusion. But appropriating the language of data, implying rigour, and then having the source be "some CEO dudebro said so" should be treated with at least some dubiousness at this point, when a lot more of the job-market trends can be explained far less nebulously by a generalised slowdown in hiring and economic shocks. If I'm being less generous, I'd say this is yet another example of the LinkedIn media complex trying to rewrite reality again.
Probably never. The complexity is borne by Amazon. Even before any of the development begins, if you want a RAID setup with some sort of decent availability, you've already multiplied your server costs by the number of replicas you'd need. It's a Sisyphean task that also has little value for most people.
Much like Twitter, it's conceptually simple, but it's a hard problem to solve at any scale beyond a toy.
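The cost multiplication described above can be sketched as a back-of-the-envelope comparison. Every figure here is a purely illustrative assumption (not real pricing for Amazon or anyone else), just to show how replica count and self-managed operations time dominate the bill:

```python
# Hypothetical cost comparison: self-hosted replicated storage vs. a
# managed service. All numbers are made-up assumptions for illustration.

def self_hosted_monthly_cost(base_server_cost, replicas, ops_hours, hourly_rate):
    """Hardware cost scales linearly with the replica count, plus the
    operations time you now have to pay for yourself."""
    return base_server_cost * replicas + ops_hours * hourly_rate

# Assumed: one $200/month server, 3 replicas for decent availability,
# 10 hours/month of operations work valued at $100/hour.
diy = self_hosted_monthly_cost(200, replicas=3, ops_hours=10, hourly_rate=100)
managed = 250  # assumed flat managed-service bill for comparable storage

print(diy)      # 1600
print(managed)  # 250
```

Even before writing a line of application code, the assumed self-hosted setup costs several times the managed one, which is the point about complexity (and cost) being borne by the provider.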
Probably the same reason anything you work on might have undocumented stuff: a combo of lack of time and/or not wanting to imply support for unstable/experimental features. If you're only screwing over the team on the next desk or whatever, it's a lot easier to change things.
If you're scared of SQL, or have a massive operations team to throw infrastructure problems over the fence to, then pushing all the complexity into the application code would look like a positive, as you aren't the one paying that cost.
Reputation matters. If someone comes to market with a shoddy product or missing features/slideware, then it's a self-created problem that people don't check the product release logs every week for the next few years waiting for them to rectify it. And even once there is an announcement, people are perfectly entitled to be sceptical that it's more than a smoke-and-mirrors feature, and not to spend hours doing their own due diligence. Again, a self-created problem.