
Ah yes, the 4chan retirement plan. Die of a preventable cause at age 42 while waiting for your captcha.

If you accept that cancer is a death sentence, it’s not absurd to “self fund” your insurance with a nest egg.

You can shop around quite a bit for non-urgent care, and get a good cash discount.


If you accept cancer as a death sentence, you're an idiot. I had cancer at age 41. If I left it untreated, sure I'd be dead, probably by age 43. But I'm not an idiot, I had good health insurance, I was treated, and now that health event is over twenty years in the past.

Had I self-funded with a (non-existent) nest egg, I would still be in debt over $600k. Instead, my insurance had to deal with that...


While I won’t argue insurance wasn’t overall beneficial in your case…

There is no way that $600,000 is the cash price for cancer treatment (especially 20 years ago, but also today).

The average cost of cancer treatment is $150k [1], and lower with cash price + shopping.

[1] https://treatcancer.com/blog/cost-of-cancer/


I stopped cataloging the invoices after it hit $1.6M. Granted that's what the providers would bill my insurance, and we all know those are funny numbers, and while I'm sure that a concentrated effort to negotiate cheaper cash prices might have been productive, there's still the fact that I would have had to have $600k or so readily available. HYSA yields were pretty low for most of this time period, and if I had kept that kind of money in a stock portfolio, taxes would have killed me.

And it's beside the point. 99% of Americans can't afford to build a $600k nest egg just to cover medical expenses. THAT'S WHAT INSURANCE IS FOR!


Also, I wonder if this is skewed by more affordable treatments for things like basal cell carcinoma or prostate cancer that doesn't require surgical intervention. In my case, I had full on chemo, rad treatment, surgery, and more chemo. Wouldn't wish it on my worst enemy, but I'm sure as hell glad I had good insurance. Dealing with the medical side was traumatic enough, I don't think I or anyone in my family had the bandwidth to deal with negotiating cash deals with multiple providers.

$600k once in 40 years is cheap compared to the total cost of insurance, especially when you consider the compound interest you could have made on premiums not paid, plus the freedom to get cancer care cheaper someplace privately outside the US.

Your insurance company got the last laugh by a long shot. A typical family on insurance would pay $600,000 (between their take-home and the reduced wages paid by employers to cover insurance) in just 25 years, and that's before considering the opportunity cost of lost investments/yield.
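
As a rough back-of-the-envelope check of that figure, here's a minimal sketch with assumed numbers (roughly $24k/year in combined employee + employer premiums and a hypothetical 7% return if that money were invested instead; not anyone's actual figures):

    premium_per_year = 24_000   # assumed combined premium
    years = 25
    rate = 0.07                 # hypothetical annual return

    total_paid = premium_per_year * years
    # Future value of investing each year's premium instead of paying it.
    future_value = sum(premium_per_year * (1 + rate) ** (years - y)
                       for y in range(1, years + 1))

    print(f"Premiums paid over {years} years: ${total_paid:,.0f}")    # $600,000
    print(f"If invested at 7%/yr instead:     ${future_value:,.0f}")  # ~$1.5M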


Are you really suggesting that a family should not have insurance at all and save the money?

I have been working for 30 years and have never once paid more than $10K a year for insurance across 10 jobs; 15 of those years were on a family plan.

Hell, one of those jobs was with Amazon - the company with the shittiest benefit package in all of BigTech - and even then I only paid $12K with a family plan. Right now we pay around $10K for my wife, myself, and my adult but under-26 (step)son.


You've likely paid at least $18k if not more like $25k for that insurance in the form of wage income moved to benefit income. The government's tax and regulatory environment post WWII just ensures that unless you choose to take it as 1099 income, your potential 1099 income gets reflected in reduced W2 wages that are paid out in benefits.

You might claim that if your employer didn't offer that benefit they'd just pay nothing, but required health benefits function much as payroll taxes, which economists have shown are largely reflected in the form of reduced incomes. That is, you are paying it ~all one way or another.


We know exactly how much your employer pays for their share of your health benefits. That was also part of the ACA to disclose it to employees.

You’re not wrong - I think it’s around 2/3rds so for me it would be around $36K a year all in if I had to do COBRA.


My annual premium for insurance was roughly $2400/year. Since then, it's gone to about $6k per annum. Even compounded at whatever the S&P500 returns for a 40 year interval, I'm pretty sure I'm ahead of the game. If you think I've lost $600K by having work provided insurance, we're not dealing with the same level of reality.

How much of a nest egg do you think would let you afford a major operation like heart surgery or cancer care?

Read the qualifier.

And heart surgery is ~$60k. [1]

That's <36 months of insurance premiums according to the earlier poster.

[1] https://cost.sidecarhealth.com/ts/heart-bypass-surgery-cost-...


I have never in my 30-year career paid more than $10K a year for health care across 10 jobs, and that's including working at Amazon with their shitty benefit package.

In my 15 year career, I have never paid less than $10k per year for just me and my wife for health insurance. And I basically try to pick the most sensible and affordable option, not luxury plans.

In the last 15 years I've worked for: General Electric when it was still a F10 company, more recently Amazon, a 60-person startup where the family plan was $150 a month (2018-2020), two mid-size companies in between, and now a mid-size 1000+ person consulting company.

And if you want data instead of anecdotes:

https://www.business.com/articles/health-insurance-costs-thi...


Makes sense, I have never worked for a company with more than 200 employees or so. NYC area.

It cost $30k for a loved one just to go to the hospital when their heart "felt weird", even though absolutely nothing turned out to be wrong and all they did was run a couple of quick scans and tests. I do agree with the overall idea of what you're saying, that usually the premiums are way more than what you could get care for if you just saved the money, but the numbers on that website seem very wrong. I realize it's a total anecdote, but from my loved one's bills it is $20-30k just to get in the door, and that's if nothing is actually wrong and there is no heart attack, yet they're quoting $30k for actual heart attack care.

That's the pricing when you have insurance. It is cheaper if you don't.

This is one of the most insane things that I have realized... I have terrible insurance (just in case), but I generally don't present it, as it is much cheaper and faster to pay cash...

I think we can all agree that the current system is just... ridiculous

I used to be fearful of health concerns, but now I'm a carnivore and just feel great.


Even a minor one.

What are you smoking? My parents are in their 80s, both 15+ and 20+ years cancer-free. At least in my mom's case (colon), not having surgery + chemo probably WOULD have been a death sentence. In Canada, their total out-of-pocket costs (other than transportation to/from the hospital) were like $15 for some painkillers.

"The value of human thinking is going down."

No! I fundamentally reject this.

The value of unoriginal thinking has gone down. Thinking which is quotidian and pedestrian has become even more worthless than it already was.

The value of true, original human thinking has gone even higher than it has ever been.

Do we think no new companies will ever succeed now? Of course not. Who, then, will succeed? It will be innovators and original thinkers and those with excellent taste.

Why did Stripe make big inroads in developer spaces even though they are in an ultra-competitive, low-margin market? They had excellent taste in developer ergonomics. They won big not because they coded well or fast (though I know pc thinks their speed is a big factor; I think he is mostly incorrect on that) but because they had an actual sense of originality and propriety to their approach! And it resonated.

So many other products are similar. You can massively disrupt a space simply by having an original angle on it that nobody else has had. Look at video games! Perhaps the best example of this is how utterly horribly AAA games have been doing, while indie hits produce instantly timeless entries.

And soon this will be the ONLY thing that still differentiates. Artistic propriety, originality, and taste.

(And, of course, the ever-elusive ability to actually execute that I also don't think LLMs will help with.)


You need money to explore ideas.

Or an absurd amount of skill.

Say Amy is a great NodeJS and front-end programmer.

She can work on her own projects, but her kids can't eat hopes and dreams.

Or she can get a job at SoulCrusher Solutions trying to maximize advertising revenue.


This is a compelling assertion. But who among us has truly original thoughts? How much new stuff can there be? If all the same-y stuff is losing value (but most stuff has value because it's FOUND not because it's unique) then isn't net value decreasing?

>The value of unoriginal thinking has gone down. Thinking which is quotidian and pedestrian has become even more worthless than it already was.

Imagine American manufacturing industry workers saying the same thing of the (at the time) soon-to-be import-only products. Original thoughts are valued more than non-original ones, but maybe the market doesn't require that many original thoughts to extract max profit...


No no, but you don't understand! Of course I would be in the guild, I'm me. Everyone else is just you, so naturally they probably won't be. If they don't allow me, who would they let in??

(Sarcasm of course.) Isn't it weird how much of a quirk this mentality is? Almost reminds me of how everyone thought they were part of a select few who got to be secret agents in Club Penguin. "I believe every word you just said, because it's exactly what I wanted to hear."


Did you actually fix the issue, or did you fix the issue and introduce new bugs?

The problem is the asymmetry of effort. You verified you fixed your issue. The maintainers verified literally everything else (or are the ones taking the hit if they're just LGTMing it).

Sorry, I am sure your specific change was just fine. But I'm speaking generally.

How many times have I at work looked at a PR and thought "this is such a bad way to fix this I could not have come up with such a comically bad way if I tried." And naturally couldn't say this to my fine coworker whose zeal exceeded his programming skills (partly because someone else had already approved the PR after "reviewing" it...). No, I had to simply fast-follow with my own PR, which had a squashed revert of his change, with the correct fix, so that it didn't introduce race conditions into parallel test runs.

And the submitter of course has no ability to gauge whether their PR is the obvious trivial solution, or comically incorrect. Therein lies the problem.


This is why open source projects need good architecture and high test coverage.

I'd even argue we need a new type of test coverage, something that traces back the asserts to see what parts of the code are actually constrained by the tests, sort of a differential mutation analysis.
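
Something close to this already exists as mutation testing; a minimal sketch of the idea (the mutation list, file layout, and use of pytest are just illustrative assumptions):

    import pathlib
    import subprocess

    # Toy mutation analysis: flip one operator at a time in the source,
    # re-run the test suite, and see whether any test notices. Mutants
    # that survive point at code the asserts don't actually constrain.
    MUTATIONS = [("==", "!="), ("<", "<="), ("+", "-")]

    def tests_pass(src_dir: str) -> bool:
        return subprocess.run(["pytest", "-q"], cwd=src_dir).returncode == 0

    def surviving_mutants(source_file: str, src_dir: str):
        path = pathlib.Path(source_file)
        original = path.read_text()
        survivors = []
        for old, new in MUTATIONS:
            if old not in original:
                continue
            path.write_text(original.replace(old, new, 1))  # mutate first occurrence
            try:
                if tests_pass(src_dir):   # tests still green -> not constrained
                    survivors.append((old, new))
            finally:
                path.write_text(original)
        return survivors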


This could have happened before AI agents too, but yes, it's another step in that direction.

Except we know how these work. There's no number sense. It's predicting tokens. It is able to recount the mathematical foundations because in its training dataset, that often happens, both in instructional material and in proofs.

I picked two random numbers between one and one million. The chances of it having seen that specific problem in its training set seem very low.

"When was the last time you reviewed the machine code produced by a compiler?"

Compilers will produce working output given working input literally 100% of my time in my career. I've never personally found a compiler bug.

Meanwhile AI can't be trusted to give me a recipe for potato soup. That is to say, I would under no circumstances blindly follow the output of an LLM I asked to make soup. While I have, every day of my life, gladly sent all of the compiler output to the CPU without ever checking it.

The compiler metaphor is simply incorrect and people trying to say LLMs compile English into code insult compiler devs and English speakers alike.


> Compilers will produce working output given working input literally 100% of my time in my career.

In my experience this isn't true. People just assume their code is wrong and mess with it until they inadvertently do something that works around the bug. I've personally reported 17 bugs in GCC over the last 2 years and there are currently 1241 open wrong-code bugs.

Here's an example of a simple to understand bug (not mine) in the C frontend that has existed since GCC 4.7: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105180


These are still deterministic bugs, which is the point the OP was making. They can be found and solved once. Most of those bugs are simply not that important, so they never get attention.

LLMs on the other hand are non-deterministic, unpredictable, and fuzzy by design. That makes them not ideal when trying to produce output which is provably correct - sure, you can generate output and then laboriously check it - some people find that useful, some are yet to find it useful.

It's a little like using Bitcoin to replace currencies - sure you can do that, but it includes design flaws which make it fundamentally unsuited to doing so. 10 years ago we had rabid defenders of these currencies telling us they would soon take over the global monetary system and replace it, nowadays, not so much.


> It's a little like using Bitcoin to replace currencies [...]

At least, Bitcoin transactions are deterministic.

Not many would want to use an AI currency (mostly works; always shows "Oh, you are 100% right" after losing one's money).


Sure, bitcoin is at least deterministic, but IMO (and that of many in the finance industry) it's solving entirely the wrong problem - in practice people want trust and identity in transactions much more than they want distributed and trustless.

In a similar way LLMs seem to me to be solving the wrong problem - an elegant and interesting solution, but a solution to the wrong problem (how can I fool humans into thinking the bot is generally intelligent), rather than the right problem (how can I create a general intelligence with knowledge of the world). It's not clear to me we can jump from the first to the second.


By eliminating the second one.


Bitcoin transactions rely on mining to notarize, which is by design (due to the nature of the proof-of-work system) incredibly non-deterministic.

So when you submit a transaction, there is no hard and fast point in the future when it is "set in stone". Only a geometrically decreasing likelihood over time that a transaction might get overturned, improving by another geometric notch with every confirmed mined block that has notarized your transaction.

A lot of these design principles are compromises to help support an actually zero-trust ledger in contrast to the incumbent centralized-trust banking system, but they definitely disqualify bitcoin transactions as "deterministic" by any stretch of the imagination. They have quite a bit more in common with LLM text generation than one might have otherwise thought.
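
To put a number on that "geometrically decreasing likelihood", here's a minimal sketch of the attacker catch-up probability from section 11 of the Bitcoin whitepaper (q is the attacker's share of hash power, z the number of confirmations):

    import math

    def attacker_success_probability(q: float, z: int) -> float:
        """Probability an attacker with hash share q ever overtakes the
        honest chain after z confirmations (Nakamoto 2008, section 11)."""
        p = 1.0 - q
        lam = z * (q / p)
        total = 1.0
        for k in range(z + 1):
            poisson = math.exp(-lam) * lam ** k / math.factorial(k)
            total -= poisson * (1.0 - (q / p) ** (z - k))
        return total

    for z in (0, 1, 2, 6):
        print(z, attacker_success_probability(0.1, z))  # drops off roughly geometrically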


Not sure I agree, the only axis on which Bitcoin is non-deterministic is that of time - the time to confirmation is not set in stone. Outcomes are still predictable though and follow strict rules.

It’s a fundamentally different product, LLMs are fuzzy word matchers and produce different outcomes even for the same input every time, they inject variance to make them seem more human. I think we’re straying off topic here though.


> I've personally reported 17 bugs in GCC over the last 2 years

You are an extreme outlier. I know about two dozen people who work with C(++) and not a single one of them has ever told me that they've found a compiler bug when we've talked about coding and debugging - it's been exclusively them describing PEBCAK.


I've been using c++ for over 30 years. 20-30 years ago I was mostly using MSVC (including version 6), and it absolutely had bugs, sometimes in handling the language spec correctly and sometimes regarding code generation.

Today, I use gcc and clang. I would say that compiler bugs are not common in released versions of those (i.e. not alpha or beta), but they do still occur. Although I will say I don't recall the last time I came across a code generation bug.


I knew one person reporting gcc bugs, and iirc those were all niche scenarios where it generated slightly suboptimal machine code that was not otherwise observable in behavior.


Right - I'm not saying that it doesn't happen, but that it's highly unusual for the majority of C(++) developers, and that some bugs are "just" suboptimal code generation (as opposed to functional correctness, which the GP was arguing).


This argument is disingenuous and distracts from the point rather than addressing it.

Yes, it is possible for a compiler to have a bug. No, that is in no way analogous to AI producing buggy code.

I’ve experienced maybe two compiler bugs in my twenty year career. I have experienced countless AI mistakes - hundreds? Thousands? Already.

These are not the same and it has the whiff of sales patter trying to address objections. Please stop.


I'm not arguing that LLMs are at a point today where we can blindly trust their outputs in most applications, I just don't think that 100% correct output is necessarily a requirement for that. What it needs to be is correct often enough that the cost of reviewing the output far outweighs the average cost of any errors in the output, just like with a compiler.

This even applies to human written code and human mistakes, as the expected cost of errors goes up we spend more time on having multiple people review the code and we worry more about carefully designing tests.


If natural language is used to specify work to the LLM, how can the output ever be trusted? You'll always need to make sure the program does what you want, rather than what you said.


Just create a very specific and very detailed prompt, so specific that it starts reading like instructions, and you've come up with the most expensive programming language.


It's not great that it's the most expensive (by far), but it's also by far the most expressive programming language.


How is it more expressive? What is more expressive than Turing completeness?


This is a non-sequitur. Almost all programming languages are Turing complete, but I think we'd all agree they vary in expressivity (e.g. x64 assembly vs. TypeScript).

By expressivity I mean that you can say what you mean, and the more expressive the language is, the easier that is to do.

It turns out saying what you mean is quite easy in plain English! The hard part is that English allows a lot of ambiguity. So the tradeoffs of how you express things are very different.

I also want to note how remarkable it is that humans have built a machine that can effectively understand natural language.


>"You'll always need to make sure the program does what you want, rather than what you said."

Yes, making sure the program does what you want. Which is already part of the existing software development life cycle. Just as using natural language to specify work already is: it's where things start and return to over and over throughout any project. Further: LLMs frequently understand what I want better than other developers do. Sure, lots of times they don't. But they're a lot better at it than they were 6 months ago, and a year ago they barely did so at all, save for scripts of a few dozen lines.


That's exactly my point, it's a nice tool in the toolbox, but for most tasks it's not fire-and-forget. You still have to do all the same verification you'd need to do with human written code.

You trust your natural language instructions a thousand times a day. If you ask for a large black coffee, you can trust that is more or less what you'll get. Occasionally you may get something so atrocious that you don't dare to drink it, but generally speaking you trust that the coffee shop knows what you want. If you insist on a specific amount of coffee brewed at a specific temperature, however, you need tools to measure.

AI tools are similar. You can trust them because they are good enough, and you need a way (testing) to make sure what is produced meets your specific requirements. Of course they may fail for you; that doesn't mean they aren't useful in other cases.

All of that is simply common sense.


More analogy.

What’s to stop the barista putting sulphuric acid in your coffee? Well, mainly they don’t because they need a job and don’t want to go to prison. AIs don’t go to prison, so you’re hoping they won’t do it because you’ve promoted them well enough.


* prompted

> All of that is simply common sense.

Is that why we have legal codes spanning millions of pages?


The person I'm replying to believes that there will be a point when you no longer need to test (or review) the output of LLMs, similar to how you don't think about the generated asm/bytecode/etc of a compiler.

That's what I disagree with - everything you said is obviously true, but I don't see how it's related to the discussion.


I don't necessarily think we'll ever reach that point and I'm pretty sure we'll never reach that point for some higher risk applications due to natural language being ambiguous.

There are however some applications where ambiguity is fine. For example, I might have a recipe website where I tell a LLM to "add a slider for the user to scale the number of servings". There's a ton of ambiguity there but if you don't care about the exact details then I can see a future where LLMs do something reasonable 99.9999% of the time and no one does more than glance at it and say it looks fine.

How long it is until we reach that point and whether we'll ever reach that point is of course still up for debate, but I don't think it's completely unrealistic.


That's true, and I more or less already use it that way for things like one off scripts, mock APIs, etc.

I don't think the argument is that AI isn't useful. I think the argument is that it is qualitatively different from a compiler.


The challenge not addressed by this line of reasoning is the sheer scale of output validation required on the back end of LLM-generated code. Human hand-developed code was no great shakes on the validation front either, but the scale difference hid this problem.

I’m hopeful what used to be tedious about the software development process (like correctness proving or documentation) becomes tractable enough with LLM’s to make the scale more manageable for us. That’s exciting to contemplate; think of the complexity categories we can feasibly challenge now!


The fact that the bug tracker exists proves GP's point.


Right, now what would you say is the probability of getting a bug in compiler output vs. AI output?

It's a great tool, once it matures.


Absolutely this. I am tired of that trope.

Or the argument that "well, at some point we can come up with a prompt language that does exactly what you want and you just give it a detailed spec." A detailed spec is called code. It's the most roundabout way to make a programming language, and even then it's still not deterministic.


And at the point that your detailed specification language is deterministic, why do you need AI in the middle?


Exactly the point. AI is absolutely BS that just gets peddled by shills. It does not work. It might work for some JS bullcrap. But take existing code and ask it to add capsicum next to an ifdef of pledge. Watch the mayhem unfold.


This is obviously besides the point but I did blindly follow a wiener schnitzel recipe ChatGPT made me and cooked for a whole crew. It turned out great. I think I got lucky though, the next day I absolutely massacred the pancakes.


I genuinely admire your courage and willingness (or perhaps just chaos energy) to attempt both wiener schnitzel and pancakes for a crew, based on AI recipes, despite clearly limited knowledge of either.


Recent experiments with LLM recipes (ChatGPT): it missed the salt in a rice recipe, then flubbed whether that type of rice was recommended to be washed in the recipe it was supposedly summarizing (and lied about it, too)…

Probabilistic generation will be weighted towards the means in the training data. Do I want my code looking like most code most of the time in a world full of Node.js and PHP? Am I better served by rapid delivery from a non-learning algorithm that requires eternal vigilance and critical re-evaluation, or by slower delivery with a single review filtered through a meatspace actor who will build out trustable modules in a linear fashion with known failure modes already addressed by process (i.e. TDD, specs, integration & acceptance tests)?

I’m using LLMs a lot, but can’t shake the feeling that the TCO and total time shakes out worse than it feels as you go.


There was a guy a few months ago who found that telling the AI to do everything in a single PHP file actually produced significantly better results, i.e. it worked on the first try. Otherwise it defaulted to React, 1GB of node modules, and a site that wouldn't even load.

>Am I better served

For anything serious, I write the code "semi-interactively", i.e. I just prompt and verify small chunks of the program in rapid succession. That way I keep my mental model synced the whole time, I never have any catching up to do, and honestly it just feels good to stay in the driver's seat.


Pro-tip: Do NOT use LLMs to generate recipes, use them to search the internet for a site with a trustworthy recipe, for information on cooking techniques, science, or chemistry, or if you need ideas about pairings and/or cooking theory / conventions. Do not trust anything an LLM says if it doesn't give a source, it seems people on the internet can't cook for shit and just make stuff up about food science and cooking (e.g. "searing seals in the moisture", though most people know this is nonsense now), so the training data here is utterly corrupt. You always need to inspect the sources.

I don't even see how an LLM (or frankly any recipe) that is a summary / condensation of various recipes can ever be good, because cooking isn't something where you can semantically condense or even mathematically combine various recipes together to get one good one. It just doesn't work like that, there is just one secret recipe that produces the best dish, and the way to find this secret recipe is by experimenting in the real world, not by trying to find some weighting of a bunch of different steps from a bunch of different recipes.

Plus, LLMs don't know how to judge quality of recipes at all (and indeed hallucinate total nonsense if they don't have search enabled).


> I don't even see how an LLM (or frankly any recipe) that is a summary / condensation of various recipes can ever be good

It's funny, I actually know quite a few (totally non-tech) people who use (and like using) LLMs for recipes/recipe ideas.

They probably have enough experience to push back when there's a bad idea, or figure out missing steps/follow up.

Thinking about it, it sounds a bit like LLM usage for coding where an experienced programmer can get more value out of it.


If you have lots of experience from years of serious cooking, like I do, almost everything the LLM suggests or outputs re: cooking is false, bad, or at best incredibly sub-par, and you have to spend far more time correcting it and/or pushing it toward what you already know before it is actually helpful / productive in getting you anything true. I also think it just messes up incredibly basic stuff all the time. I reiterate: it is only good for the things I said.

Whether or not you think you can get "good" recipes out of it will also depend on your experience with cuisine and cooking, and your own pickiness. I am sure amateurs or people who cook only occasionally can get use out of it, but it is not useful for me.

Cooking is a very different world from coding: recipes aren't composable like code (within-recipe ratios need to be maintained, i.e. recipes are written in baker's ratios/proportions, steps are almost always sequentially dependent, and ingredients need to complement each other), and few sources besides the good empirical ones actually verify anything they make, which is a problem, because the training data for cooking is far more poisoned.


I guess different people have different experiences when using those tools :)

(I was talking about people who cook daily for their households and enjoy doing it, I guess they found a way to make LLMs useful for them)


I also cook daily at home, for fun (though I have catered a couple times for some large 50+ people family events too). Just, in my case, cooking is my passion, and has been more than just a minor hobby for me. I.e. there have been many years of my life where I spent 3-5 hours of every day cooking, and this has been the case for about 15 years now. If "professional home cook" was a thing, I'd be that, but, alas.

So my standards are admittedly probably a bit deranged relative to most...


Everything more complex than a hello-world has bugs. Compiler bugs are uncommon, but not as rare as you'd think. (I must have debugged a few ICEs in my career, but luckily have had more skilled people to rely on when code generation itself was wrong.)

Compilers aren't even that bad. The stack goes much deeper and during your career you may be (un)lucky enough to find yourself far below compilers: https://bostik.iki.fi/aivoituksia/random/developer-debugging...

NB. I've been to vfs/fs depths. A coworker relied on an oscilloscope quite frequently.


I had a fun bug while building a smartwatch app that was caused by the sample rate of the accelerometer increasing when the device heated up. I had code that was performing machine learning on the accelerometer data, which would mysteriously get less accurate during prolonged operation. It turned out that we gathered most of our training data during shorter runs when the device was cool, and when the device heated up during extended use, it changed the frequencies of the recorded signals enough to throw off our model.
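
(For anyone who hits something similar: a minimal sketch of one way to guard against it, resampling onto a fixed time grid before feature extraction so the model never sees the hardware's drifting rate. It assumes numpy and per-sample timestamps; illustrative only, not what we actually shipped.)

    import numpy as np

    def resample_fixed_rate(t, accel, target_hz=50.0):
        """Resample raw accelerometer samples (taken at a drifting hardware
        rate) onto a fixed time grid, so downstream features see a constant
        sample rate regardless of device temperature."""
        t = np.asarray(t, dtype=float)
        accel = np.asarray(accel, dtype=float)   # shape (n_samples, n_axes)
        t_uniform = np.arange(t[0], t[-1], 1.0 / target_hz)
        # Linear interpolation per axis onto the uniform grid.
        return np.stack([np.interp(t_uniform, t, accel[:, i])
                         for i in range(accel.shape[1])], axis=1)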

I've also used a logic analyzer to debug communications protocols quite a few times in my career, and I've grown to rather like that sort of work, tedious as it may be.

Just this week I built a VFS using FUSE and managed to kernel panic my Mac a half-dozen times. Very fun debugging times.


"I've never personally found a compiler bug."

I remember the time I spent hours debugging a feature that worked on Solaris and Windows but failed to produce the right results on SGI. Turns out the SGI C++ compiler silently ignored the `throw` keyword! Just didn’t emit an opcode at all! Or maybe it wrote a NOP.

All I’m saying is, compilers aren’t perfect.

I agree about determinism though. And I mitigate that concern by prompting AI assistants to write code that solves a problem, instead of just asking for a new and potentially different answer every time I execute the app.


Compilers don't change output assembly based on what markdown you provide them via .claude.

Or what tone of voice in prompt you gave them. Or if Saturn is in Aries or Sagittarius.


> Meanwhile AI can't be trusted to give me a recipe for potato soup.

This just isn't true any more. Outside of work, my most common use case for LLMs is probably cooking. I used to frequently second guess them, but no longer - in my experience SOTA models are totally reliable for producing good recipes.

I recognize that at a higher level we're still talking about probabilistic recipe generation vs. deterministic compiler output, but at this point it's nonetheless just inaccurate to act as though LLMs can't be trusted with simple (e.g. potato soup recipe) tasks.


Compilers and processors are deterministic by design. LLMs are non-deterministic by design.

It's not apples vs. oranges. They are literally opposite of each other.


Just to nitpick - compilers (and, to some extent, processors) weren't deterministic a few decades ago. Getting them to be deterministic has been a monumental effort - see build reproducibility.
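
A toy illustration of one classic culprit, embedded build timestamps, and the SOURCE_DATE_EPOCH convention that reproducible-builds efforts use to pin them (the build_artifact function is made up for the example):

    import hashlib
    import os
    import time

    def build_artifact(source: bytes) -> bytes:
        # Embedding the wall-clock build time makes the output differ on
        # every run -- a classic source of non-reproducible builds. The
        # SOURCE_DATE_EPOCH environment variable lets callers pin it.
        timestamp = int(os.environ.get("SOURCE_DATE_EPOCH", time.time()))
        return source + f"\n# built at {timestamp}\n".encode()

    print(hashlib.sha256(build_artifact(b"print('hello')")).hexdigest())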


I'm trying to track down a GCC miscompilation right now ;)


I feel for you :D


> The compiler metaphor is simply incorrect

If an LLM was analogous to a compiler, then we would be committing prompts to source control, not the output of the LLM (the "machine code").


> Meanwhile AI can't be trusted to give me a recipe for potato soup.

Because there isn’t a canonical recipe for potato soup.


There's also no canonical way to write software, so in that sense generating code is more similar to coming up with a potato soup recipe than compiling code.


That is not the issue, any potato soup recipe would be fine, the issue is that it might fetch values from different recipes and give you an abomination.


This exactly. I cook as a passion, and LLMs just routinely and very clearly (weighted-)"average" together different recipes to produce, in the worst case, disgusting monstrosities, or, in the best case, just a near-replica of some established site's recipe.


> ... some established site's recipe.

At least with the LLM, you don't have to wade through paragraph after paragraph of "I remember playing in the back yard as a child, I would get hungry..."

In fact LLMs write better and more interesting prose than the average recipe site.


It's not hard to scroll to the bottom of a page, IMO, but regardless, sites like the ones you are mentioning have trash recipes in most cases.

I only go with resources where the text is actual documentation of their testing and/or the steps they've made, or other important details (e.g. SeriousEats, Whats Cooking America / America's Test Kitchen, AmazingRibs, Maangchi for Korean, vegrecipesofindia, Modernist series, etc) or look for someone with some credibility (e.g. Kenji Lopez, other chef on YouTube). In this case the text or surrounding content is valuable and should not be skipped. A plain recipe with no other details is generally only something an amateur would trust.

If you need a recipe, then by definition you don't know how to make the dish, so you need more information to verify that the recipe is sound. There is also no reason to assume / trust that the LLM's summary / condensation of various recipes is good, because cooking isn't something where you can semantically condense or even mathematically combine various recipes together to get one good one. It just doesn't work like that, there is just one secret recipe that produces the best dish, and LLMs don't know how to judge the quality of recipes, mostly.

I've never had an LLM produce something better or more trustworthy than any of those sites I mentioned, and have had it just make shit up when dealing with anything complicated (i.e. when trying to find the optimal ratio of starch to flour for Korean fried chicken, it just confidently claimed 50/50 is best, when this is obviously total trash to anyone who has done this).

The only time I've ever found LLMs useful for cooking is when I need to cook something obscure that only has information in a foreign language (e.g. icefish / noodlefish), or when I need to use it for search about something involving chemistry or technique (it once quickly found me a paper proving that baking soda can indeed be used to tenderize squid - but only after I prompted it further to get sources and go beyond its training data, because it first hallucinated some bullshit about baking soda only working on collagen or something, which is just not true at all).

So I would still never trust or use the quantities it gives me for any kind of cooking / dish without checking or having the sources, instead I would rely on my own knowledge and intuitions. This makes LLMs useless for recipes in about 99% of cases.


> This makes LLMs useless for recipes in about 99% of cases.

But you spent a whole lot of time essentially describing how 99% of recipe sites are also useless. What's the difference?


The difference is that I have a few sites and resources that I already know are NOT useless. With an LLM output, I have to check and verify every time, and, since LLMs are trained on junk, they almost always produce junk. But with the trusted sites, I do not have to check, and I almost always get something decent and/or close to authentic!

The difference is between a trusted source that is good most of the time, vs. an LLM recipe that is trash 99% of the time.

EDIT: If you haven't visited any / all of the sites / sources I mentioned, check them out! They are really good, especially SeriousEats if the recipe is from Kenji Lopez. Maybe just avoid AmazingRibs, unless you have uBlock installed: they were way ahead of their time, but haven't updated in forever, and clearly have become desperate...


You're correct, and I believe this is only a matter of time. Over time it has been getting better and will keep doing so.


It won’t be deterministic.


The input to LLMs is natural language. Natural language is ambiguous. No amount of LLM improvements will change that.


Maybe. But it's been 3 years and it still isn't good enough to actually trust. That doesn't raise confidence that it will ever get there.


You need to put this revolution in scale with other revolutions.

How long did it take for horses to be super-seeded by cars?

How long did power tools take to become the norm for tradesmen?

This has gone unbelievably fast.


I think things can only be called revolutions in hindsight - while they are going on it's hard to tell if they are a true revolution, an evolution or a dead-end. So I think it's a little premature to call Generative AI a revolution.

AI will get there and replace humans at many tasks, machine learning already has, I'm not completely sure that generative AI will be the route we take, it is certainly superficially convincing, but those three years have not in fact seen huge progress IMO - huge amounts of churn and marketing versions yes, but not huge amounts of concrete progress or upheaval. Lots of money has been spent for sure! It is telling for me that many of the real founders at OpenAI stepped away - and I don't think that's just Altman, they're skeptical of the current approach.

PS Superseded.


*superseded

It comes from the Latin "supersedēre", which taken literally, means "sit on top of". "Super" = above, on top of. "Sedēre" = to sit.

"Super" is already familiar to English speakers. "Sedēre" is the root of words like sedentary, sedan, sedate, reside, and preside.

The more metaphorical meaning of "supersede" as "replace" developed over time and across languages, but the literal meaning is already fairly close.


>super-seeded

Cute eggcorn there.


> Compilers will produce working output given working input literally 100% of my time in my career. I've never personally found a compiler bug.

First compilers were created in the fifties. I doubt those were bug-free.

Give LLMs some fifty or so years, then let's see how (un)reliable they are.


What I don't understand about these arguments is that the input to the LLMs is natural language, which is inherently ambiguous. At which point, what does it even mean for an LLM to be reliable?

And if you start feeding an unambiguous, formal language to an LLM, couldn't you just write a compiler for that language instead of having the LLM interpret it?


1) Determinism isn't the same as reliability.

Compilers are deterministic (modulo bugs), but most things in life are not, but can still be reliable.

The opposite also holds: "npm install && npm run build" can work today and fail in a year (due to ecosystem churn) even though every single component in that chain is deterministic.

2) Reliability is a continuum, not a discrete yes/no. In practice, we want things to be reliable enough (where "enough" is determined per domain).

I don't presume this will immediately change your mind, but hopefully will open your eyes to looking at this a bit differently.


> I don't presume this will immediately change your mind

I'm not saying that AI isn't useful. I'm just claiming it's not analogous to a compiler. If it was, you would treat your prompts as source code, and check them into source control. Checking the output of an LLM into source control is analogous to committing the machine code output from a compiler into source control.

My question still stands though. What does it mean for a tool to be reliable when the input language is ambiguous? This isn't just about the LLM being nondeterministic. At some point those ambiguities need to be resolved, either by the prompter, or the LLM. But the resolution to those ambiguities doesn't exist in the original input.


Hasn't AI been annoying the piss out of us for well over a year now? I've definitely been hearing "10x productivity" for that long.

So - where is it all? Where's all the 10 years of software dev that happened over the last year? Where are the companies blowing their competitors out of the water by compressing a decade of production into a year?

The proof is in the pudding. Or lack thereof. If the claims were anything like they say, we'd see something by now.


I think agents and Claude Code only arrived last summer. It does feel like a long time though.


I'd use DigitalOcean over AWS for any SMB or lean startup (so... anyone not attached to an infinite money hose who has to either scale to NEED AWS, or die trying) just because of 1) their UI not being broken glass you have to crawl over, 2) not having eight trillion features that make doing simple things hard, and 3) pricing.


Same. For small projects, I always recommend Vultr or Digital Ocean. Vultr has some neat network features, like BGP support.


This is more because the barrier to entry is so much lower.

Android: have a laptop that can do virtualization (so, basically every laptop) and enough RAM to run Android Studio. Then you theoretically also need an Android device, but even that's just because I assume you want to use the app you're making. That's it.

iOS: $100/yr entry fee, plus you need Apple hardware, plus "server"-mode Apple hardware (a Mac mini?) if you want to run an alt store, and I assume your main device is a laptop.

Just the money thing and the hardware thing is a huge stumbling block. I know it's a rounding error for any even semi-serious business, but also let's be real, a ton of very important software is basically run on the budget of "the software dev's main job and/or EU welfare state benefits".


The www wins. All you need is something that can run a browser. You edit a line, save, refresh and there it is, the real finished product, not emulation.

Apps have terrible reliability too. I just wanted to order a pizza; the restaurant website offered a button for the Play Store and the App Store.

There it said the app was for an outdated version of Android.

Perhaps it had been like that for a long time? But let's imagine it happened today. Where are you to get your orders from? Ahh yes, the website.

If apps didn't get the icon on the home screen 90% wouldn't have a reason to exist.

Bunch of pictures with descriptions and an add-to-cart button. One shouldn't even need to write code; it should be as simple and obvious as serving a document. Instead you need a full-time carpenter to keep the store running. The counter and shelves spontaneously collapse, doors regularly get stuck, light fixtures rain down from the ceiling.

People trying to sell pizza deserve better, we can do better.


Unfortunately, the people who are "pro-AI" are so often pro-AI because it lets them skip the understanding part with less scrutiny.


The good news here is that their code is of such a poor quality it doesn't properly work anyway.

I have recently tried to blindly create a small .dylib consolidation tool in JS using Claude Code, Opus 4.5 and AskUserTool to create a detailed spec. My god, how awful and broken the code was. Unusable. But it faked working just well enough to fool someone who's got no clue.


> The good news here is that their code is of such a poor quality it doesn't properly work anyway.

This is just wishful thinking. In reality it works just well enough to be dangerous. Just look at the latest RCE in OpenCode. The AI it was vibe-coded with allowed any website with origin * to execute code, and the Prompt Engineer™ didn't understand the implications.


> it works just well enough to be dangerous

Excellent. I for one fully welcome Prompt Engineers™ into the world of software development.


I assume you don't understand some of the words in the rest of my comment. Or you're a nihilist and enjoy watching everything burn to the ground.

It's all fun and games until actual lives are at stake.


I'm watching the voters around the world electing charismatic leaders and then cheering the consequences.

Thus, companies electing to replace software developers with AI slop are not much of a surprise to me.

It doesn't matter whether people will die because of AI slop. What matters is keeping Microsoft shareholders happy and they are only happy when there is a growing demand for slop.

