Hacker News

>I don't need ECC

We all need ECC. It should be a standard feature by now.



While I agree, can you explain more about why you think we all need ECC?


https://danluu.com/why-ecc/ is a decent read.


I'd also be curious about numbers from people who DO run ECC on how many times it's saved them. Some things really need it (financial transactions come to mind). That said, it should be much easier to get ECC in consumer hardware. Major props to AMD for their recent chips that allow it. Hopefully Intel follows suit.


I run ECC everywhere possible. I know of two instances when it mattered (it detected a failing chip), and I suspect it was correcting single-bit errors for a while before that in both cases. I've also resurrected someone's laptop by determining it had failed due to bad RAM and replacing the RAM. (Easy enough diagnosis: intermittent, random-seeming hard lockups and corrupted data on disk.)

ECC also thwarts Rowhammer and similar attacks, if that matters to you.

Note that it isn't enough for AMD to restore ECC support in consumer kit; you also need motherboard support, and the MB makers are also complicit in raising ECC costs.


The problem is that it is unsupported, so getting a board where the BIOS can enable it and you can count on it working is a bit of a crapshoot. I had the same problem ~8 years ago when a friend and I built new desktops. You've got to do a lot of reading of manuals, forums, and reviews before buying a board. Or buy it locally from a place with a good return policy. This is quite different from server-grade kit, where ECC is fully supported.


Also, it's quite depressing when you need to change the motherboard, especially if you are obsessed with cable management and invested so much time ensuring great airflow.


Didn't AMD CPUs already support ECC for many years?


I had a K6-2 with ECC back in the late '90s. So, yes.

But for whatever reason, AMD never seems to put ECC in the bullet lists for why you should buy their parts. I guess, as someone else mentioned, it's because the motherboard manufacturers don't enable/qualify it, even though it's likely just a matter of firmware tweaks ever since the DRAM controller moved on-chip. I got it working on a cheap Phenom II/Gigabyte motherboard (IIRC) some time ago as well. In that case I don't think the motherboard even advertised it, but I had some unbuffered ECC DIMMs lying around, plugged them in, and they worked. Of course, apart from the machine booting, the only real indication that ECC was actually working was a kernel blurb during boot. I don't think I got the EDAC reporting to give me soft error rates at the time.
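For anyone trying the same thing on Linux: once an EDAC driver for the memory controller is loaded, the kernel typically exposes per-controller counters in sysfs as ce_count (corrected errors) and ue_count (uncorrected errors) under /sys/devices/system/edac/mc/mc*/. A minimal sketch that reads them, assuming that layout (it may vary by kernel version and driver; the function name edac_counts is just illustrative):

```python
from pathlib import Path
from typing import Dict


def edac_counts(root: str = "/sys/devices/system/edac/mc") -> Dict[str, Dict[str, int]]:
    """Read corrected/uncorrected error counters for each EDAC memory controller.

    Assumes the sysfs layout <root>/mcN/ce_count and <root>/mcN/ue_count.
    Returns an empty dict if the directory does not exist (no EDAC driver
    loaded, or not running Linux).
    """
    counts: Dict[str, Dict[str, int]] = {}
    base = Path(root)
    if not base.is_dir():
        return counts
    for mc in sorted(base.glob("mc[0-9]*")):
        entry: Dict[str, int] = {}
        for name in ("ce_count", "ue_count"):
            f = mc / name
            if f.is_file():
                entry[name] = int(f.read_text().strip())
        counts[mc.name] = entry
    return counts


if __name__ == "__main__":
    for controller, c in edac_counts().items():
        print(controller, c)
```

Sampling these counters periodically and dividing the change by the elapsed time gives the kind of soft error rate mentioned above. An empty result usually just means no EDAC driver is loaded, not that the RAM is error-free.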


I don't have the numbers handy, but here's a basic explanation. Due to the amount of RAM we all run today, the probability of having a RAM error is surprisingly high. IIRC it's at least once per year.
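To make "once per year" concrete: if you assume errors arrive independently at a constant average rate (a Poisson process, which is a simplification), the chance of seeing at least one error in a period is 1 - exp(-rate * time). A small sketch; the rates below are hypothetical illustrative values, not measurements:

```python
import math


def p_at_least_one(errors_per_year: float, years: float = 1.0) -> float:
    """Probability of one or more errors in the given period.

    Assumes a Poisson process: independent errors at a constant average rate.
    """
    return 1.0 - math.exp(-errors_per_year * years)


# Hypothetical average rates: one error per decade, per year, per month.
for rate in (0.1, 1.0, 12.0):
    print(f"{rate:>5} errors/year -> P(at least one in a year) = {p_at_least_one(rate):.3f}")
```

Under this assumption, an average of one error per year still leaves about a 37% chance of a clean year (1 - 1/e is roughly 0.632), which is part of why individual users rarely notice.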


I feel like it would have to be much more frequent than that (at least monthly with perceivable consequences) to get a typical user to care.


The Wikipedia [0] page for ECC RAM states that the Cassini-Huygens spacecraft had a fairly steady count of about 280 errors/day.

[0] https://en.wikipedia.org/wiki/ECC_memory


Most desktop computers aren't in space.


That same wiki article references Google's observed numbers, with a high end of "about 5 single bit errors in 8 Gigabytes of RAM per hour"


Google at the time was buying memory chips that had failed manufacturer QA, stuffing them onto DIMMs themselves, and then running whatever seemed to pass.
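Taking the quoted high-end Google figure at face value (despite the caveat above about the hardware it came from), the arithmetic for a hypothetical 16 GB desktop looks like this, assuming errors scale linearly with capacity and time, which is itself an assumption:

```python
HOURS_PER_YEAR = 24 * 365


def expected_errors(rate_per_gb_hour: float, ram_gb: float, hours: float) -> float:
    """Expected error count, assuming errors scale linearly with capacity and time."""
    return rate_per_gb_hour * ram_gb * hours


# High end of the quoted figure: about 5 single-bit errors per 8 GB per hour.
GOOGLE_HIGH = 5 / 8  # errors per GB per hour

# 16 GB is a hypothetical desktop configuration, not from the thread.
per_year = expected_errors(GOOGLE_HIGH, ram_gb=16, hours=HOURS_PER_YEAR)
print(per_year)  # 87600.0
```

That is tens of thousands of errors per year, far above the "at least once per year" estimate, so treat it as an illustration of how much the assumed rate matters rather than a prediction for any given machine.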


That number was consistent on-the-ground pre-launch and post-launch (with the exception of a short period of higher error instances due to a solar flare).


Eeek, at those rates surely there are some undetected triple flips.
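Why triple flips are the scary case: ECC DIMMs typically use a SECDED (single-error-correct, double-error-detect) code, commonly 8 check bits over 64 data bits. The toy below uses the same idea at a much smaller size, an extended Hamming(8,4) code, to show that one flip is corrected, two flips are detected, and three flips can be "corrected" to the wrong value without any alarm. It is a sketch for illustration only; real DIMM codes are wider, and their exact behavior on triple errors depends on the specific code. The function names are hypothetical.

```python
from typing import List, Tuple


def encode(nibble: int) -> List[int]:
    """Encode 4 data bits as an 8-bit extended Hamming (SECDED) codeword.

    Index 0: overall parity bit. Indices 1, 2, 4: Hamming parity bits.
    Indices 3, 5, 6, 7: data bits.
    """
    bits = [0] * 8
    bits[3], bits[5], bits[6], bits[7] = [(nibble >> i) & 1 for i in range(4)]
    bits[1] = bits[3] ^ bits[5] ^ bits[7]
    bits[2] = bits[3] ^ bits[6] ^ bits[7]
    bits[4] = bits[5] ^ bits[6] ^ bits[7]
    bits[0] = sum(bits[1:]) % 2  # make total parity even
    return bits


def decode(word: List[int]) -> Tuple[int, str]:
    """Return (data, status); data is -1 when the error is uncorrectable."""
    bits = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid codeword
    overall = sum(bits) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: assume exactly one and flip it back.
        # Syndrome 0 means the overall parity bit itself was hit.
        bits[syndrome] ^= 1
        status = "corrected"
    else:
        return -1, "uncorrectable"  # even, nonzero: at least two flips
    return bits[3] | bits[5] << 1 | bits[6] << 2 | bits[7] << 3, status


def flip(word: List[int], positions: List[int]) -> List[int]:
    """Simulate bit flips at the given codeword positions."""
    out = list(word)
    for p in positions:
        out[p] ^= 1
    return out


word = encode(0b1011)  # data = 11
print(decode(flip(word, [6])))        # (11, 'corrected')
print(decode(flip(word, [3, 6])))     # (-1, 'uncorrectable')
print(decode(flip(word, [3, 5, 6])))  # (12, 'corrected'), wrong data, no alarm
```

The triple flip has odd overall parity, so the decoder takes it for a single-bit error and "fixes" the wrong bit. In this toy code every triple error ends up miscorrected.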



