
I'm reminded of how Carmack talked about the extra efficiencies available when targeting consoles, because you knew exactly what hardware was available.

It's great that the available efficiencies can be shown to be extractable. The real, much harder, trick is putting together a sufficiently smart compiler to extract them on heterogeneous compute setups.


The demoscene also is an example of how much you can do if you can be absolutely sure exactly what hardware you’re running on.

The problem is that even for things like consoles, it's usually more "cost efficient" to write normal fast-to-write code that isn't maximally effective, let the compiler do its magic, and call it good enough.

Sometimes I dream of what the world would do if we were mystically stuck on exactly the processors we have today, for twenty years.


I've wondered sometimes what software would look like if a crisis took out the ability to build new semiconductors and we had to run all our computing infrastructure on chips salvaged from pregnancy tests, shoplifting tags, cars, old PCs, and other consumer electronics. We'd basically move backwards about 20 years in process technology, and most computers would have speeds roughly equivalent to 90s/00s PCs.

But then, this still wouldn't incentivize building directly to the hardware, because of the need to run on a large variety of different hardware. You're still better off prioritizing portability over performance, and making up the difference by cutting scope and ease of development.


You might enjoy Dusk OS and its more extreme sibling Collapse OS: https://duskos.org https://collapseos.org


Funny you say this... this exact thought experiment was going around last month! Laurie Wired [0], a cybersec YouTuber, asked it on Twitter and got some interesting replies too!

[0]: https://www.youtube.com/watch?v=L2OJFqs8bUk


> I've wondered sometimes what software would look like if a crisis took out the ability to build new semiconductors and we had to run all our computing infrastructure on chips salvaged from pregnancy tests, shoplifting tags, cars, old PCs, and other consumer electronics. We'd basically move backwards about 20 years in process technology, and most computers would have speeds roughly equivalent to 90s/00s PCs.

Don't forget disposable vapes: https://news.ycombinator.com/item?id=45252817


This sounds kind of similar to what I've heard about Cuba's relationship with cars, and probably technology in general, after the U.S. embargo. Not sure how true it was/is though.


Be careful what you wish for...


Optimizing for the hardware you are on is demonstrably an effort and skill issue. Everyone understands that with enough time and engineers, any piece of software could be optimized better. If only we had large volumes of inexpensive "intelligence" to throw at the problem.

This is one of my back-of-mind hopes for AI. Enlist computers as our allies in making computer software faster. Imagine if you could hand a computer brain your code, and ask it to just make the program faster. It becomes a form of RL problem, where the criteria are 1) a functionally equivalent program 2) that is faster.
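
As a rough sketch (Python, with a hypothetical pair of candidate binaries and a fixed test set - not any real RL framework), the reward could literally just be "measured speedup, gated on matching outputs":

  import subprocess
  import time

  def run(program_path: str, test_input: str) -> tuple[str, float]:
      """Run one candidate binary on one input, returning (stdout, wall time)."""
      start = time.perf_counter()
      result = subprocess.run([program_path], input=test_input,
                              capture_output=True, text=True, timeout=30)
      return result.stdout, time.perf_counter() - start

  def reward(original: str, candidate: str, test_inputs: list[str]) -> float:
      """Reward = speedup, but only if every checked output still matches."""
      orig_time = cand_time = 0.0
      for test_input in test_inputs:
          orig_out, t_orig = run(original, test_input)
          cand_out, t_cand = run(candidate, test_input)
          if cand_out != orig_out:
              return -1.0  # fails criterion 1: not equivalent on the inputs we checked
          orig_time += t_orig
          cand_time += t_cand
      return orig_time / cand_time  # criterion 2: >1.0 means the candidate is faster

Of course, "equivalent on the inputs we checked" is doing a lot of heavy lifting there, which is where this gets hard.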


This is what I was thinking, too. For so long, the default mode of operating a software company has been:

"Developer time is so expensive, we need to throw everything under the bus to make developers fast."

The kinds of things often thrown under the bus: Optimizations, runtime speed, memory footprint, disk image size, security, bug fixing, code cleanliness / lint, and so on. The result is crappy software written fast. Now, imagine some hypothetical AI (that we don't have yet) that makes developer time spent on the project trivial.

Optimistically: There might be time for some of these important software activities.

Pessimistically: Companies will continue to throw these things under the bus and just shit out crappy software even faster.


My favorite part of this phenomenon is every company that interviews developers on data structures and algorithms, then puts out a calculator app that takes half a gigabyte of storage and nearly as much RAM to run.

I have not had to use Windows in ages, but every time I touch it I am amazed that it takes 10-15GB for a bare installation of the latest version while doing about the same amount of work XP did in under 1GB. Yes, I am aware assets are a thing, but has usability increased as a result of larger assets?


To be fair, Windows has so much backwards compatibility that I'm sure there's a ton of stuff there that's not used by 99.9% of people.

That's a good or a bad thing, depending on your perspective.


I am fairly certain that if you install every Debian package available it will still be less than 16GB. Windows 10 is a bare OS at that size.


The latest iOS update (!) is more than 16gb… a mobile OS…


It ships with just as many features as Windows 10, which is also in that range, so it's not too surprising.


> functionally equivalent

Who confirms what is functionally equivalent?


You can, with some programming languages, require a proof of this (see: Rocq, formerly 'Coq').

I think a more interesting case might be showing functional equivalence on some subset of all inputs (because tbh, showing functional equivalence on all inputs often requires "doing certain things the slow way").

An even more interesting case might be "inputs of up to a particular complexity in execution" (which is... very hard to calculate, but likely would mean combining ~code coverage & ~path coverage).

Of course, doing all of that w/o creating security issues (esp. with native code) is an even further out pipe dream.

I'd settle for something much simpler, like "we can automatically vectorize certain loop patterns for particular hardware if we know the hardware we're targeting" from a compiler. That's already hard enough to be basically a pipe dream.
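
To make the "subset of all inputs" idea concrete: the cheap version is just randomized differential testing. A minimal sketch in Python, with a toy slow/fast pair standing in for the real programs:

  import random

  def slow_sum_of_squares(xs: list[int]) -> int:
      """Reference implementation: straightforward, unoptimized."""
      total = 0
      for x in xs:
          total += x * x
      return total

  def fast_sum_of_squares(xs: list[int]) -> int:
      """Candidate 'optimized' implementation we want to trust."""
      return sum(x * x for x in xs)

  def equivalent_on_sampled_inputs(trials: int = 10_000) -> bool:
      """Check agreement on randomly sampled inputs only - no proof, just evidence."""
      for _ in range(trials):
          xs = [random.randint(-10**6, 10**6)
                for _ in range(random.randint(0, 100))]
          if slow_sum_of_squares(xs) != fast_sum_of_squares(xs):
              print("counterexample:", xs)
              return False
      return True

  print(equivalent_on_sampled_inputs())

That tells you nothing about inputs you didn't sample, which is exactly why the proof-carrying (Rocq) route and the "bounded complexity" route above are the interesting ones.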


Yeah, restructuring for autovectorization with otherwise equivalent results would be a great example and step.


Notably, for example, C/C++ code is not necessarily functionally equivalent when it's compiled on different platforms.


It's not even guaranteed to be functionally equivalent when compiled on the same hardware with the same compiler etc. Undefined behaviour can do what it wants. (And implementation defined behaviour also has a lot of leeway.)

However, if you stick to only defined behaviour, they are 'functionally equivalent', as long as your compiler doesn't have a bug.


The Magic does, of course!


It's not just being sure exactly what the hardware is, in demos you have the additional luxury of not being interactive. So you can plan everything exactly out in advance.


This is true of inference too.


Consoles are pretty heterogeneous IRL too, though. You have multiple SKUs (regular and Pro, for example), not to mention most games will also target multiple consoles (PlayStation + Xbox + Switch is a common combo).

So in reality the opportunities to really code against a specific piece of hardware are few and far between...

Heck, then you get into multiple operating modes of the same hardware - the Nintendo Switch has a different perf profile if it's docked vs. not.


This was less true at the time Carmack said it :>


The original Switch launched in 2017; that's plenty of time with a stable platform. In practice, the multiple operating modes can be approached by coding against undocked (handheld) mode and then adding bonus quality for docked.


This is exactly what I've been doing when optimizing games for the Switch.


A handful of variants for consoles is not nearly as bad as the almost limitless variety on PC.


>Sometimes I dream of what the world would do if we were mystically stuck on exactly the processors we have today, for twenty years.

Reminds me of the old American cars in Cuba - https://en.wikipedia.org/wiki/Yank_tank


Cubans benefited from the cars being older, simpler, and robust. Imagine freezing car tech now, with so many electronics, far more parts, and designs built to be replaced relatively quickly!


These older cars broke down all the time. There's a reason old American sit-coms have at least some characters always tinkering with their cars: you needed to do that. Nowadays, cars just work.


> it's usually more "cost efficient" to write normal fast-to-write code that isn't maximally effective, let the compiler do its magic, and call it good enough.

For the last six years my full-time job has largely been optimizing games where most of the team has been working with this mindset. Sometimes someone spends a few days just getting things done, followed by others building on top of it. This leads to systems which are not fast enough and take me weeks or even months to optimize.

We even got together at my last job and created a series of lectures on performance and best practices for everyone, including artists, to get ahead of this type of issue. It was apparently very appreciated, especially among the non-technical staff, who said it was valuable and that they'd had no idea.


That's what you got with BeOS... throw out backward compatibility and build to current best practices... its ability to extract performance out of a 133 MHz processor was amazing.


Even better was the BeBox running BeOS. That was a cool use of a fast dual-CPU PowerPC platform with great graphics. Amiga vibes. But it turns out that humans need software applications more than they need efficient use of the hardware.


It was the story with so many things "back then" - even Itanium was a beast on custom-coded perfect applications.


> The problem is that even for things like consoles, it's usually more "cost efficient" to write normal fast-to-write code that isn't maximally effective, let the compiler do its magic, and call it good enough.

Even given all the time and money, there's also a skills gap.


You can use money and time to buy skills.


Unlimited time and money will not make someone like me a John Carmack level programmer. There are a finite number of individuals operating at his level or above and having them hyper optimize code is a poor use of their time.


Oh, I meant more like: if you have enough money, you can employ John Carmack (or similar) for a while.


  The problem is that even for things like consoles, it's usually more "cost efficient" to write normal fast-to-write code that isn't maximally effective, let the compiler do its magic, and call it good enough.
This wasn't always the case. I have a friend who used to work on games for the original PlayStation. I remember him telling me that part of his job (or maybe his whole job) was optimizing the machine code output by the C compiler.


And don't forget that Sony and Microsoft have compiler teams working on specialised GCC and LLVM backends, and sometimes upstreaming general improvements.


For hardware that isn't pre-framebuffer, demos seem to be mostly about hyperoptimizing the code in a portable way, much less optimizing to specific hardware timings and quirks.


Rust jobs would actually touch more hard tech rather than being concentrated in crypto scams.


Unless it happens to be something like a PS3 or Saturn.


Despite sentiment around Mojo on HN being negative because the stack isn't OSS, this is the ultimate goal of Modular.

https://signalsandthreads.com/why-ml-needs-a-new-programming...


I listened to that episode, by chance, last week. It was well worth the time to listen.


It's hard to take those benchmarks too seriously. ZFS, btrfs and I guess bcachefs - which I've never used and don't have any opinion on - do things XFS and EXT4 don't and can't do.

I know more about ZFS than the others. It wasn't specified here whether ZFS had ashift=9 or 12; it tries to auto-detect, but that can go wrong. ashift=9 means ZFS is doing physical I/O in 512-byte units, which will be an emulation mode for the NVMe drive. Maybe it was ashift=12. But you can't tell.

Secondly, ZFS defaults to a record size of 128k. Write a big file and it's written in "chunks" of 128k size. If you then run a random read/write I/O benchmark on it with a 4k block size, ZFS is going to be reading and writing 128k for every 4k of I/O. That's a 32x amplification factor. If you're using ZFS for a load which resembles random block I/O, you'll want to tune the recordsize to the app's I/O size. And ZFS makes this easy, since child filesystem creation is trivially cheap and the recordsize can be tuned per filesystem.

And then there's the stuff ZFS does that XFS / EXT4 don't. For example, taking snapshots every 5 minutes (they're basically free), doing streaming incremental snapshot backups, snapshot cloning and so on - without getting into RAID flexibility.


I don't think any of that means the benchmarks shouldn't be taken seriously. GP didn't say they expect Bcachefs to perform like EXT4/XFS; they said they expected it to perform more like Btrfs or ZFS, whose features it more closely resembles.

On the configuration stuff, these benchmarks intentionally only ever use the default configuration – they're not interested in the limits of what's possible with the filesystems, just what they do "out of the box", since that's what the overwhelming majority of users will experience.


Anyone who uses ZFS out of the box in a way substitutable with XFS shouldn't. So I guess they serve a purpose that way. But that argument doesn't need any numbers at all.


> Anyone who uses zfs out of the box in a way substitutable with xfs, shouldn't.

Substitutable how? Like, I'm typing this on a laptop with a single disk with a single zpool, because I want 1. compression, 2. data checksums, 3. to not break (previous experiments with btrfs ended poorly). Obviously I could run xfs, but then I'd miss important features.


> If you're using ZFS for a load which resembles random block I/O, you'll want to tune the recordsize to the app I/O.

You probably don't want to do that because that'll result in massive metadata overhead, and nothing tells you that the app's I/O operations will be nicely aligned, so this cannot be given as general advice.


It's antithesis. And it's really overused by ChatGPT.


Would a Silicon Valley 30-50% smaller be better for American workers?

That's a rough estimate for how many engineers etc. that have or had a H-1B at some point.

If you think of value creation as requiring fuel, humans are an input to that. The size of the industry and the liquidity of the labor market both contribute to enabling Americans in the Bay Area to acquire plenty of wealth.


> requiring fuel

I think you mean cheap fuel, and the H-1B program creates artificially cheap fuel to the detriment of American citizens whose parents paid years of taxes, fought in foreign wars, etc. Our children are owed a fair deal. The H-1B program is a moral wrong.


You don't outsource; you set up an office instead.

The problem with outsourcing is similar to the problem with contracting. You have limited ability to direct the work, the terms need to be specified up front, the contractors can substitute in the B team once they've got the project under way, etc.

With an office and local hiring, you have full control. You can second employees on-site for some time to get the culture and work practices in shape. The bigger issue becomes stuff like timezones. So you try and carve off whole responsibilities, so the team is mostly working locally.


That's a lot of evidence-free speculation. "Many people believe", and you doubt any studies that show the opposite.


Meanwhile, China is listening to its experts:

https://www.china-briefing.com/news/chinas-entry-exit-k-visa...

https://www.youtube.com/watch?v=jC8f1qs3TGs

That's life, I guess. There's an ancient Chinese saying: Wealth does not last beyond three generations. Too much easy money probably makes you stupid, like trust fund babies.


Using grammar-constrained output in llama.cpp - which has been available for ages and I think is a different implementation to the one described here - does slow down generation quite a bit. I expect it has a naive implementation.

As to why providers don't give you a nice API, maybe it's hard to implement efficiently.

It's not too bad if inference is happening token by token and reverting to the CPU every time, but I understand high-performance LLM inference uses speculative decoding, with a smaller model guessing multiple tokens in advance and the main model doing verification. Doing grammar constraints across multiple tokens is tougher; there's an exponential number of states that need precomputing.

So you'd need to think about putting the parser automaton onto the GPU/TPU and using it during inference without stalling the pipeline by going back to the CPU.

And then you start thinking about how big that automaton is going to be: how many states, whether it needs a pushdown stack. You're basically taking code from the API call and running it on your hardware. There are dragons here, around fair use, denial of service, etc.
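
For the simple token-by-token case, the core mechanism is just masking the logits of tokens the grammar can't accept before sampling. A minimal sketch in Python (toy grammar_state object with a hypothetical can_accept method, greedy decoding, nothing tied to a real inference stack):

  import math

  def allowed_token_ids(grammar_state, vocab: dict[int, str]) -> set[int]:
      """Ask the (toy) parser which vocabulary tokens can legally extend the output."""
      return {tid for tid, text in vocab.items() if grammar_state.can_accept(text)}

  def constrained_greedy_pick(logits: list[float], grammar_state,
                              vocab: dict[int, str]) -> int:
      """Mask disallowed tokens to -inf, then pick the best remaining token id."""
      allowed = allowed_token_ids(grammar_state, vocab)
      masked = [logit if tid in allowed else -math.inf
                for tid, logit in enumerate(logits)]
      return max(range(len(masked)), key=lambda tid: masked[tid])

The expensive part in a naive implementation is that the mask is rebuilt over the whole vocabulary every step, on the CPU, in the middle of the decode loop - which is exactly the stall described above, and it gets worse once speculative decoding wants masks for several candidate positions at once.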


A guide on llama.cpp's grammars (nine hours and not a single mention of "GBNF"? HN is slipping) is here:

https://github.com/ggml-org/llama.cpp/blob/master/grammars/R...

There's also a grammar validation tool in the default llama.cpp build, which is much easier to reason about for debugging grammars than having them bounce off the server.


If your masking is fast enough, you can easily make it work with spec dec too :). We manage to keep this on the CPU. Some details here: https://github.com/guidance-ai/llguidance/blob/main/docs/opt...


Democracy is little more than a mechanism for non-violent transfer of power among elites.

We rely on things other than democracy to protect minorities. Institutions, laws, restraints on power; things the committed democrat believes are unjust constraints on the Will of the People.


I had to choose between California and Germany. It is a thing.


Did you have to choose? Or did you have the option? I would wager that a significant number of people in the US can't afford to move to another state.


Can you distinguish between having an option and needing to choose here? Having an option would necessarily create a need to choose.


I can confidently say yes. Choosing between working in two different countries separated by an entire ocean is an option. Moving to a different state is expensive for many, but moving to another continent is only afforded to a privileged minority.


Why is it relevant, though?


That is my underlying question; I'm trying to find out what the difference is in the poster's mind.


Interestingly, Germany and California are the third- and fourth-largest economies in the world, respectively.


I've used Linux laptops for work since 2013. I finally switched to Linux on the desktop earlier this year, after getting a laptop and experiencing Windows 11.

The laptop isn't running Linux yet, I'm not confident the battery lifetime story is great.

But, I settled on KDE as well. Gnome just wasn't configurable enough. There were a number of rough edges that I couldn't find a setting in Gnome to fix, so I switched over.

I'm running ZFS on root, so I can have snapshots (every 5 minutes) and incremental backups to my NAS, also running ZFS. I'm using ZFSBootMenu, which was interesting to set up; I learned a lot more about UEFI, framebuffer drivers, kexec kernel handoffs, etc. than I ever expected to.


> The laptop isn't running Linux yet, I'm not confident the battery lifetime story is great.

Depending on the laptop, you may be surprised. My HP EliteBooks (800 G8 series, AMD and Intel) are a far better experience on Linux than Windows; it's not even close. I'm thinking specifically about sleep, of all things.

The other day, my 2020 845 G8 (AMD) laptop crapped out during sleep while on Windows, but it was not actually dead, since it got hot to the point that it heated a different laptop lying underneath it (a 14" MBP, so a pretty chunky piece of metal). I had to forcefully power it off. I was under the impression that some Windows or driver update had fixed this, but apparently not. This has never happened on Linux, ever, which has been my main OS for this particular machine since day one; I never turn off the laptop, only reboot it for kernel updates. The Intel one is fairly reliable on Windows, but it did crash a few times (garbled screen).

Battery life on the Intel model is better under Linux (around +25%). On the AMD I can't comment, since I rarely use it on Windows, and basically never on battery.

At the office I have a 27" 5K screen which I have to use at 200% scaling. Windows is basically always a blurry mess for some reason, although it recognizes the correct resolution. The only way to be sure of sharp output is to boot it up with the screen attached, which then goes to hell when the screen shuts off (think going to the toilet). Wayland on Linux (sway / Arch) just works and is always sharp.

I also basically cannot connect my Sony Bluetooth headphones when running Windows. They connect instantly with LDAC under Linux.


My Lunar Lake Zenbook gets over 10 hours of battery on Windows. Maybe Linux is OK, but I don't get the impression it's in the same category, much less ahead. But I don't use my laptop for much more than browsing and WSL2, and most of that is SSH.


I had a 2020 Dell XPS 13 on which I installed Fedora. Battery life was an atrocious 4 hours at best, and I tried out all the power management tools as well. Windows was much better - not that I used it; I just kept the laptop plugged in and running Fedora. Still glad I returned it eventually though.

