KatrKat's comments

I think some domain-specific considerations include:

1. You need a really big in-memory data set that you touch ~all of several times for each request, so you really want to e.g. memory-map it and make sure it actually fits in memory on the machine.

2. If using a GPU, you have to make sure the GPU is hooked up to the serving process. You probably want your processes to be heavier-weight than they otherwise would be.

3. You might want to batch requests from several users for processing in the same stream of commands to the GPU. So you need to collect the right number of requests before processing any of them, without making any requests wait too long. You might need to sort these out by what inference parameters they want to override, and send them to different servers, because they might need to be batched accordingly.

4. You might want to stream the output more or less character by character. Possibly to several users, from one live run on a GPU, after having batched up enough requests to justify a run.

5. Content moderation when you are sending data to the user before you have even seen all of it yourself is an unsolved problem.
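To make point 3 a bit more concrete, here is a rough sketch of the "collect enough requests, but don't make anyone wait too long" logic, using only the Python standard library. The `Request` shape, the function names, and the `max_batch` / `max_wait_s` knobs are all made up for illustration; a real server would hand each group to whatever actually drives the GPU, and this is not how any particular serving stack does it.

```python
import queue
import time
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    # Hypothetical request shape: a prompt plus whatever inference
    # parameters the user overrode, stored as a hashable tuple of pairs.
    prompt: str
    params: tuple = ()  # e.g. (("temperature", 0.7),)


def collect_batch(q, max_batch, max_wait_s):
    """Block for the first request, then gather more until the batch is
    full or max_wait_s has passed since the first request arrived."""
    batch = [q.get()]  # wait as long as needed for the first request
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break
    return batch


def group_by_params(batch):
    """Split a batch into sub-batches whose requests share the same
    overridden parameters, so each group can be run together."""
    groups = defaultdict(list)
    for req in batch:
        groups[req.params].append(req)
    return dict(groups)
```

The first request starts the clock, so nobody waits longer than `max_wait_s` beyond their own arrival for the batch to close, and grouping happens afterwards so requests with incompatible parameters never end up in the same run.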


These still don't seem unique to LLMs.

1. Many GPU-based applications are already in production. I've seen work queues, which are used in any system where the load exceeds the capacity, GPU or not.

2. Content moderation is not unique to LLMs.

3. Training and serving users at inference time are different beasts.


It is, of course, tensor.


When PayPal goes to make the withdrawal, the bank doesn't have to let them and give the account a negative balance. Usually you can turn off overdraft and withdrawals will bounce if the funds aren't there.

This doesn't stop PayPal from reversing a previous inbound transfer they decide shouldn't have happened, but it ought to stop PayPal unilaterally deciding they would like some of your money.


So why not turn off overdraft from one's main bank account?


Yep! Twitter licenses your data from you, so they can run Twitter with it.


> Hey what should we train our AI to play? Cities: Skylines so it can fix traffic?

> Nah bro, let's teach it to shoot people in CS:GO!


How to make lots of money selling stuff that runs off a 30-60 watt power brick.


It doesn't have much in the way of graphics capability, though, so you need to hang a bloody great external GPU off it.

Without that, they're no better than my elderly Thinkpad that runs off a 25-watt power brick.


Not all proprietary software currently insists that you get into a TPM-mediated Dom/sub relationship with its developers. It's currently possible, and might be ethically necessary, not to buy those ones, and instead to buy the other ones.

But it's also probably important to pursue a political avenue as well. The government should absolutely not be using this stuff, and shouldn't be advising citizens to do it to access government services. We could even pass a law requiring purchased hardware and software to meet a fiduciary standard towards its users.


The solution is political action. You're describing a workaround that only a few of the most concerned people will be willing to use.


Agreed, the solution is to work with others toward a solution.

In my defense, I don't see it as a workaround - I wouldn't use it any other way.

There was a time when one didn't need to do this, but the internet has become more cumbersome to use. Easy-breezy... just make it muscle memory, and don't look back!


> I don't believe any entirely locked down firmware ever made it into any x86 board.

There are some Android x86 devices that won't boot unsigned firmware and won't let you change the signing keys. But I've only seen that in non-BIOS, non-UEFI devices.


iMessage can run over data, right? It sounds like the bugs exploited here were iMessage and WhatsApp holes, not weird mystery-baseband flaws (which are harder to patch but only ever affect a fraction of the phones you want to sell the ability to compromise). So similar Android exploits would just go right through the hotspot and compromise the Android device that does everything.

The only way out of this mess is actually correct code on actually correct hardware. Maybe you have to run Linux and Android at the top to run existing apps, but somewhere below there you need a supervisor that makes security guarantees that are actually true. You can't just port a monolithic C kernel onto hardware that's struggling to be faster than the competition and call it good.

Journalists need to buy communications equipment that doesn't come with that "NO WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE" line in the EULA. Sadly, it is not for sale.

