I'm excited to see this. I've been using LiteLLM, but it's honestly a huge mess once you peek under the hood, and it's being developed very iteratively and not very carefully. For example, for several months recently (haven't checked in ~a month though), their Ollama structured outputs were completely botched and just straight up broken. Docs are a hot mess, etc.
To be fair, step 4 isn't a real step, step 1 is just buying the "hub" or "border router" or whatever, and steps 2 and 3 are the same for Zigbee and Matter, the button is just somewhere else.
A typical consumer has bought a zigbee hub (like they'd need to buy a thread border router), then uses their phone to press a button in the app, and then presses a button on the device. Still dead simple, and it doesn't require flaky bluetooth from their phone, which in 2025 most androids still suffer from.
Using KV in the caching context is a bit confusing because it usually means key-value in the storage sense of the word (like Redis), but for LLMs, it means the key and value tensors. So IIUC, the cache will store the results of the K and V matrix multiplications for a given prompt and the only computation that needs to be done is the Q and attention calculations.
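To make that concrete, here's a rough NumPy sketch of the idea (toy single-head attention, random stand-in weights, no positional encoding): the prompt's K and V rows are computed once and cached, and each decode step only computes its own Q/K/V row plus the attention itself.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # head dimension

# Stand-ins for trained projection weights.
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def attend(q, K, V):
    """Single-head attention: one query against all cached keys/values."""
    scores = q @ K.T / np.sqrt(d)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

# Prefill: compute K and V for the prompt tokens once, keep them around.
prompt = rng.standard_normal((5, d))  # 5 token embeddings
K_cache = prompt @ Wk
V_cache = prompt @ Wv

# Decode: the new token contributes one new K/V row; the prompt's rows
# come straight from the cache instead of being recomputed every step.
new_tok = rng.standard_normal(d)
K_cache = np.vstack([K_cache, new_tok @ Wk])
V_cache = np.vstack([V_cache, new_tok @ Wv])
out = attend(new_tok @ Wq, K_cache, V_cache)
```

Without the cache, every generated token would redo the K/V matmuls for the entire prefix; with it, the per-step work is just the new token's projections and the attention over the cached rows.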
Do you mean that they provide the same answer to verbatim-equivalent questions, and pull the answer out of storage instead of recalculating each time? I've always wondered if they did this.
I bet there is a set of repetitive one- or two-question user requests that makes up a sizeable share of all requests. The models are so expensive to run that 1% would be enough. Much less than 1%. To make it less obvious they probably have a big set of response variants. I don't see how they would not do this.
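Something like this, purely hypothetical (the questions, keys, and variants here are made up for illustration): keep a table of canned answers per common question and pick a random variant so the caching is less obvious.

```python
import random

# Hypothetical cache: canonical question -> pre-generated response variants.
cache = {
    "what is the capital of france": [
        "The capital of France is Paris.",
        "That would be Paris.",
        "Paris is the capital of France.",
    ],
}

def answer(question):
    """Return a cached variant, or None on a miss (fall through to the model)."""
    key = question.strip().lower().rstrip("?!.")
    variants = cache.get(key)
    if variants:
        return random.choice(variants)  # vary the reply to mask the cache
    return None
```

A cache hit never touches the model at all; the randomness only picks between pre-generated strings.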
They probably also have cheap code or cheap models that normalize requests to increase cache hit rate.
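Even trivial normalization goes a long way here; a sketch of the cheap-code version (the rules are my guesses, not anything a provider has documented):

```python
import re

def normalize(prompt):
    """Cheap canonicalization so near-identical requests share a cache key."""
    p = prompt.strip().lower()
    p = re.sub(r"\s+", " ", p)        # collapse runs of whitespace
    p = re.sub(r"[?!.,;:]+$", "", p)  # drop trailing punctuation
    return p

# All three map to the same cache key: "what is 2+2"
a = normalize("What is 2+2?")
b = normalize("  what   is 2+2 ")
c = normalize("What is 2+2!!")
```

A cheap model could go further (paraphrase detection, embedding similarity), but string-level tricks like this alone probably catch a lot of duplicates.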
Could you not cache the top k outputs given a provided input token set? I thought the randomness was applied at the end by sampling the output distribution.
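Sketching what I mean, with made-up token ids and probabilities: cache the top-k next-token distribution the model produced for a given input, and only re-run the cheap sampling step per request.

```python
import random

# Hypothetical cache: input token ids -> top-k (token, probability) pairs,
# filled once by the expensive forward pass.
topk_cache = {
    (101, 7592): [("world", 0.6), ("there", 0.3), ("friend", 0.1)],
}

def sample_next(tokens):
    """Reuse the cached distribution; only the sampling randomness re-runs."""
    dist = topk_cache[tokens]
    words, probs = zip(*dist)
    return random.choices(words, weights=probs, k=1)[0]
```

Same cached distribution, potentially a different sample on every call, so the outputs still look varied even though the model only ran once.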
> If you want youtube (or any other platform) to not suck...pay for it.
If we are going for solutions where your individual decision makes no impact on the system in place, then let's go big: ditch youtube and host your content on one of the alternatives.
I really liked the first chapter: the juxtaposition between ideas on how "AI"/algorithms/etc work and how humans work. I enjoyed how it was flipped around.
The second chapter was very disappointing and lost the intrigue I had built from the first.
Does this not require one to trust the hardware? I'm not an expert in hardware root of trust, etc, but if Intel (or whatever chip maker) decides to just sign code that doesn't do what they say it does (coerced or otherwise) or someone finds a vuln; would that not defeat the whole purpose?
I'm not entirely sure this is different than "security by contract", except the contracts get bigger and have more technology around them?
We have to trust the hardware manufacturer (Intel/AMD/NVIDIA) designed their chips to execute the instructions we inspect, so we're assuming trust in vendor silicon either way.
The real benefit of confidential computing is to extend that trust to the source code too (the inference server, OS, firmware).
Hi Nate. I routinely use your various networking-related FOSS tools. Surprised to see you now work in the AI infrastructure space, let alone co-founding a startup funded by YC! Tinfoil looks über neat. All the best (:
I think the OP's point here was that if it's a PR and it's ignored, you spent a bunch of time writing a PR (which may or may not have been valuable to you, e.g. if you now maintain a fork). On the other hand, if it was an esoteric contribution process, you spent a lot of time figuring out how to get the patch in there, and that knowledge has zero value outside contributing to that particular open source project.
I agree, and I also share your experience (guess I was a bit earlier with PHP).
I think what's left out though is that this is the experience of those who are really interested and for whom "it's not satisfying" to stay there.
As tech has turned into a money-maker, people aren't doing it for the satisfaction, they are doing it for the money. That appears to cause more corner cutting and less learning of what's underneath, in favor of just applying the quickest fix that SO/LLM/whatever gives you.