I'm excited to see this. I've been using LiteLLM, but it's honestly a huge mess once you peek under the hood, and it's being developed very iteratively and not very carefully. For example, for several months recently (haven't checked in ~a month though), their Ollama structured outputs were completely botched and just straight up broken. Docs are a hot mess, etc.
To be fair, step 4 isn't a real step, step 1 is just buying the "hub" or "border router" or whatever, and steps 2 and 3 are the same for Zigbee and Matter, the button is just somewhere else.
A typical consumer has bought a zigbee hub (like they'd need to buy a thread border router), then uses their phone to press a button in the app, and then presses a button on the device. Still dead simple, and it doesn't require flaky bluetooth from their phone, which in 2025 most androids still suffer from.
Using KV in the caching context is a bit confusing because it usually means key-value in the storage sense of the word (like Redis), but for LLMs, it means the key and value tensors. So IIUC, the cache will store the results of the K and V matrix multiplications for a given prompt and the only computation that needs to be done is the Q and attention calculations.
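To make that concrete, here's a rough NumPy sketch of the idea (toy single-head attention, random stand-in weights, no positional encoding): the prompt's K and V rows are computed once and cached, and each decode step only computes its own Q/K/V row plus the attention itself.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # head dimension

# Stand-ins for trained projection weights.
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def attend(q, K, V):
    """Single-head attention: one query against all cached keys/values."""
    scores = q @ K.T / np.sqrt(d)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

# Prefill: compute K and V for the prompt tokens once, keep them around.
prompt = rng.standard_normal((5, d))  # 5 token embeddings
K_cache = prompt @ Wk
V_cache = prompt @ Wv

# Decode: the new token contributes one new K/V row; the prompt's rows
# come straight from the cache instead of being recomputed every step.
new_tok = rng.standard_normal(d)
K_cache = np.vstack([K_cache, new_tok @ Wk])
V_cache = np.vstack([V_cache, new_tok @ Wv])
out = attend(new_tok @ Wq, K_cache, V_cache)
```

Without the cache, every generated token would redo the K/V matmuls for the entire prefix; with it, the per-step work is just the new token's projections and the attention over the cached rows.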
Do you mean that they provide the same answer to verbatim-equivalent questions, and pull the answer out of storage instead of recalculating each time? I've always wondered if they did this.
I bet there is a set of repetitive one- or two-question user requests that makes up a sizeable share of all requests. The models are so expensive to run that 1% would be enough. Much less than 1%. To make it less obvious they probably have a big set of response variants. I don't see how they would not do this.
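Something like this, purely hypothetical (the questions, keys, and variants here are made up for illustration): keep a table of canned answers per common question and pick a random variant so the caching is less obvious.

```python
import random

# Hypothetical cache: canonical question -> pre-generated response variants.
cache = {
    "what is the capital of france": [
        "The capital of France is Paris.",
        "That would be Paris.",
        "Paris is the capital of France.",
    ],
}

def answer(question):
    """Return a cached variant, or None on a miss (fall through to the model)."""
    key = question.strip().lower().rstrip("?!.")
    variants = cache.get(key)
    if variants:
        return random.choice(variants)  # vary the reply to mask the cache
    return None
```

A cache hit never touches the model at all; the randomness only picks between pre-generated strings.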
They probably also have cheap code or cheap models that normalize requests to increase cache hit rate.
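Even trivial normalization goes a long way here; a sketch of the cheap-code version (the rules are my guesses, not anything a provider has documented):

```python
import re

def normalize(prompt):
    """Cheap canonicalization so near-identical requests share a cache key."""
    p = prompt.strip().lower()
    p = re.sub(r"\s+", " ", p)        # collapse runs of whitespace
    p = re.sub(r"[?!.,;:]+$", "", p)  # drop trailing punctuation
    return p

# All three map to the same cache key: "what is 2+2"
a = normalize("What is 2+2?")
b = normalize("  what   is 2+2 ")
c = normalize("What is 2+2!!")
```

A cheap model could go further (paraphrase detection, embedding similarity), but string-level tricks like this alone probably catch a lot of duplicates.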
Could you not cache the top k outputs given a provided input token set? I thought the randomness was applied at the end by sampling the output distribution.
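Sketching what I mean, with made-up token ids and probabilities: cache the top-k next-token distribution the model produced for a given input, and only re-run the cheap sampling step per request.

```python
import random

# Hypothetical cache: input token ids -> top-k (token, probability) pairs,
# filled once by the expensive forward pass.
topk_cache = {
    (101, 7592): [("world", 0.6), ("there", 0.3), ("friend", 0.1)],
}

def sample_next(tokens):
    """Reuse the cached distribution; only the sampling randomness re-runs."""
    dist = topk_cache[tokens]
    words, probs = zip(*dist)
    return random.choices(words, weights=probs, k=1)[0]
```

Same cached distribution, potentially a different sample on every call, so the outputs still look varied even though the model only ran once.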
> If you want youtube (or any other platform) to not suck...pay for it.
If we are going for solutions where your individual decision makes no impact on the system in place, then let's go big: ditch youtube and host your content on one of the alternatives.
I really liked the first chapter: the juxtaposition between ideas on how "AI"/algorithms/etc work and how humans work. I enjoyed how it was flipped around.
The second chapter was very disappointing and lost the intrigue I had built from the first.
Does this not require one to trust the hardware? I'm not an expert in hardware root of trust, etc, but if Intel (or whatever chip maker) decides to just sign code that doesn't do what they say it does (coerced or otherwise) or someone finds a vuln; would that not defeat the whole purpose?
I'm not entirely sure this is different than "security by contract", except the contracts get bigger and have more technology around them?
We have to trust the hardware manufacturer (Intel/AMD/NVIDIA) designed their chips to execute the instructions we inspect, so we're assuming trust in vendor silicon either way.
The real benefit of confidential computing is to extend that trust to the source code too (the inference server, OS, firmware).
Hi Nate. I routinely use your various networking-related FOSS tools. Surprised to see you now work in the AI infrastructure space, let alone co-founding a startup funded by YC! Tinfoil looks über neat. All the best (:
I think the OP's point here was that if it's a PR and it's ignored, you spent a bunch of time writing a PR (which may or may not have been valuable to you, e.g. if you now maintain a fork). On the other hand, if it was an esoteric contribution process, you spent a lot of time figuring out how to get the patch in there, and that knowledge has zero value outside contributing to that particular open source project.
I agree, and I also share your experience (guess I was a bit earlier with PHP).
I think what's left out though is that this is the experience of those who are really interested and for whom "it's not satisfying" to stay there.
As tech has turned into a money-maker, people aren't doing it for the satisfaction, they are doing it for the money. That appears to cause more corner cutting and less learning of what's underneath, in favor of just applying the quickest fix that SO/LLM/whatever gives you.