
I wonder if there are any reverse zip bombs? e.g. a really big .zip file that takes a long time to unzip but yields only a few bytes of content.

Like bomb the CPU time instead of memory.


Trivially. Zip file headers specify where the data is. All other bytes are ignored.

That's how self-extracting archives and installers work while remaining valid zip files. The extractor part is just a regular executable containing a zip decompressor that decompresses its own file.

This is specific to zip files, not the deflate algorithm.
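
A quick way to see this with Python's zipfile (toy example; the file name is made up):

    import io, zipfile

    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("hello.txt", "hi")

    # Zip readers locate entries via the end-of-central-directory record
    # near the tail, so prepended bytes are simply skipped over -- this is
    # exactly how exe-stub self-extractors stay valid zip files.
    padded = io.BytesIO(b"\x00" * 10_000_000 + buf.getvalue())
    with zipfile.ZipFile(padded) as z:
        print(z.read("hello.txt"))  # b'hi', from a ~10 MB file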


There are also deflate-specific tricks you can use - just spam empty non-final blocks ad infinitum.

    import zlib
    # 1,000 empty non-final stored blocks, then one empty final block
    zlib.decompress(b"\x00\x00\x00\xff\xff" * 1000 + b"\x03\x00", wbits=-15)
If you want to spin more CPU, you'd probably want to define random Huffman trees and then never use them.

I had Claude implement the random-Huffman-trees strategy and it works alright (~20MB/s decompression speed), but a minimal Huffman tree that only encodes the end-of-block symbol works out even slower (~10MB/s), presumably because each tree is more compact.

The minimal version boils down to:

    bytes.fromhex("04c001090000008020ffaf96") * 1000000 + b"\x03\x00"

That would be a big zip file, but would not take a long time to unzip.

Isn't that mathematically impossible?

I'm pretty sure it's mathematically guaranteed that you have to be bad at compressing something. You can't compress data to less than its entropy, so compressing totally random bytes (where entropy = size) would, with high probability, not compress at all, unless identifiable patterns appear in the data by sheer coincidence. Having established that the data is incompressible, the least bad option is to signal the decompressor to reproduce the data verbatim, without any compression, and the compressor increases the size of the data by including that signal somehow. Therefore there is always some input for a compressor that causes it to produce a larger output, even by some minuscule amount. (It's the pigeonhole principle: there are 2^n inputs of n bits but only 2^n - 1 possible outputs shorter than n bits, so no lossless compressor can shrink every input.)
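
You can watch this with zlib itself: on incompressible input, deflate falls back to stored blocks, so the framing overhead makes the output slightly larger than the input (quick sketch):

    import os, zlib

    data = os.urandom(1 << 20)   # 1 MiB of effectively incompressible bytes
    out = zlib.compress(data, level=9)
    print(len(out) - len(data))  # almost always positive: header + stored-block framing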

Why's that? I'm not really sure how DEFLATE works, but I can imagine a crappy compression scheme where "5 0" means "00000". So if you try to compress "0" you get "1 0", which is longer than the input. In fact, I bet this is true for any well-compressed format: zipping a JPEG XL image will probably yield something larger. Much larger, even. I don't know how you'd avoid that.
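
A toy run-length encoder along those lines makes the expansion concrete (sketch; names made up):

    def rle(s: str) -> str:
        # collapse each run of a repeated character into "<count><char>"
        out, i = [], 0
        while i < len(s):
            j = i
            while j < len(s) and s[j] == s[i]:
                j += 1
            out.append(f"{j - i}{s[i]}")
            i = j
        return "".join(out)

    print(rle("00000"))  # "50" -- shorter
    print(rle("0"))      # "10" -- longer than the input
    print(rle("0101"))   # "10111011" -- double the input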

This is a surprisingly good read on how LLMs work in general.

It’s funny, I didn’t set out for that to be the case. When I pitched the idea internally, I wanted to scratch my own itch (what on earth is a cached token?) and produce a good post. But then I realised I had to go deeper and deeper to get to my answer and accidentally made a very long explainer.

Thanks for the post, it's near perfect in focus, detail and how it's written.

EDIT: You have some minor typos in the post (psuedocode)


I am out of the loop, but what AI features does Firefox offer these days?

could `Sec-Fetch-Dest: image` mitigate this?
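
For what it's worth, a server-side check might look roughly like this (a sketch with Flask; the route is hypothetical, and non-browser clients omit the header entirely):

    from flask import Flask, abort, request, send_from_directory

    app = Flask(__name__)

    @app.route("/images/<path:name>")
    def serve_image(name):
        # Browsers send Sec-Fetch-Dest: image for <img> loads; reject other
        # destinations (document navigations, fetch(), ...). The default lets
        # clients that don't send the header (old browsers, curl) through.
        if request.headers.get("Sec-Fetch-Dest", "image") != "image":
            abort(403)
        return send_from_directory("images", name)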

I might get heavily downvoted, but since official "extension stores" are handicapped, people should re-invent DLL-injection ad blockers.

> An API you rely on changes, is deprecated, etc

Formal verification will eventually lead to good, stable API design.

> Users use something in unexpected ways

> Complex behavior between interconnected systems

It happens when there's no formal verification during the design stage.

Formal verification literally means covering 100% of state changes: for every possible input/output, every execution branch should be checked.
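
As a toy illustration of exhaustive coverage, an explicit-state explorer (all names invented for the sketch):

    from collections import deque

    def check_all_states(initial, transitions, invariant):
        # breadth-first walk over every reachable state,
        # asserting the invariant on each one
        seen, queue = {initial}, deque([initial])
        while queue:
            state = queue.popleft()
            assert invariant(state), f"invariant violated at {state}"
            for nxt in transitions(state):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return len(seen)

    # a wrapping 2-bit counter must always stay below 4
    print(check_all_states(0, lambda s: [(s + 1) % 4], lambda s: s < 4))  # 4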


Formal verification has nothing to do with the quality of the API.

Given the spec, formal verification can tell you if your implementation follows the spec. It cannot tell you if the spec is good.


That's something I agree with.

I am currently working on an offline API client: https://voiden.md/. I wonder if this could be a feature.


> It cannot tell you if the spec is good

I beg to differ: if a spec is hard to verify, that's a bad sign.


All non-trivial specs, like the one for seL4, are hard to verify. Lots of that complexity comes from interacting with the rest of the world, which is a huge shared mutable global state you can't afford to ignore.

Of course, you can declare that the world itself is inherently sinful and imperfect, and is not ready for your beautiful theories but seriously.


> Of course, you can declare that the world itself is inherently sinful and imperfect, and is not ready for your beautiful theories

i see we are both familiar with haskellers (friendly joke!)


it can tell you if your spec is bad, but it can't tell you if your spec is good

That's one problem of many solved; isn't that good?

That the spec solves the problem is called validation in my domain and treated explicitly with different methods.

We use formal verification to check for invariants, but also for properties like "it must return a value xor an error, but never just hang".
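
That last property can at least be probed dynamically, even without a prover; a rough harness (hypothetical names; assumes fn is a module-level function so it can be forked/spawned):

    import multiprocessing as mp

    def check_terminates(fn, inputs, timeout=1.0):
        # "value xor error, never hang": returning OR raising both end the
        # process; only a hang is still alive when the timeout expires
        for x in inputs:
            p = mp.Process(target=fn, args=(x,))
            p.start()
            p.join(timeout)
            if p.is_alive():
                p.terminate()
                raise AssertionError(f"possible hang on input {x!r}")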


> Formal verification will eventually lead to good, stable API design.

Why? Has it ever happened like this? Because to me it would seem that if the system is verified to work, then it works no matter how the API is shaped, so there is no incentive to change it to something better.


> if the system is verified to work, then it works no matter how the API is shaped

That's the case for one-off integrations, but the messy part always comes when the system's goal changes.

Let's say formal verification could help to avoid some anti-patterns.


> Let's say formal verification could help to avoid some anti-patterns.

I'd still like to hear about the actual mechanism of this happening. Because I personally find it much easier to believe that the moment keeping the formal verification up to date becomes untenable for whatever reason (specs changing too fast, external APIs being too baroque, etc.), people would rather say "okay, guess we ditch the formal verification and just keep maintaining the integration tests" instead of "let's change everything about the external world so we could keep our methodology".


> I'd still like to hear about the actual mechanism of this happening

I am not an expert on this, but the worst APIs I've seen are those with hidden state.

e.g. a .toggle() API. Call it an odd number of times and it goes to one state; call it an even number of times and it goes back.

And there are "call A before you call B" types of APIs, where the client has to keep to a strict call order (which is itself a state machine of some kind).
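
A toy contrast between the two styles (hypothetical names):

    class Switch:
        def __init__(self):
            self._on = False

        def toggle(self):
            # hidden state: the outcome depends on how many times
            # anyone, anywhere, has called this before you
            self._on = not self._on

        def set_on(self, on: bool):
            # explicit state: idempotent, order-independent, and easy
            # to specify ("after set_on(x), the state is x")
            self._on = on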


> I am not an expert on this, but the worst APIs I've seen are those with hidden state.

> e.g. a .toggle() API. Call it an odd number of times and it goes to one state; call it an even number of times and it goes back.

This is literally a dumb light switch. If you have trouble proving that, starting from lights off, flicking a simple switch twice will still keep the lights off, then, well, I have bad news for you about the feasibility of using formal methods for anything more complex than a dumb light switch. Because the rest of the world is a very complex and stateful place.

> (which itself is a state machine of some kind)

Yes? That's pretty much the raison d'être of the formal methods: for anything pure and immutable, normal intuition is usually more than enough; it's tracking the paths through enormous configuration spaces that our intuition has problems with. If the formal methods can't help with that for a comparable amount of effort, then they are just not worth it.


At that point you create an entirely new API, fully versioned, and backward compatible (if you want it to be). The point the article is making is that AI, in theory, entirely removes the person from the coding process, so there's no longer any need to maintain software. You can just rebuild the part you're changing from scratch every time because the cost of writing bug-free code (effectively) goes to zero.

The theory is entirely correct. If a machine can write provably perfect code there is absolutely no reason to have people write code. The problem is that the 'If' is so big it can be seen from space.


Isn’t this where the Eiffel design by contract people speak up about code reuse?

100% of the state changes in business software are unknowable on a long horizon; knowing them relies on thoroughly understanding business logic that is often fuzzy, not discrete and certain.

Formal verification does not guarantee that business logic works as everybody expected, nor that it's future-proof. However, it does provide a workable path towards:

Things can only happen if you allow them to happen.

In other words, your software may reach a stage where it's no longer applicable, but it never crashes.

Formal verification has had little adoption only because it costs 23x the effort of your original code, and requires "PhD-level training".


The reason it doesn't work is businesses change faster than you can model every detail AND keep it all up to date. Unless you have something tying your model directly to every business decision and transaction that happens, your model will never be accurate. And if we're talking about formal verification, that makes it useless.

Google is doing code review on extensions?

I’m not sure, but whenever I cut a new release I upload my extension code and it goes through a review period before they publish.

Can you just host it on GitHub Pages instead of an 8-minute YouTube video?

Yes, that would be great

Perhaps in the future, coding will be done in rich text editors.

As Hinton said on a podcast, humans can only learn at a rate of a few bits per second. Memory and natural language have very limited bandwidth.
