Trivially. Zip file headers specify where the data is. All other bytes are ignored.
That's how self extraction archives and installers work and are also valid zip files. The extractor part is just a regular executable that is a zip decompresser that decompresses itself.
This is specific to zip files, not the deflate algorithm.
I had claude implement the random-huffman-trees strategy and it works alright (~20MB/s decompression speed), but a minimal huffman tree that only encodes the end symbol works out even slower (~10MB/s), presumably because each tree is more compact.
I'm pretty sure it's mathematically guaranteed that you have to be bad at compressing something. You can't compress data to less than its entropy, so compressing totally random bytes (where entropy = size) would have a high probability of not compressing at all, if no identifiable patterns appear in the data by sheer coincidence. Establishing then that you have incompressible data, the least bad option would be to signal to the decompressor to reproduce the data verbatim, without any compression. The compressor would increase the size of the data by including that signal somehow. Therefore there is always some input for a compressor that causes it to produce a larger output, even by some miniscule amount.
Why's that? I'm not really sure how DEFLATE works but I can imagine a crappy compression that's like "5 0" means "00000". So if you try to compress "0" you get "1 0" which is longer than the input. In fact, I bet this is true for any well-compressed format. Like zipping a JpegXL image will probably yield something larger. Much larger.. I don't know how you do that.
It’s funny, I didn’t set out for that to be the case. When I pitched the idea internally, I wanted to scratch my own itch (what on earth is a cached token?) and produce a good post. But then I realised I had to go deeper and deeper to get to my answer and accidentally made a very long explainer.
All non-trivial specs, like the one for seL4, are hard to verify. Lots of that complexity comes from interacting with the rest of the world which is a huge shared mutable global state you can't afford to ignore.
Of course, you can declare that the world itself is inherently sinful and imperfect, and is not ready for your beautiful theories but seriously.
> Formal verification will eventually lead to good, stable API design.
Why? Has it ever happened like this? Because to me it would seem that if the system verified to work, then it works no matter how API is shaped, so there is no incentive to change it to something better.
> Let's say formal verification could help to avoid some anti-patterns.
I'd still like to hear about the actual mechanism of this happening. Because I personally find it much easier to believe that the moment keeping the formal verification up to date becomes untenable for whatever reason (specs changing too fast, external APIs to use are too baroque, etc) people would rather say "okay, guess we ditch the formal verification and just keep maintaining the integration tests" instead of "let's change everything about the external world so we could keep our methodology".
> I am not an expert on this, but the worst API I've seen is those with hidden states.
> e.g. .toggle() API. Call it old number of times, it goes to one state, call it even number of times, it goes back.
This is literally a dumb light switch. If you have trouble proving that, starting from lights off, flicking a simple switch twice will still keep lights off then, well, I have bad news to tell you about the feasibility of using the formal methods for anything more complex than a dumb light switch. Because the rest of the world is a very complex and stateful place.
> (which itself is a state machine of some kind)
Yes? That's pretty much the raison d'être of the formal methods: for anything pure and immutable, normal intuition is usually more than enough; it's tracking the paths through enormous configuration spaces that our intuition has problem with. If the formal methods can't help with that with comparable amount of effort, then they are just not worth it.
At that point you create an entirely new API, fully versioned, and backwardly compatible (if you want it to be). The point the article is making is that AI, in theory, entirely removes the person from the coding process so there's no longer any need to maintain software. You can just make the part you're changing from scratch every time because the cost of writing bug-free code (effectively) goes to zero.
The theory is entirely correct. If a machine can write provably perfect code there is absolutely no reason to have people write code. The problem is that the 'If' is so big it can be seen from space.
100% of state changes in business software is unknowable on a long horizon, and relies on thoroughly understanding business logic that is often fuzzy, not discrete and certain.
Formal verification does not gurantee business logic works as everybody expected, nor its future proof, however, it does provide a workable path towards:
Things can only happen if only you allow it to happen.
It other words, your software may come to a stage where it's no longer applicable, but it never crashes.
Formal verification had little adoption only because it costs 23x of your original code with "PhD-level training"
The reason it doesn't work is businesses change faster than you can model every detail AND keep it all up to date. Unless you have something tying your model directly to every business decision and transaction that happens, your model will never be accurate. And if we're talking about formal verification, that makes it useless.
Like bomb the CPU time instead of memory.
reply