The world's smallest PNG (evanhahn.com)
112 points by thunderbong on Feb 3, 2024 | 44 comments


The article kind of glosses over the PNG signature. There is actually a lot of thought behind using these exact 8 bytes:

The first byte is 0x89, meant to tell any text editor that this isn't a text file, and to prevent text files from being recognized as PNG. It also detects if you mistakenly transferred the file in some ASCII file mode that clears the top bit.

The next three bytes are PNG in ASCII, meant for humans looking at it in a hex editor.

The next two bytes are a Windows-style line ending (CR LF, or 0D 0A). This breaks the signature if the file mistakenly undergoes Windows-to-Unix line-ending conversion.

The next byte is 0x1A, the DOS end-of-file character. This causes DOS tools like TYPE to stop printing the file.

The last byte is a Unix-style line break, to detect Unix-to-Windows line-ending conversions.
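Putting the design together: here's a quick Python sketch (my own illustration, not from the rationale doc) showing that each of the classic transfer mishaps described above breaks the signature:

    PNG_SIG = b"\x89PNG\r\n\x1a\n"  # the 8-byte PNG signature

    # ASCII-mode transfer that clears the top bit: 0x89 becomes 0x09
    seven_bit = bytes(b & 0x7F for b in PNG_SIG)

    # Unix -> Windows line conversion: every LF becomes CR LF
    lf_to_crlf = PNG_SIG.replace(b"\n", b"\r\n")

    # Windows -> Unix line conversion: every CR LF becomes LF
    crlf_to_lf = PNG_SIG.replace(b"\r\n", b"\n")

    for name, mangled in [("7-bit", seven_bit),
                          ("LF->CRLF", lf_to_crlf),
                          ("CRLF->LF", crlf_to_lf)]:
        print(name, "signature intact:", mangled.startswith(PNG_SIG))  # False each time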

To make the validation process complete, a reader can check the integrity of the last chunk (IEND). Not only does it provide a fixed sequence of 12 bytes you can use to check that the file is complete, the fact that its length field is 0 lets you verify that zero bytes were transferred correctly. Of course you would likely notice anyway because of the chunk checksums, but IEND allows you to fail fast and makes it easier to diagnose the actual issue.
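For example (my sketch, not from the spec), those fixed 12 bytes make a fail-fast truncation check a one-liner; the trailing CRC is just the CRC-32 of the four type bytes "IEND":

    import zlib

    # The last 12 bytes of every valid PNG: zero length, "IEND", CRC of "IEND"
    IEND = b"\x00\x00\x00\x00IEND" + zlib.crc32(b"IEND").to_bytes(4, "big")
    assert IEND.hex() == "0000000049454e44ae426082"

    def looks_complete(path):
        # Cheap completeness check: was the file truncated in transit?
        with open(path, "rb") as f:
            f.seek(-12, 2)  # 12 bytes from the end
            return f.read(12) == IEND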

http://libpng.org/pub/png/spec/1.0/PNG-Rationale.html#R.PNG-...


I thought this was brilliant when I first learned it, but I've since come to think it's ridiculous. JPEG doesn't have any of these error-detection features; have you ever missed them? It seems to be the same mindset that gave us double-checksummed IDATs.

Perhaps there was a world once where this made sense, but we aren't living in it now.


PNG definitely came out in a world where this made sense. An FTP client set to the wrong mode could easily corrupt everything.


Interesting, clever and beautiful. Now, why did they have to be so defensive against corruption? What purpose does it serve?

If I had to detect corruption I would probably use something like CRC, it seems more complete and more robust.


Not all modem standards had built-in error detection. If you got garbage or dropouts because of line noise or someone else picking up a phone on the same line, that was all you got; the sending side knew nothing about it. For reliable transfer of binary files, software on both sides had to implement X/Y/Zmodem or some other protocol that added metadata for correctness checks. Files that were cut short when copying over the modem or from one floppy to another, or because the process moving the data died for some reason, were also not uncommon.

Historical 7-bit protocols were also still in use. Even though the systems they originated on were a thing of the past and links carried 8-bit characters, existing software handling those protocols often mangled the “spare” highest bit for its own reasons.

In simple terms, the PNG header made it possible to check that the file had always been handled in binary mode, and maybe even to revert some types of damage. Those were the last days of the distinction between terminal-style “text mode” processing (with hardware or software components in the middle silently converting code pages, line endings, and control codes if they differed between the two sides) and “binary mode” direct copying, soon to become the only expected option. People writing binary data to files today almost never invent means of escaping certain “special” values in anticipation of possible transfer over this or that medium.


Reply from a sibling comment of yours: https://news.ycombinator.com/item?id=39246642

> An FTP client set to the wrong mode could easily corrupt everything.

When PNG came out, that was actually a super common occurrence. And specifically, it would corrupt the newline characters.


At first I thought this was kind of pointless, but I actually think this is a decent way to get to the basic principles of a file type. A sort of minimal-working-example of a file, giving you an overview of how the file works and what is required to get something valid, if uninteresting.


Yeah, this is usually the kind of thing I end up implementing on a first pass for something where I don't want to just use a library (usually "for fun" hacking projects).


I use it to get rid of the "favicon.ico not found" warning in the console.


A 1 TB favicon dumped at the root is an interesting problem for crawlers. Would they make the effort of dropping the connection on a timeout, or proceed to gobble up the entire thing?


So the one thing this article doesn't explicitly justify is whether the ten-byte zlib string is truly the shortest possible. You could imagine that it might be possible to hand-craft a DEFLATE block which is only three bytes instead of four, and yet still decompresses to at least two bytes (the minimum decompressed size of a PNG scanline).

But by exhaustive search of all DEFLATE blocks that are one to three bytes in length, I can confirm the article is correct. All of them decompress to (at most) one byte of data.

Which isn't a surprising result, but I wasn't actually sure of it before trying it out.
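For anyone who wants to reproduce it, the search is simple with Python's zlib in raw-DEFLATE mode (a sketch of the idea, not the commenter's actual code; wbits=-15 disables the zlib wrapper, and brute-forcing all 2^24 three-byte streams takes a few minutes):

    import zlib
    from itertools import product

    longest = 0
    for n in (1, 2, 3):
        for combo in product(range(256), repeat=n):  # every possible n-byte stream
            d = zlib.decompressobj(wbits=-15)        # raw DEFLATE, no zlib header/trailer
            try:
                out = d.decompress(bytes(combo))
            except zlib.error:
                continue
            if d.eof:                                # the final block terminated cleanly
                longest = max(longest, len(out))

    print(longest)  # 1: no DEFLATE stream of <= 3 bytes yields 2+ bytes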


> this article doesn't explicitly justify is whether the ten-byte zlib string is truly the shortest possible

I know the DEFLATE standard intimately and implemented a decoder and encoder (open source on my website/GitHub).

We know that the smallest image requires 2 bytes of raw data: 1 for the row filter and 1 for the pixel(s). For simplicity, we'll assume the data is (hex) 00 00.

A DEFLATE compressed stream must have at least one block. There are three choices for each block's type: uncompressed, fixed Huffman code, and custom Huffman code. An uncompressed block requires 1 byte for {finality (1 bit), type (2 bits), padding (5 bits)} and 4 bytes for the length (LEN plus its one's-complement copy NLEN), so that uses 5 bytes of overhead already. Using a fixed Huffman code block lets us encode those 2 bytes as 4 bytes. Using a custom Huffman code requires many, many bytes just to specify the Huffman table. Hence, the OP article's approach is justified.
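You can watch zlib make exactly that choice (a quick demo of the arithmetic above; output assumes a typical zlib build):

    import zlib

    raw = b"\x00\x00"       # filter byte + one grayscale pixel
    z = zlib.compress(raw, 9)
    print(len(z), z.hex())  # 10 bytes: 2 zlib header + 4 fixed-Huffman DEFLATE + 4 Adler-32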


See also: The Biggest Smallest PNG https://news.ycombinator.com/item?id=38908956

And "The smallest 256x256 single-color PNG file, and where you've seen it" https://www.mjt.me.uk/posts/smallest-png/


This diagram would serve as a helpful companion to the article:

https://www.nayuki.io/res/dumb-png-output-java/png-file-form...

> I won’t go in depth on DEFLATE here (in part because I am not an expert^4)

> 4. If you are an expert, please contact me. I want to learn!!

You already linked to "An Explanation of the DEFLATE Algorithm", which shows that you've been searching on the web already. I'm not sure if it'll make a difference, but here are a few more resources: https://www.euccas.me/zlib/ , https://en.wikipedia.org/wiki/Deflate


DEFLATE is basically huffman(lz(X)), which is obvious enough. The part that no one ever seems to motivate is how precisely you fit those together, i.e. why there is one tree for literals/lengths, another for distances, plus the extra bits.


> how precisely you fit those together

I mean, read the DEFLATE spec; it's rather short compared to modern formats (LZMA, Brotli, Zstandard, etc.).

The RFC version is "16 pages" long: https://www.rfc-editor.org/rfc/rfc1951.html . And here's my fancy HTML version: https://www.nayuki.io/page/deflate-specification-v1-3-html .


I know how it works. Like I said, I don't know the motivation for why that is the best way to connect LZ and Huffman coding together.


I should note that it's hardly the best way, but it's easiest to think of DEFLATE as a layered algorithm: you catch repetitions via LZSS and code the remaining information with Huffman. You have two kinds of codes because they have very different distributions, so it's beneficial to split them (and it's not surprising to see tens or hundreds of distributions in more sophisticated formats).

And the extra bits are there because longer distances in LZSS are typically opportunistic, so individual values have low frequency (i.e., roughly Zipfian). Exact distances 1280 and 1281 might each appear only once, but distances 1200-1299 together may appear frequently enough that it pays to have a single code for the whole range plus a two-digit adjustment. There are many other ways to model distance distributions; for example, a set of codes for the most recently used distances is common, but DEFLATE is too old for that.
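To make the 1280/1281 example concrete, here is a sketch using the distance table from RFC 1951 section 3.2.5; both distances map to the same Huffman symbol and differ only in their extra bits:

    # RFC 1951 distance codes: (base distance, number of extra bits)
    DIST_BASE = [1, 2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193,
                 257, 385, 513, 769, 1025, 1537, 2049, 3073, 4097, 6145,
                 8193, 12289, 16385, 24577]
    DIST_EXTRA = [0, 0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7,
                  8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13]

    def dist_code(distance):
        # Return (code, extra_bit_count, extra_value) for an LZ77 distance.
        for code in reversed(range(30)):
            if distance >= DIST_BASE[code]:
                return code, DIST_EXTRA[code], distance - DIST_BASE[code]
        raise ValueError("distance out of range")

    print(dist_code(1280))  # (20, 9, 255): distances 1025-1536 all share code 20
    print(dist_code(1281))  # (20, 9, 256): same code, different extra bits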


> Every single PNG, including this one, starts with the same 8 bytes.

This might be common on other platforms as well, but I know that Apple uses a series of tests to determine what type an image is, rather than trusting the extension.

As an experiment, change a PNG to have a .jpeg extension on macOS and double-click it. Preview, unsurprisingly, launches, but if you Get Info on the image in Preview, it indicates it is a PNG. Of course.


UNIX systems, like macOS and Linux, use file[0] (or the libmagic library behind it) to determine the format of a file (or the shebang, if the file has one).

[0] https://en.m.wikipedia.org/wiki/File_(command)
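A hand-rolled sketch of the same idea (a few magic numbers of my choosing, rather than the real libmagic database):

    MAGIC = {
        b"\x89PNG\r\n\x1a\n": "image/png",
        b"\xff\xd8\xff":      "image/jpeg",
        b"GIF87a":            "image/gif",
        b"GIF89a":            "image/gif",
    }

    def sniff(path):
        # Guess an image type from its leading bytes, ignoring the extension.
        with open(path, "rb") as f:
            head = f.read(8)
        for magic, mime in MAGIC.items():
            if head.startswith(magic):
                return mime
        return None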


File extensions were the primary way to determine file type in VMS and CP/M. Consequently MS-DOS and Windows inherited that behavior.

As you said, UNIX always went with heuristics based on the file content instead (and that is the primary reason many file formats have fixed bytes to aid the process).

And the web standardized on MIME types transferred in the metadata, with how to set them correctly left as an exercise for the reader. Which usually means they are set based on the file extension, because that's faster.
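Python's standard library demonstrates that shortcut nicely; mimetypes never opens the file:

    import mimetypes

    print(mimetypes.guess_type("photo.png"))            # ('image/png', None)
    print(mimetypes.guess_type("actually_a_png.jpeg"))  # ('image/jpeg', None): extension wins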


There's a real-world application for this sort of thing: many raster slippy maps represent empty ocean with carefully optimised single-colour PNG files [1]

(of course, vector maps are a different matter)

[1] https://www.mjt.me.uk/posts/smallest-png/


I wonder how many of these are actually carefully optimized single-colour PNG files vs... just PNG files. Going into Photopea, creating a 256x256 solid blue image, and hitting save gave me a 137-byte image. That would take 2nd place on that list. Doing the same in MS Paint gets 5th, beating out the last two.

What seems more likely to me is that someone either didn't consider the compression settings worth optimizing at all (the last two) or put some work into making sure the images were generally encoded with decent settings (the first four). After all, optimising your best case (which happens to be one of the very few images that can be cached and reused across multiple tiles) to be a few bytes smaller is probably the least efficient way to spend your "efficiency-increasing time".

Similarly, I wonder if saving a few bytes by encoding a 1x1 image that the browser upscales is actually an anti-optimisation, in that transferring a few extra bytes might be the faster method overall. You'd need benchmarking to tell, but again, it'd be benchmarking how to make an already marginal performance cost ever so slightly better.


I once wrote a PL/pgSQL function to return a 1x1 BMP file given arbitrary R, G, and B values, to display a background color for FileMaker records. Still use it to this day as a matter of fact.


I wonder if there is some leeway in common implementations to shave off even more bytes, like how PHP allows you to omit the closing tag.


Good example, but in the case of PHP, omitting the closing tag is completely acceptable by the standard and is a common best practice as well, especially if the final content in the file is PHP code and not, for example, HTML.

I think what you're suggesting is more like stretching the rules, and perhaps going outside the standard a little, but in a way that most implementations would be able to handle.


Right: not valid, strictly speaking, yet perfectly parsed by most interpreters. That's where the magic happens.


I could see implementations tolerating a lack of DEFLATE Adler-32, lack of PNG chunk CRC-32, and lack of PNG IEND chunk.


'Behold my masterpiece: a single pixel of the purest black. I call it "You Suffer" '. https://www.youtube.com/watch?v=ybGOT4d2Hs8

( https://en.wikipedia.org/wiki/You_Suffer )


> You have some wiggle room but chunks have a specific order. For example, the image metadata chunk has to appear before the pixel data chunk.

Probably to make streaming easier. If the dimensions are known early, HTML layout can reserve space for the image before the pixel data arrives.


Yes, the PNG chunk ordering constraints ( https://www.w3.org/TR/2003/REC-PNG-20031110/#5ChunkOrdering ) were crafted to streamline streaming.

Having read the source code of libpng and several others, I see that as chunks come in, their data is parsed and put into a fixed-size structure that contains all the possible relevant metadata. By the time image decoding starts, all the critical information (bit depth, palette, transparency, etc.) has been handled.

Likewise, the CRC-32 is at the end of the chunk and not at the beginning because you can save an epsilon of memory. When the checksum is at the end, all you have to do while reading is keep a running checksum in one variable, then allocate a second variable only once the chunk ends, to check that the actual checksum matches the declared one. If the checksum were at the beginning, you would first have to store it in one variable, then use a second variable to compute the running checksum for the whole chunk.

(Actually, there's an even sneakier way to handle a postfix checksum with only one variable: incorporate the stored checksum into the checksum calculation itself and confirm that the result matches a fixed reference value.)
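For CRC-32 that trick looks like this (a sketch: running zlib's crc32 through the data plus its own little-endian CRC always lands on the fixed residue 0x2144DF2C; note that PNG stores chunk CRCs big-endian, so a real PNG reader would byte-swap first):

    import struct
    import zlib

    data = b"IEND"  # any byte string works
    crc = zlib.crc32(data)

    # Keep one running CRC straight through the stored checksum, then compare
    # against the residue constant instead of against the stored value.
    assert zlib.crc32(data + struct.pack("<I", crc)) == 0x2144DF2C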


Not only that, but also so the decoder knows how much memory to allocate.


I wonder if it's possible to detect features in an image and encode those to compress the image, a bit like what image-to-SVG converters do, but better.

Feature detection is a big subject, and I don't know if it would work for image compression.


JPEG XL has support for drawing vector splines (with varying thickness, too!), but I don't think anybody has written an encoder that uses those yet. Could be great for text or hair though, with a sufficiently ~~magic~~ smart encoder.


Detecting noise, de-noising, and substituting with synthetic noise is fairly common.

Traditionally, encoders struggle with other kinds of features. It's hard to find them and to know whether encoding them separately will be worth it.


Compare with the smallest GIF and other formats: https://news.ycombinator.com/item?id=38878480



That's not too interesting and has been covered too many times (e.g. [1], [2]). Maybe it's much more interesting to compare the smallest possible file for a particular pattern, like a 2x2 B&W checkerboard.

[1] https://shkspr.mobi/blog/2024/01/whats-the-smallest-file-siz...

[2] https://github.com/mathiasbynens/small/


AFAICT the spec does not exclude zero-size images, which would make the smallest image 22 bytes (14-byte header + 8-byte end marker).

Otherwise the pixel in a 1x1 image can be encoded in 1 byte, giving 23 bytes total.


MIF (my own image format) accepts an empty file as a valid 0x0 image. It's very efficient :)


Yes, but 0x0 isn't terribly useful. I think a 1x1 black (or transparent?) image would be better.


0x0 sounds pretty useful for web tracking pixels! It's even more unobtrusive than 1x1.


For tracking pixels there's also the 204 No Content HTTP response, which is 0 bytes.


Why do this article and people in the comments refer to 32-bit words as one byte? One byte is 8 bits. If you have a hex representation of ‘42’, that is 4 bytes, 32 bits, NOT ONE BYTE.



