
I'm wondering if anybody has tried adding PNG filters[0] on top of QOI[1]. I have an idea of how to do it, and it should be fairly easy to build on top of a QOI encoder/decoder, but I haven't gotten around to trying it out myself.

Per the QOI spec, it already has the "SUB" filter built in, in the form of QOI_OP_DIFF. That's great, because it leaves exactly four PNG filters instead of five (meaning a filter choice fits in two bits): NONE, UP, AVERAGE and PAETH.

Here's how to add the filters: on a per-line basis, create a transformed line for each filter (the original pixels, plus the UP, AVERAGE and PAETH transformations). Then see how each of those candidate lines would compress, treating each as if it were the original pixel input, and pick the best one. In a separate array we keep track of which filter we picked for each line.
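
A minimal sketch of that per-line search, assuming RGBA byte rows. The filter definitions follow the PNG spec; the cost function is a stand-in of my own (summed residual magnitude), where a real encoder would feed each candidate through QOI and compare actual output sizes:

```python
def paeth(a, b, c):
    # PNG Paeth predictor: a = left, b = up, c = upper-left.
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    return b if pb <= pc else c

def filter_line(line, prev, mode, bpp=4):
    # line/prev: flat lists of channel bytes; prev is all zeros for row 0.
    out = []
    for i, x in enumerate(line):
        up = prev[i]
        left = line[i - bpp] if i >= bpp else 0
        ul = prev[i - bpp] if i >= bpp else 0
        if mode == "NONE":
            pred = 0
        elif mode == "UP":
            pred = up
        elif mode == "AVERAGE":
            pred = (left + up) // 2
        else:  # "PAETH"
            pred = paeth(left, up, ul)
        out.append((x - pred) & 0xFF)
    return out

def pick_filter(line, prev, bpp=4):
    # Brute-force all four candidates and keep the cheapest one.
    def cost(res):
        # Signed magnitude of each residual byte: a crude proxy for
        # how well QOI_OP_DIFF-style coding would compress it.
        return sum(min(v, 256 - v) for v in res)
    candidates = {m: filter_line(line, prev, m, bpp)
                  for m in ("NONE", "UP", "AVERAGE", "PAETH")}
    best = min(candidates, key=lambda m: cost(candidates[m]))
    return best, candidates[best]
```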

When we've done this for the entire image, we basically have a plain QOI image, except that decompressing it yields each line in its transformed form. To recover the original image we just have to reverse these transformations bottom-to-top.
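
The bottom-to-top in-place undo might look something like this sketch. Note the assumption baked in: it only works as a single in-place pass if the encoder also filtered in place top-to-bottom, so that each line's cross-line predictors read the already-filtered line above it (names and the `picks` list are hypothetical):

```python
def paeth(a, b, c):
    # PNG Paeth predictor: a = left, b = up, c = upper-left.
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    return b if pb <= pc else c

def unfilter_in_place(rows, picks, bpp=4):
    # rows: mutable per-line lists of channel bytes, as produced by a
    # plain QOI decoder; picks[k] is the filter applied to line k.
    for k in range(len(rows) - 1, -1, -1):
        line = rows[k]
        prev = rows[k - 1] if k > 0 else [0] * len(line)
        mode = picks[k]
        for i in range(len(line)):
            up = prev[i]  # still filtered: we haven't touched row k-1 yet
            left = line[i - bpp] if i >= bpp else 0  # already undone
            ul = prev[i - bpp] if i >= bpp else 0
            if mode == "NONE":
                pred = 0
            elif mode == "UP":
                pred = up
            elif mode == "AVERAGE":
                pred = (left + up) // 2
            else:  # "PAETH"
                pred = paeth(left, up, ul)
            line[i] = (line[i] + pred) & 0xFF
```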

So we just have to append that array where we kept track of the filters we picked for each line, four picks per byte. Since we know the height of the image we don't need to mark the end of this stream. The added size overhead is effectively one byte per four lines. I expect speed overhead would be roughly 4x for encoding (since we brute-force try four transformations), and something very tiny for decoding (since all we need to do in the end is one single in-place pass over all pixels to undo the transformations).
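
Packing and unpacking that trailing array is trivial; the 2-bit code assignment below (NONE=0, UP=1, AVERAGE=2, PAETH=3) is my own assumption, not part of any spec:

```python
CODES = {"NONE": 0, "UP": 1, "AVERAGE": 2, "PAETH": 3}
NAMES = ["NONE", "UP", "AVERAGE", "PAETH"]

def pack_picks(picks):
    # One 2-bit filter code per line, four lines per byte,
    # appended after the regular QOI stream.
    out = bytearray()
    for k in range(0, len(picks), 4):
        b = 0
        for j, name in enumerate(picks[k:k + 4]):
            b |= CODES[name] << (2 * j)
        out.append(b)
    return bytes(out)

def unpack_picks(data, height):
    # The image height comes from the QOI header, so no
    # terminator is needed for this stream.
    return [NAMES[(data[k // 4] >> (2 * (k % 4))) & 3]
            for k in range(height)]
```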

Also, because SUB is covered by QOI_OP_DIFF, this approach would implicitly support mixing each of these per-line filters with SUB, so it would be possible to have SUB+UP, SUB+AVERAGE and SUB+PAETH, and it would be on a per-pixel basis. Might be beneficial in some cases.

[0] https://www.w3.org/TR/PNG-Filters.html

[1] https://qoiformat.org/qoi-specification.pdf



I'm not sure why your whole scanline would need a predictor that is top pixel, left pixel, or average. Shouldn't this be for 16 or 32 pixels max?

The most popular QOI modification has been to take the average of top and left decoded pixels as predictor.
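
For a pixel-format-agnostic sketch of that predictor (function name is mine, per-channel integer average as commonly proposed):

```python
def predict_avg(left, up):
    # Per-channel average of the decoded left and top neighbours,
    # as used by many QOI-inspired codecs.
    return tuple((l + u) // 2 for l, u in zip(left, up))
```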

I've tried to put more bits into selecting 4 possible pixels as predictors (for a single pixel), but it doesn't work that well because the previously encoded colors are already in the table. Predictor+diff encoding wasn't as successful as one would expect, probably because pixels are processed one by one there. One design principle of QOI is to read the input once to encode, so I'm not sure if people would want to try the scanline 4x.


> Predictor+diff encoding wasn't as successful as one would expect.

Interesting, I'd love to see more details about your experiments! Are they online somewhere? And can you link the other QOI modification you mentioned?

> I'm not sure why your whole scanline would need a predictor that is top pixel, left pixel, or average.

Well, it's all just a thought experiment at this stage anyway, right? My reasoning was that when filters get more sophisticated, it becomes a hard problem to figure out when to apply which one. So an approach that lets us brute-force try all of them seemed simplest. It worked well enough for PNG to be an OK-ish lossless compression format, after all.

Actually, you just reminded me of another reason I wanted to try the scanline approach: it makes it easy to give each filter its own thread, so we can try out all four options in parallel. In that case the bottleneck is the slowest of the four options, so encoding would still be slower than plain QOI, but hopefully much less than a 4x slowdown.

Another reason for the scan-line approach was that it seemed like it would be a way to add them on top of QOI without messing too much with the latter's specs. As mentioned, the output would be a regular QOI image where each line just happens to optionally have one of three transformations applied to it, followed by an array that tells you which transformation was applied to each line.

If I understand correctly, QOI images have an end marker (seven 0x00 bytes followed by a single 0x01 byte), meaning adding such an extra array of bytes should be ignored by properly implemented QOI decoders.
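
Splitting the file at that end marker could then be as simple as this sketch (with the caveat, flagged in the comment, that a robust implementation wouldn't just search for the byte pattern):

```python
# Per the QOI spec: seven 0x00 bytes followed by a single 0x01.
QOI_END = b"\x00" * 7 + b"\x01"

def split_trailer(data):
    # Split a file into the part a stock QOI decoder consumes and the
    # trailing per-line filter array proposed above. Searching from the
    # end assumes the marker pattern doesn't also occur in the trailer;
    # a robust implementation would walk the chunk stream instead.
    idx = data.rfind(QOI_END)
    if idx < 0:
        raise ValueError("not a QOI stream: end marker missing")
    cut = idx + len(QOI_END)
    return data[:cut], data[cut:]
```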

If so, that would mean that a decoder for this filter approach could be a minimal program that you run on the output of any other QOI decoder and that undoes the transformations. That would make it trivial to build a decoder on top of other people's hard work implementing an efficient decoder for "regular" QOI images.

Think of it like how tar and gz work together. I suppose this trick wouldn't be limited to scanlines, though; I'm sure others have thought of it too.

> One design principle of QOI is to read input once to encode, so not sure if people would want to try 4x the scanline.

True, but my reasoning was that this would be for a particular niche: when the decoding speed matters most (which would be minimally impacted by this) but encoding speed can be a little slower if the compression is significantly better. No idea how much better that would be though, if at all.


I did not keep track of every experiment; I'm just saying that if you make a new opcode in QOI that encodes a predictor with a few bits plus a few bits of diff, it's hard to make an improvement at all. It's easier if you remove the 8-bit boundary constraint.

> And can you link the other QOI modification you mentioned?

https://github.com/nigeltao/qoi2-bikeshed/issues/34, since adopted by virtually all QOI-inspired codecs.



