You're being unnecessarily argumentative. The web is filled with badly-encoded/re-encoded/mixed-encoded text. One of the worst offenders is Microsoft Outlook, which by default sends emails in the local 8-bit Windows codepage instead of UTF-8. Pass that through several mail gateways, display those messages on some PHP forum, and you get yourself a mess that no single Python codec can decode successfully. chardet is useless in that case; there is no one 'valid' encoding per se.
This library is taking the only possible approach, which is to segment the text and try to convert each segment into its most probable Unicode representation. It seems to cover a larger number of encoding mix-ups than other libraries do, which is great!
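For anyone who hasn't watched this happen up close, here's roughly the kind of damage I mean, plus ftfy's documented top-level fix_text() run over it (the specific codec round-trip below is just one plausible path, not a claim about any particular gateway):

    import ftfy

    text = "café"
    for _ in range(2):   # encoded as UTF-8, then misread as Windows-1252, twice over
        text = text.encode("utf-8").decode("windows-1252")

    print(text)                  # 'cafÃƒÂ©' -- no single codec round-trip undoes this
    print(ftfy.fix_text(text))   # should come back as 'café'; ftfy peels the layers off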
Comment to rspeer: in fix_text_segment(), I would limit the number of recursive passes on the text to 5-10. Right now it's using 'while True', which might take a very long time to converge on corrupted/binary data.
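Roughly this shape, I mean (just a sketch against a hypothetical fix_one_pass, not ftfy's real internals):

    # Sketch only: fix_one_pass stands in for whatever a single pass of
    # fix_text_segment currently does; it is not part of ftfy's API.
    def fix_with_cap(text, fix_one_pass, max_passes=10):
        for _ in range(max_passes):
            fixed = fix_one_pass(text)
            if fixed == text:    # fixed point reached; further passes are no-ops
                return fixed
            text = fixed
        return text              # cap hit: give up instead of grinding on junk data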
So, actually, it seemed to make sense at first to limit it to, say, 2 or 3 passes. But then I read about Spotify's username exploit [1]. That made it pretty clear to me that any Unicode-transforming function should be idempotent whenever possible, so that you never end up with inconsistent answers about whether strings are equivalent.
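To make that concrete with a toy example (none of this is ftfy's real code; fix_once just peels one layer of UTF-8-read-as-Latin-1 off a string): running to a fixed point is idempotent by construction, while a hard two-pass cap is not, and that is the same shape of problem as the Spotify exploit.

    def fix_once(text):
        """Peel off one layer of UTF-8 that was misread as Latin-1, if possible."""
        try:
            return text.encode("latin-1").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            return text          # can't peel another layer; leave the text alone

    def fix_capped(text, max_passes=2):
        """The suggested bounded version: stop after a couple of passes."""
        for _ in range(max_passes):
            fixed = fix_once(text)
            if fixed == text:
                return fixed
            text = fixed
        return text

    def fix_to_fixed_point(text):
        """The current behavior: keep going until nothing changes."""
        while True:
            fixed = fix_once(text)
            if fixed == text:
                return fixed
            text = fixed

    damaged = "é"
    for _ in range(4):           # four layers of the same mis-decoding
        damaged = damaged.encode("utf-8").decode("latin-1")

    once = fix_capped(damaged)   # stops with two layers still left
    twice = fix_capped(once)     # a second call repairs it further
    assert once != twice         # so the capped version is not idempotent

    full = fix_to_fixed_point(damaged)
    assert fix_to_fixed_point(full) == full   # while the fixed-point version is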
I have also seen text that was encoded six times in UTF-8 (and decoded five times in Windows-1252). ftfy had to leave that one as is, though; it couldn't be decoded successfully because it was truncated.
> This library is taking the only possible approach
It is taking the only possible approach if we assume that it must use one and only one algorithm for all uses. Otherwise, it seems to me that a lot of careful tuning and configuration would be needed for this library to make the best possible guesses for a specific user's situation and data.
> might take a very long time to converge
A limit there might be appropriate; otherwise it could invite a "billion laughs"-style attack.
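For scale, though (plain Python, nothing ftfy-specific): each layer of that kind of damage roughly doubles the non-ASCII text, so an input that genuinely needs n passes is already on the order of 2**n characters long.

    # Each round of "encode as UTF-8, misread as Latin-1" doubles a character
    # like 'é', so crafting depth-n mojibake costs about 2**n characters of input.
    text = "é"
    for depth in range(1, 21):
        text = text.encode("utf-8").decode("latin-1")
        print(depth, len(text))   # 2, 4, 8, ..., 1048576 at depth 20

So a small cap is cheap insurance either way; the unbounded loop is mostly a risk if some pass can keep changing the text without ever converging.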