Unicode is Kind of Insane

soraminazuki · on May 27, 2015

One of the things I hate most about Unicode is the CJK unified ideographs. While the Unicode Consortium is happy to map all those ridiculous emoji into Unicode, it refuses to separate the characters of distinct languages. As a result, people have a harder time configuring fonts so that Japanese text doesn't look Chinese.
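
To make the complaint concrete, here is a minimal Python sketch using only the standard `unicodedata` module. The helper name `describe` is made up for illustration. It shows that a Han character such as 直 (a commonly cited example whose preferred glyph differs between Japanese and Chinese fonts) is a single unified code point, so the string itself carries no language information.

```python
import unicodedata

def describe(ch):
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# The same code point is used whether the text is Japanese or Chinese.
japanese_text = "\u76f4"  # 直 as typed in a Japanese document
chinese_text = "\u76f4"   # 直 as typed in a Chinese document

print(describe(japanese_text))
# The two strings are indistinguishable at the code point level.
print(japanese_text == chinese_text)
```

Because the two strings compare equal, the choice of glyph has to come from outside the text, for example from the font configuration or a language tag such as the HTML `lang` attribute.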

balls2you · on May 27, 2015

sounds like something designed without consulting the CJK users...

raiph · on May 29, 2015

RNNs wouldn't fall for confusables... ;)

The venerable Perl 5 and its ecosystem have long aimed at Unicode compliance with acceptable performance, and together they are one of the best toolkits available in 2015.

Perl 6 aims to outdo Perl 5 (and other langs) by making Unicode as simple as it can be while retaining acceptable performance and correctness. A simple example: the "character" abstraction is an extended grapheme cluster, yet strings can stay compact when possible (for good RAM performance), and indexing into strings that aren't compact is an O(1) operation (unlike, say, the quadratic slowdowns with Swift).
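
For readers unfamiliar with the code point versus grapheme cluster distinction, a rough Python sketch follows. It is not Perl 6's implementation and not the full extended grapheme cluster rules from UAX #29; the function `approx_clusters` is a hypothetical helper that only attaches combining marks to the preceding base character, so it will get things like emoji sequences and Hangul jamo wrong.

```python
import unicodedata

def approx_clusters(text):
    """Group each base code point with the combining marks that follow it.

    This is only an approximation of grapheme clusters: it looks at the
    canonical combining class and ignores every other UAX #29 rule.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch  # attach the mark to the previous cluster
        else:
            clusters.append(ch)  # start a new cluster
    return clusters

s = "e\u0301a"  # 'e' + COMBINING ACUTE ACCENT, then 'a'
print(len(s))                   # number of code points
print(len(approx_clusters(s)))  # number of user-perceived characters (approx.)
```

A language whose "character" is the cluster would report two characters for `s`, while code point indexing reports three.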

Is there something that might entice you to make a few visits to the freenode IRC channel #perl6 [1] to chat about making the long-term Unicode roadmap for Perl 6 the best it can be?

[1] https://kiwiirc.com/client/irc.freenode.net/perl6

monk_e_boy · on May 26, 2015

Or actually, it's really kind of amazing.

benfrederickson · on May 27, 2015

It's amazing and insane =). I was trying to highlight the complexities behind something that many people naively think is simple. I probably should have made that clearer, based on the reaction elsewhere.

nofollow · on May 29, 2015

The Unicode Consortium addresses this: http://unicode.org/reports/tr39/#Confusable_Detection

Edit: a quick search found implementations of the described algorithms in C++ (the ICU library) and Perl (Unicode::Security).
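
As a rough idea of what the confusable detection in that report does, here is a minimal Python sketch of the "skeleton" comparison: normalize to NFD, replace each character with its prototype, normalize again, and compare. Everything named here (`SAMPLE_PROTOTYPES`, `skeleton`, `looks_confusable`) is made up for illustration, and the mapping is a tiny hand-picked sample of Cyrillic letters rather than the real confusables data file, so treat it as a sketch under those assumptions and use ICU or Unicode::Security for real work.

```python
import unicodedata

# ASSUMPTION: a tiny illustrative sample. The real table is the much larger
# confusables data file published alongside the report.
SAMPLE_PROTOTYPES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
}

def skeleton(text):
    """Approximate skeleton: NFD, map to prototypes, NFD again."""
    decomposed = unicodedata.normalize("NFD", text)
    mapped = "".join(SAMPLE_PROTOTYPES.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)

def looks_confusable(a, b):
    """True if two different strings share the same skeleton."""
    return a != b and skeleton(a) == skeleton(b)

spoofed = "p\u0430yp\u0430l"  # Latin letters with two Cyrillic 'а' mixed in
print(looks_confusable("paypal", spoofed))
```

With only five entries in the table, this catches the mixed-script spoof above but misses most real-world cases, such as a digit one standing in for a lowercase L.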