Unicode is Kind of Insane (benfrederickson.com)
32 points by benfrederickson on May 26, 2015 | 6 comments


One of the things I hate most about Unicode is the CJK Unified Ideographs. While the Unicode Consortium is happy to encode all those ridiculous emoji, it refuses to separate the characters of distinct languages. As a result, people have a harder time configuring fonts so that Japanese text doesn't look Chinese.
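To make the complaint concrete: a unified ideograph such as 直 (U+76F4) is the same code point whether it appears in Chinese or Japanese text, even though the preferred glyph shape differs by locale. The language therefore has to be signaled out of band, for example with an HTML `lang` attribute, which browsers typically consult when picking a font. Here is a minimal Python sketch, assuming HTML output. `tag_language` is a hypothetical helper, not a standard API.

```python
import html
import unicodedata

# U+76F4 is one unified code point shared by Chinese and Japanese text.
# Its preferred glyph differs by locale, but the code point itself carries
# no language information.
CHAR = "\u76f4"


def tag_language(text: str, lang: str) -> str:
    """Hypothetical helper: wrap text in a span carrying a BCP 47 language tag.

    Browsers generally use the lang attribute when choosing a font, which is
    the usual way to get Japanese rather than Chinese glyphs for unified
    ideographs.
    """
    return f'<span lang="{html.escape(lang, quote=True)}">{html.escape(text)}</span>'


if __name__ == "__main__":
    print(hex(ord(CHAR)), unicodedata.name(CHAR))
    print(tag_language(CHAR, "ja"))       # same code point...
    print(tag_language(CHAR, "zh-Hans"))  # ...different preferred glyph
```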


sounds like something designed without consulting the CJK users...


RNNs wouldn't fall for confusables... ;)

The venerable Perl 5, together with its ecosystem, has long aimed at Unicode compliance with acceptable performance and is one of the best toolkits available in 2015.

Perl 6 aims to outdo Perl 5 (and other languages) by making Unicode as simple as it can be while retaining acceptable performance and correctness. For example, the "character" abstraction is an extended grapheme cluster, yet strings can stay compact when possible (for good RAM performance), and indexing into strings that aren't compact is still an O(1) operation (unlike, say, the quadratic slowdowns with Swift).
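Treating a "character" as an extended grapheme cluster and keeping O(1) indexing pull in different directions. The toy Python sketch below shows one way to reconcile them. It uses a deliberately simplified segmentation (combining marks, variation selectors and ZWJ sequences only, not full UAX #29). It then stores each cluster as one integer: single-code-point clusters keep their code point, and multi-code-point clusters get a negative synthetic ID from a shared table. This is loosely the idea Perl 6 calls NFG (Normal Form Grapheme). The names `clusters` and `GraphemeString` are hypothetical, and this is not Rakudo's implementation.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER


def is_extend(ch: str) -> bool:
    """Rough stand-in for "does not start a new cluster": combining marks,
    ZWJ and variation selectors attach to the preceding character."""
    return (
        unicodedata.category(ch) in ("Mn", "Me", "Mc")
        or ch == ZWJ
        or "\ufe00" <= ch <= "\ufe0f"  # VARIATION SELECTOR-1..16
    )


def clusters(text: str) -> list[str]:
    """Split text into approximate grapheme clusters.

    Simplified: real UAX #29 segmentation also handles Hangul syllables,
    regional-indicator flag pairs, CR LF and prepended marks, and it only
    joins across ZWJ between emoji.
    """
    out: list[str] = []
    for ch in text:
        if out and (is_extend(ch) or out[-1].endswith(ZWJ)):
            out[-1] += ch  # glue onto the previous cluster
        else:
            out.append(ch)  # start a new cluster
    return out


class GraphemeString:
    """Hypothetical fixed-width grapheme string: one int per cluster."""

    # Shared tables of synthetic IDs for multi-code-point clusters.
    _ids: dict[str, int] = {}
    _clusters: dict[int, str] = {}

    def __init__(self, text: str) -> None:
        # NFC first, so e.g. "e" + U+0301 collapses to the single code point U+00E9.
        text = unicodedata.normalize("NFC", text)
        self._units = [self._encode(c) for c in clusters(text)]

    @classmethod
    def _encode(cls, cluster: str) -> int:
        if len(cluster) == 1:
            return ord(cluster)  # compact case: the code point itself
        if cluster not in cls._ids:
            synthetic = -(len(cls._ids) + 1)  # negative, so it never clashes with a code point
            cls._ids[cluster] = synthetic
            cls._clusters[synthetic] = cluster
        return cls._ids[cluster]

    def __len__(self) -> int:
        return len(self._units)

    def __getitem__(self, i: int) -> str:
        unit = self._units[i]  # O(1): a single list lookup
        return chr(unit) if unit >= 0 else self._clusters[unit]


if __name__ == "__main__":
    # g + COMBINING DIAERESIS (no precomposed form), o, space, woman technologist (ZWJ sequence)
    s = "g\u0308o \U0001F469\u200d\U0001F4BB"
    gs = GraphemeString(s)
    print(len(s), len(gs))  # 7 code points, 4 clusters
    print(repr(gs[3]))      # the whole emoji sequence, fetched in one lookup
```

The trade-off is that the synthetic table grows with every new multi-code-point cluster seen, in exchange for making length and indexing as cheap as on a plain array.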

Is there something that might entice you to make a few visits to the freenode IRC channel #perl6 [1] to chat about making the long term Unicode roadmap for Perl 6 be the best it can be?

[1] https://kiwiirc.com/client/irc.freenode.net/perl6


Or actually, it's really kind of amazing.


It's amazing and insane =). I was trying to highlight the complexity behind something that many people naively think is simple; judging by the reaction elsewhere, I probably should have made that clearer.


The Unicode Consortium addresses this: http://unicode.org/reports/tr39/#Confusable_Detection

Edit: a quick search found implementations of the described algorithms in C++ (the ICU library) and Perl (Unicode::Security).
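The core of TR39's skeleton transform is short. Normalize to NFD, replace each character with its prototype from the `confusables.txt` data file, then normalize to NFD again. Two strings are confusable when their skeletons are equal. Below is a minimal Python sketch, assuming a tiny hand-picked subset of that data. `PROTOTYPES` is illustrative only; the ICU and Unicode::Security implementations mentioned above work from the full table.

```python
import unicodedata

# Illustrative subset of confusables.txt (character -> prototype).
# Hypothetical stand-in: a real checker should load the full data file.
PROTOTYPES: dict[str, str] = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
}


def skeleton(text: str, prototypes: dict[str, str] = PROTOTYPES) -> str:
    """Simplified TR39 skeleton: NFD, map each character to its prototype, NFD again."""
    decomposed = unicodedata.normalize("NFD", text)
    mapped = "".join(prototypes.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)


def is_confusable(a: str, b: str) -> bool:
    """Two strings are confusable if their skeletons are identical."""
    return skeleton(a) == skeleton(b)


if __name__ == "__main__":
    spoof = "\u0440\u0430y\u0440\u0430l"  # Cyrillic р/а mixed with Latin y/l
    print(is_confusable("paypal", spoof))  # True
    # Decomposition matters: Cyrillic ё (U+0451) becomes е + U+0308 under NFD,
    # so it ends up with the same skeleton as Latin ë (U+00EB).
    print(is_confusable("\u0451", "\u00eb"))  # True
```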



