matvore's comments | Hacker News

It is sorted FIRST by radical and SECOND by residual stroke count. This is roughly equivalent to a Unicode codepoint sort if you stay in the Basic Multilingual Plane (see the sketch after the list below). The order also puts Literary Chinese after Wu Chinese, which breaks with a pure stroke-count sort:

中文 - 中 = 丨 + 3 strokes

吴语 - 吴 = 口 + 4 strokes

文言 - 文 = 文 + 0 strokes

日本語 - 日 = 日 + 0 strokes

粵語 - 粵 = 米 + 7 strokes
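
As a sanity check (an illustration of mine, not from the original thread): sorting these five names by the first character's codepoint reproduces the same order, since the BMP's original CJK ideograph block is laid out in KangXi radical-then-residual-stroke order:

    /* Sort the five language names by the first character's Unicode
     * codepoint; in the BMP this roughly tracks radical-then-stroke
     * order, so it reproduces the listing above. */
    #include <stdio.h>
    #include <stdlib.h>

    struct lang { unsigned cp; const char *name; };

    static int by_cp(const void *a, const void *b) {
        unsigned x = ((const struct lang *)a)->cp;
        unsigned y = ((const struct lang *)b)->cp;
        return x < y ? -1 : x > y;
    }

    int main(void) {
        struct lang langs[] = {
            {0x65E5, "日本語"}, {0x7CB5, "粵語"}, {0x4E2D, "中文"},
            {0x6587, "文言"}, {0x5434, "吴语"},
        };
        qsort(langs, 5, sizeof langs[0], by_cp);
        for (int i = 0; i < 5; i++)
            printf("U+%04X %s\n", langs[i].cp, langs[i].name);
        /* Prints 中文, 吴语, 文言, 日本語, 粵語: the order above. */
        return 0;
    }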


Dictionary lookup is done first by radical and second by stroke count. Collation is not: stroke count comes first.

For example, I have a book of 成语 stories that gives its table of contents in non-alphabetical order. (Since nobody understands the traditional ordering, I also have several such books that put their table of contents in alphabetical order.)

Here is the collation order in the book:

一 七 八 入 九 人 口 千 小 三 亡 大 不 专 天 井 见 毛 月 文 风 为 心 水 四 ...

Note that 三's radical is 一, the first Kangxi radical, and that 一 is listed first. Your theory is wrong. 三 isn't even first among the 3-stroke characters, which start (among these) with 口.

Why did you make up a false answer to this question?


The Wikipedia sort for the languages is as I stated above, with Literary Chinese and Japanese between Wu Chinese and Yue Chinese. I explained why it was sorted that way: the radical is considered first. You could not explain why Japanese appeared between Wu and Yue, because you insisted, and continue to insist, that radicals are not used.

I didn't say sorting is never done by stroke count alone. But I have seen radical+residual stroke count much more often than stroke count alone. Probably a result of the content I'm accessing. It's mostly Japanese and not intended for children.

The dictionary-versus-non-dictionary sorting distinction you make doesn't sound like a real thing. The audience, the country, and the number of items sorted are bigger factors. But you're not wrong that stroke count is sometimes used alone.


> You could not explain why Japanese appeared between Wu and Yue because you insisted and continue to insist that radicals are not used.

I can't explain that because it's part of a different logical group, with its name written in a different script.† This puts it parallel to the Chinese options and to Korean.

> The Wikipedia sort for the languages is as I stated above

I took you to be describing the sort order for characters, not for Wikipedia. Wikipedia doesn't obey that order either. You can check the page for Jiangsu, where all of the languages mentioned so far appear before the "Latin alphabet"-style languages, but 閩南語 and 閩東語 appear after them.

† I also can't explain why Wikipedia seems to have chosen simplified 吴语 but traditional 粵語, 客家語, and 贛語. Jiangsu is on the mainland... and so are Jiangxi and Guangdong.


  > all of the languages mentioned so far appear before the "Latin alphabet" 
  > style languages, but 閩南語 and 閩東語 appear after them.
Could it have something to do with the Minnan and Mindong Chinese articles being written in a Latin script (despite the language names showing in both Chinese characters and Latin letters)?


As far as I know, sure, it could.


Chromium is merely Chrome with the closed-source parts removed. Chromium's components are still implemented in a Google-controlled repo, so it has Google-oriented features and defaults.


1. Google Chrome

2. Chromium

3. Ungoogled Chrome?

4. Ungoogled Chromium?

5. Googled Chromium?

How does it differ from regular Chromium?


Note I work for Google and I've contributed to Chromium, though I'm not necessarily an expert on Chromium forks.

1. Google Chrome

This is the official Chrome that you download from google.com; it also comes on ChromeOS devices.

2. Chromium

This is what you get when someone builds Chromium from the official repo without access to confidential source.

The source is confidential for various reasons, and some code that seems like it should be confidential actually isn't, like the Android-for-ChromeOS integration, some of which is here: https://crsrc.org/c/chrome/browser/ash/arc/

3. Ungoogled Chrome?

This seems like a contradiction in terms. Only Google can build Chrome, so they are not likely to, e.g., set Bing as the default or remove Google password manager support.

4. Ungoogled Chromium?

A particular project run by a particular team which forks Chromium and removes pro-Google behavior and settings.

5. Googled Chromium?

I don't know the original context of the use of this term, but possibly this just refers to official Chrome.


This is the default font of my browser-based terminal emulator, http://github.com/google/werm, along with a handful of other retro fonts. (It uses the int10h.org TTFs, converted to bitmaps, which I suspect have all the same characters as Neue.)


IMO it still looks rather nostalgic. I remember using cmd.exe or command.com in Windows 95 and later, when this was the default (though it would be different if you had a different legacy code page set, IIRC). Of course cmd.exe just rendered the pixels as-is, no emulation or retro effects.


Yes, let's take dozens of terminal emulator projects out of maintenance mode so app devs can individually and capriciously overload my shift and escape keys, because modern.


what if... we could scrolljack.... the terminal

rips bong


  > Context: Bitcoin miners have just adopted a 50% pay cut for themselves.
Miners don't decide the consensus rules. The nodes validate blocks, and the miners generate them.

The halvening timing was coded a long time ago. To change it, the nodes would need to adopt the new code by installing updated clients, and at that point you have a hard fork: there will be nodes on the old rules, either accidentally (through not updating) or intentionally (through using a modified core distro), and the new rules' set of valid blocks is disjoint from the old rules'.

By contrast, the new rules' valid-block set being a subset of the old rules' is a tightening of the consensus rules. A tightening of the rules is a soft fork and can keep the network in one piece.

So there is no real way for the miners to avoid the halvening without a hard fork and a great risk to the network.
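
To make the subset point concrete, here is a toy sketch (all names and rules invented; nothing here is Bitcoin Core code). When the new rules are the old rules plus an extra constraint, old nodes accept every block the upgraded nodes produce, so the network can stay in one piece:

    #include <stdio.h>

    struct block { int subsidy; int height; };

    static int old_rules_valid(const struct block *b) {
        return b->subsidy <= 50; /* stand-in for the legacy checks */
    }

    static int extra_constraint(const struct block *b) {
        return b->subsidy <= 25; /* the new, stricter requirement */
    }

    static int new_rules_valid(const struct block *b) {
        /* Tightening: new-valid is a subset of old-valid. */
        return old_rules_valid(b) && extra_constraint(b);
    }

    int main(void) {
        struct block b = { 30, 840000 };
        /* Old nodes accept this block and new nodes reject it, but
         * new nodes never produce a block that old nodes reject, so
         * un-upgraded nodes keep following the upgraded chain. */
        printf("old: %d, new: %d\n", old_rules_valid(&b), new_rules_valid(&b));
        return 0;
    }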


Some say "hunky dory" comes from "honchō dōri," 本町通り, roughly "main street" (personally I don't see how "ki" could come from "chō").


Would be fun if true! Seems to somewhat match semantically as well.


If we're going to talk about unnecessary extra processes like the useless use of cat, we should merge the head and grep commands into a single sed, and possibly just merge everything into perl:

    <access.log sed -n '/mail/p; 500q' | perl -e ...
If perl is processing the file line-by-line then filtering lines by regex and stopping at line X is trivial, and you don't even need sed.
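
Taking the merging to its logical end, here is the same filter as one self-contained process in C (a sketch only; the file name comes from the example above, and /mail/ has no metacharacters, so a literal substring match suffices):

    /* Equivalent of `sed -n '/mail/p; 500q' <access.log`: print lines
     * containing "mail" and stop after input line 500. (Lines longer
     * than the buffer count as multiple reads; fine for a sketch.) */
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        FILE *f = fopen("access.log", "r");
        char line[4096];
        int lineno = 0;

        if (!f) { perror("access.log"); return 1; }
        while (lineno < 500 && fgets(line, sizeof line, f)) {
            lineno++;
            if (strstr(line, "mail"))
                fputs(line, stdout);
        }
        fclose(f);
        return 0;
    }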


I think your point is valid and I don't dispute it, but if I had a nickel for every time I said "just do it all in perl" and regretted it, I'd be... well, perhaps not a rich man, but I'd have lunch covered for a few weeks.


Perl has the advantage of having only one implementation, unlike sed and grep (e.g. BSD vs. GNU) and /bin/sh (which can be any of several POSIX shells), so upgrading this pipeline to 100% perl is safer in some respects. The example in the article is light on details, so it's hard to comment very deeply.

I have heard snarky Perl putdowns ad nauseam at work and on HN, and I may have regretted using it a handful of times, but I could say the same or worse of other popular tools and languages...


I usually do it all in awk:

    awk '/mail/ && NR <= 500 {...}' access.log
If you want N matched lines:

    awk '/mail/ && i < 500 {i++; ...}' access.log


I wish I knew awk as well as I know perl, since then I wouldn't need to hear recommendations for CPAN modules and spurious style prescriptions.


Don't worry, we enjoy that kind of stuff.

I usually put all the resource freeing at the end of a function under a goto label, or within a few lines of the allocation, so it's easy to visually confirm everything is cleaned up. The way this commit frees the resources inside if blocks is not how I would have done it. And if I find I've let a leak into the code, I usually refactor to make the correctness more obvious.
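
A minimal sketch of that shape (function and failure paths invented for illustration):

    #include <stdio.h>
    #include <stdlib.h>

    static int process_file(const char *path) {
        int ret = -1;
        FILE *f = NULL;
        char *buf = NULL;

        f = fopen(path, "rb");
        if (!f) goto out;
        buf = malloc(4096);
        if (!buf) goto out;
        if (fread(buf, 1, 4096, f) == 0) goto out;

        /* ... use buf ... */
        ret = 0;
    out:
        /* All freeing happens in one place, so a reader can confirm
         * cleanup at a glance. free(NULL) is a no-op and f is checked,
         * so the label is safe to reach from any failure point. */
        free(buf);
        if (f) fclose(f);
        return ret;
    }

    int main(int argc, char **argv) {
        return argc > 1 ? process_file(argv[1]) : 2;
    }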

In the commit's example, the fact that m3_ParseModule takes ownership of the wasm pointer is very tricky. It looks like there is still a leak, but there is not.


> it's easy to visually confirm everything is cleaned up.

Yes and no.

It's easy to not make a mistake in one function. It's basically impossible to not make this mistake across 1,000 functions.

Especially as code evolves over time, being 99% perfect about this, or even 99.9% perfect, is just not good enough. Not nearly enough.

And this reminds me of something Schneier said: anyone can design a cryptographic algorithm that they themselves cannot break, and even for a professional, having your algorithm broken is not even embarrassing.

Similarly, very good coders are not even embarrassed when they fail at memory management. People may be better or worse at it, but even the best of us are terrible at it.

I once knew a very good coder who bragged that he was apparently the only one able to write code without these kinds of bugs.

To prove a point, I spent an hour reading his open-source project and found several resource leaks, at least one of which was remotely triggerable.

It's usually easy to do it right once, but no, these mistakes happen all the time.

> if I find [… typo words(?) omitted … ] a leak in the code I usually refactor to make the correctness more obvious.

This seems to contradict the idea that it was easy to avoid in the first place. Yes, refactoring can help. But there's a bit of learned helplessness here, in that this is actually the language's fault, and it doesn't have to be that way.


  > To prove a point, I spent an hour reading his opensource project
  > and found several resource leaks
Sounds like some interesting case studies. Could you share some?

  > > if I find [… typo words(?) omitted … ] a leak in the code I
  > > usually refactor to make the correctness more obvious.
I should rewrite this as: if I find a leak that I accidentally introduced, I will refactor in the process of correcting it, to make the mistake harder to repeat and the correctness easier to confirm.

There are non-language mechanisms that help code run safely, like Wasm, which is a sandbox. Also msan and asan should be used more.
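
For example (a minimal demonstration of mine), a leak like this is reported at exit when built with AddressSanitizer, which bundles a leak checker:

    /* leak.c: build and run with
     *     clang -g -fsanitize=address leak.c && ./a.out
     * and LeakSanitizer prints the allocation stack at exit. */
    #include <stdlib.h>

    int main(void) {
        char *p = malloc(64); /* never freed */
        (void)p;
        return 0;
    }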

Thinking that changing the language is the right way to fix all the problems you mentioned still seems like a premature assumption. The claim that 100% perfection is worth pursuing at all costs is theoretical, and you could be losing more valuable things in the process: e.g., FFI bindings suck, and the fragmentation of having so many languages in the craft is a pernicious cost with a multitude of aspects to it.


Can't be more specific without doxing myself, I'm afraid.

There are non-language ways to make things safer, sure. Take C++: one can do things other than RAII to avoid resource leaks, and it'll be a good pattern. But RAII is right there.

In C++ one can be disciplined about object ownership. But I find that in Rust "doing the right thing" is not optional.

Static and strong typing have similar virtues.

You can write fine software in assembly. Steve Gibson apparently does.

I'm not saying there are any silver bullets. I do appreciate that five minutes spent now, forced to think about object ownership, can save a six-month refactoring project down the line to remove all the shared_ptrs.

The same goes for not accidentally creating copies, since RVO has many conditions on when it kicks in.

No silver bullets. But in my opinion C is always worse than at least a deliberately chosen C++ subset. Plain C is for when you enjoy the journey, not for when you want to get there faster or better.

In C++, when a resource is leaked, you can refactor to encapsulate it with RAII. In C you can't.


  > To prove a point, I spent an hour reading his opensource project
  > and found several resource leaks
This is asking a lot, but if you enjoy that, I would be thrilled if you could do the same for some of my C projects: nusort and werm, under github.com/matvore.


These days I have more obligations and side projects than time.

Doesn't hurt to ask, though. :-)


If you store the number of zeros as a value z, computing z+1 is O(log n) in the long run for unbounded values of z.
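
A sketch of why (the limb layout is my own illustration): z takes O(log n) bits when it counts zeros among n items, and an increment can ripple a carry through all of them in the worst case, though most increments touch only the lowest limb:

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Unbounded counter stored as little-endian 32-bit limbs. */
    struct bignum { uint32_t *limb; size_t n; };

    static void increment(struct bignum *z) {
        for (size_t i = 0; i < z->n; i++)
            if (++z->limb[i]) return; /* no carry out: done */
        /* Carry out of the top limb: the number grows by one limb. */
        z->limb = realloc(z->limb, (z->n + 1) * sizeof *z->limb);
        if (!z->limb) abort();
        z->limb[z->n++] = 1;
    }

    int main(void) {
        struct bignum z = { calloc(1, sizeof(uint32_t)), 1 };
        for (int i = 0; i < 5; i++) increment(&z);
        printf("z = %u (limbs: %zu)\n", (unsigned)z.limb[0], z.n);
        free(z.limb);
        return 0;
    }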

