The technology argument is the most convincing one to me. I worked with a Japanese client a few years ago and the internal tools they used were wild by western standards. Like full-on frameset layouts in 2020. But it wasn't ignorance, it was continuity. The tools worked, people knew how to use them, and there was zero appetite for redesigning something that wasn't broken.
The font thing is also underrated as a factor. When you only have a handful of web-safe CJK fonts and you can't rely on weight/size variations to create hierarchy the way you can with Latin text, you compensate with color and density. It's a constraint that pushes you toward a specific aesthetic whether you want it or not.
I think the framing of "peculiar" is a bit western-centric though. Dense information-heavy pages are arguably more respectful of the user's time than the trend of spreading three sentences across five viewport-heights of whitespace.
Been running a handful of dedicated boxes on Hetzner for about 5 years now. Even with the increase, the price/performance ratio is still way better than anything comparable from the big three US clouds. Their AX-series auction servers especially.
What concerns me more than the price hike itself is the trend. Memory prices spiking, hard drives selling out, and now this. If you're running anything with serious storage or RAM needs, it's worth locking in what you can now. I grabbed an extra auction server last month just because the specs were good and I figured prices were only going up.
For anyone panicking about alternatives: OVH and Netcup are decent in Europe but have their own tradeoffs. OVH's network has been flaky for me, and Netcup's support is basically nonexistent. Hetzner's support has been solid every time I've needed it, which is worth something.
The insight that a blank status area means "the house is healthy" is the best part of this whole project imo. Most smart home dashboards try to show you everything all the time and you just end up tuning it all out. This is basically the opposite approach and it makes way more sense for something you glance at 50 times a day.
I tried something similar with a Kindle a few years back for just weather + calendar and ran into the same jailbreak maintenance hell. Ended up giving up. The Visionect displays look great but $1000+ per screen is brutal. Curious if the author has looked at the Waveshare e-paper panels driven by an ESP32, they're like $40-80 for a 7.5" screen and you can do partial refreshes. Obviously way smaller than the Boox but might work as a cheaper bedroom/mudroom option for people who want to build something like this without spending $3k.
Same experience. The job of the display is to help manage attention, not just deliver information.
Think of a paper calendar hung on a wall and updated sporadically: we know that what's there is likely good information, only that it might not be up to date.
A calendar can be calm by default, surface what's changed or newly relevant, and fade once it resolves. The next level could be understanding whose attention something needs, and when, in a personalized way.
The byte-for-byte identical output requirement is the smartest part of this whole thing. You basically get to run the old and new pipelines side by side and diff them, which means any bug in the translation is immediately caught. Way too many rewrites fail because people try to "improve" things during the port and end up chasing phantom bugs that might be in the old code, the new code, or just behavioral differences.
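A minimal sketch of that side-by-side diffing idea. The function names `old_pipeline` and `new_pipeline` are made up here as stand-ins for the real legacy and ported code paths:

```rust
// Hypothetical sketch: run the legacy and ported implementations on the
// same inputs and require byte-for-byte identical output.
fn old_pipeline(input: &str) -> Vec<u8> {
    input.to_uppercase().into_bytes() // placeholder for the legacy behavior
}

fn new_pipeline(input: &str) -> Vec<u8> {
    input.to_uppercase().into_bytes() // the port must reproduce it exactly
}

fn main() {
    for input in ["hello", "Mixed Case", ""] {
        let old = old_pipeline(input);
        let new = new_pipeline(input);
        // Any divergence is caught immediately, pinned to a concrete input.
        assert_eq!(old, new, "pipelines diverged on {:?}", input);
    }
}
```

The payoff is that a failing diff points at one concrete input, so you never have to argue about whether a discrepancy is a bug in the old code, the new code, or just a behavioral difference.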
Also worth noting that "translated from C++" Rust is totally fine as a starting point. You can incrementally make it more idiomatic later once the C++ side is retired. The Rust compiler will still catch whole classes of memory bugs even if the code reads a bit weird. That's the whole point.
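As an illustration (the functions are invented, not from Ladybird): a mechanical, C++-flavored translation and the idiomatic version you might refactor it into later both compile and both get the compiler's safety guarantees:

```rust
// A direct C++-style translation: index-based while loop, explicit counter.
// It reads oddly in Rust, but out-of-range indexing panics deterministically
// instead of reading garbage memory as it might in C++.
fn sum_translated(values: &[i32]) -> i32 {
    let mut total: i32 = 0;
    let mut i: usize = 0;
    while i < values.len() {
        total += values[i];
        i += 1;
    }
    total
}

// The idiomatic rewrite you'd move to once the C++ side is retired.
fn sum_idiomatic(values: &[i32]) -> i32 {
    values.iter().sum()
}

fn main() {
    let v = [1, 2, 3, 4];
    assert_eq!(sum_translated(&v), sum_idiomatic(&v));
}
```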
> Way too many rewrites fail because people try to "improve" things during the port
I'd say that porting is a great time to "improve" many things, but like you suggest, not a great time to add new features. You can do a lot of improvements while maintaining output parity. You're in the weeds, reading the code, thinking about the routines, and you have all the hindsight of having done it already. Features are great to add as comments that sketch things out, but importantly, this is a great time to find and recognize that maybe a subroutine is pretty inefficient. I mean, the big problem in writing software is that the goals are ever evolving. You wrote the software for different goals, different constraints. So it's a great time to clean things up, make them more flexible, more readable, *AND TO DOCUMENT*.
I think the last one gets ignored easily, but my favorite time to document code is when reading it (though the best time is when writing it). It forces you to think explicitly about what the code is doing and makes it harder for the little things to slip by. Given that Ladybird is a popular project, I really do think good documentation is a way to accelerate its development. Good documentation means new people can come in and contribute faster and with fewer errors. It lowers the barrier to entry, substantially. It's also helpful for all the mere mortals who forget things.
LLMs are great at producing documentation - ask one "hey can you add a TODO comment about the thing on line 847 that is probably not the best way to do this?" while you're working on the port, and it will craft a reasonably-legible comment about that thing without further thought from you, that will make things easier for the person (possibly future-you) looking at the codebase for improvements to make. Meanwhile, you keep on working on the port that has byte-for-byte identical output.
I guess the point is to have the LLM write a lengthy description of why the code is bad? Otherwise it doesn't make sense to ask an LLM instead of typing // TODO: bad coding style.
Why do I want length? Good docs are terse. Longer isn't better. When I read docs I want to get to know the developer and what they're thinking. I want to see their vision, not their calculator. It's not about quantity, it's about quality.
Just give your users and fellow devs some basic respect. I mean would you be okay with me just handing our conversation over to an LLM? Honestly I'd be insulted if you did that. Why is this any different? Docs are how you communicate to users and fellow developers
Concision is one of several virtues in documentation, and it trades off against thoroughness. I don't actually want to get to know the developer when I read a comment in some piece of code I am reading or editing, I want to as quickly and accurately as possible understand what I need to know about that code to make the changes I care about at that time.
Anyway I personally have asked LLMs to create more detailed TODO items for myself, on personal projects where I am the only human programmer who had laid eyes on the code. In fact, I do this frequently, including multiple times earlier today. So I don't take the idea seriously that using an LLM to generate a TODO comment is an inherently disrespectful thing to do.
I didn't say that length was a requirement, I was only thinking about why one would want to use an LLM to write a comment that is only a few words long, while the prompt might be as long as the (short version of the) comment itself.
I hope, with the velocity unlocked by these tools, that more pure ports will become the norm. Before, migrations could be so costly that “improving” things “while I’m here” helped sell doing the migration at all, especially in business settings. Only to lead to more toil chasing those phantom bugs.
One of the biggest points of rewriting is that you know better by then, so you create something better.
This is a HUUUGE reason code written in rust tended to be so much better than the original (which was probably written in c++).
Human expertise is the single most important factor and is more important than language.
Copy pasting from one language to another is way worse than complete rewrite with actual idiomatic and useful code.
Best option after proper rewrite is binding. And copy-paste with LLM comes way below these options imo.
If you look at the real world, basically all value is created by boring and hated languages. Because people spent so much effort on making those languages useful, and other people spent so much effort learning and using those languages.
Don’t think anyone would prefer to work in a rust codebase that an LLM copy-pasted from c++, compared to working on a c++ codebase written by actual people that they can interact with.
I've seen more than a few rewrite attempts fail throughout the years. However, I've never seen a direct language to language translation fail. I've done several of these personally: from perl to ruby, java to kotlin, etc.
Step 1 of any rewrite of a non-trivial codebase should be parity. You can always refactor to make things more idiomatic in a later phase.
I rewrote my personal tool twice, once into Rust and once back out of Rust, with an LLM. I mainly "vibe code" for reasons, but the tool is more complete and less buggy than most B2B and enterprise business or IT software I have seen in 20 years at different Fortune 100 companies.
I feel the effort and frustration of reaching parity with an LLM is greater than doing a full rewrite without one.
These are two different kinds of rewrites, for two different kinds of codebases, in two different situations. The important thing is to know which kind of rewrite you're doing, and have the whole team onboard.
The sort of rewrite you're talking about can work well at an early stage of a project, in the spirit of Fred Brooks's "plan to throw one away". But for a mature browser like Ladybird that's trying to not break the user experience, it's much better to have a pure translation step, and then try to improve or refactor it later.
That reminds me of the "Strangler Fig" pattern where you replace a service by first sending the requests to both the old and new implementation so you can compare their outputs. Then only when you're confident the new service functions as expected do you actually retire the old service.
The key part of the strangler fig is the facade and gradual migration of capabilities rather than trying to do a rewrite and swap (which never ends well).
I did several web framework conversions exactly like this: make sure the HTTP output string in the new code exactly matches the old code, then eventually delete the old code with full confidence.
Really like this translation approach; I had written about it just a couple of days back (more from a testing and validation context). To see folks take that approach to something complex is pretty amazing!
https://balanarayan.com/2026/02/20/gen-ai-time-to-focus-on-l...
Banning "length" from the codebase and splitting the concept into count vs size is one of those things that sounds pedantic until you've spent an hour debugging an off-by-one in serialization code where someone mixed up "number of elements" and "number of bytes." After that you become a true believer.
The big-endian naming convention (source_index, target_index instead of index_source, index_target) is also interesting. It means related variables sort together lexicographically, which helps with grep and IDE autocomplete. Small thing but it adds up when you're reading unfamiliar code.
One thing I'd add: this convention is especially valuable during code review. When every variable that represents a byte quantity ends in _size and every item count ends in _count, a reviewer can spot dimensional mismatches almost mechanically without having to load the full algorithm into their head.
> When every variable that represents a byte quantity ends in _size and every item count ends in _count, a reviewer can spot dimensional mismatches almost mechanically
At that point I'd rather make them separate data types, and have the compiler spot mismatches actually-mechanically o.o
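A minimal Rust sketch of that idea, using hypothetical `ByteSize`/`ElemCount` newtypes (names invented for illustration) so the compiler rejects mixing the two dimensions:

```rust
// Distinct newtypes for the two dimensions: the compiler now spots
// mismatches mechanically, no naming-convention review needed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ByteSize(usize);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ElemCount(usize);

impl ElemCount {
    // Converting between the two has to go through an explicit, named step.
    fn to_bytes(self, elem_size: usize) -> ByteSize {
        ByteSize(self.0 * elem_size)
    }
}

fn main() {
    let count = ElemCount(10);
    let size = count.to_bytes(std::mem::size_of::<u32>());
    assert_eq!(size, ByteSize(40));
    // let oops: ByteSize = count; // would not compile: mismatched types
}
```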
It's easy to code review the interaction between these: the valid code almost writes itself from the names, and lots of operations will just obviously read as wrong in ways that they may not with, e.g., num_allocs, total_alloc and avg_alloc, or num, total, avg.
I find it useful to use statically typed variable names. My mental syntax checker benefits from the extra type metadata being embedded at each usage, reducing mental indirections/my reliance on the larger context and enabling me to quickly detect many types of mistakes via more efficient local reasoning.
I've always understood "length" to mean what the author calls "count", and would never expect it to refer to byte size; as far as I can tell, it never did. Size is a design-time consideration; caring about it in the code is an exceptional case, for applications like (as you mention) serialization. So that's what deserves the dedicated term. "Length" refers specifically to a total number of elements in many languages preceding Rust.
For that matter, many languages, especially "object-oriented" ones, treat heterogeneous containers as the default. They might not even offer native containers that can store everything inline in a single contiguous allocation, except perhaps for strings. In which case, "number of bytes" is itself ambiguous; are you including the indirected objects or not?
"Count" is also overloaded — it commonly means, and I normally only understand it to mean, the number of elements in a collection meeting some condition. Hence the `.count` method of Python sequences, as well as the jargon "population count" referring to the number of set bits in an integer. Today, Python's integers have both a `.bit_count` and a `.bit_length`, and it's obvious what both of them do; calling either `.bit_size` would be confusing in my mental framework, and a contradiction in terms in the OP's.
I would disagree that even C's `strlen` refers to byte size. C comes from a pre-Unicode world; the type is called `char` because that was naively considered sufficient at the time to represent a text character. (Unicode is still in that sense naive, but it at least allows for systems that are acutely aware of the distinction between "characters" and graphemes.) But notice: C's "strings" aren't proper objects; they're null-terminated sequences, i.e. their length is signaled in-band. So that metadata is also just part of the data, in a single allocation with no indirection; the "size" of a string could only reasonably be interpreted to include that null terminator. Yet the result of `strlen` excludes it! Further, if `strlen` is used on a string that was placed within some allocated buffer, it knows nothing about that buffer.
(Similarly, Rust `str::len` is properly named by this scheme. It gives the number of valid 1-byte-sized elements in a collection, not the byte size of the buffer they're stored within. It's still ambiguous in a sense, but that's because of the convention of using UTF-8 to create an abstraction of "character" elements of non-uniform size. This kind of ambiguity is properly resolved either with iterators, like the `Chars` iterator in Rust, or with views.)
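For the concrete behavior, assuming current stable Rust:

```rust
fn main() {
    let s = "héllo";                  // 'é' encodes as two bytes in UTF-8
    assert_eq!(s.len(), 6);           // str::len: number of bytes
    assert_eq!(s.chars().count(), 5); // number of Unicode scalar values
}
```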
Also consider: C has a `sizeof` operator, influencing Python's `.__sizeof__()` methods. That's because the concept of "size" equally makes sense for non-sequences; neither "count" nor "length" does. So of course "length" cannot mean what the author calls "size".
The part where they blame users for not changing the default password is infuriating but unfortunately very common. I've seen this exact same attitude from companies that issue credentials like "Welcome1!" and then act shocked when accounts get popped.
What really gets me is the legal threat angle. Incremental user IDs + shared default password isn't even a sophisticated attack to discover. A curious user would stumble onto this by accident. Responding to that with criminal liability threats under Maltese computer misuse law is exactly the kind of thing that discourages researchers from reporting anything at all, which means the next person who finds it might not be so well-intentioned.
The fact that minors' data was exposed makes the GDPR Article 34 notification question especially pointed. Would love to know if the Maltese DPA ever followed up on this.
Which seems trivial in English, but gets messy once you support languages with multiple plural categories.
I wasn’t really aware of how nuanced plural rules are until I dug into ICU. The syntax looked intimidating at first, but it actually removes a lot of branching from application code.
I’ve been using an online ICU message editor (https://intlpull.com/tools/icu-message-editor) to experiment with plural/select cases and different locales, which helped me understand edge cases much faster than reading the spec alone.
Thanks, I'm a decades-long user of gettext from both developer and translator point of view, and have encountered several of the drawbacks to some extent.
It's very good, and has certainly been good enough for most practical purposes, but innovation needs to happen, and things can certainly get better. Thanks for your work in this direction!
Yeah, some sort of pluralization support is pretty much the second most important feature in any message localization tool, right after the ability to substitute externally-defined strings in the first place. Even in a monolingual application, spamming plural formatting logic in application code isn't exactly the best practice.
It's not hard to make a case against gettext, despite its maturity and large ecosystem.
IMHO pluralization is a prime example, with an API that only cleanly handles the English case, requires the developer to be aware of translation gotchas, and has honestly confusing documentation and format. Compare that to MessageFormat's pluralization example (https://github.com/unicode-org/message-format-wg/blob/main/s...) which is very easy to understand and fully in the translator's hands.
> IMHO pluralization is a prime example, with an API that only cleanly handles the English case
That’s not true at all? Gettext is functionally limited to source code being English (or alike). It handles all translation languages just fine, and competently so.
What it doesn’t have is MessageFormat’s gender selectors (useful) or formatting (arguably not really; it strays from translations to locales and is better solved with placeholders and locale-aware formatting code).
> fully in the translator's hands.
That is a problem that gettext doesn’t suffer from. You can’t reasonably expect translators to write correct DSL expressions.
> Gettext is functionally limited to source code being English (or alike). It handles all translation languages just fine, and competently so.
The *ngettext() family of functions take two strings (typically singular/plural) and rely on a language-wide expression to choose the variant (possibly more than 2 variants). There's no good reason for taking two strings, this should be handled in the language file, even without a DSL. Ngettext handling a single countable makes some corner-cases awkward, like gendering a group with possibly mixed-gender elements. The Plural-Forms expression not being per-message means that for example even in English "none/one/many foo" has to be handled in code, and that a language with only a rare 3rd plural has to pay the complexity for all cases.
Arguably, those are all nitpicks, Gettext is adequate for most projects. But quality translations get cumbersome very quickly.
> You can’t reasonably expect translators to write correct DSL expressions.
This feels demeaning. Translators regularly have to check the source code, and often write templates; they're well able to handle a DSL like MessageFormat's, especially when it's always the same expressions for their language. It saves a trip to the bug tracker to get developers to massage their code into something translatable. You can't reasonably expect an English-speaking developer armed with ngettext to know (and prepare their code for) the subtleties of Gaelic numerals.
Seems like to get it right for every use case / language, you would need functions to translate phrases, so switch statements may be a valid solution. The number of text elements needed for pagination, CRUD operations and similar UI elements should be finite :)
I checked the spec and don't really get it. Something should specify the formula for choosing the correct form (e.g. which form to use for 21 in Slavic languages), and the format isn't any better compared to the gettext of 30 years ago.
  Masz {file_count, plural,
    one {# plik}
    few {# pliki}
    other {# plików}
  }
The library (and your translators) know that in Polish, the `few` variant kicks in when `i%10 = 2..4 && i%100 != 12..14`, etc. I think the library just knows these rules for each language as part of the standard. Mozilla says that it was an explicit design goal to put "variant selection logic in the hands of localizers rather than developers"
The point is that it's supported, it simplifies developer logic, and your translators know how to work with it.
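To make the quoted rule concrete, here is the integer part of the CLDR Polish plural rule written out as a plain function. This is a sketch of what a library ships, not the ICU implementation itself; fractional numbers (which CLDR maps to `other`) are omitted:

```rust
// CLDR plural categories for Polish, integers only:
//   one:  n == 1
//   few:  n % 10 in 2..=4 and n % 100 not in 12..=14
//   many: everything else (0, 5..=21, 25, 112, ...)
fn polish_plural_category(n: u64) -> &'static str {
    let (d, h) = (n % 10, n % 100);
    if n == 1 {
        "one"
    } else if (2..=4).contains(&d) && !(12..=14).contains(&h) {
        "few"
    } else {
        "many"
    }
}

fn main() {
    assert_eq!(polish_plural_category(1), "one");   // 1 plik
    assert_eq!(polish_plural_category(3), "few");   // 3 pliki
    assert_eq!(polish_plural_category(13), "many"); // 13 plików
    assert_eq!(polish_plural_category(22), "few");  // 22 pliki
    assert_eq!(polish_plural_category(5), "many");  // 5 plików
}
```

The developer and translator never write this logic; they only supply the variant strings keyed by category name, and the library picks the branch.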
"the library just knows these rules for each language as part of the standard" sounds great until you try to support a small minority language that the library just doesn't know about and then you're left trying to hack around it by pretending that it's actually a regional variety of another language with similar plural rules.
AFAIK, unlike gettext, MessageFormat doesn't allow you to specify a formula for the plural forms as part of the localization data, so the variant selection logic ended up in the hands of library developers rather than localizers or application developers.
And the standard does get updated occasionally, which can also lead to bugs with localization data written against another version of the standard: https://github.com/cakephp/cakephp/issues/18740
> This confused me too, but the formula and rules for variants are specified by the configured language out-of-band, so there is support for this.
Well, making it out-of-band sure is one way to prevent lazy people from doing eval on plural forms from the .po file. I hope the library is actually good then.
I tried building an RSS library feature for a side project (https://beavergrow.com), mostly as a way to curate feeds I actually enjoy reading. It quickly highlighted how fragile the RSS ecosystem has become: Feedburner gone, Google slowly de-emphasizing RSS, and discovery being the hardest part now.
RSS still feels like one of the few genuinely user-controlled ways to follow the web, but keeping it usable today seems to depend almost entirely on community curation. Curious how others here handle feed discovery now.
I built https://poker.beavergrow.com to solve the problem of playing poker with friends when you have a deck of cards but no physical chips. I wanted something that felt as fast as a native app but required zero setup for the players.
It has no BS, no signup: just create a room and play. Players can join instantly by scanning a QR code on the share screen. Real-time activity feeds stay perfectly in sync across all devices.
Also includes custom avatars, themes, and fun messages/reactions.
It's a fully functional PWA, so it can be installed on a home screen for a full-screen, native-like experience at the table.
I made https://codekeep.io for storing snippets; it has similar features to Evernote. All users get a free Pro membership now. If you are thinking about moving, please consider CodeKeep too.