The writeup does not mention Jeff Atwood (Stack Overflow co-founder) trying to convince Gruber to standardize Markdown. Atwood approached him publicly in a series of blog posts, but Gruber kept silent and, if I remember correctly, finally declined, saying that he didn't want to spend time jumping through other people's hoops. Although it sucks that Markdown is not standardized, I still see this as an inspiring example of a person just doing what he wants to do.
It happened a bit differently; Atwood and friends simply came out with a standards document and called it "Standard Markdown", which Gruber then refused to endorse. Eventually, after the series of blog posts and some back and forth, they renamed the project "CommonMark", which it is still called today.
I am not sure (of course), but I think Atwood simply thought standardizing this format was so obviously valuable that he didn't consider that Gruber might not want to work with him. In retrospect it's kind of nice that it didn't happen; it really keeps everyone incentivized to keep the format simple.
The linked post contains three cases of Markdown syntax (underscores) leaking into the text, where actual italics were likely intended. This is the most basic Markdown syntax element failing to work. The problem CommonMark is trying to solve is not adding new features (the only one it added on top of Gruber's Markdown is fenced code blocks), but rather specifying how to interpret edge cases so that the same Markdown source produces the same HTML everywhere.
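For a concrete illustration of the kind of divergence a shared spec targets, here is a minimal sketch (assuming the third-party Python packages markdown, i.e. Python-Markdown, and markdown-it-py, a CommonMark implementation, are installed; neither is mentioned in the article). It feeds one classic edge case, intraword underscores, to both renderers and prints the HTML so the outputs can be compared:

```python
# Minimal sketch: compare two Markdown renderers on an edge case.
# Assumes `pip install markdown markdown-it-py`.
import markdown                      # Python-Markdown
from markdown_it import MarkdownIt   # markdown-it-py (CommonMark by default)

# Intraword underscores are a classic divergence point between
# Gruber-style Markdown and CommonMark-style renderers.
source = "a variable named foo_bar_baz, plus *ordinary emphasis*"

print("Python-Markdown:", markdown.markdown(source))
print("markdown-it-py: ", MarkdownIt().render(source).strip())
```

Whether the two outputs agree for any given input depends on the renderer and its options, and that variability is exactly what a common spec is meant to eliminate.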
I understand the goal of the spec. In my experience, once a spec document gets adopted widely enough, there's a strong incentive to add new features to it, which renderers are then compelled to implement. Before you know it, MD is a complicated spec that no longer serves its original purpose.
In this case, a few minor edge cases are really not a big deal compared to that (in my opinion).
Another take on that is that Gruber is unable to stop a markdown standard from coming into existence, no matter how much of a tantrum he wants to throw. I have no interest in listening to him on the topic; he's just in the way of the community, and everyone is routing around the damage.
What Gruber has done is force the spec to be called CommonMark, but as far as everyone except Gruber is concerned, CommonMark is the markdown spec.
There are flavors that predate it, like GFM, and various extensions, but IMHO going forward it's CommonMark (plus possibly your extensions) or it's not really Markdown.
I used to host an SSH server at home on port 443, for the same reason! The sysadmin at my employer was so strict that 'solutions' like this were the path of least resistance. Security gets worse when policies get stricter.
I had a teacher who became angry whenever a question was asked about a subject he felt students should already know. "YOU ARE IN xTH GRADE AND STILL DON'T KNOW THIS?!" (uppercase intentional; he was shouting). The fact that you learned it yesterday doesn't mean every other human in the world also learned it yesterday. Ask questions, always. Explain, always.
Such questions can be jarring though. I remember my "Unix Systems Programming" class in college. It's a third year course. The instructor was describing the layout of a process in memory, "here's the text segment, the data segment, etc." when a student asked, "Where do the comments go?"
It states the cargo-culted reasons, but not the actual truth.
1) Pronunciation is either a) solved by automatic language detection, or b) doesn't matter. If I am reading a book and I see text in a language I recognize, I will pronounce it correctly, just like the screen reader will. If I see text in a language I don't recognize, I won't pronounce it correctly, and neither will the screen reader. There's no benefit to my screen reader pronouncing Hungarian correctly to me, a person who doesn't speak Hungarian. On the off chance that the screen reader gets it wrong even though I do speak Hungarian, I can certainly tell that I'm hearing English-pronounced Hungarian. But there's no reason the screen reader will get it wrong, because "Mit csináljunk, hogy boldogok legyünk?" isn't ambiguous. It's just simply Hungarian, and if I have a Hungarian screen reader installed, it's trivial to figure that out (see the detection sketch after this list).
2) Again, if you can translate it, you already know what language it is in. If you don't know what language it is in, then you can't read it from a book, either.
3) See above. Locale is mildly useful, but the example linked in the article was strictly about language, and spell checking will either a) fail, in the case of en-US/en-GB, or b) be obvious, as in 1) above.
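For concreteness, this is what that automatic detection looks like with an off-the-shelf library. A sketch assuming the third-party langdetect package (a port of Google's language-detection library, and just one detector of many) is installed:

```python
# Sketch of automatic language detection, assuming `pip install langdetect`.
from langdetect import DetectorFactory, detect, detect_langs

DetectorFactory.seed = 0  # the detector is probabilistic; pin the seed for repeatable output

# A full sentence in a single language is the easy case described above.
print(detect("Mit csináljunk, hogy boldogok legyünk?"))   # expected: 'hu'

# detect_langs() exposes candidate languages with probabilities, which is
# where you can see detection getting shakier on short or mixed fragments.
print(detect_langs("Ok, merci"))
```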
Your whole comment assumes language identification is both trivial and fail-safe. It is neither, and it gets worse when you consider, e.g., pages that have different elements in different languages, or languages that are similar to one another.
Even if language identification were very simple, you're still putting the burden on the user's tools to identify something the writer already knew.
Language detection (where "language" == one of the ~200 languages that are actually used) IS trivial, given a paragraph of text.
And the fact is that the author of the web page doesn't know the language of the content if there's anything user-contributed. Should you have to label every comment on HN as "English"? That's a huge burden on literally every internet user. Other forms of written language have never specified their language. Imposing data-entry requirements on humans to satisfy a computer is never the ideal solution.
Do you have any reference for that, or are you implying we shouldn't support the other thousands[0] of languages in use just because they don't have a big enough user base?
> And the fact is that the author of the web page doesn't know the language of the content if there's anything user-contributed. Should you have to label every comment on HN as "English"? That's a huge burden on literally every internet user.
In the case of Hacker News or other pages with user-submitted and multi-language content, you can just set the comments' lang attribute to the empty string, which means "unknown" and falls back to detection. Alternatively, it's possible to let the user select the language (defaulting to their last used or an auto-detected one); Mastodon and Bluesky do that. For single-language forums and sites with no user-generated content, it's fine to leave everything as the site language.
> Other forms of written language have never specified their language. Imposing data-entry requirements on humans to satisfy a computer is never the ideal solution.
There's also no "screen reader" or "auto translation" for other written language. Setting the content language helps improve accessibility features that do not exist without computers.
I wish this comment were true, but due to a foolish attempt to squish all human characters into 2 bytes as UCS-2 (which failed and turned into the ugly UTF-16 mess), a disaster called Han Unification was unleashed upon the world, and now out-of-band communication is required to render the correct Han characters on a page and not offend people.
> It states the cargo-culted reasons, but not the actual truth
This dismisses the existing explanations without engaging with them, and the text that follows doesn't actually provide arguments for the claim.
> Pronunciation is either solved by a) automatic language detection, or b) doesn't matter.
There are more possibilities than a and b. For example, the declared language may matter for things other than pronunciation, and it may improve automatic detection or make automatic detection superfluous altogether.
> If I am reading a book [...] I will pronounce it correctly, just like the screen reader will. If I see text in a language I don't recognize, I won't pronounce it correctly, and neither will the screen reader.
This generalizes your own experience to all users and systems. Screen readers aim to convey information accessibly, not to mirror human ignorance.
> There's no reason that the screen reader will get it wrong, because <hungarian sentence> isn't ambiguous
This is circular reasoning. The statement is based on the assumption that automatic detection is always accurate - which is precisely what is under debate.
> If you can translate it, you already know what language it is in.
This is a non sequitur. Even if someone can translate text, that doesn't mean software or search engines can automatically identify the language.
It is about a company, First Wap, that makes it possible to track individuals. Their USP is a piece of software that operates at the phone network level and exploits the fact that phone companies still support an old protocol, Signalling System 7:
> Phone networks need to know where users are in order to route text messages and phone calls. Operators exchange signalling messages to request, and respond with, user location information. The existence of these signalling messages is not in itself a vulnerability. The issue is rather that networks process commands, such as location requests, from other networks, without being able to verify who is actually sending them and for what purpose.
> These signalling messages are never seen on a user’s phone. They are sent and received by “Global Titles” (GTs), phone numbers that represent nodes in a network but are not assigned to subscribers.
> The issue is rather that networks process commands, such as location requests, from other networks, without being able to verify who is actually sending them and for what purpose
'Fun' fact: "other networks" includes all foreign networks with a roaming partnership. It's possible to abuse SS7 to track people across borders, from half a world away.
> it’s more than that. it’s any device that can present itself as a possible base station.
can you elaborate on this a bit? what devices are able to present themselves as possible base stations? do i need any form of entitlement to participate in the network or not? From past encounters with SS7 and its, uhm, capabilities, it seemed the hardest part would be getting access to the network (albeit not really that hard). It sounds like you were hinting at possibly gaining access by participating in the network without any official entitlement, by posing as a base station.
I believe he is referring to femtocells, which have been (are?) given away freely to end users who need cellular signal boosting, etc.
Many of these femtocells, historically, could be trivially altered or updated to participate as literal peers on SS7.
I haven't looked into this for many years but there was a time when operating a certain femtocell granted the owner an enormous amount of leverage on the global telecom network ...
The links in the article are all annotated with a symbol (showing what happens when you click the link?), which breaks the flow of reading, imo. I don't feel this helps communicate the message of the article.
I agree it's not helpful here. The star marks a 'new' link (i.e. one seen for the first time recently in the website corpus). It's useful for flagging recently modified writing or new additions in my essays, where I might have added a bunch of links 15 years after starting the essay, and then the interesting new paragraph jumps out. They are 'starred' as something new which might be worth looking at even if you've read that page before. (A classic but forgotten Web 1.0 design pattern.) Unfortunately, in cases like this, I haven't ever written about pinball hacking before, so almost all the links are new...
I've thought about how to handle it, but haven't come up with a good answer. It may be that since 'blog' posts come with a clear temporal context, compared to my usual pages, it makes more sense to just disable that particular feature on all blog posts. We'll think about it.
Achmiz argues that it makes more sense to define each page as 'new' vs 'old', and disable the black-stars based on that, rather than on whether it's a blog or not. This is more work, but makes more sense (and we might want to do other things based on that binary variable, like emphasize the created date at the top of the page, to warn readers that something is recent and thus still half-baked). So we're implementing that now.
In example.com/blah, the /blah part is interpreted by the host itself.
And apart from that I would indeed consider DNS records a database.
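To make the first point concrete, a small sketch with Python's standard urllib.parse: DNS only ever resolves the hostname, while the path (and query) are sent to that host in the HTTP request and interpreted there.

```python
# Standard-library sketch: which parts of a URL does DNS see, and which
# parts are handled by the host itself?
from urllib.parse import urlparse

parts = urlparse("https://example.com/blah?x=1")

print(parts.hostname)  # 'example.com' -> resolved via DNS
print(parts.path)      # '/blah'       -> interpreted by the server at that host
print(parts.query)     # 'x=1'         -> likewise handled by the server
```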