And the article you’ve linked says “file system treats path and file names as an opaque sequence of WCHARs.” This means no information is lost in the kernel, either.
Indeed, the kernel doesn’t validate or normalize these WCHARs, but should it? I would be very surprised if I asked an OS kernel to create a file and it silently changed the name by applying some Unicode normalization.
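To make the normalization worry concrete, here's a minimal sketch (my own example, not from the article): precomposed "é" (U+00E9) and decomposed "e" + combining acute (U+0065 U+0301) are distinct WCHAR sequences, so a kernel that silently normalized would turn the name you asked for into a different one, and make two "different" names collide.

    fn main() {
        // Two different UTF-16 sequences that both render as "é":
        // precomposed U+00E9 vs. "e" followed by combining acute U+0301.
        let nfc: &[u16] = &[0x00E9];
        let nfd: &[u16] = &[0x0065, 0x0301];

        // Byte-for-byte they are distinct names to the file system;
        // silent normalization would conflate them.
        assert_ne!(nfc, nfd);
    }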
I'm sorry if I was unclear, but my point was that when you receive a string from the Windows API you cannot assume it is valid UTF-16: it may contain unpaired surrogates, which have no UTF-8 representation. Converting it to UTF-8 is therefore potentially lossy, and if you then convert it back from UTF-8 to UTF-16 and feed it to the WinAPI you'll get unexpected results. That's why I feel converting back and forth all the time is risky.
This is one reason the WTF-8[0] encoding was created: a UTF-8-like encoding that can also represent unpaired surrogates, and can therefore round-trip arbitrary, potentially ill-formed UTF-16.

[0] https://simonsapin.github.io/wtf-8/
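Here's a minimal Rust sketch of the lossiness (Windows-only, since it uses std::os::windows::ffi; incidentally, Rust's OsString stores such names as WTF-8 internally, which is exactly why it can hold them at all):

    use std::ffi::OsString;
    use std::os::windows::ffi::OsStringExt;

    fn main() {
        // A name the Windows API could legitimately hand you: "fo", a
        // lone high surrogate (0xD800), then "o". Legal as a path, but
        // not valid UTF-16.
        let wide: [u16; 4] = [0x0066, 0x006F, 0xD800, 0x006F];
        let name = OsString::from_wide(&wide);

        // Strict conversion to UTF-8 fails because of the lone surrogate...
        assert!(name.to_str().is_none());

        // ...and a lossy conversion replaces it with U+FFFD, so converting
        // back to UTF-16 yields a *different* name than the one you received.
        let lossy = name.to_string_lossy();
        assert_eq!(lossy, "fo\u{FFFD}o");
    }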
E.g. the filesystem accepts any sequence of WCHARs, whether or not they're valid UTF-16: https://docs.microsoft.com/en-us/windows/desktop/FileIO/nami...
> the file system treats path and file names as an opaque sequence of WCHARs.
The same is true more generally: there's no validation, so anything goes.
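As an illustrative sketch (Windows-only; the "bad_" name is hypothetical, and I'm assuming an NTFS volume, which does accept such names), you can create a file whose name isn't valid UTF-16:

    // The file system layer does no UTF-16 validation, so the lone
    // surrogate goes through to CreateFileW as-is.
    #[cfg(windows)]
    fn main() -> std::io::Result<()> {
        use std::ffi::OsString;
        use std::fs::File;
        use std::os::windows::ffi::OsStringExt;

        // "bad_" followed by an unpaired high surrogate: not valid UTF-16.
        let wide: Vec<u16> = "bad_".encode_utf16().chain([0xD800]).collect();
        let name = OsString::from_wide(&wide);

        // This typically succeeds -- and leaves behind a file that
        // UTF-8-based tools may struggle to open or delete.
        File::create(&name)?;
        Ok(())
    }

    #[cfg(not(windows))]
    fn main() {}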