> Code is much much harder to check for errors than an email.
Disagree.
Even though performing checks on dynamic PLs is much harder than on static ones, PLs are designed to be unambiguous: there should be exactly one interpretation for any syntactically valid expression. Your example will unambiguously resolve to an error in a standard-conforming Python interpreter.
On the other hand, natural languages have no such restriction on ambiguity. That's why something like Poe's law exists: there's simply no way to resolve the ambiguity by just staring at the words themselves; you need additional information to know the author's intent.
In other words, an "English interpreter" cannot exist. Remove the ambiguities so that one could exist, and you'll end up with an unambiguous, COBOL- or Python-like language.
With that said, I agree with your point that blindly accepting 20kloc is certainly not a good idea.
Tell me you've never written any python without telling me you've never written any python...
Those are both syntactically valid lines of code (it's actually one of Python's many warts). They are not ambiguous in any way: one is a number, the other is a tuple. They return something of a completely different type.
My example will unambiguously NOT give an error, because both lines are standard-conforming. Which you would have noticed had you actually taken 5 seconds to try typing them in the REPL.
> Those are both syntactically valid lines of code (it's actually one of Python's many warts). They are not ambiguous in any way: one is a number, the other is a tuple. They return something of a completely different type.
You just demonstrated how hard it is to "check" an email or text message by missing the point of my reply.
> "Now imagine trying to spot that one missing comma among the 20kloc of code"
I assume your previous comment tries to bring up Python's dynamic typing & late binding nature and uses it as an example of how that can be problematic when someone blindly merges 20kloc of LLM-generated Python code.
My reply, "Your example will unambiguously resolve to an error in a standard-conforming Python interpreter", tried to respond to the possibility of such an issue. Even though it's probably not the program behavior you want, Python, being a programming language, is 100% guaranteed to interpret it unambiguously.
I admit, I should have phrased that less ambiguously instead of leaving it at that.
Even if it's hard, you can run a type checker to statically catch such problems. And if that isn't possible because of heavy use of Python's dynamic typing features, you can still just run the program and check its behavior at runtime. It might be hard to check, but not impossible.
On the other hand, it's impossible to perform a perfectly consistent "check" on this reply or an email written in a natural language; the person reading it might interpret the message in a completely different way.
> Or is a big part of this concept only relevant for strong functional languages with sum types and pattern matching?
It need not strictly be a pure functional language for type-driven style to be usable. Type-driven style only requires that one type cannot be assigned to another, so it's kind of possible to do even in a language like C, as `int a = (struct Foo) {};` would get rejected by C compilers.
However, I don't think it's doable in languages with structural type systems like Typescript or Go's interface without a massive ergonomic hit for minimal gain. Languages with a structural type system are deliberately designed to remove the intentionality of "type T cannot be assigned to type S" in exchange for developer ergonomics.
> However, is there a similar article but written with more common languages (C#, C++, Java, Go) in mind?
For C#, there's an F#-focused article, some of which I believe can be applied to C# as well:
For modern Java, there is some attempt at popularizing "Data-Oriented Programming", which is essentially a rebranding of type-driven design. Surprisingly, with JDK 21+, type-driven style is somewhat viable there, as there are algebraic data types via `record` + `sealed` plus exhaustive pattern matching & destructuring.
For Rust, due to the new mechanics introduced by its affine type system, there is much more flexibility in what you could express in Rust types compared to more common languages.
> However, I don't think it's doable in languages with structural type systems like Typescript or Go's interface without a massive ergonomic hit for minimal gain.
Go only has structural interfaces; concrete types are nominal, and this sort of pattern tends to live more on the concrete side.
Typescript is a lot more broadly structural, but even there a class with a private member is not substitutable with a different class with the same private member.
Coming from a more "average imperative" background like C and Java: outside of a compiler or serde context, I don't think "parse" is a frequently used term there. The idea of "checking values to see whether they fulfill our expectations or not" is usually called "validating" instead.
So I believe the "Parse, Don't Validate" catchphrase means nothing to most developers, if it isn't outright confusing. "Does it mean this 'parse' operation doesn't 'validate' its input? How do you even perform 'validation' then?" is one of several questions that popped up in my head the first time I read the catchphrase, before any Haskell exposure.
Something like "Utilize your type system" probably makes much more sense for them. Then just show the difference between `ValidatedType validate(RawType)` vs `void RawType::validate() throws ParseError`.
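For example, a minimal Rust-flavoured sketch of that contrast (the types and the emptiness check are placeholders, not from the article):

  struct RawType(String);
  struct ValidatedType(String);
  struct ParseError;

  // "Utilize your type system": consuming the raw value yields a brand-new type,
  // so downstream code can demand ValidatedType and nothing else.
  fn validate(raw: RawType) -> Result<ValidatedType, ParseError> {
      if raw.0.is_empty() { Err(ParseError) } else { Ok(ValidatedType(raw.0)) }
  }

  // The void/throwing flavour: the value keeps its old type after the check,
  // so nothing stops the next function from skipping the check entirely.
  impl RawType {
      fn validate(&self) -> Result<(), ParseError> {
          if self.0.is_empty() { Err(ParseError) } else { Ok(()) }
      }
  }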
The crucial design choice is that you can't get a Doodad by just saying oh, I'm sure this is a Doodad, I will validate later. You have to parse the thing you've got to get a Doodad if that's what you meant, and the parsing can fail because maybe it isn't one.
let almost_pi: Rational = "22/7".parse().unwrap();
Here the example is my realistic::Rational. The actual Pi isn't a Rational, so we can't represent it exactly, but 22 divided by 7 is a pretty good approximation, considering.
I agree that many languages don't provide a nice API for this, but what I don't see (and maybe you have examples) is languages which do provide a nice API but call it validate. To me that naming would make no sense, but if you've got examples I'll look at them.
The point is parse and validate are interchangeable words for the most part. If you're parsing something you expect to be an int, but it's a float or the letter "a", is that not invalid? Is this assessment a form of validating expectations? The line between parsing and validating doesn't exist.
That's true, but then again, don't forget the fact that words might get interpreted as different things by different people. Words like "arrow", "functor", or "validate" might get interpreted slightly differently even between people with the same background.
After all, the meaning of words is just a socially accepted meaning attached to a certain arrangement of symbols. The meaning can be whatever they want it to be. And even though each individual might interpret it slightly differently, as long as the interpretation is "compatible", communication between individuals is possible.
Arguably, it's more useful to distinguish between "parse" & "validate" and I agree with that. But based on my own experience and what I've observed when I'm trying to spread type-driven style, it looks like there's no difference in meaning between the words "parse" and "validate" for most developers. Trying to sell type-driven style via the "catchy catchphrase" "Parse, Don't Validate" will certainly backfire, confusing most people rather than making them appreciate the value of it.
In my opinion, it's not worth it to combat this "parse" & "validate" misconception for the sake of the catchphrase "Parse, Don't Validate". Why? Pure FP and type-driven style already put off most people because of the tendency to go with mathematical jargon. Why add even more unnecessary barriers when the core of it is just "utilize your type system"?
I agree with the point of the "Parse, Don't Validate" article, but I strongly dislike the "catchphrase" marketing part.
The fact that we are in disagreement here proves my point. If you pose this question to 10,000 developers, you will get mixed answers. This ambiguity is why I think the phrasing of this article (not the intent) is incorrect.
In the spirit of "Parse, Don't Validate", rather than encode "validation" information as a boolean to be checked at runtime, you can define `Email { raw: String }` and hide the constructor behind a "factory function" that accepts any string but returns `Option<Email>` or `Result<Email,ParseError>`.
If you need a stronger guarantee than just a "string that passes simple email regex", create another "newtype" that parses the `Email` type further into `ValidatedEmail { raw: String, validationTime: DateTime }`.
While it does add some boilerplate-y code no matter what kind of syntactic sugar is available in the language of your choice, this approach uses the type system to enforce the "pass only non-malformed & working emails" rule wherever the `ValidatedEmail` type shows up, without you having to constantly remember to check `email.isValidated`.
The benefit of this approach varies depending on the programming language and what you are trying to do. Some languages offer zero runtime cost (Haskell's `newtype`, Rust's `repr(transparent)`); others carry non-negligible runtime overhead. Even then, it depends on whether the overhead is acceptable in exchange for "correctness".
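A minimal Rust sketch of the shape described above (the `contains('@')` check and field names are illustrative, not a real email parser):

  use std::time::SystemTime;

  pub struct Email { raw: String }        // field stays private, so the constructor is hidden
  pub struct ValidatedEmail { raw: String, validation_time: SystemTime }
  pub struct ParseError;

  impl Email {
      // "Factory function": accepts any string, but only hands out an Email on success.
      pub fn parse(raw: &str) -> Result<Email, ParseError> {
          if raw.contains('@') {          // stand-in for a real well-formedness check
              Ok(Email { raw: raw.to_string() })
          } else {
              Err(ParseError)
          }
      }
  }

  impl ValidatedEmail {
      // Parse the Email further, e.g. once a test email has gone through.
      pub fn from_checked(email: Email) -> ValidatedEmail {
          ValidatedEmail { raw: email.raw, validation_time: SystemTime::now() }
      }
  }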
I would still usually prefer email as just a string and validation as a separate property, and they both belong to some other object. Unless you really only want to know if XYZ email exists, it's usually something more like "has it been validated that ABC user can receive email at XYZ address".
Is the user account validated? Send an email to their email string. Is it not validated? Then why are we even at a point in the code where we're considering emailing the user, except to validate the email.
You can use similar logic to what you described, but instead with something like User and ValidatedUser. I just don't think there's much benefit to doing it with specifically the email field and turning email into an object. Because in those examples you can have a User whose email property is a ParseError and you still end up having to check "is the email property result for this user type Email or type ParseError?" and it's very similar to just checking a validation bool except it's hiding what's actually going on.
> I would still usually prefer email as just a string and validation as a separate property, and they both belong to some other object. Unless you really only want to know if XYZ email exists, it's usually something more like "has it been validated that ABC user can receive email at XYZ address".
> Is the user account validated? Send an email to their email string. Is it not validated? Then why are we even at a point in the code where we're considering emailing the user, except to validate the email.
You are looking at this single type in isolation. The benefit of an email type over using a string to hold the email is not validating the actual string as an email address, it's forcing the compiler to issue an error if you ever pass a string to a function expecting an email.
Consider function `foo`, which takes an email and a username parameter.
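For example, with Rust-flavoured newtypes (the `Username` type and the call site are mine, just for illustration):

  struct Email(String);
  struct Username(String);

  fn foo(email: Email, username: Username) {
      // do something with the email & username
      let _ = (email, username);
  }

  fn main() {
      let email = Email("bob@example.com".into());
      let name = Username("bob".into());

      foo(email, name);
      // foo(name, email);  // arguments swapped: a compile-time error here,
      //                    // but with `fn foo(email: String, username: String)`
      //                    // the same mix-up compiles fine and becomes a runtime bug.
  }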
> Because in those examples you can have a User whose email property is a ParseError and you still end up having to check "is the email property result for this user type Email or type ParseError?"
In languages with a strong type system, `User` should hold `email: Option<ValidatedEmail>`. This will reject erroneous attempts like `user.email = Email::parse(raw_string);` at compile time, as `Result<Email, ParseError>` is not compatible with / assignable to `Option<ValidatedEmail>`.
It's kind of an "oh, I forgot to check `email.isValidated`" reminder, except now it's presented as an incompatible type assignment, at compile time. Borrowing Rust's syntax, the type error can be solved with something like the following (assuming a hypothetical `send_test_email` helper that returns `Result<ValidatedEmail, SendError>`):
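  // send_test_email is hypothetical: fn send_test_email(email: Email) -> Result<ValidatedEmail, SendError>
  user.email = Email::parse(&raw_string)                  // Result<Email, ParseError>
      .ok()                                               // -> Option<Email>, a parse failure becomes None
      .and_then(|email| send_test_email(email).ok());     // -> Option<ValidatedEmail>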
Which more or less translates to: "check whether this raw string is a well-formed email; if it is, try to send a test email; in case of any failure during parsing or the test email, leave the `user.email` field empty (represented by `Option::None`)".
> and it's very similar to just checking a validation bool except it's hiding what's actually going on.
Arguably, it's the other way around. Looking back at `email: Option<ValidatedEmail>`, it's visible at compile time that `User` demands the equivalent of "checking the validation bool"; violate this and you will get a compile-time error.
On the other hand, the usual approach of assigning a raw string directly doesn't say anything at all about its contract, hiding the requirement that `user.email` must be a well-formed, contactable email. Not only is it possible to assign an arbitrary malformed "email" string, but remembering to check `email.isValidated` is also left to programmer diligence; forget once and now there's a bug.
> This makes those tools, as powerful as they can be, unable to help us think about and enforce correctness across horizons that are not visible from the standpoint of a single project at a single point in time (systems distributed across time, across network, across version history).
Eh?
I don't think there's anything preventing you from constructing guardrails with the type system & tests that enforce correctness across versions.
I'm not buying the "unable to help us think about" part. I'd argue that Rust's `Option`/`Result`/sum types + exhaustive match/switch are very valuable for proper error handling. You can define your own error type and exhaustively handle each case gracefully, with the bonus that your compiler is now also able to check it.
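For instance, a minimal sketch of what I mean (the error type and its variants are made up):

  enum FetchError {
      Network(std::io::Error),
      MalformedPayload(String),
      VersionMismatch { expected: u32, got: u32 },
  }

  fn describe(err: &FetchError) -> String {
      // Exhaustive match: add a new variant later and the compiler forces you
      // to come back here and decide how to handle it.
      match err {
          FetchError::Network(e) => format!("transient, retry later: {e}"),
          FetchError::MalformedPayload(body) => format!("reject payload: {body}"),
          FetchError::VersionMismatch { expected, got } =>
              format!("migrate data from v{got} to v{expected}"),
      }
  }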
I guess you could pick a subset of a particular natural language such that it removes ambiguity. At that point, you're basically reinventing something like COBOL or Python.
Ambiguity in natural languages is a feature, not a bug, although it's better if an unintentional pun or joke instruction doesn't get interpreted as "launch the missile" by a computer.
However, each project's error tolerance is different. Arguably, for an average task under the "software engineering" umbrella, even current LLMs seem good enough for most purposes. It's a transition similar to the one toward automatically memory-managed languages: trading control for "DX".
C#'s anonymous types share some of the flexibility of a structural type system, even though they are still nominal types.
> A language would only need a single "newtype" or "nominal" keyword to create nominal types from structural types.
I think you could also add a `structural` keyword & apply structural typing within a generally nominal type system, if we're talking about adding features.
"Script" PLs tend to be interpreted, dynamic, and handwave various machine-level details. In contrast, "compiled" PLs usually provide you the constructs to manipulate native machine-level features directly.
Realistically, communities around "script" languages aren't going to talk much about memory layout or syscalls. Instead, getting the job done fast (devtime-wise) is their main focus.
On the other hand, "compiled" languages tend to draw people who like squeezing every bit of computing power from their computer, even though it tends to raise the complexity.
FWIW, FAFO is a very good way to learn. Assuming we can respawn indefinitely and preserve knowledge between respawns, driving fast and taking off your seatbelt would definitely teach you more than just reading a book.
But in this specific case, if the respawn feature is not available or dying isn't a desirable event, FAFO might not be the best way to learn how to drive.
I also think we have the data in for memory safety in C. Even the best people, with the best processes in the world seem to keep writing memory safety bugs. The “just be more vigilant” plan doesn’t seem to work.
> FWIW, FAFO is a very good way to learn. Assuming we can respawn indefinitely and preserve knowledge between respawns, driving fast and taking off your seatbelt would definitely teach you more than just reading a book.
Yes, just sucks for the person who you hit with your car, or the person whose laptop gets owned because of your code.
"FAFO" is not a great method of learning when the cost is externalized.