Out of curiosity, are there any VCSs that operate on ASTs instead of plaintext lines? (Or is something like this being developed, or has it been proven impossible?)
I guess it should be possible to cooperate on a shared codebase without every contributor having to check in and out text files that follow exactly the same formatting. Or even the same naming convention. Or even the same language, provided all collaborators can transpile to and from some agreed-upon shared AST target.
I know it might seem unhinged at first, but think about it: your (parseable) code is a representation of the tree anyway (with some unrelated "whitespace fluff" around it). If you follow strict formatting rules that you can express programmatically, you can reconstruct that "fluff" from the bare AST. If you store your deviations from that style alongside the code, you can even sin and break the rules. If you store data about what you need to see differently from the shared AST (local renames of variables, for example), then you should be able to use your own naming convention, formatting and even source language, without bothering collaborators with tabs vs. spaces, Hungarian notation, or the fact that you prefer some different dialect or metalanguage.
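To make the "fluff" point concrete, here is a minimal sketch using Python's standard `ast` module. The helper name `same_tree` and the sample snippets are made up for illustration. It shows that whitespace, redundant parentheses and comments vanish at the AST level, while a rename does not, which is why personal naming would need the extra mapping layer described above.

```python
import ast

def same_tree(a: str, b: str) -> bool:
    """True when both sources parse to the same AST (layout and comments ignored)."""
    # ast.dump omits line/column attributes by default, so only structure is compared.
    return ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))

compact = "def add(a,b): return a+b"
spaced = '''
def add( a, b ):
    # personal style: extra spaces and parentheses
    return (a + b)
'''
renamed = "def add(x, y): return x + y"

print(same_tree(compact, spaced))   # True: only the "fluff" differs
print(same_tree(compact, renamed))  # False: identifiers are part of the tree
```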
Not "unhinged". Most kids these days get their first introduction to computer programming using one of the many "block coding" environments, almost all of which are straightforward recapitulations of JavaScript under the hood. And it works, and it avoids the problem of having to teach them how to deal with syntax errors before you teach them imperative logic.
The reason people don't do this is that it's just a bad idea. The fact that all source code is stored in a universally understood data format with pervasive support across decades of tools is a feature and not a bug. How do you grep your AST to see if it's using some old API that needs to be refactored? Surely you'll answer that you use your fancy AST grep tool, which is not grep, and thus works differently for every environment. Basically every environment now has to have its own special editor, grep, diff, merge, etc... Even things like documentation generation and source control rely on files being text. And you're throwing all that out just to be different.
Also, FWIW: it's optimizing the wrong part of the problem anyway. The total cost to an organization that develops and deploys software of any form is overwhelmingly dominated by tasks like debugging and documentation and integration. The time spent actually typing correctly-formatted text into your editor is a vanishingly small fraction of software development, and really that's all this helps.
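For reference, the kind of "AST grep" being argued about looks roughly like this for a single language. This is only a sketch built on Python's standard `ast` module; `find_calls`, `legacy` and `old_api` are hypothetical names. It illustrates both sides of the argument: it ignores mentions in strings and comments that plain grep would report, but it only understands Python.

```python
import ast

def find_calls(source: str, name: str) -> list[int]:
    """Line numbers of calls to `name`, written either as name(...) or obj.name(...)."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name):
            called = func.id
        elif isinstance(func, ast.Attribute):
            called = func.attr
        else:
            called = None
        if called == name:
            hits.append(node.lineno)
    return sorted(hits)

sample = '''
import legacy
legacy.old_api(1)
text = "old_api(2) mentioned in a string"
# old_api(3) mentioned in a comment
result = legacy.old_api(
    4)
'''

print(find_calls(sample, "old_api"))  # [3, 6]
```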
Considering that many languages' very own out-of-the-box tooling (e.g. gofmt, syn) often has glaring gaps[1][2] in understanding or round-tripping the language's AST constructs, I would never be able to trust something like this to store and restore my code.
One problem with that though is that smudge and clean are not used in rename detection. Git purposely skips running these filters to detect renames for performance. There are quite a lot of other issues with smudge/clean too though.
I was thinking about trying this out, but there are some reasons why I don't think it's feasible.
Where are your comments stored?
What happens when you need to run out in the middle of a fire and you don't have time to make your code compile-able? How do you commit "un-compile-able" changes?
I think there are some really compelling reasons to try AST-checkin - all your loops can now be changed to functional, dialect changes like you mention, etc. - but there are some pretty significant downsides as well.
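Both questions are easy to reproduce with Python's standard `ast` module (`ast.unparse` needs Python 3.9+). This is a sketch of the problem, not of any real AST-based VCS; the helper names and snippets are invented for the example. A round trip through the plain AST silently drops the comment, and half-typed code has no tree to store at all.

```python
import ast

def roundtrip(source: str) -> str:
    """Parse to an AST and regenerate source, as a naive AST store would."""
    return ast.unparse(ast.parse(source))

def parses(source: str) -> bool:
    """True when the source can be turned into an AST at all."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False

commented = "total = 0  # running sum, reset per batch\n"
half_typed = "def handler(event:\n    return event\n"

print(roundtrip(commented))  # total = 0   (the comment is gone)
print(parses(half_typed))    # False       (nothing to check in)
```

A real system would presumably need a lossless syntax tree that keeps comments and error nodes, rather than the abstract tree used here.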
This would enable some advanced merge conflict resolution strategies, I suppose. However, it can also be done by building the ASTs on demand and still storing plain text.
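A rough sketch of the on-demand approach, assuming Python sources and the standard `ast` module: the stored revisions stay plain text, and ASTs are built only when comparing them. The function name `changed_definitions` and the two sample revisions are hypothetical.

```python
import ast

DEFS = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)

def changed_definitions(old: str, new: str) -> set[str]:
    """Names of top-level functions/classes whose AST differs between two text revisions."""
    def index(source: str) -> dict[str, str]:
        # Map each top-level definition to a dump of its subtree (layout ignored).
        return {n.name: ast.dump(n) for n in ast.parse(source).body if isinstance(n, DEFS)}
    before, after = index(old), index(new)
    return {name for name in before.keys() | after.keys()
            if before.get(name) != after.get(name)}

old = '''
def f(a, b):
    return a + b

def g(x):
    return x * 2
'''
new = '''
def f(a,b): return a+b

def g(x):
    return x * 3

def h():
    pass
'''

print(sorted(changed_definitions(old, new)))  # ['g', 'h']
```

Here `f` was only reformatted, so it is not reported; a merge tool built this way could skip it and focus on `g` and `h`.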
It would be cool to integrate Tree Sitter into a VCS. It'd be more flexible if that were an option for a project/folder/file, but also offer a text diff option for readmes/docs or for if someone is using the VCS to write a book or something.
there's some machine from the 70s that does this. iirc it stores all source code in an ast like representation alongside binaries and has some kind of built in version control.