
Out of curiosity, are there any VCSs that operate on AST instead of plaintext lines? (Or is something like this being developed or proven impossible?)

I guess it should be possible to cooperate on a shared codebase without every contributor needing to check text files in and out following exactly the same formatting. Or even naming convention. Or even the same language, provided all collaborators can transpile to and from some agreed-upon shared AST target.

I know it might seem unhinged at first, but think about it: your (parseable) code is a representation of the tree anyway (with some unrelated "whitespace fluff" around it). If you follow strict formatting rules that you can express programmatically, you can reconstruct that "fluff" from the bare AST. If you can store all your violations of your style next to the code, you can even sin and break those rules. If you store data about what you need to see differently from the shared AST - local renames of variables, for example - then you should be able to use your own naming convention, formatting and even source language, without bothering collaborators with tabs/spaces, Hungarian notation or the fact that you prefer some different dialect or metalanguage.
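
The "fluff can be reconstructed from the bare AST" point can be sketched with Python's standard ast module. This is only an illustration, not a VCS design: the two snippets are made up, and ast.unparse (Python 3.9+) stands in for whatever programmatic formatter a project would actually agree on.

```python
import ast

# Two hypothetical snippets that differ only in whitespace and layout.
version_a = "def add(a,b):\n    return a+b\n"
version_b = "def add( a, b ):\n\n        return (a +\n                b)\n"

tree_a = ast.parse(version_a)
tree_b = ast.parse(version_b)

# ast.dump leaves out line/column attributes by default,
# so layout differences do not show up in the comparison.
same_tree = ast.dump(tree_a) == ast.dump(tree_b)

# ast.unparse regenerates source text in one fixed style,
# i.e. the "fluff" is rebuilt from the tree alone.
canonical_a = ast.unparse(tree_a)
canonical_b = ast.unparse(tree_b)

print(same_tree, canonical_a == canonical_b)
```

Both versions parse to the same tree and regenerate to the same text. Note that this particular round trip discards comments, which is one of the objections raised further down the thread.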



> I know it might seem unhinged at first

Not "unhinged". Most kids these days get their first introduction to computer programming using one of the many "block coding" environments, almost all of which are straightforward recapitulations of JavaScript under the hood. And it works, and it avoids the problem of having to teach them how to deal with syntax errors before you teach them imperative logic.

The reason people don't do this is that it's just a bad idea. The fact that all source code is stored in a universally understood data format with pervasive support across decades of tools is a feature and not a bug. How do you grep your AST to see if it's using some old API that needs to be refactored? Surely you'll answer that you use your fancy AST grep tool, which is not grep, and thus works differently for every environment. Basically every environment now has to have its own special editor, grep, diff, merge, etc... Even things like documentation generation and source control rely on files being text. And you're throwing all that out just to be different.

Also, FWIW: it's optimizing the wrong part of the problem anyway. The total cost to an organization that develops and deploys software of any form is overwhelmingly dominated by tasks like debugging and documentation and integration. The time spent actually typing correctly-formatted text into your editor is a vanishingly small fraction of software development, and really that's all this helps.


PlasticSCM has semantic merge that does something like that: https://docs.plasticscm.com/semanticmerge/intro-guide/semant...


Not (just) a VCS, but this is the idea behind the Unison language: https://www.unison-lang.org/docs/the-big-idea/


Considering that many languages' very own out-of-the-box tooling (e.g. gofmt, syn) often has glaring gaps[1][2] in its understanding/roundtripping of the language's AST constructs, I would never be able to trust something like this to store and restore my code.

[1] https://github.com/golang/go/issues/20744

[2] https://github.com/dtolnay/syn/issues/782


I believe the Smalltalk VCS Monticello works on a semantic level?

https://eng.libretexts.org/Bookshelves/Computer_Science/Prog...


You can do most of this in git via custom diff-driver and smudge/clean filters.

For example git can already convert line-endings on the fly for windows. This is special-cased, but can just as well be implemented via smudge/clean.

Oh and git-lfs is done via smudge/clean too.
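
A rough sketch of what the clean side of such a filter could look like, assuming Python sources and using the standard ast module as the normalizer. The filter name astnorm and the script name astnorm.py are hypothetical; the git settings in the comments (filter.<name>.clean and a filter attribute in .gitattributes) are the standard mechanism.

```python
import ast
import sys

# Hypothetical wiring (the names "astnorm" and astnorm.py are made up):
#   git config filter.astnorm.clean "python3 astnorm.py"
#   echo "*.py filter=astnorm" >> .gitattributes
# A matching smudge command would re-apply each user's preferred formatting.


def clean(source: str) -> str:
    """Return source in one canonical layout, or unchanged if it does not parse."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # Unparseable work in progress is stored as-is rather than rejected.
        return source
    # Caveat: ast.unparse drops comments, so a real filter would need a
    # comment-preserving parser instead.
    return ast.unparse(tree) + "\n"


def main() -> None:
    # git pipes the working-tree file to stdin and stores whatever is
    # written to stdout. Call main() when this file is used as the filter.
    sys.stdout.write(clean(sys.stdin.read()))
```

The fallback branch is a deliberate choice: it lets code that does not compile still be committed, at the cost of storing it unnormalized.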


One problem with that though is that smudge and clean are not used in rename detection. Git purposely skips running these filters to detect renames for performance. There are quite a lot of other issues with smudge/clean too though.


I was thinking about trying this out, but there are some reasons why I don't think it's feasible.

Where are your comments stored?

What happens when you need to run out in the middle of a fire and you don't have time to make your code compile-able? How do you commit "un-compile-able" changes?

I think there are some really compelling reasons to try AST-checkin - all your loops can now be changed to functional, dialect changes like you mention, etc. - but there are some pretty significant downsides as well.


Nodes in the AST for comments, block comments and "raw text I don't understand" seem like the way to go?


Honestly yeah. Might have to give this another go.


these are both already solved issues that IDEs deal with with red-green trees


This would enable some advanced merge conflict resolution strategies, I suppose. However, it can also be done by building the ASTs on demand and still storing plain text.
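
As a minimal sketch of the "build the ASTs on demand" variant, assuming Python sources: the stored versions stay plain text, and the trees are built only when two versions are compared. The helper names top_level_defs and changed_defs are made up for illustration.

```python
import ast


def top_level_defs(source: str) -> dict[str, str]:
    """Map each top-level function or class name to a dump of its subtree."""
    tree = ast.parse(source)
    return {
        node.name: ast.dump(node)
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    }


def changed_defs(old: str, new: str) -> set[str]:
    """Names added, removed, or structurally changed between two versions."""
    before, after = top_level_defs(old), top_level_defs(new)
    return {
        name
        for name in before.keys() | after.keys()
        if before.get(name) != after.get(name)
    }


# Hypothetical versions: f is only reformatted, g changes its return value.
old_text = "def f(x):\n    return x+1\n\ndef g():\n    return 2\n"
new_text = "def f( x ):\n    return x + 1\n\ndef g():\n    return 3\n"
print(changed_defs(old_text, new_text))
```

Only g is reported, because the reformatting of f leaves its subtree unchanged. A merge tool could use the same comparison to ignore formatting-only conflicts.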


It would be cool to integrate Tree Sitter into a VCS. It'd be more flexible if that were an option for a project/folder/file, but also offer a text diff option for readmes/docs or for if someone is using the VCS to write a book or something.


It would also allow the file structure to be relevant to source control; users could customize how the methods in a class are organised.


there's some machine from the 70s that does this. iirc it stores all source code in an ast like representation alongside binaries and has some kind of built in version control.

wish i could remember the name...


ahh yes, the Rational R1000. an Ada machine from the 80s that stored programs in a mixed AST/object data format called DIANA: https://insights.sei.cmu.edu/documents/948/1988_005_001_1565...


Plastic SCM developed Semantic Merge and diffing about a decade ago



