Out of curiosity, are there any VCSs that operate on AST instead of plaintext li...

ajross · on Jan 22, 2024

> I know it might seem unhinged at first

Not "unhinged". Most kids these days get their first introduction to computer programming using one of the many "block coding" environments, almost all of which are straightforward recapitulations of JavaScript under the hood. And it works, and it avoids the problem of having to teach them how to deal with syntax errors before you teach them imperative logic.

The reason people don't do this is that it's just a bad idea. The fact that all source code is stored in a universally understood data format with pervasive support across decades of tools is a feature and not a bug. How do you grep your AST to see if it's using some old API that needs to be refactored? Surely you'll answer that you use your fancy AST grep tool, which is not grep, and thus works differently for every environment. Basically every environment now has to have its own special editor, grep, diff, merge, etc... Even things like documentation generation and source control rely on files being text. And you're throwing all that out just to be different.
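For concreteness, here is roughly what such an "AST grep" looks like for a single language, using Python's standard `ast` module. This is only a minimal sketch: `find_calls` and `old_api` are made-up names for illustration, and the whole thing is specific to Python, which is exactly the per-environment problem described above.

```python
import ast

def find_calls(source: str, name: str) -> list[int]:
    """Return line numbers of calls to `name`, either bare or as a method/attribute."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            # `old_api(...)` is an ast.Name; `obj.old_api(...)` is an ast.Attribute.
            called = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
            if called == name:
                hits.append(node.lineno)
    return sorted(hits)

# Hypothetical sample: only real calls should match, not comments or strings.
SAMPLE = '''\
x = old_api(1)
# old_api mentioned in a comment is ignored
y = client.old_api(2)
z = "old_api(3)"
'''

print(find_calls(SAMPLE, "old_api"))  # prints [1, 3]
```

Plain `grep old_api` would also report lines 2 and 4 here, so the structured search is more precise, but it only works for sources that this one parser accepts.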

Also, FWIW: it's optimizing the wrong part of the problem anyway. The total cost to an organization that develops and deploys software of any form is overwhelmingly dominated by tasks like debugging and documentation and integration. The time spent actually typing correctly-formatted text into your editor is a vanishingly small fraction of software development, and really that's all this helps.

iveqy · on Jan 23, 2024

PlasticSCM has semantic merge that does something like that: https://docs.plasticscm.com/semanticmerge/intro-guide/semant...

gsuuon · on Jan 23, 2024

Not (just) a VCS, but this is the idea behind the Unison language: https://www.unison-lang.org/docs/the-big-idea/

ironmagma · on Jan 23, 2024

Considering that many languages' very own out-of-the-box tooling (e.g. gofmt, syn) often has glaring gaps[1][2] in its understanding/roundtripping of the language's AST constructs, I would never be able to trust something like this to store and restore my code.

[1] https://github.com/golang/go/issues/20744

[2] https://github.com/dtolnay/syn/issues/782

e12e · on Jan 22, 2024

I believe the Smalltalk VCS Monticello works on a semantic level?

https://eng.libretexts.org/Bookshelves/Computer_Science/Prog...

nolist_policy · on Jan 22, 2024

You can do most of this in git via custom diff-driver and smudge/clean filters.

For example git can already convert line-endings on the fly for windows. This is special-cased, but can just as well be implemented via smudge/clean.

Oh and git-lfs is done via smudge/clean too.
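As a rough illustration of the clean-filter half, here is a sketch of a script git could run on every staged Python file. The filter name `pynorm`, the file name `normalize.py`, and the `--filter` flag are all hypothetical. Note that `ast.unparse` (Python 3.9+) drops comments and original layout, so this toy version is lossy and not something to use on a real repository.

```python
import ast
import sys

# Hypothetical wiring (names are assumptions, not an existing tool):
#   .gitattributes:  *.py filter=pynorm
#   git config filter.pynorm.clean  "python3 normalize.py --filter"
#   git config filter.pynorm.smudge cat

def clean(source: str) -> str:
    """Canonicalize Python source; return the input unchanged if it does not parse."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return source  # leave un-parseable work in progress untouched
    if not tree.body:
        return source  # empty or comment-only file: nothing to normalize
    return ast.unparse(tree) + "\n"

if __name__ == "__main__" and "--filter" in sys.argv:
    # git pipes the working-tree content on stdin and stores whatever is on stdout.
    sys.stdout.write(clean(sys.stdin.read()))
```

With this in place, `x=1;y  =  2` would be stored as two normalized statements, while a file with a syntax error would be committed byte for byte.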

ironmagma · on Jan 23, 2024

One problem with that, though, is that smudge and clean are not used in rename detection. Git purposely skips running these filters when detecting renames, for performance. There are quite a lot of other issues with smudge/clean too.

distortedsignal · on Jan 22, 2024

I was thinking about trying this out, but there are some reasons why I don't think it's feasible.

Where are your comments stored?

What happens when you need to run out in the middle of a fire and you don't have time to make your code compile-able? How do you commit "un-compile-able" changes?

I think there are some really compelling reasons to try AST-checkin - all your loops can now be changed to functional, dialect changes like you mention, etc. - but there are some pretty significant downsides as well.

lowbloodsugar · on Jan 22, 2024

Nodes in the AST for comments, block comments and "raw text I don't understand" seems like a way to go?
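Something like this toy sketch, maybe. It is line-based rather than a real parser, and the `Node` type and its `kind` values are invented for illustration. The point is only that every node keeps its original text, so comments and broken lines survive a round trip.

```python
import ast
from dataclasses import dataclass

@dataclass
class Node:
    kind: str  # "code", "comment", or "raw" (text the parser does not understand)
    text: str  # the original line, stored verbatim so nothing is lost

def parse_lines(source: str) -> list[Node]:
    """Classify each line; anything that fails to parse is kept as a raw node."""
    nodes = []
    for line in source.splitlines(keepends=True):
        stripped = line.strip()
        if stripped.startswith("#"):
            kind = "comment"
        else:
            try:
                ast.parse(stripped)
                kind = "code"
            except SyntaxError:
                kind = "raw"
        nodes.append(Node(kind, line))
    return nodes

def render(nodes: list[Node]) -> str:
    """Reassemble the exact original text from the stored nodes."""
    return "".join(node.text for node in nodes)

# Hypothetical input with a comment, a valid statement, and a broken one.
SAMPLE = "# set up\nx = 1\ny = = 2\n"
nodes = parse_lines(SAMPLE)
print([node.kind for node in nodes])  # prints ['comment', 'code', 'raw']
```

A real implementation would need multi-line statements and nested structure, but the same idea applies: an un-compile-able commit is just a tree with some raw nodes in it.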

distortedsignal · on Jan 22, 2024

Honestly yeah. Might have to give this another go.

n42 · on Jan 22, 2024

these are both already solved issues; IDEs deal with them using red-green trees

Kharacternyk · on Jan 22, 2024

This would enable some advanced merge conflict resolution strategies, I suppose. However, it can also be done by building the ASTs on demand and still storing plain text.
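A minimal sketch of the on-demand approach, assuming Python sources and the standard `ast` module (the helper name `same_structure` is made up): text stays the storage format, and the tree is built only when two versions need to be compared.

```python
import ast

def same_structure(old: str, new: str) -> bool:
    """True when both sources parse to the same AST, ignoring layout and comments."""
    # ast.dump omits line/column attributes by default, so only structure is compared.
    return ast.dump(ast.parse(old)) == ast.dump(ast.parse(new))

OLD = "def f(a,b):\n    return a+b\n"
REFORMATTED = "def f(a, b):\n\n    # add\n    return a + b\n"
CHANGED = "def f(a, b):\n    return a - b\n"

print(same_structure(OLD, REFORMATTED))  # prints True
print(same_structure(OLD, CHANGED))      # prints False
```

A merge tool could use a check like this to treat formatting-only conflicts as non-conflicts, while anything that fails to parse would fall back to the usual text merge.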

mcdonje · on Jan 23, 2024

It would be cool to integrate Tree-sitter into a VCS. It'd be more flexible if that were an option per project/folder/file, with a plain text diff still available for readmes/docs, or for someone using the VCS to write a book or something.

throwaway69123 · on Jan 23, 2024

It would also allow the file structure to be relevant to source control; users could customize how the methods in a class are organised.

a-dub · on Jan 23, 2024

there's some machine from the 70s that does this. iirc it stores all source code in an ast like representation alongside binaries and has some kind of built in version control.

wish i could remember the name...

a-dub · on Jan 23, 2024

ahh yes, the rational r1000. an ada machine from the 80s that stored programs in a mixed ast/object data format called diana: https://insights.sei.cmu.edu/documents/948/1988_005_001_1565...

psantosl · on Jan 23, 2024

Plastic SCM developed Semantic Merge and diffing about a decade ago