Datafun is quite cool! Formulog and Datafun seem similar---both combine logic programming and pure functional programming---but they take wildly different approaches.
Datafun is a foundational re-imagining of what a Datalog could look like: start with a higher-order programming language and give it a first-class notion of least fixed points. A type system for tracking monotonicity lets you roll your own Datalog. It's impressive that you can reconstruct semi-naive evaluation (morally: in each 'round' of evaluation, only apply rules matching new results) in their setting (https://dl.acm.org/doi/abs/10.1145/3371090). Datafun is still a ways away from the performance and implementation maturity of existing Datalogs, though.
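To make the semi-naive idea concrete, here's a Datalog-ish sketch for transitive closure (the delta-relation rewrite is written as comments; the names are illustrative, not Datafun syntax):

```prolog
% naive evaluation: re-derive everything in each round
path(X, Y) :- edge(X, Y).
path(X, Z) :- path(X, Y), edge(Y, Z).

% semi-naive evaluation: in round i+1, only join against the
% facts that were *new* in round i (the "delta" relation):
%
%   delta_path[i+1](X, Z) :- delta_path[i](X, Y), edge(Y, Z),
%                            and (X, Z) not already in path[i].
%   path[i+1] = path[i] + delta_path[i+1]
%
% Iteration stops when the delta is empty---the least fixed point.
```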
Formulog's approach is to try to let Datalog be Datalog as much as possible. We end up with restrictions around higher-order functions and other FP features in order to keep things simple on the Datalog side---quite the opposite of Datafun's fancy type system. Our Formulog interpreter does pretty well with internment, parallel execution, and magic sets, but you could easily port our design to existing Datalog compilers and get even bigger speedups. It's not clear how to do that for Datafun... yet.
(I suspect you could port our SMT interface to Datafun without too much of a problem, too.)
Super interesting, and cool technique. Do you have any insight into why CSA outperforms PP so often? I would have assumed the solvers were tuned for PP.
I think the solvers _are_ tuned for PP. But we're comparing CSA and PP on the queries that Formulog issues... which don't really match well with the DFS discipline that the PP stack aligns with. I think CSA beats PP in our experiments because CSA is more flexible about locality.
Broadly---and I haven't looked at the memory usage to confirm this---I think CSA trades space (cache more old formulae, not just the prefix of some stack) for time (look, our answers are in the cache!).
It was a great refresher as someone who once liked math but hasn't done much of it in ~20 years :) I had seen the blog posts, but there was some "color" in the videos that helped. For example I didn't realize that the fonts sometimes matter! Honestly, I still don't really read the notation, as I haven't had a strong reason to, but I feel it would be useful at some point.
----
For others, I also recommend this 2017 talk by Guy Steele, "It's Time for a New Old Language", because even people in the field seem to have problems with the notation. He was also asked about this work a few days ago here and said he was still working on it in the background (being a "completionist"):
FWIW as you know, Oil is more static than shell, and that was largely motivated by tools and static analysis (and negatively motivated by false positives in ShellCheck https://news.ycombinator.com/item?id=22213155)
I would like to go further in that direction, but getting the basic functionality and performance up to par has taken up essentially 100% of the time so far :-(
My use of Zephyr ASDL was also partly motivated by some vague desire to get the AST into OCaml. However, I haven't used OCaml in quite a while, and I get hung up on small things like writing a serializer and deserializer. I don't want to do it for every type/schema, so it requires some kind of reflection. My understanding is that there are a bunch of packages that do this, like sexplib, but I never got further than that.
Formulog sounds very nice, so I wonder if there is some recommended way of bridging the gap? For example, imagine you want to load enormous Clang or TypeScript ASTs into Formulog. The parsers alone are 10K-30K lines of code, i.e. it's essentially infeasible to reproduce those parsers in another language in a reasonable time. And even just duplicating the schema is a pretty big engineering issue, since there are so many node types! I could generate them from Zephyr ASDL, but other projects can't. I wonder if you have any thoughts on that---i.e., how to make the work more accessible on codebases "in the wild"?
-----
Also FWIW I mentioned this "microgrammars" work a few days ago because I'm always looking for ways to make things less work in practice :)
Thanks! :) We should be very clear that the bulk of the work is Aaron Bembenek's.
I think Formulog would work great for analyzing the shell---as would any other Datalog, though SMT-based string reasoning will certainly come in handy. I don't think it will help you with parsing issues, though. The general approach to static analysis with Datalog avoids parsing in Datalog itself, relying on an EDB ("extensional database"---think of it as 'ground facts' about the world, which your program generalizes) to tell you things about the program. See, e.g., https://github.com/plast-lab/cclyzer/tree/master/tools/fact-... for an example of a program for generating EDB facts from LLVM. Just like real-world parsers, these are complicated artifacts.
Ah OK thanks for the link. Since it depends on commercial software, I don't see a path to trying it (which is fine, because I probably don't have time anyway :-/ )
So are you saying that it's more conventional to serialize relations from C++ or Python, rather than serialize an AST as I was suggesting?
Your blog post mentions ASTs too, so I'm not quite clear on that point. I don't have much experience writing such analyzers, and I'd be interested if there is any wisdom / examples on serializing ASTs vs. relations, and if the relations are at the "same level" as the AST, or a higher level of abstraction, etc.
-----
FWIW I read a bunch of the papers by Yannis because I'm interested in experiences of using high level languages in production:
I did get hung up on writing simple pure functions in Prolog. There seems to be a debate over whether unification "deserves" its own first-class language, or whether it should be a library in a bigger language, and after that experience, I would lean toward the latter. I didn't really see the light in Prolog. Error messages were a problem -- for the user of the program, and for the developer of the program (me).
So while I haven't looked at Formulog yet, it definitely seems like a good idea to marry some "normal" programming conveniences with Datalog!
I'd say it's conventional to reuse an existing parser to generate facts.
The AST point is a subtle one. Classic Datalog (the thing that characterizes PTIME computation) doesn't have "constructors" like the ADTs (algebraic data types) we use in Formulog to define ASTs. It doesn't even have records, the way Soufflé does. So instead you'll get facts like:
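For example (with hypothetical relation names---real fact generators pick their own schemas), the expression `1 + 2` might be flattened into facts like:

```prolog
node(1, "plus").    % node 1 is a plus
child(1, 0, 2).     % node 1's 0th child is node 2
child(1, 1, 3).     % node 1's 1st child is node 3
node(2, "lit").     lit_value(2, 1).
node(3, "lit").     lit_value(3, 2).
```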
I'm not sure if that's what you mean by serializing relations. But having ASTs in your language is a boon: rather than having dozens of EDB relations to store information about your program, you can just say what it is:
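Something along these lines (a Formulog-flavored sketch---the exact declaration syntax may differ):

```prolog
(* an ADT for expressions, plus a single fact carrying the whole AST *)
type exp =
  | lit(i32)
  | plus(exp, exp)

prog(plus(lit(1), lit(2))).
```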
As for your point about Prolog, it's a tricky thing: the interface between tools like compilers and the analyses they run is interesting, but not necessarily interesting enough to publish about. So folks just... don't work on that part, as far as I can tell. But I'm very curious about how to have an efficient EDB, what it looks like to send queries to an engine, and other modes of computation that might relax monotonicity (e.g., making multiple queries to a Datalog solver, where facts might start out true in one "round" of computation and then become false in a later "round"). Query-based compilers (e.g., https://ollef.github.io/blog/posts/query-based-compilers.htm...) could be a good place to connect the dots here, as could language servers.
Dynamically typed languages are automobiles and statically
typed are bicycles?? Not exactly a neutral comparison and
hypocrisy in light of his complaints against the strong
vs. weak labels.
Good thing he didn't make it. Please read the article next time. Here's the relevant quote:
[Obsessively tracking type information in dynamic
languages] prevents a programmer from realizing the
benefits of dynamic typing. It's like buying a new car,
but refusing to drive any faster than a bicycle. The car
is horrible; you can't get up the mountain trails, and it
requires gasoline on top of everything else. Indeed, a
car is a pretty lousy excuse for a bicycle! Similarly,
dynamically typed languages are pretty lousy excuses for
statically typed languages.
The point the author is making is that some programmers may have been frustrated by dynamic languages because they never allowed their programs to fail with runtime type errors---and so never realized the benefits of dynamic typing. I disagree with the author's point---I've never heard of anyone having that problem---but you've woefully misunderstood what the author says.
I don't want to be argumentative---it's only a light-hearted comment---but I did of course read the article. I just interpreted something differently than you did.
I read it again, and the analogy is still there. Car is to dynamic typing as bicycle is to ... anyway ... you can play down its significance or ambiguity, but I'd need convincing that it's not there at all.
Fair enough. I would play down the significance where you played it up. Considering the equanimous tone of the whole article, it seems unfair---or, at least, ungenerous---to take him to task for a colorful, one-off analogy.
First, Coq isn't an automatic theorem prover, it's a proof assistant. Adequacy is an issue, but (particularly) for computer-science applications it's sometimes possible to use the Coq artifact as the implementation directly. Adam Chlipala's recent work has made some pretty impressive advances in verifying working code.
I think you misunderstand, though: there's no obsession with absolute truth, at least not in the mathematical world I live in. I'm a PL researcher; we don't pretend to absolute truth, but formality and correctness are our cardinal virtues. (Take Coq as an apropos example.) Our proofs are (a) by induction, (b) long, and (c) can involve subtle and fussy steps. And (d), in an ideal world, our proofs are also the bedrock of high-assurance systems. Proofs with different qualities may be better candidates for a more intuitive approach. In particular, actual mathematicians are probably more interested in the intuitive side of proof than we are. The article is in fact interested in that more intuitive mathematics; Lipton points out that this intuitive mathematics---whether as formal as 4CT or as informal as the vision conjecture he describes---is full of fun surprises. I'm not sure he's worried about the sort of nitpicking formality that I'm into. :)
Good advice. Transaction costs and, in particular, startup costs are some of the most difficult things I face in my (graduate student) day-to-day. Simply getting up to the whiteboard can be difficult.
I'm definitely going to start using a reference managing website.
I've been using Adblock Plus to keep myself out of Reddit and Google Reader at work. (Sadly, I think the time has come to add HN, as well.) The method works, and I've been tempted only once or twice to circumvent the blocks.
There's one particular thing about Christianity though.
Well OK, a couple of things, one it's kind of a modern
less complex flavor of Judaism, no offense meant to any
Hebrews reading this.
First, the similarity between the practice of Judaism and Christianity is the product of the last century or so. At the time, the ideas of, e.g., Paul diverged significantly from contemporary Jewish thinking.
Second, if you don't want to offend Jews, don't call them "Hebrews". ;)
Why not? It's archaic and not strictly a synonym, but why would it be offensive? I understand that those who use it usually mean offense, but there is no real reason for it.
Thinking of those two things as distinct concepts with clear borders is a mistake. Both religious and ethnic groups usually lack definite boundaries, and they are not always separate from each other. Saying "Arabs" when "Muslims" is what you mean is indeed a mistake---most Muslims are not Arabs---but that proves nothing about how to classify these things.
Hebrews is a different issue. "Jewish" is actually "Judah-ite," from the tribe of Judah. These were (according to the tradition) the surviving tribe of the Israelites, or People of Israel. "People" is actually a bit misleading: in Hebrew it uses the word normally translated as "Nation." In modern times, the relationship between Judah and Jewish has been blurred, since some groups trace their heritage to other (lost) tribes.
In any case, all of these equally refer to an ethnic group, though you could argue that they are not exact synonyms. This is not just about the origin of the words; it is also the religious tradition. Jewishness is not a religion in the sense that Islam or Christianity are. It is not something you choose, it is not something you can stop, and it is not something that anybody who is not (ethnically) Jewish is required or encouraged to join. Basically, the "religion," as you would probably call it, doesn't distinguish between religion and ethnic group. The members of this ethnic-religious group do not usually do so either. That is, many secular, atheist Jews describe themselves as Jewish, and behave in some way that indicates this.
"Hebrews," "Israelites," and "People of Israel" are, as far as I know, used interchangeably in the Old Testament. In modern Hebrew, all but the most pedantic (and religious) will use "Israelites" interchangeably with "Jews"; secular Jews are unlikely to say "Jews." "Hebrews" is not really used in modern Hebrew unless making some sort of biblical reference. In some languages it has a derogatory ring, but in some languages describing someone as a Jew/Hebrew/Israelite is derogatory regardless of the word you use.
There are no French programming languages, not even Pascal
or Eiffel uses French words for the syntax, and all the
documentation is in English.
OCaml is implemented by le French, and I've seen code that uses French variable names. I haven't yet seen the camlp4 hack that lets you say laissez instead of let, though.
I was inclined to be skeptical---breaking news, shocking the brain during sleep impairs performance---but they used three different groups of rats:
...we tested the role of SPW-Rs on memory consolidation.
Three groups of rats (test group, n = 7; stimulated
controls, n = 7; unimplanted controls, n = 12) were
trained to find food rewards ... .
During post-training rest and sleep, all of the online-
detected ripples were suppressed by commissural
stimulations in test rats (average online detection rate
was 86.0 ± 1.3% (s.e.m.) of post hoc detected SPW-Rs;
...). Stimulated control rats underwent the same
protocol, except that a random delay (80–120 ms)
was introduced between SPW-R detection and stimulation,
ensuring that the stimulations occurred mainly outside of
the ripple episodes.
Now check this out:
Thus, these control rats received the same number of
stimulations as test rats, but their hippocampal
ripples were left largely intact. The global architecture
of sleep and the local field potential power in distinct
sleep stages were not modified by the suppression of
SPW-Rs ... . As stimulation outside SPW-Rs had
no detectable effect on task performance ..., the two
control groups were pooled and compared with test rats.
Performance of the test rats was significantly impaired
... .