Datafun is quite cool! Formulog and Datafun seem similar---both combine logic programming and pure functional programming---but they take wildly different approaches.
Datafun is a foundational re-imagining of what a Datalog could look like: start with a higher-order programming language and give it a first-class notion of least fixed points. A type system for tracking monotonicity lets you roll your own Datalog. It's impressive that you can reconstruct semi-naive evaluation (morally: in each 'round' of evaluation, only apply rules matching new results) in their setting (https://dl.acm.org/doi/abs/10.1145/3371090). Datafun is still a ways away from the performance and implementation maturity of existing Datalogs, though.
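To make the semi-naive idea concrete, here's a Datalog-ish sketch for transitive closure (the delta-relation rewrite is written as comments; the names are illustrative, not Datafun syntax):

```prolog
% naive evaluation: re-derive everything in each round
path(X, Y) :- edge(X, Y).
path(X, Z) :- path(X, Y), edge(Y, Z).

% semi-naive evaluation: in round i+1, only join against the
% facts that were *new* in round i (the "delta" relation):
%
%   delta_path[i+1](X, Z) :- delta_path[i](X, Y), edge(Y, Z),
%                            and (X, Z) not already in path[i].
%   path[i+1] = path[i] + delta_path[i+1]
%
% Iteration stops when the delta is empty---the least fixed point.
```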
Formulog's approach is to try to let Datalog be Datalog as much as possible. We end up with restrictions around higher-order functions and other FP features in order to keep things simple on the Datalog side---quite the opposite of Datafun's fancy type system. Our Formulog interpreter does pretty well with internment, parallel execution, and magic sets, but you could easily port our design to existing Datalog compilers and get even bigger speedups. It's not clear how to do that for Datafun... yet.
(I suspect you could port our SMT interface to Datafun without too much of a problem, too.)
Super interesting, and cool technique. Do you have any insight into why CSA outperforms PP so often? I would have assumed the solvers were tuned for PP.
I think the solvers _are_ tuned for PP. But we're comparing CSA and PP on the queries that Formulog issues... which don't really match well with the DFS discipline that the PP stack aligns with. I think CSA beats PP in our experiments because CSA is more flexible about locality.
Broadly---and I haven't looked at the memory usage to confirm this---I think CSA trades space (cache more old formulae, not just the prefix of some stack) for time (look, our answers are in the cache!).
It was a great refresher as someone who once liked math but hasn't done much of it in ~20 years :) I had seen the blog posts, but there was some "color" in the videos that helped. For example I didn't realize that the fonts sometimes matter! Honestly, I still don't really read the notation, as I haven't had a strong reason to, but I feel it would be useful at some point.
----
For others, I also recommend this 2017 talk by Guy Steele, "It's Time for a New Old Language", because even people in the field seem to have problems with the notation. He was also asked about this work a few days ago here and said he was still working on it in the background (being a "completionist"):
FWIW as you know, Oil is more static than shell, and that was largely motivated by tools and static analysis (and negatively motivated by false positives in ShellCheck https://news.ycombinator.com/item?id=22213155)
I would like to go further in that direction, but getting the basic functionality and performance up to par has taken up essentially 100% of the time so far :-(
My use of Zephyr ASDL was also partly motivated by some vague desire to get the AST into OCaml. However, I haven't used OCaml in quite a while, and I get hung up on small things like writing a serializer and deserializer. I don't want to do it for every type/schema, so it requires some kind of reflection. My understanding is that there are a bunch of packages that do this, like sexplib, but I never got further than that.
Formulog sounds very nice, so I wonder if there is some recommended way of bridging the gap? For example, imagine you want to load enormous Clang or TypeScript ASTs into Formulog. The parsers alone are 10K-30K lines of code, i.e. it's essentially infeasible to reproduce those parsers in another language in a reasonable time. And even just duplicating the schema is a pretty big engineering issue, since there are so many node types! I could generate them from Zephyr ASDL, but other projects can't. I wonder if you have any thoughts on that---i.e., how to make the work more accessible on codebases "in the wild"?
-----
Also FWIW I mentioned this "microgrammars" work a few days ago because I'm always looking for ways to make things less work in practice :)
Thanks! :) We should be very clear that the bulk of the work is Aaron Bembenek's.
I think Formulog would work great for analyzing the shell---as would any other Datalog, though SMT-based string reasoning will certainly come in handy. I don't think it will help you with parsing issues, though. The general approach to static analysis with Datalog avoids parsing in Datalog itself, relying on an EDB ("extensional database"---think of it as 'ground facts' about the world, which your program generalizes) to tell you things about the program. See, e.g., https://github.com/plast-lab/cclyzer/tree/master/tools/fact-... for an example of a program for generating EDB facts from LLVM. Just like real-world parsers, these are complicated artifacts.
Ah OK thanks for the link. Since it depends on commercial software, I don't see a path to trying it (which is fine, because I probably don't have time anyway :-/ )
So are you saying that it's more conventional to serialize relations from C++ or Python, rather than serialize an AST as I was suggesting?
Your blog post mentions ASTs too, so I'm not quite clear on that point. I don't have much experience writing such analyzers, and I'd be interested if there is any wisdom / examples on serializing ASTs vs. relations, and if the relations are at the "same level" as the AST, or a higher level of abstraction, etc.
-----
FWIW I read a bunch of the papers by Yannis because I'm interested in experiences of using high level languages in production:
I did get hung up on writing simple pure functions in Prolog. There seems to be a debate over whether unification "deserves" its own first-class language, or whether it should be a library in a bigger language, and after that experience, I would lean toward the latter. I didn't really see the light in Prolog. Error messages were a problem -- for the user of the program, and for the developer of the program (me).
So while I haven't looked at Formulog yet, it definitely seems like a good idea to marry some "normal" programming conveniences with Datalog!
I'd say it's conventional to reuse an existing parser to generate facts.
The AST point is a subtle one. Classic Datalog (the thing that characterizes PTIME computation) doesn't have "constructors" like the ADTs (algebraic data types) we use in Formulog to define ASTs. It doesn't even have records, the way Soufflé does. So instead you'll get facts like:
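For example (with hypothetical relation names---real fact generators pick their own schemas), the expression `1 + 2` might be flattened into facts like:

```prolog
node(1, "plus").    % node 1 is a plus
child(1, 0, 2).     % node 1's 0th child is node 2
child(1, 1, 3).     % node 1's 1st child is node 3
node(2, "lit").     lit_value(2, 1).
node(3, "lit").     lit_value(3, 2).
```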
I'm not sure if that's what you mean by serializing relations. But having ASTs in your language is a boon: rather than having dozens of EDB relations to store information about your program, you can just say what it is:
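Something along these lines (a Formulog-flavored sketch---the exact declaration syntax may differ):

```prolog
(* an ADT for expressions, plus a single fact carrying the whole AST *)
type exp =
  | lit(i32)
  | plus(exp, exp)

prog(plus(lit(1), lit(2))).
```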
As for your point about Prolog, it's a tricky thing: the interface between tools like compilers and the analyses they run is interesting, but not necessarily interesting enough to publish about. So folks just... don't work on that part, as far as I can tell. But I'm very curious about how to have an efficient EDB, what it looks like to send queries to an engine, and other modes of computation that might relax monotonicity (e.g., making multiple queries to a Datalog solver, where facts might start out true in one "round" of computation and then become false in a later "round"). Query-based compilers (e.g., https://ollef.github.io/blog/posts/query-based-compilers.htm...) could be a good place to connect the dots here, as could language servers.
Dynamically typed languages are automobiles and statically
typed are bicycles?? Not exactly a neutral comparison and
hypocrisy in light of his complaints against the strong
vs. weak labels.
Good thing he didn't make it. Please read the article next time. Here's the relevant quote:
[Obsessively tracking type information in dynamic
languages] prevents a programmer from realizing the
benefits of dynamic typing. It's like buying a new car,
but refusing to drive any faster than a bicycle. The car
is horrible; you can't get up the mountain trails, and it
requires gasoline on top of everything else. Indeed, a
car is a pretty lousy excuse for a bicycle! Similarly,
dynamically typed languages are pretty lousy excuses for
statically typed languages.
The point the author is making is that some programmers may have been frustrated by dynamic languages because they never allowed their programs to fail with runtime type errors---and so never realized the benefits of dynamic typing. I disagree with the author's point---I've never heard of anyone having that problem---but you've woefully misunderstood what the author says.
I don't want to be argumentative---it's only a light-hearted comment---but I did of course read the article. I just interpreted something differently than you did.
I read it again, and the analogy is still there. Car is to dynamic typing as bicycle is to ... anyway ... you can play down its significance or ambiguity, but I'd need convincing that it's not there at all.
Fair enough. I would play down the significance where you played it up. Considering the equanimous tone of the whole article, it seems unfair---or, at least, ungenerous---to take him to task for a colorful, one-off analogy.
First, Coq isn't an automatic theorem prover, it's a proof assistant. Adequacy is an issue, but (particularly) for computer-science applications it's sometimes possible to use the Coq artifact as the implementation directly. Adam Chlipala's recent work has made some pretty impressive advances in verifying working code.
I think you misunderstand, though: there's no obsession with absolute truth, at least not in the mathematical world I live in. I'm a PL researcher; we don't pretend to absolute truth, but formality and correctness are our cardinal virtues. (Take Coq as an apropos example.) Our proofs are (a) by induction, (b) long, and (c) can involve subtle and fussy steps. And (d), in an ideal world, our proofs are also the bedrock of high-assurance systems. Proofs with different qualities may be better candidates for a more intuitive approach. In particular, actual mathematicians are probably more interested in the intuitive side of proof than we are. The article is in fact interested in that more intuitive mathematics; Lipton points out that this intuitive mathematics---whether as formal as 4CT or as informal as the vision conjecture he describes---is full of fun surprises. I'm not sure he's worried about the sort of nitpicking formality that I'm into. :)
Good advice. Transaction costs and, in particular, startup costs are some of the most difficult things I face in my (graduate student) day-to-day. Simply getting up to the whiteboard can be difficult.
I'm definitely going to start using a reference managing website.
I've been using Adblock Plus to keep myself out of Reddit and Google Reader at work. (Sadly, I think the time has come to add HN, as well.) The method works, and I've been tempted only once or twice to circumvent the blocks.
There's one particular thing about Christianity though.
Well OK, a couple of things, one it's kind of a modern
less complex flavor of Judaism, no offense meant to any
Hebrews reading this.
First, the similarity between the practice of Judaism and Christianity is the product of the last century or so. At the time, the ideas of, e.g., Paul diverged significantly from contemporary Jewish thinking.
Second, if you don't want to offend Jews, don't call them "Hebrews". ;)
Why not? It's archaic and not strictly a synonym, but why would it be offensive? I understand that those who use it usually mean offense, but there is no real reason for it.
Thinking of those two things as distinct concepts with clear borders is a mistake. Both religious and ethnic groups usually lack definite boundaries, and they are not always separate from each other. Saying "Arabs" when "Muslims" is what you mean is indeed a mistake---most Muslims are not Arabs---but that proves nothing about how to classify these things.
Hebrews is a different issue. "Jewish" is actually "Judah-ite," from the tribe of Judah. These were (according to the tradition) the surviving tribe of the Israelites, or People of Israel. "People" is actually a bit misleading: in Hebrew it uses the word normally translated as "Nation." In modern times, the relationship between Judah and Jewish has been blurred, since some groups trace their heritage to other (lost) tribes.
In any case, all of these equally refer to an ethnic group, though you could argue that they are not exact synonyms. This is not just about the origin of the words; it is also the religious tradition. Jewishness is not a religion in the sense that Islam or Christianity are. It is not something you choose, it is not something you can stop, and it is not something that anybody who is not (ethnically) Jewish is required or encouraged to join. Basically, the "religion," as you would probably call it, doesn't distinguish between religion and ethnic group. The members of this ethnic-religious group do not usually do so either. That is, many secular, atheist Jews describe themselves as Jewish, and behave in some way that indicates this.
"Hebrews," "Israelites," and "People of Israel" are, as far as I know, used interchangeably in the Old Testament. In modern Hebrew, all but the most pedantic (and religious) will use "Israelites" interchangeably with "Jews"; secular Jews are unlikely to say "Jews." "Hebrews" is not really used in modern Hebrew unless making some sort of biblical reference. In some languages it has a derogatory ring, but in some languages describing someone as a Jew/Hebrew/Israelite is derogatory regardless of the word you use.
There are no French programming languages, not even Pascal
or Eiffel uses French words for the syntax, and all the
documentation is in English.
OCaml is implemented by le French, and I've seen code that uses French variable names. I haven't yet seen the camlp4 hack that lets you say laissez instead of let, though.
I was inclined to be skeptical---breaking news, shocking the brain during sleep impairs performance---but they used three different groups of rats:
...we tested the role of SPW-Rs on memory consolidation.
Three groups of rats (test group, n = 7; stimulated
controls, n = 7; unimplanted controls, n = 12) were
trained to find food rewards ... .
During post-training rest and sleep, all of the online-
detected ripples were suppressed by commissural
stimulations in test rats (average online detection rate
was 86.0 ± 1.3% (s.e.m.) of post hoc detected SPW-Rs;
...). Stimulated control rats underwent the same
protocol, except that a random delay (80–120 ms)
was introduced between SPW-R detection and stimulation,
ensuring that the stimulations occurred mainly outside of
the ripple episodes.
Now check this out:
Thus, these control rats received the same number of
stimulations as test rats, but their hippocampal
ripples were left largely intact. The global architecture
of sleep and the local field potential power in distinct
sleep stages were not modified by the suppression of
SPW-Rs ... . As stimulation outside SPW-Rs had
no detectable effect on task performance ..., the two
control groups were pooled and compared with test rats.
Performance of the test rats was significantly impaired
... .