Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

The article says that PEG handles non-deterministic grammars by becoming non-deterministic itself. I don't know about PEG, but if that's true, it sure sounds like a recipe for disaster.


Thanks for reading my article. I don't say that, though the distinction is tricky. PEG is relentlessly deterministic, but that does not mean that the user can determine, in practice, what choice PEG will make.

It's the difference between my being able to tell you that Jimmy Hoffa is definitely in a specific place, and my being able to tell you where that place is.


Are you saying that the user "can not" determine? Who's the user? the source code writer?


The author of the PEG script cannot in general know what language it actually describes -- it's just too hard. For more, see https://jeffreykegler.github.io/Ocean-of-Awareness-blog/indi.... It has references, which you can follow up on.

You're safe if the grammar is LL(1) -- what you sees is what you get. After that it gets mysterious fast.


The language definition is pre-existent to the PEG parser you can write for it. PEG has a lot of advantages as a tool, and its a nice experience to write by hand. You do not author a PEG script, you author a language definition first. You're saying that if the lang definition is lost, then is hard to recover it from the PEG script or the parser code. Yes, but it is not a normal situation.


> The article says that PEG handles non-deterministic grammars by becoming non-deterministic itself.

Wait, where? As noted here, PEGs are certainly deterministic, so I just double-checked the article: it doesn’t claim that PEGs are non-deterministic. In fact, it says that “PEG [is] always linear”, and (way before that), “all non-deterministic parsers can run in worse than quasi-linear time”. So the article indirectly states that PEGs are, in fact, deterministic.

PEG’s problem isn’t non-determinism; it’s PEG’s linear memory requirement in the length of the input (rather than the parse tree depth), as well as the fact that it fails to parse some nondeterministic (but unambiguous) grammars.


> PEG’s linear memory requirement in the length of the input

What kind of PEG are you talking about? At it's core, PEG is syntax sugar over recursive descent, and has the same memory requirements. Memoization, maybe?

That said, I wouldn't use PEG for anything else than LL(1). Add a second level with precedence climbing, and you're set for a pretty big class of languages.


> What kind of PEG are you talking about?

I’m talking about the Packrat algorithm which, as far as I know, is the only practically used PEG algorithm. The memory requirement comes from the fact that the algorithm handles partial match failures without having to recompute intermediate steps. By comparison, a bog-standard recursive descent parser without memoisation has exponential worst-case runtime [1].

[1] https://stackoverflow.com/a/16897246/1968


Last time I checked Packrat had nothing to do with memoization. Packrat parsing is the same as precedence climbing, very similar to the shunting yard algorithm, except it leverages the call stack to simplify the code.

And last time I checked, PEG was syntax sugar above recursive descent, just like parser combinators. I implemented¹ the damn thing, so I think I know what I'm talking about. Somewhat. Memoization is just a way to go beyond LL(1), and handle left recursive grammars, it's not not mandatory.

[1]: http://loup-vaillant.fr/projects/metacompilers/


The original Packrat parser explicitly defines it as using memoisation:

> Packrat parsing provides the simplicity, elegance, and generality of the backtracking model, but eliminates the risk of super-linear parse time, by saving all intermediate parsing results as they are computed and ensuring that no result is evaluated more than once.[1]

And just to get a potential misunderstanding out of the way: I’m not claiming that PEGs are formally distinct from recursive descent: According to my understanding (but unlike you I’ve never implemented the packrat algorithm itself … though I’ve written plenty of recursive descent parsers and some PEG-based parsers) there’s a 1:1 correspondence between nonterminals in PEG, and a function in the recursive descent parser. The difference is that one is a formal grammar for describing parsers, while the other is the actual parser implementation.

[1] https://paperpile.com/app/p/3c6d20f0-ab15-0b13-b809-1fdf4ac7...


PEG uses ordered choice instead of alternation; all PEGs are thus deterministic. The article's description of PEG is so off-base that it makes me question the entire article.


How about not making the grammar non deterministic in the first place? I think that 'attractive new syntax' -comment means that there is no usable ports for Marpa. I guess I could try to help with that .. :)




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: