Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

They are combined. Chomsky Hierarchies are the core of modern Computer science because they map perfectly into automata theory. They are always taught together in computer science.

>E.g. would it be possible to create an algorithm that takes a grammar (and maybe a desired context window size) as input and constructs a transformer network that generates sentences exactly from that grammar?

You don't need transformers for what you describe. That's 101 theory of computation class where you learn about automata, grammars, parsers, and generators.





Yeah, I know the theory of formal grammars and automata that the Chomsky hierarchy is part of. What I meant is that language models and specifically transformer networks are usually entirely separate from that theory, so it would be useful to build a bridge between "modern" language processing using GPTs/LLMs and the classical formal theory.

The most obvious overlap in usage is with programming languages: LLMs can parse and generate code in formal languages, but their processing model is completely different from syntax trees and parsers. So the question is, how do they store the formal structure of a programming language and could this be mapped back in any way to a grammar or automaton?


The way I see it is that attention is graph structured -- this token here is connected to that token there and so forth by the attention lighting up or also in the sense that there are a bunch of places in the document where people are talking about "Noam" or "Chomsky" or "Noam Chomsky" or "The Author" or "him", etc.

Alternately if you were looking it from a semantic web perspective the knowledge expressed in a document is a graph and that graph structure is more fundamental than the tree structure of a text because you could express the same knowledge in different orders. Serialization fundamentally requires putting things in some specific order which might specifically be chronological (work from 2005 to the present as a play with many acts) or could be organized around some conceptual hierarchy (kitsune legends, self psychology, character acting, animal and human behavior and physiology, ...) or the minimization or elimination of backward references (whatever it is that the C spec does a touch wrong but post-Common Lisp specs do right) , etc. Ultimately the graph is pruned away into a tree where the remaining links are denoted by syntactic features in the local scale of the document, you're kinda left filling in the rest of the links with some combination of pragmatics, logical inference, something like SAT solving, etc,

A conventional parsing point of view sees a Java program as a tree but for ordinary purposes it does not matter what order you put the fields and methods in and even though procedural programs are allegedly a sequence of operations done in a certain order it is frequently the case that it does not matter at all if you run line 71 or line 75 first so it is often the case that the graph is the real thing and the trees that we're so comfortable with are the shadows on the walls of the cave.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: