diff options
Diffstat (limited to 'chain/src/plan.org')
-rw-r--r-- | chain/src/plan.org | 154 |
1 files changed, 135 insertions, 19 deletions
diff --git a/chain/src/plan.org b/chain/src/plan.org index 8c63492..61738a9 100644 --- a/chain/src/plan.org +++ b/chain/src/plan.org @@ -2,29 +2,73 @@ #+AUTHOR: Durand #+DATE: <2022-11-18 Ven 19:57> -* Things to do [2/7] +* Things to do [5/10] - [X] Implement builders for graphs - [X] Find sets of the first terminals for each non-terminal, in the grammar module. -- [-] NFA [4/5] +- [-] Establish a testing framework of various grammars. [5/6] + + [X] One simple grammar + + [X] Notes + + [X] Parentheses + + [X] Pathelogical left-recursive + + [X] Pathelogical right-recursive + + [ ] Pathelogically ambiguous + # More should be added +- [X] NFA [4/4] + [X] Add regular expression into NFA. + [X] Convert a set of regular expressions into a nondeterministic finite automaton, where non-terminals denote the indices of regular expressions in the set to substitute. - + [X] Convert a grammar into its grammar of left-linear closures of - non-temrinals and their derivatives. - + [X] Convert nondeterministic finite automata to their null - closures. - + [ ] Test more grammars to ensure it works correctly. -- [ ] Define the Atom trait. -- [ ] Implement virtual nodes for each derivative of each atom: The - lack of this step might be the root cause of the failure of the - previous version of this package. -- [ ] Implement languages. [0/2] - + [ ] First implement them as doubly labelled (directed acyclic) - graphs. - + [ ] Implement finding derivatives. + + [X] Convert a grammar to the language of atomic languages. [2/2] + * [X] Convert a grammar into its grammar of left-linear closures + of non-temrinals and their derivatives. + * [X] Convert nondeterministic finite automata to their null + closures. + + [X] Test more grammars to ensure it works correctly. [5/5] + * [X] Test the conversion to left-closure as a set of regular + expressions. + * [X] Test the conversion of a set of regular expressions into a + nondeterministic finite automaton. + * [X] Test the closure of null edges, where the nullity of an edge + is defined by a function. In our specific case, an edge is null + if the grammar symbol that edge represents, either a terminal or + a non-terminal, can match an empty string. + * [X] Test the removal of empty transitions from nondeterministic + finite automata. + * [X] Test the removal of dead states, where a state is dead if + and only if no other states have an edge into that state. +- [ ] Refactor [0/3] + + [ ] When we pull in some regular language because of the + left-linear expansion, we need to mark that branch as coming from + left-linear expansions. This is necessary because we cannot + follow a left-linearly expanded branch when we take the derivative + of a language. We only expand left-linearly when we try to access + the atomic languages [s]^{(t)}. We can mark by returning a set of + nodes which are the beginnings of left-linearly expanded branches. + + [ ] An edge of the NFA should carry a label that can be more + informative than solely a terminal or a non-terminal. + + [ ] Add a mechanism for a grammar to attach labels to edges of NFA + which are not necessarily Copy-able, and which should be stored in a + separate array, such that the labels on edges of NFA point to the + elements of the array. +- [X] Atom [3/3] + + [X] Define the Atom trait. + + [X] Implement virtual nodes for each derivative of each atom: The + lack of this step might be the root cause of the failure of the + previous version of this project. + + [X] Test atoms +- [-] Implement languages. [1/3] + + [X] First define a trait with the expected behaviours. + + [ ] Then implement them as doubly labelled graphs. + + [ ] Thenimplement finding derivatives by use of the chain rule. +- [-] Implement forests [0/2] + + [-] Define a trait with the expected behaviours. + + [ ] Implement them using adjacency list: we only need one label + per edge, and we do not wish to exclude duplicate edges, and we do + not need to index edges by the labels. All in all, the labels can + be implemented by a simple hash map that maps edges to labels, so + a special data type is not needed here. - [ ] Implement semiring. [0/5] + [ ] Define the trait. + [ ] Implement the boolean semiring. @@ -32,6 +76,9 @@ + [ ] Implement the free semiring. + [ ] Compute and record a semiring value when computing derivatives. +- [X] Miscellaneous [1/1] + + [X] Implement a function for NFA that checks if a given predicate + is satisfied for each edge label. * Atomic Languages @@ -39,10 +86,6 @@ This describes the behaviours of atomic languages. The atomic language consists of the null closure of any non-terminal symbol in the grammar, and their deriavtives by terminals and non-terminal. -* Script for creating GIF animation - -[[https://gist.github.com/maelvls/5379127][a gist]] - * Derivative Languages This is the main driving device of the algorithm. Basically, the @@ -82,6 +125,61 @@ should be easy enough, since we already have the mechanism of graphs, nondeterministic automata, and semirings. All we need to do is to combine them together. +* Lexers + +I got this idea from [[https://lists.gnu.org/archive/html/emacs-devel/2022-12/msg01127.html][a discussion on emacs-devel]]. The idea can be +formulated as + +#+begin_quote +Potentially it allows invocation of a parser with different variants +of lexers - one mode with block tokens for the exploration of +project's structure, and another mode for indentation and error +checking purposes. +#+end_quote + +I think this is the "right" use of lexers. That is to say, the parser +by default should operate on an abstract type of tokens, which are but +unsigned integers, and rely on the user application to provide a lexer +for turning the actual input, like a string, into an iterator of +tokens. The lexer can choose to do any sort of preprocessing that it +sees fit, or it can do nothing and just turn characters to unsigned +integers. In the latter case, this parser generator works on the +character level. But, when one neeeds, one can instruct the parser +generator to work on the level of preprocessed tokens easily. + +This has the advantage of allowing a certain degree of flexibility in +the type of tokens accepted, while keeping the implementation simple +at the same time: the internal core only has to deal with unsigned +integers, after all. + +* Library for drawing graphs + +In the past, I thought that drawing graphs is only a development tool +for my package, so I can rely on such external tools as GraphViz, as +that would not be needed for my end-users. + +But recently I realized that this is not a need only for the +development of the package, but also for the users as well. To be +more specific, the target users are those who want to "play" with +grammars. So if one cannot view the resulting parse forest diagrams +easily and interactively, it would be hard to "play" with grammars. + +Moreover, the internal workings of the parsing machine might also be +interesting for users to inspect, whether to debug the program or to +know more under the hood. As such it is required to view the diagrams +of the internal directed acyclic graphs. + +For all these reasons, I know regard the ability to draw graphs and to +interact with those graphs is essential to the user experience of this +package. As a consequence, I would like to develop a small default +library for this purpose. I follow the convention adapted by this +package here as well, which is to provide a default implementation for +the functionality, while still leaving the room for users to +substitute with other external packages, if so desired, for whatever +reason. For example, an external package may provide an unprecedented +performance for drawing graphs in Emacs. If such a package appears, I +can easily avail of that external package to draw graphs. + * Testing ground I am in a strong need to test things out. The most important one is @@ -92,3 +190,21 @@ understanding of the main algorithm is plain wrong. This is the main reason I started this rewrite of the package. +* To figure out + +I need to figure out how to construct the parse forest from a +left-most null-closed derivation graph. + +Reading through the original paper again, I found that the author +augmented the atomic languages with annotations of partial derivation +steps on each edge of the nondeterministic finite automaton. In +brief, this is what I tried to achieve with some obscure constructs +such as expanding reduction informations carried by the +nondeterministic finite automata, in one of the previous versions. To +sum up, I really need to carry those informations in the first place, +otherwise the parse forests cannot be reliably construced later on. + +But I really do not want to adopt the approach of the original author. +I still prefer the idea of a recorder. That is to say, instead of +constructing the parse forest after the recognition, we shall +construct the parse forest simultaneously while we are recognizing. |