summaryrefslogtreecommitdiff
path: root/chain/src/plan.org
diff options
context:
space:
mode:
Diffstat (limited to 'chain/src/plan.org')
-rw-r--r--chain/src/plan.org154
1 files changed, 135 insertions, 19 deletions
diff --git a/chain/src/plan.org b/chain/src/plan.org
index 8c63492..61738a9 100644
--- a/chain/src/plan.org
+++ b/chain/src/plan.org
@@ -2,29 +2,73 @@
#+AUTHOR: Durand
#+DATE: <2022-11-18 Ven 19:57>
-* Things to do [2/7]
+* Things to do [5/10]
- [X] Implement builders for graphs
- [X] Find sets of the first terminals for each non-terminal, in the
grammar module.
-- [-] NFA [4/5]
+- [-] Establish a testing framework of various grammars. [5/6]
+ + [X] One simple grammar
+ + [X] Notes
+ + [X] Parentheses
+ + [X] Pathelogical left-recursive
+ + [X] Pathelogical right-recursive
+ + [ ] Pathelogically ambiguous
+ # More should be added
+- [X] NFA [4/4]
+ [X] Add regular expression into NFA.
+ [X] Convert a set of regular expressions into a nondeterministic
finite automaton, where non-terminals denote the indices of
regular expressions in the set to substitute.
- + [X] Convert a grammar into its grammar of left-linear closures of
- non-temrinals and their derivatives.
- + [X] Convert nondeterministic finite automata to their null
- closures.
- + [ ] Test more grammars to ensure it works correctly.
-- [ ] Define the Atom trait.
-- [ ] Implement virtual nodes for each derivative of each atom: The
- lack of this step might be the root cause of the failure of the
- previous version of this package.
-- [ ] Implement languages. [0/2]
- + [ ] First implement them as doubly labelled (directed acyclic)
- graphs.
- + [ ] Implement finding derivatives.
+ + [X] Convert a grammar to the language of atomic languages. [2/2]
+ * [X] Convert a grammar into its grammar of left-linear closures
+ of non-temrinals and their derivatives.
+ * [X] Convert nondeterministic finite automata to their null
+ closures.
+ + [X] Test more grammars to ensure it works correctly. [5/5]
+ * [X] Test the conversion to left-closure as a set of regular
+ expressions.
+ * [X] Test the conversion of a set of regular expressions into a
+ nondeterministic finite automaton.
+ * [X] Test the closure of null edges, where the nullity of an edge
+ is defined by a function. In our specific case, an edge is null
+ if the grammar symbol that edge represents, either a terminal or
+ a non-terminal, can match an empty string.
+ * [X] Test the removal of empty transitions from nondeterministic
+ finite automata.
+ * [X] Test the removal of dead states, where a state is dead if
+ and only if no other states have an edge into that state.
+- [ ] Refactor [0/3]
+ + [ ] When we pull in some regular language because of the
+ left-linear expansion, we need to mark that branch as coming from
+ left-linear expansions. This is necessary because we cannot
+ follow a left-linearly expanded branch when we take the derivative
+ of a language. We only expand left-linearly when we try to access
+ the atomic languages [s]^{(t)}. We can mark by returning a set of
+ nodes which are the beginnings of left-linearly expanded branches.
+ + [ ] An edge of the NFA should carry a label that can be more
+ informative than solely a terminal or a non-terminal.
+ + [ ] Add a mechanism for a grammar to attach labels to edges of NFA
+ which are not necessarily Copy-able, and which should be stored in a
+ separate array, such that the labels on edges of NFA point to the
+ elements of the array.
+- [X] Atom [3/3]
+ + [X] Define the Atom trait.
+ + [X] Implement virtual nodes for each derivative of each atom: The
+ lack of this step might be the root cause of the failure of the
+ previous version of this project.
+ + [X] Test atoms
+- [-] Implement languages. [1/3]
+ + [X] First define a trait with the expected behaviours.
+ + [ ] Then implement them as doubly labelled graphs.
+ + [ ] Thenimplement finding derivatives by use of the chain rule.
+- [-] Implement forests [0/2]
+ + [-] Define a trait with the expected behaviours.
+ + [ ] Implement them using adjacency list: we only need one label
+ per edge, and we do not wish to exclude duplicate edges, and we do
+ not need to index edges by the labels. All in all, the labels can
+ be implemented by a simple hash map that maps edges to labels, so
+ a special data type is not needed here.
- [ ] Implement semiring. [0/5]
+ [ ] Define the trait.
+ [ ] Implement the boolean semiring.
@@ -32,6 +76,9 @@
+ [ ] Implement the free semiring.
+ [ ] Compute and record a semiring value when computing
derivatives.
+- [X] Miscellaneous [1/1]
+ + [X] Implement a function for NFA that checks if a given predicate
+ is satisfied for each edge label.
* Atomic Languages
@@ -39,10 +86,6 @@ This describes the behaviours of atomic languages. The atomic
language consists of the null closure of any non-terminal symbol in
the grammar, and their deriavtives by terminals and non-terminal.
-* Script for creating GIF animation
-
-[[https://gist.github.com/maelvls/5379127][a gist]]
-
* Derivative Languages
This is the main driving device of the algorithm. Basically, the
@@ -82,6 +125,61 @@ should be easy enough, since we already have the mechanism of graphs,
nondeterministic automata, and semirings. All we need to do is to
combine them together.
+* Lexers
+
+I got this idea from [[https://lists.gnu.org/archive/html/emacs-devel/2022-12/msg01127.html][a discussion on emacs-devel]]. The idea can be
+formulated as
+
+#+begin_quote
+Potentially it allows invocation of a parser with different variants
+of lexers - one mode with block tokens for the exploration of
+project's structure, and another mode for indentation and error
+checking purposes.
+#+end_quote
+
+I think this is the "right" use of lexers. That is to say, the parser
+by default should operate on an abstract type of tokens, which are but
+unsigned integers, and rely on the user application to provide a lexer
+for turning the actual input, like a string, into an iterator of
+tokens. The lexer can choose to do any sort of preprocessing that it
+sees fit, or it can do nothing and just turn characters to unsigned
+integers. In the latter case, this parser generator works on the
+character level. But, when one neeeds, one can instruct the parser
+generator to work on the level of preprocessed tokens easily.
+
+This has the advantage of allowing a certain degree of flexibility in
+the type of tokens accepted, while keeping the implementation simple
+at the same time: the internal core only has to deal with unsigned
+integers, after all.
+
+* Library for drawing graphs
+
+In the past, I thought that drawing graphs is only a development tool
+for my package, so I can rely on such external tools as GraphViz, as
+that would not be needed for my end-users.
+
+But recently I realized that this is not a need only for the
+development of the package, but also for the users as well. To be
+more specific, the target users are those who want to "play" with
+grammars. So if one cannot view the resulting parse forest diagrams
+easily and interactively, it would be hard to "play" with grammars.
+
+Moreover, the internal workings of the parsing machine might also be
+interesting for users to inspect, whether to debug the program or to
+know more under the hood. As such it is required to view the diagrams
+of the internal directed acyclic graphs.
+
+For all these reasons, I know regard the ability to draw graphs and to
+interact with those graphs is essential to the user experience of this
+package. As a consequence, I would like to develop a small default
+library for this purpose. I follow the convention adapted by this
+package here as well, which is to provide a default implementation for
+the functionality, while still leaving the room for users to
+substitute with other external packages, if so desired, for whatever
+reason. For example, an external package may provide an unprecedented
+performance for drawing graphs in Emacs. If such a package appears, I
+can easily avail of that external package to draw graphs.
+
* Testing ground
I am in a strong need to test things out. The most important one is
@@ -92,3 +190,21 @@ understanding of the main algorithm is plain wrong.
This is the main reason I started this rewrite of the package.
+* To figure out
+
+I need to figure out how to construct the parse forest from a
+left-most null-closed derivation graph.
+
+Reading through the original paper again, I found that the author
+augmented the atomic languages with annotations of partial derivation
+steps on each edge of the nondeterministic finite automaton. In
+brief, this is what I tried to achieve with some obscure constructs
+such as expanding reduction informations carried by the
+nondeterministic finite automata, in one of the previous versions. To
+sum up, I really need to carry those informations in the first place,
+otherwise the parse forests cannot be reliably construced later on.
+
+But I really do not want to adopt the approach of the original author.
+I still prefer the idea of a recorder. That is to say, instead of
+constructing the parse forest after the recognition, we shall
+construct the parse forest simultaneously while we are recognizing.