diff options
author | JSDurand <mmemmew@gmail.com> | 2023-01-11 23:47:26 +0800 |
---|---|---|
committer | JSDurand <mmemmew@gmail.com> | 2023-01-11 23:47:26 +0800 |
commit | 1a3d346f413325ed37848a6b2526e8e729269833 (patch) | |
tree | ab8812f8094d096c68aee53cf516e986cc9a273a /forest/src/design.org | |
parent | f27d604d93ce583d4404e1874664e08382ea2f00 (diff) |
Record left-linear expansion and forest format
Now the grammar will record the left-linear expansions when generating
the nondeterministic finite automaton frmo its rules, and will record
whether an edge in the nondeterministic finite automaton comes from a
left-linear expansion. The latter is needed because while performing
a chain-rule derivation, we do not need the left-linear expanded
derivations in the "first layer". This might well have been the root
cause of the bad performance of the previous version of this package.
Also I have figured out how to properly generate and handle parse
forests while manipulating the "chain-rule machine".
Diffstat (limited to 'forest/src/design.org')
-rw-r--r-- | forest/src/design.org | 95 |
1 files changed, 85 insertions, 10 deletions
diff --git a/forest/src/design.org b/forest/src/design.org index 771ca4b..09db113 100644 --- a/forest/src/design.org +++ b/forest/src/design.org @@ -28,7 +28,7 @@ topic sooner or later, why not deal with it now? ;P represent this alternation. Packed nodes are not labelled. They just serve the role of putting nodes together. -* Some thoughts [1/3] +* Some thoughts We do not need to attach forest fragments to nodes of nondeterministic finite automata: we just attach to them lists of grammar slots, which @@ -44,23 +44,98 @@ non-terminals. Some caveats: -- [X] Every node in the forest needs to know its parents. This is - needed for "reductions". That is, after we have parsed the entire +- Every node in the forest needs to know its parents. This is needed + for "reductions". That is, after we have parsed the entire expansion for a non-terminal, we need to go back to where that non-terminal was and continue from there. -- [ ] We need to record the expansions of those "virtual nodes". That - is, the virtual nodes represent atomic languages such as [s]^{(t)} - where s is a non-terminal and t is a terminal. To be more specific, - it represents the derivative of the left-linear closure of the +- We need to record the expansions of those "virtual nodes". That is, + the virtual nodes represent atomic languages such as [s]^{(t)} where s + is a non-terminal and t is a terminal. To be more specific, it + represents the derivative of the left-linear closure of the non-terminal s with respect to the terminal t. The left-linear closure of s is the set of all strings (consisting of terminals and non-terminals alike) that are derivable from s alone, without consuming any inputs, by expanding according to the grammar rules. This means that we need to know if which non-terminals were expanded in order to get to a state in [s]^{(t)}. -- [ ] Each edge in the chain-rule machine needs to be labelled also - with a position in the forest. This perfectly solves the problem of - those "plugs"! +- Each edge in the chain-rule machine needs to be labelled also with a + position in the forest. This perfectly solves the problem of those + "plugs"! +* Formal Definition +We need to figure out the definition of the forest. That it to say, +what will the resulting forest look like for a grammar. +Moreover, we need to know how to correspond each step in the +chain-rule machine with the definition of the forest. + +After these two steps, we can easily proceed with the construction of +the chain-rule machine. + +** Axioms + +There are two *axioms* from which the definition of the forests +follows. + +1. Each node has a unique label, given (principally) by three + components: + 1) input range start + 2) input range end + 3) grammar label +2. If different subtrees share the same node label as the root, a + "clone" of the root should be made, and there should be a + "representative" of all clones, which is treated as a normal node + with that label, and all clones are its children. + +The first axiom ensures that the size of the forest is bounded by the +cube of \(n\), \(n\) is the length of the input sequence. And the +second axiom specifies what to do when subtrees share a parent. + +Since the clones are introduced only when subtrees share a parent, +their number cannot exceed the number of nodes of a forest without +clones. This shows that adding clones does not affect the growth rate +of the size of forests. + +A remark: the /clones/ are called /packed nodes/ in the traditional +terminology, but the terminology of a clone makes more sense to me. + +** Correspondence with chain-rule machine + +Every edge in the chain-rule machine corresponds to an edge in the +forest; denote this correspondence by \(f\). + +The chain-rule operation can be described as follows: + +1. Start with an edge \(e\) with children \(d_1, \ldots, d_n\) in the + chain-rule machine. +2. Prepare a new forest fragment as follows. + 1) For every child \(g\) of \(e\) in the atomic language, if \(g\) + is the terminal that appears as the current input \(t\), let the + new fragment be defined as the node moved by \(g\), alone. + 2) If \(g\) is a non-terminal and its first-set contains \(t\), + then for every left-linearly closed child of \(g\) that is + labelled \(t\), denoted as \(h\), then let the fragment be + defined as the node moved by \(g\), with the pre-recorded + left-linear expansion information from \(g\) to \(h\) appended + as children. +3. For \(i=1,\ldots,n\), consider the edge \(f(d_i)\) in the forest. + There are three cases: + 1) If the next edge is empty, that means we have found something + totally new to add to the forest, so we just add that. + 2) If the next edge after \(f(e)\) contains the new fragment as a + sub-tree already, this means the current branch has already been + explored before, and we have nothing else to do. + 3) If the next edge is non-empty and does not match the new + fragment, this means we have a conflict with a previously found + branch. We solve it simply by following the second axiom above: + we make the node that is the source of \(f(e)\) a clone, and add + the new fragment under a new clone. Note that if the node is + already a clone, we just make a clone under the parent of that + clone. +4. Update the correspondence \(f\) by updating \(f(g)\) and \(f(h)\) + accordingly. + +In this process we always follow the first axiom by ensuring that the +labels of the forest are unique to each node, except for those clones, +which are only created by following the second axiom. |