Record left-linear expansion and forest format

Now the grammar will record the left-linear expansions when generating the nondeterministic finite automaton frmo its rules, and will record whether an edge in the nondeterministic finite automaton comes from a left-linear expansion. The latter is needed because while performing a chain-rule derivation, we do not need the left-linear expanded derivations in the "first layer". This might well have been the root cause of the bad performance of the previous version of this package. Also I have figured out how to properly generate and handle parse forests while manipulating the "chain-rule machine".
author: JSDurand <mmemmew@gmail.com> 2023-01-11 23:47:26 +0800
committer: JSDurand <mmemmew@gmail.com> 2023-01-11 23:47:26 +0800
commit: 1a3d346f413325ed37848a6b2526e8e729269833 (patch)
tree: ab8812f8094d096c68aee53cf516e986cc9a273a /forest/src/design.org
parent: f27d604d93ce583d4404e1874664e08382ea2f00 (diff)
1 files changed, 85 insertions, 10 deletions
diff --git a/forest/src/design.org b/forest/src/design.org
index 771ca4b..09db113 100644
--- a/forest/src/design.org
+++ b/forest/src/design.org
@@ -28,7 +28,7 @@ topic sooner or later, why not deal with it now?  ;P
   represent this alternation.  Packed nodes are not labelled.  They
   just serve the role of putting nodes together.
 
-* Some thoughts [1/3]
+* Some thoughts
 
 We do not need to attach forest fragments to nodes of nondeterministic
 finite automata: we just attach to them lists of grammar slots, which
@@ -44,23 +44,98 @@ non-terminals.
 
 Some caveats:
 
-- [X] Every node in the forest needs to know its parents.  This is
-  needed for "reductions".  That is, after we have parsed the entire
+- Every node in the forest needs to know its parents.  This is needed
+  for "reductions".  That is, after we have parsed the entire
   expansion for a non-terminal, we need to go back to where that
   non-terminal was and continue from there.
-- [ ] We need to record the expansions of those "virtual nodes".  That
-  is, the virtual nodes represent atomic languages such as [s]^{(t)}
-  where s is a non-terminal and t is a terminal.  To be more specific,
-  it represents the derivative of the left-linear closure of the
+- We need to record the expansions of those "virtual nodes".  That is,
+  the virtual nodes represent atomic languages such as [s]^{(t)} where s
+  is a non-terminal and t is a terminal.  To be more specific, it
+  represents the derivative of the left-linear closure of the
   non-terminal s with respect to the terminal t.  The left-linear
   closure of s is the set of all strings (consisting of terminals and
   non-terminals alike) that are derivable from s alone, without
   consuming any inputs, by expanding according to the grammar rules.
   This means that we need to know if which non-terminals were expanded
   in order to get to a state in [s]^{(t)}.
-- [ ] Each edge in the chain-rule machine needs to be labelled also
-  with a position in the forest.  This perfectly solves the problem of
-  those "plugs"!
+- Each edge in the chain-rule machine needs to be labelled also with a
+  position in the forest.  This perfectly solves the problem of those
+  "plugs"!
 
+* Formal Definition
 
+We need to figure out the definition of the forest.  That it to say,
+what will the resulting forest look like for a grammar.
 
+Moreover, we need to know how to correspond each step in the
+chain-rule machine with the definition of the forest.
+
+After these two steps, we can easily proceed with the construction of
+the chain-rule machine.
+
+** Axioms
+
+There are two *axioms* from which the definition of the forests
+follows.
+
+1. Each node has a unique label, given (principally) by three
+   components:
+   1) input range start
+   2) input range end
+   3) grammar label
+2. If different subtrees share the same node label as the root, a
+   "clone" of the root should be made, and there should be a
+   "representative" of all clones, which is treated as a normal node
+   with that label, and all clones are its children.
+
+The first axiom ensures that the size of the forest is bounded by the
+cube of \(n\), \(n\) is the length of the input sequence.  And the
+second axiom specifies what to do when subtrees share a parent.
+
+Since the clones are introduced only when subtrees share a parent,
+their number cannot exceed the number of nodes of a forest without
+clones.  This shows that adding clones does not affect the growth rate
+of the size of forests.
+
+A remark: the /clones/ are called /packed nodes/ in the traditional
+terminology, but the terminology of a clone makes more sense to me.
+
+** Correspondence with chain-rule machine
+
+Every edge in the chain-rule machine corresponds to an edge in the
+forest; denote this correspondence by \(f\).
+
+The chain-rule operation can be described as follows:
+
+1. Start with an edge \(e\) with children \(d_1, \ldots, d_n\) in the
+   chain-rule machine.
+2. Prepare a new forest fragment as follows.
+   1) For every child \(g\) of \(e\) in the atomic language, if \(g\)
+      is the terminal that appears as the current input \(t\), let the
+      new fragment be defined as the node moved by \(g\), alone.
+   2) If \(g\) is a non-terminal and its first-set contains \(t\),
+      then for every left-linearly closed child of \(g\) that is
+      labelled \(t\), denoted as \(h\), then let the fragment be
+      defined as the node moved by \(g\), with the pre-recorded
+      left-linear expansion information from \(g\) to \(h\) appended
+      as children.
+3. For \(i=1,\ldots,n\), consider the edge \(f(d_i)\) in the forest.
+   There are three cases:
+   1) If the next edge is empty, that means we have found something
+      totally new to add to the forest, so we just add that.
+   2) If the next edge after \(f(e)\) contains the new fragment as a
+      sub-tree already, this means the current branch has already been
+      explored before, and we have nothing else to do.
+   3) If the next edge is non-empty and does not match the new
+      fragment, this means we have a conflict with a previously found
+      branch.  We solve it simply by following the second axiom above:
+      we make the node that is the source of \(f(e)\) a clone, and add
+      the new fragment under a new clone.  Note that if the node is
+      already a clone, we just make a clone under the parent of that
+      clone.
+4. Update the correspondence \(f\) by updating \(f(g)\) and \(f(h)\)
+   accordingly.
+
+In this process we always follow the first axiom by ensuring that the
+labels of the forest are unique to each node, except for those clones,
+which are only created by following the second axiom.
author	JSDurand <mmemmew@gmail.com>	2023-01-11 23:47:26 +0800
committer	JSDurand <mmemmew@gmail.com>	2023-01-11 23:47:26 +0800
commit	1a3d346f413325ed37848a6b2526e8e729269833 (patch)
tree	ab8812f8094d096c68aee53cf516e986cc9a273a /forest/src/design.org
parent	f27d604d93ce583d4404e1874664e08382ea2f00 (diff)