summaryrefslogtreecommitdiff
path: root/chain/src/item/thoughts-on-reducer.org
diff options
context:
space:
mode:
Diffstat (limited to 'chain/src/item/thoughts-on-reducer.org')
-rw-r--r--chain/src/item/thoughts-on-reducer.org121
1 files changed, 121 insertions, 0 deletions
diff --git a/chain/src/item/thoughts-on-reducer.org b/chain/src/item/thoughts-on-reducer.org
new file mode 100644
index 0000000..39a799a
--- /dev/null
+++ b/chain/src/item/thoughts-on-reducer.org
@@ -0,0 +1,121 @@
+#+TITLE: Thoughts on the reducer
+#+AUTHOR: Durand
+#+DATE: <2023-06-05 Lun 22:08>
+
+This module implements a data type for storing reduction
+information.
+
+* Questions
+
+What is reduction information, and why is it necessary to store it?
+
+* Nullable-closure process
+
+In the process of running the chain-rule machine, we will collapse
+edges, in the sense that, if there is an edge from node a to node b
+and another from node b to node c, and if the edge from node a to node
+b is "nullable", that is if the edge corresponds to a position in the
+atomic language that is deemed nullable by the atomic language, then
+we also make an edge from node a to node c.
+
+The purpose of this process of forming the nullable closure is to
+ensure that the chain-rule machine can save the time to traverse the
+entire machine to find the nullable closure later on. But a
+side-effect of this process is that reductions are not explicitly
+marked.
+
+To explain this in detail, we first investigate what reduction is and
+what it means in the chain-rule machine, and then explain the problem.
+
+* Three types of actions
+
+We can imagine a "traditional parser generator" as a stack machine:
+there are three types of actions associated with it, depending on the
+current state and the current input token. The first is expansion: it
+means that we are expanding from a non-terminal, by some rule. The
+second is a normal push: we just continue parsing according to the
+current rule. The final one is reduction: it means the current
+expansion from a non-terminal has terminated and we go back to the
+previous level of expansion.
+
+* Relation to the chain-rule machine
+
+For our chain-rule machine, expansion means to create a new node
+pointing at the current node, forming a path of length increased by
+one. A normal push means to create a new node that points to the
+target of an edge going out from the current node, which was not
+created by the process of forming nullable closures. And the
+reduction means to create a new node that points to the target of an
+edge going out from the current node, which *was* created by the
+process of forming nullable closures.
+
+* Problem
+
+As can be seen from the previous paragraph, the distinction between a
+normal push and a reduction in the chain-rule machine is simply
+whether or not the original edge was created by the process of forming
+nullable closures. For the chain-rule machine, this does not matter:
+it can function well. For the formation of the derivation forest,
+however, this is not so well: we cannot read-off immediately whether
+or not to perform reductions from the chain-rule machine.
+
+* Solution
+
+Since we cannot read-off the reduction information directly from the
+chain-rule machine, we have to store that information somehow.
+
+** Digression: past experiences
+
+When developping this part, I did not have a clear picture of the
+situation in mind: I was experimenting with the ways to construct the
+derivation forest from the chain-rule machine, as the paper describing
+the algorithm does not construct the derivation forest directly: it
+constructs an alternative format. A consequence is that I spent quite
+some time in figuring out how to construct derivation forests
+correctly.
+
+During the experiments, I tried various ideas: including to "fill-in"
+the reductions after we have parsed the entire input. This seemed
+ostensibly a good idea, as I seem to be able to "read-off" the
+reductions from the resulting partially complete derivation forests.
+
+As it turned out, this was actually a bad idea. In fact, I can now
+prove that this will not work by the presence of a specific grammar
+that will cause this approach to fail definitely. This led me to
+believe that the only way is to store the needed reduction information
+in order to fill in this gap.
+
+** Storing reduction information
+
+Now we want to store the reduction information, so we need to be clear
+about what we are storing.
+
+In the derivation forest, a reduction happens when we reach the
+right-most descendent of a subtree, which marks the end of an
+expansion from a non-terminal. Moreover, our derivation forests
+usually contain half-open nodes, which mean that they are not
+completed yet and can keep growing, until a reduction happens when we
+"close" the half-open node. Thus, what we really need is the list of
+nodes to close.
+
+This information is readily available when we form nullable closures:
+the skipped nodes are the nodes we need to close later on. To be more
+exact, when we skip nodes, we have two nodes: the top node and the
+bottom node, and we want to store the middle node that is skipped.
+
+This naturally led to the structure of a nondeterministic finite
+automaton: when we want to close nodes, we start from the bottom-most
+node, and query the nodes upward by the top node, until we reach the
+top node or until we have no more nodes to go.
+
+The special characteristic about this structure is that we need to
+label both the vertices and the edges. Since we already have a
+structure that labels edges, we can simply extend that structure by
+adding an array of labels and possibly a map from labels to nodes, to
+make sure the labels of vertices are unique and can be queried
+quickly.
+
+One thing to note is that we actually only need the ability to query
+children by the labels, and do not need to query labels by the target.
+We need experiments to see if thiw will be the bottleneck and hence
+need to be optimized out.