diff options
Diffstat (limited to 'chain/src/item/thoughts-on-reducer.org')
-rw-r--r-- | chain/src/item/thoughts-on-reducer.org | 121 |
1 files changed, 121 insertions, 0 deletions
diff --git a/chain/src/item/thoughts-on-reducer.org b/chain/src/item/thoughts-on-reducer.org new file mode 100644 index 0000000..39a799a --- /dev/null +++ b/chain/src/item/thoughts-on-reducer.org @@ -0,0 +1,121 @@ +#+TITLE: Thoughts on the reducer +#+AUTHOR: Durand +#+DATE: <2023-06-05 Lun 22:08> + +This module implements a data type for storing reduction +information. + +* Questions + +What is reduction information, and why is it necessary to store it? + +* Nullable-closure process + +In the process of running the chain-rule machine, we will collapse +edges, in the sense that, if there is an edge from node a to node b +and another from node b to node c, and if the edge from node a to node +b is "nullable", that is if the edge corresponds to a position in the +atomic language that is deemed nullable by the atomic language, then +we also make an edge from node a to node c. + +The purpose of this process of forming the nullable closure is to +ensure that the chain-rule machine can save the time to traverse the +entire machine to find the nullable closure later on. But a +side-effect of this process is that reductions are not explicitly +marked. + +To explain this in detail, we first investigate what reduction is and +what it means in the chain-rule machine, and then explain the problem. + +* Three types of actions + +We can imagine a "traditional parser generator" as a stack machine: +there are three types of actions associated with it, depending on the +current state and the current input token. The first is expansion: it +means that we are expanding from a non-terminal, by some rule. The +second is a normal push: we just continue parsing according to the +current rule. The final one is reduction: it means the current +expansion from a non-terminal has terminated and we go back to the +previous level of expansion. + +* Relation to the chain-rule machine + +For our chain-rule machine, expansion means to create a new node +pointing at the current node, forming a path of length increased by +one. A normal push means to create a new node that points to the +target of an edge going out from the current node, which was not +created by the process of forming nullable closures. And the +reduction means to create a new node that points to the target of an +edge going out from the current node, which *was* created by the +process of forming nullable closures. + +* Problem + +As can be seen from the previous paragraph, the distinction between a +normal push and a reduction in the chain-rule machine is simply +whether or not the original edge was created by the process of forming +nullable closures. For the chain-rule machine, this does not matter: +it can function well. For the formation of the derivation forest, +however, this is not so well: we cannot read-off immediately whether +or not to perform reductions from the chain-rule machine. + +* Solution + +Since we cannot read-off the reduction information directly from the +chain-rule machine, we have to store that information somehow. + +** Digression: past experiences + +When developping this part, I did not have a clear picture of the +situation in mind: I was experimenting with the ways to construct the +derivation forest from the chain-rule machine, as the paper describing +the algorithm does not construct the derivation forest directly: it +constructs an alternative format. A consequence is that I spent quite +some time in figuring out how to construct derivation forests +correctly. + +During the experiments, I tried various ideas: including to "fill-in" +the reductions after we have parsed the entire input. This seemed +ostensibly a good idea, as I seem to be able to "read-off" the +reductions from the resulting partially complete derivation forests. + +As it turned out, this was actually a bad idea. In fact, I can now +prove that this will not work by the presence of a specific grammar +that will cause this approach to fail definitely. This led me to +believe that the only way is to store the needed reduction information +in order to fill in this gap. + +** Storing reduction information + +Now we want to store the reduction information, so we need to be clear +about what we are storing. + +In the derivation forest, a reduction happens when we reach the +right-most descendent of a subtree, which marks the end of an +expansion from a non-terminal. Moreover, our derivation forests +usually contain half-open nodes, which mean that they are not +completed yet and can keep growing, until a reduction happens when we +"close" the half-open node. Thus, what we really need is the list of +nodes to close. + +This information is readily available when we form nullable closures: +the skipped nodes are the nodes we need to close later on. To be more +exact, when we skip nodes, we have two nodes: the top node and the +bottom node, and we want to store the middle node that is skipped. + +This naturally led to the structure of a nondeterministic finite +automaton: when we want to close nodes, we start from the bottom-most +node, and query the nodes upward by the top node, until we reach the +top node or until we have no more nodes to go. + +The special characteristic about this structure is that we need to +label both the vertices and the edges. Since we already have a +structure that labels edges, we can simply extend that structure by +adding an array of labels and possibly a map from labels to nodes, to +make sure the labels of vertices are unique and can be queried +quickly. + +One thing to note is that we actually only need the ability to query +children by the labels, and do not need to query labels by the target. +We need experiments to see if thiw will be the bottleneck and hence +need to be optimized out. |