diff options
Diffstat (limited to 'chain/src/reducer.rs')
-rw-r--r-- | chain/src/reducer.rs | 191 |
1 files changed, 191 insertions, 0 deletions
diff --git a/chain/src/reducer.rs b/chain/src/reducer.rs new file mode 100644 index 0000000..7ff1858 --- /dev/null +++ b/chain/src/reducer.rs @@ -0,0 +1,191 @@ +//! This module implements a data type for storing reduction +//! information. +//! +//! # Questions +//! +//! What is reduction information, and why is it necessary to store +//! it? +//! +//! # Nullable-closure process +//! +//! In the process of running the chain-rule machine, we will collapse +//! edges, in the sense that, if there is an edge from node a to node +//! b and another from node b to node c, and if the edge from node a +//! to node b is "nullable", that is if the edge corresponds to a +//! position in the atomic language that is deemed nullable by the +//! atomic language, then we also make an edge from node a to node c. +//! +//! The purpose of this process of forming the nullable closure is to +//! ensure that the chain-rule machine can save the time to traverse +//! the entire machine to find the nullable closure later on. But a +//! side-effect of this process is that reductions are not explicitly +//! marked. +//! +//! To explain this in detail, we first investigate what reduction is +//! and what it means in the chain-rule machine, and then explain the +//! problem. +//! +//! # Three types of actions +//! +//! We can imagine a "traditional parser generator" as a stack +//! machine: there are three types of actions associated with it, +//! depending on the current state and the current input token. The +//! first is expansion: it means that we are expanding from a +//! non-terminal, by some rule. The second is a normal push: we just +//! continue parsing according to the current rule. The final one is +//! reduction: it means the current expansion from a non-terminal has +//! terminated and we go back to the previous level of expansion. +//! +//! # Relation to the chain-rule machine +//! +//! For our chain-rule machine, expansion means to create a new node +//! pointing at the current node, forming a path of length increased +//! by one. A normal push means to create a new node that points to +//! the target of an edge going out from the current node, which was +//! not created by the process of forming nullable closures. And the +//! reduction means to create a new node that points to the target of +//! an edge going out from the current node, which *was* created by +//! the process of forming nullable closures. +//! +//! # Problem +//! +//! As can be seen from the previous paragraph, the distinction +//! between a normal push and a reduction in the chain-rule machine is +//! simply whether or not the original edge was created by the process +//! of forming nullable closures. For the chain-rule machine, this +//! does not matter: it can function well. For the formation of the +//! derivation forest, however, this is not so well: we cannot +//! read-off immediately whether or not to perform reductions from the +//! chain-rule machine. +//! +//! # Solution +//! +//! Since we cannot read-off the reduction information directly from +//! the chain-rule machine, we have to store that information somehow. +//! +//! ## Digression: past experiences +//! +//! When developping this part, I did not have a clear picture of the +//! situation in mind: I was experimenting with the ways to construct +//! the derivation forest from the chain-rule machine, as the paper +//! describing the algorithm does not construct the derivation forest +//! directly: it constructs an alternative format. A consequence is +//! that I spent quite some time in figuring out how to construct +//! derivation forests correctly. +//! +//! During the experiments, I tried various ideas: including to +//! "fill-in" the reductions after we have parsed the entire input. +//! This seemed ostensibly a good idea, as I seem to be able to +//! "read-off" the reductions from the resulting partially complete +//! derivation forests. +//! +//! As it turned out, this was actually a bad idea. In fact, I can +//! now prove that this will not work by the presence of a specific +//! grammar that will cause this approach to fail definitely. This +//! led me to believe that the only way is to store the needed +//! reduction information in order to fill in this gap. +//! +//! ## Storing reduction information +//! +//! Now we want to store the reduction information, so we need to be +//! clear about what we are storing. +//! +//! In the derivation forest, a reduction happens when we reach the +//! right-most descendent of a subtree, which marks the end of an +//! expansion from a non-terminal. Moreover, our derivation forests +//! usually contain half-open nodes, which mean that they are not +//! completed yet and can keep growing, until a reduction happens when +//! we "close" the half-open node. Thus, what we really need is the +//! list of nodes to close. +//! +//! This information is readily available when we form nullable +//! closures: the skipped nodes are the nodes we need to close later +//! on. To be more exact, when we skip nodes, we have two nodes: the +//! top node and the bottom node, and we want to store the middle node +//! that is skipped. +//! +//! This naturally led to the structure of a nondeterministic finite +//! automaton: when we want to close nodes, we start from the +//! bottom-most node, and query the nodes upward by the top node, +//! until we reach the top node or until we have no more nodes to go. +//! +//! The special characteristic about this structure is that we need to +//! label both the vertices and the edges. Since we already have a +//! structure that labels edges, we can simply extend that structure +//! by adding an array of labels and possibly a map from labels to +//! nodes, to make sure the labels of vertices are unique and can be +//! queried quickly. +//! +//! One thing to note is that we actually only need the ability to +//! query children by the labels, and do not need to query labels by +//! the target. So we can represent this nondeterministic finite +//! automaton by a plain hashmap. + +#![allow(unused)] + +use std::collections::{hash_set::Iter, HashMap, HashSet}; + +#[derive(Debug, Default, Clone)] +pub(crate) struct Reducer(HashMap<(usize, usize), HashSet<usize>>); + +#[derive(Debug, Default)] +pub(crate) enum ReducerIter<'a> { + #[default] + Empty, + NonEmpty(Iter<'a, usize>), +} + +impl<'a> Iterator for ReducerIter<'a> { + type Item = usize; + + #[inline] + fn next(&mut self) -> Option<Self::Item> { + match self { + ReducerIter::Empty => None, + ReducerIter::NonEmpty(iter) => iter.next().copied(), + } + } + + #[inline] + fn size_hint(&self) -> (usize, Option<usize>) { + match self { + ReducerIter::Empty => (0, Some(0)), + ReducerIter::NonEmpty(iter) => iter.size_hint(), + } + } +} + +impl<'a> ExactSizeIterator for ReducerIter<'a> { + #[inline] + fn len(&self) -> usize { + match self { + ReducerIter::Empty => 0, + ReducerIter::NonEmpty(iter) => iter.len(), + } + } +} + +impl<'a> ReducerIter<'a> { + #[inline] + pub(crate) fn new(iter: Option<Iter<'a, usize>>) -> Self { + match iter { + Some(iter) => Self::NonEmpty(iter), + None => Self::default(), + } + } +} + +impl Reducer { + #[inline] + pub(crate) fn query(&self, bottom: usize, top: usize) -> ReducerIter<'_> { + ReducerIter::new(self.0.get(&(bottom, top)).map(|set| set.iter())) + } + + #[inline] + pub(crate) fn save(&mut self, bottom: usize, top: usize, middle: usize) { + self.0 + .entry((bottom, top)) + .or_insert_with(Default::default) + .insert(middle); + } +} |