//! This module implements a data type for storing reduction
//! information.
//!
//! # Questions
//!
//! What is reduction information, and why is it necessary to store
//! it?
//!
//! # Nullable-closure process
//!
//! In the process of running the chain-rule machine, we collapse
//! edges: if there is an edge from node a to node b and another from
//! node b to node c, and if the edge from node a to node b is
//! "nullable", that is, if it corresponds to a position that the
//! atomic language deems nullable, then we also add an edge from
//! node a to node c.
//!
//! The purpose of forming this nullable closure is to spare the
//! chain-rule machine from traversing the entire machine later on to
//! find the nullable closure. A side-effect of this process,
//! however, is that reductions are no longer explicitly marked.
//!
//! To explain this in detail, we first investigate what reduction is
//! and what it means in the chain-rule machine, and then explain the
//! problem.
//!
//! # Three types of actions
//!
//! We can imagine a "traditional parser generator" as a stack
//! machine with three types of actions, chosen according to the
//! current state and the current input token. The first is
//! expansion: we expand from a non-terminal by some rule. The second
//! is a normal push: we continue parsing according to the current
//! rule. The final one is reduction: the current expansion from a
//! non-terminal has terminated, and we go back to the previous level
//! of expansion.
//!
//! # Relation to the chain-rule machine
//!
//! For our chain-rule machine, expansion means creating a new node
//! pointing at the current node, extending the path by one. A normal
//! push means creating a new node that points to the target of an
//! edge going out from the current node, where that edge was *not*
//! created by the process of forming nullable closures. A reduction
//! means creating a new node that points to the target of an edge
//! going out from the current node, where that edge *was* created by
//! the process of forming nullable closures.
//!
//! # Problem
//!
//! As the previous paragraph shows, the distinction between a normal
//! push and a reduction in the chain-rule machine is simply whether
//! or not the original edge was created by the process of forming
//! nullable closures. For the chain-rule machine itself this does
//! not matter: it functions well regardless. For the formation of
//! the derivation forest, however, it is problematic: we cannot read
//! off from the chain-rule machine whether or not to perform
//! reductions.
//!
//! # Solution
//!
//! Since we cannot read off the reduction information directly from
//! the chain-rule machine, we have to store that information
//! somehow.
//!
//! ## Digression: past experiences
//!
//! When developing this part, I did not have a clear picture of the
//! situation in mind: I was experimenting with ways to construct the
//! derivation forest from the chain-rule machine, as the paper
//! describing the algorithm does not construct the derivation forest
//! directly: it constructs an alternative format. As a consequence,
//! I spent quite some time figuring out how to construct derivation
//! forests correctly.
//!
//! During the experiments, I tried various ideas, including
//! "filling in" the reductions after parsing the entire input. This
//! seemed like a good idea, as I appeared to be able to "read off"
//! the reductions from the resulting partially complete derivation
//! forests.
//!
//! As it turned out, this was actually a bad idea. In fact, I can
//! now prove that it does not work, by exhibiting a specific grammar
//! that causes this approach to fail. This led me to believe that
//! the only way is to store the needed reduction information in
//! order to fill in this gap.
//!
//! ## Storing reduction information
//!
//! Now we want to store the reduction information, so we need to be
//! clear about what we are storing.
//!
//! In the derivation forest, a reduction happens when we reach the
//! right-most descendant of a subtree, which marks the end of an
//! expansion from a non-terminal. Moreover, our derivation forests
//! usually contain half-open nodes, meaning nodes that are not yet
//! complete and can keep growing until a reduction "closes" them.
//! Thus, what we really need is the list of nodes to close.
//!
//! This information is readily available when we form nullable
//! closures: the skipped nodes are the nodes we need to close later
//! on. To be more exact, when we skip nodes, we have two nodes, the
//! top node and the bottom node, and we want to store the middle
//! node that is skipped.
//!
//! This naturally leads to the structure of a nondeterministic
//! finite automaton: when we want to close nodes, we start from the
//! bottom-most node and query the nodes upward by the top node,
//! until we reach the top node or until we have no more nodes to
//! visit.
//!
//! The special characteristic of this structure is that we need to
//! label both the vertices and the edges. Since we already have a
//! structure that labels edges, we could simply extend that
//! structure by adding an array of labels and possibly a map from
//! labels to nodes, to make sure the labels of vertices are unique
//! and can be queried quickly.
//!
//! One thing to note is that we actually only need the ability to
//! query children by their labels, and do not need to query labels
//! by the target. So we can represent this nondeterministic finite
//! automaton by a plain hashmap.
#![allow(unused)]

use std::collections::{hash_set::Iter, HashMap, HashSet};

/// Maps a pair of (bottom, top) node indices to the set of skipped
/// "middle" nodes that must be closed when a reduction runs from the
/// bottom node up to the top node.
#[derive(Debug, Default, Clone)]
pub(crate) struct Reducer(HashMap<(usize, usize), HashSet<usize>>);

/// Iterator over the middle nodes recorded for one (bottom, top)
/// pair; `Empty` stands for a pair with no recorded nodes.
#[derive(Debug, Default)]
pub(crate) enum ReducerIter<'a> {
    #[default]
    Empty,
    NonEmpty(Iter<'a, usize>),
}

impl<'a> Iterator for ReducerIter<'a> {
    type Item = usize;

    #[inline]
    fn next(&mut self) -> Option<Self::Item> {
        match self {
            ReducerIter::Empty => None,
            ReducerIter::NonEmpty(iter) => iter.next().copied(),
        }
    }

    #[inline]
    fn size_hint(&self) -> (usize, Option<usize>) {
        match self {
            ReducerIter::Empty => (0, Some(0)),
            ReducerIter::NonEmpty(iter) => iter.size_hint(),
        }
    }
}

impl<'a> ExactSizeIterator for ReducerIter<'a> {
    #[inline]
    fn len(&self) -> usize {
        match self {
            ReducerIter::Empty => 0,
            ReducerIter::NonEmpty(iter) => iter.len(),
        }
    }
}

impl<'a> ReducerIter<'a> {
    #[inline]
    pub(crate) fn new(iter: Option<Iter<'a, usize>>) -> Self {
        match iter {
            Some(iter) => Self::NonEmpty(iter),
            None => Self::default(),
        }
    }
}

impl Reducer {
    /// Returns the middle nodes saved for the pair `(bottom, top)`.
    #[inline]
    pub(crate) fn query(&self, bottom: usize, top: usize) -> ReducerIter<'_> {
        ReducerIter::new(self.0.get(&(bottom, top)).map(|set| set.iter()))
    }

    /// Records `middle` as a node skipped between `bottom` and `top`.
    #[inline]
    pub(crate) fn save(&mut self, bottom: usize, top: usize, middle: usize) {
        self.0.entry((bottom, top)).or_default().insert(middle);
    }
}