summaryrefslogtreecommitdiff
path: root/chain/src/reducer.rs
diff options
context:
space:
mode:
Diffstat (limited to 'chain/src/reducer.rs')
-rw-r--r--chain/src/reducer.rs191
1 files changed, 191 insertions, 0 deletions
diff --git a/chain/src/reducer.rs b/chain/src/reducer.rs
new file mode 100644
index 0000000..7ff1858
--- /dev/null
+++ b/chain/src/reducer.rs
@@ -0,0 +1,191 @@
+//! This module implements a data type for storing reduction
+//! information.
+//!
+//! # Questions
+//!
+//! What is reduction information, and why is it necessary to store
+//! it?
+//!
+//! # Nullable-closure process
+//!
+//! In the process of running the chain-rule machine, we will collapse
+//! edges, in the sense that, if there is an edge from node a to node
+//! b and another from node b to node c, and if the edge from node a
+//! to node b is "nullable", that is if the edge corresponds to a
+//! position in the atomic language that is deemed nullable by the
+//! atomic language, then we also make an edge from node a to node c.
+//!
+//! The purpose of this process of forming the nullable closure is to
+//! ensure that the chain-rule machine can save the time to traverse
+//! the entire machine to find the nullable closure later on. But a
+//! side-effect of this process is that reductions are not explicitly
+//! marked.
+//!
+//! To explain this in detail, we first investigate what reduction is
+//! and what it means in the chain-rule machine, and then explain the
+//! problem.
+//!
+//! # Three types of actions
+//!
+//! We can imagine a "traditional parser generator" as a stack
+//! machine: there are three types of actions associated with it,
+//! depending on the current state and the current input token. The
+//! first is expansion: it means that we are expanding from a
+//! non-terminal, by some rule. The second is a normal push: we just
+//! continue parsing according to the current rule. The final one is
+//! reduction: it means the current expansion from a non-terminal has
+//! terminated and we go back to the previous level of expansion.
+//!
+//! # Relation to the chain-rule machine
+//!
+//! For our chain-rule machine, expansion means to create a new node
+//! pointing at the current node, forming a path of length increased
+//! by one. A normal push means to create a new node that points to
+//! the target of an edge going out from the current node, which was
+//! not created by the process of forming nullable closures. And the
+//! reduction means to create a new node that points to the target of
+//! an edge going out from the current node, which *was* created by
+//! the process of forming nullable closures.
+//!
+//! # Problem
+//!
+//! As can be seen from the previous paragraph, the distinction
+//! between a normal push and a reduction in the chain-rule machine is
+//! simply whether or not the original edge was created by the process
+//! of forming nullable closures. For the chain-rule machine, this
+//! does not matter: it can function well. For the formation of the
+//! derivation forest, however, this is not so well: we cannot
+//! read-off immediately whether or not to perform reductions from the
+//! chain-rule machine.
+//!
+//! # Solution
+//!
+//! Since we cannot read-off the reduction information directly from
+//! the chain-rule machine, we have to store that information somehow.
+//!
+//! ## Digression: past experiences
+//!
+//! When developping this part, I did not have a clear picture of the
+//! situation in mind: I was experimenting with the ways to construct
+//! the derivation forest from the chain-rule machine, as the paper
+//! describing the algorithm does not construct the derivation forest
+//! directly: it constructs an alternative format. A consequence is
+//! that I spent quite some time in figuring out how to construct
+//! derivation forests correctly.
+//!
+//! During the experiments, I tried various ideas: including to
+//! "fill-in" the reductions after we have parsed the entire input.
+//! This seemed ostensibly a good idea, as I seem to be able to
+//! "read-off" the reductions from the resulting partially complete
+//! derivation forests.
+//!
+//! As it turned out, this was actually a bad idea. In fact, I can
+//! now prove that this will not work by the presence of a specific
+//! grammar that will cause this approach to fail definitely. This
+//! led me to believe that the only way is to store the needed
+//! reduction information in order to fill in this gap.
+//!
+//! ## Storing reduction information
+//!
+//! Now we want to store the reduction information, so we need to be
+//! clear about what we are storing.
+//!
+//! In the derivation forest, a reduction happens when we reach the
+//! right-most descendent of a subtree, which marks the end of an
+//! expansion from a non-terminal. Moreover, our derivation forests
+//! usually contain half-open nodes, which mean that they are not
+//! completed yet and can keep growing, until a reduction happens when
+//! we "close" the half-open node. Thus, what we really need is the
+//! list of nodes to close.
+//!
+//! This information is readily available when we form nullable
+//! closures: the skipped nodes are the nodes we need to close later
+//! on. To be more exact, when we skip nodes, we have two nodes: the
+//! top node and the bottom node, and we want to store the middle node
+//! that is skipped.
+//!
+//! This naturally led to the structure of a nondeterministic finite
+//! automaton: when we want to close nodes, we start from the
+//! bottom-most node, and query the nodes upward by the top node,
+//! until we reach the top node or until we have no more nodes to go.
+//!
+//! The special characteristic about this structure is that we need to
+//! label both the vertices and the edges. Since we already have a
+//! structure that labels edges, we can simply extend that structure
+//! by adding an array of labels and possibly a map from labels to
+//! nodes, to make sure the labels of vertices are unique and can be
+//! queried quickly.
+//!
+//! One thing to note is that we actually only need the ability to
+//! query children by the labels, and do not need to query labels by
+//! the target. So we can represent this nondeterministic finite
+//! automaton by a plain hashmap.
+
+#![allow(unused)]
+
+use std::collections::{hash_set::Iter, HashMap, HashSet};
+
+#[derive(Debug, Default, Clone)]
+pub(crate) struct Reducer(HashMap<(usize, usize), HashSet<usize>>);
+
+#[derive(Debug, Default)]
+pub(crate) enum ReducerIter<'a> {
+ #[default]
+ Empty,
+ NonEmpty(Iter<'a, usize>),
+}
+
+impl<'a> Iterator for ReducerIter<'a> {
+ type Item = usize;
+
+ #[inline]
+ fn next(&mut self) -> Option<Self::Item> {
+ match self {
+ ReducerIter::Empty => None,
+ ReducerIter::NonEmpty(iter) => iter.next().copied(),
+ }
+ }
+
+ #[inline]
+ fn size_hint(&self) -> (usize, Option<usize>) {
+ match self {
+ ReducerIter::Empty => (0, Some(0)),
+ ReducerIter::NonEmpty(iter) => iter.size_hint(),
+ }
+ }
+}
+
+impl<'a> ExactSizeIterator for ReducerIter<'a> {
+ #[inline]
+ fn len(&self) -> usize {
+ match self {
+ ReducerIter::Empty => 0,
+ ReducerIter::NonEmpty(iter) => iter.len(),
+ }
+ }
+}
+
+impl<'a> ReducerIter<'a> {
+ #[inline]
+ pub(crate) fn new(iter: Option<Iter<'a, usize>>) -> Self {
+ match iter {
+ Some(iter) => Self::NonEmpty(iter),
+ None => Self::default(),
+ }
+ }
+}
+
+impl Reducer {
+ #[inline]
+ pub(crate) fn query(&self, bottom: usize, top: usize) -> ReducerIter<'_> {
+ ReducerIter::new(self.0.get(&(bottom, top)).map(|set| set.iter()))
+ }
+
+ #[inline]
+ pub(crate) fn save(&mut self, bottom: usize, top: usize, middle: usize) {
+ self.0
+ .entry((bottom, top))
+ .or_insert_with(Default::default)
+ .insert(middle);
+ }
+}