author    JSDurand <mmemmew@gmail.com>  2023-06-18 15:03:34 +0800
committer JSDurand <mmemmew@gmail.com>  2023-06-18 15:03:34 +0800
commit    a80db17473ff09cc72acba2c1975101e6dbedf39 (patch)
tree      4d5dfcdfcd1537b51b92349d9a5274aa90a6d110 /chain/src/reducer.rs
parent    6ce44bb3bdb79e8e727ee6fc7f5c6cd7fa0bb51e (diff)
fixed the bugs of node duplications and left-open nodes
There were two main issues in the previous version. The first is that there were many duplications of nodes when manipulating the forest. This does not mean that labels repeat: by the design of the data type this cannot happen. What happened is that there were cloned nodes whose children were exactly equal. In such a case there is no need to clone the node in the first place. This is now fixed by checking carefully before cloning, so that we do not clone unnecessary nodes.

The other issue, which is perhaps more important, is that there were nodes which were not closed. This means that when there should be a reduction of grammar rules, the forest did not mark the corresponding node as already reduced. The incorrect forests caused this way are hard to fix: I tried several different approaches to fix them afterwards, but all to no avail. I also tried to record enough information to fix these nodes during the manipulations. It turned out that recording nodes is a dead end, as I cannot properly synchronize the information in the forest with the information in the chain-rule machine. Any inconsistency will result in incorrect operations later on.

The approach I finally adopted is to perform every possible reduction at each step. This might lead to more nodes than we need, but those are technically expected to be there after all, and it is easy to filter them out, so it is fine from my point of view at the moment. Therefore, what remains is to filter those nodes out and connect it to the holy Emacs. :D
Diffstat (limited to 'chain/src/reducer.rs')
-rw-r--r--  chain/src/reducer.rs  191
1 file changed, 191 insertions(+), 0 deletions(-)
diff --git a/chain/src/reducer.rs b/chain/src/reducer.rs
new file mode 100644
index 0000000..7ff1858
--- /dev/null
+++ b/chain/src/reducer.rs
@@ -0,0 +1,191 @@
+//! This module implements a data type for storing reduction
+//! information.
+//!
+//! # Questions
+//!
+//! What is reduction information, and why is it necessary to store
+//! it?
+//!
+//! # Nullable-closure process
+//!
+//! In the process of running the chain-rule machine, we collapse
+//! edges, in the following sense: if there is an edge from node a to
+//! node b and another from node b to node c, and if the edge from
+//! node a to node b is "nullable", that is, if it corresponds to a
+//! position that the atomic language deems nullable, then we also
+//! make an edge from node a to node c.
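+//!
+//! As a rough sketch (the names `edges`, `is_nullable`, and
+//! `add_edge` here are illustrative, not functions of this crate),
+//! the collapsing step looks like this:
+//!
+//! ```ignore
+//! // Collapse a nullable edge (a, b): every target c of an edge
+//! // going out from b also becomes a target of a.
+//! for b in edges(a) {
+//!     if is_nullable(a, b) {
+//!         for c in edges(b) {
+//!             add_edge(a, c);
+//!         }
+//!     }
+//! }
+//! ```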
+//!
+//! The purpose of forming the nullable closure is to spare the
+//! chain-rule machine from traversing the entire machine later on to
+//! find that closure. A side-effect of this process, however, is
+//! that reductions are no longer explicitly marked.
+//!
+//! To explain this in detail, we first investigate what reduction is
+//! and what it means in the chain-rule machine, and then explain the
+//! problem.
+//!
+//! # Three types of actions
+//!
+//! We can imagine a "traditional parser generator" as a stack
+//! machine: there are three types of actions associated with it,
+//! depending on the current state and the current input token. The
+//! first is expansion: it means that we are expanding from a
+//! non-terminal, by some rule. The second is a normal push: we just
+//! continue parsing according to the current rule. The final one is
+//! reduction: it means the current expansion from a non-terminal has
+//! terminated and we go back to the previous level of expansion.
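+//!
+//! For illustration only, these three actions could be modelled by
+//! an enum like the following (this type is not part of the crate):
+//!
+//! ```ignore
+//! enum Action {
+//!     /// Expand from a non-terminal, by the rule with this index.
+//!     Expand(usize),
+//!     /// Continue parsing according to the current rule.
+//!     Push,
+//!     /// Close the current expansion and return to the previous
+//!     /// level of expansion.
+//!     Reduce,
+//! }
+//! ```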
+//!
+//! # Relation to the chain-rule machine
+//!
+//! For our chain-rule machine, expansion means to create a new node
+//! pointing at the current node, extending the current path by one.
+//! A normal push means to create a new node that points to the
+//! target of an edge going out from the current node, where that
+//! edge was *not* created by the process of forming nullable
+//! closures. And a reduction means to create a new node that points
+//! to the target of an edge going out from the current node, where
+//! that edge *was* created by the process of forming nullable
+//! closures.
+//!
+//! # Problem
+//!
+//! As can be seen from the previous paragraph, the distinction
+//! between a normal push and a reduction in the chain-rule machine
+//! is simply whether or not the original edge was created by the
+//! process of forming nullable closures. For the chain-rule machine
+//! itself this does not matter: it functions well either way. For
+//! the formation of the derivation forest, however, this works less
+//! well: we cannot immediately read off from the chain-rule machine
+//! whether or not to perform reductions.
+//!
+//! # Solution
+//!
+//! Since we cannot read off the reduction information directly from
+//! the chain-rule machine, we have to store that information somehow.
+//!
+//! ## Digression: past experiences
+//!
+//! When developing this part, I did not have a clear picture of the
+//! situation in mind: I was experimenting with ways to construct the
+//! derivation forest from the chain-rule machine, as the paper
+//! describing the algorithm does not construct the derivation forest
+//! directly: it constructs an alternative format. A consequence is
+//! that I spent quite some time figuring out how to construct
+//! derivation forests correctly.
+//!
+//! During the experiments, I tried various ideas, including
+//! "filling in" the reductions after we have parsed the entire
+//! input. Ostensibly this seemed a good idea, as I appeared to be
+//! able to "read off" the reductions from the resulting partially
+//! complete derivation forests.
+//!
+//! As it turned out, this was actually a bad idea. In fact, I can
+//! now prove that it cannot work, by exhibiting a specific grammar
+//! on which this approach definitely fails. This led me to believe
+//! that the only way is to store the needed reduction information in
+//! order to fill in this gap.
+//!
+//! ## Storing reduction information
+//!
+//! Now we want to store the reduction information, so we need to be
+//! clear about what we are storing.
+//!
+//! In the derivation forest, a reduction happens when we reach the
+//! right-most descendant of a subtree, which marks the end of an
+//! expansion from a non-terminal. Moreover, our derivation forests
+//! usually contain half-open nodes, which means that they are not
+//! completed yet and can keep growing, until a reduction happens and
+//! we "close" the half-open node. Thus, what we really need is the
+//! list of nodes to close.
+//!
+//! This information is readily available when we form nullable
+//! closures: the skipped nodes are the nodes we need to close later
+//! on. To be more exact, when we skip nodes, we have two nodes: the
+//! top node and the bottom node, and we want to store the middle node
+//! that is skipped.
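+//!
+//! In other words, what we store is a mapping of the following
+//! shape, which is exactly what the `Reducer` type below holds:
+//!
+//! ```text
+//! (bottom, top) => { middle, ... }
+//! ```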
+//!
+//! This naturally leads to the structure of a nondeterministic
+//! finite automaton: when we want to close nodes, we start from the
+//! bottom-most node and query the nodes upward, keyed by the top
+//! node, until we reach the top node or run out of nodes to visit.
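+//!
+//! A minimal sketch of this upward walk, using the `query` method
+//! defined below; `close_node` is a hypothetical stand-in for
+//! whatever actually closes a node in the forest:
+//!
+//! ```ignore
+//! // Close every node skipped between `bottom` and `top`.
+//! let mut stack = vec![bottom];
+//! while let Some(current) = stack.pop() {
+//!     if current == top {
+//!         continue; // reached the top node
+//!     }
+//!     for middle in reducer.query(current, top) {
+//!         close_node(middle);
+//!         // Keep walking upward from the node we just closed.
+//!         stack.push(middle);
+//!     }
+//!     // If `query` finds nothing, there are no more nodes to go.
+//! }
+//! ```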
+//!
+//! What is special about this structure is that we need to label
+//! both the vertices and the edges. Since we already have a
+//! structure that labels edges, we can simply extend that structure
+//! by adding an array of labels and possibly a map from labels to
+//! nodes, to make sure the labels of vertices are unique and can be
+//! queried quickly.
+//!
+//! One thing to note is that we actually only need the ability to
+//! query children by the labels, and do not need to query labels by
+//! the target. So we can represent this nondeterministic finite
+//! automaton by a plain hashmap.
+
+#![allow(unused)]
+
+use std::collections::{hash_set::Iter, HashMap, HashSet};
+
+/// Maps a pair `(bottom, top)` of node labels to the set of
+/// "middle" nodes that were skipped between them when forming
+/// nullable closures.
+#[derive(Debug, Default, Clone)]
+pub(crate) struct Reducer(HashMap<(usize, usize), HashSet<usize>>);
+
+/// Iterator over the middle nodes recorded for a `(bottom, top)`
+/// pair; it is `Empty` when nothing was recorded for that pair.
+#[derive(Debug, Default)]
+pub(crate) enum ReducerIter<'a> {
+    #[default]
+    Empty,
+    NonEmpty(Iter<'a, usize>),
+}
+
+impl<'a> Iterator for ReducerIter<'a> {
+    type Item = usize;
+
+    #[inline]
+    fn next(&mut self) -> Option<Self::Item> {
+        match self {
+            ReducerIter::Empty => None,
+            ReducerIter::NonEmpty(iter) => iter.next().copied(),
+        }
+    }
+
+    #[inline]
+    fn size_hint(&self) -> (usize, Option<usize>) {
+        match self {
+            ReducerIter::Empty => (0, Some(0)),
+            ReducerIter::NonEmpty(iter) => iter.size_hint(),
+        }
+    }
+}
+
+impl<'a> ExactSizeIterator for ReducerIter<'a> {
+    #[inline]
+    fn len(&self) -> usize {
+        match self {
+            ReducerIter::Empty => 0,
+            ReducerIter::NonEmpty(iter) => iter.len(),
+        }
+    }
+}
+
+impl<'a> ReducerIter<'a> {
+    #[inline]
+    pub(crate) fn new(iter: Option<Iter<'a, usize>>) -> Self {
+        match iter {
+            Some(iter) => Self::NonEmpty(iter),
+            None => Self::default(),
+        }
+    }
+}
+
+impl Reducer {
+    /// Iterates over the middle nodes recorded for `(bottom, top)`,
+    /// or over nothing if no node was recorded for that pair.
+    #[inline]
+    pub(crate) fn query(&self, bottom: usize, top: usize) -> ReducerIter<'_> {
+        ReducerIter::new(self.0.get(&(bottom, top)).map(|set| set.iter()))
+    }
+
+    /// Records `middle` as a node skipped between `bottom` and `top`.
+    #[inline]
+    pub(crate) fn save(&mut self, bottom: usize, top: usize, middle: usize) {
+        self.0.entry((bottom, top)).or_default().insert(middle);
+    }
+}