summaryrefslogtreecommitdiff
path: root/chain/src/item/reduction.rs
diff options
context:
space:
mode:
Diffstat (limited to 'chain/src/item/reduction.rs')
-rw-r--r--chain/src/item/reduction.rs428
1 files changed, 428 insertions, 0 deletions
diff --git a/chain/src/item/reduction.rs b/chain/src/item/reduction.rs
new file mode 100644
index 0000000..89306e6
--- /dev/null
+++ b/chain/src/item/reduction.rs
@@ -0,0 +1,428 @@
+//! This module implements the operation of reductions on the forest.
+//!
+//! # What are reductions
+//!
+//! To explain this in detail, we first investigate the
+//! nullable-closure process, then discuss what reduction is and what
+//! it means in the chain-rule machine, and then explain the problem.
+//!
+//! ## Nullable-closure process
+//!
+//! In the process of running the chain-rule machine, we will collapse
+//! edges, in the sense that, if there is an edge from node a to node
+//! b and another from node b to node c, and if the edge from node a
+//! to node b is "nullable", that is if the edge corresponds to a
+//! position in the atomic language that is deemed nullable by the
+//! atomic language, then we also make an edge from node a to node c.
+//!
+//! The purpose of this process of forming the nullable closure is to
+//! ensure that the chain-rule machine can save the time to traverse
+//! the entire machine to find the nullable closure later on. But a
+//! side-effect of this process is that reductions are not explicitly
+//! marked.
+//!
+//! Note that we must perform this nullable closure process in order
+//! that the time complexity of the chain-rule machine lies within the
+//! cubic bound.
+//!
+//! ## Three types of actions
+//!
+//! We can imagine a "traditional parser generator" as a stack
+//! machine: there are three types of actions associated with it,
+//! depending on the current state and the current input token. The
+//! first is expansion: it means that we are expanding from a
+//! non-terminal, by some rule. The second is a normal push: we just
+//! continue parsing according to the current rule. The final one is
+//! reduction: it means the current expansion from a non-terminal has
+//! terminated and we go back to the previous level of expansion.
+//!
+//! ## Relation to the chain-rule machine
+//!
+//! For our chain-rule machine, expansion means to create a new node
+//! pointing at the current node, forming a path with one more length.
+//! A normal push means to create a new node that points to the target
+//! of an edge going out from the current node, which was not created
+//! by the process of forming nullable closures. And the reduction
+//! means to create a new node that points to the target of an edge
+//! going out from the current node, which *was* created by the
+//! process of forming nullable closures.
+//!
+//! # Problem
+//!
+//! As can be seen from the previous paragraph, the distinction
+//! between a normal push and a reduction in the chain-rule machine is
+//! simply whether or not the original edge was created by the process
+//! of forming nullable closures. For the chain-rule machine, this
+//! does not matter: it can function well. For the formation of the
+//! derivation forest, however, this is not so well: we cannot
+//! read-off immediately whether or not to perform reductions from the
+//! chain-rule machine.
+//!
+//! # Solutions
+//!
+//! I tried some solutions to solve this problem.
+//!
+//! ## Complete at the end
+//!
+//! The first idea I tried is to find the nodes that were not closed
+//! properly and fill in the needed reductions, at the end of the
+//! parse. This idea did not end up well, as we already lost a lot of
+//! information at the end of the parse, so it becomes quite difficult
+//! to know on which nodes we need to perform reductions.
+//!
+//! ## Record precise information
+//!
+//! The next idea is to record the nodes that were skipped by the
+//! nullable-closure process, and then when we encounter a skipped
+//! segment, we perform the reductions there. This idea sounds
+//! feasible at first, but it turns out that we cannot track the nodes
+//! properly. That is, when running the chain-rule machine, we will
+//! split and clone nodes from time to time. A consequence is that
+//! the old node numbers that were recorded previously will not be
+//! correct afterwards. This means we cannot know exactly which
+//! reductions to perform later.
+//!
+//! ## Guided brute-force
+//!
+//! Now I am trying out the third idea. The motivation is simple: if
+//! we cannot properly track nodes, then we track no nodes. Then, in
+//! order to perform reductions, we do not distinguish between
+//! necessary reductions and unnecessary reductions, and perform all
+//! possible reductions. In other words, when we are about to plant a
+//! new node in the forest, we first check the last child of the
+//! to-be-parent. If it is properly closed, we do nothing.
+//! Otherwise, we recursively descend into its children(s) to find all
+//! last children, and perform all possible valid reductions.
+
+use super::*;
+use crate::{
+ atom::{Atom, DefaultAtom},
+ default::Error,
+ item::default::DefaultForest,
+};
+use grammar::{GrammarLabel, TNT};
+use graph::Graph;
+
+use std::collections::HashMap as Map;
+
+impl DefaultForest<ForestLabel<GrammarLabel>> {
+ /// Perform reduction at last descendents of `node`.
+ ///
+ /// # Parameters
+ ///
+ /// The parameter `pos` is the next starting position. It is used
+ /// to find the descendents that need reductions: only those nodes
+ /// which have descendents with the correct ending positions will
+ /// receive reductions.
+ ///
+ /// The parameter `atom` is used to know which rule numbers are
+ /// deemed as accepting. Only accepting rule numbers can receive
+ /// reductions: this is the definition of being accepting.
+ ///
+ /// The parameter `ter` is used to fill in segments for virtual
+ /// nodes if they are found along the way.
+ ///
+ /// # Errors
+ ///
+ /// 1. Index out of bounds: some node number is corrupted.
+ /// 2. Node has no label: some node label is lost.
+ pub(crate) fn reduction(
+ &mut self,
+ node: usize,
+ pos: usize,
+ ter: usize,
+ atom: &DefaultAtom,
+ ) -> Result<usize, Error> {
+ let mut result = node;
+
+ // step 1: Determine if this needs reductions.
+
+ if self.root() == Some(node) {
+ return Ok(result);
+ }
+
+ // REVIEW: Do we need to check the end matches the position?
+
+ let mut node_label = self
+ .vertex_label(node)?
+ .ok_or(Error::NodeNoLabel(node))?
+ .label();
+
+ if node_label.end().is_some() {
+ return Ok(result);
+ }
+
+ // step 2: Find descendents that need reductions.
+
+ let mut correct_ends: Map<usize, bool> = Default::default();
+
+ let mut order_of_correct_ends: Vec<usize> = Vec::new();
+
+ let mut stack: Vec<usize> = vec![node];
+
+ let mut file = std::fs::OpenOptions::new().append(true).open("debug.log");
+
+ use std::{borrow::BorrowMut, io::Write};
+
+ // Whether or not to write a debug file.
+ let to_write = false;
+
+ if to_write {
+ if let Ok(ref mut file) = file.borrow_mut() {
+ file.write_all(format!("beginning, pos = {pos}, node = {node}\n").as_bytes())
+ .unwrap();
+ }
+ }
+
+ 'stack_loop: while let Some(top) = stack.pop() {
+ if to_write {
+ if let Ok(ref mut file) = file.borrow_mut() {
+ file.write_all(format!("top: {top}\n").as_bytes()).unwrap();
+ }
+ }
+
+ if correct_ends.get(&top).is_some() {
+ continue 'stack_loop;
+ }
+
+ let top_label = self.vertex_label(top)?.ok_or(Error::NodeNoLabel(top))?;
+
+ if let Some(end) = top_label.label().end() {
+ correct_ends.insert(top, end == pos);
+
+ continue 'stack_loop;
+ }
+
+ if let Some(rule) = top_label.label().label().rule() {
+ // A rule node is not considered directly: it should
+ // be affected by its child implicitly.
+ //
+ // We only consider a rule node if it is deemed
+ // accepting by the atom.
+
+ if to_write {
+ if let Ok(ref mut file) = file.borrow_mut() {
+ file.write_all(format!("{}: {rule}, {stack:?}\n", line!()).as_bytes())
+ .unwrap();
+ }
+ }
+
+ if atom
+ .is_accepting(rule * 2 + 1)
+ .map_err(|_| Error::IndexOutOfBounds(2 * rule + 1, atom.accepting_len()))?
+ {
+ let mut has_unexplored_children = false;
+ let mut inserted = false;
+
+ 'child_loop: for child in self.children_of(top)? {
+ match correct_ends.get(&child).copied() {
+ Some(true) => {
+ correct_ends.insert(top, true);
+
+ inserted = true;
+ }
+ None => {
+ has_unexplored_children = true;
+
+ break 'child_loop;
+ }
+ _ => {}
+ }
+ }
+
+ if has_unexplored_children {
+ stack.push(top);
+ stack.extend(self.children_of(top)?);
+ } else if !inserted {
+ correct_ends.insert(top, false);
+ }
+ } else {
+ correct_ends.insert(top, false);
+ }
+
+ continue 'stack_loop;
+ }
+
+ if top_label.is_packed() {
+ let mut has_unexplored_children = false;
+
+ if to_write {
+ if let Ok(ref mut file) = file.borrow_mut() {
+ file.write_all(format!("{}: packed, {stack:?}\n", line!()).as_bytes())
+ .unwrap();
+ }
+ }
+
+ for child in self.children_of(top)? {
+ match correct_ends.get(&child).copied() {
+ Some(true) => {
+ // NOTE: This is not recorded in the
+ // correct orders, as we do not handle
+ // packed nodes directly.
+
+ correct_ends.insert(top, true);
+ }
+ None => {
+ has_unexplored_children = true;
+ }
+ _ => {}
+ }
+ }
+
+ if to_write {
+ if let Ok(ref mut file) = file.borrow_mut() {
+ file.write_all(
+ format!("{}: packed, {has_unexplored_children}\n", line!()).as_bytes(),
+ )
+ .unwrap();
+ }
+ }
+
+ if has_unexplored_children {
+ stack.push(top);
+ stack.extend(self.children_of(top)?);
+ } else if correct_ends.get(&top).is_none() {
+ correct_ends.insert(top, false);
+ }
+
+ continue 'stack_loop;
+ }
+
+ let degree = self.degree(top)?;
+
+ if to_write {
+ if let Ok(ref mut file) = file.borrow_mut() {
+ file.write_all(format!("{}: degree = {degree}\n", line!()).as_bytes())
+ .unwrap();
+ }
+ }
+
+ let last_index = if degree != 0 {
+ degree - 1
+ } else {
+ // a leaf is supposed to be a terminal node and hence
+ // should have an ending position
+
+ let end = match top_label.label().end() {
+ None => match top_label.label().label().tnt() {
+ Some(TNT::Ter(_)) => {
+ panic!("a terminal node {top} has no ending position?");
+ }
+ Some(TNT::Non(nt)) => {
+ correct_ends.insert(top, true);
+
+ self.close_pavi(atom.borrow(), PaVi::Virtual(nt, ter, top), pos)?;
+
+ continue 'stack_loop;
+ }
+ None => {
+ unreachable!("we already handled the rule case above");
+ }
+ },
+ Some(end) => end,
+ };
+
+ correct_ends.insert(top, end == pos);
+
+ if end == pos {
+ order_of_correct_ends.push(top);
+ }
+
+ continue 'stack_loop;
+ };
+
+ if to_write {
+ if let Ok(ref mut file) = file.borrow_mut() {
+ file.write_all(format!("{}: last = {last_index}\n", line!()).as_bytes())
+ .unwrap();
+ }
+ }
+
+ let last_child = self.nth_child(top, last_index)?;
+
+ if let Some(child_correctness_value) = correct_ends.get(&last_child).copied() {
+ correct_ends.insert(top, child_correctness_value);
+
+ if child_correctness_value {
+ order_of_correct_ends.push(top);
+ }
+ } else {
+ stack.extend([top, last_child]);
+ }
+ }
+
+ if to_write {
+ if let Ok(ref mut file) = file.borrow_mut() {
+ file.write_all(
+ format!("{}: orders = {order_of_correct_ends:?}\n", line!()).as_bytes(),
+ )
+ .unwrap();
+ }
+ }
+
+ // step 3: perform reductions by `splone`.
+ //
+ // NOTE: We must fix the order from top to bottom: this is the
+ // reverse order of `order_of_correct_ends` .
+
+ for node in order_of_correct_ends.into_iter().rev() {
+ let label = self.vertex_label(node)?.ok_or(Error::NodeNoLabel(node))?;
+ let degree = self.degree(node)?;
+
+ if !matches!(label.label().label().tnt(), Some(TNT::Non(_))) {
+ continue;
+ }
+
+ #[cfg(debug_assertions)]
+ {
+ let label_label_label = label.label().label();
+
+ assert!(label_label_label.rule().is_none());
+
+ assert!(matches!(label_label_label.tnt(), Some(TNT::Non(_))));
+
+ assert!(label.label().end().is_none());
+
+ assert_ne!(degree, 0);
+ }
+
+ let last_index = degree - 1;
+
+ self.splone(node, Some(pos), last_index, false)?;
+ }
+
+ node_label.set_end(pos);
+
+ if let Some(packed) =
+ self.query_label(ForestLabel::new(node_label, ForestLabelType::Packed))
+ {
+ result = packed;
+
+ if to_write {
+ if let Ok(ref mut file) = file.borrow_mut() {
+ file.write_all(format!("{}: packed = {packed}\n", line!()).as_bytes())
+ .unwrap();
+ }
+ }
+ } else if let Some(plain) =
+ self.query_label(ForestLabel::new(node_label, ForestLabelType::Plain))
+ {
+ result = plain;
+
+ if to_write {
+ if let Ok(ref mut file) = file.borrow_mut() {
+ file.write_all(format!("{}: plain = {plain}\n", line!()).as_bytes())
+ .unwrap();
+ }
+ }
+ }
+
+ if to_write {
+ if let Ok(ref mut file) = file.borrow_mut() {
+ file.write_all(&[101, 110, 100, 10]).unwrap();
+ }
+ }
+
+ Ok(result)
+ }
+}