diff options
Diffstat (limited to 'chain/src/item/reduction.rs')
-rw-r--r-- | chain/src/item/reduction.rs | 428 |
1 files changed, 428 insertions, 0 deletions
diff --git a/chain/src/item/reduction.rs b/chain/src/item/reduction.rs new file mode 100644 index 0000000..89306e6 --- /dev/null +++ b/chain/src/item/reduction.rs @@ -0,0 +1,428 @@ +//! This module implements the operation of reductions on the forest. +//! +//! # What are reductions +//! +//! To explain this in detail, we first investigate the +//! nullable-closure process, then discuss what reduction is and what +//! it means in the chain-rule machine, and then explain the problem. +//! +//! ## Nullable-closure process +//! +//! In the process of running the chain-rule machine, we will collapse +//! edges, in the sense that, if there is an edge from node a to node +//! b and another from node b to node c, and if the edge from node a +//! to node b is "nullable", that is if the edge corresponds to a +//! position in the atomic language that is deemed nullable by the +//! atomic language, then we also make an edge from node a to node c. +//! +//! The purpose of this process of forming the nullable closure is to +//! ensure that the chain-rule machine can save the time to traverse +//! the entire machine to find the nullable closure later on. But a +//! side-effect of this process is that reductions are not explicitly +//! marked. +//! +//! Note that we must perform this nullable closure process in order +//! that the time complexity of the chain-rule machine lies within the +//! cubic bound. +//! +//! ## Three types of actions +//! +//! We can imagine a "traditional parser generator" as a stack +//! machine: there are three types of actions associated with it, +//! depending on the current state and the current input token. The +//! first is expansion: it means that we are expanding from a +//! non-terminal, by some rule. The second is a normal push: we just +//! continue parsing according to the current rule. The final one is +//! reduction: it means the current expansion from a non-terminal has +//! terminated and we go back to the previous level of expansion. +//! +//! ## Relation to the chain-rule machine +//! +//! For our chain-rule machine, expansion means to create a new node +//! pointing at the current node, forming a path with one more length. +//! A normal push means to create a new node that points to the target +//! of an edge going out from the current node, which was not created +//! by the process of forming nullable closures. And the reduction +//! means to create a new node that points to the target of an edge +//! going out from the current node, which *was* created by the +//! process of forming nullable closures. +//! +//! # Problem +//! +//! As can be seen from the previous paragraph, the distinction +//! between a normal push and a reduction in the chain-rule machine is +//! simply whether or not the original edge was created by the process +//! of forming nullable closures. For the chain-rule machine, this +//! does not matter: it can function well. For the formation of the +//! derivation forest, however, this is not so well: we cannot +//! read-off immediately whether or not to perform reductions from the +//! chain-rule machine. +//! +//! # Solutions +//! +//! I tried some solutions to solve this problem. +//! +//! ## Complete at the end +//! +//! The first idea I tried is to find the nodes that were not closed +//! properly and fill in the needed reductions, at the end of the +//! parse. This idea did not end up well, as we already lost a lot of +//! information at the end of the parse, so it becomes quite difficult +//! to know on which nodes we need to perform reductions. +//! +//! ## Record precise information +//! +//! The next idea is to record the nodes that were skipped by the +//! nullable-closure process, and then when we encounter a skipped +//! segment, we perform the reductions there. This idea sounds +//! feasible at first, but it turns out that we cannot track the nodes +//! properly. That is, when running the chain-rule machine, we will +//! split and clone nodes from time to time. A consequence is that +//! the old node numbers that were recorded previously will not be +//! correct afterwards. This means we cannot know exactly which +//! reductions to perform later. +//! +//! ## Guided brute-force +//! +//! Now I am trying out the third idea. The motivation is simple: if +//! we cannot properly track nodes, then we track no nodes. Then, in +//! order to perform reductions, we do not distinguish between +//! necessary reductions and unnecessary reductions, and perform all +//! possible reductions. In other words, when we are about to plant a +//! new node in the forest, we first check the last child of the +//! to-be-parent. If it is properly closed, we do nothing. +//! Otherwise, we recursively descend into its children(s) to find all +//! last children, and perform all possible valid reductions. + +use super::*; +use crate::{ + atom::{Atom, DefaultAtom}, + default::Error, + item::default::DefaultForest, +}; +use grammar::{GrammarLabel, TNT}; +use graph::Graph; + +use std::collections::HashMap as Map; + +impl DefaultForest<ForestLabel<GrammarLabel>> { + /// Perform reduction at last descendents of `node`. + /// + /// # Parameters + /// + /// The parameter `pos` is the next starting position. It is used + /// to find the descendents that need reductions: only those nodes + /// which have descendents with the correct ending positions will + /// receive reductions. + /// + /// The parameter `atom` is used to know which rule numbers are + /// deemed as accepting. Only accepting rule numbers can receive + /// reductions: this is the definition of being accepting. + /// + /// The parameter `ter` is used to fill in segments for virtual + /// nodes if they are found along the way. + /// + /// # Errors + /// + /// 1. Index out of bounds: some node number is corrupted. + /// 2. Node has no label: some node label is lost. + pub(crate) fn reduction( + &mut self, + node: usize, + pos: usize, + ter: usize, + atom: &DefaultAtom, + ) -> Result<usize, Error> { + let mut result = node; + + // step 1: Determine if this needs reductions. + + if self.root() == Some(node) { + return Ok(result); + } + + // REVIEW: Do we need to check the end matches the position? + + let mut node_label = self + .vertex_label(node)? + .ok_or(Error::NodeNoLabel(node))? + .label(); + + if node_label.end().is_some() { + return Ok(result); + } + + // step 2: Find descendents that need reductions. + + let mut correct_ends: Map<usize, bool> = Default::default(); + + let mut order_of_correct_ends: Vec<usize> = Vec::new(); + + let mut stack: Vec<usize> = vec![node]; + + let mut file = std::fs::OpenOptions::new().append(true).open("debug.log"); + + use std::{borrow::BorrowMut, io::Write}; + + // Whether or not to write a debug file. + let to_write = false; + + if to_write { + if let Ok(ref mut file) = file.borrow_mut() { + file.write_all(format!("beginning, pos = {pos}, node = {node}\n").as_bytes()) + .unwrap(); + } + } + + 'stack_loop: while let Some(top) = stack.pop() { + if to_write { + if let Ok(ref mut file) = file.borrow_mut() { + file.write_all(format!("top: {top}\n").as_bytes()).unwrap(); + } + } + + if correct_ends.get(&top).is_some() { + continue 'stack_loop; + } + + let top_label = self.vertex_label(top)?.ok_or(Error::NodeNoLabel(top))?; + + if let Some(end) = top_label.label().end() { + correct_ends.insert(top, end == pos); + + continue 'stack_loop; + } + + if let Some(rule) = top_label.label().label().rule() { + // A rule node is not considered directly: it should + // be affected by its child implicitly. + // + // We only consider a rule node if it is deemed + // accepting by the atom. + + if to_write { + if let Ok(ref mut file) = file.borrow_mut() { + file.write_all(format!("{}: {rule}, {stack:?}\n", line!()).as_bytes()) + .unwrap(); + } + } + + if atom + .is_accepting(rule * 2 + 1) + .map_err(|_| Error::IndexOutOfBounds(2 * rule + 1, atom.accepting_len()))? + { + let mut has_unexplored_children = false; + let mut inserted = false; + + 'child_loop: for child in self.children_of(top)? { + match correct_ends.get(&child).copied() { + Some(true) => { + correct_ends.insert(top, true); + + inserted = true; + } + None => { + has_unexplored_children = true; + + break 'child_loop; + } + _ => {} + } + } + + if has_unexplored_children { + stack.push(top); + stack.extend(self.children_of(top)?); + } else if !inserted { + correct_ends.insert(top, false); + } + } else { + correct_ends.insert(top, false); + } + + continue 'stack_loop; + } + + if top_label.is_packed() { + let mut has_unexplored_children = false; + + if to_write { + if let Ok(ref mut file) = file.borrow_mut() { + file.write_all(format!("{}: packed, {stack:?}\n", line!()).as_bytes()) + .unwrap(); + } + } + + for child in self.children_of(top)? { + match correct_ends.get(&child).copied() { + Some(true) => { + // NOTE: This is not recorded in the + // correct orders, as we do not handle + // packed nodes directly. + + correct_ends.insert(top, true); + } + None => { + has_unexplored_children = true; + } + _ => {} + } + } + + if to_write { + if let Ok(ref mut file) = file.borrow_mut() { + file.write_all( + format!("{}: packed, {has_unexplored_children}\n", line!()).as_bytes(), + ) + .unwrap(); + } + } + + if has_unexplored_children { + stack.push(top); + stack.extend(self.children_of(top)?); + } else if correct_ends.get(&top).is_none() { + correct_ends.insert(top, false); + } + + continue 'stack_loop; + } + + let degree = self.degree(top)?; + + if to_write { + if let Ok(ref mut file) = file.borrow_mut() { + file.write_all(format!("{}: degree = {degree}\n", line!()).as_bytes()) + .unwrap(); + } + } + + let last_index = if degree != 0 { + degree - 1 + } else { + // a leaf is supposed to be a terminal node and hence + // should have an ending position + + let end = match top_label.label().end() { + None => match top_label.label().label().tnt() { + Some(TNT::Ter(_)) => { + panic!("a terminal node {top} has no ending position?"); + } + Some(TNT::Non(nt)) => { + correct_ends.insert(top, true); + + self.close_pavi(atom.borrow(), PaVi::Virtual(nt, ter, top), pos)?; + + continue 'stack_loop; + } + None => { + unreachable!("we already handled the rule case above"); + } + }, + Some(end) => end, + }; + + correct_ends.insert(top, end == pos); + + if end == pos { + order_of_correct_ends.push(top); + } + + continue 'stack_loop; + }; + + if to_write { + if let Ok(ref mut file) = file.borrow_mut() { + file.write_all(format!("{}: last = {last_index}\n", line!()).as_bytes()) + .unwrap(); + } + } + + let last_child = self.nth_child(top, last_index)?; + + if let Some(child_correctness_value) = correct_ends.get(&last_child).copied() { + correct_ends.insert(top, child_correctness_value); + + if child_correctness_value { + order_of_correct_ends.push(top); + } + } else { + stack.extend([top, last_child]); + } + } + + if to_write { + if let Ok(ref mut file) = file.borrow_mut() { + file.write_all( + format!("{}: orders = {order_of_correct_ends:?}\n", line!()).as_bytes(), + ) + .unwrap(); + } + } + + // step 3: perform reductions by `splone`. + // + // NOTE: We must fix the order from top to bottom: this is the + // reverse order of `order_of_correct_ends` . + + for node in order_of_correct_ends.into_iter().rev() { + let label = self.vertex_label(node)?.ok_or(Error::NodeNoLabel(node))?; + let degree = self.degree(node)?; + + if !matches!(label.label().label().tnt(), Some(TNT::Non(_))) { + continue; + } + + #[cfg(debug_assertions)] + { + let label_label_label = label.label().label(); + + assert!(label_label_label.rule().is_none()); + + assert!(matches!(label_label_label.tnt(), Some(TNT::Non(_)))); + + assert!(label.label().end().is_none()); + + assert_ne!(degree, 0); + } + + let last_index = degree - 1; + + self.splone(node, Some(pos), last_index, false)?; + } + + node_label.set_end(pos); + + if let Some(packed) = + self.query_label(ForestLabel::new(node_label, ForestLabelType::Packed)) + { + result = packed; + + if to_write { + if let Ok(ref mut file) = file.borrow_mut() { + file.write_all(format!("{}: packed = {packed}\n", line!()).as_bytes()) + .unwrap(); + } + } + } else if let Some(plain) = + self.query_label(ForestLabel::new(node_label, ForestLabelType::Plain)) + { + result = plain; + + if to_write { + if let Ok(ref mut file) = file.borrow_mut() { + file.write_all(format!("{}: plain = {plain}\n", line!()).as_bytes()) + .unwrap(); + } + } + } + + if to_write { + if let Ok(ref mut file) = file.borrow_mut() { + file.write_all(&[101, 110, 100, 10]).unwrap(); + } + } + + Ok(result) + } +} |