//! This module implements the operation of reductions on the forest.
//!
//! # What are reductions
//!
//! To explain this in detail, we first investigate the
//! nullable-closure process, then discuss what reduction is and what
//! it means in the chain-rule machine, and then explain the problem.
//!
//! ## Nullable-closure process
//!
//! In the process of running the chain-rule machine, we collapse
//! edges, in the following sense: if there is an edge from node a to
//! node b and another from node b to node c, and if the edge from
//! node a to node b is "nullable", that is, if the edge corresponds
//! to a position in the atomic language that is deemed nullable by
//! the atomic language, then we also make an edge from node a to
//! node c.
//!
//! The purpose of forming this nullable closure is to spare the
//! chain-rule machine the time of traversing the entire machine
//! later on in order to find the nullable closure.  A side-effect of
//! this process, however, is that reductions are not explicitly
//! marked.
//!
//! Note that we must perform this nullable-closure process in order
//! that the time complexity of the chain-rule machine stays within
//! the cubic bound.
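//!
//! Below is a minimal sketch of this closure rule, assuming a graph
//! stored as adjacency sets and a hypothetical predicate `nullable`
//! on edges; it only illustrates the idea and is not the code of the
//! actual machine:
//!
//! ```ignore
//! use std::collections::HashSet;
//!
//! /// For every nullable edge (a, b) and every edge (b, c), add an
//! /// edge (a, c); repeat until no more edges can be added.
//! fn nullable_closure(
//!     edges: &mut Vec<HashSet<usize>>,
//!     nullable: &dyn Fn(usize, usize) -> bool,
//! ) {
//!     let mut changed = true;
//!
//!     while changed {
//!         changed = false;
//!
//!         for a in 0..edges.len() {
//!             // Collect first, so that we do not mutate `edges[a]`
//!             // while iterating over it.
//!             let nullable_targets: Vec<usize> = edges[a]
//!                 .iter()
//!                 .copied()
//!                 .filter(|&b| nullable(a, b))
//!                 .collect();
//!
//!             for b in nullable_targets {
//!                 let targets_of_b: Vec<usize> =
//!                     edges[b].iter().copied().collect();
//!
//!                 for c in targets_of_b {
//!                     changed |= edges[a].insert(c);
//!                 }
//!             }
//!         }
//!     }
//! }
//! ```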
//!
//! ## Three types of actions
//!
//! We can imagine a "traditional parser generator" as a stack
//! machine: there are three types of actions associated with it,
//! depending on the current state and the current input token.  The
//! first is expansion: we expand from a non-terminal by some rule.
//! The second is a normal push: we simply continue parsing according
//! to the current rule.  The final one is reduction: the current
//! expansion from a non-terminal has terminated, and we go back to
//! the previous level of expansion.
//!
//! ## Relation to the chain-rule machine
//!
//! For our chain-rule machine, expansion means to create a new node
//! pointing at the current node, forming a path that is one edge
//! longer.  A normal push means to create a new node that points to
//! the target of an edge going out from the current node, where that
//! edge was not created by the process of forming nullable closures.
//! And a reduction means to create a new node that points to the
//! target of an edge going out from the current node, where that
//! edge *was* created by the process of forming nullable closures.
//!
//! # Problem
//!
//! As can be seen from the previous paragraph, the distinction
//! between a normal push and a reduction in the chain-rule machine
//! is simply whether or not the original edge was created by the
//! process of forming nullable closures.  For the chain-rule machine
//! itself this does not matter: it functions well either way.  For
//! the formation of the derivation forest, however, it matters a
//! great deal: we cannot immediately read off from the chain-rule
//! machine whether or not to perform reductions.
//!
//! # Solutions
//!
//! I tried a few solutions to this problem.
//!
//! ## Complete at the end
//!
//! The first idea I tried was to find, at the end of the parse, the
//! nodes that were not closed properly and to fill in the needed
//! reductions.  This did not end well: by the end of the parse we
//! have already lost a lot of information, so it becomes quite
//! difficult to know on which nodes we need to perform reductions.
//!
//! ## Record precise information
//!
//! The next idea was to record the nodes that were skipped by the
//! nullable-closure process, and then, upon encountering a skipped
//! segment, to perform the reductions there.  This idea sounds
//! feasible at first, but it turns out that we cannot track the
//! nodes properly: while running the chain-rule machine we split and
//! clone nodes from time to time, so node numbers recorded earlier
//! are no longer correct afterwards.  This means we cannot know
//! exactly which reductions to perform later.
//!
//! ## Guided brute-force
//!
//! Now I am trying out the third idea.  The motivation is simple: if
//! we cannot properly track nodes, then we track no nodes.  In order
//! to perform reductions, we then do not distinguish between
//! necessary and unnecessary reductions, and simply perform all
//! possible reductions.  In other words, when we are about to plant
//! a new node in the forest, we first check the last child of the
//! to-be-parent.  If it is properly closed, we do nothing.
//! Otherwise, we recursively descend into its children to find all
//! last children, and perform every possible valid reduction.

use super::*;
use crate::{
    atom::{Atom, DefaultAtom},
    default::Error,
    item::default::DefaultForest,
};
use grammar::{GrammarLabel, TNT};
use graph::Graph;

use std::collections::HashMap as Map;

impl DefaultForest<ForestLabel<GrammarLabel>> {
    /// Perform reduction at the last descendants of `node`.
    ///
    /// # Parameters
    ///
    /// The parameter `pos` is the next starting position.  It is
    /// used to find the descendants that need reductions: only those
    /// nodes which have descendants with the correct ending
    /// positions will receive reductions.
    ///
    /// The parameter `atom` is used to know which rule numbers are
    /// deemed accepting.  Only accepting rule numbers can receive
    /// reductions: this is the definition of being accepting.
    ///
    /// The parameter `ter` is used to fill in segments for virtual
    /// nodes if they are found along the way.
    ///
    /// The parameter `accept_root` controls whether we want to
    /// perform reduction on the root.
    ///
    /// # Errors
    ///
    /// 1. Index out of bounds: some node number is corrupted.
    /// 2. Node has no label: some node label is lost.
    pub(crate) fn reduction(
        &mut self,
        node: usize,
        pos: usize,
        ter: usize,
        atom: &DefaultAtom,
        accept_root: bool,
    ) -> Result<usize, Error> {
        let mut result = node;

        // step 1: Determine if this node needs reductions.

        if !accept_root && self.root() == Some(node) {
            return Ok(result);
        }

        // REVIEW: Do we need to check that the end matches the
        // position?

        let mut node_label = self
            .vertex_label(node)?
            .ok_or(Error::NodeNoLabel(node))?
            .label();

        if node_label.end().is_some() {
            return Ok(result);
        }

        // Data type representing the status of a node during the
        // search for descendants in need of reductions.
        #[derive(PartialEq, Eq, Copy, Clone, Debug)]
        enum Status {
            Correct,
            Incorrect,
            Visited,
        }

        impl From<bool> for Status {
            fn from(value: bool) -> Self {
                match value {
                    true => Self::Correct,
                    false => Self::Incorrect,
                }
            }
        }

        use Status::*;
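
        // The search in step 2 below is an iterative depth-first
        // traversal.  `Visited` marks a node whose resolution is
        // pending: a node that still has unresolved children is
        // pushed back onto the stack underneath those children, so
        // that when it is popped a second time their statuses are
        // available in `correct_ends`.  Children that already have
        // an entry in `correct_ends` are never pushed again, so the
        // loop terminates even when it revisits shared descendants.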

        // step 2: Find descendants that need reductions.

        let mut correct_ends: Map<usize, Status> = Default::default();

        let mut order_of_correct_ends: Vec<usize> = Vec::new();

        let mut stack: Vec<usize> = vec![node];

        let mut file = std::fs::OpenOptions::new().append(true).open("debug.log");

        use std::{borrow::BorrowMut, io::Write};

        // Whether or not to write a debug file.
        let to_write = true;

        if to_write {
            if let Ok(ref mut file) = file.borrow_mut() {
                file.write_all(format!("beginning, pos = {pos}, node = {node}\n").as_bytes())
                    .unwrap();
            }
        }

        'stack_loop: while let Some(top) = stack.pop() {
            if to_write {
                if let Ok(ref mut file) = file.borrow_mut() {
                    file.write_all(format!("top: {top}\n").as_bytes()).unwrap();
                }
            }

            let old_value = correct_ends.get(&top).copied();

            if matches!(old_value, Some(Correct) | Some(Incorrect)) {
                continue 'stack_loop;
            }

            correct_ends.insert(top, Visited);

            let top_label = self.vertex_label(top)?.ok_or(Error::NodeNoLabel(top))?;

            if let Some(end) = top_label.label().end() {
                correct_ends.insert(top, (end == pos).into());

                continue 'stack_loop;
            }

            if let Some(rule) = top_label.label().label().rule() {
                // A rule node is not considered directly: it should
                // be affected by its children implicitly.
                //
                // We only consider a rule node if it is deemed
                // accepting by the atom.

                if to_write {
                    if let Ok(ref mut file) = file.borrow_mut() {
                        file.write_all(format!("{}: {rule}, {stack:?}\n", line!()).as_bytes())
                            .unwrap();
                    }
                }

                if atom
                    .is_accepting(rule * 2 + 1)
                    .map_err(|_| Error::IndexOutOfBounds(2 * rule + 1, atom.accepting_len()))?
                {
                    let mut has_unexplored_children = false;
                    let mut inserted = false;

                    'child_loop: for child in self.children_of(top)? {
                        match correct_ends.get(&child).copied() {
                            Some(Correct) => {
                                correct_ends.insert(top, Correct);
                                inserted = true;
                            }
                            None => {
                                has_unexplored_children = true;

                                break 'child_loop;
                            }
                            _ => {}
                        }
                    }

                    if has_unexplored_children {
                        stack.push(top);
                        stack.extend(
                            self.children_of(top)?
                                .filter(|child| correct_ends.get(child).is_none()),
                        );
                    } else if !inserted {
                        correct_ends.insert(top, Incorrect);
                    }
                } else {
                    correct_ends.insert(top, Incorrect);
                }

                continue 'stack_loop;
            }

            if top_label.is_packed() {
                let mut has_unexplored_children = false;
                let mut inserted = false;

                if to_write {
                    if let Ok(ref mut file) = file.borrow_mut() {
                        file.write_all(format!("{}: packed, {stack:?}\n", line!()).as_bytes())
                            .unwrap();
                    }
                }

                for child in self.children_of(top)? {
                    match correct_ends.get(&child).copied() {
                        Some(Correct) => {
                            // NOTE: This is not recorded in the
                            // correct orders, as we do not handle
                            // packed nodes directly.
                            correct_ends.insert(top, Correct);
                            inserted = true;
                        }
                        None => {
                            has_unexplored_children = true;
                        }
                        _ => {}
                    }
                }

                if to_write {
                    if let Ok(ref mut file) = file.borrow_mut() {
                        file.write_all(
                            format!("{}: packed, {has_unexplored_children}\n", line!()).as_bytes(),
                        )
                        .unwrap();
                    }
                }

                if has_unexplored_children {
                    stack.push(top);
                    stack.extend(
                        self.children_of(top)?
                            .filter(|child| correct_ends.get(child).is_none()),
                    );
                } else if !inserted {
                    correct_ends.insert(top, Incorrect);
                }

                continue 'stack_loop;
            }

            let degree = self.degree(top)?;

            if to_write {
                if let Ok(ref mut file) = file.borrow_mut() {
                    file.write_all(format!("{}: degree = {degree}\n", line!()).as_bytes())
                        .unwrap();
                }
            }

            let last_index = if degree != 0 {
                degree - 1
            } else {
                // A leaf is supposed to be a terminal node and hence
                // should have an ending position.
                let end = match top_label.label().end() {
                    None => match top_label.label().label().tnt() {
                        Some(TNT::Ter(_)) => {
                            panic!("a terminal node {top} has no ending position?");
                        }
                        Some(TNT::Non(nt)) => {
                            correct_ends.insert(top, Correct);

                            self.close_pavi(atom.borrow(), PaVi::Virtual(nt, ter, top), pos)?;

                            continue 'stack_loop;
                        }
                        None => {
                            unreachable!("we already handled the rule case above");
                        }
                    },
                    Some(end) => end,
                };

                correct_ends.insert(top, (end == pos).into());

                if end == pos {
                    order_of_correct_ends.push(top);
                }

                continue 'stack_loop;
            };

            if to_write {
                if let Ok(ref mut file) = file.borrow_mut() {
                    file.write_all(format!("{}: last = {last_index}\n", line!()).as_bytes())
                        .unwrap();
                }
            }

            let last_child = self.nth_child(top, last_index)?;

            if let Some(child_correctness_value) = correct_ends.get(&last_child).copied() {
                if child_correctness_value != Visited {
                    correct_ends.insert(top, child_correctness_value);

                    if child_correctness_value == Correct {
                        order_of_correct_ends.push(top);
                    }
                }
            } else {
                stack.extend([top, last_child]);
            }
        }

        if to_write {
            if let Ok(ref mut file) = file.borrow_mut() {
                file.write_all(
                    format!("{}: orders = {order_of_correct_ends:?}\n", line!()).as_bytes(),
                )
                .unwrap();
            }
        }

        // step 3: perform reductions by `splone`.
        //
        // NOTE: We must fix the order from top to bottom: this is
        // the reverse order of `order_of_correct_ends`.
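        //
        // The vector was filled in the order in which nodes were
        // resolved, and a node is resolved only after its last
        // child, so children precede their parents in it; reversing
        // it therefore yields the required top-to-bottom order.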

        for node in order_of_correct_ends.into_iter().rev() {
            let label = self.vertex_label(node)?.ok_or(Error::NodeNoLabel(node))?;
            let degree = self.degree(node)?;

            if !matches!(label.label().label().tnt(), Some(TNT::Non(_))) {
                continue;
            }

            #[cfg(debug_assertions)]
            {
                assert!(label.label().end().is_none());
                assert_ne!(degree, 0);
            }

            let last_index = degree - 1;

            self.splone(node, Some(pos), last_index, false)?;
        }

        node_label.set_end(pos);

        if let Some(packed) =
            self.query_label(ForestLabel::new(node_label, ForestLabelType::Packed))
        {
            result = packed;

            if to_write {
                if let Ok(ref mut file) = file.borrow_mut() {
                    file.write_all(format!("{}: packed = {packed}\n", line!()).as_bytes())
                        .unwrap();
                }
            }
        } else if let Some(plain) =
            self.query_label(ForestLabel::new(node_label, ForestLabelType::Plain))
        {
            result = plain;

            if to_write {
                if let Ok(ref mut file) = file.borrow_mut() {
                    file.write_all(format!("{}: plain = {plain}\n", line!()).as_bytes())
                        .unwrap();
                }
            }
        }

        if to_write {
            if let Ok(ref mut file) = file.borrow_mut() {
                // Write "end\n" as raw bytes.
                file.write_all(&[101, 110, 100, 10]).unwrap();
            }
        }

        Ok(result)
    }
}
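
// The following is a hypothetical call site, as a sketch only: the
// names `forest`, `n`, `pos`, `ter`, and `atom` are assumed here,
// not taken from the actual machine.  After a subtree at node `n`
// is finished with next starting position `pos`, pending reductions
// under `n` would be closed by something like
//
//     let n = forest.reduction(n, pos, ter, &atom, false)?;
//
// after which the returned node is used in place of `n`, since the
// reduction may have merged `n` into a packed node.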