structural change: separate crates out

I put functionalities that are not strictly core to separate crates, so that the whole package becomes more modular, and makes it easier to try other parsing algorithms in the future. Also I have to figure the forests out before finishing the core chain-rule algorithm, as the part about forests affects the labels of the grammars directly. From my experiences in writing the previous version, it is asking for trouble to change the labels type dramatically at a later point: too many places need to be changed. Thus I decide to figure the rough part of forests out. Actually I only have to figure out how to attach forests fragments to edges of the underlying atomic languages, and the more complex parts of putting forests together can be left to the recorders, which is my vision of assembling semi-ring values during the chain-rule machine. It should be relatively easy to produce forests fragments from grammars since we are just trying to extract some information from the grammar, not to manipulate those information in some complicated way. We have to do some manipulations in the process, though, in order to make sure that the nulling and epsilon-removal processes do not invalidate these fragments.
author: JSDurand <mmemmew@gmail.com> 2023-01-03 23:44:02 +0800
committer: JSDurand <mmemmew@gmail.com> 2023-01-03 23:44:02 +0800
commit: bdbd4b4dc21af09711c97d3f903877443199af06 (patch)
tree: c6a9602f72ee1f6fd7fd3f64b8679a4de50a0159 /nfa/src/lib.rs
parent: 8463dd24f815fe2b8f25fe9763e0a43023bfbb20 (diff)
1 files changed, 72 insertions, 19 deletions
diff --git a/nfa/src/lib.rs b/nfa/src/lib.rs
index b1d62b3..31bd69a 100644
--- a/nfa/src/lib.rs
+++ b/nfa/src/lib.rs
@@ -14,7 +14,7 @@ use core::fmt::Display;
 
 use std::ops::{Deref, DerefMut};
 
-use graph::{Graph, GraphLabel, LabelGraph};
+use graph::{Graph, GraphLabel, LabelExtGraph};
 
 use error::Error;
 
@@ -52,14 +52,14 @@ pub trait Regex<T: GraphLabel>: Graph + Display + Clone {
         }
     }
 
-    // TODO: add functions that determine if certain "positions" in a
+    // TODO: Add functions that determine if certain "positions" in a
     // regular language satisfy some special properties, like at the
     // end of a Kleene star, or at the end of a regular language, et
     // cetera.  These might be needed later.
 }
 
-/// Since Option<T> does not inherit the Display from T, we wrap it to
-/// provide an automatic implementation of Display.
+/// Since `Option<T>` does not inherit the `Display` from `T`, we wrap
+/// it to provide an automatic implementation of `Display`.
 #[derive(Debug, Clone, Copy, Ord, PartialOrd, Eq, PartialEq, Hash)]
 pub struct DOption<T>(Option<T>);
 
@@ -120,42 +120,98 @@ pub enum SoC<T> {
 /// The expected behvaiour of a nondeterministic finite automaton.
 ///
 /// Every NFA is a special labelled graph.
-pub trait Nfa<T: GraphLabel>: LabelGraph<T> {
+pub trait Nfa<T: GraphLabel>: LabelExtGraph<T> {
+    // TODO: This trait should have a type for the labels.
+
     /// Remove all empty transitions from the nondeterministic finite
     /// automaton.
-    fn remove_epsilon(&mut self) -> Result<(), Error>;
+    fn remove_epsilon<F>(&mut self, f: F) -> Result<(), Error>
+    where
+        F: Fn(T) -> bool;
 
     /// Return a state-minimal NFA equivalent with the original one.
     ///
     /// This is not required.  It is just to allow me to experiment
     /// with NFA optimization algorithms.
-    fn minimize(&self) -> Result<Self, Error> {
+    fn minimize(&self) -> Result<Box<Self>, Error> {
         Err(Error::UnsupportedOperation)
     }
 
+    /// Check every node or edge by a given predicate.
+    ///
+    /// This should also verify that every node and edge has correct
+    /// indices, so that we can safely use `unwrap` later.  A
+    /// consequence is that, if one only wants to check the validity
+    /// of nodes and edges, one can pass a function that always
+    /// returns `true`.
+    #[inline]
+    fn labels_satisfy(&self, f: impl Fn(T) -> bool) -> Result<bool, Error> {
+        let nodes_len = self.nodes_len();
+        let mut result = true;
+
+        for node in self.nodes() {
+            for (label, children_iter) in self.labels_of(node)? {
+                for child in children_iter {
+                    if child >= nodes_len {
+                        dbg!(node, label);
+                        return Err(graph::error::Error::IndexOutOfBounds(child, nodes_len).into());
+                    }
+                }
+
+                // NOTE: Avoid executing `f` if `result` is already
+                // false.  But still proceed in verifying that nodes
+                // and edges are correct: the correctness of nodes and
+                // edges is more important than the function execution
+                // results, as the former directly affects the safety
+                // of many algorithms.
+                if result && !f(*label) {
+                    dbg!(node, label);
+                    result = false;
+                }
+            }
+        }
+
+        Ok(result)
+    }
+
     /// When we convert a regular expression to a nondeterministic
     /// finite automaton, the label should be optional, so we use a
     /// different type for the result type.
     type FromRegex<S: GraphLabel + Display + Default>;
 
-    /// Build a nondeterministic finite automaton out of a set of
-    /// regular expressions.
+    /// Build a nondeterministic finite automaton out of a set
+    /// `regexps` of regular expressions.
+    ///
+    /// The `sub_pred` is a predicate function that returns an
+    /// indication whether to carry the value around or to substitute
+    /// by another regular language in the given set.
+    ///
+    /// The `default` parameter specifies the label of a default edge,
+    /// that goes from a node to another, both of which are not
+    /// associated with the given regular languages.
     ///
-    /// The SUB_PRED is a predicate function that returns an optional
-    /// index for a label.  If it returns some index, then this means
-    /// we should substitute the index-th regular expression in the
-    /// set, whereever that label occurs in the set of regular
-    /// expressions.
+    /// This function should perform Thompson's construction,
+    /// basically.
     fn to_nfa(
         regexps: &[impl Regex<RegexType<T>>],
+        // TODO: The transformation should produce more information.
         sub_pred: impl Fn(T) -> Result<SoC<T>, Error>,
+        default: Option<T>,
     ) -> Result<Self::FromRegex<DOption<T>>, Error>;
 
     /// Remove all dead states from the nondeterministic finite
     /// automaton.
     ///
-    /// A state is dead if there are no edges going to the state.
-    fn remove_dead(&mut self) -> Result<(), Error>;
+    /// A state is dead if there are no edges going to the state, and
+    /// if it is not reserved.
+    ///
+    /// # Note
+    ///
+    /// Actually an implementation is allowed to just remove all edges
+    /// out of the dead nodes.  Then these nodes are considered gone
+    /// from the graph, and we don't need to re-index the existing
+    /// edges.  We can call this "a poor man's removal of nodes".
+    fn remove_dead(&mut self, reserve: impl Fn(usize) -> bool) -> Result<(), Error>;
 
     /// For each edge from A to B whose edge is considered nullable by
     /// a function `f`, and for every edge from B to C, add an edge
@@ -169,6 +225,3 @@ pub trait Nfa<T: GraphLabel>: LabelGraph<T> {
 
 pub mod default;
 pub mod desrec;
-
-#[cfg(test)]
-mod nfa_tests {}
author	JSDurand <mmemmew@gmail.com>	2023-01-03 23:44:02 +0800
committer	JSDurand <mmemmew@gmail.com>	2023-01-03 23:44:02 +0800
commit	bdbd4b4dc21af09711c97d3f903877443199af06 (patch)
tree	c6a9602f72ee1f6fd7fd3f64b8679a4de50a0159 /nfa/src/lib.rs
parent	8463dd24f815fe2b8f25fe9763e0a43023bfbb20 (diff)