//! This module implements the operation of reductions on the forest.
//!
//! # What are reductions
//!
//! To explain this in detail, we first investigate the
//! nullable-closure process, then discuss what reduction is and what
//! it means in the chain-rule machine, and then explain the problem.
//!
//! ## Nullable-closure process
//!
//! In the process of running the chain-rule machine, we collapse
//! edges: if there is an edge from node a to node b and another from
//! node b to node c, and if the edge from node a to node b is
//! "nullable", that is, if it corresponds to a position in the atomic
//! language that the atomic language deems nullable, then we also add
//! an edge from node a to node c.
//!
//! The purpose of forming the nullable closure is to spare the
//! chain-rule machine the cost of traversing the entire machine later
//! on in order to find the nullable closure.  A side-effect of this
//! process, however, is that reductions are no longer explicitly
//! marked.
//!
//! Note that we must perform this nullable-closure process so that
//! the time complexity of the chain-rule machine stays within the
//! cubic bound.
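//!
//! Below is a minimal, self-contained sketch of this closure step.  It
//! uses a plain adjacency list and a fixed-point loop in place of the
//! real data structures; the helper `nullable_closure` and its edge
//! representation are purely illustrative and not part of this crate.
//!
//! ```
//! use std::collections::BTreeSet;
//!
//! // `edges[a]` holds pairs `(b, nullable)` for every edge from `a` to `b`.
//! fn nullable_closure(edges: &mut Vec<BTreeSet<(usize, bool)>>) {
//!     // Whenever `a -> b` is nullable and `b -> c` exists, also add
//!     // `a -> c`; repeat until nothing changes.
//!     loop {
//!         let mut new_edges = Vec::new();
//!
//!         for (a, outgoing) in edges.iter().enumerate() {
//!             for &(b, nullable) in outgoing {
//!                 if !nullable {
//!                     continue;
//!                 }
//!
//!                 for &(c, _) in &edges[b] {
//!                     // The added edge is treated as non-nullable here,
//!                     // a simplifying assumption of this sketch.
//!                     if !edges[a].contains(&(c, false)) {
//!                         new_edges.push((a, (c, false)));
//!                     }
//!                 }
//!             }
//!         }
//!
//!         if new_edges.is_empty() {
//!             break;
//!         }
//!
//!         for (a, edge) in new_edges {
//!             edges[a].insert(edge);
//!         }
//!     }
//! }
//!
//! // a --nullable--> b --> c  gains the collapsed edge a --> c.
//! let mut edges = vec![
//!     BTreeSet::from([(1, true)]),
//!     BTreeSet::from([(2, false)]),
//!     BTreeSet::new(),
//! ];
//! nullable_closure(&mut edges);
//! assert!(edges[0].contains(&(2, false)));
//! ```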
//!
//! ## Three types of actions
//!
//! We can imagine a "traditional parser generator" as a stack
//! machine: there are three types of actions associated with it,
//! depending on the current state and the current input token.  The
//! first is expansion: we expand from a non-terminal according to
//! some rule.  The second is a normal push: we simply continue
//! parsing according to the current rule.  The final one is
//! reduction: the current expansion from a non-terminal has
//! terminated and we go back to the previous level of expansion.
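//!
//! The tiny sketch below merely fixes this classification as a type;
//! the `Action` enum and the `stack_effect` helper are hypothetical
//! names used only for illustration.
//!
//! ```
//! // The three kinds of steps of a traditional stack machine.
//! enum Action {
//!     // Expand from a non-terminal by the rule with this number.
//!     Expand(usize),
//!     // Continue parsing according to the current rule.
//!     Push,
//!     // The current expansion has terminated; return to the
//!     // previous level of expansion.
//!     Reduce,
//! }
//!
//! // How each action changes the depth of the expansion stack.
//! fn stack_effect(action: &Action) -> isize {
//!     match action {
//!         Action::Expand(_) => 1,
//!         Action::Push => 0,
//!         Action::Reduce => -1,
//!     }
//! }
//!
//! assert_eq!(stack_effect(&Action::Expand(0)), 1);
//! assert_eq!(stack_effect(&Action::Push), 0);
//! assert_eq!(stack_effect(&Action::Reduce), -1);
//! ```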
//!
//! ## Relation to the chain-rule machine
//!
//! For our chain-rule machine, expansion means creating a new node
//! that points at the current node, forming a path that is one edge
//! longer.  A normal push means creating a new node that points to
//! the target of an edge going out of the current node, where that
//! edge was *not* created by the process of forming nullable
//! closures.  A reduction means creating a new node that points to
//! the target of an edge going out of the current node, where that
//! edge *was* created by the process of forming nullable closures.
//!
//! # Problem
//!
//! As can be seen from the previous paragraph, the distinction
//! between a normal push and a reduction in the chain-rule machine is
//! simply whether or not the original edge was created by the process
//! of forming nullable closures.  For the chain-rule machine itself
//! this does not matter: it functions well either way.  For the
//! construction of the derivation forest, however, this is a problem:
//! we cannot immediately read off from the chain-rule machine whether
//! or not to perform reductions.
//!
//! # Solutions
//!
//! I tried several approaches to this problem.
//!
//! ## Complete at the end
//!
//! The first idea I tried was to find the nodes that were not closed
//! properly and fill in the needed reductions at the end of the
//! parse.  This did not work out well: by the end of the parse too
//! much information has already been lost, and it becomes quite
//! difficult to know on which nodes we need to perform reductions.
//!
//! ## Record precise information
//!
//! The next idea was to record the nodes that were skipped by the
//! nullable-closure process, and to perform the reductions whenever
//! we encounter a skipped segment.  This sounds feasible at first,
//! but it turns out that we cannot track the nodes properly: while
//! running the chain-rule machine we split and clone nodes from time
//! to time, so node numbers recorded earlier are no longer correct
//! afterwards.  This means we cannot know exactly which reductions to
//! perform later.
//!
//! ## Guided brute-force
//!
//! Now I am trying out a third idea.  The motivation is simple: if
//! we cannot track nodes properly, then we track no nodes at all.  In
//! order to perform reductions, we then make no distinction between
//! necessary and unnecessary reductions and simply perform every
//! possible reduction.  In other words, when we are about to plant a
//! new node in the forest, we first check the last child of the
//! to-be-parent.  If it is properly closed, we do nothing.
//! Otherwise, we recursively descend through its last children and
//! perform all possible valid reductions, as sketched below.
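//!
//! Below is a minimal sketch of this strategy on a plain tree.  The
//! `Node` type and its `properly_closed` flag are stand-ins for the
//! real forest and its ending positions; the actual `reduction` method
//! in this module additionally has to deal with packed, cloned and
//! virtual nodes.
//!
//! ```
//! struct Node {
//!     properly_closed: bool,
//!     children: Vec<Node>,
//! }
//!
//! // Walk down the chain of last children and collect every node that
//! // is not yet properly closed, i.e. every candidate for a reduction.
//! fn collect_reductions<'a>(node: &'a Node, out: &mut Vec<&'a Node>) {
//!     if node.properly_closed {
//!         // Properly closed: nothing to do for this subtree.
//!         return;
//!     }
//!
//!     out.push(node);
//!
//!     if let Some(last) = node.children.last() {
//!         collect_reductions(last, out);
//!     }
//! }
//!
//! // The to-be-parent's last child is still open, and so is that
//! // child's own last child.
//! let to_be_parent = Node {
//!     properly_closed: false,
//!     children: vec![Node {
//!         properly_closed: false,
//!         children: vec![Node {
//!             properly_closed: false,
//!             children: Vec::new(),
//!         }],
//!     }],
//! };
//!
//! let mut candidates = Vec::new();
//!
//! if let Some(last) = to_be_parent.children.last() {
//!     collect_reductions(last, &mut candidates);
//! }
//!
//! // Every open node along that chain receives a (possibly
//! // unnecessary) reduction.
//! assert_eq!(candidates.len(), 2);
//! ```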

use super::*;
use crate::{
    atom::{Atom, DefaultAtom},
    default::Error,
    item::default::DefaultForest,
};
use grammar::{GrammarLabel, TNT};
use graph::Graph;

use std::collections::HashMap as Map;

impl DefaultForest<ForestLabel<GrammarLabel>> {
    /// Perform reductions at the last descendants of `node`.
    ///
    /// # Parameters
    ///
    /// The parameter `pos` is the next starting position.  It is used
    /// to find the descendants that need reductions: only those nodes
    /// which have descendants with the correct ending positions will
    /// receive reductions.
    ///
    /// The parameter `atom` is used to know which rule numbers are
    /// deemed accepting.  Only accepting rule numbers can receive
    /// reductions: this is the definition of being accepting.
    ///
    /// The parameter `ter` is used to fill in segments for virtual
    /// nodes if they are found along the way.
    ///
    /// The parameter `accept_root` controls whether we want to
    /// perform reduction on the root.
    ///
    /// # Errors
    ///
    /// 1. Index out of bounds: some node number is corrupted.
    /// 2. Node has no label: some node label is lost.
    pub(crate) fn reduction(
        &mut self,
        node: usize,
        pos: usize,
        ter: usize,
        atom: &DefaultAtom,
        accept_root: bool,
    ) -> Result<usize, Error> {
        let mut result = node;

        // if node == 15 && pos == 2 {
        //     let _ = self.print_viz("pos really before splone.gv");
        // }

        // step 1: Determine if this needs reductions.

        if !accept_root && self.root() == Some(node) {
            return Ok(result);
        }

        // REVIEW: Do we need to check the end matches the position?

        let mut node_label = self
            .vertex_label(node)?
            .ok_or(Error::NodeNoLabel(node))?
            .label();

        if node_label.end().is_some() {
            return Ok(result);
        }

        // Data type representing the status of a node during the
        // search: `Correct` means its relevant descendant ends exactly
        // at `pos`, `Incorrect` means it does not, and `Visited` means
        // the node has been seen but not yet resolved.

        #[derive(PartialEq, Eq, Copy, Clone, Debug)]
        enum Status {
            Correct,
            Incorrect,
            Visited,
        }

        impl From<bool> for Status {
            fn from(value: bool) -> Self {
                match value {
                    true => Self::Correct,
                    false => Self::Incorrect,
                }
            }
        }

        use Status::*;

        // step 2: Find descendants that need reductions.
        //
        // This is a depth-first search from `node`.  A node whose
        // status depends on its children is pushed back onto the
        // stack before its unexplored children, so it is revisited
        // only after those children have been resolved.

        // Cached status of every node seen so far.
        let mut correct_ends: Map<usize, Status> = Default::default();

        // Nodes found to end correctly at `pos`, recorded bottom-up;
        // step 3 below walks this list in reverse, i.e. from top to
        // bottom.
        let mut order_of_correct_ends: Vec<usize> = Vec::new();

        let mut stack: Vec<usize> = vec![node];

        // Debug logging appends to an existing `debug.log`; if the
        // file cannot be opened, the guarded writes below are simply
        // skipped.
        let mut file = std::fs::OpenOptions::new().append(true).open("debug.log");

        use std::{borrow::BorrowMut, io::Write};

        // Whether or not to write a debug file.
        let to_write = true;

        if to_write {
            if let Ok(ref mut file) = file.borrow_mut() {
                file.write_all(format!("beginning, pos = {pos}, node = {node}\n").as_bytes())
                    .unwrap();
            }
        }

        'stack_loop: while let Some(top) = stack.pop() {
            if to_write {
                if let Ok(ref mut file) = file.borrow_mut() {
                    file.write_all(format!("top: {top}\n").as_bytes()).unwrap();
                }
            }

            let old_value = correct_ends.get(&top).copied();

            if matches!(old_value, Some(Correct) | Some(Incorrect)) {
                continue 'stack_loop;
            }

            correct_ends.insert(top, Visited);
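            // `Visited` means seen but not yet resolved: such nodes
            // are skipped when collecting unexplored children below,
            // though they may be pushed back onto the stack themselves.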

            let top_label = self.vertex_label(top)?.ok_or(Error::NodeNoLabel(top))?;

            if let Some(end) = top_label.label().end() {
                correct_ends.insert(top, (end == pos).into());

                continue 'stack_loop;
            }

            if let Some(rule) = top_label.label().label().rule() {
                // A rule node is not considered directly: it should
                // be affected by its child implicitly.
                //
                // We only consider a rule node if it is deemed
                // accepting by the atom.

                if to_write {
                    if let Ok(ref mut file) = file.borrow_mut() {
                        file.write_all(format!("{}: {rule}, {stack:?}\n", line!()).as_bytes())
                            .unwrap();
                    }
                }

                if atom
                    .is_accepting(rule * 2 + 1)
                    .map_err(|_| Error::IndexOutOfBounds(2 * rule + 1, atom.accepting_len()))?
                {
                    let mut has_unexplored_children = false;
                    let mut inserted = false;

                    'child_loop: for child in self.children_of(top)? {
                        match correct_ends.get(&child).copied() {
                            Some(Correct) => {
                                correct_ends.insert(top, Correct);

                                inserted = true;
                            }
                            None => {
                                has_unexplored_children = true;

                                break 'child_loop;
                            }
                            _ => {}
                        }
                    }

                    if has_unexplored_children {
                        stack.push(top);
                        stack.extend(
                            self.children_of(top)?
                                .filter(|child| correct_ends.get(child).is_none()),
                        );
                    } else if !inserted {
                        correct_ends.insert(top, Incorrect);
                    }
                } else {
                    correct_ends.insert(top, Incorrect);
                }

                continue 'stack_loop;
            }

            if top_label.is_packed() {
                let mut has_unexplored_children = false;
                let mut inserted = false;

                if to_write {
                    if let Ok(ref mut file) = file.borrow_mut() {
                        file.write_all(format!("{}: packed, {stack:?}\n", line!()).as_bytes())
                            .unwrap();
                    }
                }

                for child in self.children_of(top)? {
                    match correct_ends.get(&child).copied() {
                        Some(Correct) => {
                            // NOTE: This is not recorded in the
                            // correct orders, as we do not handle
                            // packed nodes directly.

                            correct_ends.insert(top, Correct);
                            inserted = true;
                        }
                        None => {
                            has_unexplored_children = true;
                        }
                        _ => {}
                    }
                }

                if to_write {
                    if let Ok(ref mut file) = file.borrow_mut() {
                        file.write_all(
                            format!("{}: packed, {has_unexplored_children}\n", line!()).as_bytes(),
                        )
                        .unwrap();
                    }
                }

                if has_unexplored_children {
                    stack.push(top);
                    stack.extend(
                        self.children_of(top)?
                            .filter(|child| correct_ends.get(child).is_none()),
                    );
                } else if !inserted {
                    correct_ends.insert(top, Incorrect);
                }

                continue 'stack_loop;
            }

            let degree = self.degree(top)?;

            if to_write {
                if let Ok(ref mut file) = file.borrow_mut() {
                    file.write_all(format!("{}: degree = {degree}\n", line!()).as_bytes())
                        .unwrap();
                }
            }

            let last_index = if degree != 0 {
                degree - 1
            } else {
                // a leaf is supposed to be a terminal node and hence
                // should have an ending position

                let end = match top_label.label().end() {
                    None => match top_label.label().label().tnt() {
                        Some(TNT::Ter(_)) => {
                            panic!("a terminal node {top} has no ending position?");
                        }
                        Some(TNT::Non(nt)) => {
                            self.close_pavi(atom.borrow(), PaVi::Virtual(nt, ter, top), pos)?;

                            // dbg!(top, nt, ter, pos, self.degree(top)?, degree);

                            let correctness = self.degree(top)? > 0;

                            correct_ends.insert(top, correctness.into());

                            continue 'stack_loop;
                        }
                        None => {
                            unreachable!("we already handled the rule case above");
                        }
                    },
                    Some(end) => end,
                };

                correct_ends.insert(top, (end == pos).into());

                if end == pos {
                    order_of_correct_ends.push(top);
                }

                continue 'stack_loop;
            };

            if to_write {
                if let Ok(ref mut file) = file.borrow_mut() {
                    file.write_all(format!("{}: last = {last_index}\n", line!()).as_bytes())
                        .unwrap();
                }
            }

            let last_child = self.nth_child(top, last_index)?;
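            // Only the last child is examined: reductions are only
            // performed along the right-most, still open, chain of
            // descendants.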

            if let Some(child_correctness_value) = correct_ends.get(&last_child).copied() {
                if child_correctness_value != Visited {
                    correct_ends.insert(top, child_correctness_value);

                    if child_correctness_value == Correct {
                        order_of_correct_ends.push(top);
                    }
                }
            } else {
                stack.extend([top, last_child]);
            }
        }

        if to_write {
            if let Ok(ref mut file) = file.borrow_mut() {
                file.write_all(
                    format!("{}: orders = {order_of_correct_ends:?}\n", line!()).as_bytes(),
                )
                .unwrap();
            }
        }

        // step 3: Perform reductions by `splone`.
        //
        // NOTE: We must proceed from top to bottom, which is the
        // reverse of the order recorded in `order_of_correct_ends`.

        // if node == 15 && pos == 2 {
        //     dbg!(&order_of_correct_ends);
        //     let _ = self.print_viz("pos before splone.gv");
        // }

        for node in order_of_correct_ends.into_iter().rev() {
            let label = self.vertex_label(node)?.ok_or(Error::NodeNoLabel(node))?;
            let degree = self.degree(node)?;

            if !matches!(label.label().label().tnt(), Some(TNT::Non(_))) {
                continue;
            }

            #[cfg(debug_assertions)]
            {
                assert!(label.label().end().is_none());

                assert_ne!(degree, 0);
            }

            let last_index = degree - 1;

            // if node == 15 && pos == 2 {
            //     let _ = self.print_viz("before splone.gv");
            // }

            self.splone(node, Some(pos), last_index, false)?;
        }

        node_label.set_end(pos);

        // After the reductions, a node with this label and the new
        // ending position may already exist, either as a packed node
        // or as a plain node; if so, return that node.
        if let Some(packed) =
            self.query_label(ForestLabel::new(node_label, ForestLabelType::Packed))
        {
            result = packed;

            if to_write {
                if let Ok(ref mut file) = file.borrow_mut() {
                    file.write_all(format!("{}: packed = {packed}\n", line!()).as_bytes())
                        .unwrap();
                }
            }
        } else if let Some(plain) =
            self.query_label(ForestLabel::new(node_label, ForestLabelType::Plain))
        {
            result = plain;

            if to_write {
                if let Ok(ref mut file) = file.borrow_mut() {
                    file.write_all(format!("{}: plain = {plain}\n", line!()).as_bytes())
                        .unwrap();
                }
            }
        }

        if to_write {
            if let Ok(ref mut file) = file.borrow_mut() {
                // Write a literal "end" marker line to the debug log.
                file.write_all(b"end\n").unwrap();
            }
        }

        Ok(result)
    }
}