src/bsr.h


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128

#ifndef BSR_H
#define BSR_H
#include "util.h"
#include "grammar.h"

/* This file implements the "Binary Subtree Representation" of a
   derivation forest.  See the following paper for the miraculous
   details:

   Derivation representation using binary subtree sets

   by

   Elizabeth Scott, Adrian Johnstone and L.Thomas van Binsbergen. */

/* Note that there is an implementation of this GLL algorithm in the
   following GitHub repository:

   https://github.com/goccmack/gogll

   But the implementation there uses array look-up to determine if a
   BSR has already been added to a set.  I guess the reason is that
   this is not the bottleneck of performance in its applications.  But
   I am not satisfied with that solution.  I want to take advantage of
   hash-table's almost constant-time look-up feature to implement
   this. */

/* A BSR set has two types.

   The first type is of the form

   (X := alpha, i, j, k),

   where X := alpha is a grammar rule.  This is equivalent with a
   5-tuple

   (X, a, i, j, k),

   where X is the integer corresponding to X and a is the index of the
   rule X := alpha.

   The second type is of the form

   (X := beta . gamma, i, j, k),

   where beta is a prefix of the rule X := beta gamma.  So this is
   equivalent with a 6-tuple

   (X, a, b, i, j, k),

   where a is the index of the rule X := beta gamma and b is the
   length of beta.

   The simplest way to implement this is to have two "sets", one for
   each type.  But the two types need to be implemented in different
   ways.  For both types, we need to know if a BSR belongs to the set
   in (almost) constnat time, so we exclude the possibility of linked
   lists first.

   For the first type, we also need the ability to loop through all
   BSRs of the form (X, ...) for a given non-terminal X.  Further, we
   need to index the left and the right extents in almost constant
   time as well.  So we represent the first type as an array of hash
   tables, whose indices correspond to nonterminals.  Each hash table
   has as keys (i, k) and as values hash tables whose keys are those
   (a, j) such that (X, a, i, j, k) belong to the set.

   For the second type, we require the capability to loop through all
   BSRs of the form (X, a, b, ...) in a set for some given X, a, and
   b.  So we represent the second type as an array of arrays of arrays
   of hash tables, whose array indices correspond to X, a, and b,
   respectively.  The hash tables have keys (i, k) and values those j
   such that (X, a, b, i, j, k) belongs to the set.

   I don't know if there are better implementations.  If I think of
   better ways in the future, I will re-implement the BSR sets. */

/* With the newly written library tuple.h, the implementation changes
   as follows.

   The first type is stored as a "luple5", but the coordinates are in
   the following order.

   (X, a, i, k, j) => (X, i, j, a, k)

   This is to ensure that we can quickly find BSR's with a given
   non-terminal X and given left and right extents i and j.  This is
   necessary for extracting an SPPF out of it, and for deciding
   whether the parser succeeds in parsing the entire input.

   The second type os stored as a "luple6", with the following
   coordinates order.

   (X, a, b, i, k, j) => (X, a, b, i, j, k)

   The reason is similar to the above.  Though this is not necessary for
   deciding if the parser succeeds. */

typedef struct bsr_s bsr;

bsr *new_bsr(Grammar *g, NUM input_len, BOOL * const restrict errorp);

/* No flag to control the deletion, as the user cannot change how the
   data is stored. */
void destroy_bsr(bsr * const restrict bsrp);

/* the function bsrAdd in the paper */

/* X = non-terminal index, a = index of alternate, c = length of
   prefix */

/* i = left-extent of X, and j = right-extent.  k = the left-extent of
   the right-most terminal or non-terminal in the alternate
   corresponding to a. */

H_ATTR BOOL bsr_add(CCR_MOD(Grammar *) g, bsr * const restrict b,
                    pair6 p6);

BOOL bsr_find(CCR_MOD(bsr *) b, CCR_MOD(Grammar *) g,
              BOOL * const restrict errorp,
              pair6 p6);

P_ATTR BOOL bsr_lookup(bsr * b, NUM X, NUM i, NUM j);

/* Print a new line after LINE_LEN many elements. */
void bsr_print(bsr *b, Grammar *g, NUM line_len);

#endif