1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
|
#ifndef BSR_H
#define BSR_H
/* This file implements the "Binary Subtree Representation" of a
derivation forest. See the following paper for the miraculous
details:
Derivation representation using binary subtree sets
by
Elizabeth Scott, Adrian Johnstone and L.Thomas van Binsbergen. */
/* Note that there is an implementation of this GLL algorithm in the
following GitHub repository:
https://github.com/goccmack/gogll
But the implementation there uses array look-up to determine if a
BSR has already been added to a set. I guess the reason is that
this is not the bottleneck of performance in its applications. But
I am not satisfied with that solution. I want to take advantage of
hash-table's almost constant time look-up feature to implement
this. */
/* A BSR set has two types.
The first type is of the form
(X := alpha, i, j, k),
where X := alpha is a grammar rule. This is equivalent with a
5-tuple
(X, a, i, j, k),
where X is the integer corresponding to X and a is the index of the
rule X := alpha.
The second type is of the form
(X := beta . gamma, i, j, k),
where beta is a prefix of the rule X := beta gamma. So this is
equivalent with a 6-tuple
(X, a, b, i, j, k),
where a is the index of the rule X := beta gamma and b is the
length of beta.
The simplest way to implement this is to have two "sets", one for
each type. Each set is an array of arrays, where the root array's
indices correspond to non-terminals, and the leaf arrays correspond
to rules of that non-terminal. Then the rest is implemented
through hash-tables, where a hash table's values are yet another
hash table, and so on.
I don't know if there are better implementations. If I think of
better ways in the future, I will re-implement the BSR sets. */
/* Note that we also need to implement "process descriptors" for the
GLL parser. Since a process descriptor is of the form
(X := beta . gamma, i, j, k),
where X := beta gamma is a rule and i is the length of beta, we can
represent a process descriptor as
(X, a, i, j, k),
i.e. as a first-type BSR set. But the meaning of the "i" is
different. */
int place(int x);
#endif
|