src/bsr.h


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77

#ifndef BSR_H
#define BSR_H

/* This file implements the "Binary Subtree Representation" of a
   derivation forest.  See the following paper for the miraculous
   details:

   Derivation representation using binary subtree sets
   
   by
   
   Elizabeth Scott, Adrian Johnstone and L.Thomas van Binsbergen. */

/* Note that there is an implementation of this GLL algorithm in the
   following GitHub repository:

   https://github.com/goccmack/gogll

   But the implementation there uses array look-up to determine if a
   BSR has already been added to a set.  I guess the reason is that
   this is not the bottleneck of performance in its applications.  But
   I am not satisfied with that solution.  I want to take advantage of
   hash-table's almost constant time look-up feature to implement
   this. */

/* A BSR set has two types.

   The first type is of the form

   (X := alpha, i, j, k),

   where X := alpha is a grammar rule.  This is equivalent with a
   5-tuple

   (X, a, i, j, k),

   where X is the integer corresponding to X and a is the index of the
   rule X := alpha.

   The second type is of the form

   (X := beta . gamma, i, j, k),

   where beta is a prefix of the rule X := beta gamma.  So this is
   equivalent with a 6-tuple

   (X, a, b, i, j, k),

   where a is the index of the rule X := beta gamma and b is the
   length of beta.

   The simplest way to implement this is to have two "sets", one for
   each type.  Each set is an array of arrays, where the root array's
   indices correspond to non-terminals, and the leaf arrays correspond
   to rules of that non-terminal.  Then the rest is implemented
   through hash-tables, where a hash table's values are yet another
   hash table, and so on.

   I don't know if there are better implementations.  If I think of
   better ways in the future, I will re-implement the BSR sets. */

/* Note that we also need to implement "process descriptors" for the
   GLL parser.  Since a process descriptor is of the form

   (X := beta . gamma, i, j, k),

   where X := beta gamma is a rule and i is the length of beta, we can
   represent a process descriptor as

   (X, a, i, j, k),

   i.e. as a first-type BSR set.  But the meaning of the "i" is
   different. */

int place(int x);

#endif