diff options
Diffstat (limited to 'src/bsr.h')
-rw-r--r-- | src/bsr.h | 77 |
1 files changed, 77 insertions, 0 deletions
diff --git a/src/bsr.h b/src/bsr.h new file mode 100644 index 0000000..8578ab2 --- /dev/null +++ b/src/bsr.h @@ -0,0 +1,77 @@ +#ifndef BSR_H +#define BSR_H + +/* This file implements the "Binary Subtree Representation" of a + derivation forest. See the following paper for the miraculous + details: + + Derivation representation using binary subtree sets + + by + + Elizabeth Scott, Adrian Johnstone and L.Thomas van Binsbergen. */ + +/* Note that there is an implementation of this GLL algorithm in the + following GitHub repository: + + https://github.com/goccmack/gogll + + But the implementation there uses array look-up to determine if a + BSR has already been added to a set. I guess the reason is that + this is not the bottleneck of performance in its applications. But + I am not satisfied with that solution. I want to take advantage of + hash-table's almost constant time look-up feature to implement + this. */ + +/* A BSR set has two types. + + The first type is of the form + + (X := alpha, i, j, k), + + where X := alpha is a grammar rule. This is equivalent with a + 5-tuple + + (X, a, i, j, k), + + where X is the integer corresponding to X and a is the index of the + rule X := alpha. + + The second type is of the form + + (X := beta . gamma, i, j, k), + + where beta is a prefix of the rule X := beta gamma. So this is + equivalent with a 6-tuple + + (X, a, b, i, j, k), + + where a is the index of the rule X := beta gamma and b is the + length of beta. + + The simplest way to implement this is to have two "sets", one for + each type. Each set is an array of arrays, where the root array's + indices correspond to non-terminals, and the leaf arrays correspond + to rules of that non-terminal. Then the rest is implemented + through hash-tables, where a hash table's values are yet another + hash table, and so on. + + I don't know if there are better implementations. If I think of + better ways in the future, I will re-implement the BSR sets. */ + +/* Note that we also need to implement "process descriptors" for the + GLL parser. Since a process descriptor is of the form + + (X := beta . gamma, i, j, k), + + where X := beta gamma is a rule and i is the length of beta, we can + represent a process descriptor as + + (X, a, i, j, k), + + i.e. as a first-type BSR set. But the meaning of the "i" is + different. */ + +int place(int x); + +#endif |