summaryrefslogtreecommitdiff
path: root/src/bsr.h
diff options
context:
space:
mode:
Diffstat (limited to 'src/bsr.h')
-rw-r--r--src/bsr.h77
1 files changed, 77 insertions, 0 deletions
diff --git a/src/bsr.h b/src/bsr.h
new file mode 100644
index 0000000..8578ab2
--- /dev/null
+++ b/src/bsr.h
@@ -0,0 +1,77 @@
+#ifndef BSR_H
+#define BSR_H
+
+/* This file implements the "Binary Subtree Representation" of a
+ derivation forest. See the following paper for the miraculous
+ details:
+
+ Derivation representation using binary subtree sets
+
+ by
+
+ Elizabeth Scott, Adrian Johnstone and L.Thomas van Binsbergen. */
+
+/* Note that there is an implementation of this GLL algorithm in the
+ following GitHub repository:
+
+ https://github.com/goccmack/gogll
+
+ But the implementation there uses array look-up to determine if a
+ BSR has already been added to a set. I guess the reason is that
+ this is not the bottleneck of performance in its applications. But
+ I am not satisfied with that solution. I want to take advantage of
+ hash-table's almost constant time look-up feature to implement
+ this. */
+
+/* A BSR set has two types.
+
+ The first type is of the form
+
+ (X := alpha, i, j, k),
+
+ where X := alpha is a grammar rule. This is equivalent with a
+ 5-tuple
+
+ (X, a, i, j, k),
+
+ where X is the integer corresponding to X and a is the index of the
+ rule X := alpha.
+
+ The second type is of the form
+
+ (X := beta . gamma, i, j, k),
+
+ where beta is a prefix of the rule X := beta gamma. So this is
+ equivalent with a 6-tuple
+
+ (X, a, b, i, j, k),
+
+ where a is the index of the rule X := beta gamma and b is the
+ length of beta.
+
+ The simplest way to implement this is to have two "sets", one for
+ each type. Each set is an array of arrays, where the root array's
+ indices correspond to non-terminals, and the leaf arrays correspond
+ to rules of that non-terminal. Then the rest is implemented
+ through hash-tables, where a hash table's values are yet another
+ hash table, and so on.
+
+ I don't know if there are better implementations. If I think of
+ better ways in the future, I will re-implement the BSR sets. */
+
+/* Note that we also need to implement "process descriptors" for the
+ GLL parser. Since a process descriptor is of the form
+
+ (X := beta . gamma, i, j, k),
+
+ where X := beta gamma is a rule and i is the length of beta, we can
+ represent a process descriptor as
+
+ (X, a, i, j, k),
+
+ i.e. as a first-type BSR set. But the meaning of the "i" is
+ different. */
+
+int place(int x);
+
+#endif