summaryrefslogtreecommitdiff
path: root/src/bsr.h
diff options
context:
space:
mode:
authorJSDurand <mmemmew@gmail.com>2022-01-22 02:36:54 +0800
committerJSDurand <mmemmew@gmail.com>2022-01-22 02:36:54 +0800
commitbad2b1934da66021cbc7f0d715686706bd444449 (patch)
tree78c340d5f834bbc5864a97dcb23ff4ec41e23072 /src/bsr.h
parent53865aad225ffbe5cf3c42736e5a2095092f9fff (diff)
Implemented a hash table with any type of keys
Diffstat (limited to 'src/bsr.h')
-rw-r--r--src/bsr.h38
1 files changed, 34 insertions, 4 deletions
diff --git a/src/bsr.h b/src/bsr.h
index 8578ab2..d6b96e5 100644
--- a/src/bsr.h
+++ b/src/bsr.h
@@ -20,7 +20,7 @@
BSR has already been added to a set. I guess the reason is that
this is not the bottleneck of performance in its applications. But
I am not satisfied with that solution. I want to take advantage of
- hash-table's almost constant time look-up feature to implement
+ hash-table's almost constant-time look-up feature to implement
this. */
/* A BSR set has two types.
@@ -53,8 +53,13 @@
each type. Each set is an array of arrays, where the root array's
indices correspond to non-terminals, and the leaf arrays correspond
to rules of that non-terminal. Then the rest is implemented
- through hash-tables, where a hash table's values are yet another
- hash table, and so on.
+ through hash-tables, whose keys are tuples of integers. For the
+ first type, the keys are the (i, k) pairs and the values are those
+ j. For the second, the keys are the (b, i, k) tupels and the
+ values are those j. The reason to use i and k as keys is that they
+ represent the left and the right extent of the BSR respectively, so
+ we index them in order to find them easily. In addition, this will
+ be useful if we want to extract SPPF out of a BSR set.
I don't know if there are better implementations. If I think of
better ways in the future, I will re-implement the BSR sets. */
@@ -70,7 +75,32 @@
(X, a, i, j, k),
i.e. as a first-type BSR set. But the meaning of the "i" is
- different. */
+ different.
+
+ However, we shall notice that there is a natural "grading" on the
+ process descriptors. The k-th grading consists of the descriptors
+ with the last element in the tuple equal to k. In the clustered
+ nonterminal parsing algorithm, the process descriptors that are
+ added as a consequence of dealing with a process descriptor of
+ grading k must be of grade greater than or equal to k. This means
+ that if all process descriptors of grade k have been processed and
+ the descriptors awaiting to be processed all have grades > k, then
+ we won't see any process descriptors of grade <= k again. So we
+ can actually keep a list of process descriptors that serve the role
+ of the set U_k in the afore-mentionned paper. This list
+ corresponds to the grading of the descriptors. This way we don't
+ need to store the k in the descriptor. Therefore a process
+ descriptor is represented as
+
+ (X, a, i, j).
+
+ Again we store an array of arrays corresponding to X and a. Thus
+ the hash table simply has keys i and values j.
+
+ The main benefit of this is that after we find that all process
+ descriptors in R_k have been processed, we can free those
+ descriptors in U_k. I think this is a good way to lower the space
+ requirements of this algorithm. */
int place(int x);