1 files changed, 34 insertions, 4 deletions
diff --git a/src/bsr.h b/src/bsr.h
index 8578ab2..d6b96e5 100644
--- a/src/bsr.h
+++ b/src/bsr.h
@@ -20,7 +20,7 @@
    BSR has already been added to a set.  I guess the reason is that
    this is not the bottleneck of performance in its applications.  But
    I am not satisfied with that solution.  I want to take advantage of
-   hash-table's almost constant time look-up feature to implement
+   hash-table's almost constant-time look-up feature to implement
    this. */
 
 /* A BSR set has two types.
@@ -53,8 +53,13 @@
    each type.  Each set is an array of arrays, where the root array's
    indices correspond to non-terminals, and the leaf arrays correspond
    to rules of that non-terminal.  Then the rest is implemented
-   through hash-tables, where a hash table's values are yet another
-   hash table, and so on.
+   through hash-tables, whose keys are tuples of integers.  For the
+   first type, the keys are the (i, k) pairs and the values are those
+   j.  For the second, the keys are the (b, i, k) tupels and the
+   values are those j.  The reason to use i and k as keys is that they
+   represent the left and the right extent of the BSR respectively, so
+   we index them in order to find them easily.  In addition, this will
+   be useful if we want to extract SPPF out of a BSR set.
 
    I don't know if there are better implementations.  If I think of
    better ways in the future, I will re-implement the BSR sets. */
@@ -70,7 +75,32 @@
    (X, a, i, j, k),
 
    i.e. as a first-type BSR set.  But the meaning of the "i" is
-   different. */
+   different.
+
+   However, we shall notice that there is a natural "grading" on the
+   process descriptors.  The k-th grading consists of the descriptors
+   with the last element in the tuple equal to k.  In the clustered
+   nonterminal parsing algorithm, the process descriptors that are
+   added as a consequence of dealing with a process descriptor of
+   grading k must be of grade greater than or equal to k.  This means
+   that if all process descriptors of grade k have been processed and
+   the descriptors awaiting to be processed all have grades > k, then
+   we won't see any process descriptors of grade <= k again.  So we
+   can actually keep a list of process descriptors that serve the role
+   of the set U_k in the afore-mentionned paper.  This list
+   corresponds to the grading of the descriptors.  This way we don't
+   need to store the k in the descriptor.  Therefore a process
+   descriptor is represented as
+
+   (X, a, i, j).
+
+   Again we store an array of arrays corresponding to X and a.  Thus
+   the hash table simply has keys i and values j.
+
+   The main benefit of this is that after we find that all process
+   descriptors in R_k have been processed, we can free those
+   descriptors in U_k.  I think this is a good way to lower the space
+   requirements of this algorithm. */
 
 int place(int x);