Age | Commit message (Collapse) | Author |
|
There were two main issues in the previous version.
One is that there are lots of duplications of nodes when manipulating
the forest. This does not mean that labels repeat: by the use of the
data type this cannot happen. What happened is that there were cloned
nodes whose children are exactly equal. In this case there is no need
to clone that node in the first place. This is now fixed by checking
carefully before cloning, so that we do not clone unnecessary nodes.
The other issue, which is perhaps more important, is that there are
nodes which are not closed. This means that when there should be a
reuction of grammar rules, the forest does not mark the corresponding
node as already reduced. The incorrect forests thus caused is hard to
fix: I tried several different approaches to fix it afterwards, but
all to no avail. I also tried to record enough information to fix
these nodes during the manipulations. It turned out that recording
nodes is a dead end, as I cannot properly syncronize the information
in the forest and the information in the chain-rule machine. Any
inconsistencies will result in incorrect operations later on.
The approach I finally adapt is to perform every possible reduction at
each step. This might lead to some more nodes than what we need. But
those are technically expected to be there after all, and it is easy
to filter them out, so it is fine, from my point of view at the
moment.
Therefore, what remains is to filter those nodes out and connect it to
the holy Emacs. :D
|
|
I decide to adopt a new approach of recording and updating item
derivation forests. Since this affects a lot of things, I decide to
commit before the refactor, so that I can create a branch for that
refactor.
|
|
I have an ostensibly working prototype now.
Further tests are needed to make sure that the algorithm meets the
time complexity requirement, though.
|
|
Now the grammar will record the left-linear expansions when generating
the nondeterministic finite automaton frmo its rules, and will record
whether an edge in the nondeterministic finite automaton comes from a
left-linear expansion. The latter is needed because while performing
a chain-rule derivation, we do not need the left-linear expanded
derivations in the "first layer". This might well have been the root
cause of the bad performance of the previous version of this package.
Also I have figured out how to properly generate and handle parse
forests while manipulating the "chain-rule machine".
|
|
I put functionalities that are not strictly core to separate crates,
so that the whole package becomes more modular, and makes it easier to
try other parsing algorithms in the future.
Also I have to figure the forests out before finishing the core
chain-rule algorithm, as the part about forests affects the labels of
the grammars directly. From my experiences in writing the previous
version, it is asking for trouble to change the labels type
dramatically at a later point: too many places need to be changed.
Thus I decide to figure the rough part of forests out.
Actually I only have to figure out how to attach forests fragments to
edges of the underlying atomic languages, and the more complex parts
of putting forests together can be left to the recorders, which is my
vision of assembling semi-ring values during the chain-rule machine.
It should be relatively easy to produce forests fragments from
grammars since we are just trying to extract some information from the
grammar, not to manipulate those information in some complicated way.
We have to do some manipulations in the process, though, in order to
make sure that the nulling and epsilon-removal processes do not
invalidate these fragments.
|