Age | Commit message (Collapse) | Author |
|
I seem to have finished the implementation of forests. Now it remains
the implementation of the chain-rule machine, of which I have a rough
plan now.
|
|
Now the grammar will record the left-linear expansions when generating
the nondeterministic finite automaton frmo its rules, and will record
whether an edge in the nondeterministic finite automaton comes from a
left-linear expansion. The latter is needed because while performing
a chain-rule derivation, we do not need the left-linear expanded
derivations in the "first layer". This might well have been the root
cause of the bad performance of the previous version of this package.
Also I have figured out how to properly generate and handle parse
forests while manipulating the "chain-rule machine".
|
|
I am about to re-start my system, so I save before any crashes
happen.
|
|
Now I have a new type of labelled graphs, which can index vertices by
labels, but not index edges by labels. The biggest difference is that
I do not have to keep a hashmap of edge targets by labels, and I do
not have to guard against the duplication of nodes with the same set
of edges. I guard against nodes with the same label, though.
Also, in this graph, both vertices and edges have one label at a time,
whereas in the previous labelled graph there can be a multitude of
edges between the same source and target nodes, but with different
labels.
Now it remains to test this type of graphs, and to think through how
we attach forest fragments to nondeterministic finite automata edges,
and how to join forest fragments together while skipping nullable
edges, in order to finish the "compilation" part.
|
|
I put functionalities that are not strictly core to separate crates,
so that the whole package becomes more modular, and makes it easier to
try other parsing algorithms in the future.
Also I have to figure the forests out before finishing the core
chain-rule algorithm, as the part about forests affects the labels of
the grammars directly. From my experiences in writing the previous
version, it is asking for trouble to change the labels type
dramatically at a later point: too many places need to be changed.
Thus I decide to figure the rough part of forests out.
Actually I only have to figure out how to attach forests fragments to
edges of the underlying atomic languages, and the more complex parts
of putting forests together can be left to the recorders, which is my
vision of assembling semi-ring values during the chain-rule machine.
It should be relatively easy to produce forests fragments from
grammars since we are just trying to extract some information from the
grammar, not to manipulate those information in some complicated way.
We have to do some manipulations in the process, though, in order to
make sure that the nulling and epsilon-removal processes do not
invalidate these fragments.
|
|
Some changes:
- The core crate is renamed to "chain".
- The crate "viz" is added, which will provide layered graph drawing
algorithms.
- A function is added to convert from a grammar to the regular
language of its left-linear closures.
- A function is added to convert from a nondeterministic finite
automaton to its "null" closure. A null closure is the same
automaton with edges added, as if some edges are "null". Whether an
edge is null is determined by a function.
Combined with the previous change, we can convert a grammar to the
regular language of the null closure of its left-linear closures.
---
Now it remains to test more grammars and add an Atom trait, before
finishing the part about compilations.
|