Age | Commit message (Collapse) | Author |
|
|
|
|
|
Those were added by accident.
|
|
|
|
Previously the functions `is_prefix` and `plant` did not take packed
nodes into consideration. That was because I had only dealt with
non-packed nodes in the past: the fragments to test for prefixes and
for planting did not intersect the packed nodes in the forest, and the
grammars were so simple that the fragments did not contain packed
nodes.
Then a test revealed this situation, so I have to fix this oversight
now. This commit attempts to fix this issue.
From the newly added unit-tests, it seems that this fix works. :)
|
|
I do not use a tool to automatically format the code, so sometimes
the code looks ugly. This commit reformats the code so that it looks
better and each line is shorter.
|
|
Now the binding part is finished.
What remains is a bug encountered when planting into the forest a
fragment that intersects a packed node, which leads to invalid
forests. This will also cause problems when planting a packed
fragment, but so far my test grammars have not produced packed
fragments, so this problem has not been encountered yet.
I am still figuring out efficient ways to solve this problem.
|
|
Adding a grammar and a document for testing purposes.
|
|
Add more directories under control of autotools.
|
|
There were two main issues in the previous version.
One is that there are lots of duplications of nodes when manipulating
the forest. This does not mean that labels repeat: by the use of the
data type this cannot happen. What happened is that there were cloned
nodes whose children are exactly equal. In this case there is no need
to clone that node in the first place. This is now fixed by checking
carefully before cloning, so that we do not clone unnecessary nodes.
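A minimal sketch of the "check before cloning" idea, not the actual
implementation: the types `Node` and `Forest` and the method
`clone_with_children` are hypothetical names chosen for illustration.
Before creating a clone, the forest looks for an existing node with the
same label and exactly the same children, and reuses it if found.

```rust
/// Hypothetical forest node: a label, a clone index, and child indices.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Node {
    pub label: String,
    pub clone_index: usize,
    pub children: Vec<usize>,
}

/// Hypothetical forest: nodes are stored in a vector and referred to by index.
#[derive(Debug, Default)]
pub struct Forest {
    pub nodes: Vec<Node>,
}

impl Forest {
    /// Add a node; its clone index is the number of existing nodes
    /// that already carry the same label.
    pub fn add_node(&mut self, label: &str, children: Vec<usize>) -> usize {
        let clone_index = self.nodes.iter().filter(|n| n.label == label).count();
        self.nodes.push(Node {
            label: label.to_string(),
            clone_index,
            children,
        });
        self.nodes.len() - 1
    }

    /// Return a node that has the label of `node` and the given children.
    /// A new clone is created only if no such node exists yet.
    pub fn clone_with_children(&mut self, node: usize, children: Vec<usize>) -> usize {
        let label = self.nodes[node].label.clone();
        let existing = self
            .nodes
            .iter()
            .position(|n| n.label == label && n.children == children);
        match existing {
            Some(index) => index, // nothing to clone: reuse the existing node
            None => self.add_node(&label, children),
        }
    }
}
```

The linear scan keeps the sketch short; a real forest would more likely
look candidates up by label first.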
The other issue, which is perhaps more important, is that there are
nodes which are not closed. This means that when there should be a
reduction of grammar rules, the forest does not mark the corresponding
node as already reduced. The incorrect forests thus caused are hard
to fix: I tried several different approaches to fix them afterwards,
but all to no avail. I also tried to record enough information to fix
these nodes during the manipulations. It turned out that recording
nodes is a dead end, as I cannot properly synchronize the information
in the forest and the information in the chain-rule machine. Any
inconsistencies will result in incorrect operations later on.
The approach I finally adopted is to perform every possible reduction
at each step. This might lead to more nodes than we need. But those
are technically expected to be there after all, and it is easy to
filter them out, so it is fine, from my point of view at the moment.
Therefore, what remains is to filter those nodes out and connect it to
the holy Emacs. :D
|
|
Generally speaking the algorithm now works correctly and produces the
right shape of forest for the ambiguous test grammar as well. However,
it does not correctly perform the "reductions". It seems that I
deliberately disabled this part of the functionality in a previous
debugging session.
So I have to enable it again and see if it works.
|
|
|
|
I should have staged and committed these changes separately, but I am
too lazy to deal with that.
The first main change in this commit is that I added the derive macro
that automates the delegation of the Graph trait. This saves a lot of
boilerplate code.
The second main change, perhaps the most important one, is that I
found and tried to fix a bug that caused duplication of nodes. The
bug arises from splitting or cloning a node multiple times, and
immediately planting the same fragment under the new "sploned" node.
That is, when we try to splone the node again, we find that we need
to splone, because the node that was created by the same sploning
process now has a different label because of the planting of the
fragment. Then after the sploning, we plant the fragment again. This
makes the newly sploned node have the same label (except for the clone
index) and the same children as the node that was sploned and planted
in the previous rounds.
The fix is to check for the existence of a node that has the same set
of children as the about-to-be-sploned node, except for the last one,
which contains the about-to-be-planted fragment as a prefix. If that
is the case, treat it as an already existing node, so that we do not
have to splone the node again.
This is consistent with the principle of not creating what we do not
need.
|
|
|
|
* DIARY: Added a diary that might serve as a record of my thoughts.
|
|
The macro `graph_derive` can automatically write the boilerplate code
for wrapper types one of whose sub-fields implements the `Graph`
trait. The generated implementation will delegate the `Graph`
operations to the sub-field which implements the `Graph` trait.
I plan to add more macros, corresponding to various other
graph-related traits, so that no such boilerplate code is needed, at
least for my use-cases.
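To illustrate the kind of boilerplate such a macro saves, here is a
hand-written delegation under stated assumptions: the `Graph` trait
below is a simplified stand-in with made-up methods, and `AdjGraph` and
`Forest` are hypothetical types. The last `impl` block is what a derive
macro of this kind could plausibly generate; it is not the output of
`graph_derive` itself.

```rust
/// Simplified, hypothetical stand-in for the `Graph` trait.
pub trait Graph {
    fn nodes_len(&self) -> usize;
    fn children_of(&self, node: usize) -> Vec<usize>;
    fn has_edge(&self, source: usize, target: usize) -> bool;
}

/// A plain adjacency-list graph that implements `Graph` directly.
#[derive(Debug, Default)]
pub struct AdjGraph {
    pub adjacency: Vec<Vec<usize>>,
}

impl AdjGraph {
    pub fn add_node(&mut self) -> usize {
        self.adjacency.push(Vec::new());
        self.adjacency.len() - 1
    }

    pub fn add_edge(&mut self, source: usize, target: usize) {
        if !self.adjacency[source].contains(&target) {
            self.adjacency[source].push(target);
        }
    }
}

impl Graph for AdjGraph {
    fn nodes_len(&self) -> usize {
        self.adjacency.len()
    }

    fn children_of(&self, node: usize) -> Vec<usize> {
        self.adjacency.get(node).cloned().unwrap_or_default()
    }

    fn has_edge(&self, source: usize, target: usize) -> bool {
        self.adjacency
            .get(source)
            .map_or(false, |children| children.contains(&target))
    }
}

/// A wrapper type, one of whose sub-fields implements `Graph`.
#[derive(Debug, Default)]
pub struct Forest {
    pub graph: AdjGraph,
    pub root: Option<usize>,
}

// The boilerplate: every method simply forwards to the `graph` field.
impl Graph for Forest {
    fn nodes_len(&self) -> usize {
        self.graph.nodes_len()
    }

    fn children_of(&self, node: usize) -> Vec<usize> {
        self.graph.children_of(node)
    }

    fn has_edge(&self, source: usize, target: usize) -> bool {
        self.graph.has_edge(source, target)
    }
}
```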
|
|
Finished the function that performs extra reductions.
Still untested though.
|
|
In the chain-rule machine, we need to skip through edges whose labels
are "accepting", otherwise the time complexity will be high even for
simple grammars. This implies that we will skip some "jumping up" in
the item derivation forest. So we need to record these extra jumping
up, in order to jump up at a later point.
This Reducer type plays this role. But I still need more experiments
to see if this approach works out as I intended.
|
|
|
|
* chain/src/default.rs:
* chain/src/lib.rs: Add a parameter that controls whether or not the
chain-rule machine computes the item derivation forest as well.
Sometimes we only need to recognize whether an input belongs to the
grammar, but do not care about the derivations. This parameter can
speed up the machine in that case.
|
|
* chain/src/default.rs: Add a plan to fix things.
|
|
* chain/src/item/genins.rs: Some minor fixes according to clippy.
|
|
I have decided to adopt a new approach to recording and updating item
derivation forests. Since this affects a lot of things, I am
committing before the refactor, so that I can create a branch for that
refactor.
|
|
Previously there was a minor bug: if the chain-rule machine ended in a
node without children, and that node should be accepting because of
edges that have no children and hence were ignored, then it would be
regarded as not accepting, precisely because it has no children. Now
this issue is fixed by introducing real and imaginary edges, where an
imaginary edge is used to determine the acceptance of nodes without
children.
|
|
Previously cloning a node did not alter the root of the forest,
whereas it should alter the root if the cloned node was the root. This
affected how we compare forests for equality. It indeed resulted in
anomalies that were hard to solve.
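A minimal sketch of the root fix, assuming a hypothetical `Forest` that
stores nodes in a vector and tracks its root by index. The names `Node`,
`Forest`, and `clone_node` are illustrative, and redirecting parent
edges to the clone is left out on purpose.

```rust
/// Hypothetical forest node.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Node {
    pub label: String,
    pub clone_index: usize,
    pub children: Vec<usize>,
}

/// Hypothetical forest with an explicit root.
#[derive(Debug, Default)]
pub struct Forest {
    pub nodes: Vec<Node>,
    pub root: Option<usize>,
}

impl Forest {
    pub fn add_node(&mut self, label: &str, children: Vec<usize>) -> usize {
        let clone_index = self.nodes.iter().filter(|n| n.label == label).count();
        self.nodes.push(Node {
            label: label.to_string(),
            clone_index,
            children,
        });
        self.nodes.len() - 1
    }

    /// Clone `node` and return the index of the clone.
    /// If `node` was the root, the clone becomes the new root.
    pub fn clone_node(&mut self, node: usize) -> usize {
        let label = self.nodes[node].label.clone();
        let children = self.nodes[node].children.clone();
        let new = self.add_node(&label, children);
        if self.root == Some(node) {
            // Without this update, two forests that differ only in which
            // clone is the root could not be compared reliably.
            self.root = Some(new);
        }
        new
    }
}
```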
|
|
I need more than the ability to clone nodes: I also need to split the
nodes. Now this seems to be correctly added.
|
|
Finally the prototype parser has produced the first correct forest.
In fact, this is the first time I have generated a correct forest
since the beginning of this project.
|
|
It seems to be complete now, but still awaits more tests to see where
the errors are, which should be plenty, haha.
|
|
Now the forest can detect if a node is packed or cloned, and correctly
clones a node in those circumstances. But it still needs to be
tested.
|
|
It seems the performance is indeed linear for a simple grammar.
This is such a historical moment, for me, that I think it deserves a
separate commit, haha.
|
|
I have an ostensibly working prototype now.
Further tests are needed to make sure that the algorithm meets the
time complexity requirement, though.
|
|
I seem to have finished the implementation of forests. What remains
is the implementation of the chain-rule machine, for which I have a
rough plan now.
|
|
An attempt to write a derive macro to automatically derive the various
graph traits.
|
|
Now the grammar will record the left-linear expansions when generating
the nondeterministic finite automaton from its rules, and will record
whether an edge in the nondeterministic finite automaton comes from a
left-linear expansion. The latter is needed because while performing
a chain-rule derivation, we do not need the left-linear expanded
derivations in the "first layer". This might well have been the root
cause of the bad performance of the previous version of this package.
Also I have figured out how to properly generate and handle parse
forests while manipulating the "chain-rule machine".
|
|
I am about to re-start my system, so I save before any crashes
happen.
|
|
Now I have a new type of labelled graphs, which can index vertices by
labels, but not index edges by labels. The biggest difference is that
I do not have to keep a hashmap of edge targets by labels, and I do
not have to guard against the duplication of nodes with the same set
of edges. I guard against nodes with the same label, though.
Also, in this graph, both vertices and edges have one label at a time,
whereas in the previous labelled graph there can be a multitude of
edges between the same source and target nodes, but with different
labels.
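A rough sketch of a graph with these properties, under assumptions: the
type `LabelledGraph` and its methods are hypothetical names, and the
actual data structure in this package may be organized differently.
Vertices are indexed by label through a hash map, adding an existing
label returns the existing vertex, and each pair of source and target
carries at most one edge label.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical labelled graph: vertices can be looked up by label,
/// edges cannot.
#[derive(Debug)]
pub struct LabelledGraph<V, E> {
    labels: Vec<V>,
    index: HashMap<V, usize>,
    // edges[source] maps a target to the single label of that edge.
    edges: Vec<HashMap<usize, E>>,
}

impl<V: Clone + Eq + Hash, E> LabelledGraph<V, E> {
    pub fn new() -> Self {
        Self {
            labels: Vec::new(),
            index: HashMap::new(),
            edges: Vec::new(),
        }
    }

    /// Add a vertex, guarding against duplicated labels: if the label
    /// is already present, the existing vertex is returned.
    pub fn add_vertex(&mut self, label: V) -> usize {
        if let Some(&existing) = self.index.get(&label) {
            return existing;
        }
        let vertex = self.labels.len();
        self.labels.push(label.clone());
        self.index.insert(label, vertex);
        self.edges.push(HashMap::new());
        vertex
    }

    /// Look a vertex up by its label.
    pub fn vertex(&self, label: &V) -> Option<usize> {
        self.index.get(label).copied()
    }

    pub fn label_of(&self, vertex: usize) -> Option<&V> {
        self.labels.get(vertex)
    }

    pub fn vertices_len(&self) -> usize {
        self.labels.len()
    }

    /// Set the label of the edge from `source` to `target`, returning
    /// the label it replaced, if any. Panics if `source` is out of range.
    pub fn add_edge(&mut self, source: usize, target: usize, label: E) -> Option<E> {
        self.edges[source].insert(target, label)
    }

    pub fn edge_label(&self, source: usize, target: usize) -> Option<&E> {
        self.edges.get(source)?.get(&target)
    }
}
```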
Now it remains to test this type of graphs, and to think through how
we attach forest fragments to nondeterministic finite automata edges,
and how to join forest fragments together while skipping nullable
edges, in order to finish the "compilation" part.
|
|
I put functionalities that are not strictly core to separate crates,
so that the whole package becomes more modular, and makes it easier to
try other parsing algorithms in the future.
Also I have to figure the forests out before finishing the core
chain-rule algorithm, as the part about forests affects the labels of
the grammars directly. From my experience in writing the previous
version, it is asking for trouble to change the label type
dramatically at a later point: too many places need to be changed.
Thus I decide to figure the rough part of forests out.
Actually I only have to figure out how to attach forest fragments to
edges of the underlying atomic languages, and the more complex parts
of putting forests together can be left to the recorders, which is my
vision of assembling semi-ring values during the chain-rule machine.
It should be relatively easy to produce forest fragments from
grammars since we are just trying to extract some information from the
grammar, not to manipulate that information in some complicated way.
We have to do some manipulations in the process, though, in order to
make sure that the nulling and epsilon-removal processes do not
invalidate these fragments.
|
|
Some changes:
- The core crate is renamed to "chain".
- The crate "viz" is added, which will provide layered graph drawing
algorithms.
- A function is added to convert from a grammar to the regular
language of its left-linear closures.
- A function is added to convert from a nondeterministic finite
automaton to its "null" closure. A null closure is the same
automaton with edges added, as if some edges were "null". Whether an
edge is null is determined by a function.
Combined with the previous change, we can convert a grammar to the
regular language of the null closure of its left-linear closures.
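One possible reading of the null closure, sketched under assumptions:
the automaton is represented as a set of `(source, label, target)`
triples, and the function `null_closure` and the alias `Edge` are
hypothetical names. Whenever an edge from `a` to `b` is null, every edge
leaving `b` is copied so that it also leaves `a`, and this is repeated
until nothing new is added. The actual function in this package may
define the closure differently.

```rust
use std::collections::BTreeSet;

/// Hypothetical edge representation: (source, label, target).
pub type Edge = (usize, char, usize);

/// Add edges to the automaton as if the edges selected by `is_null`
/// could be traversed without consuming input.
pub fn null_closure<F>(edges: &BTreeSet<Edge>, is_null: F) -> BTreeSet<Edge>
where
    F: Fn(&Edge) -> bool,
{
    let mut closure = edges.clone();
    loop {
        let mut added = Vec::new();
        for &(a, label, b) in &closure {
            if !is_null(&(a, label, b)) {
                continue;
            }
            // Skipping the null edge a -> b: whatever leaves b may leave a.
            for &(source, next_label, c) in &closure {
                if source == b && !closure.contains(&(a, next_label, c)) {
                    added.push((a, next_label, c));
                }
            }
        }
        if added.is_empty() {
            // Fixed point reached: no more edges can be added.
            return closure;
        }
        closure.extend(added);
    }
}
```

The loop terminates because only finitely many triples can be built from
the existing nodes and labels, and each round adds at least one new
triple.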
---
Now it remains to test more grammars and add an Atom trait, before
finishing the part about compilations.
|
|
just to save things in a commit
|
|
Basic GNU standard files are added, and we now stop worrying about
monadic anamorphisms.
The current focus is on testing the correctness of the algorithm, so I
need convenient support for manipulating, interpreting, examining, and
perchance animating nondeterministic automata.
|