Save before system restart.

I am about to re-start my system, so I save before any crashes happen.
author: JSDurand <mmemmew@gmail.com> 2023-01-06 23:42:28 +0800
committer: JSDurand <mmemmew@gmail.com> 2023-01-06 23:42:28 +0800
commit: f27d604d93ce583d4404e1874664e08382ea2f00 (patch)
tree: 6fa8df26af954e94f3604ffabde4961ee8108c41 /forest/src/design.org
parent: 7dd4935230e303aef8d295d992239d59d95b32d7 (diff)
1 files changed, 66 insertions, 0 deletions
diff --git a/forest/src/design.org b/forest/src/design.org
new file mode 100644
index 0000000..771ca4b
--- /dev/null
+++ b/forest/src/design.org
@@ -0,0 +1,66 @@
+#+TITLE: Format of a binarised shared packed parse forests
+#+AUTHOR: Durand
+#+DATE: <2023-01-05 Jeu 23:55>
+#+STARTUP: content
+
+* Introduction
+
+I want to design a format for shared packed parse forests.
+They are needed for the chain-rule machine because the graphs need to
+be labelled (indirectly) by fragments of forests so that we can
+correctly and efficiently compute the semi-ring values along the
+chain-rule machine operations.  Moreover, what we are really
+interested is the parse forests, so we will need to deal with this
+topic sooner or later, why not deal with it now?  ;P
+
+* Principles
+
+- The number of children of nodes is bounded by a constant, in theory.
+  So it is only bounded by the computer hardware (and the grammar):
+  this representation is closer to the grammar rules.
+- Labels of nodes are grammar slots, that is positions within the
+  rules.
+- Edges have no labels: they are simply not needed.
+- We need to have two lists of "plugs" that we can plug other forests
+  in.  For this we also need to know if a label is labelled on a node
+  that is a plug.  This implies the use of hashmaps I guess.
+- When there are alternations in the forest, we use a "packed node" to
+  represent this alternation.  Packed nodes are not labelled.  They
+  just serve the role of putting nodes together.
+
+* Some thoughts [1/3]
+
+We do not need to attach forest fragments to nodes of nondeterministic
+finite automata: we just attach to them lists of grammar slots, which
+represent a sequence of "nulling out" nullable non-terminals, until
+the main non-terminal appears as the last element of the list (or even
+a range of slots to be even more efficient).  A normal node would have
+a range with one element.
+
+That is it!  We do not need to worry about the forests now as we would
+know how to combine the forest segments when we are performing the
+chain-rule operations: we know exactly when we are expanding
+non-terminals.
+
+Some caveats:
+
+- [X] Every node in the forest needs to know its parents.  This is
+  needed for "reductions".  That is, after we have parsed the entire
+  expansion for a non-terminal, we need to go back to where that
+  non-terminal was and continue from there.
+- [ ] We need to record the expansions of those "virtual nodes".  That
+  is, the virtual nodes represent atomic languages such as [s]^{(t)}
+  where s is a non-terminal and t is a terminal.  To be more specific,
+  it represents the derivative of the left-linear closure of the
+  non-terminal s with respect to the terminal t.  The left-linear
+  closure of s is the set of all strings (consisting of terminals and
+  non-terminals alike) that are derivable from s alone, without
+  consuming any inputs, by expanding according to the grammar rules.
+  This means that we need to know if which non-terminals were expanded
+  in order to get to a state in [s]^{(t)}.
+- [ ] Each edge in the chain-rule machine needs to be labelled also
+  with a position in the forest.  This perfectly solves the problem of
+  those "plugs"!
+
+
+
author	JSDurand <mmemmew@gmail.com>	2023-01-06 23:42:28 +0800
committer	JSDurand <mmemmew@gmail.com>	2023-01-06 23:42:28 +0800
commit	f27d604d93ce583d4404e1874664e08382ea2f00 (patch)
tree	6fa8df26af954e94f3604ffabde4961ee8108c41 /forest/src/design.org
parent	7dd4935230e303aef8d295d992239d59d95b32d7 (diff)