From b2ca9a285f09d90259aaa7c4d09bb7335211020b Mon Sep 17 00:00:00 2001
From: JSDurand <mmemmew@gmail.com>
Date: Thu, 10 Aug 2023 11:14:54 +0800
Subject: Modify README

Working my way towards a user-friendly introduction to the package.
---
 README | 207 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 207 insertions(+)

(limited to 'README')

diff --git a/README b/README
index 0da3d7a..1f7a05d 100644
--- a/README
+++ b/README
@@ -206,9 +206,216 @@ format, of course.
 Algorithm
 ---------
 
+[ This section is a little (too) involved.  Those who are not
+  interested in the details of the algorithm can skip to the next
+  section "Tokenization" directly, by searching for a continuous
+  sequence of the minus character '-' to jump to the next section, for
+  example. ]
+
 This package uses an algorithm that is not widely adapted, at least
 not to my knowledge.  So I think I will briefly explain how it works.
 
+Languages and derivatives
+=========================
+
+To understand the algorithm, we shall first talk about the derivatives
+of languages.
+
+For the sake of convenience, let Sigma denote a finite set of
+alphabets.  The set of all Unicode code points can be an example.  We
+call a sequence of elements in Sigma a "sentence", and we call a
+(possibly infinite) set of sentences a "language".  For example, the
+set of all grammatically correct English sentences is a language, to
+which we may give the name "English".
+
+Now, if L is a language and s is a sentence, we define the "derivative
+of L with respect to s", denoted as dL/ds, as the set of all sentences
+"t" such that the concatenated sentence "s . t" belongs to L.  We can
+re-formulate this definition in more mathematical terms:
+
+        -----------------------------------------------------
+        | dL/ds := { t is a sentence | s . t belongs to L } |
+        -----------------------------------------------------
+
+As a side note, the reason to call it the derivative is not only
+definitional similarity (to some extent), but also that this operation
+of taking the derivative of a language with respect to a sentence
+satisfies some similar properties as the "derivatives in Calculus".
+
+Parsing by taking derivatives
+=============================
+
+Why consider derivatives in a document about parser generators?
+Because we can parse sentences by taking derivatives.
+
+More precisely, let L be a language and let s be a sentence.  If the
+derivative dL/ds of L by s (which is a set of sentences) contains the
+empty sentence, then by definition this means that s belongs to L.
+This observation is the basis of parsing by taking derivatives.
+
+[ Those who have a rigorous mind are probably frowning at the above
+  paragraph because we say we can parse by taking derivatives, whereas
+  we actually only determine if the sentence belongs to the language.
+  This gap comes from that the details of actually constructing the
+  parse forests by taking derivatives are too complicated for me to
+  explain clearly here.  I might come up with a clear explanation in
+  the future, though. ]
+
+Regular languages
+=================
+
+With the above definition in mind, we now say that a language L is
+"regular" if and only if the set of derivatives of L is finite, i.e.
+
+              ------------------------------------------
+              | { dL/ds | s is a sentence } is finite. |
+              ------------------------------------------
+
+Note that this is a set of languages, namely, a set of sets of
+sentences.  And we say it is finite if it contains finitely many sets.
+
+For those familiar with regular expressions, a language is regular if
+and only if there is a regular expression such that a sentence belongs
+to the language if and only if it matches that regular expression.  I
+chose to define regular languages using derivatives as it is easier to
+reason about them in the following, in my opinion.
+
+Languages of grammars
+=====================
+
+If G is a grammar, we say that "G generates a language L", where L
+consists of all sentences generated by the grammar.  Denote the
+language generated by G as L_G.
+
+Derivatives of the languages of grammars
+========================================
+
+If s is a sentence, We ask the question:
+
+               ---------------------------------------
+               | What is the derivative of L_G by s? |
+               ---------------------------------------
+
+To answer this, note that if s and t are sentences, and if L is any
+language, then dL/d(s . t) is equal to d( dL/ds )/dt, the derivative
+by t of the derivative by s of L.  This can be seen by definition: a
+sentence r belongs to dL/d(s . t) if and only if (s . t . r) belongs
+to L, which is exactly the condition for (t . r) to belong to dL/ds.
+And this is the condition for r to belong to d( dL/ds )/dt.
+
+The above observation tells us that we can obtain dL_G/ds by taking
+derivatives of L_G by sentences consisting of one single alphabet,
+successively.
+
+Now consider dL_G/da.
+
+Suppose G consists of n rules
+
+                      --------------------------
+                      | X_1 => Y_1 ... Y_{m_1} |
+                      | ... => ...             |
+                      | X_n => Z_1 ... Z_{m_n} |
+                      --------------------------
+
+where X_i are variables, m_i is an integer depending on i, for i from
+1 to n, and Y_1, ..., Y_{m_1}, ..., Z_1, ..., Z_{m_n} are either
+variables or constants.
+
+It is then natural to try to determine dL_G/da by some kind of
+substitution.  So for i from 1 to n, let X_i also denote the language
+that is generated if we treat the variable X_i as the starting
+variable in the grammar.  For instance, if X_1 is the original
+starting variable for the grammar G, then L_G is also regarded as
+equal to X_1 under this view.
+
+Further, we observe that if both M_1 and M_2 are languages, and if M
+is their concatenation:
+
+          -------------------------------------------------
+          | M = { s . t | s belongs to M_1 and t to M_2 } |
+          -------------------------------------------------
+
+Then dM/da is equal to either
+
+                        ---------------------
+                        | ( dM_1/da ) . M_2 |
+                        ---------------------
+
+or
+
+           -----------------------------------------------
+           | the union of ( dM_1/da ) . M_2 and dM_2/da, |
+           -----------------------------------------------
+
+depending on whether or not M_1 contains the empty sentence.
+
+Consequently, the derivative dX_1/da is equal to the union of
+
+                  ----------------------------------
+                  | dY_1/da . Y_2 ... Y_{m_1},     |
+                  | ...,                           |
+                  | dY_i/da . Y_{i+1} ... Y_{m_1}, |
+                  ----------------------------------
+
+where i is the unique integer such that the language Y_i does not
+contain the empty sentence.  If no such i exists, then i = m_1.  And
+similarly for dX_2/da, ..., dX_n/da.
+
+Now we treat the languages dY_i/da as variables: if Y_i is a variable,
+this denotes the derivative by a of the language generated by Y_i; if
+Y_i is a constant, then this denotes either the language consisting of
+the empty sentence, or the empty set, depending upon whether Y_i is
+equal to a or not.  Then the previous paragraph defines the
+substitution rules of these new variables dY_i/da.  We call these
+rules as "the derived rules" of dY_i/da.
+
+It follows that we can obtain the derivative
+
+                        ---------------------
+                        | dL_G/da = dX_1/da |
+                        ---------------------
+
+by "substituting" these dY_i/da according to the derived rules.
+
+Combining the above, we see that the derivatives of the language L_G
+by any sentence are generated by grammars.  We denote the above
+grammar, which generates dL_G/da, as "the derived grammar" dG/da.  Now
+our problem is transformed to determine if the successive derivatives
+of grammars dG/da_1, d( dG/da_1 )/da_2, et cetera, generates the empty
+sentence.  This is relatively easy to do.
+
+Chain-rule of derivatives
+=========================
+
+All of the above was not new when the author proposed the algorithm.
+The core idea of the algorithm, in my opinion, is the "chain-rule" of
+the operation of taking derivatives of languages.
+
+In calculus, roughly speaking, the chain-rule of derivatives refers to
+the formula
+
+       --------------------------------------------------------
+       | df/dx = df/dy_1 . dy_1/dx + ... + df/dy_n . dy_n/dx, |
+       --------------------------------------------------------
+
+where f is a function in n variables y_1, ..., y_n, which are also
+functions in the variable x.
+
+If we replace the function f by a language L generated by a grammar G,
+the variables y_1, ..., y_n by the variables in G, x by an alphabet,
+the multiplication "." by the concatenation of languages, and the
+summation "+" by the union of languages, then we obtain a formula
+writing the derivative dG/da as a union of the concatenations
+
+                        ----------------------
+                        | dG/dy_i . dy_i/da, |
+                        ----------------------
+
+for i from 1 to n.  The only remaining question is:
+
+                     ---------------------------
+                     | What does dG/dy_i mean? |
+                     ---------------------------
 
 TODO: Provide more details.
 
-- 
cgit v1.2.3-18-g5258