Modify README

Working my way towards a user-friendly introduction to the package.
author: JSDurand <mmemmew@gmail.com> 2023-08-10 11:14:54 +0800
committer: JSDurand <mmemmew@gmail.com> 2023-08-10 11:14:54 +0800
commit: b2ca9a285f09d90259aaa7c4d09bb7335211020b (patch)
tree: ce18a3a79a6456b6d584accda888fe5f12add1f0
parent: b8a2d05a3c0d835556d5ddbd44e4a1e201302af5 (diff)
3 files changed, 1198 insertions, 0 deletions
diff --git a/README b/README
index 0da3d7a..1f7a05d 100644
--- a/README
+++ b/README
@@ -206,9 +206,216 @@ format, of course.
 Algorithm
 ---------
 
+[ This section is a little (too) involved.  Those who are not
+  interested in the details of the algorithm can skip to the next
+  section "Tokenization" directly, by searching for a continuous
+  sequence of the minus character '-' to jump to the next section, for
+  example. ]
+
 This package uses an algorithm that is not widely adapted, at least
 not to my knowledge.  So I think I will briefly explain how it works.
 
+Languages and derivatives
+=========================
+
+To understand the algorithm, we shall first talk about the derivatives
+of languages.
+
+For the sake of convenience, let Sigma denote a finite set of
+alphabets.  The set of all Unicode code points can be an example.  We
+call a sequence of elements in Sigma a "sentence", and we call a
+(possibly infinite) set of sentences a "language".  For example, the
+set of all grammatically correct English sentences is a language, to
+which we may give the name "English".
+
+Now, if L is a language and s is a sentence, we define the "derivative
+of L with respect to s", denoted as dL/ds, as the set of all sentences
+"t" such that the concatenated sentence "s . t" belongs to L.  We can
+re-formulate this definition in more mathematical terms:
+
+        -----------------------------------------------------
+        | dL/ds := { t is a sentence | s . t belongs to L } |
+        -----------------------------------------------------
+
+As a side note, the reason to call it the derivative is not only
+definitional similarity (to some extent), but also that this operation
+of taking the derivative of a language with respect to a sentence
+satisfies some similar properties as the "derivatives in Calculus".
+
+Parsing by taking derivatives
+=============================
+
+Why consider derivatives in a document about parser generators?
+Because we can parse sentences by taking derivatives.
+
+More precisely, let L be a language and let s be a sentence.  If the
+derivative dL/ds of L by s (which is a set of sentences) contains the
+empty sentence, then by definition this means that s belongs to L.
+This observation is the basis of parsing by taking derivatives.
+
+[ Those who have a rigorous mind are probably frowning at the above
+  paragraph because we say we can parse by taking derivatives, whereas
+  we actually only determine if the sentence belongs to the language.
+  This gap comes from that the details of actually constructing the
+  parse forests by taking derivatives are too complicated for me to
+  explain clearly here.  I might come up with a clear explanation in
+  the future, though. ]
+
+Regular languages
+=================
+
+With the above definition in mind, we now say that a language L is
+"regular" if and only if the set of derivatives of L is finite, i.e.
+
+              ------------------------------------------
+              | { dL/ds | s is a sentence } is finite. |
+              ------------------------------------------
+
+Note that this is a set of languages, namely, a set of sets of
+sentences.  And we say it is finite if it contains finitely many sets.
+
+For those familiar with regular expressions, a language is regular if
+and only if there is a regular expression such that a sentence belongs
+to the language if and only if it matches that regular expression.  I
+chose to define regular languages using derivatives as it is easier to
+reason about them in the following, in my opinion.
+
+Languages of grammars
+=====================
+
+If G is a grammar, we say that "G generates a language L", where L
+consists of all sentences generated by the grammar.  Denote the
+language generated by G as L_G.
+
+Derivatives of the languages of grammars
+========================================
+
+If s is a sentence, We ask the question:
+
+               ---------------------------------------
+               | What is the derivative of L_G by s? |
+               ---------------------------------------
+
+To answer this, note that if s and t are sentences, and if L is any
+language, then dL/d(s . t) is equal to d( dL/ds )/dt, the derivative
+by t of the derivative by s of L.  This can be seen by definition: a
+sentence r belongs to dL/d(s . t) if and only if (s . t . r) belongs
+to L, which is exactly the condition for (t . r) to belong to dL/ds.
+And this is the condition for r to belong to d( dL/ds )/dt.
+
+The above observation tells us that we can obtain dL_G/ds by taking
+derivatives of L_G by sentences consisting of one single alphabet,
+successively.
+
+Now consider dL_G/da.
+
+Suppose G consists of n rules
+
+                      --------------------------
+                      | X_1 => Y_1 ... Y_{m_1} |
+                      | ... => ...             |
+                      | X_n => Z_1 ... Z_{m_n} |
+                      --------------------------
+
+where X_i are variables, m_i is an integer depending on i, for i from
+1 to n, and Y_1, ..., Y_{m_1}, ..., Z_1, ..., Z_{m_n} are either
+variables or constants.
+
+It is then natural to try to determine dL_G/da by some kind of
+substitution.  So for i from 1 to n, let X_i also denote the language
+that is generated if we treat the variable X_i as the starting
+variable in the grammar.  For instance, if X_1 is the original
+starting variable for the grammar G, then L_G is also regarded as
+equal to X_1 under this view.
+
+Further, we observe that if both M_1 and M_2 are languages, and if M
+is their concatenation:
+
+          -------------------------------------------------
+          | M = { s . t | s belongs to M_1 and t to M_2 } |
+          -------------------------------------------------
+
+Then dM/da is equal to either
+
+                        ---------------------
+                        | ( dM_1/da ) . M_2 |
+                        ---------------------
+
+or
+
+           -----------------------------------------------
+           | the union of ( dM_1/da ) . M_2 and dM_2/da, |
+           -----------------------------------------------
+
+depending on whether or not M_1 contains the empty sentence.
+
+Consequently, the derivative dX_1/da is equal to the union of
+
+                  ----------------------------------
+                  | dY_1/da . Y_2 ... Y_{m_1},     |
+                  | ...,                           |
+                  | dY_i/da . Y_{i+1} ... Y_{m_1}, |
+                  ----------------------------------
+
+where i is the unique integer such that the language Y_i does not
+contain the empty sentence.  If no such i exists, then i = m_1.  And
+similarly for dX_2/da, ..., dX_n/da.
+
+Now we treat the languages dY_i/da as variables: if Y_i is a variable,
+this denotes the derivative by a of the language generated by Y_i; if
+Y_i is a constant, then this denotes either the language consisting of
+the empty sentence, or the empty set, depending upon whether Y_i is
+equal to a or not.  Then the previous paragraph defines the
+substitution rules of these new variables dY_i/da.  We call these
+rules as "the derived rules" of dY_i/da.
+
+It follows that we can obtain the derivative
+
+                        ---------------------
+                        | dL_G/da = dX_1/da |
+                        ---------------------
+
+by "substituting" these dY_i/da according to the derived rules.
+
+Combining the above, we see that the derivatives of the language L_G
+by any sentence are generated by grammars.  We denote the above
+grammar, which generates dL_G/da, as "the derived grammar" dG/da.  Now
+our problem is transformed to determine if the successive derivatives
+of grammars dG/da_1, d( dG/da_1 )/da_2, et cetera, generates the empty
+sentence.  This is relatively easy to do.
+
+Chain-rule of derivatives
+=========================
+
+All of the above was not new when the author proposed the algorithm.
+The core idea of the algorithm, in my opinion, is the "chain-rule" of
+the operation of taking derivatives of languages.
+
+In calculus, roughly speaking, the chain-rule of derivatives refers to
+the formula
+
+       --------------------------------------------------------
+       | df/dx = df/dy_1 . dy_1/dx + ... + df/dy_n . dy_n/dx, |
+       --------------------------------------------------------
+
+where f is a function in n variables y_1, ..., y_n, which are also
+functions in the variable x.
+
+If we replace the function f by a language L generated by a grammar G,
+the variables y_1, ..., y_n by the variables in G, x by an alphabet,
+the multiplication "." by the concatenation of languages, and the
+summation "+" by the union of languages, then we obtain a formula
+writing the derivative dG/da as a union of the concatenations
+
+                        ----------------------
+                        | dG/dy_i . dy_i/da, |
+                        ----------------------
+
+for i from 1 to n.  The only remaining question is:
+
+                     ---------------------------
+                     | What does dG/dy_i mean? |
+                     ---------------------------
 
 TODO: Provide more details.
 
diff --git a/algorithm.tex b/algorithm.tex
new file mode 100644
index 0000000..20fb1ac
--- /dev/null
+++ b/algorithm.tex
@@ -0,0 +1,486 @@
+\documentclass[12pt]{article}
+\usepackage[utf8]{inputenc}
+\usepackage[T1]{fontenc}
+\usepackage{graphicx}
+\usepackage{grffile}
+\usepackage{longtable}
+\usepackage{wrapfig}
+\usepackage{rotating}
+\usepackage[normalem]{ulem}
+\usepackage{amsmath}
+\usepackage{mathrsfs}
+\usepackage{textcomp}
+\usepackage{amssymb}
+\usepackage{capt-of}
+\usepackage{hyperref}
+\usepackage{amsfonts, amsthm}
+\usepackage{enumitem}
+\usepackage{tikz}
+\author{Durand}
+\date{2023--08--05}
+\title{\vspace{-5ex}Algorithm used by the package REP}
+\hypersetup{
+  pdfauthor={Durand},
+  pdftitle={Algorithm used by the package REP},
+  pdfkeywords={},
+  pdfsubject={},
+  pdfcreator={Durand},
+  pdflang={English}}
+
+\newtheorem{lem}{Lemma}[section]
+\newtheorem{coro}[lem]{Corollary}
+\newtheorem{thm}{Theorem}[section]
+\newtheorem{prop}{Proposition}[section]
+\newtheorem{ex}{Exercise}[section]
+\theoremstyle{definition}
+\newtheorem*{rmk}{Remark}
+\newtheorem{defn}{Definition}[section]
+
+\newcommand{\n}{\mathbb{N}}
+\newcommand{\z}{\mathbb{Z}}
+\newcommand{\q}{\mathbb{Q}}
+\newcommand{\im}{\operatorname{Im}}
+\renewcommand{\labelenumi}{\arabic{enumi}.}
+\renewcommand{\labelenumii}{(\alph{enumii})}
+\begin{document}
+\maketitle
+
+\tableofcontents
+
+\begin{abstract}
+  This package uses an algorithm that is not widely adapted, at least
+  not to my knowledge.  So I think I will briefly explain how it
+  works.
+
+  The essence of the algorithm, in my opinion, is the chain-rule for
+  derivatives of languages.  And an optimization technique that makes
+  a great difference in the performance of the algorithm, not yet
+  implemented in this package, is the \textbf{context-free
+    memoization}.
+\end{abstract}
+
+\section{Grammars}\label{sec:grammars}
+
+Before explaining what parsers are, we need first to explain what
+grammars are.
+
+In short, a grammar can be viewed as a \emph{substitution machine}.
+That is, a grammar consists of a set of variables, constants, and
+rules
+
+\begin{equation*}
+X \mapsto Y \ldots Z,
+\end{equation*}
+
+where \(X\) is a variable and \(Y \ldots Z\) are either variables or
+constants.  The grammar then substitutes every occurence of \(X\) by
+the sequence at the right-hand side \(Y\ldots Z\).  And there is a
+\emph{starting variable}, usually denoted as \(S\).
+
+To sum up, we start from the sequence \(S\), and perform substitutions
+according to the rules.  If the substituted sequence cannot be further
+substituted, i.e. if the sequence consists of constants only, we say
+that this constant-only sequence is "generated" by the grammar.
+
+One can think of this substitution machine as macros in \TeX, except
+that this only allow macros without arguments.  Or one can think of
+the machine as macros in programming languages like \texttt{Lisp},
+again without arguments.
+
+Notice that a variable can have multiple rules to substitute, and a
+given sequence can be generated by different sets of substitutions.
+
+\section{Parsers}\label{sec:parsers}
+
+Parsers are \emph{inverses} of grammars.  More precisely, while
+grammars perform substitutions to generate sequences, parsers try to
+find out what substitutions the grammar performed in order to generate
+a given sequence.  Since there might be multiple ways to generate a
+given sequence, the parser needs to find all possible ways.
+
+In fact, the number of ways to generate a sequence can even be
+infinite.  For example, if the grammar has the following two rules:
+
+\begin{align*}
+S &\mapsto S\\
+S &\mapsto "".
+\end{align*}
+
+This grammar means that \(S\) can be subtituted by \(S\) itself, and
+it can be substituted by the empty sequence as well.  In this case,
+the grammar can only generate the empty sequence, but it can generate
+the empty sequence by infinitely many ways: it can first substitute
+\(S\) by itself any number of times, until it substitutes \(S\) by the
+empty sequence.
+
+Moreover, even if we do not consider the generations that produce
+empty sequences, the number of ways a grammar can generate a given
+sequence is in general \(O(n^3)\), where \(O\) means the
+\textbf{big-O} notation, and \(n\) is the length of the sequence to be
+generated.  This is why efficient general parsers that work for every
+grammar are hard to construct.
+
+\section{Parser Generators}\label{sec:parser-generators}
+
+Now we come to the topic of this package: parser generators.  Simply
+speaking, a parser generator takes a grammar as input, and produces a
+parser for that grammar as output.
+
+One can classify parser generators into two types: comiled and
+interpreted.  The former means that the generator produces source
+codes of a parser that needs to be compiled to produce a parser.  The
+latter means that the generator simply produces a parser that needs to
+be run by an interpreter.  The well-known \emph{tree-sitter} parser
+generator belongs to the first category, for example.
+
+This package is interpreted.  But there is a plan to also turn on a
+\emph{compiled mode} to potentially make the generated parsers more
+performant.
+
+\section{Augmented Backus-Naur Form}\label{sec:augm-back-naur}
+
+There are various ways to write grammars.  This package chooses the
+\emph{augmented backus-naur form}.  See
+\href{https://en.wikipedia.org/wiki/Augmented_Backus-Naur_Form}{Wiki}
+for an overview, and see
+\href{https://datatracker.ietf.org/doc/html/rfc5234}{RFC 5234} for its
+detailed definition, or
+\href{https://www.rfc-editor.org/rfc/rfc5234.txt}{the text version}.
+
+The reason for this choice is that the \textbf{RFC documents} use this
+format, so there are an abundance of mature grammars to use.  For
+example, the ABNF format itself has a grammar in the ABNF format, so
+we can use the generated parser to parse ABNF formats, immediately.
+
+There is also a plan to let the users choose their own favourite
+format, of course.
+
+\section{Languages and derivatives}\label{sec:lang-deriv}
+
+This section is involved and optional.  One can jump to the
+section~\ref{sec:tokenization} on Tokenization directly if the details
+of the algorithm are not so interesting.
+
+To understand the algorithm, we shall first talk about the derivatives
+of languages.
+
+For the sake of convenience, let \(\Sigma\) denote a finite set of
+alphabets.  The set of all Unicode code points can be an example.  We
+call a sequence of elements in \(\Sigma\) a \emph{sentence}, and we
+call a (possibly infinite) set of sentences a \emph{language}.  For
+example, the set of all grammatically correct English sentences is a
+language, to which we may give the name \emph{English}.
+
+Now, if \(L\) is a language and \(s\) is a sentence, we define the
+\textbf{derivative of L with respect to s}, denoted as
+\(\frac{dL}{ds}\), as the set of all sentences \(t\) such that the
+concatenated sentence \(s \cdot t\) belongs to \(L\).  We can
+re-formulate this definition in more mathematical terms:
+
+\[
+  \frac{dL}{ds} := \left\{t \text{ is a sentence } \mid s \cdot t
+    \text{ belongs to } L\right\}.
+\]
+
+As a side note, the reason to call it the \emph{derivative} is not
+only definitional similarity (to some extent), but also that this
+operation of taking the derivative of a language with respect to a
+sentence satisfies some similar properties as the \emph{derivatives in
+  Calculus}.
+
+\section{Parsing by taking derivatives}\label{sec:pars-taking-deriv}
+
+Why consider derivatives in a document about parser generators?
+Because we can parse sentences by taking derivatives.
+
+More precisely, let \(L\) be a language and let \(s\) be a sentence.
+If the derivative \(\frac{dL}{ds}\) of \(L\) by \(s\) (which is a set
+of sentences) contains the empty sentence, then by definition this
+means that \(s\) belongs to \(L\).  This observation is the basis of
+parsing by taking derivatives.
+
+\begin{quotation}
+  Those who have a rigorous mind are probably frowning at the above
+  paragraph because we say we can parse by taking derivatives, whereas
+  we actually only determine if the sentence belongs to the language.
+  This gap comes from that the details of actually constructing the
+  parse forests by taking derivatives are too complicated for me to
+  explain clearly here.  I might come up with a clear explanation in
+  the future, though.
+\end{quotation}
+
+\section{Regular languages}\label{sec:regular-languages}
+
+With the above definition in mind, we now say that a language \(L\) is
+\emph{regular} if and only if the set of derivatives of \(L\) is
+finite, i.e.
+
+\[
+  \left\{\frac{dL}{ds} \mid s \text{ is a sentence } \right\} \text{
+    is finite}.
+\]
+
+Note that this is a set of languages, namely, a set of sets of
+sentences.  And we say it is finite if it contains finitely many sets.
+
+For those familiar with regular expressions, a language is regular if
+and only if there is a regular expression such that a sentence belongs
+to the language if and only if it matches that regular expression.  I
+chose to define regular languages using derivatives as it is easier to
+reason about them in the following, in my opinion.
+
+\section{Languages of grammars}\label{sec:languages-grammars}
+
+If \(G\) is a grammar, we say that \emph{\(G\) generates a language
+  \(L\)}, where \(L\) consists of all sentences generated by the
+grammar.  Denote the language generated by \(G\) as \(L_G\).
+
+\section{Derivatives of the languages of grammars}
+\label{sec:deriv-lang-gramm}
+
+If \(s\) is a sentence, we ask the question:
+
+\begin{quotation}
+  What is the derivative \(\frac{dL_{G}}{ds}\) of \(L_G\) by \(s\)?
+\end{quotation}
+
+To answer this, note that if \(s\) and \(t\) are sentences, and if
+\(L\) is any language, then \(\frac{dL}{d(s\cdot t)}\) is equal to
+\(\frac d{dt}\frac{dL}{ds}\), the derivative by \(t\) of the
+derivative by \(s\) of \(L\).  This can be seen by definition: a
+sentence \(r\) belongs to \(\frac{dL}{d(s\cdot t)}\) if and only if
+\((s\cdot t\cdot r)\) belongs to \(L\), which is exactly the condition
+for \((t\cdot r)\) to belong to \(\frac{dL}{ds}\).  And this is the
+condition for \(r\) to belong to \(\frac d{dt} \frac{dL}{ds}\).
+
+The above observation tells us that we can obtain \(\frac{dL_G}{ds}\)
+by taking derivatives of \(L_G\) by sentences consisting of one single
+alphabet, successively.
+
+Now consider \(\frac{dL_G}{da}\).
+
+Suppose \(G\) consists of \(n\) rules
+
+\begin{equation}
+  \label{eq:rules_of_g}
+  \begin{aligned}
+    X_{1}&\mapsto Y_{1}\ldots Y_{m_{1}}\\
+    \vdots\\
+    X_{n}&\mapsto Z_{1} \ldots Z_{m_{n}},
+  \end{aligned}
+\end{equation}
+where \(X_i\) are variables, \(m_i\) is an integer depending on \(i\),
+for \(1\leq i\leq n\), and
+\(Y_1, \ldots, Y_{m_1}, \ldots, Z_1, \ldots, Z_{m_n}\) are either
+variables or constants.
+
+It is then natural to try to determine \(\frac{dL_G}{da}\) by some
+kind of substitution.  So for \(1\leq i\leq n\), let \(X_i\) also
+denote the language that is generated if we treat the variable \(X_i\)
+as the starting variable in the grammar.  For instance, if \(X_1\) is
+the original starting variable for the grammar \(G\), then \(L_G\) is
+also regarded as equal to \(X_1\) under this view.
+
+Further, we observe that if both \(M_1\) and \(M_2\) are languages,
+and if \(M\) is their concatenation:
+
+\begin{equation*}
+  M := \left\{ s \cdot t\mid s \in M_{1}, t\in M_{2}\right\}.
+\end{equation*}
+
+Then \(\frac{dM}{da}\) is equal to either
+\[\frac{dM_{1}}{da}\cdot M_{2}\] or
+\[\frac{dM_{1}}{da} \cdot M_{2}\bigcup \frac{dM_{2}}{da},\]
+
+depending on whether or not \(M_1\) contains the empty sentence.  This
+can be regarded as the \emph{product-rule} of the derivatives of
+languages, as an aside.
+
+Consequently, the derivative \(\frac{dX_1}{da}\) is equal to the union
+of
+
+\begin{align*}
+  &\frac{dY_{1}}{da}\cdot Y_{2}\cdots Y_{m_{1}},\\
+  &\frac{dY_{2}}{da}\cdot Y_{3}\cdots Y_{m_{1}},\\
+  &\vdots\\
+  &\frac{dY_{i}}{da}\cdot Y_{i+1}\cdots Y_{m_{1}},\\
+\end{align*}
+
+where \(i\) is the unique integer such that the language \(Y_{i}\)
+does not contain the empty sentence.  If no such \(i\) exists, then we
+define \(i=m_{1}\).  And \emph{mutatis mutandis} for
+\(\frac{dX_{2}}{da},\ldots, \frac{dX_{n}}{da}\).
+
+Now we treat the languages \(\frac{dY_i}{da}\) as variables: if
+\(Y_i\) is a variable, this denotes the derivative by \(a\) of the
+language generated by \(Y_i\); if \(Y_i\) is a constant, then this
+denotes either the language consisting of the empty sentence, or the
+empty set, depending upon whether \(Y_i\) is equal to \(a\) or not.
+Then the previous paragraph defines the substitution rules of these
+new variables \(\frac{dY_i}{da}\).  We call these rules as \emph{the
+  derived rules} of \(\frac{dY_i}{da}\).
+
+It follows that we can obtain the derivative
+\(\frac{dL_{G}}{da} = \frac{dX_{1}}{da}\) by \emph{substituting} these
+\(\frac{dY_i}{da}\) according to the derived rules.
+
+Combining the above, we see that the derivatives of the language
+\(L_G\) by any sentence are generated by grammars.  We denote the
+above grammar, which generates \(\frac{dL_G}{da}\), as \emph{the
+  derived grammar} \(\frac{dG}{da}\).  Now our problem is transformed
+to determine whether the successive derivatives of grammars
+\(\frac{dG}{da_1}, \frac d{da_2}\frac{dG}{da_1} ,\ldots\), generates
+the empty sentence or not.  This is relatively easy to do, once we can
+compute the successive derivatives efficiently.
+
+\section{Chain-rule of derivatives}\label{sec:chain-rule-deriv}
+
+All of the above was not new when the author proposed the algorithm.
+The core idea of the algorithm, in my opinion, is the
+\textbf{chain-rule} of the operation of taking derivatives of
+languages.
+
+In calculus, roughly speaking, the chain-rule of derivatives refers to
+the formula
+
+\begin{equation*}
+  \frac{df}{dx} =
+  \sum_{i=1}^{n} \frac{df}{dy_{i}} \cdot \frac{dy_{i}}{dx},
+\end{equation*}
+
+where \(f\) is a function in \(n\) variables \(y_1, ..., y_n\), which
+are also functions in the variable \(x\).
+
+If we replace the function \(f\) by a language \(L\) generated by a
+grammar \(G\), the variables \(y_1, \ldots , y_n\) by the variables in
+\(G\), \(x\) by an alphabet \(a\), the multiplication \(\cdot\) by the
+concatenation of languages, and the summation \(+\) by the union of
+languages, then we obtain a formula writing the derivative
+\(\frac{dG}{da}\) as a union of the concatenations:
+
+\begin{equation}\label{eq:chain_rule}
+  \frac{dG}{da} =
+  \bigcup_{i=1}^{n}\frac{dG}{dy_{i}} \cdot \frac{dy_{i}}{da}.
+\end{equation}
+
+The only remaining questions are: What does \(\frac{dG}{dy_{i}}\)
+actually mean, and is the formula~\eqref{eq:chain_rule} true?
+
+\subsection{Derivatives with respect to variables}
+\label{sec:deriv-wrt-var}
+
+We want to generalize the construction in the
+section~\ref{sec:deriv-lang-gramm} to be able to take the derivatives
+with respect to variables.
+
+First, for the sake of brevity, let \(\Sigma^{\star}\) denote the set
+of all sentences, i.e. possibly empty sequences of elements in
+\(\Sigma\).
+
+We define a new alphabet
+\(\Sigma' := \Sigma\bigcup \left\{X_{i}\,\vert\, 1\leq i\leq
+  n\right\}\) and a new set of variables
+\[
+  \left\{ Z_{i}\ \bigg\vert\ 1\leq i\leq n \right\},
+\]
+where \(X_{i}\) are the variables in \(G\), according to its
+rules~\ref{eq:rules_of_g}.
+
+Then for each rule \(X_{i}\mapsto Y_{1}\ldots Y_{m_{i}}\), we define
+\(j\) rules: for \(1\leq k\leq j\),
+\begin{equation}\label{eq:closed_grammar_rules}
+  Z_{i}\mapsto Z_{k} \cdot Y_{k+1}\cdots Y_{m_{i}},
+\end{equation}
+where \(j\) is the smallest integer such that the language \(Y_{j}\)
+does not contain the empty sequence \(\varepsilon\in\Sigma^{\star}\).
+Here if \(Y_{k}\) belongs to \(\Sigma\), then \(Z_{k}\) is replaced by
+\(Y_{k}\) itself, i.e. the right-hand side of that rule does not
+contain variables.
+
+This defines a grammar, which we call the \emph{closed grammar}
+\(\overline G\).
+
+With this closed grammar, we can naturally take derivatives with
+respect to the original variables \(X_{i}\), as they are now alphabets
+of this closed grammar \(\overline G\).
+
+\subsection{Proof of the chain-rule}\label{sec:proof-chain-rule}
+
+\begin{prop}\label{prop:proof-of-chain-rule}
+  \[
+    \frac{dL_{G}}{da} = \bigcup_{i=1}^{n} \frac{dX_{i}}{da} \cdot
+    \left( \frac{dL_{\overline G}}{dX_{i}} \bigcap
+      \operatorname{P}\left(\Sigma^{\star}\right) \right) ,\, \forall
+    a\in\Sigma.
+  \]
+\end{prop}
+
+\begin{proof}
+  For any \(i\), any sequence in \(\frac{dL_{\overline G}}{dX_{i}}\)
+  is generated from \(G\) by partial substitutions, and removing empty
+  sequences.  So the containement of the right-hand side in the
+  left-hand side is obvious.
+
+  Conversely, if a sentence \(s\in\Sigma^{\star}\) begins with \(a\),
+  then \(s\) must be obtained by substitution from some sentence in
+  \({(\Sigma')}^{\star}\), which means that it belongs to the
+  right-hand side.  So the proof is complete.
+\end{proof}
+
+\begin{rmk}
+  It follows that we can determine whether or not
+  \(\frac{dL_{G}}{da}\) contains the empty sequence by examining
+  \(\frac{dX_{i}}{da}\) and \(\frac{dL_{\overline G}}{dX_{i}}\)
+  respectively.
+
+  Further, since
+  \(\frac{dX_{i}}{da} = \frac{dZ_{i}}{da}\bigcap\operatorname{P}
+  (\Sigma^{\star})\) and
+  \(\frac{dL_{\overline G}}{dX_{i}} = \frac{dZ_{1}}{dX_{i}}\), we
+  conclude that we only have to examine \(\frac{dZ_{i}}{dX_{j}}\), for
+  \(1\leq i,j\leq n\).
+\end{rmk}
+
+What makes this grammar interesting is its regularity.
+
+\subsection{The closed grammar is regular}\label{sec:clos-gramm-regul}
+
+\begin{prop}
+  The closed grammar is regular.  In fact, the set
+  \[
+    \left\{\frac{dZ_{i}}{ds}\ \bigg\vert\
+      1\leq i\leq n,\,s\in{(\Sigma')}^{\star}\right\}
+  \]
+  is finite.
+\end{prop}
+
+\begin{proof}
+  Observe the rules~\eqref{eq:closed_grammar_rules}.  From the
+  definition, we see that the right-hand sides of rules only contain
+  at most one variable, \(Z_{k}\), and that variable must appear as
+  the first symbol in the sequence on the right-hand side.  This
+  observation is sufficient to show that the closed grammar is
+  regular: the rigorous-minded readers can prove this statement by
+  first showing that the language generated by the closed grammar is a
+  finite union of concatenations of languages of two types: either a
+  set consisting of only one finite sentence, or a set consisting of
+  zero or more self-repetitions of a finite sequence.  Then the reader
+  can easily verify that such languages are regular, by showing that
+  the derivatives of a language consisting of zero or more
+  self-repetitions of a finite sequence are finitely many.  A hint for
+  the last step is to notice that the derivatives of the language
+  \(s^{\star}\) of zero or more self-repetitions of a sequence
+  \(s:=s_{1},\ldots,s_{p}\) can be injected in the finite set
+  \[
+    \left\{\left\{s_{i},s_{i+1},\ldots,s_{p}\right\}\cdot s^{\star}
+      \mid 1\leq i\leq p+1\right\}.
+  \]
+\end{proof}
+
+\section{Tokenization}\label{sec:tokenization}
+
+There is another abstraction used by this package.
+
+
+
+\end{document}
+\ No newline at end of file
diff --git a/fdl-1.3.texi b/fdl-1.3.texi
new file mode 100644
index 0000000..eaf3da0
--- /dev/null
+++ b/fdl-1.3.texi
@@ -0,0 +1,505 @@
+@c The GNU Free Documentation License.
+@center Version 1.3, 3 November 2008
+
+@c This file is intended to be included within another document,
+@c hence no sectioning command or @node.
+
+@display
+Copyright @copyright{} 2000, 2001, 2002, 2007, 2008 Free Software Foundation, Inc.
+@uref{https://fsf.org/}
+
+Everyone is permitted to copy and distribute verbatim copies
+of this license document, but changing it is not allowed.
+@end display
+
+@enumerate 0
+@item
+PREAMBLE
+
+The purpose of this License is to make a manual, textbook, or other
+functional and useful document @dfn{free} in the sense of freedom: to
+assure everyone the effective freedom to copy and redistribute it,
+with or without modifying it, either commercially or noncommercially.
+Secondarily, this License preserves for the author and publisher a way
+to get credit for their work, while not being considered responsible
+for modifications made by others.
+
+This License is a kind of ``copyleft'', which means that derivative
+works of the document must themselves be free in the same sense.  It
+complements the GNU General Public License, which is a copyleft
+license designed for free software.
+
+We have designed this License in order to use it for manuals for free
+software, because free software needs free documentation: a free
+program should come with manuals providing the same freedoms that the
+software does.  But this License is not limited to software manuals;
+it can be used for any textual work, regardless of subject matter or
+whether it is published as a printed book.  We recommend this License
+principally for works whose purpose is instruction or reference.
+
+@item
+APPLICABILITY AND DEFINITIONS
+
+This License applies to any manual or other work, in any medium, that
+contains a notice placed by the copyright holder saying it can be
+distributed under the terms of this License.  Such a notice grants a
+world-wide, royalty-free license, unlimited in duration, to use that
+work under the conditions stated herein.  The ``Document'', below,
+refers to any such manual or work.  Any member of the public is a
+licensee, and is addressed as ``you''.  You accept the license if you
+copy, modify or distribute the work in a way requiring permission
+under copyright law.
+
+A ``Modified Version'' of the Document means any work containing the
+Document or a portion of it, either copied verbatim, or with
+modifications and/or translated into another language.
+
+A ``Secondary Section'' is a named appendix or a front-matter section
+of the Document that deals exclusively with the relationship of the
+publishers or authors of the Document to the Document's overall
+subject (or to related matters) and contains nothing that could fall
+directly within that overall subject.  (Thus, if the Document is in
+part a textbook of mathematics, a Secondary Section may not explain
+any mathematics.)  The relationship could be a matter of historical
+connection with the subject or with related matters, or of legal,
+commercial, philosophical, ethical or political position regarding
+them.
+
+The ``Invariant Sections'' are certain Secondary Sections whose titles
+are designated, as being those of Invariant Sections, in the notice
+that says that the Document is released under this License.  If a
+section does not fit the above definition of Secondary then it is not
+allowed to be designated as Invariant.  The Document may contain zero
+Invariant Sections.  If the Document does not identify any Invariant
+Sections then there are none.
+
+The ``Cover Texts'' are certain short passages of text that are listed,
+as Front-Cover Texts or Back-Cover Texts, in the notice that says that
+the Document is released under this License.  A Front-Cover Text may
+be at most 5 words, and a Back-Cover Text may be at most 25 words.
+
+A ``Transparent'' copy of the Document means a machine-readable copy,
+represented in a format whose specification is available to the
+general public, that is suitable for revising the document
+straightforwardly with generic text editors or (for images composed of
+pixels) generic paint programs or (for drawings) some widely available
+drawing editor, and that is suitable for input to text formatters or
+for automatic translation to a variety of formats suitable for input
+to text formatters.  A copy made in an otherwise Transparent file
+format whose markup, or absence of markup, has been arranged to thwart
+or discourage subsequent modification by readers is not Transparent.
+An image format is not Transparent if used for any substantial amount
+of text.  A copy that is not ``Transparent'' is called ``Opaque''.
+
+Examples of suitable formats for Transparent copies include plain
+ASCII without markup, Texinfo input format, La@TeX{} input
+format, SGML or XML using a publicly available
+DTD, and standard-conforming simple HTML,
+PostScript or PDF designed for human modification.  Examples
+of transparent image formats include PNG, XCF and
+JPG@.  Opaque formats include proprietary formats that can be
+read and edited only by proprietary word processors, SGML or
+XML for which the DTD and/or processing tools are
+not generally available, and the machine-generated HTML,
+PostScript or PDF produced by some word processors for
+output purposes only.
+
+The ``Title Page'' means, for a printed book, the title page itself,
+plus such following pages as are needed to hold, legibly, the material
+this License requires to appear in the title page.  For works in
+formats which do not have any title page as such, ``Title Page'' means
+the text near the most prominent appearance of the work's title,
+preceding the beginning of the body of the text.
+
+The ``publisher'' means any person or entity that distributes copies
+of the Document to the public.
+
+A section ``Entitled XYZ'' means a named subunit of the Document whose
+title either is precisely XYZ or contains XYZ in parentheses following
+text that translates XYZ in another language.  (Here XYZ stands for a
+specific section name mentioned below, such as ``Acknowledgements'',
+``Dedications'', ``Endorsements'', or ``History''.)  To ``Preserve the Title''
+of such a section when you modify the Document means that it remains a
+section ``Entitled XYZ'' according to this definition.
+
+The Document may include Warranty Disclaimers next to the notice which
+states that this License applies to the Document.  These Warranty
+Disclaimers are considered to be included by reference in this
+License, but only as regards disclaiming warranties: any other
+implication that these Warranty Disclaimers may have is void and has
+no effect on the meaning of this License.
+
+@item
+VERBATIM COPYING
+
+You may copy and distribute the Document in any medium, either
+commercially or noncommercially, provided that this License, the
+copyright notices, and the license notice saying this License applies
+to the Document are reproduced in all copies, and that you add no other
+conditions whatsoever to those of this License.  You may not use
+technical measures to obstruct or control the reading or further
+copying of the copies you make or distribute.  However, you may accept
+compensation in exchange for copies.  If you distribute a large enough
+number of copies you must also follow the conditions in section 3.
+
+You may also lend copies, under the same conditions stated above, and
+you may publicly display copies.
+
+@item
+COPYING IN QUANTITY
+
+If you publish printed copies (or copies in media that commonly have
+printed covers) of the Document, numbering more than 100, and the
+Document's license notice requires Cover Texts, you must enclose the
+copies in covers that carry, clearly and legibly, all these Cover
+Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on
+the back cover.  Both covers must also clearly and legibly identify
+you as the publisher of these copies.  The front cover must present
+the full title with all words of the title equally prominent and
+visible.  You may add other material on the covers in addition.
+Copying with changes limited to the covers, as long as they preserve
+the title of the Document and satisfy these conditions, can be treated
+as verbatim copying in other respects.
+
+If the required texts for either cover are too voluminous to fit
+legibly, you should put the first ones listed (as many as fit
+reasonably) on the actual cover, and continue the rest onto adjacent
+pages.
+
+If you publish or distribute Opaque copies of the Document numbering
+more than 100, you must either include a machine-readable Transparent
+copy along with each Opaque copy, or state in or with each Opaque copy
+a computer-network location from which the general network-using
+public has access to download using public-standard network protocols
+a complete Transparent copy of the Document, free of added material.
+If you use the latter option, you must take reasonably prudent steps,
+when you begin distribution of Opaque copies in quantity, to ensure
+that this Transparent copy will remain thus accessible at the stated
+location until at least one year after the last time you distribute an
+Opaque copy (directly or through your agents or retailers) of that
+edition to the public.
+
+It is requested, but not required, that you contact the authors of the
+Document well before redistributing any large number of copies, to give
+them a chance to provide you with an updated version of the Document.
+
+@item
+MODIFICATIONS
+
+You may copy and distribute a Modified Version of the Document under
+the conditions of sections 2 and 3 above, provided that you release
+the Modified Version under precisely this License, with the Modified
+Version filling the role of the Document, thus licensing distribution
+and modification of the Modified Version to whoever possesses a copy
+of it.  In addition, you must do these things in the Modified Version:
+
+@enumerate A
+@item
+Use in the Title Page (and on the covers, if any) a title distinct
+from that of the Document, and from those of previous versions
+(which should, if there were any, be listed in the History section
+of the Document).  You may use the same title as a previous version
+if the original publisher of that version gives permission.
+
+@item
+List on the Title Page, as authors, one or more persons or entities
+responsible for authorship of the modifications in the Modified
+Version, together with at least five of the principal authors of the
+Document (all of its principal authors, if it has fewer than five),
+unless they release you from this requirement.
+
+@item
+State on the Title page the name of the publisher of the
+Modified Version, as the publisher.
+
+@item
+Preserve all the copyright notices of the Document.
+
+@item
+Add an appropriate copyright notice for your modifications
+adjacent to the other copyright notices.
+
+@item
+Include, immediately after the copyright notices, a license notice
+giving the public permission to use the Modified Version under the
+terms of this License, in the form shown in the Addendum below.
+
+@item
+Preserve in that license notice the full lists of Invariant Sections
+and required Cover Texts given in the Document's license notice.
+
+@item
+Include an unaltered copy of this License.
+
+@item
+Preserve the section Entitled ``History'', Preserve its Title, and add
+to it an item stating at least the title, year, new authors, and
+publisher of the Modified Version as given on the Title Page.  If
+there is no section Entitled ``History'' in the Document, create one
+stating the title, year, authors, and publisher of the Document as
+given on its Title Page, then add an item describing the Modified
+Version as stated in the previous sentence.
+
+@item
+Preserve the network location, if any, given in the Document for
+public access to a Transparent copy of the Document, and likewise
+the network locations given in the Document for previous versions
+it was based on.  These may be placed in the ``History'' section.
+You may omit a network location for a work that was published at
+least four years before the Document itself, or if the original
+publisher of the version it refers to gives permission.
+
+@item
+For any section Entitled ``Acknowledgements'' or ``Dedications'', Preserve
+the Title of the section, and preserve in the section all the
+substance and tone of each of the contributor acknowledgements and/or
+dedications given therein.
+
+@item
+Preserve all the Invariant Sections of the Document,
+unaltered in their text and in their titles.  Section numbers
+or the equivalent are not considered part of the section titles.
+
+@item
+Delete any section Entitled ``Endorsements''.  Such a section
+may not be included in the Modified Version.
+
+@item
+Do not retitle any existing section to be Entitled ``Endorsements'' or
+to conflict in title with any Invariant Section.
+
+@item
+Preserve any Warranty Disclaimers.
+@end enumerate
+
+If the Modified Version includes new front-matter sections or
+appendices that qualify as Secondary Sections and contain no material
+copied from the Document, you may at your option designate some or all
+of these sections as invariant.  To do this, add their titles to the
+list of Invariant Sections in the Modified Version's license notice.
+These titles must be distinct from any other section titles.
+
+You may add a section Entitled ``Endorsements'', provided it contains
+nothing but endorsements of your Modified Version by various
+parties---for example, statements of peer review or that the text has
+been approved by an organization as the authoritative definition of a
+standard.
+
+You may add a passage of up to five words as a Front-Cover Text, and a
+passage of up to 25 words as a Back-Cover Text, to the end of the list
+of Cover Texts in the Modified Version.  Only one passage of
+Front-Cover Text and one of Back-Cover Text may be added by (or
+through arrangements made by) any one entity.  If the Document already
+includes a cover text for the same cover, previously added by you or
+by arrangement made by the same entity you are acting on behalf of,
+you may not add another; but you may replace the old one, on explicit
+permission from the previous publisher that added the old one.
+
+The author(s) and publisher(s) of the Document do not by this License
+give permission to use their names for publicity for or to assert or
+imply endorsement of any Modified Version.
+
+@item
+COMBINING DOCUMENTS
+
+You may combine the Document with other documents released under this
+License, under the terms defined in section 4 above for modified
+versions, provided that you include in the combination all of the
+Invariant Sections of all of the original documents, unmodified, and
+list them all as Invariant Sections of your combined work in its
+license notice, and that you preserve all their Warranty Disclaimers.
+
+The combined work need only contain one copy of this License, and
+multiple identical Invariant Sections may be replaced with a single
+copy.  If there are multiple Invariant Sections with the same name but
+different contents, make the title of each such section unique by
+adding at the end of it, in parentheses, the name of the original
+author or publisher of that section if known, or else a unique number.
+Make the same adjustment to the section titles in the list of
+Invariant Sections in the license notice of the combined work.
+
+In the combination, you must combine any sections Entitled ``History''
+in the various original documents, forming one section Entitled
+``History''; likewise combine any sections Entitled ``Acknowledgements'',
+and any sections Entitled ``Dedications''.  You must delete all
+sections Entitled ``Endorsements.''
+
+@item
+COLLECTIONS OF DOCUMENTS
+
+You may make a collection consisting of the Document and other documents
+released under this License, and replace the individual copies of this
+License in the various documents with a single copy that is included in
+the collection, provided that you follow the rules of this License for
+verbatim copying of each of the documents in all other respects.
+
+You may extract a single document from such a collection, and distribute
+it individually under this License, provided you insert a copy of this
+License into the extracted document, and follow this License in all
+other respects regarding verbatim copying of that document.
+
+@item
+AGGREGATION WITH INDEPENDENT WORKS
+
+A compilation of the Document or its derivatives with other separate
+and independent documents or works, in or on a volume of a storage or
+distribution medium, is called an ``aggregate'' if the copyright
+resulting from the compilation is not used to limit the legal rights
+of the compilation's users beyond what the individual works permit.
+When the Document is included in an aggregate, this License does not
+apply to the other works in the aggregate which are not themselves
+derivative works of the Document.
+
+If the Cover Text requirement of section 3 is applicable to these
+copies of the Document, then if the Document is less than one half of
+the entire aggregate, the Document's Cover Texts may be placed on
+covers that bracket the Document within the aggregate, or the
+electronic equivalent of covers if the Document is in electronic form.
+Otherwise they must appear on printed covers that bracket the whole
+aggregate.
+
+@item
+TRANSLATION
+
+Translation is considered a kind of modification, so you may
+distribute translations of the Document under the terms of section 4.
+Replacing Invariant Sections with translations requires special
+permission from their copyright holders, but you may include
+translations of some or all Invariant Sections in addition to the
+original versions of these Invariant Sections.  You may include a
+translation of this License, and all the license notices in the
+Document, and any Warranty Disclaimers, provided that you also include
+the original English version of this License and the original versions
+of those notices and disclaimers.  In case of a disagreement between
+the translation and the original version of this License or a notice
+or disclaimer, the original version will prevail.
+
+If a section in the Document is Entitled ``Acknowledgements'',
+``Dedications'', or ``History'', the requirement (section 4) to Preserve
+its Title (section 1) will typically require changing the actual
+title.
+
+@item
+TERMINATION
+
+You may not copy, modify, sublicense, or distribute the Document
+except as expressly provided under this License.  Any attempt
+otherwise to copy, modify, sublicense, or distribute it is void, and
+will automatically terminate your rights under this License.
+
+However, if you cease all violation of this License, then your license
+from a particular copyright holder is reinstated (a) provisionally,
+unless and until the copyright holder explicitly and finally
+terminates your license, and (b) permanently, if the copyright holder
+fails to notify you of the violation by some reasonable means prior to
+60 days after the cessation.
+
+Moreover, your license from a particular copyright holder is
+reinstated permanently if the copyright holder notifies you of the
+violation by some reasonable means, this is the first time you have
+received notice of violation of this License (for any work) from that
+copyright holder, and you cure the violation prior to 30 days after
+your receipt of the notice.
+
+Termination of your rights under this section does not terminate the
+licenses of parties who have received copies or rights from you under
+this License.  If your rights have been terminated and not permanently
+reinstated, receipt of a copy of some or all of the same material does
+not give you any rights to use it.
+
+@item
+FUTURE REVISIONS OF THIS LICENSE
+
+The Free Software Foundation may publish new, revised versions
+of the GNU Free Documentation License from time to time.  Such new
+versions will be similar in spirit to the present version, but may
+differ in detail to address new problems or concerns.  See
+@uref{https://www.gnu.org/licenses/}.
+
+Each version of the License is given a distinguishing version number.
+If the Document specifies that a particular numbered version of this
+License ``or any later version'' applies to it, you have the option of
+following the terms and conditions either of that specified version or
+of any later version that has been published (not as a draft) by the
+Free Software Foundation.  If the Document does not specify a version
+number of this License, you may choose any version ever published (not
+as a draft) by the Free Software Foundation.  If the Document
+specifies that a proxy can decide which future versions of this
+License can be used, that proxy's public statement of acceptance of a
+version permanently authorizes you to choose that version for the
+Document.
+
+@item
+RELICENSING
+
+``Massive Multiauthor Collaboration Site'' (or ``MMC Site'') means any
+World Wide Web server that publishes copyrightable works and also
+provides prominent facilities for anybody to edit those works.  A
+public wiki that anybody can edit is an example of such a server.  A
+``Massive Multiauthor Collaboration'' (or ``MMC'') contained in the
+site means any set of copyrightable works thus published on the MMC
+site.
+
+``CC-BY-SA'' means the Creative Commons Attribution-Share Alike 3.0
+license published by Creative Commons Corporation, a not-for-profit
+corporation with a principal place of business in San Francisco,
+California, as well as future copyleft versions of that license
+published by that same organization.
+
+``Incorporate'' means to publish or republish a Document, in whole or
+in part, as part of another Document.
+
+An MMC is ``eligible for relicensing'' if it is licensed under this
+License, and if all works that were first published under this License
+somewhere other than this MMC, and subsequently incorporated in whole
+or in part into the MMC, (1) had no cover texts or invariant sections,
+and (2) were thus incorporated prior to November 1, 2008.
+
+The operator of an MMC Site may republish an MMC contained in the site
+under CC-BY-SA on the same site at any time before August 1, 2009,
+provided the MMC is eligible for relicensing.
+
+@end enumerate
+
+@page
+@heading ADDENDUM: How to use this License for your documents
+
+To use this License in a document you have written, include a copy of
+the License in the document and put the following copyright and
+license notices just after the title page:
+
+@smallexample
+@group
+  Copyright (C)  @var{year}  @var{your name}.
+  Permission is granted to copy, distribute and/or modify this document
+  under the terms of the GNU Free Documentation License, Version 1.3
+  or any later version published by the Free Software Foundation;
+  with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
+  Texts.  A copy of the license is included in the section entitled ``GNU
+  Free Documentation License''.
+@end group
+@end smallexample
+
+If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts,
+replace the ``with@dots{}Texts.''@: line with this:
+
+@smallexample
+@group
+    with the Invariant Sections being @var{list their titles}, with
+    the Front-Cover Texts being @var{list}, and with the Back-Cover Texts
+    being @var{list}.
+@end group
+@end smallexample
+
+If you have Invariant Sections without Cover Texts, or some other
+combination of the three, merge those two alternatives to suit the
+situation.
+
+If your document contains nontrivial examples of program code, we
+recommend releasing these examples in parallel under your choice of
+free software license, such as the GNU General Public License,
+to permit their use in free software.
+
+@c Local Variables:
+@c ispell-local-pdict: "ispell-dict"
+@c End:
author	JSDurand <mmemmew@gmail.com>	2023-08-10 11:14:54 +0800
committer	JSDurand <mmemmew@gmail.com>	2023-08-10 11:14:54 +0800
commit	b2ca9a285f09d90259aaa7c4d09bb7335211020b (patch)
tree	ce18a3a79a6456b6d584accda888fe5f12add1f0
parent	b8a2d05a3c0d835556d5ddbd44e4a1e201302af5 (diff)