README


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218

======================================================================
                       Rust, Emacs, and Parsers
======================================================================

About
-----

This is a parser generator to be used as an Emacs dynamic package.
First I shall explain how to compile this package, and then I shall
explain briefly the background information and how to use this
package.

COMPILE
-------

Download compiled library
=========================

// TODO: Make the website.

One can download a compiled dynamic library from the website:
<https://jsdurand.xyz/rep.html>.

Manual compilation
==================

First clone this package:

            ----------------------------------------------
            | git clone https://git.jsdurand.xyz/rep.git |
            ----------------------------------------------

Then go into the cloned directory:

                              ----------
                              | cd rep |
                              ----------

[ Make sure the Rust tools are already installed.  If not, then
  install the Rust compiler tools from the instructions on the website
  <https://www.rust-lang.org/tools/install>. ]

Then run the autotools:

                         -------------------
                         | autoreconf -vif |
                         -------------------

// TODO: Make the two build options.

Now there are two build options: debug and release.  The debug build
is for debugging the package, so turns off optimisations and includes
debug information.  This is not recommended for normal use.  The
release build is the default, and is more efficient than the debug
build.

If one wants to use the debug build, run:

                       -----------------------
                       | ./configure --debug |
                       -----------------------

Otherwise just run:

                           ---------------
                           | ./configure |
                           ---------------

Finally run 'make' to build the package:

                               --------
                               | make |
                               --------

Now there should be a file 'rep.so' located in the 'src' directory.
And one can directly load that file or move that dynamic library to
another more convenient location of choice.

The commands to build this package are summarized as follows:

            ----------------------------------------------
            | git clone https://git.jsdurand.xyz/rep.git |
            | cd rep                                     |
            | autoreconf -vif                            |
            | ./configure                                |
            | make                                       |
            ----------------------------------------------

Build on Windows
================

Since autotools do not work well for windows, from my experiences,
users of windows need another convenient way to compile the package.
So I wrote a script to automate this process: just run

// TODO: Figure out how to do this.

Test
----

The package contains an Emacs Lisp file 'src/test.el' that
demonstrates a simple scenario for testing.  Simply open the file in
an Emacs instance and run the Emacs Lisp forms one by one to see if it
works.  :D

======================================================================
                       About parser generators
======================================================================

Grammars
--------

Before explaining what parsers are, we need first to explain what
grammars are.

In short, a grammar can be viewed as a "substitution machine".  That
is, a grammar consists of a set of variables, constants, and rules

                           ----------------
                           | X => Y ... Z |
                           ----------------

where X is a variable and Y ... Z are either variables or constants.
The grammar then substitutes every occurence of X by the sequence at
the right-hand side Y ... Z.  And there is a "starting variable",
usually denoted as "S".

To sum up, we start from the sequence "S", and perform substitutions
according to the rules.  If the substituted sequence cannot be further
substituted, i.e. if the sequence consists of constants only, we say
that this constant-only sequence is "generated" by the grammar.

One can think of this substitution machine as macros in TeX, except
that this only allow macros without arguments.  Or one can think of
the machine as macros in programming languages, again without
arguments.

Notice that a variable can have multiple rules to substitute, and a
given sequence can be generated by different sets of substitutions.

Parsers
-------

Parsers are "inverses" of grammars.  More precisely, while grammars
perform substitutions to generate sequences, parsers try to find out
what substitutions the grammar performed in order to generate a given
sequence.  Since there might be multiple ways to generate a given
sequence, the parser needs to find all possible ways.

In fact, the number of ways to generate a sequence can even be
infinite.  For example, if the grammar has the following two rules:

                             -----------
                             | S => S  |
                             | S => "" |
                             -----------

This grammar means that S can be subtituted by S itself, and it can be
substituted by the empty sequence as well.  In this case, the grammar
can only generate the empty sequence, but it can generate the empty
sequence by infinitely many ways: it can first substitute S by itself
any number of times, until it substitutes S by the empty sequence.

Moreover, even if we do not consider the generations that produce
empty sequences, the number of ways a grammar can generate a given
sequence is in general O(n^3), where "O" means the big-O notation, and
n is the length of the sequence to be generated.  This is why general
parsers are hard to construct for every grammar.

Parser Generators
-----------------

Now we come to the topic of this package: parser generators.  Simply
speaking, a parser generator takes a grammar as input, and produces a
parser for that grammar as output.

One can classify parser generators into two types: comiled and
interpreted.  The former means that the generator produces source
codes of a parser that needs to be compiled to produce a parser.  The
latter means that the generator simply produces a parser that needs to
be run by an interpreter.  The well-known "tree-sitter" parser
generator belongs to the first category, for example.

This package is interpreted.  But there is a plan to also turn on a
"compiled mode" to potentially make the generated parsers more
performant.

Augmented Backus-Naur Form
--------------------------

There are various ways to write grammars.  This package chooses the
augmented backus-naur form.  See Wiki for an overview:
<https://en.wikipedia.org/wiki/Augmented_Backus-Naur_Form>, and see
RFC 5234 for its detailed definition:
<https://datatracker.ietf.org/doc/html/rfc5234> or
<https://www.rfc-editor.org/rfc/rfc5234.txt>.

The reason for this choice is that the RFC documents use this format,
so there are an abundance of mature grammars to use.  For example, the
ABNF format itself has a grammar in the ABNF format, so we can use the
generated parser to parse ABNF formats, immediately.

There is also a plan to let the users choose their own favourite
format, of course.

Algorithm
---------

This package uses an algorithm that is not widely adapted, at least
not to my knowledge.  So I think I will briefly explain how it works.


TODO: Provide more details.

Tokenization
------------

There is another abstraction used by this package.