summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorJSDurand <mmemmew@gmail.com>2023-07-22 11:37:54 +0800
committerJSDurand <mmemmew@gmail.com>2023-07-22 11:37:54 +0800
commit9d80a43a469dd474691c95e9899db09449076df7 (patch)
treeeb61f44c3b5aff9b53071ff55fd061a8c413d537
parentca1a2fa607a3ce95d8cf68f1a7a481d62b0ecf72 (diff)
Add some auxiliary data.
Try to fix some minor issues.
-rw-r--r--DIARY37
-rw-r--r--Makefile.am2
-rw-r--r--README182
-rw-r--r--configure.ac2
-rw-r--r--src/Makefile.am20
5 files changed, 240 insertions, 3 deletions
diff --git a/DIARY b/DIARY
index 13c9c6e..afef8e2 100644
--- a/DIARY
+++ b/DIARY
@@ -1,3 +1,7 @@
+======================================================================
+ 2023-06-02
+======================================================================
+
This is a "diary" that records my thoughts when trying to debug the
forest manipulations. Basically the forests are wrongly "merged", and
produce weird clones.
@@ -23,3 +27,36 @@ the end-users, but postponed to later since that was not essential for
my developments. Now this seems to be quite important for me to
properly observe the forests, so it is perhaps time to implement this
feature first.
+
+
+
+======================================================================
+ 2023-07-22
+======================================================================
+
+Now a stable version of the package is in place, so I can start
+working on some non-essential features, like the tokenization process
+and the visulization feature.
+
+Amongst these, the tokenization process is the most important, as the
+users will not expect to work with tokens: they expect to use
+characters. So in some sense this package is not ready for the
+end-users yet.
+
+My thought is to offer two modes: one for operating on characters and
+the other for operating on tokens directly. This can be achieved by
+letting the data type representing terminals contain more information
+on how to match characters.
+
+My thought is to attach either a vector of "machine codes" or a single
+character to terminals. If a terminal is attached a single character,
+it can only match that character.
+
+On the other hand, if a terminal corresponds to a vector of machine
+codes, then it means to execute this sequence of codes to determine if
+a character matches. We can even simplify the format of machine codes
+to a sequence of ranges, and a character only matches if it is
+contained in one of the ranges.
+
+After this is finished, we can bump the version to 0.1.3 and simplify
+the build process. Hope it all goes well.
diff --git a/Makefile.am b/Makefile.am
index ec980e8..3336e91 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -1,4 +1,4 @@
-SUBDIRS=graph nfa grammar graph_macro chain
+SUBDIRS=graph nfa grammar graph_macro chain src
.PHONY: rs
diff --git a/README b/README
index f73834f..bc1507b 100644
--- a/README
+++ b/README
@@ -1,3 +1,183 @@
+======================================================================
+ Rust, Emacs, and Parsers
+======================================================================
+
+About
+-----
+
This is a parser generator to be used as an Emacs dynamic package.
+First I shall explain how to compile this package, and then I shall
+explain briefly the background information and how to use this
+package.
+
+COMPILE
+-------
+
+Download compiled library
+=========================
+
+// TODO: Make the website.
+
+One can download a compiled dynamic library from the website:
+<https://jsdurand.xyz/rep.html>.
+
+Manual compilation
+==================
+
+First clone this package:
+
+ git clone https://git.jsdurand.xyz/rep.git
+
+Then go into the cloned directory:
+
+ cd rep
+
+[ Make sure the Rust tools are already installed. If not, then
+ install the Rust compiler tools from the instructions on the website
+ <https://www.rust-lang.org/tools/install>. ]
+
+Then run the autotools:
+
+ autoreconf -vif
+
+// TODO: Make the two build options and fix make.
+
+Now there are two build options: debug and release. The debug build
+is for debugging the package, so turns off optimisations and includes
+debug information. This is not recommended for normal use. The
+release build is the default, and is more efficient than the debug
+build.
+
+If one wants to use the debug build, run:
+
+ ./configure --debug
+
+Otherwise just run:
+
+ ./configure
+
+Finally run 'make' to build the package:
+
+ make
+
+Now there should be a file 'rep.so' located in the 'src' directory.
+
+Build on Windows
+================
+
+Since autotools do not work well for windows, from my experiences,
+users of windows need another convenient way to compile the package.
+So I wrote a script to automate this process: just run
+
+// TODO: Figure out how to do this.
+
+Test
+----
+
+The package contains an Emacs Lisp file 'src/test.el' that
+demonstrates a simple scenario for testing. Simply open the file in
+an Emacs instance and run the Emacs Lisp forms one by one to see if it
+works. :D
+
+======================================================================
+ About parser generators
+======================================================================
+
+Grammars
+--------
+
+Before explaining what parsers are, we need first to explain what
+grammars are.
+
+In short, a grammar can be viewed as a "substitution machine". That
+is, a grammar consists of a set of variables, constants, and rules
+
+ X => Y ... Z
+
+where X is a variable and Y ... Z are either variables or constants.
+The grammar then substitutes every occurence of X by the sequence at
+the right-hand side Y ... Z. And there is a "starting variable",
+usually denoted as "S".
+
+To sum up, we start from the sequence "S", and perform substitutions
+according to the rules. If the substituted sequence cannot be further
+substituted, i.e. if the sequence consists of constants only, we say
+that this constant-only sequence is "generated" by the grammar.
+
+One can think of this substitution machine as macros in TeX, except
+that this only allow macros without arguments. Or one can think of
+the machine as macros in programming languages, again without
+arguments.
+
+Notice that a variable can have multiple rules to substitute, and a
+given sequence can be generated by different sets of substitutions.
+
+Parsers
+-------
+
+Parsers are "inverses" of grammars. More precisely, while grammars
+perform substitutions to generate sequences, parsers try to find out
+what substitutions the grammar performed in order to generate a given
+sequence. Since there might be multiple ways to generate a given
+sequence, the parser needs to find all possible ways.
+
+In fact, the number of ways to generate a sequence can even be
+infinite. For example, if the grammar has the following two rules:
+
+ S => S
+ S => ""
+
+This grammar means that S can be subtituted by S itself, and it can be
+substituted by the empty sequence as well. In this case, the grammar
+can only generate the empty sequence, but it can generate the empty
+sequence by infinitely many ways: it can first substitute S by itself
+any number of times, until it substitutes S by the empty sequence.
+
+Moreover, even if we do not consider the generations that produce
+empty sequences, the number of ways a grammar can generate a given
+sequence is in general O(n^3), where "O" means the big-O notation, and
+n is the length of the sequence to be generated. This is why general
+parsers are hard to construct for every grammar.
+
+Parser Generators
+-----------------
+
+Now we come to the topic of this package: parser generators. Simply
+speaking, a parser generator takes a grammar as input, and produces a
+parser for that grammar as output.
+
+One can classify parser generators into two types: comiled and
+interpreted. The former means that the generator produces source
+codes of a parser that needs to be compiled to produce a parser. The
+latter means that the generator simply produces a parser that needs to
+be run by an interpreter. The well-known "tree-sitter" parser
+generator belongs to the first category, for example.
+
+This package is interpreted. But there is a plan to also turn on a
+"compiled mode" to potentially make the generated parsers more
+performant.
+
+Augmented Backus-Naur Form
+--------------------------
+
+There are various ways to write grammars. This package chooses the
+augmented backus-naur form. See Wiki for an overview:
+<https://en.wikipedia.org/wiki/Augmented_Backus-Naur_Form>, and see
+RFC 5234 for its detailed definition:
+<https://datatracker.ietf.org/doc/html/rfc5234> or
+<https://www.rfc-editor.org/rfc/rfc5234.txt>.
+
+The reason for this choice is that the RFC documents use this format,
+so there are a lot of example grammars.
+
+There is also a plan to let the users choose their own favourite
+format, of course.
+
+Algorithm
+---------
+
+This package uses an algorithm that is not widely adapted, at least
+not to my knowledge. So I think I will briefly explain how it works.
+
-TODO: Provide more details. \ No newline at end of file
+TODO: Provide more details.
diff --git a/configure.ac b/configure.ac
index 4f629ce..f26ea11 100644
--- a/configure.ac
+++ b/configure.ac
@@ -19,7 +19,7 @@ AS_IF([test "$CARGO" = "notfound"], [AC_MSG_ERROR([cargo is required])])
AC_PATH_PROG([RUSTC], [rustc], [notfound])
AS_IF([test "$RUSTC" = "notfound"], [AC_MSG_ERROR([rustc is required])])
-AC_CONFIG_FILES([Makefile graph/Makefile nfa/Makefile chain/Makefile grammar/Makefile graph_macro/Makefile])
+AC_CONFIG_FILES([Makefile src/Makefile graph/Makefile nfa/Makefile chain/Makefile grammar/Makefile graph_macro/Makefile])
AC_OUTPUT
diff --git a/src/Makefile.am b/src/Makefile.am
new file mode 100644
index 0000000..114b168
--- /dev/null
+++ b/src/Makefile.am
@@ -0,0 +1,20 @@
+CC=gcc
+
+CFLAGS=-g -O0
+
+rep.so: binding.c ../target/debug/librep.dylib helper.o big_endian.o
+ $(CC) $(CFLAGS) $+ -shared -export-dynamic -o $@
+
+test: test.c ../target/debug/librep.dylib helper.o big_endian.o
+ $(CC) $(CFLAGS) $+ -o $@
+
+.PHONY: clean rs windows
+
+clean:
+ rm *.o
+
+rs:
+ cargo build
+
+windows:
+ cargo build --target=x86_64-pc-windows-gnu