diff options
Diffstat (limited to 'README')
-rw-r--r-- | README | 182 |
1 files changed, 181 insertions, 1 deletions
@@ -1,3 +1,183 @@ +====================================================================== + Rust, Emacs, and Parsers +====================================================================== + +About +----- + This is a parser generator to be used as an Emacs dynamic package. +First I shall explain how to compile this package, and then I shall +explain briefly the background information and how to use this +package. + +COMPILE +------- + +Download compiled library +========================= + +// TODO: Make the website. + +One can download a compiled dynamic library from the website: +<https://jsdurand.xyz/rep.html>. + +Manual compilation +================== + +First clone this package: + + git clone https://git.jsdurand.xyz/rep.git + +Then go into the cloned directory: + + cd rep + +[ Make sure the Rust tools are already installed. If not, then + install the Rust compiler tools from the instructions on the website + <https://www.rust-lang.org/tools/install>. ] + +Then run the autotools: + + autoreconf -vif + +// TODO: Make the two build options and fix make. + +Now there are two build options: debug and release. The debug build +is for debugging the package, so turns off optimisations and includes +debug information. This is not recommended for normal use. The +release build is the default, and is more efficient than the debug +build. + +If one wants to use the debug build, run: + + ./configure --debug + +Otherwise just run: + + ./configure + +Finally run 'make' to build the package: + + make + +Now there should be a file 'rep.so' located in the 'src' directory. + +Build on Windows +================ + +Since autotools do not work well for windows, from my experiences, +users of windows need another convenient way to compile the package. +So I wrote a script to automate this process: just run + +// TODO: Figure out how to do this. + +Test +---- + +The package contains an Emacs Lisp file 'src/test.el' that +demonstrates a simple scenario for testing. Simply open the file in +an Emacs instance and run the Emacs Lisp forms one by one to see if it +works. :D + +====================================================================== + About parser generators +====================================================================== + +Grammars +-------- + +Before explaining what parsers are, we need first to explain what +grammars are. + +In short, a grammar can be viewed as a "substitution machine". That +is, a grammar consists of a set of variables, constants, and rules + + X => Y ... Z + +where X is a variable and Y ... Z are either variables or constants. +The grammar then substitutes every occurence of X by the sequence at +the right-hand side Y ... Z. And there is a "starting variable", +usually denoted as "S". + +To sum up, we start from the sequence "S", and perform substitutions +according to the rules. If the substituted sequence cannot be further +substituted, i.e. if the sequence consists of constants only, we say +that this constant-only sequence is "generated" by the grammar. + +One can think of this substitution machine as macros in TeX, except +that this only allow macros without arguments. Or one can think of +the machine as macros in programming languages, again without +arguments. + +Notice that a variable can have multiple rules to substitute, and a +given sequence can be generated by different sets of substitutions. + +Parsers +------- + +Parsers are "inverses" of grammars. More precisely, while grammars +perform substitutions to generate sequences, parsers try to find out +what substitutions the grammar performed in order to generate a given +sequence. Since there might be multiple ways to generate a given +sequence, the parser needs to find all possible ways. + +In fact, the number of ways to generate a sequence can even be +infinite. For example, if the grammar has the following two rules: + + S => S + S => "" + +This grammar means that S can be subtituted by S itself, and it can be +substituted by the empty sequence as well. In this case, the grammar +can only generate the empty sequence, but it can generate the empty +sequence by infinitely many ways: it can first substitute S by itself +any number of times, until it substitutes S by the empty sequence. + +Moreover, even if we do not consider the generations that produce +empty sequences, the number of ways a grammar can generate a given +sequence is in general O(n^3), where "O" means the big-O notation, and +n is the length of the sequence to be generated. This is why general +parsers are hard to construct for every grammar. + +Parser Generators +----------------- + +Now we come to the topic of this package: parser generators. Simply +speaking, a parser generator takes a grammar as input, and produces a +parser for that grammar as output. + +One can classify parser generators into two types: comiled and +interpreted. The former means that the generator produces source +codes of a parser that needs to be compiled to produce a parser. The +latter means that the generator simply produces a parser that needs to +be run by an interpreter. The well-known "tree-sitter" parser +generator belongs to the first category, for example. + +This package is interpreted. But there is a plan to also turn on a +"compiled mode" to potentially make the generated parsers more +performant. + +Augmented Backus-Naur Form +-------------------------- + +There are various ways to write grammars. This package chooses the +augmented backus-naur form. See Wiki for an overview: +<https://en.wikipedia.org/wiki/Augmented_Backus-Naur_Form>, and see +RFC 5234 for its detailed definition: +<https://datatracker.ietf.org/doc/html/rfc5234> or +<https://www.rfc-editor.org/rfc/rfc5234.txt>. + +The reason for this choice is that the RFC documents use this format, +so there are a lot of example grammars. + +There is also a plan to let the users choose their own favourite +format, of course. + +Algorithm +--------- + +This package uses an algorithm that is not widely adapted, at least +not to my knowledge. So I think I will briefly explain how it works. + -TODO: Provide more details.
\ No newline at end of file +TODO: Provide more details. |