diff options
author | JSDurand <mmemmew@gmail.com> | 2022-11-15 12:01:28 +0800 |
---|---|---|
committer | JSDurand <mmemmew@gmail.com> | 2022-11-15 12:01:28 +0800 |
commit | cb7bcfad4ab0041aaf3fde3185e27ee46bb37788 (patch) | |
tree | a4fd99b138b72617b6c4c2b04f5d2655d0fedcc5 |
Initial commit
Basic GNU standard files are added, and we now stop worrying about
monadic anamorphisms.
The current focus is on testing the correctness of the algorithm, so I
need convenient support for manipulating, interpreting, examining, and
per chance animating nondeterministic automata.
38 files changed, 3806 insertions, 0 deletions
@@ -0,0 +1 @@ +Jean Sévère Durand
\ No newline at end of file @@ -0,0 +1,674 @@ + GNU GENERAL PUBLIC LICENSE + Version 3, 29 June 2007 + + Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/> + Everyone is permitted to copy and distribute verbatim copies + of this license document, but changing it is not allowed. + + Preamble + + The GNU General Public License is a free, copyleft license for +software and other kinds of works. + + The licenses for most software and other practical works are designed +to take away your freedom to share and change the works. By contrast, +the GNU General Public License is intended to guarantee your freedom to +share and change all versions of a program--to make sure it remains free +software for all its users. We, the Free Software Foundation, use the +GNU General Public License for most of our software; it applies also to +any other work released this way by its authors. You can apply it to +your programs, too. + + When we speak of free software, we are referring to freedom, not +price. Our General Public Licenses are designed to make sure that you +have the freedom to distribute copies of free software (and charge for +them if you wish), that you receive source code or can get it if you +want it, that you can change the software or use pieces of it in new +free programs, and that you know you can do these things. + + To protect your rights, we need to prevent others from denying you +these rights or asking you to surrender the rights. Therefore, you have +certain responsibilities if you distribute copies of the software, or if +you modify it: responsibilities to respect the freedom of others. + + For example, if you distribute copies of such a program, whether +gratis or for a fee, you must pass on to the recipients the same +freedoms that you received. You must make sure that they, too, receive +or can get the source code. And you must show them these terms so they +know their rights. + + Developers that use the GNU GPL protect your rights with two steps: +(1) assert copyright on the software, and (2) offer you this License +giving you legal permission to copy, distribute and/or modify it. + + For the developers' and authors' protection, the GPL clearly explains +that there is no warranty for this free software. For both users' and +authors' sake, the GPL requires that modified versions be marked as +changed, so that their problems will not be attributed erroneously to +authors of previous versions. + + Some devices are designed to deny users access to install or run +modified versions of the software inside them, although the manufacturer +can do so. This is fundamentally incompatible with the aim of +protecting users' freedom to change the software. The systematic +pattern of such abuse occurs in the area of products for individuals to +use, which is precisely where it is most unacceptable. Therefore, we +have designed this version of the GPL to prohibit the practice for those +products. If such problems arise substantially in other domains, we +stand ready to extend this provision to those domains in future versions +of the GPL, as needed to protect the freedom of users. + + Finally, every program is threatened constantly by software patents. +States should not allow patents to restrict development and use of +software on general-purpose computers, but in those that do, we wish to +avoid the special danger that patents applied to a free program could +make it effectively proprietary. To prevent this, the GPL assures that +patents cannot be used to render the program non-free. + + The precise terms and conditions for copying, distribution and +modification follow. + + TERMS AND CONDITIONS + + 0. Definitions. + + "This License" refers to version 3 of the GNU General Public License. + + "Copyright" also means copyright-like laws that apply to other kinds of +works, such as semiconductor masks. + + "The Program" refers to any copyrightable work licensed under this +License. Each licensee is addressed as "you". "Licensees" and +"recipients" may be individuals or organizations. + + To "modify" a work means to copy from or adapt all or part of the work +in a fashion requiring copyright permission, other than the making of an +exact copy. The resulting work is called a "modified version" of the +earlier work or a work "based on" the earlier work. + + A "covered work" means either the unmodified Program or a work based +on the Program. + + To "propagate" a work means to do anything with it that, without +permission, would make you directly or secondarily liable for +infringement under applicable copyright law, except executing it on a +computer or modifying a private copy. Propagation includes copying, +distribution (with or without modification), making available to the +public, and in some countries other activities as well. + + To "convey" a work means any kind of propagation that enables other +parties to make or receive copies. Mere interaction with a user through +a computer network, with no transfer of a copy, is not conveying. + + An interactive user interface displays "Appropriate Legal Notices" +to the extent that it includes a convenient and prominently visible +feature that (1) displays an appropriate copyright notice, and (2) +tells the user that there is no warranty for the work (except to the +extent that warranties are provided), that licensees may convey the +work under this License, and how to view a copy of this License. If +the interface presents a list of user commands or options, such as a +menu, a prominent item in the list meets this criterion. + + 1. Source Code. + + The "source code" for a work means the preferred form of the work +for making modifications to it. "Object code" means any non-source +form of a work. + + A "Standard Interface" means an interface that either is an official +standard defined by a recognized standards body, or, in the case of +interfaces specified for a particular programming language, one that +is widely used among developers working in that language. + + The "System Libraries" of an executable work include anything, other +than the work as a whole, that (a) is included in the normal form of +packaging a Major Component, but which is not part of that Major +Component, and (b) serves only to enable use of the work with that +Major Component, or to implement a Standard Interface for which an +implementation is available to the public in source code form. A +"Major Component", in this context, means a major essential component +(kernel, window system, and so on) of the specific operating system +(if any) on which the executable work runs, or a compiler used to +produce the work, or an object code interpreter used to run it. + + The "Corresponding Source" for a work in object code form means all +the source code needed to generate, install, and (for an executable +work) run the object code and to modify the work, including scripts to +control those activities. However, it does not include the work's +System Libraries, or general-purpose tools or generally available free +programs which are used unmodified in performing those activities but +which are not part of the work. For example, Corresponding Source +includes interface definition files associated with source files for +the work, and the source code for shared libraries and dynamically +linked subprograms that the work is specifically designed to require, +such as by intimate data communication or control flow between those +subprograms and other parts of the work. + + The Corresponding Source need not include anything that users +can regenerate automatically from other parts of the Corresponding +Source. + + The Corresponding Source for a work in source code form is that +same work. + + 2. Basic Permissions. + + All rights granted under this License are granted for the term of +copyright on the Program, and are irrevocable provided the stated +conditions are met. This License explicitly affirms your unlimited +permission to run the unmodified Program. The output from running a +covered work is covered by this License only if the output, given its +content, constitutes a covered work. This License acknowledges your +rights of fair use or other equivalent, as provided by copyright law. + + You may make, run and propagate covered works that you do not +convey, without conditions so long as your license otherwise remains +in force. You may convey covered works to others for the sole purpose +of having them make modifications exclusively for you, or provide you +with facilities for running those works, provided that you comply with +the terms of this License in conveying all material for which you do +not control copyright. Those thus making or running the covered works +for you must do so exclusively on your behalf, under your direction +and control, on terms that prohibit them from making any copies of +your copyrighted material outside their relationship with you. + + Conveying under any other circumstances is permitted solely under +the conditions stated below. Sublicensing is not allowed; section 10 +makes it unnecessary. + + 3. Protecting Users' Legal Rights From Anti-Circumvention Law. + + No covered work shall be deemed part of an effective technological +measure under any applicable law fulfilling obligations under article +11 of the WIPO copyright treaty adopted on 20 December 1996, or +similar laws prohibiting or restricting circumvention of such +measures. + + When you convey a covered work, you waive any legal power to forbid +circumvention of technological measures to the extent such circumvention +is effected by exercising rights under this License with respect to +the covered work, and you disclaim any intention to limit operation or +modification of the work as a means of enforcing, against the work's +users, your or third parties' legal rights to forbid circumvention of +technological measures. + + 4. Conveying Verbatim Copies. + + You may convey verbatim copies of the Program's source code as you +receive it, in any medium, provided that you conspicuously and +appropriately publish on each copy an appropriate copyright notice; +keep intact all notices stating that this License and any +non-permissive terms added in accord with section 7 apply to the code; +keep intact all notices of the absence of any warranty; and give all +recipients a copy of this License along with the Program. + + You may charge any price or no price for each copy that you convey, +and you may offer support or warranty protection for a fee. + + 5. Conveying Modified Source Versions. + + You may convey a work based on the Program, or the modifications to +produce it from the Program, in the form of source code under the +terms of section 4, provided that you also meet all of these conditions: + + a) The work must carry prominent notices stating that you modified + it, and giving a relevant date. + + b) The work must carry prominent notices stating that it is + released under this License and any conditions added under section + 7. This requirement modifies the requirement in section 4 to + "keep intact all notices". + + c) You must license the entire work, as a whole, under this + License to anyone who comes into possession of a copy. This + License will therefore apply, along with any applicable section 7 + additional terms, to the whole of the work, and all its parts, + regardless of how they are packaged. This License gives no + permission to license the work in any other way, but it does not + invalidate such permission if you have separately received it. + + d) If the work has interactive user interfaces, each must display + Appropriate Legal Notices; however, if the Program has interactive + interfaces that do not display Appropriate Legal Notices, your + work need not make them do so. + + A compilation of a covered work with other separate and independent +works, which are not by their nature extensions of the covered work, +and which are not combined with it such as to form a larger program, +in or on a volume of a storage or distribution medium, is called an +"aggregate" if the compilation and its resulting copyright are not +used to limit the access or legal rights of the compilation's users +beyond what the individual works permit. Inclusion of a covered work +in an aggregate does not cause this License to apply to the other +parts of the aggregate. + + 6. Conveying Non-Source Forms. + + You may convey a covered work in object code form under the terms +of sections 4 and 5, provided that you also convey the +machine-readable Corresponding Source under the terms of this License, +in one of these ways: + + a) Convey the object code in, or embodied in, a physical product + (including a physical distribution medium), accompanied by the + Corresponding Source fixed on a durable physical medium + customarily used for software interchange. + + b) Convey the object code in, or embodied in, a physical product + (including a physical distribution medium), accompanied by a + written offer, valid for at least three years and valid for as + long as you offer spare parts or customer support for that product + model, to give anyone who possesses the object code either (1) a + copy of the Corresponding Source for all the software in the + product that is covered by this License, on a durable physical + medium customarily used for software interchange, for a price no + more than your reasonable cost of physically performing this + conveying of source, or (2) access to copy the + Corresponding Source from a network server at no charge. + + c) Convey individual copies of the object code with a copy of the + written offer to provide the Corresponding Source. This + alternative is allowed only occasionally and noncommercially, and + only if you received the object code with such an offer, in accord + with subsection 6b. + + d) Convey the object code by offering access from a designated + place (gratis or for a charge), and offer equivalent access to the + Corresponding Source in the same way through the same place at no + further charge. You need not require recipients to copy the + Corresponding Source along with the object code. If the place to + copy the object code is a network server, the Corresponding Source + may be on a different server (operated by you or a third party) + that supports equivalent copying facilities, provided you maintain + clear directions next to the object code saying where to find the + Corresponding Source. Regardless of what server hosts the + Corresponding Source, you remain obligated to ensure that it is + available for as long as needed to satisfy these requirements. + + e) Convey the object code using peer-to-peer transmission, provided + you inform other peers where the object code and Corresponding + Source of the work are being offered to the general public at no + charge under subsection 6d. + + A separable portion of the object code, whose source code is excluded +from the Corresponding Source as a System Library, need not be +included in conveying the object code work. + + A "User Product" is either (1) a "consumer product", which means any +tangible personal property which is normally used for personal, family, +or household purposes, or (2) anything designed or sold for incorporation +into a dwelling. In determining whether a product is a consumer product, +doubtful cases shall be resolved in favor of coverage. For a particular +product received by a particular user, "normally used" refers to a +typical or common use of that class of product, regardless of the status +of the particular user or of the way in which the particular user +actually uses, or expects or is expected to use, the product. A product +is a consumer product regardless of whether the product has substantial +commercial, industrial or non-consumer uses, unless such uses represent +the only significant mode of use of the product. + + "Installation Information" for a User Product means any methods, +procedures, authorization keys, or other information required to install +and execute modified versions of a covered work in that User Product from +a modified version of its Corresponding Source. The information must +suffice to ensure that the continued functioning of the modified object +code is in no case prevented or interfered with solely because +modification has been made. + + If you convey an object code work under this section in, or with, or +specifically for use in, a User Product, and the conveying occurs as +part of a transaction in which the right of possession and use of the +User Product is transferred to the recipient in perpetuity or for a +fixed term (regardless of how the transaction is characterized), the +Corresponding Source conveyed under this section must be accompanied +by the Installation Information. But this requirement does not apply +if neither you nor any third party retains the ability to install +modified object code on the User Product (for example, the work has +been installed in ROM). + + The requirement to provide Installation Information does not include a +requirement to continue to provide support service, warranty, or updates +for a work that has been modified or installed by the recipient, or for +the User Product in which it has been modified or installed. Access to a +network may be denied when the modification itself materially and +adversely affects the operation of the network or violates the rules and +protocols for communication across the network. + + Corresponding Source conveyed, and Installation Information provided, +in accord with this section must be in a format that is publicly +documented (and with an implementation available to the public in +source code form), and must require no special password or key for +unpacking, reading or copying. + + 7. Additional Terms. + + "Additional permissions" are terms that supplement the terms of this +License by making exceptions from one or more of its conditions. +Additional permissions that are applicable to the entire Program shall +be treated as though they were included in this License, to the extent +that they are valid under applicable law. If additional permissions +apply only to part of the Program, that part may be used separately +under those permissions, but the entire Program remains governed by +this License without regard to the additional permissions. + + When you convey a copy of a covered work, you may at your option +remove any additional permissions from that copy, or from any part of +it. (Additional permissions may be written to require their own +removal in certain cases when you modify the work.) You may place +additional permissions on material, added by you to a covered work, +for which you have or can give appropriate copyright permission. + + Notwithstanding any other provision of this License, for material you +add to a covered work, you may (if authorized by the copyright holders of +that material) supplement the terms of this License with terms: + + a) Disclaiming warranty or limiting liability differently from the + terms of sections 15 and 16 of this License; or + + b) Requiring preservation of specified reasonable legal notices or + author attributions in that material or in the Appropriate Legal + Notices displayed by works containing it; or + + c) Prohibiting misrepresentation of the origin of that material, or + requiring that modified versions of such material be marked in + reasonable ways as different from the original version; or + + d) Limiting the use for publicity purposes of names of licensors or + authors of the material; or + + e) Declining to grant rights under trademark law for use of some + trade names, trademarks, or service marks; or + + f) Requiring indemnification of licensors and authors of that + material by anyone who conveys the material (or modified versions of + it) with contractual assumptions of liability to the recipient, for + any liability that these contractual assumptions directly impose on + those licensors and authors. + + All other non-permissive additional terms are considered "further +restrictions" within the meaning of section 10. If the Program as you +received it, or any part of it, contains a notice stating that it is +governed by this License along with a term that is a further +restriction, you may remove that term. If a license document contains +a further restriction but permits relicensing or conveying under this +License, you may add to a covered work material governed by the terms +of that license document, provided that the further restriction does +not survive such relicensing or conveying. + + If you add terms to a covered work in accord with this section, you +must place, in the relevant source files, a statement of the +additional terms that apply to those files, or a notice indicating +where to find the applicable terms. + + Additional terms, permissive or non-permissive, may be stated in the +form of a separately written license, or stated as exceptions; +the above requirements apply either way. + + 8. Termination. + + You may not propagate or modify a covered work except as expressly +provided under this License. Any attempt otherwise to propagate or +modify it is void, and will automatically terminate your rights under +this License (including any patent licenses granted under the third +paragraph of section 11). + + However, if you cease all violation of this License, then your +license from a particular copyright holder is reinstated (a) +provisionally, unless and until the copyright holder explicitly and +finally terminates your license, and (b) permanently, if the copyright +holder fails to notify you of the violation by some reasonable means +prior to 60 days after the cessation. + + Moreover, your license from a particular copyright holder is +reinstated permanently if the copyright holder notifies you of the +violation by some reasonable means, this is the first time you have +received notice of violation of this License (for any work) from that +copyright holder, and you cure the violation prior to 30 days after +your receipt of the notice. + + Termination of your rights under this section does not terminate the +licenses of parties who have received copies or rights from you under +this License. If your rights have been terminated and not permanently +reinstated, you do not qualify to receive new licenses for the same +material under section 10. + + 9. Acceptance Not Required for Having Copies. + + You are not required to accept this License in order to receive or +run a copy of the Program. Ancillary propagation of a covered work +occurring solely as a consequence of using peer-to-peer transmission +to receive a copy likewise does not require acceptance. However, +nothing other than this License grants you permission to propagate or +modify any covered work. These actions infringe copyright if you do +not accept this License. Therefore, by modifying or propagating a +covered work, you indicate your acceptance of this License to do so. + + 10. Automatic Licensing of Downstream Recipients. + + Each time you convey a covered work, the recipient automatically +receives a license from the original licensors, to run, modify and +propagate that work, subject to this License. You are not responsible +for enforcing compliance by third parties with this License. + + An "entity transaction" is a transaction transferring control of an +organization, or substantially all assets of one, or subdividing an +organization, or merging organizations. If propagation of a covered +work results from an entity transaction, each party to that +transaction who receives a copy of the work also receives whatever +licenses to the work the party's predecessor in interest had or could +give under the previous paragraph, plus a right to possession of the +Corresponding Source of the work from the predecessor in interest, if +the predecessor has it or can get it with reasonable efforts. + + You may not impose any further restrictions on the exercise of the +rights granted or affirmed under this License. For example, you may +not impose a license fee, royalty, or other charge for exercise of +rights granted under this License, and you may not initiate litigation +(including a cross-claim or counterclaim in a lawsuit) alleging that +any patent claim is infringed by making, using, selling, offering for +sale, or importing the Program or any portion of it. + + 11. Patents. + + A "contributor" is a copyright holder who authorizes use under this +License of the Program or a work on which the Program is based. The +work thus licensed is called the contributor's "contributor version". + + A contributor's "essential patent claims" are all patent claims +owned or controlled by the contributor, whether already acquired or +hereafter acquired, that would be infringed by some manner, permitted +by this License, of making, using, or selling its contributor version, +but do not include claims that would be infringed only as a +consequence of further modification of the contributor version. For +purposes of this definition, "control" includes the right to grant +patent sublicenses in a manner consistent with the requirements of +this License. + + Each contributor grants you a non-exclusive, worldwide, royalty-free +patent license under the contributor's essential patent claims, to +make, use, sell, offer for sale, import and otherwise run, modify and +propagate the contents of its contributor version. + + In the following three paragraphs, a "patent license" is any express +agreement or commitment, however denominated, not to enforce a patent +(such as an express permission to practice a patent or covenant not to +sue for patent infringement). To "grant" such a patent license to a +party means to make such an agreement or commitment not to enforce a +patent against the party. + + If you convey a covered work, knowingly relying on a patent license, +and the Corresponding Source of the work is not available for anyone +to copy, free of charge and under the terms of this License, through a +publicly available network server or other readily accessible means, +then you must either (1) cause the Corresponding Source to be so +available, or (2) arrange to deprive yourself of the benefit of the +patent license for this particular work, or (3) arrange, in a manner +consistent with the requirements of this License, to extend the patent +license to downstream recipients. "Knowingly relying" means you have +actual knowledge that, but for the patent license, your conveying the +covered work in a country, or your recipient's use of the covered work +in a country, would infringe one or more identifiable patents in that +country that you have reason to believe are valid. + + If, pursuant to or in connection with a single transaction or +arrangement, you convey, or propagate by procuring conveyance of, a +covered work, and grant a patent license to some of the parties +receiving the covered work authorizing them to use, propagate, modify +or convey a specific copy of the covered work, then the patent license +you grant is automatically extended to all recipients of the covered +work and works based on it. + + A patent license is "discriminatory" if it does not include within +the scope of its coverage, prohibits the exercise of, or is +conditioned on the non-exercise of one or more of the rights that are +specifically granted under this License. You may not convey a covered +work if you are a party to an arrangement with a third party that is +in the business of distributing software, under which you make payment +to the third party based on the extent of your activity of conveying +the work, and under which the third party grants, to any of the +parties who would receive the covered work from you, a discriminatory +patent license (a) in connection with copies of the covered work +conveyed by you (or copies made from those copies), or (b) primarily +for and in connection with specific products or compilations that +contain the covered work, unless you entered into that arrangement, +or that patent license was granted, prior to 28 March 2007. + + Nothing in this License shall be construed as excluding or limiting +any implied license or other defenses to infringement that may +otherwise be available to you under applicable patent law. + + 12. No Surrender of Others' Freedom. + + If conditions are imposed on you (whether by court order, agreement or +otherwise) that contradict the conditions of this License, they do not +excuse you from the conditions of this License. If you cannot convey a +covered work so as to satisfy simultaneously your obligations under this +License and any other pertinent obligations, then as a consequence you may +not convey it at all. For example, if you agree to terms that obligate you +to collect a royalty for further conveying from those to whom you convey +the Program, the only way you could satisfy both those terms and this +License would be to refrain entirely from conveying the Program. + + 13. Use with the GNU Affero General Public License. + + Notwithstanding any other provision of this License, you have +permission to link or combine any covered work with a work licensed +under version 3 of the GNU Affero General Public License into a single +combined work, and to convey the resulting work. The terms of this +License will continue to apply to the part which is the covered work, +but the special requirements of the GNU Affero General Public License, +section 13, concerning interaction through a network will apply to the +combination as such. + + 14. Revised Versions of this License. + + The Free Software Foundation may publish revised and/or new versions of +the GNU General Public License from time to time. Such new versions will +be similar in spirit to the present version, but may differ in detail to +address new problems or concerns. + + Each version is given a distinguishing version number. If the +Program specifies that a certain numbered version of the GNU General +Public License "or any later version" applies to it, you have the +option of following the terms and conditions either of that numbered +version or of any later version published by the Free Software +Foundation. If the Program does not specify a version number of the +GNU General Public License, you may choose any version ever published +by the Free Software Foundation. + + If the Program specifies that a proxy can decide which future +versions of the GNU General Public License can be used, that proxy's +public statement of acceptance of a version permanently authorizes you +to choose that version for the Program. + + Later license versions may give you additional or different +permissions. However, no additional obligations are imposed on any +author or copyright holder as a result of your choosing to follow a +later version. + + 15. Disclaimer of Warranty. + + THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY +APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT +HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY +OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, +THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR +PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM +IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF +ALL NECESSARY SERVICING, REPAIR OR CORRECTION. + + 16. Limitation of Liability. + + IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING +WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS +THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY +GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE +USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF +DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD +PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), +EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF +SUCH DAMAGES. + + 17. Interpretation of Sections 15 and 16. + + If the disclaimer of warranty and limitation of liability provided +above cannot be given local legal effect according to their terms, +reviewing courts shall apply local law that most closely approximates +an absolute waiver of all civil liability in connection with the +Program, unless a warranty or assumption of liability accompanies a +copy of the Program in return for a fee. + + END OF TERMS AND CONDITIONS + + How to Apply These Terms to Your New Programs + + If you develop a new program, and you want it to be of the greatest +possible use to the public, the best way to achieve this is to make it +free software which everyone can redistribute and change under these terms. + + To do so, attach the following notices to the program. It is safest +to attach them to the start of each source file to most effectively +state the exclusion of warranty; and each file should have at least +the "copyright" line and a pointer to where the full notice is found. + + <one line to give the program's name and a brief idea of what it does.> + Copyright (C) <year> <name of author> + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation, either version 3 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see <https://www.gnu.org/licenses/>. + +Also add information on how to contact you by electronic and paper mail. + + If the program does terminal interaction, make it output a short +notice like this when it starts in an interactive mode: + + <program> Copyright (C) <year> <name of author> + This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'. + This is free software, and you are welcome to redistribute it + under certain conditions; type `show c' for details. + +The hypothetical commands `show w' and `show c' should show the appropriate +parts of the General Public License. Of course, your program's commands +might be different; for a GUI interface, you would use an "about box". + + You should also get your employer (if you work as a programmer) or school, +if any, to sign a "copyright disclaimer" for the program, if necessary. +For more information on this, and how to apply and follow the GNU GPL, see +<https://www.gnu.org/licenses/>. + + The GNU General Public License does not permit incorporating your program +into proprietary programs. If your program is a subroutine library, you +may consider it more useful to permit linking proprietary applications with +the library. If this is what you want to do, use the GNU Lesser General +Public License instead of this License. But first, please read +<https://www.gnu.org/licenses/why-not-lgpl.html>.
\ No newline at end of file diff --git a/Cargo.toml b/Cargo.toml new file mode 100644 index 0000000..e672e74 --- /dev/null +++ b/Cargo.toml @@ -0,0 +1,36 @@ +[package] +name = "rep" +version = "0.1.0" +edition = "2021" +authors = ["JSDurand <durand@jsdurand.xyz>"] +description = "Rust, Emacs, and Parsers" +license = "GPL-3.0-or-later" +keywords = ["emacs", "parser"] +# We require a minimal Rust version of 1.65 as we need the feature of +# generic associated types, which are not stablized until version +# 1.65. +rust-version = "1.65" +# testing the new resolver, even though this has no dependencies ;p + +[workspace] +members = ["graph", "receme", "nfa", "repcore"] +resolver = "2" + +# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html + +[dependencies] +receme = { path = "receme" } + +[dev-dependencies] +criterion = "0.4" + +[features] +default = [] +tokenizer = [] + +[profile.bench] +debug = true + +[[bench]] +name = "bench_receme" +harness = false diff --git a/ChangeLog b/ChangeLog new file mode 100644 index 0000000..a064c8c --- /dev/null +++ b/ChangeLog @@ -0,0 +1,21 @@ +2022-11-15 Jean Sévère Durand <durand@jsdurand.xyz> + + * nfa: Stop worrying about monadic anamorphisms. + + I was trying to design a way to use monadic anamorphisms to build + and parse regular expressions. But, after some more thoughts, I + can only think about implementations that affect the performance + and are quite specifically tailored to my use-cases. This means + the design is neither efficient nor generic. So what is the use + of it anyways? + + In the end, I decided to mildly generalize my usual pattern of + recursive descent parsing. After all, my current focus is to + implement a version of NFA that can show me derivatives of the + atomic languages in a human-friendly and easy-to-use way. This + will help me catch errors in my algorithms. + +2022-11-13 Jean Sévère Durand <durand@jsdurand.xyz> + + * gnu-standards: Add basic files required by the GNU standard. + diff --git a/DESIGN.org b/DESIGN.org new file mode 100644 index 0000000..18834cb --- /dev/null +++ b/DESIGN.org @@ -0,0 +1,114 @@ +#+AUTHOR: Durand +#+EMAIL: durand@jsdurand.xyz +#+DATE: <2022-09-30 Ven 14:45> +#+STARTUP: fold + +This document records an attempt to design a parser generator library. + +* Goals + +- Modular + + The library should be modular enough that the following components + of the parser generator can be easily substituted by either external + crates or even external foreign functions. + + + Regular expression matchers + + This project is to have a parser generator. A regular expression + matcher is not the core goal of the project. So I shall write the + library in a way that can use other means to match regular + expressions, should some users wish to do so. + + Since I do not want to use external dependencies by default, I + will also write a small regular expression matcher library to + serve as the default matcher. This will allow me to try different + optimization techniques for regular expression matchers as well. + + + Nondeterministic Fintie Automata + + The project needs to manipulate nondeterministic fintie automata. + As for regular expression matchers, I will write my own library + for dealing with NFA's, and in the mean time preserve the + possibility to "plug in" other external crates for this purpose as + well. + + + Graph data type + + The internal data of the parser generator, as used by the current + algorithm, is a directed graph. While this is easy to represent + using the adjacent list representation, I will not underestimate + the potential optimizations brought by external libraries. + + Moreover, the underlying data structures of regular expressions + and nondeterministic fintie automata are graphs as well. So they + should be able to share the same data structure in the default + implementation, and to plug in external crates, should the need + arise. + +- Abstract + + One of the deficiencies of the current version of this project is + that it does not mirror the abstract ideas of the algorithm closely. + This means that, when the implementation of the algorithm goes + wrong, I cannot precisely pinpoint the cause for the erraneous + behaviour. + + The reason for the departure from the abstract ideas was that I + thought this could lead to some performance gains. Apparently I was + too obsessed with premature optimizations that I ignored the + potential problems caused. + +** Update + +Now I realize that a nondeterministic fintie automaton is equivalent +with a regular expression, so it makes no sense to deal with NFAs +without dealing with regular expressions. Also, dealing with regular +expressions directly makes it easier to extract the "atomic languages" +of grammars and to present them in a readable way. + +But still the current focus is not on matching regular expressions, +but on the structures of regular expressions. + +* Details + +According to the above goals, I have to write three sub-crates, or +workspace members, for regular expression matchers, NFA's, and for +graphs, respectively. + +Each of these three crates is easy to write, in my opinion. I just +have to pack the old codes into separate crates, to separate concerns. +Once this is done, I can easily plug in other crates as well, as one +can just wrap up external crates. + +Moreover, in order to achieve the goal of abstraction, I need modules +for "atoms" and "languages". Atoms represent the regular language of +derivatives of the grammar. This is perhaps the most obscure part, in +my experience. Languages represent the derivative of the grammar with +respect to inputs. From my experience, it is not a good idea to +calculate the semiring values after the derivatives have been +calculated. Instead, we shall calculate the semiring values at the +same time as calculating the derivatives of the grammar, and record +the values in a dedicated recorder. + +* Tokenizers + +In my opinion, tokenization is just a speed-up process. The user +should not worry about the distinction between tokens and +non-terminals. This means I want the parser generator to work at the +character, or byte, level by default. When the user wants to use a +dedicated tokenizer, to have some speed-ups for example, the user can +supply a function that eats a string, or an array of bytes, and +returns an array of tokens. + +This decision means that we don't need regular expressions by default. +We shall add the regular expressions back later, when we want to +decrease the constant factor involved in the algorithmic complexities. +When we do want to do so, then, we need a way to automatically deduce +regular expression terminals. In the current version, a non-terminal +whose rules only involve with terminals will be automatically +converted to a regular expression terminal. I think this mechanism +can be adapted in the future. + +* Grammars + @@ -0,0 +1,368 @@ +Installation Instructions +************************* + + Copyright (C) 1994-1996, 1999-2002, 2004-2016 Free Software +Foundation, Inc. + + Copying and distribution of this file, with or without modification, +are permitted in any medium without royalty provided the copyright +notice and this notice are preserved. This file is offered as-is, +without warranty of any kind. + +Basic Installation +================== + + Briefly, the shell command './configure && make && make install' +should configure, build, and install this package. The following +more-detailed instructions are generic; see the 'README' file for +instructions specific to this package. Some packages provide this +'INSTALL' file but do not implement all of the features documented +below. The lack of an optional feature in a given package is not +necessarily a bug. More recommendations for GNU packages can be found +in *note Makefile Conventions: (standards)Makefile Conventions. + + The 'configure' shell script attempts to guess correct values for +various system-dependent variables used during compilation. It uses +those values to create a 'Makefile' in each directory of the package. +It may also create one or more '.h' files containing system-dependent +definitions. Finally, it creates a shell script 'config.status' that +you can run in the future to recreate the current configuration, and a +file 'config.log' containing compiler output (useful mainly for +debugging 'configure'). + + It can also use an optional file (typically called 'config.cache' and +enabled with '--cache-file=config.cache' or simply '-C') that saves the +results of its tests to speed up reconfiguring. Caching is disabled by +default to prevent problems with accidental use of stale cache files. + + If you need to do unusual things to compile the package, please try +to figure out how 'configure' could check whether to do them, and mail +diffs or instructions to the address given in the 'README' so they can +be considered for the next release. If you are using the cache, and at +some point 'config.cache' contains results you don't want to keep, you +may remove or edit it. + + The file 'configure.ac' (or 'configure.in') is used to create +'configure' by a program called 'autoconf'. You need 'configure.ac' if +you want to change it or regenerate 'configure' using a newer version of +'autoconf'. + + The simplest way to compile this package is: + + 1. 'cd' to the directory containing the package's source code and type + './configure' to configure the package for your system. + + Running 'configure' might take a while. While running, it prints + some messages telling which features it is checking for. + + 2. Type 'make' to compile the package. + + 3. Optionally, type 'make check' to run any self-tests that come with + the package, generally using the just-built uninstalled binaries. + + 4. Type 'make install' to install the programs and any data files and + documentation. When installing into a prefix owned by root, it is + recommended that the package be configured and built as a regular + user, and only the 'make install' phase executed with root + privileges. + + 5. Optionally, type 'make installcheck' to repeat any self-tests, but + this time using the binaries in their final installed location. + This target does not install anything. Running this target as a + regular user, particularly if the prior 'make install' required + root privileges, verifies that the installation completed + correctly. + + 6. You can remove the program binaries and object files from the + source code directory by typing 'make clean'. To also remove the + files that 'configure' created (so you can compile the package for + a different kind of computer), type 'make distclean'. There is + also a 'make maintainer-clean' target, but that is intended mainly + for the package's developers. If you use it, you may have to get + all sorts of other programs in order to regenerate files that came + with the distribution. + + 7. Often, you can also type 'make uninstall' to remove the installed + files again. In practice, not all packages have tested that + uninstallation works correctly, even though it is required by the + GNU Coding Standards. + + 8. Some packages, particularly those that use Automake, provide 'make + distcheck', which can by used by developers to test that all other + targets like 'make install' and 'make uninstall' work correctly. + This target is generally not run by end users. + +Compilers and Options +===================== + + Some systems require unusual options for compilation or linking that +the 'configure' script does not know about. Run './configure --help' +for details on some of the pertinent environment variables. + + You can give 'configure' initial values for configuration parameters +by setting variables in the command line or in the environment. Here is +an example: + + ./configure CC=c99 CFLAGS=-g LIBS=-lposix + + *Note Defining Variables::, for more details. + +Compiling For Multiple Architectures +==================================== + + You can compile the package for more than one kind of computer at the +same time, by placing the object files for each architecture in their +own directory. To do this, you can use GNU 'make'. 'cd' to the +directory where you want the object files and executables to go and run +the 'configure' script. 'configure' automatically checks for the source +code in the directory that 'configure' is in and in '..'. This is known +as a "VPATH" build. + + With a non-GNU 'make', it is safer to compile the package for one +architecture at a time in the source code directory. After you have +installed the package for one architecture, use 'make distclean' before +reconfiguring for another architecture. + + On MacOS X 10.5 and later systems, you can create libraries and +executables that work on multiple system types--known as "fat" or +"universal" binaries--by specifying multiple '-arch' options to the +compiler but only a single '-arch' option to the preprocessor. Like +this: + + ./configure CC="gcc -arch i386 -arch x86_64 -arch ppc -arch ppc64" \ + CXX="g++ -arch i386 -arch x86_64 -arch ppc -arch ppc64" \ + CPP="gcc -E" CXXCPP="g++ -E" + + This is not guaranteed to produce working output in all cases, you +may have to build one architecture at a time and combine the results +using the 'lipo' tool if you have problems. + +Installation Names +================== + + By default, 'make install' installs the package's commands under +'/usr/local/bin', include files under '/usr/local/include', etc. You +can specify an installation prefix other than '/usr/local' by giving +'configure' the option '--prefix=PREFIX', where PREFIX must be an +absolute file name. + + You can specify separate installation prefixes for +architecture-specific files and architecture-independent files. If you +pass the option '--exec-prefix=PREFIX' to 'configure', the package uses +PREFIX as the prefix for installing programs and libraries. +Documentation and other data files still use the regular prefix. + + In addition, if you use an unusual directory layout you can give +options like '--bindir=DIR' to specify different values for particular +kinds of files. Run 'configure --help' for a list of the directories +you can set and what kinds of files go in them. In general, the default +for these options is expressed in terms of '${prefix}', so that +specifying just '--prefix' will affect all of the other directory +specifications that were not explicitly provided. + + The most portable way to affect installation locations is to pass the +correct locations to 'configure'; however, many packages provide one or +both of the following shortcuts of passing variable assignments to the +'make install' command line to change installation locations without +having to reconfigure or recompile. + + The first method involves providing an override variable for each +affected directory. For example, 'make install +prefix=/alternate/directory' will choose an alternate location for all +directory configuration variables that were expressed in terms of +'${prefix}'. Any directories that were specified during 'configure', +but not in terms of '${prefix}', must each be overridden at install time +for the entire installation to be relocated. The approach of makefile +variable overrides for each directory variable is required by the GNU +Coding Standards, and ideally causes no recompilation. However, some +platforms have known limitations with the semantics of shared libraries +that end up requiring recompilation when using this method, particularly +noticeable in packages that use GNU Libtool. + + The second method involves providing the 'DESTDIR' variable. For +example, 'make install DESTDIR=/alternate/directory' will prepend +'/alternate/directory' before all installation names. The approach of +'DESTDIR' overrides is not required by the GNU Coding Standards, and +does not work on platforms that have drive letters. On the other hand, +it does better at avoiding recompilation issues, and works well even +when some directory options were not specified in terms of '${prefix}' +at 'configure' time. + +Optional Features +================= + + If the package supports it, you can cause programs to be installed +with an extra prefix or suffix on their names by giving 'configure' the +option '--program-prefix=PREFIX' or '--program-suffix=SUFFIX'. + + Some packages pay attention to '--enable-FEATURE' options to +'configure', where FEATURE indicates an optional part of the package. +They may also pay attention to '--with-PACKAGE' options, where PACKAGE +is something like 'gnu-as' or 'x' (for the X Window System). The +'README' should mention any '--enable-' and '--with-' options that the +package recognizes. + + For packages that use the X Window System, 'configure' can usually +find the X include and library files automatically, but if it doesn't, +you can use the 'configure' options '--x-includes=DIR' and +'--x-libraries=DIR' to specify their locations. + + Some packages offer the ability to configure how verbose the +execution of 'make' will be. For these packages, running './configure +--enable-silent-rules' sets the default to minimal output, which can be +overridden with 'make V=1'; while running './configure +--disable-silent-rules' sets the default to verbose, which can be +overridden with 'make V=0'. + +Particular systems +================== + + On HP-UX, the default C compiler is not ANSI C compatible. If GNU CC +is not installed, it is recommended to use the following options in +order to use an ANSI C compiler: + + ./configure CC="cc -Ae -D_XOPEN_SOURCE=500" + +and if that doesn't work, install pre-built binaries of GCC for HP-UX. + + HP-UX 'make' updates targets which have the same time stamps as their +prerequisites, which makes it generally unusable when shipped generated +files such as 'configure' are involved. Use GNU 'make' instead. + + On OSF/1 a.k.a. Tru64, some versions of the default C compiler cannot +parse its '<wchar.h>' header file. The option '-nodtk' can be used as a +workaround. If GNU CC is not installed, it is therefore recommended to +try + + ./configure CC="cc" + +and if that doesn't work, try + + ./configure CC="cc -nodtk" + + On Solaris, don't put '/usr/ucb' early in your 'PATH'. This +directory contains several dysfunctional programs; working variants of +these programs are available in '/usr/bin'. So, if you need '/usr/ucb' +in your 'PATH', put it _after_ '/usr/bin'. + + On Haiku, software installed for all users goes in '/boot/common', +not '/usr/local'. It is recommended to use the following options: + + ./configure --prefix=/boot/common + +Specifying the System Type +========================== + + There may be some features 'configure' cannot figure out +automatically, but needs to determine by the type of machine the package +will run on. Usually, assuming the package is built to be run on the +_same_ architectures, 'configure' can figure that out, but if it prints +a message saying it cannot guess the machine type, give it the +'--build=TYPE' option. TYPE can either be a short name for the system +type, such as 'sun4', or a canonical name which has the form: + + CPU-COMPANY-SYSTEM + +where SYSTEM can have one of these forms: + + OS + KERNEL-OS + + See the file 'config.sub' for the possible values of each field. If +'config.sub' isn't included in this package, then this package doesn't +need to know the machine type. + + If you are _building_ compiler tools for cross-compiling, you should +use the option '--target=TYPE' to select the type of system they will +produce code for. + + If you want to _use_ a cross compiler, that generates code for a +platform different from the build platform, you should specify the +"host" platform (i.e., that on which the generated programs will +eventually be run) with '--host=TYPE'. + +Sharing Defaults +================ + + If you want to set default values for 'configure' scripts to share, +you can create a site shell script called 'config.site' that gives +default values for variables like 'CC', 'cache_file', and 'prefix'. +'configure' looks for 'PREFIX/share/config.site' if it exists, then +'PREFIX/etc/config.site' if it exists. Or, you can set the +'CONFIG_SITE' environment variable to the location of the site script. +A warning: not all 'configure' scripts look for a site script. + +Defining Variables +================== + + Variables not defined in a site shell script can be set in the +environment passed to 'configure'. However, some packages may run +configure again during the build, and the customized values of these +variables may be lost. In order to avoid this problem, you should set +them in the 'configure' command line, using 'VAR=value'. For example: + + ./configure CC=/usr/local2/bin/gcc + +causes the specified 'gcc' to be used as the C compiler (unless it is +overridden in the site shell script). + +Unfortunately, this technique does not work for 'CONFIG_SHELL' due to an +Autoconf limitation. Until the limitation is lifted, you can use this +workaround: + + CONFIG_SHELL=/bin/bash ./configure CONFIG_SHELL=/bin/bash + +'configure' Invocation +====================== + + 'configure' recognizes the following options to control how it +operates. + +'--help' +'-h' + Print a summary of all of the options to 'configure', and exit. + +'--help=short' +'--help=recursive' + Print a summary of the options unique to this package's + 'configure', and exit. The 'short' variant lists options used only + in the top level, while the 'recursive' variant lists options also + present in any nested packages. + +'--version' +'-V' + Print the version of Autoconf used to generate the 'configure' + script, and exit. + +'--cache-file=FILE' + Enable the cache: use and save the results of the tests in FILE, + traditionally 'config.cache'. FILE defaults to '/dev/null' to + disable caching. + +'--config-cache' +'-C' + Alias for '--cache-file=config.cache'. + +'--quiet' +'--silent' +'-q' + Do not print messages saying which checks are being made. To + suppress all normal output, redirect it to '/dev/null' (any error + messages will still be shown). + +'--srcdir=DIR' + Look for the package's source code in directory DIR. Usually + 'configure' can determine that directory automatically. + +'--prefix=DIR' + Use DIR as the installation prefix. *note Installation Names:: for + more details, including other options available for fine-tuning the + installation locations. + +'--no-create' +'-n' + Run the configure checks, but stop before creating any output + files. + +'configure' also accepts some other, not widely useful, options. Run +'configure --help' for more details. diff --git a/Makefile.am b/Makefile.am new file mode 100644 index 0000000..168da9d --- /dev/null +++ b/Makefile.am @@ -0,0 +1 @@ +SUBDIRS=graph @@ -0,0 +1 @@ +TODO: Announce news here.
\ No newline at end of file @@ -0,0 +1,3 @@ +This is a parser generator to be used as an Emacs dynamic package. + +TODO: Provide more details.
\ No newline at end of file diff --git a/benches/bench_receme.rs b/benches/bench_receme.rs new file mode 100644 index 0000000..c9583a9 --- /dev/null +++ b/benches/bench_receme.rs @@ -0,0 +1,293 @@ +use criterion::{black_box, criterion_group, criterion_main, Criterion}; + +use receme::{ + catana::{Ana, Cata}, + functor::Functor, + hylo::Hylo, + tree::{DFTree, TEStrategy, Tree, TreeIndex}, +}; + +#[derive(Debug, Clone)] +enum Expr<T> { + Add(T, T), + Lit(isize), +} + +impl<T> Functor<T> for Expr<T> { + type Target<S> = Expr<S>; + + fn fmap<S>(self, mut f: impl FnMut(T) -> S) -> Self::Target<S> { + match self { + Expr::Add(a, b) => Expr::Add(f(a), f(b)), + Expr::Lit(value) => Expr::Lit(value), + } + } +} + +#[derive(Debug, Clone)] +enum ExBox { + Add(Box<ExBox>, Box<ExBox>), + Lit(isize), +} + +impl ExBox { + fn lit(value: isize) -> Self { + Self::Lit(value) + } + + fn add(a: ExBox, b: ExBox) -> Self { + Self::Add(Box::new(a), Box::new(b)) + } +} + +pub fn bench(c: &mut Criterion) { + fn construct_tree() -> Tree<Expr<TreeIndex>> { + let strategy: TEStrategy = TEStrategy::UnsafeArena; + + let elements: Vec<Expr<TreeIndex>> = vec![ + Expr::Add(1, 2).fmap(TreeIndex::new), + Expr::Lit(1), + Expr::Add(3, 4).fmap(TreeIndex::new), + Expr::Lit(3), + Expr::Add(5, 6).fmap(TreeIndex::new), + Expr::Lit(10), + Expr::Lit(-14), + ]; + + Tree::new(elements, strategy) + } + + fn construct_box_tree() -> Box<ExBox> { + Box::new(ExBox::add( + ExBox::lit(1), + ExBox::add(ExBox::lit(3), ExBox::add(ExBox::lit(10), ExBox::lit(-14))), + )) + } + + let tree = construct_tree(); + + let safe_tree = { + let mut tree = tree.clone(); + tree.set_strategy(Default::default()); + tree + }; + + let depth_first_tree = { + let mut tree = tree.clone(); + tree.set_strategy(TEStrategy::DepthFirst); + tree + }; + + let boxtree = construct_box_tree(); + + c.bench_function("bench cata", |b| { + b.iter(|| { + let result = tree.clone().cata(|expr: Expr<isize>| match expr { + Expr::Add(a, b) => a + b, + Expr::Lit(v) => v, + }); + + black_box(result); + }) + }); + + c.bench_function("bench cata safe", |b| { + b.iter(|| { + let result = safe_tree.clone().cata(|expr: Expr<isize>| match expr { + Expr::Add(a, b) => a + b, + Expr::Lit(v) => v, + }); + + black_box(result); + }) + }); + + c.bench_function("bench cata depth first", |b| { + b.iter(|| { + let result = depth_first_tree + .clone() + .cata(|expr: Expr<isize>| match expr { + Expr::Add(a, b) => a + b, + Expr::Lit(v) => v, + }); + + black_box(result); + }) + }); + + c.bench_function("bench cata loop", |b| { + b.iter(|| { + let tree = tree.clone(); + + let mut stack: Vec<Result<usize, usize>> = vec![Ok(0)]; + let mut result_stack: Vec<isize> = vec![]; + + while let Some(top) = stack.pop() { + let top_is_ok_p = top.is_ok(); + + let top = match top { + Ok(top) => top, + Err(top) => top, + }; + + let top_node = tree.nth(top).unwrap(); + + match top_node { + Expr::Add(a, b) => { + if top_is_ok_p { + stack.push(Err(top)); + stack.push(Ok(**b)); + stack.push(Ok(**a)); + } else { + let a = result_stack.pop().unwrap(); + let b = result_stack.pop().unwrap(); + + result_stack.push(a + b); + } + } + Expr::Lit(v) => result_stack.push(*v), + } + } + + let result = result_stack.pop().unwrap(); + + black_box(result); + }) + }); + + c.bench_function("bench ana box loop", |b| { + b.iter(|| { + let boxtree = boxtree.clone(); + + let mut elements = vec![]; + + let mut stack = vec![boxtree]; + + while let Some(bt) = stack.pop() { + match *bt { + ExBox::Add(a, b) => { + let len = elements.len(); + + elements.push(Expr::Add(TreeIndex::new(len + 1), TreeIndex::new(len + 2))); + + stack.push(b); + stack.push(a); + } + ExBox::Lit(v) => elements.push(Expr::Lit(v)), + } + } + + let tree: Tree<Expr<TreeIndex>> = Tree::new(elements, Default::default()); + + black_box(tree); + }) + }); + + c.bench_function("bench ana boxed", |b| { + b.iter(|| { + let boxtree = boxtree.clone(); + + let tree = Tree::ana(boxtree, |bt: Box<ExBox>| match *bt { + ExBox::Add(a, b) => Expr::Add(a, b), + ExBox::Lit(v) => Expr::Lit(v), + }); + + black_box(tree); + }) + }); + + c.bench_function("bench ana depth first boxed", |b| { + b.iter(|| { + let boxtree = boxtree.clone(); + + let tree = DFTree::ana(boxtree, |bt: Box<ExBox>| match *bt { + ExBox::Add(a, b) => Expr::Add(a, b), + ExBox::Lit(v) => Expr::Lit(v), + }); + + black_box(tree); + }) + }); + + c.bench_function("bench hylo", |b| { + b.iter(|| { + let boxtree = boxtree.clone(); + + black_box(Tree::hylo( + boxtree, + |expr| match expr { + Expr::Add(a, b) => a + b, + Expr::Lit(v) => v, + }, + |bt: Box<ExBox>| match *bt { + ExBox::Add(a, b) => Expr::Add(a, b), + ExBox::Lit(v) => Expr::Lit(v), + }, + )) + }) + }); + + c.bench_function("bench hylo loop", |b| { + b.iter(|| { + let boxtree = boxtree.clone(); + + let mut elements = vec![]; + + let mut stack = vec![boxtree]; + + while let Some(bt) = stack.pop() { + match *bt { + ExBox::Add(a, b) => { + let len = elements.len(); + + elements.push(Expr::Add(TreeIndex::new(len + 1), TreeIndex::new(len + 2))); + + stack.push(b); + stack.push(a); + } + ExBox::Lit(v) => elements.push(Expr::Lit(v)), + } + } + + let tree: Tree<Expr<TreeIndex>> = Tree::new(elements, Default::default()); + + let mut stack: Vec<Result<usize, usize>> = vec![Ok(0)]; + let mut result_stack: Vec<isize> = vec![]; + + while let Some(top) = stack.pop() { + let top_is_ok_p = top.is_ok(); + let top = match top { + Ok(top) => top, + Err(top) => top, + }; + + let top_node = tree.nth(top).unwrap(); + + match top_node { + Expr::Add(a, b) => { + if top_is_ok_p { + stack.push(Err(top)); + stack.push(Ok(**b)); + stack.push(Ok(**a)); + } else { + let a = result_stack.pop().unwrap(); + let b = result_stack.pop().unwrap(); + + result_stack.push(a + b); + } + } + Expr::Lit(v) => result_stack.push(*v), + } + } + + let result = result_stack.pop().unwrap(); + + assert_eq!(result, 0); + + black_box(result); + }) + }); +} + +criterion_group!(benches, bench); +criterion_main!(benches); diff --git a/configure.ac b/configure.ac new file mode 100644 index 0000000..fbf987e --- /dev/null +++ b/configure.ac @@ -0,0 +1,28 @@ +AC_PREREQ([2.60]) + +AC_INIT([REP], m4_esyscmd([./find-version.sh]), [durand@jsdurand.xyz]) + +dnl patsubst(m4_esyscmd([./find-version.sh]), ` "') + +AC_COPYRIGHT([This package is covered by GPL v3.]) + +AC_CONFIG_AUX_DIR([build-aux]) + +AM_INIT_AUTOMAKE([subdir-objects]) + +AM_SILENT_RULES([yes]) + +dnl AC_CONFIG_SUBDIRS([graph], [receme], [nfa], [repcore]) +dnl AC_CONFIG_SUBDIRS([graph]) + +AC_PATH_PROG([CARGO], [cargo], [notfound]) +AS_IF([test "$CARGO" = "notfound"], [AC_MSG_ERROR([cargo is required])]) + +AC_PATH_PROG([RUSTC], [rustc], [notfound]) +AS_IF([test "$RUSTC" = "notfound"], [AC_MSG_ERROR([rustc is required])]) + +AC_CONFIG_FILES([Makefile graph/Makefile]) + +AC_OUTPUT + + diff --git a/find-version.sh b/find-version.sh new file mode 100755 index 0000000..4ac390e --- /dev/null +++ b/find-version.sh @@ -0,0 +1,20 @@ +#!/bin/sh +":"; exec emacs --quick --script "$0" -- "$@" # -*- mode: emacs-lisp; lexical-binding: t; -*- + +(with-temp-buffer + (insert-file-contents "Cargo.toml") + (goto-char (point-min)) + (cond + ((search-forward "version =" nil t) + (re-search-forward " *" (line-end-position) t) + (cond + ((= (char-after) 34)) + ((error "Invalid syntax at %d" (point)))) + (let ((end (line-end-position))) + (cond + ((= (char-before end) 34)) + ((error "Invalid syntax at %d" (1- end)))) + (princ + (buffer-substring-no-properties + (1+ (point)) (1- end))))) + ((print "Unknown")))) diff --git a/graph/Cargo.toml b/graph/Cargo.toml new file mode 100644 index 0000000..d8f0622 --- /dev/null +++ b/graph/Cargo.toml @@ -0,0 +1,12 @@ +[package] +name = "graph" +version = "0.1.0" +edition = "2021" + +# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html + +[dependencies] + +[features] + + diff --git a/graph/Makefile.am b/graph/Makefile.am new file mode 100644 index 0000000..776b911 --- /dev/null +++ b/graph/Makefile.am @@ -0,0 +1,12 @@ +.PHONY: dev rel + +all: dev + +dev: + @CARGO@ build + +rel: + @CARGO@ build --release + +clean: + @CARGO@ clean diff --git a/graph/src/adlist.rs b/graph/src/adlist.rs new file mode 100644 index 0000000..c16ceb2 --- /dev/null +++ b/graph/src/adlist.rs @@ -0,0 +1,181 @@ +#![warn(missing_docs)] +//! This file implements a data type that implements the trait +//! [`Graph`][super::Graph]. This data type represents graphs using +//! adjacency lists internally. + +use super::{ExtGraph, Graph}; +use crate::error::Error; + +// #[derive(Debug, Copy, Clone, Eq, PartialEq, Ord, PartialOrd)] +// struct ALEdge { +// to: usize, +// } + +// impl ALEdge { +// fn new(to: usize) -> Self { +// Self { to } +// } +// } + +#[derive(Debug, Clone, Default)] +struct ALNode { + children: Vec<usize>, +} + +impl ALNode { + fn new(children: Vec<usize>) -> Self { + Self { children } + } +} + +/// The graph implemented using adjacency lists. +#[derive(Debug, Clone, Default)] +pub struct ALGraph { + nodes: Vec<ALNode>, +} + +impl Graph for ALGraph { + type Iter<'a> = std::iter::Copied<std::slice::Iter<'a, usize>>; + + #[inline] + fn is_empty(&self) -> bool { + self.nodes.is_empty() + } + + #[inline] + fn nodes_len(&self) -> usize { + self.nodes.len() + } + + #[inline] + fn children_of(&self, node_id: usize) -> Result<Self::Iter<'_>, Error> { + match self.nodes.get(node_id) { + Some(node) => Ok(node.children.iter().copied()), + None => Err(Error::IndexOutOfBounds(node_id, self.nodes_len())), + } + } + + #[inline] + fn degree(&self, node_id: usize) -> Result<usize, Error> { + match self.nodes.get(node_id) { + Some(node) => Ok(node.children.len()), + None => Err(Error::IndexOutOfBounds(node_id, self.nodes_len())), + } + } + + #[inline] + fn is_empty_node(&self, node_id: usize) -> Result<bool, Error> { + match self.nodes.get(node_id) { + Some(node) => Ok(node.children.is_empty()), + None => Err(Error::IndexOutOfBounds(node_id, self.nodes_len())), + } + } + + fn has_edge(&self, source: usize, target: usize) -> Result<bool, Error> { + if !self.has_node(source) { + Err(Error::IndexOutOfBounds(source, self.nodes_len())) + } else if !self.has_node(target) { + Err(Error::IndexOutOfBounds(target, self.nodes_len())) + } else { + Ok(self.nodes.get(source).unwrap().children.contains(&target)) + } + } +} + +impl ExtGraph for ALGraph { + fn extend(&mut self, edges: impl IntoIterator<Item = usize>) -> Result<usize, Error> { + let mut new_node_children = Vec::new(); + + for edge_to in edges.into_iter() { + if !self.has_node(edge_to) { + return Err(Error::IndexOutOfBounds(edge_to, self.nodes_len())); + } + + new_node_children.push(edge_to); + } + + let new_node = ALNode::new(new_node_children); + + self.nodes.push(new_node); + + Ok(self.nodes.len() - 1) + } +} + +#[cfg(test)] +mod algraph_test { + use super::*; + + #[test] + fn test_graph_apis() -> Result<(), Error> { + let mut graph = ALGraph::default(); + + assert!(graph.is_empty()); + + graph.extend(std::iter::empty())?; + + graph.extend([0].iter().copied())?; + graph.extend([0, 1].iter().copied())?; + graph.extend([0, 2].iter().copied())?; + graph.extend([1, 2].iter().copied())?; + graph.extend([1, 2, 3].iter().copied())?; + + let graph = graph; + + assert_eq!(graph.nodes_len(), 6); + + assert_eq!(graph.children_of(5)?.collect::<Vec<_>>(), vec![1, 2, 3]); + + assert_eq!(graph.degree(4)?, 2); + + assert!(graph.is_empty_node(0)?); + assert!(!graph.is_empty_node(1)?); + + assert!(graph.has_edge(3, 2)?); + assert!(!graph.has_edge(3, 1)?); + assert_eq!(graph.has_edge(3, 6), Err(Error::IndexOutOfBounds(6, 6))); + + Ok(()) + } + + #[test] + fn test_extending_algraph_normal() -> Result<(), Error> { + let mut graph = ALGraph::default(); + + let new = graph.extend(std::iter::empty())?; + + println!("new index = {new}"); + + println!("new graph = {graph:?}"); + + let new = graph.extend([0].iter().copied())?; + + println!("new index = {new}"); + + println!("new graph = {graph:?}"); + + let new = graph.extend([0, 1].iter().copied())?; + + println!("new index = {new}"); + + println!("new graph = {graph:?}"); + + Ok(()) + } + + #[test] + fn test_extending_algraph_error() -> Result<(), Error> { + let mut graph = ALGraph::default(); + + graph.extend(std::iter::empty())?; + + graph.extend([0].iter().copied())?; + + assert_eq!( + graph.extend([2].iter().copied()), + Err(Error::IndexOutOfBounds(2, 2)) + ); + + Ok(()) + } +} diff --git a/graph/src/adset.rs b/graph/src/adset.rs new file mode 100644 index 0000000..58fed4c --- /dev/null +++ b/graph/src/adset.rs @@ -0,0 +1,229 @@ +#![warn(missing_docs)] +//! This file implements a data type that implements the trait +//! [`Graph`][super::Graph]. This data type represents graphs using +//! adjacency sets internally. +//! +//! I need this because the derivatives languages should not allow +//! duplications of languages, so it is more convenient if the +//! underlying graph type **cannot** represent duplicate edges. + +use super::{ExtGraph, Graph}; +use crate::error::Error; + +// If one wants to use another implementation for a set, import that +// as Set, and nothing else needs to be changed, ideally. +use std::collections::{hash_set::Iter, HashSet as Set}; + +#[derive(Debug, Copy, Clone, Eq, PartialEq, Ord, PartialOrd, Hash)] +struct ASEdge { + to: usize, +} + +impl ASEdge { + fn new(to: usize) -> Self { + Self { to } + } +} + +#[derive(Debug, Clone, Default)] +struct ASNode { + children: Set<ASEdge>, +} + +impl ASNode { + fn new(children: Set<ASEdge>) -> Self { + Self { children } + } +} + +/// The graph implemented using adjacency sets. +#[derive(Debug, Clone, Default)] +pub struct ASGraph { + nodes: Vec<ASNode>, +} + +/// A delegation of iterators. +/// +/// This is here to avoid using a boxed pointer, in order to save some +/// allocations. +pub struct ASIter<'a> { + iter: Iter<'a, ASEdge>, +} + +impl<'a> Iterator for ASIter<'a> { + type Item = usize; + + fn next(&mut self) -> Option<Self::Item> { + self.iter.next().map(|edge| edge.to) + } + + fn size_hint(&self) -> (usize, Option<usize>) { + self.iter.size_hint() + } +} + +impl<'a> ExactSizeIterator for ASIter<'a> { + fn len(&self) -> usize { + self.iter.len() + } +} + +impl Graph for ASGraph { + type Iter<'a> = ASIter<'a>; + + #[inline] + fn is_empty(&self) -> bool { + self.nodes.is_empty() + } + + #[inline] + fn nodes_len(&self) -> usize { + self.nodes.len() + } + + fn children_of(&self, node_id: usize) -> Result<Self::Iter<'_>, Error> { + match self.nodes.get(node_id) { + Some(node) => { + let iter = node.children.iter(); + Ok(Self::Iter { iter }) + } + None => Err(Error::IndexOutOfBounds(node_id, self.nodes_len())), + } + } + + #[inline] + fn degree(&self, node_id: usize) -> Result<usize, Error> { + match self.nodes.get(node_id) { + Some(node) => Ok(node.children.len()), + None => Err(Error::IndexOutOfBounds(node_id, self.nodes_len())), + } + } + + #[inline] + fn is_empty_node(&self, node_id: usize) -> Result<bool, Error> { + match self.nodes.get(node_id) { + Some(node) => Ok(node.children.is_empty()), + None => Err(Error::IndexOutOfBounds(node_id, self.nodes_len())), + } + } + + #[inline] + fn has_edge(&self, source: usize, target: usize) -> Result<bool, Error> { + if !self.has_node(source) { + Err(Error::IndexOutOfBounds(source, self.nodes_len())) + } else if !self.has_node(target) { + Err(Error::IndexOutOfBounds(target, self.nodes_len())) + } else { + Ok(self + .nodes + .get(source) + .unwrap() + .children + .contains(&ASEdge::new(target))) + } + } +} + +impl ExtGraph for ASGraph { + fn extend(&mut self, edges: impl IntoIterator<Item = usize>) -> Result<usize, Error> { + let mut new_node_children = Set::default(); + + for edge_to in edges.into_iter() { + if !self.has_node(edge_to) { + return Err(Error::IndexOutOfBounds(edge_to, self.nodes_len())); + } + + new_node_children.insert(ASEdge::new(edge_to)); + } + + let new_node = ASNode::new(new_node_children); + + self.nodes.push(new_node); + + Ok(self.nodes.len() - 1) + } +} + +#[cfg(test)] +mod asgraph_test { + use super::*; + + #[test] + fn test_graph_apis() -> Result<(), Error> { + let mut graph = ASGraph::default(); + + assert!(graph.is_empty()); + + graph.extend(std::iter::empty())?; + + graph.extend([0].iter().copied())?; + graph.extend([0, 1].iter().copied())?; + graph.extend([0, 2].iter().copied())?; + graph.extend([1, 2].iter().copied())?; + graph.extend([1, 2, 3].iter().copied())?; + + let graph = graph; + + assert_eq!(graph.nodes_len(), 6); + + assert_eq!(graph.children_of(5)?.collect::<Set<_>>(), { + let mut set = Set::default(); + set.insert(1); + set.insert(3); + set.insert(2); + set + }); + + assert_eq!(graph.degree(4)?, 2); + + assert!(graph.is_empty_node(0)?); + assert!(!graph.is_empty_node(1)?); + + assert!(graph.has_edge(3, 2)?); + assert!(!graph.has_edge(3, 1)?); + assert_eq!(graph.has_edge(3, 6), Err(Error::IndexOutOfBounds(6, 6))); + + Ok(()) + } + + #[test] + fn test_extending_algraph_normal() -> Result<(), Error> { + let mut graph = ASGraph::default(); + + let new = graph.extend(std::iter::empty())?; + + println!("new index = {new}"); + + println!("new graph = {graph:?}"); + + let new = graph.extend([0].iter().copied())?; + + println!("new index = {new}"); + + println!("new graph = {graph:?}"); + + let new = graph.extend([0, 1].iter().copied())?; + + println!("new index = {new}"); + + println!("new graph = {graph:?}"); + + Ok(()) + } + + #[test] + fn test_extending_algraph_error() -> Result<(), Error> { + let mut graph = ASGraph::default(); + + graph.extend(std::iter::empty())?; + + graph.extend([0].iter().copied())?; + + assert_eq!( + graph.extend([2].iter().copied()), + Err(Error::IndexOutOfBounds(2, 2)) + ); + + Ok(()) + } +} diff --git a/graph/src/error.rs b/graph/src/error.rs new file mode 100644 index 0000000..2162685 --- /dev/null +++ b/graph/src/error.rs @@ -0,0 +1,26 @@ +#![warn(missing_docs)] +//! This file implements the error data type of the graph library. + +use std::fmt::{self, Display}; + +/// The error type for methods of the trait [`Graph`][`super::Graph`]. +#[derive(Debug, Clone, PartialEq, Eq, Ord, PartialOrd)] +pub enum Error { + /// The index is out of bounds. + /// + /// The first component is the index that is out of bounds, and + /// the second component is the current length of nodes. + IndexOutOfBounds(usize, usize), +} + +impl Display for Error { + fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { + match self { + Error::IndexOutOfBounds(index, len) => { + write!(f, "index {index} out of bounds {len} ") + } + } + } +} + +impl std::error::Error for Error {} diff --git a/graph/src/labelled.rs b/graph/src/labelled.rs new file mode 100644 index 0000000..1cb2461 --- /dev/null +++ b/graph/src/labelled.rs @@ -0,0 +1,426 @@ +#![warn(missing_docs)] +//! This file implements a labelled graph. See the +//! [trait][super::LabelGraph] for details. +//! +//! Since the method +//! [`find_children_with_label`][super::LabelGraph::find_children_with_label] +//! needs to be implemented efficiently, we store the mappings between +//! labels and edges in both directions. + +#[allow(unused_imports)] +use super::{Graph, GraphLabel, LabelExtGraph, LabelGraph}; +#[allow(unused_imports)] +use crate::error::Error; + +// We use BTreeMap and BTreeSet here as we need to exclude duplicate +// edge sets, while an ordinary hashmap and hashset do not allow +// hashing. +use std::collections::{ + btree_map::{Iter as MapIter, Keys}, + btree_set::Iter, + BTreeMap as Map, BTreeSet as Set, HashMap as HMap, +}; + +#[derive(Debug, Clone, Default)] +struct DLNode<T: GraphLabel> { + by_target: Map<usize, Set<T>>, + by_label: Map<T, Set<usize>>, + flat: Vec<(T, usize)>, +} + +impl<T: GraphLabel> DLNode<T> { + fn new( + by_target: Map<usize, Set<T>>, + by_label: Map<T, Set<usize>>, + flat: Vec<(T, usize)>, + ) -> Self { + Self { + by_target, + by_label, + flat, + } + } +} + +/// Mapping a set of edges to an index of node. +type EdgeMap<T> = HMap<Set<(T, usize)>, usize>; + +/// Double direction Labelled Graph. +/// +/// Each node is supposed to have a unique edge set. Constructing +/// methods such as from the trait +/// [`LabelExtGraph`][super::LabelExtGraph] already handles the +/// elimination of duplication. +#[derive(Debug, Clone)] +pub struct DLGraph<T: GraphLabel> { + nodes: Vec<DLNode<T>>, + edges_table: EdgeMap<T>, +} + +impl<T: GraphLabel> DLGraph<T> { + #[inline] + /// Return an empty graph. + pub fn new() -> Self { + Self { + nodes: Vec::new(), + edges_table: HMap::default(), + } + } +} + +impl<T: GraphLabel> Default for DLGraph<T> { + #[inline] + fn default() -> Self { + Self::new() + } +} + +impl<T: GraphLabel> Graph for DLGraph<T> { + // Not using a boxed pointer is supposed to save some allocations. + type Iter<'a> = std::iter::Copied<Keys<'a, usize, Set<T>>> where T: 'a; + + #[inline] + fn is_empty(&self) -> bool { + self.nodes.is_empty() + } + + #[inline] + fn nodes_len(&self) -> usize { + self.nodes.len() + } + + #[inline] + fn children_of(&self, node_id: usize) -> Result<Self::Iter<'_>, Error> { + match self.nodes.get(node_id) { + Some(node) => Ok(node.by_target.keys().copied()), + None => Err(Error::IndexOutOfBounds(node_id, self.nodes.len())), + } + } + + #[inline] + /// Return the number of "children" of a node, or an error if the + /// node is not a member of the graph. + /// + /// This counts edges with different labels as different edges. + fn degree(&self, node_id: usize) -> Result<usize, Error> { + self.nodes + .get(node_id) + .ok_or(Error::IndexOutOfBounds(node_id, self.nodes.len())) + .map(|node| node.flat.len()) + } + + #[inline] + fn is_empty_node(&self, node_id: usize) -> Result<bool, Error> { + self.nodes + .get(node_id) + .ok_or(Error::IndexOutOfBounds(node_id, self.nodes.len())) + .map(|node| node.flat.is_empty()) + } + + fn has_edge(&self, source: usize, target: usize) -> Result<bool, Error> { + match self.nodes.get(source) { + Some(source_node) => { + if self.nodes.get(target).is_none() { + return Err(Error::IndexOutOfBounds(target, self.nodes.len())); + } + + Ok(source_node.by_target.contains_key(&target)) + } + None => Err(Error::IndexOutOfBounds(source, self.nodes.len())), + } + } +} + +/// A delegation of iterators. +/// +/// This is used to avoid a boxed pointer to an iterator. +#[derive(Default, Debug)] +pub struct LabelIndexIter<'a> { + iter: Option<std::iter::Copied<Iter<'a, usize>>>, +} + +impl<'a> Iterator for LabelIndexIter<'a> { + type Item = usize; + + #[inline] + fn next(&mut self) -> Option<Self::Item> { + self.iter.as_mut().map(|iterator| iterator.next()).flatten() + } + + #[inline] + fn size_hint(&self) -> (usize, Option<usize>) { + match &self.iter { + Some(iter) => iter.size_hint(), + None => (0, Some(0)), + } + } +} + +impl<'a> ExactSizeIterator for LabelIndexIter<'a> { + #[inline] + fn len(&self) -> usize { + match &self.iter { + Some(iter) => iter.len(), + None => 0, + } + } +} + +impl<'a> LabelIndexIter<'a> { + fn new(iter: std::iter::Copied<Iter<'a, usize>>) -> Self { + let iter = Some(iter); + Self { iter } + } +} + +// A convenience method +impl<'a> From<&'a Set<usize>> for LabelIndexIter<'a> { + fn from(set: &'a Set<usize>) -> Self { + Self::new(set.iter().copied()) + } +} + +#[derive(Debug)] +/// A delegation of iterators. +/// +/// This is used to avoid a boxed pointer to an iterator. +pub struct LabelIter<'a, T> { + iter: MapIter<'a, T, Set<usize>>, +} + +impl<'a, T> ExactSizeIterator for LabelIter<'a, T> { + #[inline] + fn len(&self) -> usize { + self.iter.len() + } +} + +impl<'a, T> LabelIter<'a, T> { + fn new(iter: MapIter<'a, T, Set<usize>>) -> Self { + Self { iter } + } +} + +impl<'a, T> Iterator for LabelIter<'a, T> { + type Item = (&'a T, LabelIndexIter<'a>); + + #[inline] + fn next(&mut self) -> Option<Self::Item> { + self.iter.next().map(|(label, set)| (label, set.into())) + } + + #[inline] + fn size_hint(&self) -> (usize, Option<usize>) { + self.iter.size_hint() + } +} + +impl<T: GraphLabel> LabelGraph<T> for DLGraph<T> { + type Iter<'a> = LabelIndexIter<'a> where T: 'a; + + type LabelIter<'a> = LabelIter<'a,T> where T: 'a; + + fn edge_label(&self, source: usize, target: usize) -> Result<Vec<T>, Error> { + if self.has_edge(source, target)? { + Ok(self + .nodes + .get(source) + .unwrap() + .by_target + .get(&target) + .unwrap() + .iter() + .copied() + .collect()) + } else { + Ok(Vec::new()) + } + } + + fn find_children_with_label( + &self, + node_id: usize, + label: &T, + ) -> Result<<Self as LabelGraph<T>>::Iter<'_>, Error> { + match self + .nodes + .get(node_id) + .ok_or(Error::IndexOutOfBounds(node_id, self.nodes.len()))? + .by_label + .get(label) + { + Some(set) => Ok(set.into()), + None => Ok(Default::default()), + } + } + + #[inline] + fn labels_of(&self, node_id: usize) -> Result<Self::LabelIter<'_>, Error> { + match self.nodes.get(node_id) { + Some(node) => Ok(Self::LabelIter::new(node.by_label.iter())), + None => Err(Error::IndexOutOfBounds(node_id, self.nodes.len())), + } + } +} + +impl<T: GraphLabel> LabelExtGraph<T> for DLGraph<T> { + fn extend(&mut self, edges: impl IntoIterator<Item = (T, usize)>) -> Result<usize, Error> { + let mut by_target: Map<usize, Set<T>> = Map::default(); + let mut by_label: Map<T, Set<usize>> = Map::default(); + let mut flat = Vec::new(); + let mut edges_set = Set::new(); + + for (label, to) in edges { + if !self.has_node(to) { + return Err(Error::IndexOutOfBounds(to, self.nodes.len())); + } + + edges_set.insert((label, to)); + + if let Some(set) = by_target.get(&to) { + if !set.contains(&label) { + flat.push((label, to)); + by_target.get_mut(&to).unwrap().insert(label); + by_label + .entry(label) + .or_insert_with(Default::default) + .insert(to); + } + } else { + flat.push((label, to)); + by_target + .entry(to) + .or_insert_with(Default::default) + .insert(label); + by_label + .entry(label) + .or_insert_with(Default::default) + .insert(to); + } + } + + match self.edges_table.get(&edges_set) { + Some(old_index) => Ok(*old_index), + None => { + let new_node = DLNode::new(by_target, by_label, flat); + let new_index = self.nodes_len(); + + self.edges_table.insert(edges_set, new_index); + + self.nodes.push(new_node); + + Ok(new_index) + } + } + } +} + +#[cfg(test)] +mod label_test { + use super::*; + + macro_rules! set { + () => { Set::<usize>::default() }; + ($($num:literal),*) => { + { + let mut set: Set<usize> = Set::default(); + $(set.insert($num);)* + set + } + }; + } + + macro_rules! map { + () => { Map::<usize, Set<usize>>::default() }; + ($(($key:literal, $value:expr)),*) => { + { + let mut map: Map<usize, Set<usize>> = Map::default(); + $(map.insert($key, $value);)* + map + } + }; + } + + #[test] + fn test_graph_apis() -> Result<(), Error> { + let mut graph: DLGraph<usize> = Default::default(); + + // testing empty graph + assert!(graph.is_empty()); + + // testing adding an empty node + assert_eq!(graph.extend(std::iter::empty())?, 0); + + // testing nodes_len + assert_eq!(graph.nodes_len(), 1); + + // testing extension + + assert_eq!(graph.extend([(0, 0)].iter().copied())?, 1); + assert_eq!(graph.extend([(1, 0), (1, 1)].iter().copied())?, 2); + assert_eq!(graph.extend([(3, 0), (3, 2)].iter().copied())?, 3); + assert_eq!(graph.extend([(1, 1), (1, 2)].iter().copied())?, 4); + assert_eq!(graph.extend([(2, 1), (3, 2), (2, 3)].iter().copied())?, 5); + + // testing adding a duplicated edge set + assert_eq!(graph.extend([(2, 1), (2, 3), (3, 2)].iter().copied())?, 5); + assert_eq!(graph.extend([(3, 2), (3, 0)].iter().copied())?, 3); + + let graph = graph; + + // ensuring the correct length + assert_eq!(graph.nodes_len(), 6); + + // testing children_of + assert_eq!(graph.children_of(5)?.collect::<Set<_>>(), set!(1, 3, 2)); + + // testing find_children_with_label + assert_eq!( + graph.find_children_with_label(5, &2)?.collect::<Set<_>>(), + set!(1, 3) + ); + + // testing edge_label + assert_eq!( + graph.edge_label(5, 2)?.into_iter().collect::<Set<_>>(), + set!(3) + ); + assert!(matches!( + graph.edge_label(6, 2), + Err(Error::IndexOutOfBounds(6, 6)) + )); + + // testing degree + assert_eq!(graph.degree(4)?, 2); + + // testing is_empty_node + assert!(graph.is_empty_node(0)?); + assert!(!graph.is_empty_node(1)?); + + // testing has_edge + assert!(graph.has_edge(3, 2)?); + assert!(!graph.has_edge(3, 1)?); + assert!(matches!( + graph.has_edge(3, 6), + Err(Error::IndexOutOfBounds(6, 6)) + )); + + // testing labels_of + let mut label_map: Map<usize, Set<usize>> = Map::default(); + + for (label, children) in graph.labels_of(5)? { + label_map.insert(*label, children.collect()); + } + + let compare_map = map!((2, set!(1, 3)), (3, set!(2))); + + assert_eq!(label_map, compare_map); + + assert!(matches!( + graph.labels_of(6), + Err(Error::IndexOutOfBounds(6, 6)) + )); + + Ok(()) + } +} diff --git a/graph/src/lib.rs b/graph/src/lib.rs new file mode 100644 index 0000000..7b74ee1 --- /dev/null +++ b/graph/src/lib.rs @@ -0,0 +1,203 @@ +#![warn(missing_docs)] +//! This crate implements a trait API for graphs that the crate "rep" +//! needs. +//! +//! Also a default implementation for the trait is provided, so that +//! by default no external crates are needed, whereas it is easy to +//! use external crates, if so derired. + +use std::hash::Hash; + +pub mod error; + +pub mod adset; + +pub use adset::ASGraph; + +pub mod adlist; + +pub use adlist::ALGraph; + +pub mod labelled; + +pub use labelled::DLGraph; + +use error::Error; + +/// The expected behaviour of an immutable graph. +pub trait Graph: Default { + /// A type that iterates through the node indices. + type Iter<'a>: Iterator<Item = usize> + 'a + where + Self: 'a; + + /// Return true if and only if the graph has no nodes. + fn is_empty(&self) -> bool; + + /// Return the length of nodes in the graph. + fn nodes_len(&self) -> usize; + + #[inline] + /// Return the length of edges in the graph. + /// + /// This is optional. Implementations need not support this. + fn edges_len(&self) -> Option<usize> { + None + } + + #[inline] + /// Return an iterator of nodes represented as unsigned integers. + /// + /// If a custom application needs to have labels on nodes, just + /// associate the label to the node internally, and define an + /// extension trait that allows querying those additional labels. + /// + /// This design choice is based on the idea that this library + /// should be minimal and only provide the core of a graph. As a + /// label is not really core, it is not included here. + fn nodes(&self) -> Box<dyn Iterator<Item = usize> + '_> { + Box::new(0..self.nodes_len()) + } + + /// Return an iterator of edges going out of a node. + /// + /// Return an error if the node is not known to the graph. + fn children_of(&self, node_id: usize) -> Result<Self::Iter<'_>, Error>; + + #[inline] + /// Return an iterator of edges represented as pairs (FROM, TO). + /// + /// The default implementation iterates through the nodes and then + /// iterates through their children. If the implementation has a + /// more efficient method, overwrite this method. + fn edges(&self) -> Box<dyn Iterator<Item = (usize, usize)> + '_> { + Box::new(self.nodes().flat_map(|node| { + self.children_of(node) + // If this node is invalid, this means the + // implementation of `nodes` is wrong, so it is + // appropriate to panic here. + .unwrap() + .map(move |child| (node, child)) + })) + } + + #[inline] + /// Return true if and only if the node is in the graph. + fn has_node(&self, node_id: usize) -> bool { + (0..self.nodes_len()).contains(&node_id) + } + + /// Return the number of "children" of a node, or an error if the + /// node is not a member of the graph. + fn degree(&self, node_id: usize) -> Result<usize, Error>; + + /// Return a boolean indicating if the node has no children, or an + /// error if the node is not a member of the graph. + fn is_empty_node(&self, node_id: usize) -> Result<bool, Error>; + + /// Return true if and only if there is an edge from the source to + /// the target. + /// + /// Return an error if either the source or the target is an + /// invalid node. + fn has_edge(&self, source: usize, target: usize) -> Result<bool, Error>; +} + +/// A graph that can be extended, but not mutated, in the sense that +/// existing nodes and edges will not be modified nor removed, but new +/// nodes can be added. The index of the new node will be returned. +/// +/// Implementations can choose to keep a set of sets of edges, so that +/// new nodes will not have duplicate edge sets. In this case, the +/// returned new node index is not necessarily equal to +/// self.nodes_len() - 1, and hence the return type is designed in +/// this way. +pub trait ExtGraph: Graph { + /// Add a new node with `edges`. + /// + /// If an edge from `edges` points to a non-existent node, return + /// an error. + fn extend(&mut self, edges: impl IntoIterator<Item = usize>) -> Result<usize, Error>; +} + +/// The type of labels should be comparable and hashable. +pub trait GraphLabel: Hash + Eq + PartialEq + Clone + Copy + Ord + PartialOrd {} + +impl<T: Hash + Eq + PartialEq + Clone + Copy + Ord + PartialOrd> GraphLabel for T {} + +/// A labelled graph is just a graph with labels associated to +/// vertices and / or edges. +/// +/// This trait defines what the package needs out of a labelled graph. +/// +/// Any implementation should be able to handle a set of types for +/// labels, so this trait is generic over the label type. +pub trait LabelGraph<T: GraphLabel>: Graph { + /// A type that iterates through the node indices. + type Iter<'a>: Iterator<Item = usize> + 'a + where + Self: 'a; + + /// A type that iterates through labels. + type LabelIter<'a>: Iterator<Item = (&'a T, <Self as LabelGraph<T>>::Iter<'a>)> + 'a + where + Self: 'a, + T: 'a; + + #[inline] + /// Return the label of a vertex or an error if the node is + /// invalid. + /// + /// The default implementation always returns None for a valid + /// node. + fn vertex_label(&self, node_id: usize) -> Result<Option<T>, Error> { + if self.has_node(node_id) { + Ok(None) + } else { + Err(Error::IndexOutOfBounds(node_id, self.nodes_len())) + } + } + + #[inline] + /// Return the label of an edge or an error if some node is + /// invalid. + /// + /// The default implementation always returns an empty vector for + /// valid nodes. + fn edge_label(&self, source: usize, target: usize) -> Result<Vec<T>, Error> { + self.has_edge(source, target).map(|_| Vec::new()) + } + + /// Return an iterator of edges out of a node, whose associated + /// label is as given. + /// + /// The efficiency of this method matters in implementations. + fn find_children_with_label( + &self, + node_id: usize, + label: &T, + ) -> Result<<Self as LabelGraph<T>>::Iter<'_>, Error>; + + /// Return an iterator of labels of edges out of a node. + /// + /// The efficiency of this method matters in implementations. + fn labels_of(&self, node_id: usize) -> Result<Self::LabelIter<'_>, Error>; +} + +/// A labelled graph that can be extended, but not mutated, in the +/// sense that existing nodes and edges will not be modified nor +/// removed, but new nodes can be added. The index of the new node +/// will be returned. +/// +/// Implementations can choose to keep a set of sets of edges, so that +/// new nodes will not have duplicate edge sets. In this case, the +/// returned new node index is not necessarily equal to +/// self.nodes_len() - 1, and hence the return type is designed in +/// this way. +pub trait LabelExtGraph<T: GraphLabel>: LabelGraph<T> { + /// Add a new node with `edges`. + /// + /// If an edge from `edges` points to a non-existent node, return + /// an error. + fn extend(&mut self, edges: impl IntoIterator<Item = (T, usize)>) -> Result<usize, Error>; +} diff --git a/nfa/Cargo.toml b/nfa/Cargo.toml new file mode 100644 index 0000000..b1387b6 --- /dev/null +++ b/nfa/Cargo.toml @@ -0,0 +1,13 @@ +[package] +name = "nfa" +version = "0.1.0" +edition = "2021" + +# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html + +[dependencies] +graph = { path = "../graph", optional = true } + +[features] +default = ["default-graph"] +default-graph = ["dep:graph"] diff --git a/nfa/src/default/mod.rs b/nfa/src/default/mod.rs new file mode 100644 index 0000000..805540b --- /dev/null +++ b/nfa/src/default/mod.rs @@ -0,0 +1,123 @@ +//! This file provides a structure that implements the trait +//! [`NFA`][super::Nfa]. +//! +//! It is used as the default implementation. + +use graph::{error::Error as GError, DLGraph, Graph, GraphLabel, LabelGraph}; + +use super::{error::Error, Nfa, Regex}; + +// TODO: Store the regular expression in the NFA as well. +// +// The current focus of the project is to understand the growth rate +// of the algorithm, to know whether I made a mistake in the previous +// iteration of the implementation, or the algorithm is not as fast as +// the author estimated, which is not quite likely, of course. +// +// Thus I shall establish a friendly interface that allows me to view +// and debug the atomic languages and the languages, transparently. + +#[non_exhaustive] +#[derive(Debug)] +/// Default NFA implementation. +pub struct DefaultNFA<T: GraphLabel> { + graph: DLGraph<T>, +} + +impl<T: GraphLabel> Default for DefaultNFA<T> { + fn default() -> Self { + let graph = Default::default(); + Self { graph } + } +} + +impl<T: GraphLabel> Graph for DefaultNFA<T> { + type Iter<'a> = <DLGraph<T> as Graph>::Iter<'a> where T: 'a; + + #[inline] + fn is_empty(&self) -> bool { + self.graph.is_empty() + } + + #[inline] + fn nodes_len(&self) -> usize { + self.graph.nodes_len() + } + + #[inline] + fn children_of(&self, node_id: usize) -> Result<Self::Iter<'_>, GError> { + self.graph.children_of(node_id) + } + + #[inline] + fn degree(&self, node_id: usize) -> Result<usize, GError> { + self.graph.degree(node_id) + } + + #[inline] + fn is_empty_node(&self, node_id: usize) -> Result<bool, GError> { + self.graph.is_empty_node(node_id) + } + + #[inline] + fn has_edge(&self, source: usize, target: usize) -> Result<bool, GError> { + self.graph.has_edge(source, target) + } +} + +impl<T: GraphLabel> LabelGraph<T> for DefaultNFA<T> { + type Iter<'a> = <DLGraph<T> as LabelGraph<T>>::Iter<'a> where T: 'a; + + type LabelIter<'a> = <DLGraph<T> as LabelGraph<T>>::LabelIter<'a> where T: 'a; + + // TODO: Return the label from the contained regular language. + #[inline] + fn vertex_label(&self, node_id: usize) -> Result<Option<T>, GError> { + if self.has_node(node_id) { + todo!() + } else { + Err(GError::IndexOutOfBounds(node_id, self.nodes_len())) + } + } + + #[inline] + fn edge_label(&self, source: usize, target: usize) -> Result<Vec<T>, GError> { + self.graph.edge_label(source, target) + } + + #[inline] + fn find_children_with_label( + &self, + node_id: usize, + label: &T, + ) -> Result<<Self as LabelGraph<T>>::Iter<'_>, GError> { + self.graph.find_children_with_label(node_id, label) + } + + #[inline] + fn labels_of(&self, node_id: usize) -> Result<Self::LabelIter<'_>, GError> { + self.graph.labels_of(node_id) + } +} + +impl<T: GraphLabel> Nfa<T> for DefaultNFA<T> { + #[allow(unused)] + fn to_nfa(regex: impl Regex<T>) -> Self { + todo!() + } + + fn remove_epsilon(&mut self) -> Result<(), Error> { + todo!() + } + + fn remove_dead(&mut self) -> Result<(), Error> { + todo!() + } + + fn nulling(&mut self) -> Result<(), Error> { + todo!() + } +} + +#[cfg(test)] +mod default_nfa_test {} diff --git a/nfa/src/error.rs b/nfa/src/error.rs new file mode 100644 index 0000000..6112878 --- /dev/null +++ b/nfa/src/error.rs @@ -0,0 +1,30 @@ +//! This file implements the error type for the crate. + +use graph::error::Error as GError; + +use std::fmt::{Display, Formatter}; + +#[derive(Debug, Clone, Eq, PartialEq, Ord, PartialOrd)] +pub enum Error { + UnknownNode(usize), + UnsupportedOperation, + Graph(GError), +} + +impl Display for Error { + fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result { + match self { + Error::UnknownNode(id) => write!(f, "unknown node: {id}"), + Error::UnsupportedOperation => write!(f, "unsupported operation"), + Error::Graph(e) => write!(f, "graph error: {e}"), + } + } +} + +impl std::error::Error for Error {} + +impl From<GError> for Error { + fn from(e: GError) -> Self { + Self::Graph(e) + } +} diff --git a/nfa/src/lib.rs b/nfa/src/lib.rs new file mode 100644 index 0000000..ef207cf --- /dev/null +++ b/nfa/src/lib.rs @@ -0,0 +1,94 @@ +#![warn(missing_docs)] +//! This crate implements non-deterministic finite automata. +//! +//! By default this uses the graph from the crate [`graph`]. To use +//! another external graph, add a module in which the external graph +//! implements the Graph trait from the [`graph`] crate, and then use +//! that external graph type as [`Graph`][graph::Graph] here. + +mod error; + +extern crate graph; + +use core::fmt::Display; + +use graph::{Graph, GraphLabel, LabelGraph}; + +use error::Error; + +/// The expected behaviour of a regular language. +/// +/// Nondeterministic finite automata are equivalent to regular +/// languages. Since regular languages are easier to understand for a +/// human being, nondeterministic finite automata include the data for +/// the equivalent regular languages. +pub trait Regex<T: GraphLabel>: Graph + Display { + /// Return the label of a vertex, or an error if the node is + /// invalid. + fn vertex_label(&self, node_id: usize) -> Result<T, Error>; + + #[inline] + /// Return the root node of the regular language. + /// + /// Implementations can follow different conventions for the root + /// node, and hence this function. + /// + /// If the regular language is empty, the implementation should + /// return None. + /// + /// The default implementation uses the convention that the root + /// node is always the first node. + fn root(&self) -> Option<usize> { + if self.is_empty() { + None + } else { + Some(0) + } + } + + // TODO: add functions that determine if certain "positions" in a + // regular language satisfy some special properties, like at the + // end of a Kleene star, or at the end of a regular language, et + // cetera. These will be needed later. +} + +/// The expected behvaiour of a nondeterministic finite automaton. +/// +/// Every NFA is a special labelled graph, so this trait extends the +/// [`LabelGraph`][graph::LabelGraph] trait. +pub trait Nfa<T: GraphLabel>: LabelGraph<T> { + /// Remove all empty transitions from the nondeterministic finite + /// automaton. + fn remove_epsilon(&mut self) -> Result<(), Error>; + + /// Return a state-minimal NFA equivalent with the original one. + /// + /// This is not required. It is just to allow me to experiment + /// with NFA optimization algorithms. + fn minimize(&self) -> Result<Self, Error> { + Err(Error::UnsupportedOperation) + } + + /// Build a nondeterministic finite automaton out of a regular + /// language. + fn to_nfa(regex: impl Regex<T>) -> Self; + + /// Remove all dead states from the nondeterministic finite + /// automaton. + /// + /// A state is dead if there are no edges going to the state. + fn remove_dead(&mut self) -> Result<(), Error>; + + /// For each empty transition from A to B, and for every edge from + /// B to C, say, add an edge from A to C. + /// + /// This is needed specifically by the algorithm to produce a set + /// of atomic languages that represent "null-closed" derived + /// languages. + fn nulling(&mut self) -> Result<(), Error>; +} + +pub mod default; + +#[cfg(test)] +mod nfa_tests {} diff --git a/receme/Cargo.toml b/receme/Cargo.toml new file mode 100644 index 0000000..dccf28c --- /dev/null +++ b/receme/Cargo.toml @@ -0,0 +1,9 @@ +[package] +name = "receme" +version = "0.1.0" +edition = "2021" + +# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html + +[dependencies] + diff --git a/receme/src/algebra.rs b/receme/src/algebra.rs new file mode 100644 index 0000000..49e0932 --- /dev/null +++ b/receme/src/algebra.rs @@ -0,0 +1,14 @@ +//! This file defines the algebra trait. +//! +//! If F is an endo-functor, then an F-algebra is a natural +//! transformation from F to the identity functor. + +use super::functor::Functor; + +/// An algebra is a function from F(T) to T. +/// +/// This is a "trait alias". Since Rust does not support trait alias +/// yet, we define an empty trait with an automatic implementation. +pub trait Algebra<T, F: Functor<T>>: FnMut(F) -> T {} + +impl<T, F: Functor<T>, A: FnMut(F) -> T> Algebra<T, F> for A {} diff --git a/receme/src/catana.rs b/receme/src/catana.rs new file mode 100644 index 0000000..e3d4728 --- /dev/null +++ b/receme/src/catana.rs @@ -0,0 +1,27 @@ +//! This file defines behaviours of catamorphisms and anamorphisms. +//! +//! A catamorphism collapses a recursive structure into a flat value, +//! whereas an anamorphism expands a value into a recursive structure. + +use crate::{algebra::Algebra, coalgebra::Coalgebra, functor::Functor}; + +/// A type that implements Cata is capable of being collapsed into a single value. +/// +/// The catamorphism consumes `self`. +pub trait Cata<T, F: Functor<T>, A: Algebra<T, F>> { + /// A catamorphism takes a recursive structure and an algebra for + /// the recursive structure, and returns a single, flat, collapsed + /// value. + fn cata(self, alg: A) -> T; +} + +/// A type that implements Ana is capable of being expanded from a +/// flat value. +/// +/// The anamorphism consumes `self`. +pub trait Ana<T, F: Functor<T>, C: Coalgebra<T, F>> { + /// An anamorphism takes a single, flat, collapsed value and a + /// co-algebra for a recursive structure, and returns that + /// recursive structure. + fn ana(value: T, coalg: C) -> Self; +} diff --git a/receme/src/coalgebra.rs b/receme/src/coalgebra.rs new file mode 100644 index 0000000..5974d90 --- /dev/null +++ b/receme/src/coalgebra.rs @@ -0,0 +1,14 @@ +//! This file defines the co-algebra trait. +//! +//! If F is an endo-functor, then an F-co-algebra is a natural +//! transformation from the identity functor to F. + +use super::functor::Functor; + +/// A co-algebra is a function from T to F(T). +/// +/// This is a "trait alias". Since Rust does not support trait alias +/// yet, we define an empty trait with an automatic implementation. +pub trait Coalgebra<T, F: Functor<T>>: FnMut(T) -> F {} + +impl<T, F: Functor<T>, C: FnMut(T) -> F> Coalgebra<T, F> for C {} diff --git a/receme/src/coralgebra.rs b/receme/src/coralgebra.rs new file mode 100644 index 0000000..a4a73d2 --- /dev/null +++ b/receme/src/coralgebra.rs @@ -0,0 +1,28 @@ +//! This file defines the behaviours of R-co-algebras. +//! +//! For a functor F, an R-co-algebra is a natural transformation from +//! the identity functor to F1, where F1 is the functor that sends T +//! to F(F(T) + T), where the + is the categorical co-product, +//! represented in Rust as `Result`. And the F(T) variant means to +//! short-circuit the expansion. + +use crate::functor::Functor; + +/// An R-co-algebra is a function from T to F(Result<F(T), T>). +/// +/// This is a "trait alias". Since Rust does not support trait alias +/// yet, we define an empty trait with an automatic implementation. +pub trait Coralgebra<T, F, G>: FnMut(T) -> G +where + F: Functor<T>, + G: Functor<Result<T, F>>, +{ +} + +impl<T, F, G, R> Coralgebra<T, F, G> for R +where + F: Functor<T>, + G: Functor<Result<T, F>>, + R: FnMut(T) -> G, +{ +} diff --git a/receme/src/functor.rs b/receme/src/functor.rs new file mode 100644 index 0000000..95e2555 --- /dev/null +++ b/receme/src/functor.rs @@ -0,0 +1,64 @@ +//! This file defines the Functor trait. +//! +//! This is the basis of the recursion scheme. +//! +//! More precisely, this file provides two versions of functor traits: +//! one whose `map` function consumes `self`, and one whose `map` does +//! not. + +/// A functor can map over its generic parameter. +/// +/// It can map from Functor(T) to Functor(S). +pub trait Functor<T> { + /// The target of the map. + /// + /// Since Rust has no higher-kinded polymorphism, we have to + /// express this type explicitly. + /// + /// # Note + /// + /// This is a generic associated type, so we need a minimal Rust + /// version of 1.65, when this feature was first introduced to + /// stable Rust. + type Target<S>: Functor<S>; + + /// Map from Functor(T) to Functor(S). + /// + /// # Note + /// + /// This consumes `self`. If one wants not to consume `self`, + /// then consider the trait [`FunctorRef`]. + fn fmap<S>(self, f: impl FnMut(T) -> S) -> Self::Target<S>; +} + +/// A functor can map over its generic type parameter. +/// +/// It can map from Functor(T) to Functor(S). +/// +/// This is similar to [`Functor`], but the +/// [`fmap`][FunctorRef<T>::fmap_ref] method takes a reference and +/// does not consume `self`. +pub trait FunctorRef<T> { + /// The target of the map. + /// + /// Since Rust has no higher-kinded polymorphism, we have to + /// express this type explicitly. + /// + /// # Note + /// + /// This is a generic associated type, so we need a minimal Rust + /// version of 1.65, when this feature was first introduced to + /// stable Rust. + type Target<S>: Functor<S>; + + /// Map from Functor(T) to Functor(S). + /// + /// # Note + /// + /// This does notconsume `self`. If one wants to consume `self`, + /// then consider the trait [`Functor`]. + /// + /// To avoid having to specify the trait when calling the method, + /// we give it a distinct name. + fn fmap_ref<S>(&self, f: impl FnMut(T) -> S) -> Self::Target<S>; +} diff --git a/receme/src/hylo.rs b/receme/src/hylo.rs new file mode 100644 index 0000000..8d26c19 --- /dev/null +++ b/receme/src/hylo.rs @@ -0,0 +1,41 @@ +//! This file defines the behaviour of hylomorphisms. +//! +//! A hylomorphism is a composition of an anamorphism with a +//! catamorphism. But it is separated into a distinct trait as we can +//! implement hylomorphisms more efficiently by short-circuiting +//! during the expansion by the anamorphism. + +use crate::{ + algebra::Algebra, + catana::{Ana, Cata}, + coalgebra::Coalgebra, + functor::Functor, +}; + +/// A type implementing Hylo is able to expand from a value of type +/// `U` into a recursive structure holding values of type `G`, and +/// also to collapse from that recursive structure to a value of type +/// `T`. +/// +/// Types which implement [`Cata`] and [`Ana`] compatibly can +/// automatically implement [`Hylo`] as follows. +/// +/// But of course this is not efficient, and types should implement in +/// a more efficient manner, if available. +/// +/// ```ignore +/// Inter::ana(value, coalg).cata(alg) +/// ``` +pub trait Hylo<T, S, U, F, G, H, A, C>: Cata<T, F, A> + Ana<U, H, C> +where + F: Functor<T>, + G: Functor<S, Target<T> = F>, + H: Functor<U, Target<S> = G>, + A: Algebra<T, F>, + C: Coalgebra<U, H>, +{ + /// Expand from `value` to an intermediate recursive structure, by + /// means of a coalgebra, and then collapse into the result value, + /// by means of an algebra. + fn hylo(value: U, alg: A, coalg: C) -> T; +} diff --git a/receme/src/lib.rs b/receme/src/lib.rs new file mode 100644 index 0000000..be1f028 --- /dev/null +++ b/receme/src/lib.rs @@ -0,0 +1,227 @@ +#![warn(missing_docs)] +//! This crate implements some recursion schemes in Rust. +//! +//! The name "receme" is a mix of "Recursion Scheme". +//! +//! See [this series of five blog +//! articles](https://blog.sumtypeofway.com/posts/introduction-to-recursion-schemes.html) +//! for an introduction to recursion schemes, and see [this series of +//! three articles](https://recursion.wtf/posts/rust_schemes/) for +//! where I got inspired to write this library. +//! +//! Note that, since Rust does not have higher-kinded polymorphism, it +//! is sometimes cumbersome to implement some notions, though. +//! +//! # Another crate +//! +//! The author of the above-mentionned three-article series has +//! already implemented the recursion schemes in Rust, in [this +//! repository](https://github.com/inanna-malick/recursion), so why do +//! it myself? +//! +//! One reason is that I want my package to not depend on anything +//! other than the default Rust toolchains. This is perhaps not a +//! very convincing reason, but I just want to do so. +//! +//! Another reason is that I want the design to be modular: if there +//! is another crate that provides a similar functionality, I can +//! quickly switch the underlying mechanism to adopt to the new crate +//! instead. +//! +//! Consequently I decided to write this library, and provide a +//! default implementation. This way by default the package does not +//! depend on external crates, and if so demanded, can switch to use +//! an external crate instantaneously, at least hopefully. + +// The following modules are for traits. +pub mod algebra; +pub mod catana; +pub mod coalgebra; +pub mod coralgebra; +pub mod functor; +pub mod hylo; +pub mod parapo; +pub mod ralgebra; + +// pub mod futhis; + +// The following modules are for default implementations. +pub mod tree; + +// TODO: benchmarks + +#[cfg(test)] +mod test_expr_tree { + use super::{ + catana::{Ana, Cata}, + functor::Functor, + hylo::Hylo, + tree::{DFTree, TEStrategy, Tree, TreeIndex}, + }; + + // Just for testing const generics and fixed size arrays, that is + // to say, just for fun. + + // fn demo<T, const N: usize>(v: Vec<T>) -> Result<[T; N], String> { + // v.try_into() + // .map_err(|v: Vec<T>| format!("expected a vector of length {N}, but got {}", v.len())) + // } + + // #[test] + // fn test_demo() -> Result<(), String> { + // let v: Vec<usize> = vec![1, 2, 3]; + // let w: Vec<usize> = vec![1, 2]; + + // assert_eq!(demo::<_, 2>(w)?, [1, 2]); + // assert_eq!(demo::<_, 3>(v)?, [1, 2, 3]); + + // Ok(()) + // } + + #[derive(Debug, Clone)] + enum Expr<T> { + Add(T, T), + Lit(isize), + } + + impl<T> Functor<T> for Expr<T> { + type Target<S> = Expr<S>; + + fn fmap<S>(self, mut f: impl FnMut(T) -> S) -> Self::Target<S> { + match self { + Expr::Add(a, b) => Expr::Add(f(a), f(b)), + Expr::Lit(value) => Expr::Lit(value), + } + } + } + + #[test] + fn test_cata() { + /// This is an Expr-algebra, but only for a specific type, + /// `isize`. + fn eval(expr: Expr<isize>) -> isize { + match expr { + Expr::Add(a, b) => a + b, + Expr::Lit(value) => value, + } + } + + /// Use a temporary function to construct a tree. + /// + /// Should use an anamorphism for this purpose, later. + fn construct_tree() -> Tree<Expr<TreeIndex>> { + use Expr::{Add, Lit}; + + let strategy: TEStrategy = TEStrategy::UnsafeArena; + + // This represents the following expression + // + // Add(1, Add(3, Add(10, -1))). + let elements = vec![ + Add(1, 2).fmap(TreeIndex::new), + Lit(1), + Add(3, 4).fmap(TreeIndex::new), + Lit(3), + Add(5, 6).fmap(TreeIndex::new), + Lit(10), + Lit(-1), + ]; + + Tree::new(elements, strategy) + } + + let tree = construct_tree(); + + let result = tree.cata(eval); + + assert_eq!(result, 13isize); + } + + #[test] + fn test_ana() { + // Just a ugly hack, haha. + let mut vector: Vec<Expr<TreeIndex>> = vec![ + Expr::Add(1, 2).fmap(TreeIndex::new), + Expr::Lit(1), + Expr::Add(3, 4).fmap(TreeIndex::new), + Expr::Lit(3), + Expr::Add(5, 6).fmap(TreeIndex::new), + Expr::Lit(10), + Expr::Lit(-14), + ]; + + let mut vector1: Vec<Expr<TreeIndex>> = vec![ + Expr::Add(1, 2).fmap(TreeIndex::new), + Expr::Lit(1), + Expr::Add(3, 4).fmap(TreeIndex::new), + Expr::Lit(3), + Expr::Add(5, 6).fmap(TreeIndex::new), + Expr::Lit(10), + Expr::Lit(-14), + ]; + + let mut tree = Tree::ana(TreeIndex::new(0), |value: TreeIndex| { + // This is safe since we visit each valid node exactly + // once. + std::mem::replace(&mut vector[*value], Expr::Lit(0)) + }); + + tree.set_strategy(TEStrategy::UnsafeArena); + + let tree = tree; + + println!("tree is {tree:#?}"); + + let result = tree.cata(|expr| match expr { + Expr::Add(a, b) => a + b, + Expr::Lit(v) => v, + }); + + assert_eq!(result, 0); + + // test df_tree + let dftree = DFTree::ana(TreeIndex::new(0), |value: TreeIndex| { + // This is safe since we visit each valid node exactly + // once. + std::mem::replace(&mut vector1[*value], Expr::Lit(0)) + }); + + let tree = dftree.to_tree(); + + println!("dftree = {tree:#?}"); + + let result = tree.cata(|expr| match expr { + Expr::Add(a, b) => a + b, + Expr::Lit(v) => v, + }); + + assert_eq!(result, 0); + } + + #[test] + fn test_hylo() { + // Again using the ugly hack + let vector: Vec<Expr<TreeIndex>> = vec![ + Expr::Add(1, 2).fmap(TreeIndex::new), + Expr::Lit(1), + Expr::Add(3, 4).fmap(TreeIndex::new), + Expr::Lit(3), + Expr::Add(5, 6).fmap(TreeIndex::new), + Expr::Lit(10), + Expr::Lit(14), + ]; + + fn eval_expr(mut v: Vec<Expr<TreeIndex>>) -> isize { + Tree::hylo( + TreeIndex::new(0), + |expr| match expr { + Expr::Add(a, b) => a + b, + Expr::Lit(v) => v, + }, + |value: TreeIndex| std::mem::replace(&mut v[*value], Expr::Lit(0)), + ) + } + + assert_eq!(eval_expr(vector), 28); + } +} diff --git a/receme/src/parapo.rs b/receme/src/parapo.rs new file mode 100644 index 0000000..11cc613 --- /dev/null +++ b/receme/src/parapo.rs @@ -0,0 +1,44 @@ +//! This file defines behaviours of paramorphisms and apomorphisms. +//! +//! A paramorphism is a generalization of a catamorphism. Instead of +//! using an algebra, which is a function from F(T) to T, we use an +//! R-algebra, which is a function from F(T, F(T)) to T. The extra +//! F(T) represents the contexts where the collapsing happens. +//! +//! An apomorphism is a generalization of anamorphism. It is the +//! categorical dual of a paramorphism. So it uses an R-co-algebra, +//! which is a function from T to F(F(T) + T), where the latter sum is +//! the categorical co-product. + +use crate::{coralgebra::Coralgebra, functor::Functor, ralgebra::RAlgebra}; + +/// A type that implements Para is capable of being collapsed to a +/// single value by use of an R-Algebra. +/// +/// The paramorphism consumes `self`. +pub trait Para<T, F, R> +where + F: Functor<T>, + R: RAlgebra<T, F>, +{ + /// A paramorphism takes a recursive structure and an R-algebra + /// for that recursive structure, and returns a single, flat, + /// collapsed value. + fn para(self, ralg: R) -> T; +} + +/// A type that implements Apo is capable of being expanded from a +/// flat value by use of an R-co-algebra. +/// +/// The apomorphism consumes `self`. +pub trait Apo<T, F, G, C> +where + F: Functor<T>, + G: Functor<Result<T, F>>, + C: Coralgebra<T, F, G>, +{ + /// An apomorphism takes single, flat, collapsed value and an + /// R-co-algebra for a recursive structure, and returns that + /// recursive structure. + fn apo(value: T, rcoalg: C) -> Self; +} diff --git a/receme/src/ralgebra.rs b/receme/src/ralgebra.rs new file mode 100644 index 0000000..989089e --- /dev/null +++ b/receme/src/ralgebra.rs @@ -0,0 +1,25 @@ +//! This file defines the behaviours of R-algebras. +//! +//! For a functor F, an R-algebra is a natural transformation from F1 +//! to the identity functor, where F1 is the functor that sends T to +//! F((F(T), T)). This extra F(T) represents the context during the +//! computations. + +use crate::functor::Functor; + +/// An R-algebra is a function from F((F(T), T)) to T. +/// +/// This is a "trait alias". Since Rust does not support trait alias +/// yet, we define an empty trait with an automatic implementation. +pub trait RAlgebra<T, F>: FnMut((T, F)) -> T +where + F: Functor<T>, +{ +} + +impl<T, F, R> RAlgebra<T, F> for R +where + F: Functor<T>, + R: FnMut((T, F)) -> T, +{ +} diff --git a/receme/src/tree.rs b/receme/src/tree.rs new file mode 100644 index 0000000..eb64f31 --- /dev/null +++ b/receme/src/tree.rs @@ -0,0 +1,368 @@ +//! This file implements a recursive structure that implements the +//! recursion scheme traits, representing trees. +//! +//! The tree is backed by a vector. + +use crate::{ + algebra::Algebra, + catana::{Ana, Cata}, + coalgebra::Coalgebra, + functor::Functor, + hylo::Hylo, +}; + +use std::{collections::VecDeque, mem::MaybeUninit, ops::Deref}; + +/// The evaluation strategy for the tree. +#[derive(Default, Debug, Copy, Clone)] +pub enum TEStrategy { + #[default] + /// This strategy uses an arena, and uses an `Option<T>` to store + /// the data. + /// + /// # Comparison: + /// + /// Since it is an arena, it saves allocations, compared to the + /// variant [`DepthFirst`][TEStrategy::DepthFirst]. But it needs + /// indices to operate, so uses more memory. + /// + /// On the other hand, it uses an option, so is slower than the + /// variant [`UnsafeArena`][TEStrategy::UnsafeArena], but avoids + /// unsafe code altogether. Applications can first use this + /// variant to make sure the algorithm works, before converting to + /// use the unsafe variant. + SafeArena, + /// This strategy uses an arena, and uses an `MaybeUninit` to + /// store the data. + /// + /// # Comparison: + /// + /// Since it is an arena, it saves allocations, compared to the + /// variant [`DepthFirst`][TEStrategy::DepthFirst]. But it needs + /// indices to operate, so uses more memory. + /// + /// On the other hand, it uses a `MaybeUninit`, so is faster than + /// the variant [`SafeArena`][TEStrategy::SafeArena], but uses + /// unsafe code. Applications can first use the safe variant to + /// make sure the algorithm works, before converting to use this + /// variant. + UnsafeArena, + /// This strategy uses a plain vector. + /// + /// # Comparison: + /// + /// Since it is a plain vector, it uses more allocations, compared + /// to other variants. But it does not use indices, so consumes + /// less memory. + /// + /// # Warning + /// + /// Since it uses no indices, it relies on the depth-first order + /// of the elements to correctly find elements. This puts a + /// requirement on the implementation of the [`Functor`] trait. + DepthFirst, +} + +/// A tree is just a wrapper around a vector. +/// +/// # Warning +/// +/// The tree is supposed to be stored in topological order. This +/// order is used in a critical way in the implementations of +/// recursion schemes. Violations of this assumption are fatal to +/// using those trait methods. +#[derive(Clone, Debug)] +pub struct Tree<T> { + elements: Vec<T>, + strategy: TEStrategy, +} + +impl<T> Tree<T> { + #[inline] + /// Construct a new tree. + pub fn new(elements: Vec<T>, strategy: TEStrategy) -> Self { + Self { elements, strategy } + } + + /// Just a function for testing. + /// + /// # Warning + /// + /// This is definitely going to be removed in the future. + pub fn nth(&self, n: usize) -> Option<&T> { + self.elements.get(n) + } + + #[inline] + /// Retrieve the strategy of the tree. + pub fn strategy(&self) -> TEStrategy { + self.strategy + } + + #[inline] + /// Set the strategy of the tree. + pub fn set_strategy(&mut self, strategy: TEStrategy) { + self.strategy = strategy; + } +} + +// Manual implementation can avoid unnecessary requirement on the +// parameter `T`. +impl<T> Default for Tree<T> { + fn default() -> Self { + let elements = Vec::new(); + let strategy = TEStrategy::default(); + + Self { elements, strategy } + } +} + +#[derive(Debug, Copy, Clone)] +/// A thin wrapper around `usize`, to index vectors. +/// +/// By means of the [*newtype +/// pattern*](https://doc.rust-lang.org/rust-by-example/generics/new_types.html) +/// in Rust, it is supposed to be treated as a simple `usize` in the +/// compiled codes. +pub struct TreeIndex(usize); + +impl TreeIndex { + /// Wrap an index in this type. + pub fn new(index: usize) -> Self { + Self(index) + } +} + +impl Deref for TreeIndex { + type Target = usize; + + fn deref(&self) -> &Self::Target { + &self.0 + } +} + +impl<T, F, G, A> Cata<T, F, A> for Tree<G> +where + F: Functor<T>, + G: Functor<TreeIndex, Target<T> = F>, + A: Algebra<T, F>, +{ + fn cata(self, mut alg: A) -> T { + // First deal with the safe case + match self.strategy { + TEStrategy::SafeArena => { + let mut results: Vec<Option<T>> = std::iter::repeat_with(Default::default) + .take(self.elements.len()) + .collect(); + + for (index, node) in self.elements.into_iter().enumerate().rev() { + let algebra_result = { + let node = node.fmap::<T>(|index| { + std::mem::replace(&mut results[*index], None).unwrap() + }); + + alg(node) + }; + + // Artificially use this value to satisfy the compiler. + let _ = std::mem::replace(&mut results[index], Some(algebra_result)); + } + + std::mem::replace(&mut results[0], None).unwrap() + } + TEStrategy::UnsafeArena => { + let mut results: Vec<MaybeUninit<T>> = std::iter::repeat_with(MaybeUninit::uninit) + .take(self.elements.len()) + .collect(); + + for (index, node) in self.elements.into_iter().enumerate().rev() { + let algebra_result = { + let node = node.fmap::<T>(|index| unsafe { + std::mem::replace(&mut results[*index], MaybeUninit::uninit()) + .assume_init() + }); + + alg(node) + }; + + results[index].write(algebra_result); + } + + unsafe { std::mem::replace(&mut results[0], MaybeUninit::uninit()).assume_init() } + } + TEStrategy::DepthFirst => { + let mut results_stack: Vec<T> = Vec::new(); + + for node in self.elements.into_iter().rev() { + // Replace each node data with the value from the + // results stack. + let mapped_node = node.fmap(|_| results_stack.pop().unwrap()); + + results_stack.push(alg(mapped_node)); + } + + results_stack.pop().unwrap() + } + } + } +} + +impl<T, F, G, C> Ana<T, F, C> for Tree<G> +where + F: Functor<T, Target<TreeIndex> = G>, + G: Functor<TreeIndex>, + C: Coalgebra<T, F>, +{ + /// An anamorphism takes a single, flat, collapsed value and a + /// co-algebra for a recursive structure, and returns that + /// recursive structure. + /// + /// # Descriptions + /// + /// This always generates a tree which uses the default strategy. + /// If one wants to use a different strategy, set the strategy + /// after generating the tree. + /// + /// # See also + /// + /// To use the depth first strategy to generate the tree, use the + /// wrapper struct [`DFTree`]. + fn ana(value: T, mut coalg: C) -> Self { + let mut queue = VecDeque::new(); + + queue.push_back(value); + + let mut elements = vec![]; + + let strategy = TEStrategy::default(); + + while let Some(value) = queue.pop_back() { + let expanded_layer = coalg(value); + + let mapped_layer = expanded_layer.fmap::<TreeIndex>(|value| { + queue.push_back(value); + + TreeIndex(elements.len() + queue.len()) + }); + + elements.push(mapped_layer); + } + + Self { elements, strategy } + } +} + +/// To generate a tree with the strategy +/// [`DepthFirst`][TEStrategy::DepthFirst], we use a wrapper struct +/// which implements [`Ana`] in the desired manner. +#[derive(Debug, Clone)] +pub struct DFTree<T>(Tree<T>); + +impl<T> DFTree<T> { + #[inline] + /// Convert to the underlying tree. + pub fn to_tree(self) -> Tree<T> { + self.0 + } + + #[inline] + /// Wrap a tree. + pub fn new(tree: Tree<T>) -> Self { + Self(tree) + } +} + +impl<T, F, G, C> Ana<T, F, C> for DFTree<G> +where + F: Functor<T, Target<TreeIndex> = G>, + G: Functor<TreeIndex>, + C: Coalgebra<T, F>, +{ + /// An anamorphism takes a single, flat, collapsed value and a + /// co-algebra for a recursive structure, and returns that + /// recursive structure. + /// + /// # Descriptions + /// + /// This always generates a tree which uses the depth first + /// strategy. If one wants to use a different strategy, set the + /// strategy after generating the tree. + /// + /// # See also + /// + /// To use the default strategy to generate the tree, use the + /// original struct [`Tree`]. + fn ana(value: T, mut coalg: C) -> Self { + let mut stack = Vec::new(); + + stack.push(value); + + let mut elements = vec![]; + + let strategy = TEStrategy::DepthFirst; + + while let Some(value) = stack.pop() { + let expanded_layer = coalg(value); + + let mut local_stack = Vec::new(); + + let mapped_layer = expanded_layer.fmap::<TreeIndex>(|value| { + local_stack.push(value); + + // The index is of no meaning here, since we rely on + // the depth-first order. + TreeIndex(0) + }); + + stack.extend(local_stack.into_iter().rev()); + + elements.push(mapped_layer); + } + + Self::new(Tree::new(elements, strategy)) + } +} + +impl<T, U, F, G, H, A, C> Hylo<T, TreeIndex, U, F, G, H, A, C> for Tree<G> +where + F: Functor<T>, + G: Functor<TreeIndex, Target<T> = F>, + H: Functor<U, Target<TreeIndex> = G>, + A: Algebra<T, F>, + C: Coalgebra<U, H>, +{ + fn hylo(value: U, mut alg: A, mut coalg: C) -> T { + // The hylomorphism ignores the tree. Maybe I will add + // different implementations later on. + + let mut result_stack: Vec<T> = Vec::new(); + let mut value_node_stack: Vec<Result<U, G>> = vec![Ok(value)]; + + while let Some(value_or_node) = value_node_stack.pop() { + match value_or_node { + Ok(value) => { + let node = coalg(value); + + let mut local_values: Vec<U> = Vec::new(); + + let mapped_node = node.fmap(|node_value| { + local_values.push(node_value); + TreeIndex::new(0) + }); + + value_node_stack.push(Err(mapped_node)); + value_node_stack.extend(local_values.into_iter().rev().map(Ok)); + } + Err(node) => { + let mapped_node = node.fmap(|_| result_stack.pop().unwrap()); + + result_stack.push(alg(mapped_node)); + } + } + } + + result_stack.pop().unwrap() + } +} + +// TODO: Para, Apo, Histo, and Futu await us. diff --git a/repcore/Cargo.toml b/repcore/Cargo.toml new file mode 100644 index 0000000..7416ad5 --- /dev/null +++ b/repcore/Cargo.toml @@ -0,0 +1,8 @@ +[package] +name = "repcore" +version = "0.1.0" +edition = "2021" + +# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html + +[dependencies] diff --git a/repcore/src/lib.rs b/repcore/src/lib.rs new file mode 100644 index 0000000..7d12d9a --- /dev/null +++ b/repcore/src/lib.rs @@ -0,0 +1,14 @@ +pub fn add(left: usize, right: usize) -> usize { + left + right +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn it_works() { + let result = add(2, 2); + assert_eq!(result, 4); + } +} diff --git a/src/lib.rs b/src/lib.rs new file mode 100644 index 0000000..7d12d9a --- /dev/null +++ b/src/lib.rs @@ -0,0 +1,14 @@ +pub fn add(left: usize, right: usize) -> usize { + left + right +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn it_works() { + let result = add(2, 2); + assert_eq!(result, 4); + } +} |