Page Created: 2/12/2025   Last Modified: 2/25/2025   Last Generated: 2/25/2025
Louia: A Fast Spartan Server Written in Lua
Program Description
http://greatfractal.com/LouiaServer.html
spartan://greatfractal.com/louia.gmi
(This page is an extremely rough draft and is full of all kinds of errors. I will try to improve the documentation over time if I release future versions. Please note that this page was originally generated in HTML. If you are reading this as a README file inside the source tarball or served via .txt file, it may have odd formatting and will not contain any example hyperlinks.)
Louia is a Lua 5.1 based Spartan protocol TCP server designed to run on Linux, first released in February 2025 and programmed in St. Louis, Missouri, hence the name Lua + Louis = Louia. It can run under both normal Lua 5.1 and under LuaJIT for additional speed.
It is low-resource, has few dependencies, and supports coroutines for non-blocking, cooperatively "multithreaded" operation, but only supports a single domain (no virtual hosting) unless multiple instances are used. It has both input-filtering and a file whitelist mode.
If you are reading this from the website, a link to the source code tarball can be found at the bottom of this page which contains the "louia.lua" file.
Usage:
lua louia.lua domain[:port] /sitedir [/logdir]
Examples:
lua louia.lua example.com /mysite
luajit louia.lua example.com:3000 /my/site /my/logs
The port is optional and will default to 300. A trailing slash should not be placed on the directory names. The logdir is optional and will disable logging if not present.
Linux dependencies
- Lua 5.1 or LuaJIT, and telnet is also helpful for testing
It also relies on the "find" command, which should be present on most Linux systems and is launched through the "sh" shell (so "find" and "sh" may need to be unblocked if a sandbox is used).
Lua dependencies
- LuaSocket
Example dependency install for Void Linux:
- xbps-install -S lua51
- xbps-install -S lua51-luasocket
- xbps-install -S LuaJIT (optional, but faster)
- xbps-install -S inetutils-telnet (optional)
Operation
As shown above, the .gmi files and other files are simply placed in a site directory, and the server is run with at least the first two parameters: the domain name (with optional port) and the full path to the site directory. If a third parameter, a logging directory, is present, it will append logs to a file called spartan.log in that directory (forever, in one big file; there is no log rotation). Logs are very rudimentary and use UTC time only.
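Louia's actual log line format is not documented on this page, so the following is only a minimal sketch of the append-only, UTC-only logging described above. The function name log_line and its field layout are hypothetical; the real detail worth showing is that a leading "!" in an os.date format string makes Lua format the time in UTC.

```lua
-- Hypothetical sketch: append one UTC-stamped line to <logdir>/spartan.log.
-- The field layout is an assumption, not Louia's documented log format.
local function log_line(logdir, ip, request, status)
  if not logdir then return false end   -- no logdir parameter: logging is off
  -- Mode "a" appends to one ever-growing file (no rotation).
  local f = io.open(logdir .. "/spartan.log", "a")
  if not f then return false end
  -- The leading "!" makes os.date use UTC instead of local time.
  local stamp = os.date("!%Y-%m-%d %H:%M:%S")
  -- tostring() turns nil fields into "nil" instead of raising an error.
  f:write(stamp, " ", tostring(ip), " ", tostring(status), " ",
          tostring(request), "\n")
  f:close()
  return true
end
```

Opening and closing the file on every line keeps nothing held open between requests, at the cost of an extra open() per log entry.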
There are many other user-configurable variables (shown below) that can be tweaked, and some like HOMEPATH and RELOADPATH should be changed to reflect the actual home and reload path names. The "reload path" is just a fake path that will refresh the list of files that the server will allow to be accessed (if SAFEFILES is enabled, more on this below).
That's mostly it. My install instructions above are for Void Linux, but there are similar packages with slightly different names for many other Linux distributions, and LuaRocks (if installed) can also be used to install LuaSocket if it is not in the distribution's repository, but I did not need to use it.
User-Configurable Variables
HOMEPATH | The full path and page name of the first page if no page is indicated. This is relative to the domain and site directory and not necessarily an actual file system path. (e.g. /index.gmi) |
MIME | A table (associative array) that maps file extensions to MIME types and optional charsets, for example: {gmi="text/gemini; charset=utf-8", html="text/html; charset=utf-8"} Leave out the period on the extension. In STRICTMODE, extensions longer than 3 characters are ignored. |
NOMIME | Sets the MIME type for pages with no MIME entry or extension, such as "text/gemini". Not applicable in STRICTMODE. |
MAXTHREADS | The upper cap on the number of coroutines to spawn, to limit excessive memory use. If a lot of memory is available and many simultaneous requests are expected, this can be increased. It should not exceed the number of file descriptors available, since each open connection uses one (see ulimit -n for the per-process limit and cat /proc/sys/fs/file-max for the system-wide limit). |
MAXPATHLEN | The maximum number of bytes for a directory path plus the filename. In many cases in Linux ext4 this is up to 4095 (path) + 255 (filename) - 1 = 4349 (e.g. cat /usr/include/linux/limits.h) but if there are no huge paths and filenames, it should be limited further to expected lengths. |
MAXUPLOADBYTES | The max number of bytes for uploads if uploads are turned on. |
MAXREQBYTES | The total max number of bytes in a Spartan request line. It should be bigger than MAXPATHLEN, since the request line also contains the host name and the content length in addition to the path. And it should be bigger than MAXUPLOADBYTES if uploads are turned on. If it ever hits this limit, the connection closes and the request is not processed. |
DOWNLOADBUFFER | The number of bytes of RAM per coroutine used as a chunk buffer for big downloads. This is not used for .gmi files. Bigger values use more RAM but make fewer calls to the network function. |
TIMEOUTSECS | The number of seconds before a connection will timeout if the request is still open and has not yet been completed. |
TIMEOUTCHECKSECS | The polling interval, in seconds, for checking whether any connection has exceeded TIMEOUTSECS above so that it can be killed. Lower values detect timeouts sooner but use more CPU, and vice-versa. |
DEBUG | If true, this prints extra information to stdout to assist in troubleshooting, similar to a verbose mode. |
UPLOADSENABLED | If true, this enables the Spartan upload protocol, but only for an internal upload variable. More details below. |
SAFEFILES | If true, this preloads an array on boot that contains all of the full filenames and paths in the site directory and forces any valid requests to check this array rather than check the site directory for that file directly, which is safer. However, any new filenames added to /sitedir will not be accessible until the server is restarted or RELOADPATH (below) is accessed to pick up the new names. |
RELOADPATH | The full path and page name of a fake page that should be named something non-obvious for admin use only. On accessing this page from a client, a reload will take place to find all files in the site directory (including sub-directories) and add them to an in-memory array which is used to restrict what files are allowed to be accessed if SAFEFILES is enabled, creating a whitelist. For example, if new files with new filenames are added to the site directory, and SAFEFILES is enabled, they will not be accessible until the RELOADPATH is accessed at least once. |
STRICTMODE | If true, this provides strict request filtering and only allows ASCII alphanumeric, underscore, hyphen in the request, forces a 3 character file extension, does not serve any files without an extension listed in the MIME array, disables percent-decoding, only allows alphanumeric and spaces in uploads, and only allows one subdirectory level to be traversed below the main level. |
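To make the MIME, NOMIME, and STRICTMODE rows above concrete, here is a minimal sketch of an extension lookup. The function mime_for and the sample table entries are hypothetical. Because the text describes the strict extension rule both as "3 character" and as "3-character or less", this sketch assumes one to three characters.

```lua
-- Hypothetical sketch of the MIME, NOMIME and STRICTMODE rules.
-- The table contents are examples; a real MIME table would be larger.
local MIME = {
  gmi  = "text/gemini; charset=utf-8",
  txt  = "text/plain; charset=utf-8",
  html = "text/html; charset=utf-8",
}
local NOMIME = "text/gemini"

-- Returns the MIME type to send, or nil if the file must not be served.
local function mime_for(path, strict)
  -- Extension: characters after the last dot, with no slash after that dot.
  local ext = path:match("%.([^%./]+)$")
  if strict then
    -- Assumed STRICTMODE rule: 1 to 3 alphanumerics AND a MIME table entry.
    if not ext or not ext:match("^%w%w?%w?$") then return nil end
    return MIME[ext]
  end
  -- Non-strict: unknown or missing extensions fall back to NOMIME.
  return (ext and MIME[ext]) or NOMIME
end
```

Returning nil in strict mode lets the caller treat "no MIME entry" exactly like "not found", which matches the rule that strict mode serves nothing without a listed extension.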
Design
At the time of this writing in Feb. 2025, Louia is the only Lua-based Spartan server that I've seen, so I assume it is the first. I chose Lua because it is simple, fast, and light on resources, since it has to run on a Raspberry Pi Zero (yes, an actual $5 Pi Zero from its original release, not even a Zero W). I once wrote a non-compliant XMPP server in Lua in NodeMCU on an ESP8266 (which is even smaller), but this is only my second Lua program.
I've had more fun writing in Lua than Python, Perl, or C and think it is an underappreciated language. It reminds me of the old, restricted C64 BASIC where you were free to add machine-language functions to it if you had to expand its capabilities (like Lua can do easily with C), but you tried really hard to do what you could with its limited set of stock commands (standard library).
And when you combine Lua with LuaJIT, it's really, really fast, one of the fastest dynamic languages available. It's so simple, too; I only needed one external module, "LuaSocket", to give me the TCP socket functions. I decided to stop there and get by with what it offered rather than add more modules. It fits perfectly with the "spartan" nature of the protocol. However, I did rely on the Linux find (or gfind) command to perform my filename lookups to build the safe files list if SAFEFILES is enabled, which only happens at startup and during manual reloads, so it's not pure Lua, but almost.
NodeMCU (which I did not use in this project) supports the Lua 5.1 version that I used (and even coroutines), so maybe Louia could someday run on something as small as the ESP8266 or ESP32 by replacing the LuaSocket code with its "net" module. Since both sides use 3.3 volt logic, it's trivial to solder jumper pins to a common (and usually included) plastic SD card adapter (for MicroSD insertion) and run those jumpers over to the MCU's SPI pins. Even FAT is possible on these skinny systems, but then I'd have to create a way to list those files without Linux find, which I don't feel like doing right now. SPI is slower than SD bus mode, though, and the traffic would also go over Wi-Fi, but for small gemtext pages, it may be feasible. For now, though, it's only working on Linux.
My first attempt was a "blocking" style server. These are nice and work well as long as all clients are on their best behavior. But if someone, say, uses telnet to make a connection, then the whole server waits until they finish typing in their input, blocking other requests. So I put a 3-second timeout on that blocking server to clear the traffic jams, which was fine for most clients, but it prevented anyone from surfing via telnet.
A week later, I had added coroutines for cooperative multithreading, a simple and elegant construct that allowed me to create just one big coroutine function and then use a main loop to spawn or scale back coroutines as needed, which solved the blocking problem. Lua popularized the modern view of the concept (which is much older), and coroutines are now part of many languages, yet most examples I've seen for network use are on the client side (like a smartphone client doing multiple downloads), and far fewer discuss server-side coroutines.
My main issue with the design was that once I turned off the blocking and made it asynchronous, I had to manually manage all of the situations those functions normally handled, which increased the complexity of my code. Since each TCP connection runs in its own coroutine, I had to track when each one was opened and closed, how long it stayed open, and how many were open, while also collecting and filtering any incoming bytes. And once you unblock a function, your CPU usage skyrockets due to rapid polling, so I had to find a creative way of halting the polling loops until something was actually happening, yet not halting them completely, or I could never time out an unresponsive connection. Getting that halting-yet-not-halting behavior right while keeping CPU use as low as possible was difficult.
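The coroutine bookkeeping described in the last two paragraphs can be sketched in plain Lua. This is not Louia's code: sockets are replaced by hypothetical in-memory connection tables so the example runs without LuaSocket, and the clock is passed in so timeouts can be tested. In a real LuaSocket server, one common way to get the "halting-yet-not-halting" behavior is to make client sockets non-blocking with settimeout(0) and, between passes, call socket.select(readers, nil, TIMEOUTCHECKSECS), which sleeps until a socket is readable or the interval expires; that sleep is omitted here.

```lua
-- Hypothetical sketch: one coroutine per connection, a round-robin main loop,
-- a MAXTHREADS cap and TIMEOUTSECS kills. Connections are simulated tables
-- ({incoming = {chunk, ...}}) so this runs in plain Lua without LuaSocket.
local TIMEOUTSECS = 3
local MAXTHREADS = 2

-- Stand-in for a non-blocking read: next chunk, or nil if nothing arrived.
local function try_read(conn)
  return table.remove(conn.incoming, 1)
end

-- Collect bytes until a full line arrives; yield whenever nothing is ready.
local function handler(conn)
  local buf = ""
  while true do
    local chunk = try_read(conn)
    if chunk then
      buf = buf .. chunk
      local line = buf:match("^(.-)\r?\n")
      if line then
        conn.result = line   -- a real server would now parse and respond
        return
      end
    else
      coroutine.yield()      -- let the other coroutines run
    end
  end
end

-- Main loop. `now` is a clock function (os.time in a real server); a real
-- server would also sleep here (e.g. socket.select) instead of spinning.
local function run(pending, now)
  local active, closed = {}, {}
  while #pending > 0 or #active > 0 do
    -- Spawn coroutines for waiting connections, up to the MAXTHREADS cap.
    while #active < MAXTHREADS and #pending > 0 do
      local conn = table.remove(pending, 1)
      conn.co = coroutine.create(handler)
      conn.started = now()
      active[#active + 1] = conn
    end
    -- Walk backwards so removing an entry does not skip the next one.
    for i = #active, 1, -1 do
      local conn = active[i]
      local ok = coroutine.resume(conn.co, conn)
      if not ok or coroutine.status(conn.co) == "dead" then
        table.remove(active, i)
        closed[#closed + 1] = conn
      elseif now() - conn.started > TIMEOUTSECS then
        conn.timedout = true   -- a real server would close the socket here
        table.remove(active, i)
        closed[#closed + 1] = conn
      end
    end
  end
  return closed
end
```

A silent client (like a telnet user who never presses Enter) simply yields on every pass until its age exceeds TIMEOUTSECS, while other connections keep being served.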
It's only a single "thread" as far as the OS is concerned. This is not a problem for me, but one could, in theory, get multithreaded or multiprocessing operation on a modern multi-core CPU by launching multiple instances of the server on different ports and then using a reverse proxy to forward incoming requests (round robin, etc.) to the next available server. I have not yet tried this at the time of this writing, but even some HTTP servers could be set up as a front end to do this sort of TCP-only proxying for load-balancing.
An unresolved problem with my Lua-based code, however, is related to Lua's limited string-processing ability. Since it lacks regex by default, you cannot write a pattern with alternation, like (dog|cat), to match one of several substrings in a single capture, something which is ideal for parsing a URL/URI or Spartan's request strings. But regex code is comparatively huge (which is why it is not in Lua 5.1), so I had to get by with either clever "string.find" matches or additional string.find conditionals that ate up more CPU cycles. This is why I used Perl (with its wonderful string processing) for my HTML static-site generator in 2014, and if one's string functions get unwieldy, Lua loses more of its speed advantage over Perl or Python.
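A short illustration of the limitation: in a Lua pattern, square brackets always mean a character set, so [dog|cat] matches a single character, and emulating alternation takes one string.find per alternative. The helper find_any is hypothetical, for illustration only.

```lua
-- "[dog|cat]" is a SET matching one of: d o g | c a t (not "dog" or "cat").
local s, e = string.find("cow", "[dog|cat]")
-- s == 1, e == 1: it matched the single character "c"

-- Emulating (dog|cat): try each alternative and keep the earliest match.
-- Each alternative costs another pass over the string.
local function find_any(str, alternatives, init)
  local best_s, best_e
  for _, pat in ipairs(alternatives) do
    local s1, e1 = string.find(str, pat, init)
    if s1 and (not best_s or s1 < best_s) then
      best_s, best_e = s1, e1
    end
  end
  return best_s, best_e
end
```

The extra passes are the CPU cost the paragraph refers to: with n alternatives, the string is scanned up to n times instead of once.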
My solution was to use a compromise: since the Spartan request syntax was thankfully so simple, I decided to just write a single string.find command that tries to match the captures, and then I carefully pick through the aftermath.
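Louia's actual pattern is not reproduced on this page, so the following is only a sketch of the "single string.find" approach, assuming the request line has already been read without its trailing CRLF (LuaSocket's receive("*l") strips it). The names parse_request and DOMAIN and the exact pattern are hypothetical; the path it returns would still need the percent-decoding and checks described below.

```lua
-- Hypothetical sketch of the "one string.find" request parser.
-- Spartan request line: host SP path-absolute SP content-length (CRLF removed).
local DOMAIN = "example.com"
local MAXREQBYTES = 1024

local function parse_request(line)
  if #line > MAXREQBYTES then return nil, "too long" end
  -- One find, three captures: host, path (no spaces), decimal length.
  local s, _, host, path, len =
    string.find(line, "^([%w%.%-]+) (/%S*) (%d+)$")
  if not s then return nil, "malformed" end
  -- Only one domain is served, so the host must match exactly.
  if host ~= DOMAIN then return nil, "wrong host" end
  return path, tonumber(len)
end
```

Anchoring with ^ and $ means any extra space or trailing junk makes the whole match fail, so the "aftermath" to pick through is limited to the three captures.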
Input validation on the level of attaining 100% compliance is complex, as you have to look at RFC 3986 to see which symbols are allowed in the request string, then see which of those apply to the Spartan protocol, then determine which should get percent-decoded and which include special characters that need additional escaping in the string-search expressions that I use, and finally match those results to a POSIX Linux filesystem (which is very lax in what it accepts) and block path traversals, etc. Filtering down (blacklisting) is more dangerous than building up (whitelisting), and I do a little of both: input-filtering for the most concerning combinations, and a safe file list, built from prior scans, that blocks any incoming requests not on that approved list.
For the input-filtering, I like simple and spartan and didn't want to rely on many external modules, so I designed a crude sieve and then dump a pile of rocks onto it, so to speak, some natural, some painted. Some rocks will sit on top (causing a coroutine yield and eventual timeout), some will fall through, and some will break into fragments when they fall through. But due to the lack of regex, some of the rocks that manage to fall through into their sorted bins are the wrong rocks, so the server has to check those bins later, too. I combine some rock fragments (concatenation) and scratch off their paint (percent-decoding) before I perform the final analysis to look for actual bad rocks. You don't want to analyze them before the percent-decoding (since you don't know yet what you are dealing with), and you don't want to try to "repair" your rocks by taking fragments that wouldn't normally fit and trying to make a "good" rock (substitution), for there could be a malicious fragment that turns your repair into a monstrous chimera, like pulling out a bad symbol only to realize that it was the only buffer between two periods that prevented a double-dot path traversal sequence... So it is best just to look for the potentially bad sequence and reject outright, like a final quality control check.
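A minimal sketch of that "reject, never repair" final check, assuming it runs on an already percent-decoded path. The function name final_check and the list of bad sequences are illustrative choices, picked so that the results agree with most of the non-strict "gets through but is blocked" and "is possible" examples listed further down; this is not Louia's code.

```lua
-- Hypothetical sketch of a final "reject, never repair" check on an
-- already percent-decoded path. Not Louia's code.
local BAD_SEQUENCES = { "..", "//", "/.", "./", "\\" }

local function final_check(path)
  if path:sub(1, 1) ~= "/" then return false end    -- must be absolute
  for _, seq in ipairs(BAD_SEQUENCES) do
    -- Fourth argument true = plain search, so "." is not a wildcard.
    if path:find(seq, 1, true) then return false end
  end
  -- %c matches control characters (0-31 and 127 in the C locale).
  if path:find("%c") then return false end
  -- Never serve Lua source, e.g. a server file left in the site directory.
  if path:sub(-4) == ".lua" then return false end
  return true
end
```

Nothing is stripped or substituted: a single suspicious sequence anywhere fails the whole request, which avoids the "chimera" problem of a repair creating a new traversal.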
Spartan requests are ASCII, but the protocol seems to allow 8-bit bytes for things like UTF-8 or ISO-8859-1, so I've allowed 8-bit values when doing the percent-decoding.
In RFC3986 the gen-delims are @ ? # / : [ ] and may be un-encoded if used for delimiting purposes. But since this is not the case in Spartan, I do not allow direct ? or # or @ since the client should ignore them. I also do not allow direct : [ ] since Spartan should have no delimiting use for them. The slash / is the only allowed delimiter.
The RFC3986 sub-delims are $ + ! * ' ( ) , & ; = and these may be un-encoded, too. But $ + * ( ) are magic characters in Lua patterns and need % to escape them. (The characters % . - ? [ ] ^ are also Lua magic characters.) The remaining characters ! ' , ; = & do not require % in Lua and can be left as-is, except that I had to use the backslash \ to escape the single quote ' in my Lua pattern string. And everything I just said above is also prone to be intercepted by the Textile and custom markup generator I use to generate this web page. Whew!
There are 4 very safe non-alphanumeric characters: . - _ ~ (the RFC 3986 "unreserved" characters besides letters and digits).
However, in strict mode (more on this below), I exclude all dots except the single one before the file extension, as additional protection against potential path traversal.
The rest of the characters { } | \ " % < > SP ` ^ are unsafe and must arrive as percent-encoded. I also skip all ASCII below decimal 32 (hex 20, the space character), and I skip decimal 127 (hex 7F, DEL). But from the unsafe list, % is used as the percent-decoding delimiter, of course.
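As a sketch of the escaping rules above, here is one way to check a raw, not-yet-decoded path against the characters allowed to arrive unencoded (letters, digits, the unreserved . - _ ~, the sub-delims, and the / separator), plus a check that every % starts a two-digit hex escape. The pattern and the function name raw_path_ok are hypothetical. The pattern string is written in double quotes, so the single quote needs no backslash here.

```lua
-- Hypothetical sketch: check a raw (not yet decoded) path against the
-- characters allowed unencoded. Inside the Lua set, "%" escapes the magic
-- characters - . $ ( ) * + and % itself.
local RAW_OK = "^[%w%-%._~!%$&'%(%)%*%+,;=/%%]+$"

local function raw_path_ok(path)
  if not path:find(RAW_OK) then return false end
  -- Every "%" must begin a two-hex-digit escape such as %20.
  for i = 1, #path do
    -- With an init position, "^" anchors the match at position i.
    if path:sub(i, i) == "%" and not path:find("^%%%x%x", i) then
      return false
    end
  end
  return true
end
```

Characters such as ? # @ : [ ] and the space are simply absent from the set, so they fail the first test without any special-case code.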
Linux/POSIX can accept almost any character in a filename except slash (and NUL), so I also block dots, slash, backslash, and DEL in the URL percent-decoding routine. I also block percent-decoding of control codes, including decimal 27 (ESC), so ANSI CSI commands cannot pass (too risky for me). And I block anything with the .lua extension to prevent attempts to access the server source if the server file is accidentally placed in the site directory.
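A sketch of such a decoding routine, assuming it should refuse the whole request rather than drop or replace a forbidden byte. The function name percent_decode is hypothetical; the refused values are the ones listed in the paragraph above (control codes including ESC, DEL, dot, slash, and backslash), while bytes 128 to 255 are let through to allow 8-bit text. A stray % that is not followed by two hex digits is left untouched, so a raw-character check is still needed before this step.

```lua
-- Hypothetical sketch: percent-decode a path, refusing the whole request if
-- any escape decodes to a control code (0-31, including ESC 27), DEL (127),
-- "." (46), "/" (47) or "\" (92). Bytes 128-255 pass to allow 8-bit text.
local function percent_decode(s)
  local refused = false
  local out = s:gsub("%%(%x%x)", function(hex)
    local b = tonumber(hex, 16)
    if b < 32 or b == 127 or b == 46 or b == 47 or b == 92 then
      refused = true
      return ""   -- value is discarded: the request is rejected below
    end
    return string.char(b)
  end)
  if refused then return nil end   -- reject outright; never "repair"
  return out
end
```

Refusing an encoded dot means %2E%2E can never be smuggled past a ".." check that ran before decoding.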
Multiple domains (virtual hosting) are not allowed, since my initial string.find cannot match additional substrings (as mentioned above, and I don't want to add additional subsearches), but the functional equivalent can be obtained by running multiple instances with a different domain:port and logdir on the command line (which would also allow them to run on different cores of a multicore CPU). Since the server opens files read-only, no locking issues should occur. If the same public port is to be used for multiple domains, port forwarding (or a proxy that routes on the host name in the request line) would then have to be configured in front of them to forward each request to the right instance's port.
If you do all of this filtering carefully, you can minimize the extra code needed to run to finally accept or reject the request. It's such a weird pattern that I haven't even fully outlined it yet; some bad requests will yield, some will quickly reject, and I kind of like this natural space/time division to balance the memory (many yields and extra coroutines) vs. the CPU/network (the rejection and error codes). The sieve is actually more like a living vine, and programmers tend to hate code that grows naturally like a vine, for it's often complex and full of hidden bugs. But the vine works for me and is even fun to explore. Do I trust it? No way! I put a sandbox on that thing because the vine could be a man-eater.
And I also added the SAFEFILES user-configurable variable, that if set to true, doesn't even trust that filtering, but gatekeeps, only allowing entry if there is a match with the pre-loaded list of filenames in the site directory (and sub-directories). There is some overlap, so I'm wasting some cycles here, but I decided it was best to keep it in place as I'm not that experienced with Lua's string interactions and file handling in combination with the Linux filesystem and ASCII control codes and symbols. Because I don't allow the control codes in the filenames in the first place, I don't have to worry too much about creating and searching Lua arrays (what Lua calls "tables") built directly from the Linux find command, but I'm still wary of it.
I used an ipairs (stateless iterator) array match for simplicity, but this is not the fastest form of match in Lua or LuaJIT, and it does require memory to hold the filenames. However, if SAFEFILES is false, checking the filesystem via io to see if a file exists also takes time. I'm not sure which method is faster, as I have not compared them.
Another downside to having this safe index is that it requires updating if new files with new filenames are added (otherwise those files are not available). Since running the Linux find command in the background uses a relatively high amount of resources, too, it was best to allow the site admin to request it when needed, and the simplest way to do that was to just have the admin create a fake page on the site, that if accessed, performs the reload. This RELOADPATH should be something that people accessing the Spartan site are not privy to, like a random number, to reduce excessive CPU and disk usage if they were to click on it. Of course, Spartan is TLS-free, so this path isn't necessarily secure, and the RELOADPATH should therefore only be accessed locally for additional privacy.
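The whitelist and reload described above might look roughly like this. It is a sketch, not Louia's code: the function names are hypothetical, and it stores filenames as keys of a Lua table, which replaces the linear ipairs scan mentioned above with a hash lookup (a swapped-in technique). io.popen runs the command through the shell, which is why both find and sh have to be available. Filenames containing newlines would break the line-based parsing, another reason to keep control codes out of filenames.

```lua
-- Hypothetical sketch of a SAFEFILES-style whitelist (not Louia's code).
-- Quote a path for sh by wrapping it in single quotes and escaping any '.
local function shell_quote(s)
  return "'" .. (s:gsub("'", "'\\''")) .. "'"
end

-- Build a set keyed by request path ("/index.gmi") from full paths
-- ("/site/index.gmi"). `lines` is any iterator function returning strings.
local function build_whitelist(lines, sitedir)
  local safe = {}
  local prefix = sitedir .. "/"
  for line in lines do
    -- Plain prefix comparison with sub(), so no pattern magic is involved.
    if line:sub(1, #prefix) == prefix then
      safe[line:sub(#sitedir + 1)] = true
    end
  end
  return safe
end

-- Reload from disk: io.popen runs the command through /bin/sh.
local function reload_whitelist(sitedir)
  local p = io.popen("find " .. shell_quote(sitedir) .. " -type f")
  if not p then return nil end
  local safe = build_whitelist(p:lines(), sitedir)
  p:close()
  return safe
end

-- Request-time check: one hash lookup instead of scanning every filename.
local function is_allowed(safe, path)
  return safe[path] == true
end
```

A RELOADPATH handler would then just replace the current table with the result of reload_whitelist(sitedir).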
It's the paradox that any type of firewall, spam filter (or even an ancient army) has. You have to lower your shield to communicate, but lowering it means you are vulnerable. Neither can be absolute--this is one reason I'm not a fan of the mandated TLS in Gemini (and wish the creator had veered more toward his "Mercury" specification), for one can't have privacy without civility, as they go hand in hand. We need both.
Here is an example of how some of the most concerning combinations are handled between the strict and non-strict modes:
For both modes:
- // and /. get through but are blocked
- /textreallyreallylong is blocked when it hits MAXPATHLEN and closed
- /text. is possible
- /dir\text.gmi cannot get through (yield)
- /.text and /test/.text get through but are blocked
- ~/ or ./ cannot get through (yield)
- /myserver.lua gets through but is blocked
With STRICTMODE on:
- anything with .. cannot get through (yield) and is also blocked just in case
- /this%20is%20a%20test.gmi cannot get through (yield)
- /./text gets through but is blocked
- /./text.gmi or /text.text.gmi cannot get through (yield)
- /text./text.gmi cannot get through (yield)
- /~ or /~/~/~~~test cannot get through (yield)
- /subdir1/text.gmi is possible
- /subdir1/subdir2/text.gmi cannot get through (yield)
- /text.longext cannot get through (yield)
- /text.bad gets through but is blocked (if "bad" is not in the MIME array)
- uploads only allow alphanumerics and spaces
With STRICTMODE off:
- anything with .. or %2E%2E (or combination) gets through but is blocked
- /this%20is%20a%20test.gmi is possible
- /./text gets through but is blocked
- /./text.gmi gets through but is blocked
- /text./text.gmi gets through but is blocked
- /text.text.gmi is possible
- /~ or /~/~/~~~test are possible
- /subdir1/text.gmi is possible
- /subdir1/subdir2/text.gmi is possible
- /text.longext is possible
- /text.bad is possible and uses the fallback NOMIME MIME type
- uploads allow full 8-bit
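For comparison, the STRICTMODE rules from the settings table (letters, digits, underscore, and hyphen only; at most one subdirectory; one dot before a short extension) can be written as a pair of anchored Lua patterns. This is a hypothetical, idealized filter rather than Louia's sieve: it agrees with most of the STRICTMODE examples above but rejects /text. outright, and it assumes extensions of one to three characters. As in the list, a path like /text.bad passes this stage and must still be refused by the MIME lookup.

```lua
-- Hypothetical, idealized STRICTMODE path filter (not Louia's sieve).
-- Names: letters, digits, "_" and "-"; one dot; 1-3 char extension.
-- %w follows the C locale, which is ASCII-only unless the program changes it.
local NAME = "[%w_%-]+"
local EXT = "%.%w%w?%w?"
local STRICT_PATTERNS = {
  "^/" .. NAME .. EXT .. "$",                -- /file.ext
  "^/" .. NAME .. "/" .. NAME .. EXT .. "$", -- /subdir/file.ext
}

local function strict_ok(path)
  for _, pat in ipairs(STRICT_PATTERNS) do
    if path:find(pat) then return true end
  end
  return false
end
```

Because NAME cannot contain a dot or a slash, sequences like "..", "/./" and a second subdirectory level have no way to match either pattern.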
I also created an UPLOADSENABLED flag to allow me to disable uploads. The Spartan upload markup is unusual, as it is not gemtext but a small addition to it. And the way uploads work is that the request line declares a content length and the upload data itself follows the request line. So until the request line has been parsed, you can't know whether the request arriving is just a short request to access a page or a monster 8-bit binary tsunami that arrives right after your request line. I limit the request and upload sizes, so if either reaches its max size, the connection abruptly ends and the local variable is destroyed.
I say "variable" as I have not yet added a way of saving chunks of uploaded data to disk, so a series of very large uploads would overwhelm the memory of the 512 MB Pi Zero. In fact, I don't use uploads at the time of this writing but just left some code in place that provides the upload in the "upload" variable, which can receive short text strings as input (similar to how HTML form fields are used to send data for a POST request). One would have to modify the code to do something with that variable or add any save-to-disk functionality (similar to the logging code). It could be used to build some sort of MUD or text-based game, for example.
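Since the upload data follows the request line and its size is declared in that line, a server can refuse an oversized upload before reading any of it. A minimal sketch, assuming content_length comes from the parsed request line and read_bytes(n) stands in for a socket read such as LuaSocket's client:receive(n); read_upload and the limit value are hypothetical, and the STRICTMODE rule (letters, digits, and spaces only) follows the settings table.

```lua
-- Hypothetical sketch: read the upload that follows the request line.
local MAXUPLOADBYTES = 64

-- content_length comes from the parsed request line; read_bytes(n) stands in
-- for a socket read of n bytes (e.g. LuaSocket's client:receive(n)).
local function read_upload(content_length, read_bytes, strict)
  if content_length == 0 then return "" end            -- plain page request
  -- Refuse before reading: the size is known from the request line.
  if content_length > MAXUPLOADBYTES then return nil, "too large" end
  local data = read_bytes(content_length)
  if not data or #data ~= content_length then return nil, "short read" end
  -- STRICTMODE: only letters, digits and spaces are accepted.
  if strict and data:find("[^%w ]") then return nil, "bad chars" end
  return data
end
```

Checking the declared length first means an oversized upload costs one comparison instead of filling memory up to the limit.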
But for download-type requests, it uses a chunking system to serve files to the network instead of loading the entire contents of the file into one variable. The exception is the most common type, the .gmi extension files. For .gmi files, I decided to load the entire gemtext page into memory for speed, so those files should be no larger than the available memory allows. If giant gemtext pages are needed, they would have to be given another extension (mapped to the same text/gemini MIME type) so that they are served in chunks.
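The chunked path for non-.gmi files can be sketched as a loop that reads at most DOWNLOADBUFFER bytes at a time and yields between chunks so other connections keep moving. The names stream_file and send are hypothetical, the tiny buffer size is only for the example, and the function must be called from inside a coroutine. With a non-blocking socket, a real send may accept only part of a chunk and need retrying, which this sketch ignores.

```lua
-- Hypothetical sketch: stream a file in DOWNLOADBUFFER-sized chunks,
-- yielding between chunks. `send` stands in for a socket send.
local DOWNLOADBUFFER = 4   -- tiny for the example; real values are larger

local function stream_file(filename, send)
  local f = io.open(filename, "rb")
  if not f then return false end
  while true do
    local chunk = f:read(DOWNLOADBUFFER)   -- at most DOWNLOADBUFFER bytes
    if not chunk then break end            -- nil means end of file
    send(chunk)
    coroutine.yield()                      -- let other connections run
  end
  f:close()
  return true
end
```

Only one buffer-sized string per connection is alive at a time, which is the memory bound the DOWNLOADBUFFER setting describes.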
The most secure way of running Louia (and the most limiting) would be to set UPLOADSENABLED to false, set STRICTMODE and SAFEFILES to true, create a random, alphanumeric RELOADPATH that only the admin knows, and then set all of the MAX values as small as possible for the intended operation. Every file would need an extension of 3 characters or fewer, and every extension would need a MIME type added to the MIME table.
The default MIME table provided is very small, so it will likely need to be expanded by the admin before use.
Known Bugs
There are likely all kinds of errors and bugs in it. I left it sparse and small and kept my conditional statements and extra libraries to a minimum. I did not do any automated testing--it's all manual, ad-hoc. I'm guessing it's pretty fast under LuaJIT based on the structure of my code, but I have not run any benchmarks to confirm. Most of my constructs are not optimized--I know this--but they are currently fast enough for me on the Pi Zero. Lua and LuaJIT are fascinating, and there are all kinds of ways to improve this code and also make it safer without necessarily making it larger. Ideally, I wouldn't even incorporate the Linux find command, but I did want to add that safe files list without too much work, to keep any mistakes in my filtering from allowing path traversal.
The common bugs that I've found (and I'm sure there may still be some present) are:
- When a variable is undefined (nil) and the code tries to concatenate it (even just to print it), which raises an error and crashes
- Weird patterns that slip by my Lua string.find captures for URL filtering, since it is not precise
- And, of course, buffer overflows if I don't limit large values
In Lua, a nil and an empty string "" are not the same thing, and an empty string or the number 0 is considered "true" which is always a source of concern.
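A few lines make the nil and truthiness rules concrete. Only nil and false are false in Lua, and concatenating nil raises an error, which is why a tostring() call or an or "" default is a cheap guard before building log or status lines. The helper names are illustrative.

```lua
-- Only nil and false are false in Lua; "" and 0 are both true.
local function truthy(v)
  if v then return true else return false end
end

-- "x" .. nil raises an error, so guard values before concatenating.
local function status_line(path, status)
  return "status=" .. tostring(status) .. " path=" .. (path or "")
end

-- pcall captures the error instead of letting it crash the program.
local ok, err = pcall(function()
  local missing
  return "path=" .. missing
end)
-- ok == false, and err contains "attempt to concatenate"
```

Inside a coroutine, the same error makes coroutine.resume return false plus the message, so a main loop that checks that result can close just the one connection.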
There are other Spartan servers on the Internet that you can download besides mine that don't use Lua, many of which use built-in or external functions to process URLs if you need better filtering. So, I leave you with my own boilerplate disclaimer:
Disclaimer
Warning, this project is experimental and not recommended for real data or production. Do not use this software (and/or schematic, if applicable) unless you read and understand the code/schematic and know what it is doing! It was created by a human (myself) and not AI, and I made it solely for myself and am only releasing the source code in the hope that it gives people insight into the program structure and is useful in some way. It might not be suitable for you, and I am not responsible for the correctness of the information and do not warrant it in any way. Hopefully you will create a much better system and not use this one.
I run this software because it makes my life simpler and gives me philosophical insights into the world. I can tinker with the system when I need to. It probably won't make your life simpler, because it's not a robust, self-contained package. It's an interrelating system, so there are a lot of pieces that have to be running in just the right way or it will crash or error out.
There are all kinds of bugs in it, but I work around them until I later find time to fix them. Sometimes I never fix them but move on to new projects. When I build things for myself, I create structures that are beautiful to me, but I rarely perfect the details. I tend to build proof-of-concept prototypes, and when I prove that they work and are useful to me, I put them into operation to make my life simpler and show me new things about the world.
I purposely choose to not add complexity to the software but keep the complexity openly exposed in the system. I don't like closed, monolithic systems, I like smaller sets of things that inter-operate. Even a Rube Goldberg machine is easy to understand since the complexities are within plain view.
Minimalism in computing is hard to explain; you walk a fine line between not adding enough and adding too much, but there is a "zone", a small window where the human mind has enough grasp of the unique situation it is in to make a difference to human understanding. When I find these zones, I feel I must act on them, which is one of my motivating factors for taking on any personal project.
Here is an analogy: you can sit on a mountaintop and see how the tiny people below build their cities, but never meet them. You can meet the people close-up in their cities, but not see the significance of what they are building. But there is a middle ground where you can sort of see what they are doing and are close enough to them to see the importance of their journey.
The individual mind is a lens, but, like a single telescope looking at the night sky, we can either see stars that are close or stars that are much farther away, but we can't see all stars at the same time. We have to pick our stars.
I like to think of it like this:
It is not within our power to do everything, but it is within our power to do anything.
Source Code
Source code can be downloaded here, which is licensed under GPLv3. A copy of the GPL license can be found here.