Page Created:    Last Modified: 3/10/2016   Last Generated: 4/18/2018
The computer cluster serving this page consists of 4 Raspberry Pi computers that interoperate:
- Static - A static server running ScratchedInSpace
- Comments - A dynamic comment server running ScratchedInTime
- Cache - A caching server running Memcached
- OswaldBot - An optional XMPP control server.
The system is a Perl-, Python- and Bash-based static wiki generator with a document-oriented, NoSQL-style architecture (tags, backlinks, metadata, search, recursive macros, transclusion, breadcrumbs, a blog, and wiki-like editing), combined with a dynamic commenting system that offers cryptographic IDs, remote monitoring and control, Atom feeds, and a captcha.
It integrates with FastCGI, a reverse proxy, caching, Textile markup, Memcached, Bogofilter Bayesian spam filtering, and XMPP.
ScratchedInTime is the 3rd and largest Perl script I have ever written (ScratchedInSpace being the 2nd, and IdThreePlugin being the first), and I never wanted it to be that large. Where ScratchedInSpace is tiny and focused, ScratchedInTime is large and does a lot of different things. I never wanted to write a CGI (or FastCGI) script, but it was the only way to allow user commenting on my static site without resorting to a huge dynamic system or a 3rd party.
How It All Began
OswaldBot worked so well that I thought, "Hey, maybe I can run Foswiki on these things and run my BashTalkRadio on them." But even with Foswiki page caching and Memcached enabled, and with Nginx in place of Apache, it was still too slow. Perl, the language Foswiki is written in, is fast, but generating every page dynamically was still too much for the 700 MHz ARM processor.
That really bothered me, because it meant I would have to keep my wiki on a larger, more complex and power-hungry server.
So I thought, "I've always wanted to create my own wiki engine. Maybe I can create a NoSQL, document-oriented, structured wiki in ultra-fast C!" I started reacquainting myself with C and then realized that string processing (used heavily in such wikis) is nightmarish in C.
...but then I realized that Perl is ultra-fast at string processing and can sometimes be faster than C (unless you're a C guru), because its string handling and regular-expression engine are already heavily optimized. Hmm, perhaps that is why the Foswiki guys used it? I did some research on Perl and really liked what I read, but then realized I would hit the same performance problem as Foswiki by generating those pages dynamically.
Then I thought: what if I generate the pages on the (more powerful) client, and just put the static, pre-generated versions on the Raspberry Pi? Wow! Surely someone else must have thought of this idea, and sure enough, a lot of Python, Ruby, and Perl programmers had created their own "static site generators" to do that very thing... but did they incorporate macros, recursive search, and NoSQL database features?
Most of them did not. It appears the static site crowd handles this using various templating systems and other methods.
As I delved into this idea further, I realized that creating a static site generator was actually much easier than creating a dynamic, CGI-based wiki. You essentially design a parser that interprets any markup you like (just think up your own) and turns it into HTML. That is what motivated me to put Perl to the test.
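To give a feel for how little is involved, here is a minimal sketch of such a parser. It is written in Python rather than Perl, and the markup is made up for illustration (it is not ScratchedInSpace's actual syntax): a line starting with "! " becomes a heading, a line starting with "- " becomes a list item, blank lines separate paragraphs, and *text* becomes bold. Text is HTML-escaped before any markup is applied, so a stray "<" or "&" can't break the page.

```python
import html
import re

# Hypothetical inline rule: *text* becomes bold.
BOLD = re.compile(r"\*([^*]+)\*")


def inline(text: str) -> str:
    """Escape HTML special characters, then apply the *bold* rule."""
    return BOLD.sub(r"<strong>\1</strong>", html.escape(text))


def to_html(source: str) -> str:
    """Convert a made-up, line-based markup into an HTML fragment."""
    out: list[str] = []
    para: list[str] = []
    in_list = False

    def flush_para() -> None:
        # Emit any pending paragraph lines as a single <p>.
        if para:
            out.append("<p>" + " ".join(para) + "</p>")
            para.clear()

    def close_list() -> None:
        nonlocal in_list
        if in_list:
            out.append("</ul>")
            in_list = False

    for line in source.splitlines():
        line = line.rstrip()
        if line.startswith("! "):  # heading
            flush_para()
            close_list()
            out.append("<h2>" + inline(line[2:]) + "</h2>")
        elif line.startswith("- "):  # list item
            flush_para()
            if not in_list:
                out.append("<ul>")
                in_list = True
            out.append("<li>" + inline(line[2:]) + "</li>")
        elif not line:  # blank line ends the current block
            flush_para()
            close_list()
        else:  # ordinary paragraph text
            close_list()
            para.append(inline(line))

    flush_para()
    close_list()
    return "\n".join(out)
```

Run over every source file on the client, with the output copied to the Pi, this is the whole "static" half of the idea; everything else is extra markup rules.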
I named it ScratchedInSpace, and it worked better than I imagined. Perl subroutines can easily call themselves recursively, so I was able to integrate recursive rendering and macros, and my plugins are simply Perl subroutines whose output is substituted in place of the macro markup. This makes for an immensely powerful wiki structure, essentially a document-oriented database. It is, however, very fragile and not good coding practice. But why do I care? It's just a static site generator running with user permissions on a Linux client PC, not server-side code. So I found that static site generators are really neat things...
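Here is a rough Python sketch of that plugin idea, assuming a hypothetical %NAME{argument}% macro syntax and a made-up INCLUDE plugin for transclusion. Each macro is replaced by the output of the function registered under its name, and that output is rendered again so macros can nest. A depth limit stops an include cycle from recursing forever. Macros inside macro arguments are not supported.

```python
import re
from typing import Callable, Dict

# Hypothetical macro syntax: %NAME% or %NAME{argument}%.
MACRO = re.compile(r"%([A-Z]+)(?:\{([^{}%]*)\})?%")

Plugin = Callable[[str], str]


def render(text: str, plugins: Dict[str, Plugin], depth: int = 0, max_depth: int = 10) -> str:
    """Replace each macro with its plugin's output, rendering that output recursively."""
    if depth > max_depth:
        raise RecursionError("macro expansion too deep (possible include cycle)")

    def expand(match: "re.Match[str]") -> str:
        name, arg = match.group(1), match.group(2) or ""
        plugin = plugins.get(name)
        if plugin is None:
            return match.group(0)  # leave unknown macros untouched
        # A plugin's output may contain more macros, so render it again.
        return render(plugin(arg), plugins, depth + 1, max_depth)

    return MACRO.sub(expand, text)


if __name__ == "__main__":
    pages = {"Footer": "Served by %SERVER%", "Home": "Welcome! %INCLUDE{Footer}%"}
    plugins = {
        "SERVER": lambda _: "Static",                # trivial plugin
        "INCLUDE": lambda name: pages.get(name, ""), # transclusion of another page
    }
    print(render(pages["Home"], plugins))  # Welcome! Served by Static
```

Because a plugin is just a function returning text, adding a feature such as backlinks or search is a matter of registering one more function.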
Adding Minimal Dynamics
When I set out to design it, I decided I would not have a commenting system: from what I learned from the OscarPartySystem, I knew it would be too much of a security risk, too hard to maintain, and too hard to code, and would simply bog down the little Pi. But many people believe that a comments forum is important to a site because it provides valuable input from others.
Most people with static sites handle this via a 3rd-party service such as Disqus. But that seemed to violate the reason for building such a system in the first place, and I didn't want to send my guests to a 3rd party.
So, for isolation and speed, I decided to build my first dynamic CGI program on a separate Raspberry Pi and use a 3rd Pi to run a Memcached server, which I would use as a form of high-level IPC. Memcached turned out to be far more useful than that as the project progressed, enabling Nginx caching, ring buffers, CGI session tokens, and tarpit timing.
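Memcached has no list type, so a ring buffer has to be built from plain keys. One way to do it, sketched in Python under assumptions: a fixed number of slot keys plus one counter key, with Memcached's atomic incr command handing each writer its own slot. FakeMemcache is a hypothetical in-memory stand-in that mimics the get, set, add and incr commands so the example runs without a server; a real client library returns bytes and has its own method signatures. The key names are also made up.

```python
from typing import Dict, List, Optional


class FakeMemcache:
    """Hypothetical in-memory stand-in mimicking Memcached's get/set/add/incr commands."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value

    def add(self, key: str, value: str) -> bool:
        # Like Memcached's add: store only if the key does not exist yet.
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def incr(self, key: str, delta: int = 1) -> Optional[int]:
        # Like Memcached's incr: fails (None here) if the key is missing.
        if key not in self._data:
            return None
        value = int(self._data[key]) + delta
        self._data[key] = str(value)
        return value


class RingBuffer:
    """Fixed-size ring buffer stored as `size` slot keys plus one counter key."""

    def __init__(self, cache: FakeMemcache, name: str, size: int) -> None:
        self.cache, self.name, self.size = cache, name, size
        self.cache.add(f"{name}:head", "0")  # no-op if another process created it

    def push(self, item: str) -> None:
        # On a real server incr is atomic, so concurrent writers get distinct slots.
        n = self.cache.incr(f"{self.name}:head")
        if n is None:
            raise KeyError("ring buffer counter missing (evicted?)")
        self.cache.set(f"{self.name}:{(n - 1) % self.size}", item)

    def recent(self) -> List[str]:
        """Return the most recent items, oldest first."""
        n = int(self.cache.get(f"{self.name}:head") or 0)
        items = [self.cache.get(f"{self.name}:{i % self.size}")
                 for i in range(max(0, n - self.size), n)]
        # Slots can be evicted from a real cache, so skip missing ones.
        return [item for item in items if item is not None]
```

Old entries are simply overwritten, so memory use stays fixed no matter how much traffic arrives, which suits a 512 MB cache box.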
The Difficulties of Spam
Writing a CGI comment page is easy. Keeping people from hacking it into oblivion, enslaving it in their bot army, or tricking you by masquerading as people you know is hard.
Most of my time working on this project was spent on this problem, which I broke down into the following:
- Create a captcha system to weed out the bots
- Validate and restrict the input
- Create a tarpit minefield
- Use Eric Raymond's Bogofilter for Bayesian content filtering
- Create a way for me to monitor and flag spam remotely
- Set size limits on comments pages
- Provide a cryptographic ID to people who want one
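As an illustration of how a few of these defenses can fit together, here is a hedged Python sketch of a one-time form token with a minimum and maximum fill-in time, plus comment and page size limits. All limits and names are hypothetical, and the token store is a plain dictionary where a real deployment would keep it in Memcached. It covers only the timing check; it does not slow responses down the way a true tarpit does.

```python
import secrets
import time
from typing import Callable, Dict, Optional, Tuple

# Hypothetical limits, for illustration only.
MIN_SECONDS = 10          # humans rarely read and type a comment faster than this
MAX_SECONDS = 3600        # stale forms expire
MAX_COMMENT_CHARS = 2000
MAX_PAGE_CHARS = 50000


class FormGuard:
    """Issue one-time form tokens and check submission timing and sizes."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self.clock = clock
        self.issued: Dict[str, float] = {}  # token -> time the form was served

    def issue(self) -> str:
        token = secrets.token_urlsafe(16)
        self.issued[token] = self.clock()
        return token

    def check(self, token: str, comment: str, page_chars: int) -> Tuple[bool, str]:
        # pop() makes every token single-use, even when the submission is rejected.
        started: Optional[float] = self.issued.pop(token, None)
        if started is None:
            return False, "unknown or reused token"
        elapsed = self.clock() - started
        if elapsed < MIN_SECONDS:
            return False, "submitted too quickly"
        if elapsed > MAX_SECONDS:
            return False, "form expired"
        text = comment.strip()
        if not text:
            return False, "empty comment"
        if len(text) > MAX_COMMENT_CHARS:
            return False, "comment too long"
        if page_chars + len(text) > MAX_PAGE_CHARS:
            return False, "comments page is full"
        return True, "ok"
```

Injecting the clock keeps the checks testable; anything that passes would then go on to the captcha and Bogofilter stages.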
I had to create a system that kept resource usage on the Raspberry Pi low and was easy for me to manage. And because I did not incorporate a user account system, if the system did get compromised, nothing of value would be lost (no private data would be exposed).
I eventually got it working, and named it ScratchedInTime.
So, by the time I was done, I had 3 servers (Static, Comments, and Cache), and then there was OswaldBot, dutifully running the whole time, which I had been ignoring while my mind was deep in Perl land.
What if... OswaldBot could send messages to the Comments server? Then I could use my phone to text commands to that server, with OswaldBot acting as the proxy.
So I added this ability to OswaldBot.
All servers run Arch Linux headless, with no GUI. My code is not portable, and some of its peculiarities only work on my OS configuration.
Static is a purely static web server running Nginx for speed and low resource usage; there is no CGI or FastCGI running on it. It also performs 3 additional functions. For static pages (the main site), it uses Nginx's front-side caching, which I directed to a tmpfs ramdisk to minimize reads from disk. Disk access on the Raspberry Pi's SD card is slow, but 512 MB of memory is plenty for the size of my static pages.
For dynamic pages (the comments and blog), it pulls from the Cache server running Memcached, and if Cache goes down, it pulls from Comments directly. This "proxies" my Comments server, keeping it off the Internet for security and speed. I needed to keep CPU load on the dynamic server to a minimum, except where it actually had to do work, such as generating a new comment or blog post.
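The lookup order can be modeled in a few lines of Python. Nginx does this itself, so this is only a sketch of the logic, assuming dynamic pages are stored in Memcached under their URI (a hypothetical key scheme) and that the cache and the Comments origin are reached through the injected functions cache_get and origin_fetch. Assuming ScratchedInTime writes each rendered page into the cache when a comment is posted, most page views never cost the Comments server any CPU.

```python
from typing import Callable, Dict, Optional


def serve(uri: str,
          cache_get: Callable[[str], Optional[str]],
          origin_fetch: Callable[[str], str]) -> str:
    """Return a dynamic page from the cache, falling back to the Comments origin."""
    try:
        page = cache_get(uri)
    except ConnectionError:
        page = None  # Cache server is down: treat it as a miss
    if page is not None:
        return page
    return origin_fetch(uri)  # only now does the Comments server do any work


def publish(uri: str, html: str, cache_set: Callable[[str, str], None]) -> None:
    """Store a freshly rendered page (e.g. after a new comment) under its URI."""
    cache_set(uri, html)


if __name__ == "__main__":
    store: Dict[str, str] = {}
    publish("/comments/Home", "<p>cached</p>", store.__setitem__)
    print(serve("/comments/Home", store.get, lambda u: "<p>from origin</p>"))
```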
Comments is a web server running Lighttpd. I chose Lighttpd because it has built-in support for spawning FastCGI processes, which Nginx lacked. I run FastCGI rather than plain CGI because it keeps the script loaded between requests instead of starting a new process for each one, which is much less of a burden on the CPU.
Cache runs only Memcached, leaving as much of its 512 MB of memory as possible for the cache. Nginx uses it to serve cached pages stored by ScratchedInTime; it serves as a form of high-level IPC between ScratchedInSpace and ScratchedInTime, and between ScratchedInTime and OswaldBot; and ScratchedInTime uses it to store persistent variables for FastCGI access.
If I need to take down the other servers for maintenance, it can act as a temporary web server to notify people that the Static and Comments servers are down. It is as simple as swapping out its SD card (preconfigured for this purpose) and rebooting.