Whenever one designs a framework in which some of the more “core-y” functionality is supposed to be implementable by outside sources, a really difficult problem arises: how do you design the interface? Since this is core functionality, a lot of other “modules” will depend on it, and you can't make them write their functions 20 different ways just to cover all the different versions of one core part. But you also don't want to restrict the interface too much, because that would limit how far the base functions can be extended. It gets even more problematic when this pluggable core system is supposed to be general purpose, so that no strict interface can be defined for each part of the core. So how do we deal with this?
Before I answer that, let me get a few questions out of the way: Why is this an issue now? Wasn't this already solved in v4? What's suddenly different? Well, now that I've been thinking a bit more about the actual structure for v5, I've come to the realization that I want to create a “pluggable implementations system”. In more specific terms, this means that the core will allow special modules to implement parts of the core system itself. These are parts that almost all top-level modules are likely to use, such as users, authentication, session handling, templating, text parsing, etc. Since I've chosen a “top-down” perspective, here's how I arrived at this idea: I want to support systems that don't have a database connection at all, and I want different parsing systems to be easily exchangeable. To do this, I need to get rid of the primary dispatcher and allow core mechanisms (text parsing, for example) to be extended.
Why didn't this problem exist in v4? Well, it did, but I sort of hacked the solution in. V4 has a hooks system, which allows modules to offer certain points in their program flow where other modules can “hook into” and extend functionality. This is fine per se, but it wasn't usable for the text parsing problem, as that goes the other way around (the primary module calls a sub-module). I worked around this with a hack to the loader: I added a special function to parse text, which in turn called a series of “parser hooks” that parser modules could register for. This is not ideal, but it was good enough.
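To illustrate the inversion, the workaround boils down to roughly the following. This is a Python sketch for brevity rather than the actual Lisp code, and `register_parser_hook`, `parse_text` and the toy BBCode-style parsers are hypothetical names of my own, not v4's API:

```python
# Sketch of a v4-style "parser hooks" workaround (hypothetical names).
from typing import Callable, List

ParserHook = Callable[[str], str]

# Registry of parser hooks, run in registration order.
_parser_hooks: List[ParserHook] = []


def register_parser_hook(hook: ParserHook) -> None:
    """Parser modules call this to add themselves to the chain."""
    _parser_hooks.append(hook)


def parse_text(text: str) -> str:
    """The special loader-level function: feed the text through every hook."""
    for hook in _parser_hooks:
        text = hook(text)
    return text


# Two toy "parser modules" registering themselves.
register_parser_hook(lambda s: s.replace("[b]", "<b>").replace("[/b]", "</b>"))
register_parser_hook(lambda s: s.replace("\n", "<br>"))

print(parse_text("[b]hi[/b]\nthere"))  # <b>hi</b><br>there
```

Note how the calling direction is hard-wired into one special function; any other core mechanism needing the same treatment would need its own special function.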
V5 should be better, so we're going to generalise this ugly hack and thus turn it into a feature. Hooray for software development! But first, I should explain the new program flow. Since we cannot have a dispatcher that relies on a database connection, we need a new startup sequence. I've worked out the following: The user requests a web page, and the request is caught by Hunchentoot. Any request that isn't for a file is handled by INIT. INIT has a list of static handlers that act only on subdomains. If no module handler is registered directly for the requested subdomain, INIT makes a call to the implementor to retrieve the dispatcher, then calls the loader to get an instance of it and passes all request variables on to it. The dispatcher is a pluggable core module, so it can be anything. What the default core does, though, is load a collection of URL matchers (belonging to the specific subdomain) from the database. Each matcher is compared against the rest of the URL. If one matches, the module associated with it is retrieved through the loader, and its registered function is called with all the regex groups of the matcher as arguments.
In short: User → Hunchentoot/Webserver → INIT → Loader → Dispatcher → Loader → Module
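The flow above can be sketched in a few lines of Python (used here for brevity; the real thing is Lisp on top of Hunchentoot). Everything is hypothetical: plain dicts stand in for the loader, the implements registry and the database table of URL matchers, the dispatcher is taken straight from the registry rather than instantiated through the loader, and the `"404"` fallback is my own assumption:

```python
# Sketch of the v5 request flow: INIT -> Loader -> Dispatcher -> Loader -> Module.
# All names are hypothetical; plain dicts stand in for the real components.
import re
from typing import Callable, Dict, List, Tuple

# Loader stand-in: module name -> {function name -> callable}.
MODULES: Dict[str, Dict[str, Callable[..., str]]] = {
    "blog": {"show-post": lambda year, slug: f"post {slug} from {year}"},
}

# Stand-in for the database table of URL matchers, grouped by subdomain:
# (regex, module name, function name).
URL_MATCHERS: Dict[str, List[Tuple[str, str, str]]] = {
    "www": [(r"^/blog/(\d{4})/([\w-]+)$", "blog", "show-post")],
}


def load(module: str) -> Dict[str, Callable[..., str]]:
    """Loader: return a module instance by name."""
    return MODULES[module]


def default_dispatcher(subdomain: str, path: str) -> str:
    """Default dispatcher: compare each matcher of the subdomain against the path."""
    for pattern, module, function in URL_MATCHERS.get(subdomain, []):
        match = re.match(pattern, path)
        if match:
            # The regex groups become the arguments of the module's function.
            return load(module)[function](*match.groups())
    return "404"  # assumed fallback when nothing matches


# Implements stand-in: the dispatcher is pluggable, so INIT only looks it up.
IMPLEMENTS: Dict[str, Callable[[str, str], str]] = {"dispatcher": default_dispatcher}

# Static handlers that INIT consults first, keyed by subdomain.
STATIC_HANDLERS: Dict[str, Callable[[str], str]] = {
    "api": lambda path: f"api handled {path}",
}


def init(subdomain: str, path: str) -> str:
    """INIT: static subdomain handlers win; otherwise hand off to the dispatcher."""
    handler = STATIC_HANDLERS.get(subdomain)
    if handler is not None:
        return handler(path)
    return IMPLEMENTS["dispatcher"](subdomain, path)


print(init("www", "/blog/2013/hello-world"))  # post hello-world from 2013
print(init("api", "/users"))                  # api handled /users
```

Because INIT only ever looks the dispatcher up through the registry, swapping in a different dispatcher means changing one entry rather than touching INIT.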
Another nice thing about this is that the dispatcher itself could now be programmed to handle more complex cases, to do prefetching or caching work, or… whatever! Additionally, the database can also be a completely separate pluggable of its own, so we can offer different database types if so desired.
I don't think there's a good solution for the interface problem. You just can't be both dynamic and generalized; the two contradict each other. Instead, I'm going to try to lay out the minimum requirements for the core plugins and thus define a minimum interface that pluggables have to implement. This requires a lot of forethought on my part, and I'm sure it's going to fail at some point, but I still like it better than having completely generalized functions and leaving everything open. Since this smells very much like classes, that's also the way I'm going to go. The plugins system is basically a class database. Modules can register themselves as the implementation of a specific functionality, or they can act as a base class that any implementation has to extend. The core is going to register all the base core classes that need to be extended, to impose this “minimum interface” approach. An additional benefit is that the implements can be used for basically anything by any other module, which automatically gives them an extensions system as well. I really like this side effect, as it offers a potential solution for the plugins problem I had with v4. More on all of this later, as I think about it more and put all the puzzle pieces together.
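A rough Python sketch of such a class database, assuming abstract base classes are a fair stand-in for the “minimum interface” (a Lisp version might use CLOS classes instead). `Implements`, `define`, `implement`, `Dispatcher` and `EchoDispatcher` are illustrative names only:

```python
# Sketch of a "class database" for pluggable core parts (hypothetical names).
from abc import ABC, abstractmethod
from typing import Dict, Type


class Implements:
    """Maps an interface name to its base class and its chosen implementation."""

    def __init__(self) -> None:
        self._bases: Dict[str, Type] = {}
        self._impls: Dict[str, Type] = {}

    def define(self, name: str, base: Type) -> None:
        """The core (or any module) registers a base class that must be extended."""
        self._bases[name] = base

    def implement(self, name: str, impl: Type) -> None:
        """A module registers itself as the implementation of an interface."""
        base = self._bases[name]
        if not issubclass(impl, base):
            raise TypeError(f"{impl.__name__} must extend {base.__name__}")
        self._impls[name] = impl

    def get(self, name: str):
        """Return an instance of whatever is currently plugged in."""
        return self._impls[name]()


class Dispatcher(ABC):
    """Minimum interface every dispatcher has to provide."""

    @abstractmethod
    def dispatch(self, subdomain: str, path: str) -> str: ...


class EchoDispatcher(Dispatcher):
    """Toy implementation that just echoes its input."""

    def dispatch(self, subdomain: str, path: str) -> str:
        return f"{subdomain}:{path}"


registry = Implements()
registry.define("dispatcher", Dispatcher)
registry.implement("dispatcher", EchoDispatcher)
print(registry.get("dispatcher").dispatch("www", "/"))  # www:/
```

Because `Dispatcher.dispatch` is abstract, an implementation that forgets it can't even be instantiated, and `implement` rejects classes that don't extend the registered base, so the minimum interface is enforced at registration time.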
Now that we've got all of this jazz out of the way, it's finally time to take a look at the module base class. Every module will have a bunch of basic properties it should fill out so that it's easily introspectable by outside modules. This increases the possibilities for abstraction, as you can create modules that act on all other modules. This is required if you want to create a generalized search engine, a REST API, or whatever else that isn't supposed to be specific to anything. The first thing necessary for introspection is some information about the module itself: name, author and version fields. These are mostly helpful to detect incompatibilities or to display information in an administration panel or something like that. Next, the module loading process can be simplified considerably by introducing dependencies: any module can have a list of other modules that need to be loaded before it is ready. Our default dispatcher would have the database as a dependency, a module relying on user-specific stuff would depend on sessions, etc. Something important for modules calling others is that they need to know which arguments to pass to which functions and what they can expect in return. For this, a module needs to define a map of methods and arguments. Each method mentioned in this map should be designed to be accessible from the outside (public, as you might say), and each value in this map is a list of the arguments that are expected. Each argument should have a name and a type defined. The last thing I consider important for introspection is the data structure, or in other words, what kind of collections the module uses in the database and how they're usually structured. This is a bit more complex, so I'll explain it in more detail below. One last change from the previous version arises from the fact that PHP always builds up its entire framework on every single call.
Now that we're working with Lisp and Hunchentoot, modules can be persistent, so that gets its own attribute as well. The full set of module attributes is:
- Name
- Version
- Author
- Persistent
- Dependencies (list: symbol)
- Callables (map: Method (symbol) → Arguments (list: symbol, type))
- Collections (list: collection)
In order to allow data introspection (or snooping, if you will), other modules need to know what structure to expect and how to deal with it. As such, any table that a module uses and that should be known from the outside needs to be defined as a collection instance in the collections list. A collection simply contains a list of columns, and each column must define the following attributes: name, type and mode. There could be additional attributes, but I don't see a use for any others right now. Type is simply the expected variable type, and mode is the standard UNIX access mode for the column. The mode is especially important so that sensitive information isn't accidentally revealed to parties who shouldn't know about it. Instead of reinventing the wheel, I simply rely on the default UNIX file modes (rwx-ugw). You can also define a mode on the entire table, although column-specific modes always override that setting. Aside from the columns and the mode, a collection also contains its own name. All of this should be sufficient to allow direct data access from the outside while retaining security.
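Put together, a module's metadata might look something like the following Python sketch. Dataclasses stand in for the Lisp module base class, strings stand in for symbols, and the helper names `load_order` and `readable_columns`, the octal mode encoding and the “user/group/world” labels are my own assumptions. It shows a dependency-respecting load order and how a column mode overrides the table mode when deciding what an outside party may read:

```python
# Module metadata sketch: dataclasses stand in for the Lisp module base class,
# strings stand in for symbols, and modes are encoded as octal ints (assumption).
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Column:
    name: str
    type: str
    mode: Optional[int] = None  # None means "inherit the collection's mode"


@dataclass
class Collection:
    name: str
    columns: List[Column]
    mode: int = 0o644  # table-wide mode; column modes override it


@dataclass
class Module:
    name: str
    version: str
    author: str
    persistent: bool = False
    dependencies: List[str] = field(default_factory=list)
    # Callables: method name -> list of (argument name, type).
    callables: Dict[str, List[Tuple[str, str]]] = field(default_factory=dict)
    collections: List[Collection] = field(default_factory=list)


def load_order(modules: Dict[str, Module]) -> List[str]:
    """Depth-first topological sort: every dependency loads before its dependents."""
    order: List[str] = []
    done: Set[str] = set()
    visiting: Set[str] = set()

    def visit(name: str) -> None:
        if name in done:
            return
        if name in visiting:
            raise ValueError(f"dependency cycle involving {name}")
        visiting.add(name)
        for dep in modules[name].dependencies:
            visit(dep)
        visiting.discard(name)
        done.add(name)
        order.append(name)

    for name in modules:
        visit(name)
    return order


def readable_columns(coll: Collection, who: str) -> List[str]:
    """Names of columns whose read bit is set for 'user', 'group' or 'world'."""
    shift = {"user": 6, "group": 3, "world": 0}[who]
    return [c.name for c in coll.columns
            if ((coll.mode if c.mode is None else c.mode) >> shift) & 0o4]


users = Collection("users", [Column("name", "string"),
                             Column("password-hash", "string", mode=0o600)])
modules = {
    "dispatcher": Module("dispatcher", "1.0", "example", dependencies=["database"]),
    "database": Module("database", "1.0", "example", persistent=True),
    "sessions": Module("sessions", "1.0", "example", dependencies=["database"]),
    "profiles": Module("profiles", "0.1", "example",
                       dependencies=["sessions", "dispatcher"],
                       callables={"get-profile": [("username", "string")]},
                       collections=[users]),
}

print(load_order(modules))               # ['database', 'dispatcher', 'sessions', 'profiles']
print(readable_columns(users, "world"))  # ['name']
```

Keeping the mode check next to the metadata means a generic module, say a REST API, could expose any collection without knowing anything about it beyond what the metadata reports.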
This all adds a bit of tedium to creating modules, as changing the data representation will often also require updating the collection in the module metadata. However, it actually makes it possible for outside modules to reuse data, which was practically impossible before. I'm also quite happy with what I have so far for the new startup sequence and the implements, but they still need some more thought. Next time: more in-depth thoughts and refinements.
Written by shinmera