Building TyNETv5 Pt. 14 - Human Interface

2013.09.01 00:34:32

As promised last time, I took a look at the framework and thought a bit about what I actually wanted to write in the end, versus what I had to write in actuality to achieve the same goal. I didn't think of writing a blog at the time, so I've already moved on a bit since then. I'll try to recount the issues I found, even if it has been a while now.

One of the biggest concerns is the ease of writing a web application. There are multiple parts to this process, each of which has its own set of issues. Ideally all information concerning the web app should only need to be written once, even if it is repeated in the individual parts. For example, when outputting a form you should only have to describe it once, and the framework should be able to handle output, validation and data submission itself. This isn't a trivial task, as there are a lot of different data types, formats and structures that need to be handled properly. Not to mention that data is not necessarily presented to the end-user in the same structure as in the application or the database, either because a different structure is more efficient, because other limitations constrain it to this form, or simply because it needs to be output in different ways.

Templating concerns itself mostly with taking the data from the database or application and transforming it into HTML for the user to see. Here it's important to keep a distinction between data processing and the actual HTML template itself. Ideally the template should be almost completely separate and the application should be able to figure out on its own where to put the data in the template. This allows easy modification of templates by web designers and helps to encapsulate and abstract away the differing processes. Having enough flexibility is key here, as different website designs, while providing the same data in the end, can be structured very differently.

Validation takes a look at the data that the user submits to the server. It checks that everything required is provided and that all of it comes in acceptable formats. This is usually also called “sanitization”. Raw user input is not considered “sane”, as it could contain malicious content that might mess up the system. This is not exclusive to form submission. For example, checking that a requested user profile exists could also be considered part of the validation process. Here the difficulty lies in recognising which types are appropriate, how to check for errors without letting anything slip by, and lastly, how to present the errors to the user in a simple and streamlined fashion.
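Since most of these checks are generic per field type, a single validation function per type already goes a long way. A purely illustrative sketch in Lisp (none of these functions exist in Radiance; the names are made up for the example):

```lisp
;; Purely illustrative sketch; none of this is actual Radiance code.
(defun validate-field (value &key type required)
  "Return NIL if VALUE is acceptable, or a keyword naming the error."
  (cond ((and required (or (null value) (string= value "")))
         :missing)
        ((and (eq type :email) value (not (find #\@ value)))
         :malformed)
        (T NIL)))
```

Collecting such per-field results into one error report would then give the streamlined presentation mentioned above.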

Storage takes user data and manipulates the application's data set with it. This can be anything ranging from changing the session to updating the database. At this point all input is usually assumed to be “sane”. Storage should not have to care about the data contents and only worry about getting it to the proper place or evaluating it accordingly. Most of the time this step is rather straightforward. The main issue is that it is prone to duplication: most of the time the “model” has already been described for the validation and templating steps, or it is extremely similar to other model storage processes.

A rather strict separation of these three components results in the MVC (Model View Controller) pattern. However, all parts have areas in which they overlap with each other. If I could just go out and wish for a feature, it would be to have all of this unified into one declaration that requires no duplicate information and handles almost everything on its own. However, since this is real-world computing, we'll have to make some compromises and settle for getting as close to that ideal as we can.

In Radiance, templating is rather easy, provided you use lQuery and UIBox. UIBox allows you to use a special HTML attribute that designates which part of a block should be filled with what data. UIBox then simply has to be fed the data and it will automatically transform the template into something usable. This is nice because it allows a web designer to make a template that looks like a finished page. He can then simply add the proper tags at the right places and, while a browser will still show the template's standard design when loading the HTML file, Radiance will also be able to transform it into an actual, dynamic page. It isn't at the ideal of “figure out where to put the data yourself” yet, but it comes rather close, I would say.
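To illustrate the idea (the attribute and function names here are assumptions for the example, not the real UIBox interface): the designer's template is plain HTML that previews fine on its own, and the framework only needs to be told what data goes in.

```lisp
;; Hypothetical sketch -- attribute and function names are assumed
;; for illustration, not the actual UIBox API.
;;
;; Template as the designer writes it, previewable in any browser:
;;   <span uibox:fill="username">Example User</span>
;;
;; Feeding UIBox the data could then look like:
(uibox:fill-page "profile.html"
                 :username (user-field user "displayname"))
```

The same file thus serves as both the designer's mockup and the production template.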

The part that is probably most lacking still is validation, as it is a rather tiresome topic to me. There's a couple of ideas I have on how to unify it with storage and simplify the entire process radically. For now though this still needs a lot of work done. Hell, I haven't even got validation functions for most types done yet.

Storage is pretty simple as well, thanks to the data-model interface and the with-model macro. Data models provide a unified and simple interface to the database. To the framework user they appear as a special kind of hash-table: you can simply store and retrieve things on a flat key→value basis. Loading models happens through a unified database query interface, as it would if you were loading raw data. Updating, inserting and deleting data, however, becomes as simple as executing a single function on the model. All database interaction is hidden away from the user, making this a very efficient and simple method. The with-model macro makes this even easier by allowing you to directly bind model fields to variables, similar to the with-accessors macro that CLOS provides.
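As a rough sketch of how this reads in practice (the function names here are approximations for illustration and may not match the actual interface):

```lisp
;; Approximate sketch of the data-model interface; exact names may differ.
(let ((user (model-get-one "users" :query '(= "username" "shinmera"))))
  (with-model (displayname email) user
    ;; Fields are bound like variables, as with with-accessors.
    (setf displayname "Shinmera")
    (model-save user)))  ; a single call persists the change
```

The point is that no SQL or driver-specific code appears anywhere in the module itself.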

In these central areas, it's clear that I still need to work on validation the most. Everything else is looking pretty good in comparison right now. I still need to get used to a more REST-API based design for forms, but even that is simple with the provided Radiance functions.

Aside from these problem areas, there are some more technical issues involved, the biggest of which is caching. Efficient caching is an extremely important factor for web frameworks, as page builds are very expensive. There is only a slim time-frame available during which all data for the request has to be validated, loaded, transformed and templated, and the result sent to the user. If page builds tend to be expensive, it is time to start caching things. However, caching comes with its own set of rather nasty issues. There are four things to consider: What level should it be cached on? Is the cache overhead worth it? How does the cache get renewed? And how easily is this cache implementable while hiding as many of the details from the programmer as possible?

For Radiance I see five levels of caching. Which level to choose depends heavily on how well you can afford data changing behind your back (since everything is asynchronous) and how easy it is to renew the cache.

1) Cross-Server. This is usually where the database and configuration reside. Things cached here are available even after a server has been restarted.

2) Server. On this caching level, everything persists for as long as the server stays online. Sessions, modules, hooks and so on come in here.

3) Session. Values for sessions are active as long as a user is considered “logged in.” This might include things like the current IP, the last visited page and so on.

4) Request. Request-level caching is useful if information needs to be shared between modules or might be re-requested at a later point in the request sequence. For example, all users that are loaded from the database during the page build could be cached here without any risk.

5) Function. This is basically just lexical scope, but it is a “cache” level regardless. Saving on the amount of computation necessary can still be a great performance boost if it happens in the right places.
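Request-level caching (level 4) in particular is easy to sketch with a special variable that gets rebound to a fresh table for every request. The names below are assumptions for illustration, not existing Radiance code:

```lisp
;; Sketch only: *request-cache* would be rebound to a fresh
;; hash-table at the start of every request by the framework.
(defvar *request-cache*)

(defun cached-user (id)
  "Return the user with ID, hitting the database at most once per request."
  (or (gethash id *request-cache*)
      (setf (gethash id *request-cache*)
            (load-user-from-database id))))  ; hypothetical loader
```

Because the table dies with the request, there is no risk of serving stale data to a later request.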

Caching does introduce some overhead. For example, when caching a dynamically built HTML page as a statically rendered file, there's some overhead in saving the file to disk and then loading it from disk again when presenting it to the user. Obviously in this case it is well worth it, since building the raw page would always take longer than directly loading a file from disk. However, would it still make sense to cache all sorts of tiny values that could easily be re-computed later? Managing the cache store for these might end up being more expensive than the calculations themselves.

Renewing the cache is another problem that can be quite tricky. Since the server is very likely to process many requests at once, some of which may very well modify data, it's very easy to catch it in a “bad” state and produce a cached version of a value that immediately becomes outdated or maybe even wrong, depending on how safe your threading is. The other part of this issue is that you need to be able to recognise that the cache is out of date in the first place, which can be very tricky depending on the situation.

Lastly, because caching can be so complicated and difficult, it is all the more important that these nasty bits are hidden from the programmer as much as possible. To the programmer, caching should ideally only involve one operation called “build-cache-or-read-from-cache”. Everything else, from recognising cache age to outputting the data, should be handled by the cache implementation with relatively little intervention from the user. As most of Radiance's modules will for the most part serve dynamic pages, figuring out good page-caching functionality is going to be very important, but also rather difficult, because the pages need to retain their dynamic features.
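Such a single operation could be sketched as a macro along these lines; cache-fresh-p, cache-retrieve and cache-store are assumed helpers for illustration, not existing functions:

```lisp
;; Sketch of "build-cache-or-read-from-cache" as one macro; the three
;; cache-* helpers are assumptions for this example.
(defmacro with-cache (key &body body)
  "Return the value cached under KEY if it is still fresh; otherwise
evaluate BODY, store its result under KEY and return it."
  (let ((k (gensym "KEY")))
    `(let ((,k ,key))
       (if (cache-fresh-p ,k)
           (cache-retrieve ,k)
           (cache-store ,k (progn ,@body))))))
```

A page handler could then wrap its body in (with-cache "frontpage" ...) and let the cache implementation decide whether to rebuild, which keeps all the nasty details in one place.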

I've written quite a bit, it seems. Well, hopefully I can return with good news regarding these issues soon. I'm not sure how development of Radiance is going to go from here on, as University is about to start, but I will write a blog about that before too long.

Written by shinmera