The web wasn't designed for applications. Or at least I don't think anyone who was working on the initial HTML and HTTP implementations imagined it turning out like this. As a result, the standards are constantly being bent to their extremes to allow the luxuries we enjoy when using the web today. But regardless of how much you advance the standards, it won't change the fact that the underlying core principles of the web cause quite a few problems. On the client side we have the problem that CSS, HTML and JS simply were not designed for applications. On the server side we have the problems of persistence and authentication. And that's what I'm going to talk about.
The web is based on a request/response architecture. What this means is that for every change in data, a message has to be sent to the server, which then has to process it and return whatever data is appropriate. In the early days this was the logical way to do it. The web was there to serve files, so you asked the server for a file, and if the server could find it, it would send the data back to you. Once the transfer was done, the client and server parted ways and didn't need to know of each other at all past that point. Simple, direct and effective.
Now as you might imagine, once you try to write applications with this kind of underlying system you run into a problem. How does the server know what kind of data to respond with? Every time the browser sends a request to the server, the server basically has no clue about the client's state at all. With this kind of setup, any kind of action in the application would result in a complete reset, which is unusable. So how do we fix this problem? Well, since we can't keep persistent connections, we need to figure out a way to make the server able to identify the client making the request, so it can serve the appropriate content. In an ideal world, this identification would be completely unique and match only this one client.
In our world however, the client cannot be uniquely identified, and due to the request-based architecture the identity can be stolen, so that other clients can pretend to be someone else. This is quite bad and poses a huge security risk. Nevertheless, we need to deal with it, as HTTP is far too prevalent to be succeeded by anything else any time soon, if ever. Luckily there's something to help along the way a bit: cookies. Cookies are basically additional pieces of data that are sent along with each request to the server. The server can also issue cookies to be set on the client side. Using this, we could for example generate a random, unique string on the server and send that as a cookie. Every request after that will be “signed” with the cookie, so we can identify whom we're dealing with.
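To make this concrete, here's a minimal sketch of issuing and recognizing such a cookie. I'm using Python for illustration since the ideas are language-agnostic (Radiance itself is written in Common Lisp); the function names are my own, not anything from an actual framework.

```python
import secrets

# In-memory session table; a real server would attach user state to each entry.
sessions = {}

def issue_session_cookie():
    """Generate an unpredictable session ID and remember it server-side."""
    session_id = secrets.token_urlsafe(32)  # 256 bits of randomness
    sessions[session_id] = {}               # fresh, empty session state
    # The server would deliver this via a Set-Cookie header, e.g.:
    #   Set-Cookie: session=<session_id>; HttpOnly
    return session_id

def identify(cookie_value):
    """Look up the client's session from the cookie sent with a request."""
    return sessions.get(cookie_value)

sid = issue_session_cookie()
assert identify(sid) is not None       # known client
assert identify("forged-id") is None   # unknown cookie, no session
```

The crucial detail is that the ID comes from a cryptographically secure random source, so an attacker can't simply guess a valid one.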
This kind of thing is called a session. The session contains data related to the client and potentially the currently active set of data for the page the client is working on. Our simple UID (unique ID) approach gets rid of the request problem, but it's easily exploitable. Thanks to the way the web works, attackers can use a variety of techniques (sniffing unencrypted traffic, cross-site scripting, and so on) to obtain another machine's cookies. This would allow them to steal a session. This problem is still unsolved on most websites today, and the only real solution is to send the cookies over SSL (HTTPS). Not all sites can or want to employ SSL though, so it will probably remain a vulnerability for some time.
The next thing we have to worry about is tying the session to a user. User data needs to be saved somewhere so that it can be served again at a later time, without losing it when the session ends. So now we arrive at the next problem: how to identify and authenticate a user. The standard procedure for this is to employ a username and a password that the user can pick for themselves. If those match the account stored in the database, the client is authenticated and we can begin a session. Sadly, the general populace doesn't seem to be suited to the use of passwords, as most of the passwords people use are ludicrously insecure, especially considering the computing power available today.
The first problem that arises when using passwords is that the database containing the passwords to check against might get compromised in some fashion. Since users don't like to think, they're very likely to have used this exact login information on other sites before. The attacker now has access to all those accounts as well, which could include sensitive data. So we need to stop them from gaining access to the passwords. The first step is to store a hash of the password. A hash is the output of a one-way function: an input generates a seemingly random output, but the output cannot be reversed to recover the original input. Hashes at least won't give the attackers immediate access to the passwords. Sadly though, many users will use dictionary words in their passwords, and a dictionary brute-force attack on the hash will quickly discover the original password.
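A dictionary attack on an unsalted hash is almost embarrassingly simple, which is exactly the problem. A sketch in Python, with a toy four-word "dictionary" standing in for the multi-million-entry wordlists attackers actually use:

```python
import hashlib

# Hypothetical leaked database entry: an unsalted MD5 hash of a weak password.
stolen_hash = hashlib.md5(b"sunshine").hexdigest()

# A tiny stand-in for a real dictionary of common passwords.
dictionary = ["password", "letmein", "sunshine", "dragon"]

# Hash every candidate and compare against the stolen hash.
recovered = None
for word in dictionary:
    if hashlib.md5(word.encode()).hexdigest() == stolen_hash:
        recovered = word
        break

print(recovered)  # → sunshine
```

Since the same password always produces the same hash, the attacker can even precompute the hashes for the entire dictionary once and reuse them against every leaked database.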
Salting is the (partial) solution to this problem. A salt is a random string that is attached to the password, which in effect both makes the password longer and makes it unsuitable to be attacked with precomputed, dictionary-based brute force. There are still two issues with this though. The first is that the salt is typically stored alongside the hash, so an attacker who compromises the database gets it too; it defeats precomputed tables, but not a targeted brute force against each account. The other is that even with salting, modern processors (especially GPUs) are so insanely fast that a Radeon 7970 can calculate 8213.6 million candidate hashes per second!
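The effect of a salt is easy to demonstrate: the same weak password stored for two different users produces two completely different hashes, so no single precomputed table covers both. A sketch (SHA-256 here purely for illustration; as discussed below, a plain fast hash is not what you actually want):

```python
import hashlib
import os

def salted_hash(password: str, salt: bytes) -> str:
    # Prepending a random salt makes each stored hash unique per account,
    # so a precomputed dictionary of hashes is useless.
    return hashlib.sha256(salt + password.encode()).hexdigest()

salt_a = os.urandom(16)  # a fresh random salt per user
salt_b = os.urandom(16)

# Two users with the same weak password get different stored hashes:
assert salted_hash("sunshine", salt_a) != salted_hash("sunshine", salt_b)

# Verification still works, because the salt is stored with the hash:
assert salted_hash("sunshine", salt_a) == salted_hash("sunshine", salt_a)
```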
Well, what's the solution to this then? I'll admit, the above is a bit of a trick. I haven't even talked about the different hashing algorithms yet. The statistic from before is for MD5 hashes, an extremely weak form of hashing. So you might be guessing by now what there is to do about this problem: use algorithms that take longer to compute. The currently most suitable algorithms for this are bcrypt and PBKDF2. These algorithms take an additional parameter that specifies how many iterations to perform, and they're deliberately expensive to compute (bcrypt in particular is also hard to accelerate on GPUs). Since the login process can take a bit longer without any worries, you can scale the calculation of the hash rather high, so it becomes infeasible for attackers to brute force it. Of course, this also means that servers and iteration counts need to be kept up to date with current processing speeds to ensure safety. And it is certainly never a guarantee, since it might still be possible for attackers to obtain the salt, figure out the hashing parameters and then crack a few passwords.
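Here's what that looks like in practice, using PBKDF2 from Python's standard library as an illustration (the 200,000-iteration count is an arbitrary example value; the whole point is that you tune it upward as hardware gets faster):

```python
import hashlib
import hmac
import os

def hash_password(password: str, iterations: int = 200_000):
    """Derive a slow, salted hash; store all three values with the account."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, iterations, digest

def verify_password(password: str, salt: bytes, iterations: int,
                    digest: bytes) -> bool:
    """Re-derive the hash with the stored parameters and compare."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(),
                                    salt, iterations)
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(candidate, digest)

salt, iters, digest = hash_password("correct horse battery staple")
assert verify_password("correct horse battery staple", salt, iters, digest)
assert not verify_password("sunshine", salt, iters, digest)
```

Storing the iteration count alongside each hash is what lets you raise the cost later without invalidating existing accounts: old hashes verify with their old count, and you re-hash with the new count on the user's next successful login.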
Long story short: password authentication is really insecure and will always remain a problem and a point of attack. Not only is it an inconvenience for the user (remembering all these dang credentials), it's also a security threat to them. But is there a way to avoid it?
You can't dodge it completely, but you can at least shove the blame and responsibility off to someone else. Large sites often offer a service like OpenID or OAuth that allows you to use their authentication to log your users in. This has the double benefit of reusing existing accounts and that large sites simply have more resources to properly secure your data against attacks, which basically fixes both of the issues in one go. All of this should give you a rather accurate idea of how I'm handling it in TyNETv5. I will still state the more precise procedure for posterity. In Radiance, all major components aside from the core are written as “implementations”. The same goes for users, sessions and authentication. Each of these objects has a standard interface defined that it has to abide by, but is otherwise free to implement the requested functionality as it sees fit.
The standard module to handle all this is called Verify (radiance-mod-verify). Verify itself handles users and sessions, but does not concern itself with authentication directly. It instead offers another interface to define authentication mechanisms. Before going on to mechanisms, I'll explain the idea behind users and sessions. Users are rather direct mappings to the already existing data-model, which is an abstraction of a database interface for a particular record. As such they merely serve as an interface for the data in the database.
While users are db-persistent, sessions are server-persistent. No session data will ever reach the database, and sessions are therefore strictly temporary. A session is identified through a cookie that is built in the following fashion:
TIMESTAMP:RANDOM:SESSION-UUID → [ RSA-ENCRYPT | USER_SECRET ] → USERNAME-ENCRYPTED_PART → [ RSA-ENCRYPT | GLOBAL_SECRET ] → COOKIE
Once a request comes in, the cookie is decrypted and destructured. All the values are compared to the ones stored in the session, and if everything matches, we have authenticated our user! I am no expert on security, but I am hoping that this approach is secure enough for this. Obviously transmitting the cookie over SSL would be the last step, but that's more of a server issue than a programming one.
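The shape of that check can be sketched as follows. Python's standard library has no RSA, so this sketch substitutes a single HMAC tag for the two encryption layers of the scheme above; what it illustrates is the verification shape: decompose the cookie, reject anything tampered with, and compare every field against the server-side session. The `GLOBAL_SECRET` value is a hypothetical stand-in.

```python
import hashlib
import hmac
import secrets
import time
import uuid

GLOBAL_SECRET = b"hypothetical-global-secret"  # stand-in, not a real key

def make_cookie(session):
    payload = f"{session['timestamp']}:{session['random']}:{session['uuid']}"
    # The real scheme encrypts the payload with a per-user secret and then
    # a global secret; here one HMAC tag stands in for both layers.
    tag = hmac.new(GLOBAL_SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}:{tag}"

def verify_cookie(cookie, session):
    payload, _, tag = cookie.rpartition(":")
    expected = hmac.new(GLOBAL_SECRET, payload.encode(),
                        hashlib.sha256).hexdigest()
    if not hmac.compare_digest(tag, expected):
        return False  # forged or tampered-with cookie
    ts, rnd, sid = payload.split(":")
    # Every field must match what the server stored for this session.
    return (ts == str(session["timestamp"])
            and rnd == session["random"]
            and sid == session["uuid"])

session = {"timestamp": int(time.time()),
           "random": secrets.token_hex(8),
           "uuid": str(uuid.uuid4())}
cookie = make_cookie(session)
assert verify_cookie(cookie, session)

# Flipping a single character of the cookie makes verification fail:
tampered = cookie[:-1] + ("0" if cookie[-1] != "0" else "1")
assert not verify_cookie(tampered, session)
```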
Finally, let's get to the mechanisms. Each mechanism has to implement a couple of functions, both for logging in and for registering. Each mechanism is responsible for putting the form data into the page itself and for actually handling the authentication process, or for linking an account with a particular method. I've implemented three standard ways of authentication: password, OpenID and OAuth. Each of these methods is implemented as a module and can be activated or deactivated at will. I didn't build the OpenID and OAuth mechanisms from scratch; luckily I had cl-openid and cl-oauth to do the heavy lifting. For the password hashing and encryption I was lucky to find that Ironclad fulfilled all of my needs.
The biggest issue in all of this was figuring out how to tie it together. Even though I've made it all work, it's still rather shoddy and the current structure doesn't please me at all. I will probably have to refactor it quite a bit still to bring it into an acceptable condition. I was surprised by how tedious and needlessly complicated it was to work out the architecture for this, so I am glad to be able to move on for now.
Next time: More abstract interfaces and core functionality.
Written by shinmera