Functional Extensive Markup

2012.04.13 16:44:40

In the modern Web 2.0, user interaction is everywhere. One of the most important things a user can do is contribute content, and most of the time this content comes in the form of text. However, there are ways to enhance the plain text experience: make the text more appealing, make it carry more information and just generally make it easier to read.

The problem comes pretty much immediately. How do you let the user enhance his text? The most direct approach would be to simply let the user input HTML: the website is going to be HTML as well, so there's no need to parse it. However, HTML has some glaring issues. The first is that you usually don't want the user to have unlimited control over what he can do. After all, he could easily screw up the entire website layout with some messy HTML code, so you need to sanitize it. HTML sanitization is a very messy topic and really not an easy task. There are libraries for that though, so you could simply use one of those and get sane and safe code delivered pronto. However, sanitization isn't the only problem HTML input faces. The second is that HTML is not the easiest thing to learn. This may sound a bit ridiculous (it does to me), but the first principle of web design, "Don't Make Me Think", is a heavy problem here, since the user has to learn HTML first. The third problem is that HTML is, simply, terrible to read. It doesn't look nice and the forest of tags can quickly become very frustrating to look at.

So HTML might not be the best choice after all. What else do we have? Many forums and message boards use BBCode, which is very similar to HTML, but offers a more streamlined, simple and sane approach. BBCode can be learned pretty quickly and has tags that make sense to the user. Parsing is pretty easy as well, since you can use simple regular expressions to do most of the work (this approach has its flaws, but it holds water for most things). The problem of sanitization is pretty much eliminated altogether, since you select a specific few tags to parse and make sure you parse them correctly. The remaining problem, while not as bad as with HTML, is the "weight" of the markup. It just isn't easy on the eyes.
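To make the regex approach a bit more concrete, here is a minimal sketch in Python. The tag table and the function name `bbcode_to_html` are made up for illustration and don't come from any particular forum software. Everything is HTML-escaped first, then only the whitelisted tags are turned back into markup.

```python
import html
import re

# Hypothetical whitelist: BBCode tag name -> HTML element name.
SIMPLE_TAGS = {"b": "strong", "i": "em", "u": "u"}


def bbcode_to_html(text):
    """Convert a few whitelisted BBCode tags to HTML with plain regexes."""
    # Escape first, so any HTML the user typed is displayed, not interpreted.
    text = html.escape(text, quote=False)
    for bb_name, html_name in SIMPLE_TAGS.items():
        # Non-greedy match of [tag]...[/tag], allowed to span several lines.
        pattern = re.compile(
            r"\[%s\](.*?)\[/%s\]" % (bb_name, bb_name),
            re.DOTALL | re.IGNORECASE,
        )
        text = pattern.sub(r"<%s>\1</%s>" % (html_name, html_name), text)
    return text


print(bbcode_to_html("[b]bold[/b] and [i]italic[/i]"))
```

The flaws mentioned above show up quickly: nesting the same tag inside itself, or closing tags in the wrong order, is not handled properly by a pattern like this.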

Alright, BBCode is better, but still suffers from some of the same flaws as HTML. There's got to be another way though, right? And there is. The next widely used approach is Markdown. Markdown is very different from the previous two and aims for more readability. It has some pretty handy features and is very easy on the eyes, but you really need to learn which characters do what and, well, how to get around in the text. Luckily for Markdown, many of the formatting characters are intuitive, but it's still a hassle to get used to, especially when you're used to tagged markup. So Markdown trades ease of learning for simplicity, and it does a very good job at it.

However nice Markdown is, it doesn't appeal to me, for the simple reason that I still often find myself confused about what is what and, when in doubt, have to look up the syntax instead of getting a simple clue from a tag about what it does with my text. Markdown also isn't as easy to extend, since the number of choices you have is very limited. In BBCode, for example, you can simply add more tags at will, specify an HTML equivalent and voilà, you've got more markup choices. Of course, I wager there are more markup systems out there that I'm not aware of. If you know one, tell me! I'd be interested to know!

Since none of these choices are very appealing to me, I decided to make my own markup language, FEM. It is modeled on ALGOL-style function calls. That is to say, it kinda looks like code in a way, which of course makes it very appealing to me as a programmer (though I can't shake the feeling that this markup language must have been made already in some fashion). So, what are the ups and downs of FEM? FEM takes the concept of BBCode markup and strips off some of its heaviness, as well as some other slight nuisances. Let's take a look by example. Here's a standard HTML formatted text:

So Lemme Tell You Something
This is a load of bollocks especially since you have absolutely
NO idea what you're talking about. Seriously, just shut the fuck up.
I am telling you for the last time. SHUT THE FUCK UP, SHINMERA, YOU DON'T KNOW ANYTHING!!1!1

And the equivalent in (possible) FEMarkup (which is parsed to the exact same thing as above):

u{So Lemme Tell You Something}
img(left){http://stevenarch.tymoon.eu/fab/thumb/130963228092s.png}
p{
  This is a load of b#bollocks i#especially since you have absolutely
  b#NO idea what you're talking about. Seriously, just shut the fuck up.
  I am telling you for the last time. **SHUT THE FUCK UP, SHINMERA, YOU DON'T KNOW ANYTHING!!1!1**
}

I'm saying "possible" here since you can define the tags however you want. So what's the difference? The first thing to mention is simply that the text is enclosed in curly brackets rather than tags. This also eliminates the closing tags. There really is just no need for them anyway. In fact, specific closing tags only add problems, like so:

<strong> <em> This is invalid! </strong> </em>

The only benefit they have is a little more clarity in marking your blocks, but honestly, that's hardly an argument as long as proper indentation is used.

The next thing to note is the way arguments for the tags are specified: simply by including them in a list inside regular brackets (parentheses) after the tag. So the equivalent of the HTML

<a href="http://lol.com">Whatever</a>

is simply (for example) this:

[Whatever](http://lol.com)

There's also support for multiple arguments, for a hypothetical function like this:

quote(username,postID){Quotetext}
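To give a rough idea of how markup of the form tag(arguments){content} can be turned into HTML, here is a small recursive sketch in Python. This is not the actual FEM parser: the tag table, the `a` tag for links, and the names `parse_block` and `fem_to_html` are all assumptions made for illustration. The `b#word` shorthand and escaping are not handled.

```python
import html
import re


def render_link(args, body):
    # Only plain http(s) targets become links; anything else keeps just the text.
    if args and args[0].startswith(("http://", "https://")):
        return f'<a href="{args[0]}">{body}</a>'
    return body


# Hypothetical tag table: tag name -> function(args, inner_html) -> HTML.
TAGS = {
    "b": lambda args, body: f"<strong>{body}</strong>",
    "i": lambda args, body: f"<em>{body}</em>",
    "u": lambda args, body: f"<u>{body}</u>",
    "p": lambda args, body: f"<p>{body}</p>",
    "a": render_link,
}

# A tag opener: a name, an optional (comma,separated) argument list, then "{".
TAG_START = re.compile(r"([A-Za-z]+)(?:\(([^(){}]*)\))?\{")
WORD = re.compile(r"[A-Za-z]+")


def parse_block(text, pos=0, nested=False):
    """Return (html, next_pos); a nested call stops at its closing '}'."""
    out = []
    while pos < len(text):
        if nested and text[pos] == "}":
            return "".join(out), pos + 1
        tag = TAG_START.match(text, pos)
        if tag and tag.group(1) in TAGS:
            raw_args = tag.group(2)
            args = [html.escape(a.strip()) for a in raw_args.split(",")] if raw_args else []
            # The content of the block is parsed recursively, so tags can nest.
            body, pos = parse_block(text, tag.end(), nested=True)
            out.append(TAGS[tag.group(1)](args, body))
            continue
        # Not a known tag: copy a whole word (or one character) as escaped text.
        word = WORD.match(text, pos)
        end = word.end() if word else pos + 1
        out.append(html.escape(text[pos:end]))
        pos = end
    return "".join(out), pos


def fem_to_html(text):
    return parse_block(text)[0]


print(fem_to_html("u{Title} p{some b{bold} text}"))
print(fem_to_html("a(http://lol.com){Whatever}"))
```

Since a block always ends at its own closing bracket, the mis-nesting problem from the strong/em example simply cannot be written down. Unknown tags such as quote(...) are left in the output as plain text in this sketch.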

There's also a way to escape formatting (which isn't really possible in HTML, unless you replace every occurrence of < and > so they don't get parsed):

!{This is how you can format things: _Makes me italic_ and **Makes me bold** try it.!}
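For comparison, the HTML workaround of replacing every < and > looks roughly like this in Python, using the standard library's `html.escape`. The function name `show_markup_literally` is made up for this sketch.

```python
import html


def show_markup_literally(text):
    """Replace &, < and > with entities so tags are displayed, not parsed."""
    return html.escape(text, quote=False)


print(show_markup_literally("<b>not bold</b> & more"))
```

Note that the ampersand has to be replaced too, otherwise existing entities in the text would get mangled on display.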

So yeah, it's a pretty neat system and I like it quite a lot. I don't know how easy it is for other people to learn and read, but I hope it's not much more difficult for them than it is for me. Of course, this isn't as clean as Markdown, but it's still quite a step forward from HTML and BBCode. FEM will be the default markup language for TyNETv4. I have already implemented a parser for it.

One thing I haven't mentioned so far is WYSIWYG editors, which are becoming more and more present throughout the web. I didn't mention them because they still use a markup language in the background, and because I personally hate WYSIWYG editors. My hate mainly comes from the fact that they're often less WYSIWYG than you'd like, and I simply prefer to see code and type how I want the text to be instead of clicking around. Of course, for regular users a WYSIWYG editor is probably the best choice. I might make one for FEM if I get really, really bored some day, but I doubt it.

Written by shinmera