CL Symbols and Packages - Confession 8

2014.07.09 17:54:03

While Practical Common Lisp is a superb primer on CL, I feel it doesn't cover packages and symbols early or deeply enough; I still had to learn a lot about them afterwards. This topic, along with ASDF/Quicklisp and Slime/Swank, is something I'll talk about in a few blog entries in the hope that they are useful to other people learning CL.

One of the big things to wrap your head around when you come from other languages is the concept of symbols. A symbol is one of CL's main data types and one that doesn't exist in most other languages. Symbols also make up most of the source code and are what allows the seamless transformation of code through macros. Looking at the HyperSpec entry reveals that symbols have a handful of properties, most notably a name, a package, a function and a value.
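To make those properties a bit more tangible, here is a minimal sketch that collects them for a given symbol. SYMBOL-SUMMARY is a name I made up for this illustration; the accessors it calls (SYMBOL-NAME, SYMBOL-PACKAGE, BOUNDP, FBOUNDP) are standard.

```lisp
;; SYMBOL-SUMMARY is a hypothetical helper, not part of CL.
(defun symbol-summary (symbol)
  "Return a property list describing SYMBOL's name, home package and cells."
  (let ((package (symbol-package symbol)))
    (list :name (symbol-name symbol)
          ;; Uninterned symbols have no home package, hence the AND.
          :package (and package (package-name package))
          ;; BOUNDP and FBOUNDP return generalized booleans; normalize to T/NIL.
          :has-value (and (boundp symbol) t)
          :has-function (and (fboundp symbol) t))))

(symbol-summary 'car)
;; (:NAME "CAR" :PACKAGE "COMMON-LISP" :HAS-VALUE NIL :HAS-FUNCTION T)
```

CAR names a function but no variable, which is why only one of the two flags comes back true.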

When the CL reader reads a character sequence that isn't specially handled, like numbers, strings, and so forth, it reads it as a symbol. To put this in more digestible terms, when you evaluate (READ-FROM-STRING "foo") it returns the interned symbol FOO. The same happens when a CL source file is first read; it is transformed into lists (cons cells), symbols and other primitive types. A symbol can be named anything at all, though you may have to surround the name with vertical bars (|) since the reader would otherwise treat certain characters differently. Symbol names are also case-sensitive, but by default the reader converts unescaped characters to uppercase.
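A few forms to try at the REPL show this behaviour. This is only a sketch and assumes the standard readtable, where the case mode is :UPCASE.

```lisp
;; The reader upcases unescaped characters, so both spellings give one symbol.
(symbol-name (read-from-string "foo"))                   ; "FOO"
(eq (read-from-string "foo") (read-from-string "FOO"))   ; T

;; Vertical bars preserve case and otherwise special characters.
(symbol-name (read-from-string "|foo bar|"))             ; "foo bar"
(eq (read-from-string "foo") (read-from-string "|foo|")) ; NIL

;; Source code is read into lists of symbols and other objects.
(read-from-string "(defun f (x) x)")                     ; (DEFUN F (X) X)
```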

So symbols are a special type of data used to name variables and functions in your source code. Since a symbol identifies a variable, it can carry a value. This value can be retrieved through SYMBOL-VALUE or, of course, by simply writing the symbol unquoted in the source code (SYMBOL-VALUE only sees global and dynamic values, not lexical bindings). CL, unlike some other Lisps, differentiates between values and functions and thus allows you to bind a function and a value to the same symbol at the same time. The function a symbol is bound to can be retrieved with SYMBOL-FUNCTION, with the special operator FUNCTION, or through the reader macro #'.
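As a small sketch of the two namespaces: *SCALE*, SCALE and BOTH-NAMESPACES are names I invented for the example, everything else is standard CL.

```lisp
(defvar *scale* 10)                 ; hypothetical special variable

(defun scale (x)                    ; hypothetical function
  (* x *scale*))

;; Two ways to get at the value:
(symbol-value '*scale*)             ; 10
*scale*                             ; 10

;; Three ways to get at the function object:
(symbol-function 'scale)
(function scale)
#'scale
(funcall #'scale 2)                 ; 20

;; One symbol used as a variable and as a function in the same form.
(defun both-namespaces ()
  (let ((list '(1 2 3)))            ; LIST as a lexical variable
    (list list)))                   ; LIST the function applied to LIST the variable

(both-namespaces)                   ; ((1 2 3))
```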

Usually when a symbol is read it is INTERNed into the current package. This sets the symbol's package property and registers it with the package, but you can also make symbols that don't belong to any package by using #:. Packages are a rather simple form of namespaces. There is no package hierarchy as it exists in many other languages, nor are there any other complex relations. Packages merely possess a registry of symbols and a status of whether a given symbol is external, internal or inherited. When a symbol is inherited, it means it is accessible in the package because the package USEs another package that exports it, rather than being present in the package itself. If a symbol is external, it means it's intended for anyone to use when they want the functionality the package offers. Such symbols may be either imported or accessed with package:symbol. When a symbol is internal, it resides in the package but is not meant to be used from the outside. It is however still possible to access it by using package::symbol.
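The three statuses can be inspected with FIND-SYMBOL, which returns the status as its second value. In this sketch the package CONFESSION-DEMO and its functions GREET and HELPER are hypothetical names made up for the illustration.

```lisp
(defpackage #:confession-demo       ; hypothetical package
  (:use #:cl)
  (:export #:greet))

(in-package #:confession-demo)

(defun helper () "hi")              ; internal: not exported
(defun greet () (helper))           ; external: listed under :EXPORT

(in-package #:cl-user)

;; The second return value of FIND-SYMBOL is the symbol's status.
(find-symbol "GREET" :confession-demo)    ; second value :EXTERNAL
(find-symbol "HELPER" :confession-demo)   ; second value :INTERNAL
(find-symbol "CAR" :confession-demo)      ; second value :INHERITED (via :USE #:CL)

(confession-demo:greet)             ; "hi", single colon for external symbols
(confession-demo::helper)           ; "hi", double colon reaches internal ones

(symbol-package '#:nowhere)         ; NIL, #: symbols have no package
```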

Packages therefore offer a way to group and separate symbols. Since symbols carry functions and values, we can gain access to functions and variables that are defined in other packages. One of my worries when I figured this out was that when I :USE a package, all the symbols it exports become accessible in my package, so I'm basically binding my variables to ‘their’ symbols. This could be problematic if the other package defined, say, special variables on regular symbols, since I might accidentally muck things up. However, this is a non-issue in practice. First, special variables should always wear earmuffs (*like-this*), so that they can be easily identified as such. Second, local variables should always be established with LET, so you wouldn't accidentally overwrite the global value of their special in the first place. It could still be troublesome, though, since the LET would create a dynamic binding rather than a lexical one. One legitimate problem is that you might want to give one of your functions the same name as one of theirs, which would lead to a conflict. To avoid this you can SHADOW the symbol, which creates a new symbol of the same name in your package. Of course, this means that you have to use the full package:symbol name whenever you want to use the other package's function.
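Both points can be sketched in a few lines. I use the COMMON-LISP package as the "other" package here because it is always available; SHADOW-DEMO, *LEVEL*, CURRENT-LEVEL and LEVEL-INSIDE-LET are hypothetical names for the example.

```lisp
(defpackage #:shadow-demo           ; hypothetical package
  (:use #:cl)
  (:shadow #:position))             ; we want our own POSITION, not CL's

(in-package #:shadow-demo)

;; LET on a special variable creates a dynamic binding, visible in callees.
(defvar *level* 0)

(defun current-level () *level*)

(defun level-inside-let ()
  (let ((*level* 1))
    (current-level)))

(level-inside-let)                  ; 1, the callee sees the LET binding
(current-level)                     ; 0, the global value is untouched

;; Thanks to :SHADOW, POSITION here is a fresh symbol owned by SHADOW-DEMO.
(defun position (x y)
  (list :x x :y y))

(position 1 2)                      ; (:X 1 :Y 2)
(cl:position #\b "abc")             ; 1, the original needs its package prefix
(eq 'position 'cl:position)         ; NIL

(in-package #:cl-user)
```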

Now you may be wondering: if the primary purpose of symbols is to give access to functions and values, what is the point of uninterned symbols (ones without a package)? They are useful precisely because they don't belong anywhere and thus can't interfere with any other symbol. The main use of such symbols is either when only the symbol name matters (such as in DEFPACKAGE, to avoid needless interning) or in macros where you need to expand to variables that hold values or functions but should not come into contact with the user of the macro. In the latter case you should use GENSYM to make absolutely sure.
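Here is a minimal sketch of both uses. REPEAT-TIMES is a hypothetical macro written only for this example; its loop counter is a GENSYM so it cannot collide with a variable in the user's body.

```lisp
;; Every #: read produces a fresh symbol without a home package.
(symbol-package '#:foo)             ; NIL
(eq '#:foo '#:foo)                  ; NIL, two different symbols
(string= '#:foo 'foo)               ; T, only the names are compared

;; REPEAT-TIMES is a hypothetical macro for illustration.
(defmacro repeat-times (n &body body)
  "Evaluate BODY N times without exposing the loop counter to BODY."
  (let ((counter (gensym "COUNTER")))
    `(dotimes (,counter ,n)
       ,@body)))

;; The user's COUNTER is untouched by the macro's internal counter.
(let ((counter 0))
  (repeat-times 3 (incf counter))
  counter)                          ; 3
```

Had the macro used a plain symbol named COUNTER for the loop variable, the INCF in the body would have hit the loop variable instead of the user's own.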

Another thing you may be a bit iffy about is that internal symbols are still accessible, especially when coming from a language like Java with private fields. However, this can actually be used to great advantage and is something I've come to love about CL. For one, it allows you to override functions from other packages. That makes it possible to write separate projects that act as very complex extensions and need to change some internals of other packages to achieve their goal. You could also temporarily work around a problem in another package by patching it yourself, without needing to modify its source code. So yes, this does give you the ability to seriously screw over other parts of a program, but you cannot do so by accident (due to the distinction between internal and external), and having the ability when you need it is a great benefit.
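A rough sketch of such a patch follows. VENDOR-LIB stands in for somebody else's library; it and its functions are entirely hypothetical. The NOTINLINE declaration is my addition to make sure the call really goes through the symbol's current function; whether a real library needs it depends on how it was compiled.

```lisp
;; Pretend this part lives in someone else's project.
(defpackage #:vendor-lib            ; hypothetical library package
  (:use #:cl)
  (:export #:describe-total))

(in-package #:vendor-lib)

;; Assumption for the demo: keep the compiler from inlining the call below.
(declaim (notinline format-total))

(defun format-total (n)             ; internal helper
  (format nil "total=~a" n))

(defun describe-total (n)           ; exported entry point
  (format-total n))

(in-package #:cl-user)

(vendor-lib:describe-total 5)       ; "total=5"

;; Our own project replaces the internal helper via the double colon.
;; Some implementations emit a redefinition warning here.
(defun vendor-lib::format-total (n)
  (format nil "Total: ~d" n))

(vendor-lib:describe-total 5)       ; "Total: 5"
```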

To close off: Symbols are one of the more distinctive aspects of Lisp, and they offer an ingenious way to create a uniform representation of code syntax. Understanding how they work gives a good amount of insight into why CL is able to do the things it does.

Written by shinmera