Common Lisp semantics

Semantics of function calls

Like most other programming languages, Common Lisp uses call-by-value semantics. This means that the arguments are evaluated first, and the function is then applied to the values obtained by evaluating those arguments. So for instance, in the following function call:

(+ (* x 2) (/ y 3))

the evaluator first determines that + designates a function (namely the function for addition), so that function-call semantics should be used. It therefore recursively evaluates the argument forms, namely (* x 2) and (/ y 3). The standard requires evaluation to be from left to right, so that (* x 2) is evaluated before (/ y 3). In this case, this order does not matter, but if any of the argument forms have side effects, then the order may matter to the result.
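To make the left-to-right rule visible, here is a minimal sketch in which each argument form has a side effect. The names *evaluation-log* and logged are hypothetical helpers introduced only for this illustration; they are not part of the standard or of the text above.

```lisp
;; Hypothetical helper state: records the order in which argument
;; forms are evaluated.
(defparameter *evaluation-log* '())

(defun logged (label value)
  "Record LABEL in *EVALUATION-LOG* as a side effect, then return VALUE."
  (push label *evaluation-log*)
  value)

(defparameter *sum*
  (let ((x 3) (y 9))
    (+ (logged :first (* x 2))
       (logged :second (/ y 3)))))

;; *sum* is 9, and (reverse *evaluation-log*) is (:FIRST :SECOND),
;; showing that (* x 2) was evaluated before (/ y 3).
```

The sum is the same whatever the order, but the log would differ if the arguments were evaluated from right to left.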

So if we assume that the initial form is evaluated in an environment where x has the value 3 and y has the value 9, then the value of the form (* x 2) is 6, and the value of the form (/ y 3) is 3. The function named + is therefore applied to the two values 6 and 3. Note that the code of the addition function cannot determine how the values of its arguments were obtained. In this case, they were obtained from the evaluation of two compound forms, but they could have been obtained from the evaluation of literal numbers, or from the evaluation of variables.

In fact, both the argument forms show this possibility. The form (* x 2) is also a function call, so the evaluator first determines that * designates a function (namely the function for multiplication), so that function-call semantics should be used. It therefore recursively evaluates the arguments x and 2 from left to right. The variable x is evaluated by looking up its value; in this case 3. The literal 2 is self-evaluating, so its value is 2. The multiplication function is then applied to the values 3 and 2. Again, the code of the multiplication function cannot determine how these values were obtained.
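The point that a function sees only values, never the forms that produced them, can be sketched as follows. The function add-two and the variable *results* are hypothetical names introduced for this illustration.

```lisp
;; ADD-TWO is a hypothetical function introduced only for this sketch.
(defun add-two (a b)
  "Return the sum of A and B; it sees only values, never argument forms."
  (+ a b))

(defparameter *results*
  (let ((x 3) (y 9))
    (list (add-two (* x 2) (/ y 3))      ; values from compound forms
          (add-two 6 3)                  ; values from literals
          (let ((p 6) (q 3))
            (add-two p q)))))            ; values from variables

;; *results* is (9 9 9): the three calls are indistinguishable to ADD-TWO.
```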

Common Lisp objects

Semantically speaking, every Common Lisp object is manipulated through a reference (or a pointer). The object itself should be thought of as existing in an area called the heap. Common Lisp variables never contain the objects themselves, but only the reference. The following figure illustrates this idea:

Here, the values of the variables x and y are 3 and 9 as in our previous example. We have included a variable named z, the value of which is the list (9 3). For an explanation of how Common Lisp lists are represented, see ... In the Common Lisp standard, the description of a function such as + says that it takes numbers as arguments and that it returns a number. It would have been more accurate for it to say that it takes references to numbers as arguments, and that it returns a reference to a number. But since every such description would always contain the phrase "a reference to a", this phrase is always omitted in the standard. It is good to keep in mind that this is always what is meant, though. ¹

We use the phrase uniform reference semantics to refer to this style of semantics, i.e., that every object should be thought of as being manipulated through a reference. To relate to the previous section, Common Lisp function calls use call-by-value, but the values are references. Descriptions of some programming languages use the phrase "call-by-value" incorrectly. They use the word "value" to distinguish a datum from a "reference", but this usage has no support in the computer-science literature. A "value" is just the result of evaluating a form, and the term says nothing about the nature of that result.
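A small sketch of what uniform reference semantics implies for variables, using the standard functions eq (same object) and equal (same structure). The variable names *a*, *b* and *c* are hypothetical, chosen for this illustration only.

```lisp
(defparameter *a* (list 9 3))
(defparameter *b* *a*)          ; copies the reference, not the list
(defparameter *c* (list 9 3))   ; a fresh list with the same contents

(eq *a* *b*)      ; true: both variables refer to the same object
(eq *a* *c*)      ; false: two distinct objects
(equal *a* *c*)   ; true: the two lists have the same structure

;; Modifying the object through one reference is visible through the other.
(setf (first *b*) 10)
;; *a* is now (10 3), while *c* is still (9 3).
```

We use lists here rather than numbers because the result of eq on numbers is implementation-dependent, which is related to the optimizations mentioned in the footnote.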

Why languages such as C++ have much more complicated semantics

The semantics of Common Lisp are very simple, and very sane. So why do languages like C++ complicate things so much? The answer is simple: they do not have automatic memory management.

In order for uniform reference semantics to be possible, the language must also have automatic memory management, simply because it is generally impossible to know how many references there are to a particular object. If a function is called with some object as an argument, that function may pass it to other functions, assign it to some global variables, and perhaps store it as the element of an array.

If it is generally impossible to know how many references there are to a particular chunk of memory, it follows that it is impossible to know when to reclaim that chunk of memory. If the chunk is reclaimed although there are still references to it, the program might crash or (worse) silently produce the wrong result. If the chunk is not reclaimed even though there are no references to it, the program has a memory leak. Neither of those two situations is acceptable.
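The way references to a single object multiply once it is passed to a function can be sketched in Common Lisp as follows. All names here (*registry*, *slots*, remember, *thing*) are hypothetical and exist only for this illustration.

```lisp
;; Hypothetical global variable and array that a callee may store into.
(defparameter *registry* nil)
(defparameter *slots* (make-array 2 :initial-element nil))

(defun remember (object)
  "Store OBJECT in a global variable and in an array, then return it."
  (setf *registry* object)
  (setf (aref *slots* 0) object)
  object)

(defparameter *thing* (remember (list 1 2 3)))

;; Three places now refer to the same list, and the caller has no
;; general way of knowing that from the call site alone.
(and (eq *thing* *registry*)
     (eq *thing* (aref *slots* 0)))   ; true
```

With automatic memory management, none of these places needs to decide when the list can be reclaimed.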

So in order to always know how many references there are to an object, C++ and similar languages always copy objects by default. That way, each copy always has exactly one reference to it, and when that reference is no longer used, the object can be deleted. Let us call this technique "copy semantics".

But the implications of copy semantics are dramatic. For one thing, objects can be large, so when a function is called with some object as an argument, the object must be copied, and that is costly in terms of performance. For that reason, while C++ and similar languages also use "call by value" semantics by default, the programmer can specifically request "call by reference" semantics instead. This term means that the object is not copied, and instead a reference is passed to the callee. Call by reference is indicated by the callee declaring a parameter such as "int &x". When that is the case, it is no longer possible to have an arbitrary expression as the argument to the function; it has to be what is called an "l-value" in C++.

Examples

  (defparameter *y* (cons 234 nil))

This example is the basis of further examples. It creates a new variable named *y* and gives it an initial value. We say that the value is a cons cell with 234 in the car slot and nil in the cdr slot, but this way of saying it is really shorthand for "the value is a reference to a chunk of memory on the heap. That chunk of memory is a cons cell. The car slot of the cons cell contains a reference to a chunk of memory on the heap containing the number 234, and the cdr slot of the cons cell contains a reference to a chunk of memory on the heap containing some complex data structure representing the symbol nil". This situation is illustrated in the following figure:

Since we are always going to mean "a reference to a chunk of memory on the heap, containing...", we usually leave it out when we discuss the semantics of some Common Lisp form. Here, we make it more explicit, in order to get the message across loud and clear.

Another way of thinking of the value of *y* is that it is a Common Lisp proper list with a single element in it, namely the object 234.
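Both views of the value of *y*, as a cons cell and as a one-element proper list, can be checked with standard functions. This is only a sketch that repeats the definition from the example above so that it stands on its own.

```lisp
(defparameter *y* (cons 234 nil))

(car *y*)                 ; 234, the contents of the car slot
(cdr *y*)                 ; NIL, the contents of the cdr slot
(equal *y* (list 234))    ; true: same structure as a one-element list
(length *y*)              ; 1
```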

  (defparameter *y* (cons 234 nil))

  (defun f (x)
    (setf x "hi"))

  (f *y*)

In this example, we create a variable named *y* holding a (reference to a) cons cell as described previously. Then we call the function f with the variable as an argument. The variable is evaluated, producing the cons cell as a value (which is really a reference to some chunk of memory on the heap that contains the cons cell as previously described). Then a copy of that reference is passed to the function f and made to be the value of the parameter x. This situation is illustrated in the following figure:

What f does is to modify the value of that parameter to instead be a reference to a different chunk of memory holding the string "hi". But since the parameter is just a local variable of f, this modification has no effect outside of f. Most modern programming languages behave this way, including C++. The new situation is illustrated in the following figure:
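The claim that f has no effect outside of itself can be checked with a sketch like the following. The definitions of *y* and f are repeated from the example; the variables *before* and *f-result* are hypothetical additions used only to observe the outcome.

```lisp
(defparameter *y* (cons 234 nil))

(defun f (x)
  (setf x "hi"))                    ; rebinds only the local parameter X

(defparameter *before* *y*)         ; hypothetical: keep the original reference
(defparameter *f-result* (f *y*))   ; hypothetical: capture what F returns

(eq *y* *before*)   ; true: *y* still refers to the same cons cell
(car *y*)           ; 234: the cons cell itself is untouched
*f-result*          ; "hi": SETF returns the assigned value, so F does too
```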

  (defparameter *y* (cons 234 nil))

  (defun g (x)
    (setf (car x) "hi"))

  (g *y*)

In this example, we again create a variable named *y* holding a (reference to a) cons cell as described previously. Then we call the function g with the variable as an argument. The variable is evaluated, producing the (reference to the) cons cell as a value. Then a copy of that reference is passed to the function g and made to be the value of the parameter x. As in the previous example, we now have the situation illustrated in this figure:

What g does is to modify the car slot of the cons cell being referred to by x. Since both x and *y* refer to the same cons cell, the car slot of the cons cell referred to by *y* now contains the string "hi" (or rather, a reference to a chunk of memory on the heap containing the characters of the string "hi"). The resulting situation is illustrated in this figure:
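That the modification made by g is visible through every reference to the cons cell can be checked with a sketch like this one. The definitions of *y* and g are repeated from the example; *alias* is a hypothetical extra variable introduced to hold a second reference to the same cell.

```lisp
(defparameter *y* (cons 234 nil))

(defun g (x)
  (setf (car x) "hi"))          ; modifies the cons cell that X refers to

(defparameter *alias* *y*)      ; hypothetical: a second reference to the cell

(g *y*)

(car *y*)           ; "hi"
(car *alias*)       ; "hi" as well, since it is the same cons cell
(eq *y* *alias*)    ; true: no new cons cell was created
```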

  (defparameter *y* (cons 234 nil))

  (defun h (x)
    (push "hi" x)) ; same as (setf x (cons "hi" x))

  (h *y*)

In this example, we again create a variable named *y* holding a (reference to a) cons cell as described previously. Then we call the function h with the variable as an argument. The variable is evaluated, producing the (reference to the) cons cell as a value. Then a copy of that reference is passed to the function h and made to be the value of the parameter x. As in the previous example, we now have the situation illustrated in this figure:

What h does is to create a new cons cell containing (a reference to) the string "hi" in the car slot and (a reference to) the cons cell that x refers to in the cdr slot, resulting in a list of two elements, namely "hi" and 234. It then assigns (the reference to) that new cons cell to the local variable x. The resulting situation is shown in this figure:

Finally, h terminates and the value of the local variable x is lost. Again, the value of the variable *y* has not changed in any way, as illustrated in this figure:

The objects that are no longer reachable from any variable, in this case the cons cell that we added and the string "hi", are ultimately reclaimed by the garbage collector.
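The behavior of h can be observed with the following sketch. The definitions of *y* and h are repeated from the example; *h-result* is a hypothetical variable that captures the value returned by h. Note that capturing the result keeps the new cons cell reachable, unlike in the call (h *y*) above, where the result is discarded and the new cell becomes garbage.

```lisp
(defparameter *y* (cons 234 nil))

(defun h (x)
  (push "hi" x))                    ; same as (setf x (cons "hi" x))

(defparameter *h-result* (h *y*))   ; hypothetical: capture what H returns

(equal *y* (list 234))        ; true: *y* is unchanged
(car *h-result*)              ; "hi": the car slot of the new cons cell
(eq (cdr *h-result*) *y*)     ; true: the new cell's cdr is the original cell
```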

¹ Warning: this footnote is meant for people with concerns about the performance of "uniform reference semantics". The secret word here is "semantics", meaning "Common Lisp behaves as if...". The people who create Common Lisp systems are smart enough to include optimizations that preserve the semantics without actually incurring the additional cost of allocating small objects on the heap.

robert.strandh@gmail.com