Common Lisp Semantics

Purpose of this page

We are frequently asked about the semantics of a function call in Common Lisp. So rather than repeating the same thing in the #lisp IRC channel every time the question is asked, here is an attempt to create a small explanation that can be referred to instead.

Values, object representation, and function calls

Value

Let us start by defining what a value is. It is simply an object that is the result of some computation, or of some evaluation as we say in the Lisp world. The term does not say anything about the nature of the object. In C++ and some other languages, this term is often used to mean "not a pointer", but that definition is not particularly coherent, since it would mean that some expressions do not return a value when evaluated. Here, we use the term in the sense of "result of a computation".

Object representation

The definition of object in the Common Lisp standard is "any Lisp datum", so a number such as 234 is an object, and an instance of a class such as person is an object as well. When we say "object", that is what we mean.

Common Lisp uses what we might call uniform reference semantics. This term implies that Common Lisp behaves as if objects are systematically manipulated through a "reference" or a "pointer". The objects themselves live in memory somewhere (called the "heap"), but what is passed around to functions, and what is assigned to variables are always references (see footnote 1).
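A minimal sketch of what this looks like in practice, using only standard operators. The variable names *a* and *b* are made up for this illustration. Initializing one variable from another does not copy the object, so both variables end up referring to the same cons cells, which eq confirms.

```lisp
(defparameter *a* (list 1 2 3))   ; *a* refers to a freshly allocated list
(defparameter *b* *a*)            ; *b* receives the same reference, no copy is made

(eq *a* *b*)                      ; true: both refer to the very same object
(eq *a* (copy-list *a*))          ; false: copy-list allocates new cons cells
(equal *a* (copy-list *a*))       ; true: the copy has the same contents
```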

Function calls

Common Lisp, just like most other programming languages (including C++), uses "call by value" for function calls. This term means that arguments to a function are evaluated before the function is called, and that the function then receives copies of those values as arguments. So for instance if you have the code (f (print "hello")), then "hello" will be printed before the value of the form (print "hello") is passed to the function f.
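The order of events can be made visible with a small sketch. The names *log*, note and f-traced are hypothetical helpers invented for this illustration; f-traced merely stands in for the function f mentioned above. Each helper call records an event, so the log shows that the argument form is evaluated before the body of the called function runs.

```lisp
(defparameter *log* '())          ; hypothetical log of events, newest first

(defun note (event value)
  "Record EVENT in *log* and return VALUE unchanged."
  (push event *log*)
  value)

(defun f-traced (x)
  "Stand-in for f: records that its body has started running."
  (note :inside-f x))

(f-traced (note :argument-evaluated "hello"))

(reverse *log*)                   ; => (:ARGUMENT-EVALUATED :INSIDE-F)
```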

This term says nothing about the nature of the objects that are passed to functions, but as we mentioned in the previous paragraph, the nature of that object is (semantically speaking) a reference to the string "hello" in this example, because that is what print returns in this case. So, although values are copied before being passed to the function, what is copied here is a reference. In other words, a parameter of the function refers to the same chunk of memory as does the result of the evaluation of the corresponding argument.
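That a parameter refers to the same object as the argument can be checked with eq. In this sketch, *s* and same-object-p are names made up for the illustration, and copy-seq is used only to obtain fresh string objects.

```lisp
(defparameter *s* (copy-seq "hello"))   ; a fresh string object

(defun same-object-p (x)
  "Return true if the parameter X refers to the same object as *S*."
  (eq x *s*))

(same-object-p *s*)                 ; true: the reference was copied, not the string
(same-object-p (copy-seq *s*))      ; false: a distinct string with equal contents
```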

Examples

  (defparameter *y* (cons 234 nil))

This example is the basis of further examples. It creates a new variable named *y* and gives it an initial value. We say that the value is a cons cell with 234 in the car slot and nil in the cdr slot, but this way of saying it is really shorthand for "the value is a reference to a chunk of memory on the heap. That chunk of memory is a cons cell. The car slot of the cons cell contains a reference to a chunk of memory on the heap containing the number 234 and the cdr slot of the cons cell contains a reference to a chunk of memory on the heap containing some complex data structure representing the symbol nil".

Since we are always going to mean "a reference to a chunk of memory on the heap, containing...", we usually leave it out when we discuss the semantics of some Common Lisp form. Here, we make it more explicit, in order to get the message across loud and clear.

Another way of thinking of the value of *y* is that it is a Common Lisp proper list with a single element in it, namely the object 234.
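Both views of the value can be checked at the REPL. This is only a sketch using standard list operators on the same definition as above.

```lisp
(defparameter *y* (cons 234 nil))

(car *y*)                  ; => 234
(cdr *y*)                  ; => NIL
(length *y*)               ; => 1
(equal *y* (list 234))     ; true: same structure as a one-element list
```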

  (defparameter *y* (cons 234 nil))

  (defun f (x)
    (setf x "hi"))

  (f *y*)

In this example, we create a variable named *y* holding a (reference to a) cons cell as described previously. Then we call the function f with the variable as an argument. The variable is evaluated, producing the cons cell as a value (which is really a reference to some chunk of memory on the heap that contains the cons cell as previously described). Then a copy of that reference is passed to the function f and made to be the value of the parameter x. What f does is to modify the value of that parameter to instead be a reference to a different chunk of memory holding the string "hi". But since the parameter is just a local variable of f, this modification has no effect outside of f. Most modern programming languages behave this way, including C++.
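Inspecting *y* after the call confirms this. The sketch below repeats the definitions so that it can be evaluated on its own; the comments show the values one would expect at the REPL.

```lisp
(defparameter *y* (cons 234 nil))

(defun f (x)
  (setf x "hi"))     ; assigns only to the local parameter X

(f *y*)              ; => "hi", the value of the SETF form
*y*                  ; => (234), unchanged by the call
```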

  (defparameter *y* (cons 234 nil))

  (defun g (x)
    (setf (car x) "hi"))

  (g *y*)

In this example, we again create a variable named *y* holding a (reference to a) cons cell as described previously. Then we call the function g with the variable as an argument. The variable is evaluated, producing the (reference to the) cons cell as a value. Then a copy of that reference is passed to the function g and made to be the value of the parameter x. What g does is to modify the car slot of the cons cell being referred to by x. Since both x and *y* refer to the same cons cell, the car slot of the cons cell referred to by *y* now contains the string "hi" (or rather, a reference to a chunk of memory on the heap containing the characters of the string "hi").
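The effect is visible through every reference to that cons cell. In this sketch, *z* is a hypothetical second variable added for the illustration; it is initialized with the same reference as *y*.

```lisp
(defparameter *y* (cons 234 nil))
(defparameter *z* *y*)     ; hypothetical: a second reference to the same cons cell

(defun g (x)
  (setf (car x) "hi"))     ; modifies the cons cell itself, not the variable X

(g *y*)

*y*                        ; => ("hi")
*z*                        ; => ("hi"), because *z* refers to the same cell
(eq *y* *z*)               ; true: still one and the same cons cell
```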

  (defparameter *y* (cons 234 nil))

  (defun h (x)
    (push "hi" x)) ; same as (setf x (cons "hi" x))

  (h *y*)

In this example, we again create a variable named *y* holding a (reference to a) cons cell as described previously. Then we call the function h with the variable as an argument. The variable is evaluated, producing the (reference to the) cons cell as a value. Then a copy of that reference is passed to the function h and made to be the value of the parameter x. What h does is to create a new cons cell containing (a reference to) the string "hi" in the car slot and, in the cdr slot, (a reference to) the cons cell that x also refers to, resulting in a list of two elements, namely "hi" and 234. It then assigns (the reference to) that new cons cell to the local variable x. Finally, h terminates and the value of the local variable x is lost. Again, the value of the variable *y* has not changed in any way.
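The new cons cell is not entirely lost if the caller keeps the return value of h, since push returns the new value of the place. In this sketch, *result* is a hypothetical variable introduced only to hold that return value.

```lisp
(defparameter *y* (cons 234 nil))

(defun h (x)
  (push "hi" x))                  ; same as (setf x (cons "hi" x))

(defparameter *result* (h *y*))   ; hypothetical: keep what h returned

*result*                          ; => ("hi" 234)
*y*                               ; => (234), unchanged
(eq (cdr *result*) *y*)           ; true: the new cell's cdr is the original cell
```

A caller that wants *y* to refer to the longer list has to assign the result itself, for example with (setf *y* (h *y*)).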

Why languages such as C++ have much more complicated semantics

The semantics of Common Lisp are very simple, and very sane. So why do languages like C++ complicate things so much? The answer is simple: they do not have automatic memory management.

In order for uniform reference semantics to be possible, the language must also have automatic memory management, simply because it is generally impossible to know how many references there are to a particular object. If a function is called with some object as an argument, that function may pass it to other functions, assign it to some global variables, and perhaps store it as the element of an array.

If it is generally impossible to know how many references there are to a particular chunk of memory, it follows that it is impossible to know when to reclaim that chunk of memory. If the chunk is reclaimed although there are still references to it, the program might crash or (worse) silently produce the wrong result. If the chunk is not reclaimed even though there are no references to it, the program has a memory leak. Neither of those two situations is acceptable.

So in order to always know how many references there are to an object, C++ and similar languages always copy objects by default. That way, each copy always has exactly one reference to it, and when that reference is no longer used, the object can be deleted. Let us call this technique "copy semantics".
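Common Lisp never copies an argument on its own, but the effect of copy semantics can be imitated by copying explicitly before a call. The following is only a sketch of that idea, not a description of how C++ works internally; it reuses the function g from the examples above and the standard function copy-list.

```lisp
(defparameter *y* (cons 234 nil))

(defun g (x)
  (setf (car x) "hi"))

(g (copy-list *y*))   ; g modifies a private copy that nothing else refers to
*y*                   ; => (234): the original cons cell is untouched
```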

But the implications of copy semantics are dramatic. For one thing, objects can be large, so when a function is called with some object as an argument, the object must be copied, and that is costly in terms of performance. For that reason, while C++ and similar languages also use "call by value" semantics by default, the programmer can specifically request "call by reference" semantics instead. This term means that the object is not copied, and instead a reference is passed to the callee. Call by reference is indicated by the callee declaring a parameter such as "int &x". When that is the case, it is no longer possible to have an arbitrary expression as the argument to the function; for a non-const reference such as this one, it has to be what is called an "l-value" in C++.


1. Warning: this footnote is meant for people with concerns about the performance of "uniform reference semantics". The secret word here is "semantics", meaning "Common Lisp behaves as if...". The people who create Common Lisp systems are smart enough to include optimizations that preserve the semantics without actually incurring the additional cost of allocating small objects on the heap.