Modifying Lists

One of a series of tutorials about Scheme in general
and the Wraith Scheme interpreter in particular.

Copyright © 2011 Jay Reynolds Freeman, all rights reserved.
Personal Web Site: http://JayReynoldsFreeman.com
EMail: Jay_Reynolds_Freeman@mac.com.

With the subject of modifying lists, we will begin to discuss in detail the notion of side effects, which I mentioned briefly in an earlier tutorial. Most Scheme procedures and special forms are like the advice for environmentally concerned wilderness travelers, "Take nothing but memories; leave nothing but footsteps." Most of them don't change anything: There is no way for another, subsequent procedure to tell that the first procedure ever ran. They don't even leave footsteps.

A Scheme procedure that changes something is different. By examining the changed object, you can tell that the procedure has been there. Such a visible change is called a side effect. For example, if you change a list -- replace one of its elements by something else -- then that change will be noticeable by any procedure that subsequently examines the list.

The basic reason why computer types worry about side effects is very pedestrian and very common. Many times it is useful to know that you can set something up, and leave it and go away, and then come back later and find that it has not been changed. I am not talking about just computer science here, I mean things in everyday life, like setting up dinner on the table and then coming back later and finding that the cat has tipped over the soup tureen. A clumsy cat -- or a hungry one -- is an excellent source of side effects.

Such issues come up in computer science as well. For example, many users might be doing things to a web site at the same time, and causing side effects for each other. Or, sometimes individual computer programs are carelessly written, so that there are side effects that the programmer did not intend, or forgot about, in one part of the program, that mess things up for another part.

The convention in Scheme is that procedures and special forms that have side effects all end with an exclamation point, which is sometimes also called a "bang". We have already seen one example, the special forms "set!" -- which is usually pronounced "set-bang". Its side effect is that it changes the value associated with a symbol -- a variable name -- so that any subsequent use of the variable will get the new value.

(define a 6)    ;; ==> a
a               ;; ==> 6
(set! a 42)     ;; ==> #t
a               ;; ==> 42

You might have noticed that "define" has side effects as well, since it creates a named variable, with a value, where there wasn't one before. I have sometimes thought that "define" should be named "define!" -- with a final "bang", but the R5 report has it as just "define", so I must go with that. It is also true that programmers usually have more reason to worry about changing some object that already exists, than about creating a new object. So there is at least a rationalization for "define" not going out with a bang.

Lists are composed of interconnected pairs -- cons cells -- and in consequence, the basic procedures that modify lists are really procedures that modify pairs. The two elements of a pair are its car and its cdr. The procedures to change those elements are "set-car!" and "set-cdr!". Each takes the pair to be changed as its first argument, and the new value of the car or cdr as its second argument. Neither of these procedures returns any useful value; they both return #t.

(define my-pair (cons 73 88))    ;; ==> my-pair
my-pair                          ;; ==> (73 . 88)

(set-car! my-pair 3)             ;; ==> #t
my-pair                          ;; ==> (3 . 88)

(set-cdr! my-pair 'foo)          ;; ==> #t
my-pair                          ;; ==> (3 . foo)

The pair changes every time we use either "set-car!" or "set-cdr!".

Incidentally, there is a reason why I did not quote in the preceding example. You might have wondered why I did not save a few characters of typing by writing

(define my-pair '(73 . 88))

instead of

(define my-pair (cons 73 88))

The reason has to do with an aspect of "quote" that I mentioned briefly in another tutorial. The "quote" special form is supposed to create constant things, so when you try to change something that has been created by "quote", Scheme won't let you.

(define my-constant-pair '(73 . 88))    ;; ==> my-constant-pair
my-constant-pair                        ;; ==> (73 . 88)
(set-car! my-constant-pair 3)           ;; ==> ... the following error message ...

(73 . 88)
Problem: Attempt to apply "set-car!" to a constant pair.  (Resetting)
Top-level loop ...

Probably the most common use of these procedures is to modify the first pair of a list.

(define my-list (list 1 2 3 4))    ;; ==> my-list
my-list                            ;; ==> (1 2 3 4)

(set-car! my-list 137)             ;; ==> #t
my-list                            ;; ==> (137 2 3 4)

(set-cdr! my-list '(5 6))          ;; ==> #t
my-list                            ;; ==> (137 5 6)

Do you see what happened with the "set-cdr!" of that example? Just before it, the value of "my-list" was

(137 2 3 4)

which could also have been written using pair notation as

(137 . (2 3 4))

That is, the first pair of the list had 137 as its car, and "(2 3 4)" -- the whole rest of the list -- as its cdr. The "set-cdr!" operation replaced that cdr with a different object, which happened to be the list "(5 6)". So even it looked as if "set-cdr!" was somehow magically having an effect which somehow propagated itself all the way to the far end of the list, all that was really happening was that one part of a pair was getting set to a new value.

To see how this business of side effects might matter, consider the following example. Suppose you have a list bound to the variable "a" -- that is, "a" has a value that is a list:

(define a (list 'fee 'fie 'foe 'fum))    ;; ==> a
a                                        ;; ==> (fee fie foe fum)

Suppose some other part of your program needed to use that list, but for some reason decided to call it another name; that is, it used another variable, like this.

(define new-variable a)    ;; ==> new-variable
new-variable               ;; ==> (fee fie foe fum)

Be sure you understand what just happened. In the "define" statement, "a" was evaluated before being passed to the actual code that does the "define", so that "new-variable" was bound to the value "(fee fie foe fum)".

Now let's go back and apply "set-car!" to "a".

(set-car! a 'phooey)    ;; ==> #t       
a                       ;; ==> (phooey fie foe fum)

Now, what many people would describe what we have done very simply. They would say that we have "changed a", but that is just plain wrong! The "set-car!" procedure did not do anything whatsoever to "a". In fact, "set-car!" has never even heard of "a". The expression

(set-car! a 'phooey)

is a procedure application, which means that the argument "a" will be evaluated -- to the list "(fee fie foe fum)" that is presently the value of a -- before being passed to "set-car!"; "set-car!" receives a list as its first argument, and modifies that list -- it doesn't know about anything else. Let me restate: What the "set-car!" procedure application did was in effect to go to the list that is bound to "a" -- the list that is the value of "a" is what "set-car!" received as an argument -- and change that list itself.

Yet that very same list, at that one particular place in Scheme memory, is also the value of "new-variable. So when some other part of your program needs to know what "new-variable" is, or when you type it into the Wraith Scheme interpreter, the answer is

new-variable               ;; ==> (phooey fie foe fum)

If we think of this operation in terms of what "set-car!" really does, there is no problem. The list that is bound to "a" is the same list that is bound to "new-variable", and "set-car!" changes lists, so of course the change will show up the same way in both "a" and "new-variable". The problem is that people sometimes think of "set-car!" as somehow "changing a", and then they are surprised to see that "new-variable" has changed as well.

In any case, the point is that both "set-car!" and "set-cdr!" act by side effect. Their whole purpose is to make noticeable changes. We sometimes say that such procedures are called for side effects.

The R5 Scheme standard does not say what result is supposed to be returned by procedures that are called for side effects -- the R5 descriptions of all of those procedures say that the result is undefined. Wraith Scheme generally returns #t from procedures for which the R5 report states that the result is undefined. (It has to return something, and #t seemed to me to be as good a value as any.)

There are a couple of other procedures that also modify lists, or create variants of lists that already exist. The simplest of these is "reverse", which reverses a list and returns it.

(reverse '(1 2 3 4))    ;; ==> (4 3 2 1)

This procedure does not cause side effects. It builds up a brand new list to return, that contains the elements of the original list in the reverse order.

Another useful procedure is "append". Its main purpose is to connect lists together, but it has a few quirks, particularly with regard to side effects. Append takes one or more arguments, and all but the last must be proper lists, which is what you get when you start with the empty list and cons items onto the front, one at a time. The last item can be any kind of Scheme object, but for the moment, let's stick with proper lists. Here is an example of "append" used in the most conventional way.

(define list-a (list 1 2 3))                        ;; ==> list-a
(define list-b (list 4 5 6))                        ;; ==> list-b
(define list-c (list 7 8 9))                        ;; ==> list-c

(define all-three (append list-a list-b list-c))    ;; ==> all-three

all-three                                           ;; ==> (1 2 3 4 5 6 7 8 9)

I hope that is what you were expecting.

The first quirk has to do with side effects. Let's go back and change the three lists that we used as arguments for "append".

(set-car! list-a 'fee)    ;; ==> #t
(set-car! list-b 'fie)    ;; ==> #t
(set-car! list-c 'foo)    ;; ==> #t

The question is, what has happened to "all-three"? Let's find out.

all-three    ;; ==> (1 2 3 4 5 6 foo 8 9)

We know that "set-car!" changed the actual lists that we originally passed to "append", and we can see that one of those changes has showed up in what "append" returned, without our having to do anything else to make it happen, but the other two changes did not. What is going on?

The answer is that "append" -- like "reverse" -- does some copying. It makes copies of all of its arguments except the last one, then returns what it gets by first connecting up the copies, head to tail, and then finally connecting the tail of the connected copies to the head of the actual last argument. Thus when we make changes to things that we have previously passed to "append" as arguments, the only ones that show up in the result from append are the changes we made to the last argument.

I mentioned that the last argument to "append" does not need to be a list. What actually happens is this: If the last argument to "append" is a pair -- and that includes proper lists as a special case, since a proper list is just a pair whose cdr is the rest of the list -- then what is returned is the connected list of the copied other arguments, with the pair made to be its new last cdr. For example

(append '(1 2 3) '(4 5 6))    ;; ==> (1 2 3 4 5 6)

In this case, the "connected list of copies" would have been "(1 2 3)". Its last cdr -- its cdddr -- would have been the empty list, (). (The empty list is always the last cdr of any proper list.) When we start looking at successive cdrs of the result of the append, we find that the result's cdddr is no longer (), it is now (4 5 6), which was the last argument to the append, and which is a pair. Similarly

(append '(1 2 3) '(4 5 . 6))    ;; ==> (1 2 3 4 5 . 6)
(append '(1 2 3) '(4 . 5))      ;; ==> (1 2 3 4 . 5)

If the last argument to "append" is not a pair, that last argument is simply made to be the last new cdr of the connected list of copies.

(append '(1 2 3) 42)     ;; ==> (1 2 3 . 42)
(append '(1 2 3) '())    ;; ==> (1 2 3)

If "append" only has one argument, or if all of its arguments except the last one are the empty list, then "append" simply returns its last argument. What is returned in this case is not a copy, so that subsequent modifications to the last argument will show up when you look at the result.

(append 42)                           ;; ==> 42
(append '() 42)                       ;; ==> 42

(define xyzzy (list 11 12 13))        ;; ==> xyzzy
xyzzy                                 ;; ==> (11 12 13)
(define plugh (append xyzzy))         ;; ==> plugh
plugh                                 ;; ==> (11 12 13)
(define plover (append '() xyzzy))    ;; ==> plover
plover                                ;; ==> (11 12 13)

(set-car! xyzzy 'ignite)              ;; ==> #t
xyzzy                                 ;; ==> (ignite 12 13)
plugh                                 ;; ==> (ignite 12 13)
plover                                ;; ==> (ignite 12 13)

The "append" procedure does not strictly cause side effects, so its name does not end with an exclamation point. Yet if you forget that "append" does not copy its last argument, you may encounter confusion of the side-effect variety.

There are a variety of procedures to convert various other kinds of Scheme objects into lists, and vice-versa. We will encounter some of them in subsequent tutorials.

-- Jay Reynolds Freeman (Jay_Reynolds_Freeman@mac.com)