Lists

One of a series of tutorials about Scheme in general
and the Wraith Scheme interpreter in particular.

Copyright © 2011 Jay Reynolds Freeman, all rights reserved.
Personal Web Site: http://JayReynoldsFreeman.com
EMail: Jay_Reynolds_Freeman@mac.com.

"Lisp" stands for "LISt Processing", so it should come as no surprise that lists are basic data structures in all Lisp-class programming languages, including Scheme. (Actually, many people will tell you that "lisp" stands for "Lisp Is Superfluous Parentheses". Don't pay them any attention, and they will go away -- I mean the people, not the parentheses ...)

If you have read the earlier tutorials in this series, you already know what lists look like, even though I have not mentioned them explicitly. Informally, a list looks like a bunch of things surrounded by balanced parentheses. For example, here is a list of three items:

(1 2 3)

In that list, the first item is the number 1, the second is 2, and the third is 3.

Lists can be nested, and can contain different kinds of items. Here is a list containing three items, the first two of which are themselves lists:

(("cat" "food") (1 2 3) #t)

In that list, the first item is a list of two strings, the second item is a list of three numbers, and the third item is the true boolean, #t.

Here is a rather common list, an empty list.

()

Some lisps are set up internally so that there is only one empty list in existence, and what appear to be multiple copies of it are really all just references of some kind to a single instance of the empty list. Thus lisp enthusiasts commonly speak of the empty list, rather than an empty list. Wraith Scheme does not do that -- it has lots of empty lists floating around -- but nevertheless, I myself will regularly speak of "the empty list", as if there were just one. (That makes a certain kind of sense, because Wraith Scheme treats all instances of the empty list as identical.)

Now you know what lists look like when Wraith Scheme types one out, but you can't just type in a list in the same form and have things work correctly. If you did try to type "(1 2 3)" into Wraith Scheme, you would get this error message:

(1 2 3)

1
Problem: Attempt to apply a non-procedure.  (Resetting)
Top-level loop ...

If you remember the tutorial titled "Scheme as a Calculator", you should be able to understand what is going on. Wraith Scheme sees the left parenthesis and thinks you are typing in a procedure application; it is expecting the first thing past the left parenthesis, which is "1", to be the name of a procedure -- something like "+" or "sqrt". Unfortunately, "1" is not the name of a procedure, it is a number, so Wraith Scheme doesn't know what to do, and blames you for making a mistake.

By the way, programming languages are always blaming hapless programmers for making mistakes. You will be gratified to learn that at least some of Wraith Scheme's error messages blame me -- the developer of Wraith Scheme -- for some kind of mistake I may have made in creating the program. If you ever see such an error message, please EMail me at Jay_Reynolds_Freeman@mac.com, and tell me about it.

The piece of the puzzle that is missing here, is that Scheme has a special procedure whose purpose is to create lists. If you want to create the list "(1 2 3)", you should probably use that procedure to do so. The procedure is named "list", and it takes as many arguments as you like. As is usual with procedure applications, the arguments are all evaluated and then passed to the "list" procedure itself, which builds a list and returns it. For example:

(list 1 2 3)                                ;; ==> (1 2 3)
(list (list "cat" "food") (list 1 2 3) #t)  ;; ==> (("cat" "food") (1 2 3) #t)
(list)                                      ;; ==> ()

This is a particularly good time to be sure you understand the difference between a procedure application and its result. Thus

(list 1 2 3)

is a procedure application: It is an instruction to Scheme to do something; namely, to make a list of three numbers, by using a Scheme procedure whose name is "list". The result of that action is the desired list itself, and when Scheme prints it out for you to look at, what you will see will be

(1 2 3)

Of course, you can use lists as Scheme data objects in other procedures and special forms, too. For example:

(define my-list (list 2 4 6 8))    ;; ==> my-list
my-list                            ;; ==> (2 4 6 8)
(define a 42)                      ;; ==> a
(set! a my-list)                   ;; ==> #t
a                                  ;; ==> (2 4 6 8)

In the next to the last line above, the argument "my-list" of the "set!" special form was evaluated, so that "set!" got (2 4 6 8) as the value to assign to "a".

If you had to create every list you were ever going to use by means of the "list" procedure, your typing could get pretty confused, pretty fast.

(list 1 (list 2 (list 3 (list 4)) (list 5) (list)) (list 6 ( list 7 8)))
    ;; ==> (1 (2 (3 (4)) (5) ()) (6 (7 8)))

Fortunately, Scheme has a built-in bit of trickery that will reduce the verbosity. It is called quoting, and the idea is that if you prefix any expression you type into Scheme with a single quote -- an instance of the character ' -- Scheme will not attempt to evaluate what follows the single quote, it will just return whatever followed the quote without any further processing.

Note carefully, that bit of punctuation is a plain-vanilla single quote, not a double quote, not a "back-quote" (`), and not one of the fancy tilted characters that many word processors let you use to achieve professional-looking typography. Note also, that the single quote must be right next to the expression you are quoting -- there is no white space allowed in between. By this means, when you type

'(1 2 3)

into Scheme, Scheme will not print an error message, it will simply return the list "(1 2 3)" itself, so that what you see printed out will just be

(1 2 3)

without the quote. Similarly,

'()    ;; ==> ()

and

'(1 (2 (3 (4)) (5) ()) (6 (7 8)))    ;; ==> (1 (2 (3 (4)) (5) ()) (6 (7 8)))

That last is a lot simpler than the earlier way to create the same list, with all the repetitions of the word "list" in it.

You can quote any Scheme expression, simple or complicated. Some expressions, like numbers and strings, do not change when they are evaluated, so that quoting them doesn't make much difference, but you can quote them anyway.

'23        ;; ==> 23
'"meow"    ;; ==> "meow"

Now look at this:

'define    ;; ==> define
'set!      ;; ==> set!
'xglerb    ;; ==> xglerb
'sqrt      ;; ==> sqrt

The quoting mechanism provides a way for Scheme to deal with the name of something without trying to evaluate it. If you don't understand that, type

sqrt

into Wraith Scheme, with no quote and no parentheses, just like that. Wraith Scheme will type back

#<Built-in procedure "sqrt">

to show you that the name "sqrt" is associated with a Scheme procedure.

Incidentally, the technical term for a name used that way is a symbol: define, set!, xglerb, sqrt, and + are all symbols. In that last sentence I did not put quotation marks around the symbols, so that you would not confuse them with strings. Let's be perfectly clear on that matter.

"sqrt"

(with quotation marks) is a string, but

sqrt

(with no quotation marks), is a symbol. Note also that

"sqrt"

evaluates to itself -- all strings do that -- whereas

sqrt

evaluates to a procedure, namely

#<Built-in procedure "sqrt">

and

'sqrt

with a quote, is a Scheme expression that evaluates to the symbol

sqrt

Quoting is very powerful, and if you are a little confused, don't worry too much about it yet -- there is a whole separate tutorial on the subject -- but let's see what else you can do with quoting and lists. Consider the following:

(list '+ 1 2 3)

This is a Scheme procedure application -- a command for Scheme to do something -- using the procedure "list". If you type it into Scheme (be sure to quote the "+"), it will evaluate to the Scheme object

(+ 1 2 3)

which is a list, but this particular list also happens to be a procedure application, for the addition of three numbers.

Do you see what is going on? We have used a procedure application -- a Scheme command -- to construct a Scheme data object -- a list -- which is also another Scheme command. In more conventional terms, we have executed a little tiny program in Scheme -- the one-liner, "(list '+ 1 2 3)" -- to construct a data object which is itself another Scheme program.

One of the important features of Lisp-class programming languages in general is that there is no sharp line between programs and data; every program is just a list, like '(+ 1 2 3), and lists are perfectly ordinary data objects in Lisps. I haven't talked about how to cause Scheme to evaluate an expression by any other means than typing it into the interpreter, but there are such means, and with them, it is possible for you to write a Scheme program that will write another Scheme program all by itself, and then execute it.

I think that is exciting, and what is more, it is one of the reasons why Lisps have been so widely used in artificial intelligence research. Metaphorically, a Scheme program can contemplate and decide upon a possible action; that is, it can build up a program as data, then analyze it and perhaps modify it if necessary. When it is done, it can go ahead and run the program it has just written. A Scheme program can even modify part of itself, if necessary. (Fair warning: Programs that modify themselves are often very difficult to design, understand, and maintain.)

Oh, all right, let me show you one way to evaluate an expression. Scheme has a procedure called "eval" that evaluates things, but there are some complexities in its use that involve matters I have not presented yet, so let's use one of Wraith Scheme's enhancements that pushes the some of the complexity under the rug for a while. The enhancement is a procedure named "e::cons-with-continuation", and you use it like this:

(define my-procedure (list '+ 2 2))         ;; ==> my-procedure
my-procedure                                ;; ==> (+ 2 2)
(e::cons-with-continuation my-procedure)    ;; ==> 4

The last few lines are a microcosm of a certain approach to artificial intelligence. Incidentally, I must stress that "e::cons-with-continuation" is not part of R5 Scheme, it is an extra procedure that I added to Wraith Scheme because I found it convenient. Wraith Scheme has many enhancements; in particular, there are a lot of procedures whose names start with "e::" -- they are all enhancements. (Think "e" for enhancement.) There is a whole section of the Wraith Scheme Dictionary devoted to them.

Ahem! We were talking about lists. There is one more thing about lists that I want to show you in this tutorial. The "list" procedure is good for constructing a list when you are prepared to provide all of the list items at once, but what if you need to construct a list one item at a time? There's a procedure for that. Its name is "cons", which is short for "construct".

The "cons" procedure actually has wider applicability than just to lists: It can be used to join any two Scheme objects together. It returns a new kind of Scheme object as a result, something called a cons cell, a pair,or a dotted pair -- we will see where the latter term came from in a moment. You can think of a cons cell as a simple container for two things -- that would be a pair of things -- in a specific order. If you are familiar with formal data structures in the sense of computer science, a cons cell is just a node in a binary tree, or perhaps a tuple of two objects. You use "cons" like this:

(cons 1 2)    ;; ==> (1 . 2)

The dot in the printed result is not part of the data structure itself, it is just punctuation so that you will know that you are looking at a cons cell and not a list, and so that you can more easily identify the two objects that the cons cell contains. The dot is also why cons cells are sometimes called dotted pairs.

It turns out that lists are made up of cons cells, in the following way. If you cons any object with the empty list, the list being the second argument to the "cons" procedure, you get a list of that contains just one item -- the object you consed in.

(cons 5 '())    ;; ==> (5)

Scheme could have printed that out as

(5 . ())

That expression means exactly the same thing as

(5)

but as a convention for ease of reading, Lisps in general and Scheme in particular do not print dots when the second item of a cons cell is a list.

Continuing, if you cons any object with a list of one item, again with the list being the second argument to "cons", you get a list of two items:

(cons 4 '(5))    ;; ==> (4 5)

We could have gotten the same result from

(cons 4 (cons 5 '()))    ;; ==> (4 5)

and in either case, if Scheme were printing out all the dots it would have printed

(4 . (5 . ()))

That last line is messy enough so that you can see why there is the convention about not printing dots when the second item of a cons cell is a list. Notwithstanding, you should notice that (4 5) is not the same as (4 . 5) -- the fact that the former could have been written as (4 . (5 . ())), whereas the latter is merely (4 . 5), should make the difference in structure clear.

Now would be a good time to alert you to the prospect of making mistakes like confusing

(1 . 2)

with

(1 .2)

The former is a cons cell whose first element is the number one and whose second element is the number two. The latter is a list whose first element is the number one and whose second element is the number two-tenths. That one blank space between the "." and the "2" makes the difference. As further clarification, note that

(1 .2)

could also be written in dotted-pair notation, with extra dots, as

(1 . (.2 . ()))

whereas

(1 . 2)

has all the dots that it is ever going to need.

The business of building lists by consing can go on indefinitely. Every time you cons, you are in effect stuffing a new item into the front end of the list.

(cons 1 (cons 2 (cons 3 (cons 4 (cons 5 '())))))    ;; ==> (1 2 3 4 5)

The list that you get from

(cons 1 (cons 2 (cons 3 (cons 4 (cons 5 '())))))

has exactly the same structure as the one you get from

(list 1 2 3 4 5)

In fact, Wraith Scheme uses "cons" internally when it is performing the "list" procedure, just as in the examples above, so the structures really are the same.

Building up a list all at once by consing is a silly thing to do: Using "list" takes less typing and has less chance of missing a parenthesis or two somewhere. Yet as I said before, you aren't always in the position of building a list all at once. Some times your program will start to construct a list, then do something else for a while, then add another item to the list, and so on. For those situations, the "cons" procedure is ideal.

There are lots of procedures to access the individual items of a list or a cons cell, and to break a list into smaller pieces or change its internal structure, but I will save those for another tutorial.

There are a few things about how the Wraith Scheme program displays lists that you might want to know now. If you tell Wraith Scheme to print a very long list, after a while it will stop printing items and will just print an ellipsis -- three dots in a row -- to tell you that there are more items present.

(list 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25)
    ;; ==> (1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ... )

Similarly, if you tell Wraith Scheme to print a deeply nested list -- one with lots of left parentheses all together -- it will stop printing left parentheses after a while, and print an ellipsis.

'(((((((((((((((((((((((((1 2 3) 4 5) 6 7) 8))))))))))))))))))))))
    ;; ==> ((((((((((( ... )))))))))))

If you want to see the missing items, there are some enhancements to change the points at which Wraith Scheme throws up its paws and prints the ellipsis. They are the procedures "e::set-list-print-length!" and "e::set-list-print-depth!". You can look them up in the Wraith Scheme Dictionary to see how they work. The current values of the list print length and depth are shown in the Wraith Scheme Instrument Panel -- the drawer that dangles below the Wraith Scheme window -- on the third line down from the top. There are similar print lengths and depths for vectors, too, but we haven't gotten to vectors yet.

-- Jay Reynolds Freeman (Jay_Reynolds_Freeman@mac.com)