Finding Things in Lists


One of a series of tutorials about Scheme in general
and the Wraith Scheme interpreter in particular.

Copyright © 2011 Jay Reynolds Freeman, all rights reserved.
Personal Web Site: http://JayReynoldsFreeman.com
EMail: Jay_Reynolds_Freeman@mac.com.


The "Lists" tutorial described several ways to create lists, but told nothing about how to examine their content or get access to what they contain. Let's see how to do that.

The basic procedures for examining list content are the ones used to examine the cons cells of which they are made. Recall that a cons cell is simply a container for two Scheme data objects in a particular order, and that cons cells are also sometimes called pairs or dotted pairs. You can create a cons cell by using the "cons" procedure, like this:

You can also create one by quoting:

The dot in "(foo . bar)" is not actual stored data, it is simply a visual cue in the printed text to tell you that you are looking at a cons cell and not some other data item surrounded by parentheses. It is also the reason why cons cells are sometimes called "dotted pairs".

If you have a cons cell and would like to write Scheme code to get at the elements inside it, there are two procedures to help you. They are called "car" and "cdr". Each takes one cons cell as its single argument. Procedure "car" returns the first element of the cons cell, and "cdr" returns the second element.

These procedures don't do anything to the cons cell, it is still the same after you use them.

A programmer would say that "car" and "cdr" do not have side effects; that is, they leave no evidence that they have been used: They just return a value without messing anything up.

We now interrupt your tutorial for a brief historical note: The names "car" and "cdr" are not meaningful words or abbreviations, like "list" or "sqrt". They are in fact acronyms, but they pertain to a particular computer used in the early days of Lisp. It had several places in its electronic workings where items of data could be put when a program needed to access them quickly. Such places are often called registers. This computer had an address register and a decrement register. The programmers who wrote the Lisp program that ran on this machine set things up so that whenever the program had to work with a cons cell, it put the cell's first item into the address register and its second item into the data register. Hence the acronyms "car" and "cdr" respectively mean "Contents of the Address Register" and "Contents of the Decrement Register". The names became well-established, and we are now stuck with them, even though they don't have any obvious meaning that has to do with cons cells.

Many Lisp-class languages have replaced the names "car" and "cdr" with "first" and "rest", which makes sense for cons cells that have been used to build lists. Yet the term "rest" still does not make any real sense when you are talking about cons cells, so I think that "first" and "rest" are still not very good names for these procedures. To see why those names make sense for the cons cells that are the skeleton of a list, remember that you can build a list one item at a time, by consing things onto its front end. For instance, suppose we already have the list (2 3 4), and we want to add "1" onto its front end. We can do it this way.

You already know that the "cons" procedure returns a cons cell, and you have just seen how to build a list by using "cons", so you should not be surprised that when I have been telling you that a list is a fundamental Scheme data object, I have been lying. A list is really a bunch of cons cells stuck together in a certain way. It is the cons cell that is the fundamental data structure; lists are kind of a syntactic sugar built on top of cons cells, in that "list" is just a name for a special kind of collection of connected cons cells.

There is one exception to the rule that lists are bunches of cons cells. The empty list (), is not a cons cell, it actually is a separate kind of Scheme data object. Honest!

Anyhow, the point is that you can apply the "car" and "cdr" procedures to lists.

Now you can see why it would make sense to change the names of "car" and "cdr" to "first" and "rest" if you were only going to talk about lists. When you take the car of a list, you get its first element, and when you take the cdr of a list you get the rest of the list -- everything beyond the first element.

Do remember that it is an error to try to take the car or the cdr of the empty list, because the empty list is not a cons cell.

If you think "car" and "cdr" are confusingly meaningless, I am sorry to tell you that things get much worse. It should be clear to you by now that if I want, say, the third element of a list, I can take the cdr of the list, and take the cdr of that, and then take the car of the cdr of the cdr. That is

so that

Similarly, suppose you had a list in which some of the elements were also lists, such as

If for some reason I happen to want the third element of the list which is the first element of my-other-list, I can get it as

Programmers got tired of writing expressions that looked like

So they invented some more procedures with names related to "car" and "cdr" to make things briefer.

I didn't say "simpler", I said "briefer".

Scheme in fact has another twenty-eight such procedures. They all have names that start with "c", end with "r", and have just "a"s and "d"s in between. The total number of "a"s and "d"s allowed in any one procedure name can be either two, three or four. I am not going to write out the names of all these procedures -- you can find them in the Wraith Scheme Dictionary -- but for example, they include

but not

because that last one has a total of five "a"s and "d"s, and the people who put these things in had to stop somewhere.

These procedures all have something to do with car and cdr, but fortunately you do not have to remember what they all mean separately. There is an easy rule to figure it out when you need to. Suppose you are confronted with an expression like

and you have no idea whatsoever what that means. Here is what you do. First you strip off the leading "c" and the trailing "r" in the procedure name.

Next you separate out the remaining letters so there is plenty of white space in between them.

Put a "c" before each letter and an "r" after it.

Add a left parenthesis before each "a" or "d" (except for the first one, which has its own left parenthesis already).

Finally, add enough right parentheses at the end so that all the left parentheses are matched.

There you are. That is what "(caadar foo)" means. You can apply this rule backward, too, in order to get from

to

By using these procedures, we can deal with the examples earlier much more briefly.

Those expressions are not really any simpler than the originals, but at least they are briefer. Remember, without brevity, things would be too long. Seriously, some times you have to write code that involves a long series of cars and cdrs. At such times, using things like "caddar" can make your code more readable, because the sense of it does not get lost in the middle of messy lines like

Scheme has a several other procedures that can help access data within lists.

returns the nth element of some-list, which is very useful, but you have to remember that it is zero-based; that is, if you want the first element of some-list, you must use zero for the value of n. So for example

The name "list-ref" stands for "list reference".

There is another procedure, "list-tail" that takes successive cdrs of a list. It also is zero-based, and

returns "some-list" itself -- no "cdr"s have been taken. For additional examples

Note that if you think of these two new procedures in terms of "car" and "cdr", they aren't quite as much alike as you might hope. For example, "list-ref" with a second argument of zero takes "car" once, but "list-tail" with a second argument of zero doesn't take "cdr" at all, it returns the whole list.

Two very handy procedures do not so much as get information out of lists as get information about them. The first is "length", and it tells how many items there are in a list:

By the way, see if you can figure out why '() is quoted in the preceding example. Hint: What does a procedure application look like?

The second handy procedure is "list?", and it tells whether an object is a list or not. It returns either the true boolean, #t, if its argument is a list, or the false boolean, #f, if not.

Procedure "list?" is an example of a new kind of procedure, called a predicate. Predicates answer a yes-or-no question -- they return #t if the answer is yes, and #f if it is no. There is no rule that requires the names of predicates to end with a "?", but all of the standard Scheme predicates, and all of the ones that I have added to Wraith Scheme as enhancements, do have names that end with a "?".

Here is an interesting example.

Can you figure out why this is not a list? I have not strictly said before, but the problem here is that '(1 2 3 . 4) does not end with the empty list. That is, its last cons cell has "4" as its second element, not ().

It turns out that the description I gave in the "Lists" tutorial, of how to build lists by consing things onto the empty list and then continuing, encapsulates the formal definition of lists in Scheme. Formally, a list is either

This kind of definition, that tells how to build big things by combining little things of the same kind, is an example of a recursive definition. You will hear a whole lot more about recursion as you continue to learn Scheme, but for now just remember that what this definition means is that if you can't build it out of cons cells in the way I described, it isn't a list.

There is also a predicate to tell whether a Scheme object is a cons cell or not; its name is "pair?".

Every list except the empty list will make "pair?" return #t, because every list except the empty list consists of something consed together with another list, thereby making a pair.

Another predicate that is sometimes useful is "null?", which returns #t if, and only if, its argument is the empty list.


-- Jay Reynolds Freeman (Jay_Reynolds_Freeman@mac.com)


Wraith Face