Lexical Scope

One of a series of tutorials about Scheme in general
and the Wraith Scheme interpreter in particular.

Copyright © 2011 Jay Reynolds Freeman, all rights reserved.
Personal Web Site: http://JayReynoldsFreeman.com
EMail: Jay_Reynolds_Freeman@mac.com.

We have already talked about lexical scope in several previous tutorials, notably in the one titled "Local Variables". Let's review the basic idea of scope before we talk about it again.

The scope of the binding of a particular value to a particular variable is the range within the program in which an attempt to look up the value of that particular variable will yield that particular value. The wording here is necessarily a little vague, because different programming languages use different dimensions when they are talking about "range".

For example, some languages use ranges in time: There is in effect only one value of a variable in existence at any one moment; any attempt to look up the value of that variable gets that value, and any operation that changes the value of a variable, like "set!" or "define", changes its value for everybody. This kind of scoping is sometimes called dynamic scope, though I think it might better be described as temporal scope.

In a dynamically scoped language, there are no local variables, and all of the examples of scoping that I have provided in previous tutorials would have to be worked out differently. Fortunately, Scheme is not dynamically scoped, and I think the examples I have provided are correct. I mentioned dynamic scope here to demonstrate that there really is something more than pedantry to talking about how scope works in a particular language; different kinds of scope rules make an enormous difference in how the language works, and in how you program in it.

Scheme does not use the temporal dimension in talking about scope, it uses a spatial dimension. Fortunately, Scheme is not concerned with spatial dimensions in the everyday sense of geography; scope in Scheme has nothing to do with whether your code was written in Palo Alto, Perth, Peshawar, or Pinsk. The spatial dimension in question is forward and backward in the stream of text that comprises your source code. The word "lexical" means "having to do with text". You can tell all you need to know by looking at the places in your source code where the various bindings of values to a variable are made, and where in your source code the values of that variable are used. Scheme is lexically scoped.

Actually, there is one waffle in that definition; it has to do with variable bindings that are made at top level in the Scheme interpreter. Those bindings are not lexically scoped in the technically precise sense of the word, as we will see in a moment.

The easiest way to figure out the scopes of variable bindings in Scheme is to remember that essentially all of the complicated expressions in Scheme -- and by "complicated", I mean "the ones with parentheses around them" -- are syntactic sugar for lambda expressions. The rules for variable scope in lambda expressions are simple: The only new bindings that a lambda expression can create are to its argument variables, and the scope of those bindings is precisely the body of the lambda expression in question. Those bindings can be changed within the body of the lambda expression, perhaps by "set!", but the place where they are created is where the argument variables are bound to values as the lambda expression is being used in a procedure application.

Of course, lambda expressions can be nested. One lambda expression may be written down within the body of another, like this.

((lambda (x y) ((lambda (a b) (+ (* a x) (* b y))) (sin x) (cos y))) 42 137) ;; ==> 7.2867524079576

What this lambda expression does is calculate the quantity more conventionally written as

x sin(x) + y cos(y)

The outer lambda calculates x sin(x) and y cos(y), and then passes them as arguments a and b respectively to the inner lambda, which also has access to the outer lambda's arguments x and y, which it needs to do the whole calculation. Truth is in the details, so let's see how that works.

In such a case, if we want to investigate what binding to use for a variable that occurs within the body of the inner lambda expression, the first thing we must do is look to see whether that variable is one of the argument variables of the inner lambda expression. If so, that is the binding to use. If not, we look at the argument variables of the enclosing lambda expression and see if the variable is one of them. If so, we use that binding, and if not, we look and see if there are any more enclosing lambda expressions with argument variables to inspect. Eventually, we run out of enclosing lambda expressions, and must look for variable bindings at top level.

The main problem with this approach is that most Scheme programmers have never had to implement a Scheme system, and so have no clear idea of just how all the special forms expand into lambda expressions. For example, the simplest form of "begin" is syntactic sugar for the evaluation of a lambda expression of no arguments, whose body is the sequence of expressions in the body of the "begin". Yet there is a fancier version of "begin" which allows "define" statements at its start, and those define statements are implemented as if the "begin" were implemented as a "letrec" with the same variable bindings. Then you have to remember that "letrec" is implemented as a lambda expression with internal "set!"s, and by that time you are probably thoroughly confused.

I think the only way out of the confusion is to study the R5 report and such other documentation about Scheme as you can find. There is no denying that some of this material is complex, but I suggest that understanding it will contribute greatly to your ability to write powerful programs that do wonderful things.

The waffle about Scheme being lexically scoped, that has to do with variable bindings at top level, is as follows. Suppose you write and evaluate the following procedure definition at top level, and suppose further that at the time you write and evaluate it, you have not yet defined a top-level variable called "frobnitz".

(define (foo n) (+ frobnitz n)) ;; ==> foo

At this point, you decide to try to figure out what variable binding "foo" will use for "frobnitz", and -- surprise -- there isn't one. Does that mean that procedure "foo" is predestined to fail, and that it will give you some kind of variable-not-found error message? Not at all, because in Scheme, top-level variables have dynamic scope. You can save the day by providing a top-level definition for "frobnitz" after you have defined "foo" but before you get around to evaluating it.

(define frobnitz 95) ;; ==> frobnitz (foo 42) ;; ==> 137

Note the contrast in what we know about the two variables bindings used in "foo". The binding of "n" that is involved in "foo" is the one made when some value is passed to "foo" as an argument. There may be many other procedures that also use variables named "n", but the only "n" we need to worry about in understanding what "foo" does is the one that is in used in the lexical scope that is the body of "foo". Procedure "foo" gets its value of "n" from that binding, and if "foo" should have happened to change "n" -- perhaps by a "set!" operation -- then only that one binding would change -- the others, in all of those other procedures, would be unaffected.

In contrast, the binding for "frobnitz" is not even known when "foo" is created. Because of the way Scheme works, we can assume for purposes of analysis that there will be such a binding made before "foo" is ever used, and that the binding will be in the top-level environment, but Scheme will have to go and look it up there when the time comes, and must be prepared to report an error if no such binding is present. Furthermore, this one binding of "frobnitz" is accessible at top level, and is also accessible within the body of any other procedure, unless that other procedure has another variable named "frobnitz" in its own lexical scope, getting in the way. Anyone, anywhere, can modify the top-level binding of "frobnitz", and everybody else will see the new value.

The business about another variable getting in the way works like this. Suppose we define a procedure that has a variable named "frobnitz":

(define (bar n frobnitz) (+ frobnitz n)) ;; ==> foo

Now if we proceed as above, and evaluate:

(define frobnitz 95) ;; ==> frobnitz (bar 42 123) ;; ==> 165

Procedure "bar" used its own value for "frobnitz" -- it used the binding created when the value 123 was bound to the "frobnitz" argument named "bar". The value of "frobnitz" that had been defined at top level did not matter inside of "bar"; the local binding kept bar from "seeing" the top-level one. In such a case, the more local binding is said to shadow the less local one.

It is interesting to note that probably the most common variable look-ups in a Scheme program are of top-level variables that never change. For example, here is our old friend the hypotenuse function.

(define (hypotenuse x y) (sqrt (+ (* x x) (* y y)))) ;; ==> hypotenuse

Those lines of code involve two procedure argument variables, namely "x" and "y", and one newly-defined top-level variable, "hypotenuse", but they also involve top-level variables "define", "sqrt", "+" and "*", and it is unlikely that you will ever want or need to change the values to which they are bound.

One of the nice things about lexical scope is that software tools that need to figure out how a program or procedure works can do so by static analysis. Such tools include compilers, for example. It is useful for a compiler to be able to draw firm conclusions about program behavior from something that doesn't change, like the source code. (I don't mean that you never change source code, but I do mean that if you do, you probably recompile it before trying to use it again.) It would make the compiler's job much more difficult if it had to keep track of what other code had run before the code it was trying to compile got to run. Yet in a dynamically scoped language, it might have to do just that, in order to figure out what variable bindings were in use. (Actually, a compiler for a dynamically scoped language probably could not draw many conclusions about what variable bindings were going to be at run-time, it would have to generate code that simply looked them up.)

-- Jay Reynolds Freeman (Jay_Reynolds_Freeman@mac.com)

Wraith Face