Hygienic New Macros

One of a series of tutorials about Scheme in general
and the Wraith Scheme interpreter in particular.

Copyright © 2011 Jay Reynolds Freeman, all rights reserved.
Personal Web Site: http://JayReynoldsFreeman.com
EMail: Jay_Reynolds_Freeman@mac.com.

The title of this tutorial is based on the fact that there are two kinds of macros available in Wraith Scheme. One is the "hygienic macro" implementation that is required by the R5 Scheme standard. The requirement for hygienic macros is relatively new to Scheme, and this tutorial provides an introduction to them.

Wraith Scheme also has an additional macro implementation installed, one that predates the introduction of hygienic macros into Scheme. This older system is non-hygienic, and is also "dirty" in the sense of "quick and dirty" -- it is about the simplest kind of Lisp macro implementation that you can imagine. Thus I describe it in another tutorial, titled "Dirty Old Macros". If you are curious about where macros are coming from, and if you would like to feel grateful for how much simpler hygienic macros are than what preceded them, I suggest you read that tutorial first.

The intent of a programming-language macro system is to transform source code before it is compiled or interpreted. Macro preprocessors in C, C++, and in various assembly languages typically use a simple substitution language to transform text strings. The Unix macro processor, "m4", does the same thing with a bit more power and flexibility. Macro processors for Lisp systems generally do something different: They usually deal with source code at the level of S-expressions, not text. That is, what is to be transformed is a Lisp object, such as a quoted list (note the quote), and not a text string that merely contains the same characters.
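The distinction can be seen directly at a Scheme prompt. In this small sketch (the names are illustrative, not from Wraith Scheme), the quoted list is a structured object whose parts a Lisp macro could inspect, while the string is only a sequence of characters:

```scheme
;; Illustrative only: the same characters as a Lisp object and as text.
(define as-object '(foo bar baz))  ; a list of three symbols
(define as-text "(foo bar baz)")   ; a string of 13 characters

;; A Lisp macro system works on objects like as-object, which can be
;; taken apart with ordinary list operations.
(define first-item (car as-object))      ; the symbol foo
(define item-count (length as-object))   ; 3

;; A text-based preprocessor works on something like as-text, which is
;; just characters; "car" does not apply to it.
(define text-length (string-length as-text))  ; 13
```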

R5 Scheme's macro system is indeed of this latter kind. It provides a mechanism to transform a Scheme object before the regular Wraith Scheme evaluator gets a chance to interpret it. Thus if the symbol "foo" is bound to a Wraith Scheme macro of the older kind, then the macro "call" "(foo something)" (which looks just like a procedure application but is not, because "foo" evaluates to a macro, not to a procedure) results in "(foo something)" being somehow transformed into "something else", whereupon the Wraith Scheme interpreter gets to evaluate "something else".

The way that happens is that the entity to which the symbol "foo" is bound -- the macro itself -- is in essence a lambda expression by another name. When the Wraith Scheme interpreter evaluates "foo" (as the first part of evaluating "(foo something)"), it finds that "foo" evaluates to a macro, and on that basis it applies the lambda expression in question to the entire unevaluated form, "(foo something)", and then evaluates whatever the lambda application returns.

The basic idea here is really simple: When you use a macro, you get to use a lambda expression that was built up when you created the macro, to modify the expression that invoked the macro, before that expression gets evaluated.

In the R5 implementation of hygienic macros, the lambda expression that does the modification of the original Scheme expression is sometimes called a transformer. R5 Scheme has a special form -- actually itself a macro -- called "syntax-rules", that is used to construct the transformer you want.

Let's look at that again in a bit more detail. The Wraith Scheme evaluator sees the expression "(foo something)".

It first evaluates "foo" -- recall that the evaluator always has to evaluate the first item in such cases, in order to know what to do with the rest of the arguments. The result is a macro, and on that basis the evaluator knows what to do. The macro is a specially identified lambda expression. The evaluator applies that lambda expression to the entire original expression, "(foo something)", without evaluating any of it. That is a little hard to write out with the notation we have been using, but it is roughly as if the lambda expression were applied to the quoted list '(foo something).

The lambda expression can do whatever it wants to the original expression -- it can transform it in any way. It can even ignore the expression entirely, and return something else. Ultimately, though, it must return something, which we will call "something-else".

(Incidentally, "something-else" may well be a list or a more complicated structure; I am not being very specific about what it is because it can be anything you like.) The Wraith Scheme evaluator does not pass "something-else" back to the original invoker of "(foo something)". Instead, the evaluator evaluates "something-else", and passes the result of that second evaluation back as the result of invoking macro "foo". If we put that all together, what gets passed back as the result of the macro application "(foo something)" is in essence the value of "something-else": the result of evaluating whatever the lambda expression returned when it was applied to the unevaluated "(foo something)".
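The first of those two steps can be imitated in standard Scheme without any macro machinery at all. In the following sketch, "foo-transformer" is a hypothetical stand-in for the lambda expression bound to "foo", and its particular rewrite (adding 1) is an arbitrary choice. Only the step that produces "something-else" is shown, since the second step is just ordinary evaluation:

```scheme
;; Hypothetical transformer for a macro named "foo", written as an ordinary
;; procedure: it receives the whole unevaluated form and returns a new form.
(define (foo-transformer form)
  ;; For (foo something), (cadr form) is the symbol something.
  ;; This particular transformer rewrites (foo x) into (+ x 1).
  (list '+ (cadr form) 1))

;; Step one: apply the transformer to the quoted, unevaluated form.
(define expansion (foo-transformer '(foo something)))
;; expansion is the list (+ something 1). In step two the evaluator would
;; evaluate that list, which needs "something" to have a value.

;; The same transformer with a literal number in place of "something":
(define expansion-2 (foo-transformer '(foo 41)))   ; the list (+ 41 1)
```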

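Hygienic macros support the basic facility for transforming Scheme expressions before evaluating them; furthermore, they have several special features that distinguish them from earlier macro systems. To provide those features, the lambda expressions that the hygienic macro system builds up as transformers are a good deal more complicated than the ones most users of earlier macro systems would have built up by hand. It probably won't be obvious what some of the underlying issues are without further discussion (which will follow), but in brief, the main enhancements are these: You describe a macro with simple patterns and templates instead of writing its transformer by hand; the system systematically renames symbols so that names used in a macro definition cannot get confused with names at the place where the macro is used; and symbols that the expansion must look up are looked up in the environment where the macro was defined.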

Probably the best way to illustrate how easy hygienic macros are to use is by example. For contrast, let's use the example discussed in the tutorial "Dirty Old Macros". Our task was this:

Suppose you are tired of writing assignment statements in the form "(set! foo bar)", and would rather write them as "(assign bar to foo)"; that is, you want a macro "assign" such that "(assign bar to foo)" does just what "(set! foo bar)" does.

With the old macro system, the macro definition required to set up "assign" is this:
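Whatever the exact Wraith Scheme syntax for defining an old-style macro, its lambda expression has to do roughly what the following standard Scheme procedure does. The name "assign-transformer" is hypothetical, and this is not Wraith's old-macro syntax; it only shows the hand-written work of taking the form apart with "cadr" and "cadddr" and building the "set!" expression with backquote and comma:

```scheme
;; Hypothetical: the rewrite an old-style "assign" macro performs, written
;; as a plain procedure on the unevaluated form (assign bar to foo).
;;   (cadr form)   is bar, the value expression
;;   (caddr form)  is to, which is ignored
;;   (cadddr form) is foo, the variable being assigned
(define (assign-transformer form)
  `(set! ,(cadddr form) ,(cadr form)))

(define old-style-expansion (assign-transformer '(assign bar to foo)))
;; old-style-expansion is the list (set! foo bar)
```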

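That is a bit confusing to look at, even if you use backquote, comma, and "cadddr" every day. On the other hand, with hygienic macros, the required definition looks like this: "(define-syntax assign (syntax-rules (to) ((assign binding to variable) (set! variable binding))))".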

Even without explanation, it doesn't take a whole lot of imagination to figure out that "(assign binding to variable)" is how the macro is supposed to be used, and "(set! variable binding)" is how it is supposed to expand -- what it is supposed to do. Now, let's reformat that example and walk through the details.
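Laid out one part per line in standard R5RS Scheme, with a short use appended (the variable "x" is just for illustration), the definition might look like this:

```scheme
;; The hygienic "assign" macro, one part per line.
(define-syntax assign                ; the name being defined
  (syntax-rules (to)                 ; "to" is a literal, not a pattern variable
    ((assign binding to variable)    ; the form a use must match
     (set! variable binding))))      ; what a matching use expands into

;; Usage: after the "assign" form below, x is 42.
(define x 0)
(assign 42 to x)
```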

The first part of that reformatted definition, "(define-syntax assign ...)", is just like the first part of an ordinary "define" statement -- which would be something like "(define assign ...)".

In both cases, you are defining some kind of thing in the section indicated by "...", and then giving it a name. Using "define-syntax" instead of "define" tells Scheme that you are defining a special kind of object for use in the hygienic macro system, the kind of object returned by "syntax-rules". That object will become the transformer that is bound to "assign". If you were to evaluate a syntax-rules expression all by itself and inspect the result (with "e::inspect"), you would find it to be a lambda expression; that is, a transformer is just a particular kind of lambda expression.

The "syntax-rules" part of "define-syntax" looks like this: "(syntax-rules (to) ((assign binding to variable) (set! variable binding)))".

That is a Scheme expression all by itself, and confusingly enough, "syntax-rules" is itself a macro -- we are using a macro to create more macros. (Perhaps you are wondering if there is some kind of chicken-and-egg problem here -- how can we use a macro system to create the macro that we must use to create macros? Let's not go there ...)

The first item past "syntax-rules" is a list of symbols that the macro system should take as literal values -- the evaluator is never going to need to evaluate them. Thus in "(syntax-rules (to) ...)", the symbol "to" is just there to make the whole expression read more like English text; it has no actual function in the expansion of the macro or in the evaluation that produces the assignment.

The second item past "syntax-rules" is a list of two items, each of which happens to be a list in the present example. The first list shows a form of Scheme expression that the macro is designed to deal with, and the second list shows how the macro should alter a Scheme expression that matches the given form.

The "syntax-rules" macro actually allows for more than one such list of two items, in case there is more than one possible form of the macro that we are defining. We will see an example of such a macro shortly, but for the moment let's stick to our simple one.

Now we have to consider one matter most carefully: Every item in each of the two lists is a symbol, and could in principle be the name of something -- it could be bound to a value. I hope I have made enough fuss about bindings and environments in other tutorials for you to understand that it is a big deal what environment is used in looking up any values that may be bound to those symbols.

Part of the problem is solved for us: We know that "assign" is the name of the macro, and the hygienic macro system is smart enough to figure that out as well. It will never need to look up the value for "assign"; it already knows it. Furthermore, the short list "(to)", which was the first part of the "syntax-rules" expression past the name "syntax-rules" itself, tells the system that "to" is not to be evaluated: It is just syntactic sugar that is part of the macro use for some reason like readability or human understanding -- it is not required to have a value in the first place.

All the other symbols in the first list of the two -- the one that indicates the form that a use of the macro is supposed to follow -- are what R5 Scheme calls pattern variables. The system does not look them up in any environment; rather, it matches them to the actual quantities that occur in any particular use of the macro. They are kind of like arguments to the macro, and they are what I meant when I used the word "arguments" loosely, earlier. It might be a little confusing to use the word "argument" in any context outside of a procedure application, so I will avoid using it for macros now that I have explained what I was talking about. (The use is tempting, because many macro calls look like procedure applications, and procedures do have arguments.) Anyway, for example, suppose the "assign" macro were used like this: "(assign 42 to my-favorite-number)".

Then, since the form of the macro is supposed to be "(assign binding to variable)", the expansion of the macro will be done on the basis that "binding" means "42", and "variable" means "my-favorite-number". That is, the code that gets generated, to be evaluated, for this use of the macro will start by using "(set! variable binding)" as kind of a template, will take "variable" to be "my-favorite-number", and will take "binding" to be "42". Thus the code that finally will be evaluated will be "(set! my-favorite-number 42)".

Note that even if there are bindings for "variable" and "binding" in some environment that the macro has access to, they will have nothing to do with how the macro is expanded in this case. We might have done all kinds of things with symbols named "variable" and "binding" while we were working up our macro definition and using it, but they would make no difference to what the macro does.
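For instance, in the following deliberately messy sketch, top-level variables named "variable" and "binding" exist alongside the macro (their values are arbitrary). They play no part in an ordinary use of "assign"; they only matter when they are written into the use itself, as in the last line:

```scheme
(define-syntax assign
  (syntax-rules (to)
    ((assign binding to variable) (set! variable binding))))

;; Top-level variables that happen to share names with the pattern variables.
(define variable 'not-a-number)
(define binding 1000)
(define my-favorite-number 0)

;; Expands into (set! my-favorite-number 42); the top-level "variable"
;; and "binding" are neither consulted nor changed.
(assign 42 to my-favorite-number)

;; Here the top-level names are the actual operands of the use, so this
;; expands into (set! variable binding) and sets "variable" to 1000.
(assign binding to variable)
```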

How that works isn't magic; it is just some clever and systematic renaming of things by "syntax-rules". I hope you can see that without the renaming, the symbols "variable" and "binding" used in the macro definition might somehow get confounded with the ones used at top level. Avoiding that kind of confusion is one of the benefits of hygienic macros.
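The renaming matters most when a macro's expansion introduces a name of its own. In this sketch, "my-swap!" and its internal name "tmp" are hypothetical, not part of Wraith Scheme. Because "syntax-rules" renames the "tmp" that the template binds, swapping a user variable that is itself called "tmp" still works:

```scheme
(define-syntax my-swap!
  (syntax-rules ()
    ((my-swap! a b)
     (let ((tmp a))     ; "tmp" is introduced by the macro, so it is renamed
       (set! a b)
       (set! b tmp)))))

(define tmp 1)
(define other 2)
(my-swap! tmp other)
;; Now tmp is 2 and other is 1.
```

Without the renaming, the expansion would read "(let ((tmp tmp)) (set! tmp other) (set! other tmp))", which changes only the local "tmp" and then copies it back into "other", leaving both top-level variables unchanged.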

There is one more symbol our "assign" macro has to deal with; namely, "set!". The rule here is that symbols that have to be looked up are looked up in the environment in which the macro was originally defined. Quite likely "set!" is the same in any environment, but the issue of where to look things up does matter. Suppose we had defined a macro to add the value of a variable called "delta" to a number. (We probably would more often do this with a procedure rather than with a macro, but suppose ...) We might have written a macro named "add-delta", whose expansion of "(add-delta 42)" is "(+ 42 delta)".

If that is all we do, we are in trouble, because we haven't defined "delta" yet: Evaluating "(add-delta 42)" at this point is an error, since "delta" has no value.

Now we can go ahead and define "delta" in the same environment -- say, with the value 13 -- and then the macro will work: "(add-delta 42)" returns 55.

Now suppose we try to evaluate the macro in some other environment that has a different "delta", with the value 99, defined: The result of "(add-delta 42)" there is still 55, not 141.

You see, we still get the value of delta from top level -- 13 instead of 99. That is what "add-delta" would do if we had defined it as a procedure, of course, and the point here is to preserve in the hygienic macro system the property of Scheme that variable bindings are determined by lexical scope, in this case the lexical scope of the place where the macro was defined. If we had been using an old-style macro system that merely expanded "(add-delta 42)" into "(+ 42 delta)", without doing any clever renaming and without keeping careful track of what environment to use, we might have gotten 99 as the value of delta instead of 13.
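The whole "add-delta" sequence can be sketched in standard R5RS Scheme as follows. The pattern variable name "x" and the use of "let" to create the "other environment" are assumptions made for illustration; the values 13 and 99 are the ones discussed above:

```scheme
;; add-delta: the template refers to "delta", which is looked up where the
;; macro is defined (top level here), not where it is used.
(define-syntax add-delta
  (syntax-rules ()
    ((add-delta x) (+ x delta))))

;; Using (add-delta 42) at this point would be an error: delta is unbound.

(define delta 13)

(define top-level-result (add-delta 42))   ; 55

;; A different "delta" in an inner scope does not capture the macro's delta.
(define inner-result
  (let ((delta 99))
    (add-delta 42)))                        ; still 55, not 141
```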

Here is an example of "define-syntax" that has two possible patterns that a macro invocation might follow. This is the actual code used in Wraith Scheme to define what "if" does:

Wraith Scheme defines "if" in terms of a lower-level primitive, "c::if", which is hard-coded -- built in. The only difference between "c::if" and "if" is that "c::if" must always have something explicit to do as an alternative if the predicate -- the test of the if -- turns out to be false. On the other hand, plain "if" may be used in a form in which there is no explicit alternative, in which case it returns #f if the test evaluates to false. The first form of "if" given in this use of "define-syntax" is the one with no alternative; it expands into "c::if" with an explicit alternative of #f inserted as required. In the second form, "if" already has an explicit alternative, and the clauses of "if" are merely copied straight over into "c::if".
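Since Wraith's definition relies on the built-in "c::if", it cannot run elsewhere. The following sketch shows the same two-pattern shape in portable R5RS Scheme, with a hypothetical "my-if" expanding into the standard "if" in place of "c::if"; it is an illustration, not Wraith's actual code:

```scheme
;; Hypothetical "my-if", built on the ordinary "if" in the way the text
;; describes Wraith's "if" being built on "c::if".
(define-syntax my-if
  (syntax-rules ()
    ;; No alternative given: supply #f explicitly.
    ((my-if test consequent)
     (if test consequent #f))
    ;; Alternative given: copy the clauses straight over.
    ((my-if test consequent alternative)
     (if test consequent alternative))))

(define r1 (my-if (> 3 2) 'yes))        ; yes
(define r2 (my-if (< 3 2) 'yes))        ; #f
(define r3 (my-if (< 3 2) 'yes 'no))    ; no
```

Standard R5RS leaves the value of a one-armed "if" with a false test unspecified, so inserting #f explicitly is what makes the result predictable here.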

There is no limit to the number of macro forms you may have in any one instance of "syntax-rules". After the list of non-evaluated symbols you may have as many two-element lists as you like, each list comprising a macro form and its expansion, provided there is at least one such two-element list.

There is a great deal of powerful syntax built into "syntax-rules", to allow you to write macros that recognize complicated expressions and expand them in complicated ways. For example, suppose we wanted to reimplement the procedure "list" by another name -- "lyst" -- using a macro created with "define-syntax". We might write that macro like this:

Two forms of the "lyst" macro are given here. The first, "(lyst)", allows for an empty list: Evaluating "(lyst)" yields the empty list, "()".

Look at the second form, though. It uses an ellipsis -- "..." -- to indicate "more of the same", just as you might use one in ordinary written language. The meaning of the second form is that "lyst" macros may have one or more arguments, one after another, and that what will be generated as the expansion of that form of the macro is a garden-variety "list" of all the arguments, in the same order.
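A minimal sketch of a "lyst" macro with the two forms described here, in standard R5RS Scheme (the pattern variable names "item" and "more" are assumptions):

```scheme
(define-syntax lyst
  (syntax-rules ()
    ((lyst) '())                       ; the empty form
    ((lyst item more ...)              ; one or more operands
     (list item more ...))))           ; expands into a plain "list" call

(define empty-result (lyst))              ; ()
(define three-result (lyst 1 2 3))        ; (1 2 3)
(define computed-result (lyst (+ 1 2) 'b)) ; (3 b): operands are evaluated
```

Because an ellipsis matches zero or more items, a single rule, "((lyst item ...) (list item ...))", would also cover the empty case; keeping two rules simply mirrors the two forms discussed in the text.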

I hope it is clear that the "..." syntax is powerful. (When I was an undergraduate at the California Institute of Technology, I heard the late Richard P. Feynman, Nobel Laureate in Physics, say that he thought the ellipsis was the most powerful mathematical notation yet invented. If so, it speaks very well of R5 Scheme to have included it in a constructor for macros.)

One limitation of "define-syntax" is that it cannot be used just anywhere -- you may only use it, more or less, in the same places where you could use "define". Fortunately, R5 Scheme defines other forms, based on "syntax-rules" in the same way that "define-syntax" is, to allow the use of macros in other places. There are two of them, "let-syntax" and "letrec-syntax", which bind macros locally, much as "let" and "letrec" bind variables. You can learn more about them in the Wraith Scheme Dictionary or in the R5 report.
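Here is a sketch of each in standard R5RS Scheme. The macro names "double" and "my-or" are hypothetical, and "my-or" follows the shape of the recursive example in the R5RS report. Neither macro exists outside the body of the form that binds it:

```scheme
;; let-syntax: a macro visible only inside the body.
(define doubled
  (let-syntax ((double (syntax-rules ()
                         ((double x) (* 2 x)))))
    (double 21)))                       ; 42

;; letrec-syntax: the macro may refer to itself, so it can recurse.
(define first-true
  (letrec-syntax ((my-or (syntax-rules ()
                           ((my-or) #f)
                           ((my-or e) e)
                           ((my-or e1 e2 ...)
                            (let ((t e1))
                              (if t t (my-or e2 ...)))))))
    (my-or #f #f 7)))                   ; 7
```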

I will close this tutorial with some final thoughts about macro use in general. There is an old saying among computer programmers: "It is possible to be entirely too clever with macros." In particular, one of the wonderful virtues of the forms based on "syntax-rules" is that they make it very easy to add entirely new syntax to the Scheme programming language. Alas, one of the terrible vices of the forms based on "syntax-rules" is that they make it very easy to add entirely new syntax to the Scheme programming language: You, and anyone else who uses the new syntax, must remember what it means and how to use it; and if you are its developer, you may want to worry about testing it thoroughly, documenting it, and making sure it does what you want, and only what you want. That is a lot of work, and a lot of things to forget if you don't do the work.

The doom and pride* of macro systems is that they allow you to change the nature of the language you are dealing with, with ease. You must decide how to make the tradeoff between too little use of macros, and too much, in the way that is best for you.

-- Jay Reynolds Freeman (Jay_Reynolds_Freeman@mac.com)

* Surely, Kipling had a well-known preprocessor in mind when he wrote, "We have fed our C ...".
