Wednesday, September 07, 2011

macros and literals

Macros often have associated auxiliary identifiers (sometimes called keywords or reserved words, although both terms are problematic in Racket). For example, cond has else; class has public, private, etc.; unit has import and export.

The fundamental question is what constitutes a use of an auxiliary identifier, and there are two reasonable answers: symbolic equality and referential equality. By symbolic equality I mean, for example, that any identifier written with exactly the letters else is accepted as an else auxiliary form. By referential equality I mean that only an identifier that refers to (using the standard notions of binding, environments, etc.) the binding identified as the else binding is accepted.

The advantage of symbolic equality is its simplicity. The problem is that it conflicts with other kinds of terms. The most common use of auxiliary identifiers is to distinguish special sub-forms that require special interpretation from standard forms such as expressions or definitions. For example, the first sub-term of a cond clause is either the auxiliary else or an expression; a class-body form is either a public form, a private form, ..., or a definition or expression. But a variable can be named else, and a function can be named private... except that with symbolic auxiliaries, in some contexts a reference to such a variable or function will be given a drastically different interpretation, just because of its name.
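
To make the problem concrete, here is a minimal sketch of a symbolic auxiliary. The macro my-cond is a hypothetical, stripped-down cond written only for illustration; it uses syntax-parse's #:datum-literals option, which compares identifiers by spelling alone.

```racket
#lang racket/base
(require (for-syntax racket/base syntax/parse))

;; my-cond is a hypothetical, simplified cond that recognizes `else`
;; purely by its spelling (symbolic equality).
(define-syntax (my-cond stx)
  (syntax-parse stx
    #:datum-literals (else)
    [(_) #'(void)]
    [(_ [else result:expr]) #'result]
    [(_ [test:expr result:expr] clause ...)
     #'(if test result (my-cond clause ...))]))

;; Here `else` is an ordinary variable bound to #f, yet my-cond still
;; treats it as the catch-all marker because of how it is spelled.
(define symbolic-result
  (let ([else #f])
    (my-cond [else 'taken-anyway])))
```

The variable's value is never consulted: symbolic-result is 'taken-anyway even though the variable named else holds #f.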

We like to think that names don’t matter, as long as they’re used consistently and don’t collide with other names in use. And, crucially, our notion of collision is based on binding. The problem with symbolic auxiliaries is that they are ghosts; they don’t collide with other bindings, but they still change the interpretation of an identifier in some contexts (but not others).

The alternative is referential auxiliaries, in which the special interpretation of the auxiliary identifier is tied to a binding. The virtue of this approach is that it relies on the standard mechanisms of scoping. An identifier cannot refer to both a variable binding and the else auxiliary in the same scope. Having a dedicated binding means the notion of else has an existence more concrete than just “a special interpretation that cond gives to identifiers spelled a certain way”; one consequence is that else can be documented using Racket’s binding-based documentation system.
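
As a rough sketch of the referential approach, the auxiliary can be given its own binding and matched with syntax-parse's #:literals option. The names my-cond and otherwise are made up for this example and are not part of any library.

```racket
#lang racket/base
(require (for-syntax racket/base syntax/parse))

;; `otherwise` is a hypothetical auxiliary with a dedicated binding.
;; Using it outside my-cond is a syntax error, not a variable reference.
(define-syntax (otherwise stx)
  (raise-syntax-error #f "allowed only as the last clause of my-cond" stx))

;; my-cond recognizes `otherwise` by binding (referential equality).
(define-syntax (my-cond stx)
  (syntax-parse stx
    #:literals (otherwise)
    [(_) #'(void)]
    [(_ [otherwise result:expr]) #'result]
    [(_ [test:expr result:expr] clause ...)
     #'(if test result (my-cond clause ...))]))

;; Refers to the auxiliary binding, so it acts as the catch-all clause.
(define with-auxiliary
  (my-cond [#f 'first] [otherwise 'fallback]))

;; Refers to a local variable instead, so it is just a test expression.
(define with-variable
  (let ([otherwise #f])
    (my-cond [otherwise 'fallback] [#t 'second])))
```

In this sketch with-auxiliary is 'fallback and with-variable is 'second: the same spelling gets two interpretations, but ordinary scoping decides which one applies.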

Auxiliary bindings collide like (and with) ordinary bindings, and these collisions can be resolved using standard namespace management tools such as import renaming (rename-in). Alas, there is no way to protect a referential auxiliary identifier from being shadowed, but at least it behaves consistently: shadow else and it no longer acts like else, but after all, shadow lambda and it no longer acts like lambda. I do wish there were a way to mark certain names unshadowable, though; I think it would be a net win for usability.
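
Both points can be tried with Racket's own cond. The following is a small sketch; the alias otherwise and the definition names are arbitrary choices made for the example.

```racket
#lang racket/base
;; Import racket/base's `else` under the additional name `otherwise`.
(require (rename-in racket/base [else otherwise]))

;; `otherwise` refers to the same binding as `else`, so cond accepts it.
(define renamed
  (cond [#f 'first] [otherwise 'fallback]))

;; Shadowing `else` turns it into an ordinary variable (here #f), so the
;; first clause is a failing test and the second clause is reached.
(define shadowed
  (let ([else #f])
    (cond [else 'not-reached] [#t 'second])))
```

Here renamed is 'fallback and shadowed is 'second.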

(Note that all of these arguments in favor of referential auxiliary identifiers also apply to syntax-parameters as a superior alternative to unhygienic binding. See my old blog post on the subject.)

Scheme uses referential equality for macro “literals lists” (used to recognize auxiliary identifiers). Unfortunately, it allows literals that don’t refer to any binding, creating another kind of ghostly identifier interpretation: it collides with nothing but can be overridden by any other binding that comes along.
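
A small sketch of such a ghost, using syntax-rules in Racket. The macro classify and the literal then are invented for this example, and then is deliberately left without any binding.

```racket
#lang racket/base
(require (for-syntax racket/base))

;; `then` has no binding in this module, but syntax-rules still accepts
;; it in the literals list.
(define-syntax classify
  (syntax-rules (then)
    [(_ then) 'literal]
    [(_ other) 'expression]))

;; Unbound here as well, so it matches the unbound literal.
(define unbound-use (classify then))

;; Any binding that comes along overrides the literal interpretation.
(define bound-use
  (let ([then 5])
    (classify then)))
```

Under these assumptions unbound-use is 'literal while bound-use is 'expression, although nothing ever collided with then.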

When I created syntax-parse, I made it an error to specify a literal identifier that had no corresponding binding. This has its own difficulties, however, which I will describe in a separate post.
