one racketeer

Friday, September 09, 2011

syntax-parse and literals

In my last post, I talked about macros and referential auxiliary identifiers—what we usually call a macro’s “literals.” Scheme macro systems only get it half right, though, because while they compare identifiers using referential equality (i.e., using the free-identifier=? predicate), they allow literals to refer to nonexistent bindings. While the comparison is well-defined via the definition of free-identifier=?, at a higher level the idea is nonsensical.

In contrast, syntax-parse requires that every literal refer to some binding. (I’ll sometimes refer to this requirement as the is-bound property for short.) This requirement is problematic in a different way. Specifically, this property cannot be checked statically (that is, when the syntax-parse expression containing the literal is compiled).

That might strike you as bizarre or unlikely. After all, you can easily imagine checking that a syntax-rules macro, say, satisfies the is-bound property. But in Racket, not every macro uses syntax-rules, and—more importantly—not every bit of syntax-analyzing code is a macro. And both of these facts have to do with phases.

macros and literals

Macros often have associated auxiliary identifiers (sometimes called keywords or reserved words, although both terms are problematic in Racket). For example, cond has else; class has public, private, etc; unit has import and export.

The fundamental question is what constitutes a use of an auxiliary identifier, and there are two reasonable answers: symbolic equality and referential equality. By symbolic equality I mean, for example, that any identifier written with exactly the letters else is accepted as an else auxiliary form. By referential equality I mean any identifier that refers to (using the standard notions of binding, environments, etc) the binding identified as the else binding.

asynchonous execution for databases, using places

I added asynchronous execution to my database library yesterday using Racket's places. The coding part took about an afternoon and part of an evening. The new code is a bit less than 300 lines, most of which is boring serialization and deserialization code, some of which will go away soon.

My database library contains two wire-protocol connection implementations (for PostgreSQL and MySQL) and two FFI-based connection implementations (for SQLite and ODBC). The wire-protocol implementations are more work, but they just use I/O ports, and Racket handles I/O pretty well. On the other hand, the entire Racket VM stops during an FFI call, because Racket threads are green threads.

Having all threads stop execution for FFI calls isn't much of a problem if the FFI calls are all short. If the FFI call is "execute this SQL statement," on the other hand, that can cause serious problems with responsiveness. (Of course, it still depends on how long the SQL statement in question takes to execute.)

ODBC provides the ability to execute some operations asynchronously—in theory. In practice, of all the drivers I had available on my development machines, only the DB2 driver actually supported asynchronous execution. Furthermore, the way one performs an asynchronous call—repeatedly calling a function with identical arguments until it returns something different—plays poorly with GC'd languages, where keeping memory locations identical from call to call requires more effort than it does in, say, C. In short, ODBC's asynchronous execution doesn't solve the interactivity problem.

Racket actually has multiple kinds of concurrency. In addition to (green) threads, Racket also has "futures" (true concurrency if it's not too much trouble, everything shared) and "places" (true concurrency for sure, almost nothing shared, message passing). You can't send higher-order data (functions, objects, etc) between places (rather, you would have to be clever about it), but database connections traffic in mostly first-order data structures, so it's relatively easy to create a connection proxy that dispatches to a real database connection running in a difference place.

The one exception, the single kind of higher-order data used by connections, is the prepared statement object. But it's possible to proxy those using a hash table and finalizers. (Actually, prepared statements use finalizers already to clean up resources, and the clean-up code is in the connection class, so I didn't even need to create a new prepared statement class.)

Saturday, April 08, 2006

macros, parameters; binding, and reference

Danny Yoo had an interesting question on the plt-scheme mailing list recently. At first it seemed like your standard non-hygienic, “I want this to mean something in here” macro question. I used to group macros into three levels of “hygienicness”: the hygienic ones, the ones that are morally hygienic in that the names they introduce are based on their input, and the totally non-hygienic ones that have a fixed name that they stick into the program. An example of the middle set is define-struct, and an example of the third set is a loop construct that binds the name yield in its body.

The third class used to offend me from a purist’s (semi-purist?) perspective. But it’s a very reasonable thing to want to do. Consider the class macro and the names it uses to do interesting things: super, public, field, init, and so on. It depends on those particular names.

Danny Yoo was writing a generator library. He had a define-generator form, used like this:

(define-generator (name . args) . body)

and he wanted yield to have a particular meaning inside of the generator body. He had used the usual non-hygienic technique of creating the right yield identifier using datum->syntax, but he was asking for other ideas.

Dave commented that non-hygienic macros typically do not play nicely together; that is, it can be hard to write other macros that expand into them, because you have to think about what version of the code you want to bind the variable in, and you can’t always tell... it’s a mess. Dave recommended creating two versions of the macro: a non-hygienic front-end that forged the yield identifier, and a hygienic back-end that did the actual implementation. People who wanted to further abstract over generator definitions could use the hygienic version.

But there’s another way to look at it, and that’s what I replied to the mailing list. It got me thinking about the similarities between macros and normal programming, and the different techniques we use.