one racketeer: 2011

Monday, October 31, 2011

in praise of PostgreSQL arrays

I just added support for PostgreSQL arrays to the db library. While there are some uses of arrays that are iffy from a database design standpoint, there’s one use that weighs overwhelmingly in their favor: avoiding dynamic generation of SQL IN comparisons.
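Here is a rough sketch of the difference, not code from the library itself. The `people` table, its columns, and the helper names `in-query-sql` and `names-for-ids` are made up for illustration, and `conn` is assumed to be an open PostgreSQL connection. The array conversion functions come from `db/util/postgresql`.

```racket
#lang racket
(require db db/util/postgresql)

;; The approach arrays let you avoid: one placeholder per element, so the
;; SQL text changes with the length of the list.
(define (in-query-sql n)
  (format "SELECT name FROM people WHERE id IN (~a)"
          (string-join (for/list ([i (in-range 1 (add1 n))])
                         (format "$~a" i))
                       ", ")))

;; With an array parameter the SQL text is fixed, whatever the list length.
(define any-query-sql
  "SELECT name FROM people WHERE id = ANY($1)")

;; conn is assumed to be an open PostgreSQL connection;
;; the people table is hypothetical.
(define (names-for-ids conn ids)
  (query-list conn any-query-sql (list->pg-array ids)))
```

Because the statement text never changes, it can be prepared once and reused for any number of ids.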

the Soylent Green Theory of Presentations

It’s people.... The audience is made of people!

lazy module loading

The Racket module system is good at managing dependencies. When you require a module, you ensure that that module is initialized before your code runs, and when the other module changes, the compiler will notice and recompile your module too. Racket even stratifies dependencies according to phase levels so you can use some modules in your macro implementations and other modules in your run-time code and the expander/compiler/linker knows what you want when. It keeps track and makes sure that everything is loaded and available when it’s supposed to be.

But sometimes you want to manage dependencies yourself. This post is about how to lazily load the implementations of functions and—with a bit of care—even macros.
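For the function case, a minimal sketch using the `lazy-require` form from `racket/lazy-require` might look like the following. Here `racket/string` just stands in for a module that is expensive to load, and `show-tags` is a made-up helper.

```racket
#lang racket/base
(require racket/lazy-require)

;; racket/string stands in for an expensive-to-load module. It is not
;; required up front; string-join is bound to a function that loads the
;; module on its first call.
(lazy-require [racket/string (string-join)])

;; Hypothetical helper: the lazy load happens the first time this runs.
(define (show-tags tags)
  (string-join tags ", "))
```

Only functions can be imported this way, since the binding has to exist before the module that implements it is loaded.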

definitions vs enclosing binding forms

There are two kinds of binding forms in Racket: definitions and enclosing binding forms. The scope of a binding introduced by an enclosing binding form is entirely evident: it’s one (or more) of the form’s sub-terms. For example, in

(lambda (var ...) body)

the scope of the var bindings is body. In contrast, the scope of a definition is determined by its context: the enclosing lambda body, for example, or the enclosing module—except that scope is too simple a term for how bindings work in such contexts. Enclosing binding forms are simpler and cleaner but weaker; definition forms are more powerful, but have a more complicated binding structure. Definitions also have the pleasant property of reducing rightward code drift.
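A small illustration of both points, with made-up function names: the same computation written with nested enclosing forms and with definitions, plus a pair of mutually recursive local definitions whose bindings are visible in each other's bodies.

```racket
#lang racket/base

;; Enclosing binding forms: each binding's scope is a sub-term,
;; and the code drifts to the right.
(define (hypot/let x y)
  (let ([x2 (* x x)])
    (let ([y2 (* y y)])
      (let ([sum (+ x2 y2)])
        (sqrt sum)))))

;; Definitions: the scope comes from the enclosing body, and the code stays flat.
(define (hypot/define x y)
  (define x2 (* x x))
  (define y2 (* y y))
  (define sum (+ x2 y2))
  (sqrt sum))

;; Definitions in one body can refer to each other, which is why
;; "scope" is too simple a word for what is going on.
(define (parity n)
  (define (ev n) (if (zero? n) 'even (od (sub1 n))))
  (define (od n) (if (zero? n) 'odd (ev (sub1 n))))
  (ev n))
```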

syntax-parse and literals

In my last post, I talked about macros and referential auxiliary identifiers—what we usually call a macro’s “literals.” Scheme macro systems only get it half right, though, because while they compare identifiers using referential equality (i.e., using the free-identifier=? predicate), they allow literals to refer to nonexistent bindings. While the comparison is well-defined via the definition of free-identifier=?, at a higher level the idea is nonsensical.

In contrast, syntax-parse requires that every literal refer to some binding. (I’ll sometimes refer to this requirement as the is-bound property for short.) This requirement is problematic in a different way. Specifically, this property cannot be checked statically (that is, when the syntax-parse expression containing the literal is compiled).

That might strike you as bizarre or unlikely. After all, you can easily imagine checking that a syntax-rules macro, say, satisfies the is-bound property. But in Racket, not every macro uses syntax-rules, and—more importantly—not every bit of syntax-analyzing code is a macro. And both of these facts have to do with phases.
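As a concrete reference point, here is a minimal sketch of a macro whose `syntax-parse` expression declares `else` as a literal. The macro name `my-if` is invented. The literal satisfies the is-bound property here only because `racket/base` happens to provide a binding for `else` at the phase where the macro's uses are compared.

```racket
#lang racket/base
(require (for-syntax racket/base syntax/parse))

;; my-if is a made-up macro. The else literal must refer to some binding;
;; here that is the else exported by racket/base.
(define-syntax (my-if stx)
  (syntax-parse stx
    #:literals (else)
    [(_ test then-expr else else-expr)
     #'(if test then-expr else-expr)]))
```

Writing `(my-if #f 1 else 2)` then produces `2`.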

macros and literals

Macros often have associated auxiliary identifiers (sometimes called keywords or reserved words, although both terms are problematic in Racket). For example, cond has else; class has public, private, etc.; unit has import and export.

The fundamental question is what constitutes a use of an auxiliary identifier, and there are two reasonable answers: symbolic equality and referential equality. By symbolic equality I mean, for example, that any identifier written with exactly the letters else is accepted as an else auxiliary form. By referential equality I mean any identifier that refers to (using the standard notions of binding, environments, etc.) the binding identified as the else binding.
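The two answers can be told apart by shadowing `else` locally. The sketch below uses two invented macros, `ref-else?` and `sym-else?`, written with `syntax-parse` purely for illustration: `#:literals` compares by binding, `#:datum-literals` compares by name.

```racket
#lang racket/base
(require (for-syntax racket/base syntax/parse))

;; Referential equality: the argument must refer to racket/base's else.
(define-syntax (ref-else? stx)
  (syntax-parse stx
    #:literals (else)
    [(_ else) #'#t]
    [(_ _) #'#f]))

;; Symbolic equality: any identifier spelled e-l-s-e is accepted.
(define-syntax (sym-else? stx)
  (syntax-parse stx
    #:datum-literals (else)
    [(_ else) #'#t]
    [(_ _) #'#f]))

(ref-else? else)                   ; #t
(sym-else? else)                   ; #t
(let ([else 5]) (ref-else? else))  ; #f, since else now refers to the local variable
(let ([else 5]) (sym-else? else))  ; #t, since only the spelling matters
```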

asynchronous execution for databases, using places

I added asynchronous execution to my database library yesterday using Racket's places. The coding part took about an afternoon and part of an evening. The new code is a bit less than 300 lines, most of which is boring serialization and deserialization code, some of which will go away soon.

My database library contains two wire-protocol connection implementations (for PostgreSQL and MySQL) and two FFI-based connection implementations (for SQLite and ODBC). The wire-protocol implementations are more work, but they just use I/O ports, and Racket handles I/O pretty well. On the other hand, the entire Racket VM stops during an FFI call, because Racket threads are green threads.

Having all threads stop execution for FFI calls isn't much of a problem if the FFI calls are all short. If the FFI call is "execute this SQL statement," on the other hand, that can cause serious problems with responsiveness. (Of course, it still depends on how long the SQL statement in question takes to execute.)

ODBC provides the ability to execute some operations asynchronously—in theory. In practice, of all the drivers I had available on my development machines, only the DB2 driver actually supported asynchronous execution. Furthermore, the way one performs an asynchronous call—repeatedly calling a function with identical arguments until it returns something different—plays poorly with GC'd languages, where keeping memory locations identical from call to call requires more effort than it does in, say, C. In short, ODBC's asynchronous execution doesn't solve the interactivity problem.

Racket actually has multiple kinds of concurrency. In addition to (green) threads, Racket also has "futures" (true concurrency if it's not too much trouble, everything shared) and "places" (true concurrency for sure, almost nothing shared, message passing). You can't send higher-order data (functions, objects, etc.) between places (rather, you would have to be clever about it), but database connections traffic in mostly first-order data structures, so it's relatively easy to create a connection proxy that dispatches to a real database connection running in a different place.

The one exception, the single kind of higher-order data used by connections, is the prepared statement object. But it's possible to proxy those using a hash table and finalizers. (Actually, prepared statements use finalizers already to clean up resources, and the clean-up code is in the connection class, so I didn't even need to create a new prepared statement class.)
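The proxy idea can be sketched in a few lines. This is not the library's code: the message shapes and the names `start-connection-place` and `proxy-query` are invented, and the "query" just upcases its input where a real worker would make the blocking FFI call.

```racket
#lang racket/base
(require racket/place)

;; Start a place standing in for the real connection. Only first-order
;; messages (lists, symbols, strings) cross the place boundary.
(define (start-connection-place)
  (place ch
    (let loop ()
      (define msg (place-channel-get ch))
      (case (car msg)
        [(query)
         ;; A real worker would make the blocking FFI call here.
         (place-channel-put ch (list 'rows (string-upcase (cadr msg))))
         (loop)]
        [(disconnect)
         (place-channel-put ch '(ok))]))))

;; The proxy side: send a request and wait for the reply. Waiting blocks
;; only the calling Racket thread, not the whole VM.
(define (proxy-query pl sql)
  (place-channel-put/get pl (list 'query sql)))

(module+ main
  (define pl (start-connection-place))
  (proxy-query pl "select 1")             ; '(rows "SELECT 1")
  (place-channel-put/get pl '(disconnect))
  (place-wait pl))
```

The demo lives in a `main` submodule because a new place instantiates the enclosing module; starting places from the module body itself would spawn them recursively.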

Monday, October 31, 2011

in praise of PostgreSQL arrays

Thursday, October 27, 2011

the Soylent Green Theory of Presentations

Sunday, October 16, 2011

lazy module loading

Tuesday, September 27, 2011

definitions vs enclosing binding forms

Friday, September 09, 2011

syntax-parse and literals

Wednesday, September 07, 2011

macros and literals

Wednesday, August 31, 2011

asynchronous execution for databases, using places