Monday, October 31, 2011

in praise of PostgreSQL arrays

I just added support for PostgreSQL arrays to the db library. While there are some uses of arrays that are iffy from a database design standpoint, there’s one use that weighs overwhelmingly in their favor: avoiding dynamic generation of SQL IN comparisons.

Everyone knows not to “parameterize” SQL code by smashing strings together, right?

That is, don’t do this:

(define (get-history name)
(query-rows the-db
(format "SELECT url FROM fb.global_web_tracker WHERE name='~a'"

Instead, do this:

(define (get-history name)
(query-rows the-db
"SELECT url FROM fb.global_web_tracker WHERE name=$1"

But what if we want to search multiple names simultaneously?

If we know the names statically, we can use IN:

SELECT url FROM fb.global_web_tracker

WHERE name IN ('Alice', 'Bob')

And if the names aren’t fixed and known in advance? Looks like a job for string smashing ... (but don’t!)

A clean, but awful, alternative is to create a temporary table, insert each of the strings into it, perform the query with a sub-select or join on the temporary table, then drop the temporary table. This works, and it avoids the possibility of SQL injection. But isn’t the point of a declarative language (like SQL, ostensibly) to be able to talk about information without having to first tell the system where to put it?

In PostgreSQL, the array solution is clean and simple, although the syntax is a bit peculiar:

SELECT url FROM fb.global_web_tracker

WHERE name = ANY ($1)

The parameter $1 here must by an array, probably with a type like TEXT[], assuming name is TEXT. The parentheses around $1 are significant.

Here’s what the code looks like using the recently-updated db library:

(define (get-history names)
(query-rows the-db
"SELECT url FROM fb.global_web_tracker WHERE name = ANY ($1)"
(get-history (list "Alice" "Bob"))

In general, array values are actually represented with a dedicated structure type, pg-array, that accommodates multi-dimensional arrays, non-standard starting indexes, etc. But lists are automatically converted, for the sake of simplicity.

Very preliminary research suggests that arrays might be mentioned in one of the later SQL standards. MySQL and SQLite don't support them, though. Oracle and DB2 seem to have some sort of support, but getting to it through ODBC would be tricky, if it’s even possible. So for now, advantage PostgreSQL.


Anonymous said...

Re the final paragraph "MySQL and SQLite support them, though" did you mean to say "do not support"?

Ryan Culpepper said...

@Anonymous: Yes, thanks for catching that. I've edited the post.