Example of a custom aggregate in PostgreSQL

Yesterday I switched my development environment to PostgreSQL 8.4, and so today I foolishly used the PostgreSQL 8.4 manual while I was developing, without thinking that I might be using some new functionality. Silly me!

What I wanted to do was to convert a column of words into a comma-delimited list (for readability, not for export), to get output something like this:

 id  |                   tags
-----+----------------------------------------------
 141 | DAViCal, FOSS, Programming, CalDAV, Releases
 138 | Family, Life, Kids
 137 | Kids, Family, Rants
 136 | Life, FOSS, Debian, lca
 135 | Releases, FOSS, Packages, Debian, DAViCal

Where the table has two columns 'id' and 'tag', like:

 id  |     tag     
-----+-------------
 141 | Releases
 141 | Programming
 141 | CalDAV
 141 | FOSS
 141 | DAViCal
 138 | Kids
 138 | Life
 138 | Family
 137 | Kids
 137 | Family
 137 | Rants
 136 | Debian
 136 | lca
 136 | Life
 136 | FOSS
 135 | Packages
 135 | Releases
 135 | DAViCal
 135 | Debian
 135 | FOSS

I looked at this and thought: that's just the job for an aggregate function! It's like sum(), except it concatenates!

Looking through the PostgreSQL 8.4 manual I discovered an aggregate function called array_agg() which didn't quite do what I want, but it was willing to convert the column into an array. Looking a little further I found array_to_string() which did the rest of the job for me, so I ended up with syntax like this:

SELECT id, array_to_string(array_agg(tag),', ') FROM tagging GROUP BY id

Perfect!

Or so it seemed, until I rolled it into my staging site and discovered that array_agg() is a new function... It seemed like such an obvious thing to want to do that I had assumed it must have been around forever!

So: back to the drawing board, where I wrote a

list_of( TEXT )

aggregate as follows:

CREATE OR REPLACE FUNCTION concat( TEXT, TEXT ) RETURNS TEXT AS $$
  SELECT $1 || (CASE WHEN $1 = '' THEN '' ELSE ', ' END) || $2;
$$
LANGUAGE SQL STRICT IMMUTABLE;

CREATE AGGREGATE list_of ( TEXT ) (
    SFUNC = concat,
    STYPE = TEXT,
    INITCOND = ''
);

First I had to create the concat(TEXT,TEXT) function to concatenate each value onto the earlier ones, and then I had to create the list_of(TEXT) aggregate using that function. In the end it was surprisingly easy, so I may well find myself creating custom aggregates for things like this more in the future. Most of my time was, in fact, spent puzzling through the manual trying to figure it out.

With it all created I can now use the following SQL in my code:

SELECT id, list_of(tag) FROM tagging GROUP BY id

Perfect! Well, nearly :-)

Having got to this point it seemed to me that I should be able to specify a different join text, as I could when I was using the array_to_string() function. In fact this proved not to add much complexity to the problem at all, as follows:

CREATE OR REPLACE FUNCTION concat_delimited( TEXT, TEXT, TEXT ) RETURNS TEXT AS $$
  SELECT $1 || (CASE WHEN $1 = '' THEN '' ELSE $3 END) || $2;
$$
LANGUAGE SQL STRICT IMMUTABLE;

CREATE AGGREGATE delimited_list_of ( TEXT, TEXT ) (
    SFUNC = concat_delimited,
    STYPE = TEXT,
    INITCOND = ''
);

Now I can use the following SQL, allowing me to specify the delimiter:

SELECT id, delimited_list_of(tag, ' = ') FROM tagging GROUP BY id

And lo:

 id  |                delimited_list_of                 
-----+--------------------------------------------------
 141 | Releases = Programming = CalDAV = FOSS = DAViCal
 138 | Kids = Life = Family
 137 | Kids = Family = Rants
 136 | Debian = lca = Life = FOSS
 135 | Packages = Releases = DAViCal = Debian = FOSS

I've created the functions as 'IMMUTABLE', since I know that whatever gets passed into concat(TEXT,TEXT) or concat_delimited(TEXT,TEXT,TEXT) the output will always be the same for that set of arguments.

I've also created the functions as 'STRICT', since that's what I tend to do by default, but in this case it is the right thing too. The function does not need to handle NULL values explicitly since the aggregate function won't be called for NULL values, and the existing state will be retained for the next non-NULL call.

We can confirm that, too:

INSERT INTO tagging(id,tag) VALUES(137,NULL);
SELECT id, delimited_list_of(tag, ' = ') FROM tagging GROUP BY id;

 id  |                delimited_list_of                 
-----+--------------------------------------------------
 141 | Releases = Programming = CalDAV = FOSS = DAViCal
 138 | Kids = Life = Family
 137 | Kids = Family = Rants
 136 | Debian = lca = Life = FOSS
 135 | Packages = Releases = DAViCal = Debian = FOSS

Perfect!

[D] [Digg] [FB] [R] [SU] [Tweet]