incubator-couchdb-user mailing list archives

From Brian Candler <B.Cand...@pobox.com>
Subject Re: Reduce to nothing
Date Fri, 06 Feb 2009 07:55:37 GMT
On Thu, Feb 05, 2009 at 08:44:16AM -0600, Jeremy Wall wrote:
> Is it possible to reduce a key to nothing, i.e. completely remove a
> key from the reduction result?
> 
> For instance, say you post three documents:
> 
>    {"_id": "thing1", "type": "thing"}
>    {"_id": "thing2", "type": "thing"}
>    {"_id": "...", "type": "cancellation", "cancels": "thing1"}
> 
> It's trivial to produce a map function that collates the "thing" and
> "cancellation" documents. However, I can't work out how, or even if
> it's possible, to reduce the view so that only "thing2" remains.

Nearest I can think of is to collate your view such that the cancellation
comes immediately after the thing:

  ["thing1","thing"]
  ["thing1","cancellation"]

Then the client can see that these two are adjacent and easily check if the
item has been cancelled.
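
A minimal sketch of that client-side check, in plain JavaScript so it can
be run outside CouchDB. Several things here are assumptions for
illustration: the global `emit` stand-in and the `rows` array imitate the
view server, the sort only approximates CouchDB's key collation, the
cancellation's doc id "c1" is made up, and `liveThings` is a hypothetical
helper name. The second key element is 0 for a thing and 1 for a
cancellation rather than the strings shown above, because with plain
strings "cancellation" would collate before "thing".

```javascript
// Stand-in for CouchDB's view server: collects emitted rows (hypothetical).
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// Map function in the form CouchDB expects: function (doc) { ... emit(...) }.
function map(doc) {
  if (doc.type === "thing") {
    emit([doc._id, 0], null);       // 0 sorts before 1, so the thing comes first
  } else if (doc.type === "cancellation") {
    emit([doc.cancels, 1], null);   // keyed by the id of the thing it cancels
  }
}

const docs = [
  { _id: "thing1", type: "thing" },
  { _id: "thing2", type: "thing" },
  { _id: "c1", type: "cancellation", cancels: "thing1" }, // "c1" is made up
];
docs.forEach(map);

// Approximate the view's key ordering for these simple keys.
rows.sort((a, b) =>
  a.key[0] < b.key[0] ? -1 : a.key[0] > b.key[0] ? 1 : a.key[1] - b.key[1]
);

// Client side: a thing is live unless the very next row is its cancellation.
function liveThings(sortedRows) {
  const live = [];
  sortedRows.forEach((row, i) => {
    if (row.key[1] !== 0) return;   // skip cancellation rows
    const next = sortedRows[i + 1];
    const cancelled =
      next !== undefined && next.key[0] === row.key[0] && next.key[1] === 1;
    if (!cancelled) live.push(row.key[0]);
  });
  return live;
}

console.log(liveThings(rows)); // [ 'thing2' ]
```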

Unfortunately, you can't rely on this in the reduce function, because
sometimes the thing will be in one block of keys/values and the cancellation
will be in another.

If the purpose of your reduce view is only to _count_ how many live things
you have, then you could map:

  ["thing1",1]        # thing
  ["thing1",-1]       # cancellation

and then sum. This won't be right if a thing can have multiple
cancellations, but you could avoid that by choosing a doc id naming
convention for cancellations (e.g. "thing1_cancel") so that each thing can
have at most one. In any case, if you return a grouped reduce and see a
value of zero or less for a particular key, you know it has been cancelled.
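
A rough sketch of the counting idea, again runnable outside CouchDB. The
reduce function uses CouchDB's (keys, values, rereduce) signature;
`groupedReduce` is a hypothetical stand-in for querying the view with
group=true, and the `mapped` rows are what I assume the map step would
emit for the three example documents. With one thing and one cancellation
the grouped value for that key comes out as 0.

```javascript
// Reduce function in CouchDB's (keys, values, rereduce) form.
// A plain sum gives the same answer on the reduce and rereduce passes.
function sumReduce(keys, values, rereduce) {
  return values.reduce((a, b) => a + b, 0);
}

// Rows as the map step might emit them: key is the thing's id,
// value is +1 for the thing itself and -1 for a cancellation.
const mapped = [
  { key: "thing1", value: 1 },
  { key: "thing1", value: -1 },
  { key: "thing2", value: 1 },
];

// Hypothetical stand-in for group=true: one reduce per distinct key.
function groupedReduce(rowsIn) {
  const byKey = new Map();
  for (const { key, value } of rowsIn) {
    if (!byKey.has(key)) byKey.set(key, []);
    byKey.get(key).push(value);
  }
  const out = {};
  for (const [key, values] of byKey) {
    out[key] = sumReduce(null, values, false); // keys argument unused by a sum
  }
  return out;
}

const grouped = groupedReduce(mapped);
const liveCount = sumReduce(null, mapped.map((r) => r.value), false);

console.log(grouped);   // { thing1: 0, thing2: 1 }
console.log(liveCount); // 1
```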

> I tried not returning anything, just in case it worked ;-), but got a
> JSON encoding error (can't encode undefined, iirc).

You can encode null, though.

> However, I wondered if the more "normal" approach of
> allowing a reduce function to emit zero or more (key, value) pairs
> would be even better?

I think reduce functions have to return a single value - at least, all the
ones I've seen do this. IIUC, all the k/v pairs pointed to by a single
b-tree node are reduced to one value, which is stored within the same b-tree
node. Then the parent b-tree nodes contain the reduction of their children.
The root node contains the reduction of everything to a single value, and
this is what you get if you query without group=true. If you query with
startkey and endkey then the reduce value is recalculated across the range
of keys you specify.

So a reduce function is not a filter on map output, but an aggregation /
summarisation function.
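
To illustrate, here is a toy model of that b-tree behaviour as I
understand it; it is not CouchDB's actual implementation. The split of
rows into `leafBlocks` is invented so that "thing1" and its cancellation
land in different blocks, which is the case where a reduce function
cannot see both at once. Each block is reduced to one value, and those
values are then combined with rereduce set to true.

```javascript
// Sum reduce in CouchDB's (keys, values, rereduce) form.
function sumReduce(keys, values, rereduce) {
  return values.reduce((a, b) => a + b, 0);
}

// Invented split of the view rows across two leaf nodes: the thing and
// its cancellation end up in different blocks.
const leafBlocks = [
  [{ key: "thing1", id: "thing1", value: 1 }],
  [
    { key: "thing1", id: "c1", value: -1 }, // "c1" is a made-up doc id
    { key: "thing2", id: "thing2", value: 1 },
  ],
];

// First pass: each leaf block is reduced on its own (rereduce = false).
// On this pass CouchDB passes keys as [key, docid] pairs.
const leafValues = leafBlocks.map((block) =>
  sumReduce(
    block.map((r) => [r.key, r.id]),
    block.map((r) => r.value),
    false
  )
);

// Second pass: the parent combines its children's values (rereduce = true),
// and keys is null.
const rootValue = sumReduce(null, leafValues, true);

console.log(leafValues); // [ 1, 0 ]
console.log(rootValue);  // 1
```

Neither call ever sees "thing1" together with its cancellation, yet the
sum still comes out right because addition does not care how the rows are
grouped. A reduce that tried to drop cancelled keys would have no such
guarantee.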

Regards,

Brian.
