incubator-couchdb-user mailing list archives

From Calle Dybedahl <>
Subject Post-filtering reduced results?
Date Mon, 19 Sep 2011 11:27:48 GMT

I have a pretty simple pair of map and reduce functions. The map basically just emits
a key and a 1, and the reduce is the built-in _sum function. This works fine, and tells me
how many times each key has been seen.
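
In other words, the map half looks roughly like this (doc.term is a stand-in for the real field, and emit is passed in explicitly so the sketch runs outside CouchDB, which normally provides it in scope):

```javascript
// Sketch of the map function described above. The field name doc.term is
// hypothetical; the reduce side is just the built-in "_sum".
function map(doc, emit) {
  if (doc.term) {
    emit(doc.term, 1); // one row per occurrence, summed up by _sum
  }
}
```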

Now, the problem is that I'm actually only interested in the handful of keys that have been
seen most often. The data follows a power-law distribution, which means there is a long
tail that I'm not at all interested in. And by "long" I'm talking about tens of thousands
of rows. At the moment, my client-side code spends more than 99.9% of its runtime receiving
and parsing JSON from the CouchDB server, very nearly all of which it promptly throws
away as soon as it's been parsed. This is annoying and silly.
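
The filtering step itself is trivial; the cost is entirely in transferring and parsing the rows first. A sketch of the keep-the-top-n step (row shape as returned by a reduced ?group=true query; n is whatever handful I actually want):

```javascript
// Keep only the n rows with the largest reduced values; everything else
// (the long tail) has already been fetched and parsed for nothing.
function topKeys(rows, n) {
  return rows
    .slice() // don't mutate the input
    .sort(function (a, b) { return b.value - a.value; })
    .slice(0, n);
}
```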

Is there any way at all to filter the results of a reduced query on the CouchDB end? Alternatively,
is there a way for a reduce function to know that it's the final stage in the re-reduce chain
(if I could drop all keys with a final value of 1, I'd save an order of magnitude of runtime)?
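
For context, a reduce function only receives (keys, values, rereduce); a hand-written equivalent of the built-in _sum makes plain that there's no "final pass" signal to act on:

```javascript
// Equivalent of the built-in _sum reduce, written out by hand. It works for
// the first pass (values are the emitted 1s) and for re-reduce passes
// (values are partial sums) alike — but nothing tells it whether this
// invocation is the last one in the chain.
function reduceCount(keys, values, rereduce) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}
```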

I can't be the first one ever to run into a problem like this, but I've failed to find any
solutions on the net.
Calle Dybedahl -*- +46 703 - 970 612
