incubator-couchdb-user mailing list archives

From "Chris Anderson" <jch...@apache.org>
Subject Re: Top 10 sorted by value...
Date Wed, 01 Oct 2008 03:04:03 GMT
On Tue, Sep 30, 2008 at 5:54 PM, kowsik <kowsik@gmail.com> wrote:
> I have a data set where my map reduce returns data like so:
>
> { key: "a", value: 10 },
> { key: "b", value: 12 },
> { key: "c", value: 1 },
> ...
>
> Potentially this could be fairly large, even after the reduce. Is
> there any way to:
>
> 1. sort these by descending values? I only want the top 10.
> 2. page through them in the sorted order?
>

If I understand correctly, you've got a reduce function, and the
results you showed are from querying with group=true.
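Rows from a group=true query come back ordered by key, so an exact top N has to be picked out on the client. A minimal sketch, assuming the grouped response has already been fetched and decoded into a list of dicts with "key" and "value" fields; `top_n_by_value` is a hypothetical helper, not part of CouchDB or any client library. It is exact, but it still reads every grouped row, which is the expensive part when the reduce output is large.

```python
import heapq
from typing import Any, Dict, List


def top_n_by_value(rows: List[Dict[str, Any]], n: int = 10) -> List[Dict[str, Any]]:
    """Return the n rows with the largest "value", highest first.

    heapq.nlargest keeps a heap of at most n candidates while scanning,
    so it avoids fully sorting a long list of grouped rows.
    """
    return heapq.nlargest(n, rows, key=lambda row: row["value"])


rows = [
    {"key": "a", "value": 10},
    {"key": "b", "value": 12},
    {"key": "c", "value": 1},
]
print([row["key"] for row in top_n_by_value(rows, 2)])
```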

There is currently no way to have CouchDB sort view rows by value. I
would like that ability as well. If you read the Sawzall paper, you'll
see that Google's implementation can only approximate the top N from
a set like this.

Sawzall: http://research.google.com/archive/sawzall.html

The approximation algorithm:
http://www.cs.rutgers.edu/~farach/pubs/FrequentStream.pdf
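The idea behind that kind of approximation can be sketched as a Count Sketch (a small grid of signed counters) plus a table of the current top-k candidates, updated as each item streams past. This is a minimal sketch written in the spirit of the linked paper, not a transcription of it: the class name `CountSketch`, the blake2b-based hashing, and the default width, depth, and k are illustrative assumptions.

```python
import hashlib
import statistics
from typing import Dict, List, Tuple


class CountSketch:
    """Approximate per-key counts using depth x width integer counters."""

    def __init__(self, width: int = 256, depth: int = 5, k: int = 10) -> None:
        self.width = width
        self.depth = depth
        self.k = k
        self.table = [[0] * width for _ in range(depth)]
        # Candidate top-k keys mapped to their most recent estimate.
        self.top: Dict[str, float] = {}

    def _bucket_and_sign(self, row: int, key: str) -> Tuple[int, int]:
        # Deterministic per-row hash (unlike hash(), stable across runs).
        digest = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        return h % self.width, 1 if (h >> 32) & 1 else -1

    def add(self, key: str, count: int = 1) -> None:
        for row in range(self.depth):
            bucket, sign = self._bucket_and_sign(row, key)
            self.table[row][bucket] += sign * count
        self._track(key)

    def estimate(self, key: str) -> float:
        # Each row's signed counter is a noisy guess of the key's count;
        # the median discards rows where the key shared a bucket with
        # another heavy key.
        guesses = []
        for row in range(self.depth):
            bucket, sign = self._bucket_and_sign(row, key)
            guesses.append(sign * self.table[row][bucket])
        return statistics.median(guesses)

    def _track(self, key: str) -> None:
        # Keep at most k candidates; a new key evicts the weakest one
        # only if its estimate is larger.
        est = self.estimate(key)
        if key in self.top or len(self.top) < self.k:
            self.top[key] = est
            return
        weakest = min(self.top, key=self.top.__getitem__)
        if est > self.top[weakest]:
            del self.top[weakest]
            self.top[key] = est

    def top_k(self) -> List[Tuple[str, float]]:
        return sorted(self.top.items(), key=lambda kv: kv[1], reverse=True)


sketch = CountSketch(k=3)
for key, n in [("a", 1000), ("b", 500), ("c", 200)]:
    for _ in range(n):
        sketch.add(key)
for i in range(100):
    sketch.add(f"rare-{i}")
print(sketch.top_k())
```

Memory stays at depth x width counters plus k candidates no matter how many distinct keys pass through, which is the trade made in exchange for approximate counts.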

The Sawzall paper also describes a technique where map/reduce jobs are
chained to allow the second to sort the output of the first. CouchDB
does not implement this currently, but I've had luck streaming the
rows, grouped by key, through a Ruby function and into another
database, where additional map/reduce jobs can be run on it. I'm
essentially using Ruby to run the reduce of the first map, and then
CouchDB to sort the results of that reduce.

Supporting code is here:
http://github.com/jchris/couchrest/tree/master/lib/couchrest/helper/pager.rb
(see key_reduce() - sorry for the lack of documentation)
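The chained approach can be sketched in two stages: reduce the key-sorted map rows on the client, save each total as a document in a second database, and let a view there emit the total as its key so the index itself is ordered by value. A minimal sketch, assuming the first view emits string keys with numeric values; the design document `_design/sorted`, the view `by_value`, and both helper functions are hypothetical, and the CouchDB round trips are replaced by in-memory stand-ins. This is not the couchrest key_reduce() code.

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Dict, Iterable, Iterator, List

# Hypothetical design document for the second database. The map function
# is JavaScript (CouchDB's default view language); it emits each reduced
# total as the key, so the view index is ordered by the old value.
BY_VALUE_DDOC = {
    "_id": "_design/sorted",
    "views": {
        "by_value": {"map": "function(doc) { emit(doc.value, doc._id); }"},
    },
}


def reduce_sorted_rows(rows: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Stage 1: reduce map-view rows on the client.

    Map view rows arrive sorted by key, so consecutive rows with the same
    key form one group and can be summed one group at a time. Using the
    original key as _id assumes the keys are strings.
    """
    for key, group in groupby(rows, key=itemgetter("key")):
        yield {"_id": key, "value": sum(row["value"] for row in group)}


def query_by_value(docs: Iterable[Dict[str, Any]], limit: int = 10,
                   skip: int = 0) -> List[Dict[str, Any]]:
    """Stage 2, emulated in memory: by_value queried with descending=true.

    CouchDB orders view rows by (emitted key, doc id) and descending
    reverses both. Python string order only approximates CouchDB collation.
    """
    index = sorted(((doc["value"], doc["_id"]) for doc in docs), reverse=True)
    return [{"id": doc_id, "key": value, "value": doc_id}
            for value, doc_id in index[skip:skip + limit]]


map_rows = [
    {"key": "a", "value": 4},
    {"key": "a", "value": 6},
    {"key": "b", "value": 12},
    {"key": "c", "value": 1},
]
docs = list(reduce_sorted_rows(map_rows))
print(query_by_value(docs, limit=2))          # first page
print(query_by_value(docs, limit=2, skip=2))  # second page
```

Against a real server, the same ordering would come from querying the second database's view with descending=true plus limit and skip (the parameter names in current CouchDB releases). For deep pages, restarting from the last row seen via startkey and startkey_docid avoids a large skip.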

Chris

-- 
Chris Anderson
http://jchris.mfdz.com
