incubator-couchdb-user mailing list archives

From Chris Anderson <>
Subject Re: Is it possible to produce counts with a reduce function, then order by those counts?
Date Wed, 02 Dec 2009 18:25:36 GMT
On Wed, Dec 2, 2009 at 9:18 AM, Simon Willison
<> wrote:
> Hello,
> I've just started learning CouchDB, so apologies if this is covered in
> an FAQ (I've looked around a bit and haven't found it though).
> I'd like to write a view which counts the number of occurrences of a
> value across my whole document set, then returns those occurrences
> ordered by their frequency.
> Essentially this:
> But with an "order by count" at the end.
> Is this possible, or am I asking the wrong kind of question?

The challenge here is that CouchDB's indexes are sorted only by the key
emitted from the original map function, not by the reduced value. To do
what you are requesting you have 3 main options:

1) Sort the rows by value in your application. This is the simplest
option until you have so many distinct rows that you can't fit them
all in memory.

2) Pipe the group-reduce query into a process that saves each row as a
document in another CouchDB database. Then use a map view to sort
those documents by the group value. This is the best option if you
have lots and lots of rows in the group-reduce output. It's probably
the closest to Hadoop/Google-style chained map reduce that you'll see
with CouchDB. Of course the derived index won't be incremental with
updates to the source database.

3) My favorite: You can do something like (1) but on the server in
CouchDB's JavaScript application environment. The _list function is
fed each row of a view in turn, and can do whatever it likes. In your
case you could accumulate the rows there and sort by value. This has
the same memory limits as (1), but since it's already set up to stream
rows, and since it already runs on the server, it's a little cleaner
and faster than what most application servers would do. (3) is ideal
if what you really want is the top N tags.
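
A minimal sketch of option (3), written so it also runs under Node. The
getRow(), send() and start() functions are the ones CouchDB provides to
_list functions; here they are replaced by small stand-ins, and the sample
rows and the "top" query parameter are assumptions made up for
illustration. The view is assumed to be queried with group=true so that
each row carries one key and its count.

```javascript
// Stand-ins for the globals CouchDB hands to a _list function.
// The sample rows are hypothetical group=true output.
const input = [
  { key: "couchdb", value: 12 },
  { key: "erlang", value: 3 },
  { key: "javascript", value: 27 },
];
let output = "";
function getRow() { return input.shift() || null; }
function send(chunk) { output += chunk; }
function start(resp) { /* CouchDB would set response headers here */ }

// The list function itself. In a design document it would be stored
// as a string under "lists".
function topByCount(head, req) {
  start({ headers: { "Content-Type": "application/json" } });

  // "top" is a hypothetical query parameter: how many rows to return.
  var limit = parseInt((req.query && req.query.top) || "10", 10);

  // Accumulate every row in memory, so this has the same limit as (1).
  var rows = [], row;
  while ((row = getRow())) {
    rows.push({ key: row.key, value: row.value });
  }

  // Sort by the reduced value, largest count first.
  rows.sort(function (a, b) { return b.value - a.value; });

  send(JSON.stringify({ rows: rows.slice(0, limit) }));
}

topByCount({}, { query: { top: "2" } });
console.log(output);
```

The same comparator passed to sort() would work for option (1) if the rows
are fetched and sorted in the application instead.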

Regardless of which you choose, you'll want to cache the output somehow.

We've had discussions about having better support for sort-by-value.
It'd be nice to have built-in support for (2) so that you can
trigger/query it from a browser instead of needing your own small
program to do the transfer.
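
As a rough illustration of the "small program" for (2), the sketch below
turns group-reduce rows into documents for a second database and shows a
map function that keys them by count. The "count:" _id scheme, the "tag"
and "count" field names, and the sample rows are all assumptions; emit()
is a local stand-in for the function CouchDB provides to map functions,
and no HTTP requests are made.

```javascript
// Hypothetical rows from a group=true query against the source database.
const rows = [
  { key: "couchdb", value: 12 },
  { key: "erlang", value: 3 },
  { key: "javascript", value: 27 },
];

// One document per row. The _id scheme and field names are assumptions.
function rowsToDocs(rows) {
  return rows.map(function (row) {
    return { _id: "count:" + row.key, tag: row.key, count: row.value };
  });
}

// JSON body that could be POSTed to the derived database's _bulk_docs.
function bulkDocsBody(rows) {
  return JSON.stringify({ docs: rowsToDocs(rows) });
}

// Stand-in for CouchDB's emit(), so the map function runs under Node.
const emitted = [];
function emit(key, value) { emitted.push({ key: key, value: value }); }

// Map function for a view in the derived database. Using the count as
// the key means the view index is sorted by count.
function mapByCount(doc) {
  if (typeof doc.count === "number") {
    emit(doc.count, doc.tag);
  }
}

rowsToDocs(rows).forEach(mapByCount);

// Sorting here imitates querying the view with descending=true.
emitted.sort(function (a, b) { return b.key - a.key; });
console.log(bulkDocsBody(rows));
console.log(emitted);
```

Re-running the transfer against documents that already exist would also
need each document's current _rev, which this sketch leaves out.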

Most of the documentation for _list assumes you'll be using it to
output HTML, but it should be clear enough how you could use it for
sort-by-value with JSON output. This tweet might be a good start as

some list docs:

Hope that helps,

> Thanks,
> Simon Willison

Chris Anderson
