couchdb-user mailing list archives

From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: Reduce limitations
Date Sat, 25 Apr 2009 21:55:11 GMT
On Sat, Apr 25, 2009 at 5:39 PM, Brian Candler <B.Candler@pobox.com> wrote:
> Has anyone found a clear definition of the limitations of reduce functions,
> in terms of how large the reduced data may be?
>

There is no real hard limit. There's nothing at the moment that
prevents you from returning a huge amount of data; it's just that
performance will suffer fairly dramatically. Chris Anderson was
working on a patch at #couchhack to throw errors if the value grows
too large, though people would be able to turn the check off, so it
would mostly act as a heads-up for the unsuspecting.

> I am considering using a reduce function for a low-ordinality value, to be
> able to determine quickly what the unique values are, and how many of each.
> That is, the final reduced value would be a small object of the form
>   {"foo"=>50000, "bar"=>20000, "baz"=>30000}
>
> (in this example there are 100,000 documents but only three unique values).
>

This particular example would be more than fine. The issue is that you
need to bound the number of unique values. As in, if you ended up
having 10K unique values you'd notice the performance degradation.
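
For reference, the map side of that pattern is just emitting the
low-cardinality field as the key, something like the sketch below
(where "category" is only a placeholder for whatever field you're
counting, and the 1 is there so a summing reduce works too):

  function(doc) {
    // emit the low-cardinality value as the key; the reduce
    // then counts how many times each key shows up
    if (doc.category) {
      emit(doc.category, 1);
    }
  }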

> As I understand it, the final reduced object sits in the root node, so what
> would be the practical limit on the size of this object? That in turn limits
> the number of distinct values I can have.
>

In order: it does, there isn't a hard one, and it does. :D 3 keys is
fine, 10 keys of this nature would be fine. 1 million is probably not
fine.
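
Back of the envelope: if you figure something like 20 bytes of JSON
per entry ("foo":50000 plus punctuation), 10 values is ~200 bytes,
10K values is ~200KB, and 1M values is ~20MB, and that blob gets
stored with the inner nodes of the view btree and recomputed along
the update path as documents change. That's where "probably not fine"
comes from.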

> I imagine this would be more efficient than a grouped reduce query (which
> would have to re-reduce each key range separately), although I'm not sure by
> how much.
>

With only a few keys, I'm not entirely certain. I would put money on
"noticeably" but I wouldn't upgrade that to "drastically" without
testing.
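
If you do end up benchmarking it, the grouped alternative is just the
usual per-key counting view: same map, a reduce that sums, and a query
with ?group=true so you get one row per distinct value instead of a
single merged object. A sketch, assuming the map emits 1 as its value
and using the sum() helper the JS view server provides:

  function(keys, values, rereduce) {
    // first pass: values are the emitted 1s, one per row;
    // on rereduce: values are partial counts from lower levels.
    // A plain sum is correct in both cases.
    return sum(values);
  }

Querying that with ?group=true gives rows like
{"key":"bar","value":20000} rather than one combined object.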

> Regards,
>
> Brian.
>
>  // implementation idea (completely untested)
>
>  function(keys, values, rereduce) {
>    var result;
>    if (rereduce) {
>      // values are count objects from lower levels; merge them
>      // by summing the counts for matching keys
>      result = values.shift();
>      for (var i = 0; i < values.length; i++) {
>        for (var j in values[i]) {
>          result[j] = (result[j] || 0) + values[i][j];
>        }
>      }
>    } else {
>      // keys is a list of [key, docid] pairs, so take the emitted
>      // key itself and count one per row
>      result = {};
>      for (var i = 0; i < keys.length; i++) {
>        var key = keys[i][0];
>        result[key] = (result[key] || 0) + 1;
>      }
>    }
>    return result;
>  }
>
>

Notice that this is quite similar to the "top N tags" example on the
wiki. There's actually a bit of a bug in that example, which came up
when Damien and I did a whiteboarding exercise at #couchhack, that I
still need to fix, but the theory is fairly sound. The basic idea is
that I was only keeping the last key in the reduce stage when you
really need to keep both the first and last keys. It should be fairly
straightforward to add, but I just haven't gotten around to it yet.

HTH,
Paul Davis
