On Sat, Apr 25, 2009 at 5:39 PM, Brian Candler wrote: > Has anyone found a clear definition of the limitations of reduce functions, > in terms of how large the reduced data may be? > There is no real hard limit. There's nothing at the moment that prevents you from returning a huge amount of data, its just that performance will suffer fairly dramatically. Chris Anderson was working on a patch at #couchhack to throw errors if the value grows too large, but that would allow for people to turn it off so that it only acts as a heads up for the unsuspecting. > I am considering using a reduce function for a low-ordinality value, to be > able to determine quickly what the unique values are, and how many of each. > That is, the final reduced value would be a small object of the form >   {"foo"=>50000, "bar"=>20000, "baz"=>30000} > > (in this example there are 100,000 documents but only three unique values). > This particular example would be more than fine. The issue is that you need to bound the number of unique values. As in, if you ended up having 10K unique values you'd notice the performance degradation. > As I understand it, the final reduced object sits in the root node, so what > would be the practical limit on the size of this object? That in turn limits > the number of distinct values I can have. > It do and there is none and it do. :D 3 keys is fine, 10 of this nature would be fine. 1 million is probably not fine. > I imagine this would be more efficient than a grouped reduce query (which > would have to re-reduce each key range separately), although I'm not sure by > how much. > With only a few keys, I'm not entirely certain. I would put money on "noticeably" but I'm wouldn't upgrade that to "drastically" without testing. > Regards, > > Brian. > >  // implementation idea (completely untested) > >  function(keys, values, rereduce) { >    if (rereduce) { >      result = values.shift(); >      for(var i in values) { >        for(var j in values[i]) { >          result[j] = (result[j] || 0) + values[i][j]; >        } >      } >    } >    else { >      result = {}; >      for(var i in keys) { >        var key = keys[i]; >        result[key] = (result[key] || 0) + 1; >      } >    } >  } > > Notice that this is quite similar to the "top N tags" example on the wiki. There's actually a bit of a bug in that example that came up when me and Damien had a white boarding exercise at #couchhack that I need to fix, but the theory is fairly sound. The basic idea is that I was only keeping the last key in the reduce stage when you really need to keep both the first and last keys. It should be fairly straightforward to add but I just haven't gotten around to it yet. HTH, Paul Davis