couchdb-user mailing list archives

From "Chris Anderson" <>
Subject Re: Reduce is Really Slow!
Date Wed, 20 Aug 2008 21:44:36 GMT
Paul's advice is right on - if you can get the data using a range
query on a map view (without reduce), you should do that - if you need
aggregation of very many rows into a short value, reduce is your friend.
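Paul's suggestion above can be sketched roughly like this - a key-range query against a map-only view, so no reduce runs at all. The database, design-doc, and view names here are invented for illustration:

```javascript
// Hypothetical sketch: build a range-query URL for a map-only view.
// All names (mydb, app, by_date) are made up; startkey/endkey are
// real CouchDB view query parameters, JSON-encoded then URL-encoded.
var base = 'http://127.0.0.1:5984/mydb/_design/app/_view/by_date';

function rangeUrl(startkey, endkey) {
  return base +
    '?startkey=' + encodeURIComponent(JSON.stringify(startkey)) +
    '&endkey=' + encodeURIComponent(JSON.stringify(endkey));
}

var url = rangeUrl('2008-08-01', '2008-08-31');
// GET this URL and CouchDB streams back only the rows in that key range.
```

Because the view index is a b-tree sorted by key, a range read like this is cheap no matter how large the view is.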

On Wed, Aug 20, 2008 at 1:32 PM, Nicholas Retallack
<> wrote:
> Replacing 'return values' with 'return values.length' shows you're
> right.  4 minutes for the first query, milliseconds afterward, as
> opposed to forever.
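My reading of the change Nicholas describes, with a guessed shape for the original functions, is something like this:

```javascript
// A reduce that returns all its input values grows its output with the
// row count, so the intermediate results CouchDB stores in the view's
// b-tree get ever larger - this is the slow version:
function slowReduce(keys, values, rereduce) {
  return values; // output size is unbounded
}

// Returning a count keeps the reduce value tiny at every b-tree level,
// so incremental reduction stays cheap:
function fastReduce(keys, values, rereduce) {
  if (rereduce) {
    // values are partial counts from lower levels; add them up
    return values.reduce(function (a, b) { return a + b; }, 0);
  }
  return values.length;
}
```

The rereduce branch is what makes the count correct when CouchDB combines already-reduced partial results rather than raw mapped values.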

That sounds like the query times I'm getting.

> Are there plans to make reduce work for these more general
> data-mangling tasks?  Or should I be approaching the problem a
> different way?  Perhaps write my map calls differently so they produce
> more rows for reduce to compact?  Or do something special if the third
> parameter to reduce is true?

"Plans" would be a strong term, but I've been digging through the
source lately thinking about ways to make a more Hadoop-like map
process. I've prototyped remap in Ruby.

The driving use case is a list of URLs, as output from a view, that
are each fetched by the view server (robots.txt etc etc), with the
fetched results stored as new documents. Essentially a Nutch
implementation backed by CouchDB.

Of course this could be an application process running against the
HTTP API, but CouchDB's view-server plugin architecture could make
managing data even easier than Hadoop does.

I've got my crazy idea hat on, so don't expect to see this in trunk soon. ;)


Chris Anderson
