incubator-couchdb-user mailing list archives

From "Chris Anderson" <jch...@grabb.it>
Subject Re: Reduce is Really Slow!
Date Wed, 20 Aug 2008 21:44:36 GMT
Paul's advice is right on: if you can get the data using a range
query on a map view (without reduce), you should do that. If you need
to aggregate very many rows into a short value, reduce is your
friend.
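To make the range-query suggestion concrete, here is a minimal sketch of a map view that indexes documents by date so a range can be pulled straight from the view without any reduce. The document fields, view name, and the tiny emit harness are illustrative assumptions, not from this thread:

```javascript
// Hypothetical map function: index posts by creation date so a range
// query (startkey/endkey) can fetch a slice without a reduce step.
var map = function (doc) {
  if (doc.created_at) {
    emit(doc.created_at, doc.title);
  }
};

// Minimal stand-in for the CouchDB view server's emit(), so the map
// function can be exercised outside the database.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Run the map over a few sample documents.
[{ created_at: "2008-08-19", title: "a" },
 { created_at: "2008-08-20", title: "b" },
 { type: "no date, skipped" }].forEach(map);
```

Against a real database, the same slice would come from something like `GET /db/_view/posts/by_date?startkey="2008-08-19"&endkey="2008-08-20"` (names assumed).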

On Wed, Aug 20, 2008 at 1:32 PM, Nicholas Retallack
<nickretallack@gmail.com> wrote:
> Replacing 'return values' with 'return values.length' shows you're
> right.  4 minutes for the first query, milliseconds afterward, as
> opposed to forever.
>

That sounds like the query times I'm getting.
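For anyone following along, here is a sketch of the change being discussed: returning a count instead of the raw values keeps the reduce result small. One caveat worth noting is that a bare `return values.length` miscounts once CouchDB re-reduces intermediate results, so the third (rereduce) parameter should sum partial counts:

```javascript
// Count rows per key. On the first pass, values holds the mapped
// values, so the count is values.length. On rereduce, values holds
// partial counts from earlier passes, so they must be summed.
var reduce = function (keys, values, rereduce) {
  if (rereduce) {
    return values.reduce(function (a, b) { return a + b; }, 0);
  }
  return values.length;
};

// Simulate two leaf reductions, then a rereduce over their results:
var partial1 = reduce(null, ["a", "b", "c"], false); // 3
var partial2 = reduce(null, ["d"], false);           // 1
var total = reduce(null, [partial1, partial2], true); // 4
```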

>
> Are there plans to make reduce work for these more general
> data-mangling tasks?  Or should I be approaching the problem a
> different way?  Perhaps write my map calls differently so they produce
> more rows for reduce to compact?  Or do something special if the third
> parameter to reduce is true?
>

"Plans" would be a strong term, but I've been digging through the
source lately thinking about ways to make a more Hadoop-like map
process. I've prototyped remap in Ruby
http://github.com/jchris/couchrest/tree/master/utils/remap.rb

The driving use case is a list of URLs, as output from a view, that
are each fetched by the view server (robots.txt etc etc), with the
fetched results stored as new documents. Essentially a Nutch
implementation backed by CouchDB.

Of course this could be an application process running against the
HTTP API, but CouchDB's view-server plugin architecture could make
managing data even easier than Hadoop does.
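A rough sketch of what that "application process against the HTTP API" could look like, assuming a view whose rows carry the URL to fetch in `row.value`. The database name, design doc, view name, and document fields are all hypothetical; the original prototype linked above is Ruby, and this uses modern JavaScript `fetch` for brevity:

```javascript
// Hypothetical crawler loop: read pending URLs from a map view, fetch
// each page, and store the result back as a new document.
const couch = "http://localhost:5984/crawler";

// Shape a fetched page into a new document (fields are illustrative).
function toDoc(url, body) {
  return {
    source_url: url,
    content: body,
    fetched_at: new Date().toISOString(),
  };
}

async function remap() {
  // Assumed view: each row's value is a URL to fetch.
  const res = await fetch(couch + "/_design/crawl/_view/pending_urls");
  const { rows } = await res.json();
  for (const row of rows) {
    const page = await fetch(row.value); // robots.txt handling etc. elided
    const doc = toDoc(row.value, await page.text());
    await fetch(couch, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(doc),
    });
  }
}
```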

I've got my crazy idea hat on, so don't expect to see this in trunk soon. ;)

Chris


-- 
Chris Anderson
http://jchris.mfdz.com
