couchdb-user mailing list archives

From Sebastian Cohnen <>
Subject Re: Why I think view generation should be done concurrent.
Date Sun, 04 Jul 2010 10:12:17 GMT
AFAIK the current architecture does not play well with this approach. Even if you have multiple
concurrent view servers (and very fast storage), the view itself needs to be written
to one file, and to one b-tree. So you could process the mapping faster, but the new bottleneck
would be the process which writes the view back to disk.

(Correct me if I'm wrong here.)
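
A rough sketch of the shape I mean, in Python. This is not CouchDB's actual implementation: `map_doc`, `build_view`, the `workers` parameter and the dict standing in for the b-tree are all made up for illustration, and threads are used only to keep the example self-contained (real view servers are separate OS processes). The point is that the map step fans out, while every row still funnels through one writer.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def map_doc(doc):
    """Map step for one document: one ((hash, href), None) row per link."""
    rows = []
    for href in doc.get("hrefs", []):
        # A stable digest, so that every worker computes the same key
        digest = hashlib.sha1(href.encode("utf-8")).hexdigest()
        rows.append(((digest, href), None))
    return rows


def build_view(docs, workers=4):
    """Run the map step concurrently, then insert rows through a single writer."""
    index = {}  # hypothetical stand-in for the view's single b-tree file
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The map calls can overlap across workers...
        for rows in pool.map(map_doc, docs):
            # ...but this loop is the serial part: only one writer touches the index
            for key, value in rows:
                index[key] = value
    return sorted(index)
```

However many workers you add, the inner loop runs in one place, so the writer is where the time goes once mapping is fast enough.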

On 04.07.2010, at 11:36, Julian Moritz wrote:

> Hi,
> a few days ago I tweeted a wish to have view generation done
> concurrently. I'll tell you why (because @janl doesn't think so).
> I've got some documents in the form of:
> _id: 1,
> _rev: 3-abc, 
> url:,
> hrefs: [, 
> 	...,
> 	...,
> 	...]
> As you can imagine, since I'm crawling the web, I've got plenty of them, and
> thousands more every second. I've got a view; the map function is:
> def fun(doc):
>     # Emit one row per outgoing link, keyed by (hash of href, href)
>     h = hash
>     if "hrefs" in doc:
>         for href in doc["hrefs"]:
>             yield (h(href), href), None
> The reduce function is:
> def fun(key, value, rereduce):
>    return True
> If you're not able to read Python code: it generates a large list of
> unique pseudo-randomly ordered URLs. I'm calling this view quite often
> (to get new URLs to be crawled).
> What is my problem now? My CouchDB process is at 100% CPU and the view
> sometimes takes quite long to be generated (even though I've only got about
> 5-10 GB of test data). I've got 4 cores and 3 of them are sleeping. I
> think it could be much faster if every core were used. What does
> CouchDB do with a very large system, let's say 64 Atom cores (which
> would be in an energy-saving idle mode) and 20 TB of data? Use 1 core
> at, let's say, 1 GHz to munch through 20 TB? Oh please.
> Why doesn't CouchDB use all cores to generate views?
> Regards
> Julian
> P.S.: Maybe I'm totally wrong and the way you do it is right, but ATM it
> makes me mad to see one core out of four working while the rest are idle.
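
For anyone who wants to see what the quoted view produces, here is a small in-memory simulation in plain Python. It is only a sketch: `map_fun`, `reduce_fun` and `query_grouped` are hypothetical names, and the grouping loop merely imitates what a grouped reduce query would return, without talking to CouchDB at all. Note also that Python 3 salts the built-in `hash` for strings per process, so the order differs between runs.

```python
from itertools import groupby


def map_fun(doc):
    """Same logic as the quoted map function: key each link by (hash, href)."""
    if "hrefs" in doc:
        for href in doc["hrefs"]:
            yield (hash(href), href), None


def reduce_fun(keys, values, rereduce):
    """Same as the quoted reduce function: collapse each group to True."""
    return True


def query_grouped(docs):
    """Sort the emitted rows by key, then reduce each group of equal keys."""
    rows = sorted(
        (row for doc in docs for row in map_fun(doc)),
        key=lambda row: row[0],
    )
    result = []
    for key, group in groupby(rows, key=lambda row: row[0]):
        values = [value for _, value in group]
        result.append((key, reduce_fun([key] * len(values), values, False)))
    return result
```

Each URL appears once in the result even if several documents link to it, and sorting on the hash first is what gives the pseudo-random order.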