couchdb-user mailing list archives

From Anand Chitipothu <anandol...@gmail.com>
Subject Re: How to speedup view generation?
Date Thu, 04 Nov 2010 16:24:34 GMT
2010/11/4 cdr53x <cdr53x@free.fr>:
> On 10/30/2010 03:52 PM, Anand Chitipothu wrote:
>>
>> I'm trying to set up a couchdb database with 14M documents. The view
>> generation is taking too long. It is running at the rate of 22
>> docs/sec right now. At this rate it will take 7 days to build the view,
>> which is too slow and I expect the speed to go down further as the
>> view file size increases.
>>
>>
>
> Hi ,
>
> What is the size of the design document files on the drive ?
>
> I noticed that large views use quite large file ;).
>
> I also noticed that the view group indexers take a large amount of time to
> complete the last 30% of the task: at least twice as long as the first
> 70%.
>
> In my case I have a 'small' database containing 400K docs. I also have a
> design doc that indexes 80% of the docs with 8 views. Map functions only
> emit a single property per doc and a null value, so they should be compact.
>
> The overall size of this design doc .view file on disk is 17G ;).
>
> I don't know how couchdb handles the update of such large files but maybe
> there is something with updating large files ...
>
> Concerning the performance, I use std javascript as interpreter and get a
> rate of ~60 changes/sec in the beginning of the process.
>
> Then it drops to 15c/s after 70%.
>
> Then I'm at about 6c/s after 85%.
>
> The first 70% took 52 minutes and the whole process ran for 3h21m on a
> small standalone dedicated server.
>
> So I get the feeling that it is not an issue with the view "calculation"
> algo, but probably something that is related to the disk i/o.
>
> I have no Erlang knowledge, and I might be quite wrong about this feeling, but
> if you guys know a little bit about this part of the couch code maybe there is
> something that could be checked and would improve the overall design doc
> refresh performance ?

Yes, it is due to IO. In my case it started with a speed of 200
docs/sec and it dropped to almost 3 docs/sec and the view file size was
about 60GB after processing something around 6-7M docs.

I noticed that the IO wait had increased to about 15 and that
beam.smp and couchjs together weren't taking even 50% of one core. I
tried running compaction, and it looked like the view would shrink
to 1/6 of its size after compaction, but it was still not progressing well
because of IO wait. Having an SSD might have helped, but I don't have
one.

So I thought it might be faster to run compaction only after loading the
data and waiting for view generation to complete. I tried it, and it still
looked like it was not going to finish in one week. Even compaction is
very, very slow.

I decided to generate the view by feeding the data directly to the map
function, and it took about an hour to generate the view for the entire 14M
docs. I sorted the output, ran reduce and saved the results in another
couchdb database. That was much faster. I could finish the whole process in
less than 10 hours.
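
For anyone who wants to try the same thing, here is a rough sketch of the
map, sort, reduce steps in Python. It is only an illustration, not the
script I actually ran: map_doc, reduce_values and build_view are made-up
names, the "type" field is an assumed document property, and the reduce is
a plain count. With 14M docs the rows would not fit comfortably in memory,
so the sort step would have to be done on disk instead of with list.sort().

```python
from itertools import groupby
from operator import itemgetter


def map_doc(doc):
    """Hypothetical map function: emit (key, value) pairs for one document."""
    if "type" in doc:
        yield doc["type"], 1


def reduce_values(values):
    """Hypothetical reduce function: here, a simple count."""
    return sum(values)


def build_view(docs):
    """Map every doc, sort the emitted rows by key, then reduce per key."""
    rows = [row for doc in docs for row in map_doc(doc)]
    rows.sort(key=itemgetter(0))  # in-memory sort; fine only for small inputs
    return [
        {"_id": str(key), "value": reduce_values(v for _, v in group)}
        for key, group in groupby(rows, key=itemgetter(0))
    ]


# Tiny made-up sample standing in for the real documents.
docs = [
    {"_id": "a", "type": "book"},
    {"_id": "b", "type": "author"},
    {"_id": "c", "type": "book"},
    {"_id": "d"},
]
view_docs = build_view(docs)
```

Each entry in view_docs is shaped like a document, so the list could then be
saved into the second database.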

The downside is that I need to take the pain of making sure the view
is up-to-date with the original database. I think that is a good
compromise.

Anand
