couchdb-user mailing list archives

From Adam Kocoloski <kocol...@apache.org>
Subject Re: re-index efficiency
Date Thu, 22 Oct 2009 15:34:11 GMT
On Oct 22, 2009, at 11:28 AM, Fabio Forno wrote:

> On Thu, Oct 22, 2009 at 5:12 PM, Paul Davis <paul.joseph.davis@gmail.com> wrote:
>> Fabio,
>>
>> There are about four things that will slow view generation down from
>> the _bulk_docs rate:
>>
>> 1. JSON conversion (twice) when passing data to the view process
>> 2. Collation of keys on tree insertion
>> 3. I/O (Disk and stdio)
>> 4. Memory thresholds
>>
>> Things like native views will give noticeable speed improvements
>> because they avoid JSON serialization and transfer over stdio. The
>> other (theoretically) tunable parameter is the memory threshold that
>> triggers flushes to disk. It's not currently configurable by the client
>> (requires a rebuild of couchdb) and as such I haven't seen anyone
>> attempt to tune it.
>
> Thanks for the answer, so I see that there is considerable margin
> for improvement, because ideally the index re-generation should be
> bound by disk speed once all possible optimizations have kicked in
> (except some pathological situations such as an application I have
> which stores chunks of xml in document strings, obliging double
> parsing in order to process them ;))
>
> bye

There are optimizations in trunk that get CouchDB closer to achieving  
this goal.  Re-indexing does lots of random I/O, so you won't be  
seeing 30MB/s on spinning platters, but it's many times better than  
what we had in 0.9.  Best,

Adam
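
For readers following the thread, here is a minimal sketch of the first item in Paul's list (JSON conversion on both sides of the view process). It is not CouchDB's code. The function names (map_fun, view_server_handle, index_doc) are hypothetical, the "map_doc" message shape is only loosely modeled on a line-based JSON exchange, and the pipe is simulated with a plain function call rather than real stdio.

```python
import json


def map_fun(doc):
    # Hypothetical map function: emit one (key, value) row per document.
    return [[doc.get("type"), 1]]


def view_server_handle(line):
    # External view process side: parse the incoming JSON line,
    # run the map function, then serialize the emitted rows again.
    command, doc = json.loads(line)
    if command != "map_doc":
        raise ValueError("unsupported command: %r" % command)
    return json.dumps([map_fun(doc)])


def index_doc(doc):
    # Database side: serialize the document before handing it over
    # (in a real setup this string would cross a stdio pipe) ...
    request = json.dumps(["map_doc", doc])
    response = view_server_handle(request)
    # ... and parse the results when they come back.
    return json.loads(response)


rows = index_doc({"_id": "a", "type": "post"})
print(rows)
```

Every document pays for a serialize and a parse in each direction, which is the overhead a native (in-process) view would skip.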

