couchdb-user mailing list archives

From Sebastian Cohnen
Subject Re: View generation checkpointing on every update
Date Mon, 23 Aug 2010 07:04:20 GMT
Hey Jamie,

First, I don't know anything about view checkpointing or whether it can be customized to make
CouchDB commit less often, sorry :)

(more replies inline)

On 23.08.2010, at 08:38, Jamie Talbot wrote:

> Tuyen Tran <ituyen@...> writes:
>> We have a view that is checkpointing on every update and taking a long time 
>> to generate.
>> <snip..>
>> Has anyone seen similar performance? Are my documents too big with too many 
>> fields?
>> Thanks,
>> -T
> I have almost the same situation as you, using CouchDB 1.0.  Only 3000 
> documents in this sample database (of an overall document set of 450000).  Each 
> document is about 200KB, and contains an array of JSON objects, that each have 
> 3 small properties.
> My view emits a large key of 6 parts (an array of timestamp components) and a 
> value array with 2 integers, In and Out.  Without a reduce step it takes 5m20s 
> to generate.  With a reduce step that does a sum of Ins and Outs, it takes more 
> than 30 minutes.  Each document takes about 7 seconds to process.  It 
> checkpoints after every document.
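
For readers who want to picture the view described above, here is a rough sketch of its map and reduce functions. It is written in TypeScript so it can run outside CouchDB; a real CouchDB 1.0 view would be plain JavaScript source stored in a design document. The document shape (`entries`, `ts` as epoch milliseconds, `in`, `out`) is an assumption, since the thread only says each document holds an array of objects with three small properties, and `emit` is a local stand-in that just collects rows.

```typescript
// Hypothetical document shape: field names are assumed, not taken from the thread.
interface Entry {
  ts: number; // epoch milliseconds (assumed representation)
  in: number;
  out: number;
}
interface Doc {
  _id: string;
  entries: Entry[];
}

// 6-part key of timestamp components and the [In, Out] value pair.
type Key = [number, number, number, number, number, number];
type Value = [number, number];

// Local stand-in for CouchDB's emit(): collects rows so the sketch runs anywhere.
const rows: Array<{ key: Key; value: Value }> = [];
function emit(key: Key, value: Value): void {
  rows.push({ key, value });
}

// Map: one row per entry, keyed by [year, month, day, hour, minute, second] in UTC.
function map(doc: Doc): void {
  for (const e of doc.entries) {
    const d = new Date(e.ts);
    emit(
      [d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate(),
       d.getUTCHours(), d.getUTCMinutes(), d.getUTCSeconds()],
      [e.in, e.out],
    );
  }
}

// Reduce: element-wise sum of [In, Out]. A reduced result has the same shape
// as a mapped value, so the same loop serves the first pass and rereduce.
function reduce(_keys: unknown, values: Value[], _rereduce: boolean): Value {
  let totalIn = 0;
  let totalOut = 0;
  for (const [i, o] of values) {
    totalIn += i;
    totalOut += o;
  }
  return [totalIn, totalOut];
}

// Usage with two made-up entries.
map({
  _id: "sample",
  entries: [
    { ts: Date.UTC(2010, 7, 23, 7, 4, 20), in: 3, out: 1 },
    { ts: Date.UTC(2010, 7, 23, 7, 5, 0), in: 2, out: 4 },
  ],
});
const partial = reduce(null, rows.map((r) => r.value), false); // [5, 5]
const total = reduce(null, [partial, [1, 1]], true); // rereduce: [6, 6]
```

Every call to a JavaScript reduce like this one goes through the external view server, which is the serialization overhead mentioned in the reply below.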

Raw speed is hard to compare, though 7s per document just for emitting some fields of each
document and summing up some values sounds quite slow. Since you are emitting non-scalar values,
you cannot use the built-in reduce functions, which are *very* fast (implemented in Erlang,
running inside CouchDB, no serialization overhead). But a custom view written in Erlang
could still be a very good option.
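
One possible workaround for the built-in reduce limitation is sketched below under assumed names: the `_design/traffic` document, the `in_by_time`/`out_by_time` views, and the same hypothetical `entries`/`ts`/`in`/`out` fields. The idea is to split the [In, Out] pair into two views that each emit a plain number, so both can use the built-in `_sum` reducer. The map functions are built as JavaScript source strings, since that is how a design document stores them.

```typescript
// Sketch only: design-document name, view names and document fields
// (entries, ts as epoch milliseconds, in, out) are hypothetical.

// Build a map function (as JavaScript source) that emits the 6-part UTC
// timestamp key with a single numeric value read from `field`.
function mapSource(field: "in" | "out"): string {
  return `function (doc) {
  doc.entries.forEach(function (e) {
    var d = new Date(e.ts);
    emit([d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate(),
          d.getUTCHours(), d.getUTCMinutes(), d.getUTCSeconds()],
         e["${field}"]);
  });
}`;
}

// Each view emits a scalar, so the built-in "_sum" reducer can replace
// a JavaScript reduce function.
const designDoc = {
  _id: "_design/traffic",
  language: "javascript",
  views: {
    in_by_time: { map: mapSource("in"), reduce: "_sum" },
    out_by_time: { map: mapSource("out"), reduce: "_sum" },
  },
};

// JSON body that would be PUT to /<db>/_design/traffic.
const body: string = JSON.stringify(designDoc, null, 2);
```

The trade-off is two index rows per entry instead of one, so whether this beats a custom reduce would need measuring on the real data.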

> When looking at the size of the view, it comes out at about 900MB of data, 
> from a 30MB database.  After compacting, this drops to 90MB, or a factor of 10.  
> I found 0.10 significantly faster, though I don't have hard numbers, and didn't 
> try 0.11.

What are you emitting as keys for your view? This kind of size discrepancy between a compacted
and an uncompacted view could be a sign that you are emitting very large or complex keys (at
least in my experience). Maybe you can look in this direction to optimize.

> On these numbers, Couch is unfortunately going to be unusable.  For the full 
> document set, it is likely to take 44 days to build the view, and will take 
> roughly 1.5TB, which will compact down to 150GB.  Once it's up and running, it 
> will probably be fine; we only add a document every 2 minutes, so a 7 second 
> build time and calling stale=true on the client will suffice.  However, the
> risk to the view file is too great to bear.  If it were to be corrupted (Couch 
> does an excellent job of avoiding this, but you need to plan for the worst), it 
> would take a month and a half to rebuild.

View corruption is very unlikely, but view files can be copied around just like database files, so
you could easily copy the views from your backup/slave/... system to the server whose view got
corrupted. So that shouldn't be a real problem.

> I have seen a number of posts where people have started considering a
> different view-building algorithm that is oriented toward performance.  I would 
> personally love to see a "risky=true" build option for the views, which 
> focussed more on performance and less on stability, on the understanding that 
> if we crashed while generating it, we'd have to start again.  For the initial 
> load, and rebuilds, that would be a price worth paying.  We're never going to 
> have less data!
> I'm also keen to hear people's experiences with this.
> Kind Regards, 
> Jamie.
