couchdb-user mailing list archives

From Jamie Talbot <c...@jamietalbot.com>
Subject Re: View generation checkpointing on every update
Date Mon, 23 Aug 2010 06:38:22 GMT
Tuyen Tran <ituyen@...> writes:
> 
> We have a view that is checkpointing on every update and taking a long time 
> to generate.
> <snip..>
> Has anyone seen similar performance? Are my documents too big with too many 
> fields?
> 
> Thanks,
> -T

I have almost the same situation as you, using CouchDB 1.0.  There are only 
3000 documents in this sample database (out of an overall document set of 
450000).  Each document is about 200KB and contains an array of JSON objects 
that each have 3 small properties.

My view emits a large key of 6 parts (an array of timestamp components) and a 
value array with 2 integers, In and Out.  Without a reduce step it takes 5m20s 
to generate.  With a reduce step that does a sum of Ins and Outs, it takes more 
than 30 minutes.  Each document takes about 7 seconds to process.  It 
checkpoints after every document.
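
To make the shape of that view concrete, here is a minimal Python sketch that mimics it. This is not the actual view code, which would be a JavaScript map/reduce in a design document. The field names `samples`, `ts`, `in` and `out` are hypothetical, as is the choice of year/month/day/hour/minute/second for the six key parts; all that is known is that the key is an array of six timestamp components and the value is an [In, Out] pair.

```python
from datetime import datetime, timezone

def map_doc(doc):
    """Yield one (key, value) row per array entry, like emit() in a map function."""
    for sample in doc["samples"]:          # hypothetical field name
        t = datetime.fromtimestamp(sample["ts"], tz=timezone.utc)
        # Six-part key built from timestamp components (assumed breakdown).
        key = [t.year, t.month, t.day, t.hour, t.minute, t.second]
        yield key, [sample["in"], sample["out"]]

def reduce_rows(values):
    """Sum Ins and Outs column-wise; the output has the same shape as each input."""
    total_in = sum(v[0] for v in values)
    total_out = sum(v[1] for v in values)
    return [total_in, total_out]

# Tiny made-up document, for illustration only.
doc = {"samples": [{"ts": 0, "in": 3, "out": 1},
                   {"ts": 60, "in": 2, "out": 5}]}
rows = list(map_doc(doc))
totals = reduce_rows([value for _, value in rows])
print(rows)
print(totals)
```

Because the reduce returns the same [In, Out] shape it receives, the same function can be applied again to partial sums.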

When looking at the size of the view, it comes out at about 900MB of data, 
from a 30MB database.  After compacting, this drops to 90MB, or a factor of 10.  
I found 0.10 significantly faster, though I don't have hard numbers, and didn't 
try 0.11.

On these numbers, Couch is unfortunately going to be unusable.  For the full 
document set, it is likely to take 44 days to build the view, and will take 
roughly 1.5TB, which will compact down to 150GB.  Once it's up and running, it 
will probably be fine; we only add a document every 2 minutes, so a 7 second 
build time and calling stale=ok on the client will suffice.  However, the
risk on the view file is too great to bear.  If it were to be corrupted (Couch 
does an excellent job at avoiding this, but you need to plan for the worst), it 
would take a month and a half to rebuild.
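
As a back-of-the-envelope check on those figures, here is a small Python sketch. It assumes a constant per-document cost and nothing else; that assumption is added for illustration and is not a measurement. At a flat 7 seconds per document the build comes out nearer 36 days, and the 44-day figure corresponds to roughly 8.4 seconds per document, so both land in the same range.

```python
SECONDS_PER_DAY = 86_400

total_docs = 450_000      # full document set
secs_per_doc = 7          # observed per-document processing time

# Build time if every document costs exactly secs_per_doc (an assumption).
build_days = total_docs * secs_per_doc / SECONDS_PER_DAY

# Per-document time that a 44-day build would imply.
implied_secs_per_doc = 44 * SECONDS_PER_DAY / total_docs

# Compaction ratio seen on the sample view: 900MB before, 90MB after.
compaction_factor = 900 / 90
compacted_gb = 1500 / compaction_factor   # 1.5TB taken as 1500GB

print(f"build time at 7s/doc: {build_days:.1f} days")
print(f"per-doc time implied by 44 days: {implied_secs_per_doc:.2f}s")
print(f"compacted size: {compacted_gb:.0f}GB")
```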

I have seen a number of posts where people have started considering a
different view building algorithm that is oriented to performance.  I would 
personally love to see a "risky=true" build option for the views, which 
focussed more on performance and less on stability, on the understanding that 
if we crashed while generating it, we'd have to start again.  For the initial 
load, and rebuilds, that would be a price worth paying.  We're never going to 
have less data!

I'm also keen to hear people's experiences with this.

Kind Regards, 

Jamie.



