couchdb-user mailing list archives

From Chris Anderson <>
Subject Re: View generation checkpointing on every update
Date Mon, 25 Apr 2011 03:23:02 GMT
I was just googling and came across this thread.

I agree a less aggressive checkpoint pattern could be beneficial. I
think there has been some discussion about this on the dev list. Now
I've gotta dig it up.

Was thinking about looking at the way couch_view_updater interacts
with couch_work_queue.
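For illustration, the kind of batched checkpoint policy being discussed could look something like the sketch below (in JavaScript rather than CouchDB's Erlang; every name here is hypothetical and not the actual couch_view_updater/couch_work_queue API):

```javascript
// Hypothetical sketch of a less aggressive checkpoint policy:
// commit the index only every N updates or every T milliseconds,
// instead of after each document. Illustration only -- the real
// view updater is Erlang and works quite differently in detail.
function BatchedCheckpointer(commitFn, maxUpdates, maxIntervalMs) {
  this.commitFn = commitFn;       // persists the index to disk
  this.maxUpdates = maxUpdates;
  this.maxIntervalMs = maxIntervalMs;
  this.pending = 0;
  this.lastCommit = Date.now();
}

// Called once per processed document.
BatchedCheckpointer.prototype.recordUpdate = function () {
  this.pending += 1;
  if (this.pending >= this.maxUpdates ||
      Date.now() - this.lastCommit >= this.maxIntervalMs) {
    this.flush();
  }
};

// Force a commit of anything still pending (e.g. at end of batch).
BatchedCheckpointer.prototype.flush = function () {
  if (this.pending > 0) {
    this.commitFn();
    this.pending = 0;
    this.lastCommit = Date.now();
  }
};
```

The trade-off is exactly the one discussed downthread: fewer fsyncs per document, at the cost of redoing up to maxUpdates documents after a crash.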


On Mon, Aug 23, 2010 at 12:04 AM, Sebastian Cohnen
<> wrote:
> Hey Jamie,
> first, I don't know anything about view checkpointing and how/if it could be customized
> in order to make couch commit less often, sorry :)
> (more replies inline)
> On 23.08.2010, at 08:38, Jamie Talbot wrote:
>> Tuyen Tran <ituyen@...> writes:
>>> We have a view that is checkpointing on every update and taking a long time
>>> to generate.
>>> <snip..>
>>> Has anyone seen similar performance? Are my documents too big with too many
>> fields?
>>> Thanks,
>>> -T
>> I have almost the same situation as you, using CouchDB 1.0.  Only 3000
>> documents in this sample database (of an overall document set of 450000).  Each
>> document is about 200KB, and contains an array of JSON objects, that each have
>> 3 small properties.
>> My view emits a large key of 6 parts (an array of timestamp components) and a
>> value array with 2 integers, In and Out. Without a reduce step it takes 5m20s
>> to generate.  With a reduce step that does a sum of Ins and Outs, it takes more
>> than 30 minutes.  Each document takes about 7 seconds to process.  It
>> checkpoints after every document.
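For readers following along, a map/reduce pair matching Jamie's description might look roughly like this. The document field names (entries, ts, in, out) are guesses for illustration, not his actual schema:

```javascript
// Map: emit a 6-part timestamp key and an [In, Out] value per entry.
// Field names are hypothetical.
function map(doc) {
  doc.entries.forEach(function (e) {
    var t = new Date(e.ts);
    emit([t.getUTCFullYear(), t.getUTCMonth() + 1, t.getUTCDate(),
          t.getUTCHours(), t.getUTCMinutes(), t.getUTCSeconds()],
         [e.in, e.out]);
  });
}

// Reduce: sum the Ins and Outs element-wise. Because the value is an
// array rather than a scalar, this must be a custom (JavaScript)
// reduce -- which is the slow path Sebastian describes below.
function reduce(keys, values, rereduce) {
  var sums = [0, 0];
  values.forEach(function (v) {
    sums[0] += v[0];
    sums[1] += v[1];
  });
  return sums;
}
```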
> Raw speed is hard to compare, though 7s per document only for emitting some fields of
> each document and summing up some values sounds quite slow. Since you are emitting non-scalar
> values you cannot use the built-in reduce functions, which are *very* fast (implemented in
> Erlang, running inside CouchDB, no serialization overhead). But a custom-written Erlang
> view could still be a very good option.
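To make Sebastian's point concrete: one common way to get onto a built-in reduce is to restructure the map so it emits scalars, for example by appending the counter name to the key and setting "reduce": "_sum" in the design document. A hypothetical sketch, with the same guessed field names as above:

```javascript
// Map rewritten to emit scalar values so the built-in _sum reduce
// applies: one row per counter, with the counter name appended to
// the key. Field names are hypothetical.
function map(doc) {
  doc.entries.forEach(function (e) {
    var t = new Date(e.ts);
    var key = [t.getUTCFullYear(), t.getUTCMonth() + 1, t.getUTCDate(),
               t.getUTCHours(), t.getUTCMinutes(), t.getUTCSeconds()];
    emit(key.concat("in"), e.in);    // scalar value -> _sum works
    emit(key.concat("out"), e.out);
  });
}
// In the design document:  "views": { "io": { "map": "...", "reduce": "_sum" } }
```

Queries then ask for the "in" and "out" rows separately (or use group_level), but the reduce runs in Erlang with no JSON round-trip per row.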
>> When looking at the size of the view, it comes out at about 900MB of data,
>> from a 30MB database.  After compacting, this drops to 90MB, or a factor of 10.
>> I found 0.10 significantly faster, though I don't have hard numbers, and didn't
>> try 0.11.
> What are you emitting as keys for your view? This kind of size discrepancy between the
> compacted and non-compacted view can be a sign that you are emitting very large or complex
> keys (at least this is my experience). Maybe you can have a look in this direction to optimize.
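As an illustration of that direction (again with guessed field names): a single integer key in epoch seconds sorts the same way as a [year, month, day, hour, minute, second] array but stores far fewer bytes per B-tree node:

```javascript
// Emit one integer (epoch seconds) instead of a 6-element array.
// Sort order is identical to the [Y, M, D, h, m, s] key, and each
// key stored in the view B-tree is much smaller.
// Field names are hypothetical, as in the earlier sketches.
function map(doc) {
  doc.entries.forEach(function (e) {
    emit(Math.floor(new Date(e.ts).getTime() / 1000), [e.in, e.out]);
  });
}
```

The cost is that group_level queries by year/month/day no longer fall out for free; startkey/endkey ranges over the integer key cover the common cases.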
>> On these numbers, Couch is unfortunately going to be unusable.  For the full
>> document set, it is likely to take 44 days to build the view, and will take
>> roughly 1.5TB, which will compact down to 150GB.  Once it's up and running, it
>> will probably be fine; we only add a document every 2 minutes, so a 7 second
>> build time and calling stale=true on the client will suffice.  However the
>> risk on the view file is too great to bear.  If it were to be corrupted (Couch
>> does an excellent job at avoiding this, but you need to plan for the worst), it
>> would take a month and a half to rebuild.
> View corruption is very unlikely, but you can copy view files around like databases,
> so you could easily copy the views from your backup/slave/... system to the server that got
> corrupted. So that shouldn't be a real problem.
>> I have seen a number of posts where people have started considering a
>> different view-building algorithm that is oriented to performance.  I would
>> personally love to see a "risky=true" build option for the views, which
>> focussed more on performance and less on stability, on the understanding that
>> if we crashed while generating it, we'd have to start again.  For the initial
>> load, and rebuilds, that would be a price worth paying.  We're never going to
>> have less data!
>> I'm also keen to hear people's experiences with this.
>> Kind Regards,
>> Jamie.

Chris Anderson
