incubator-couchdb-user mailing list archives

From Robert Newson <rnew...@apache.org>
Subject Re: Update handler is very slow
Date Wed, 06 Mar 2013 13:17:11 GMT
Update handlers are very slow in comparison to a straight POST or PUT
because they have to invoke some JavaScript on the server for every
request. This is, by some margin, the slowest way to achieve your goal.

The mistake here, though, is thinking that an update handler is the
right way to update every document in your system. Update handlers
exist to add a little server-side logic in cases where it's impossible
or awkward to do so in the client (i.e., when the client is not a
browser). Given their intrinsic slowness, I'd avoid them wherever
possible.
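
For reference, an update handler is just a JavaScript function stored
in a design document. A minimal sketch of the shape (the field name
"priority" and the integer coercion are my invention, purely for
illustration, not anything from this thread):

```javascript
// Hypothetical update handler: normalize a field that was stored
// sometimes as an int, sometimes as a string. In a real deployment
// this function body would live in a design doc under "updates".
const normalizePriority = function (doc, req) {
  if (!doc) {
    // No document matched the requested id; create nothing.
    return [null, 'missing'];
  }
  // Coerce the field to an integer regardless of how it was stored.
  doc.priority = parseInt(doc.priority, 10);
  // CouchDB writes the first element and returns the second as the
  // HTTP response body.
  return [doc, 'updated'];
};
```

Every invocation of a handler like this costs a round trip into the
JavaScript engine, which is why it loses badly to _bulk_docs for mass
updates.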

The fastest way to update documents is to use the bulk document API.
Ideally you'd fetch a batch of docs that need updating in one call,
transform them using any scripting language or tool, and then update
the batch by posting it to _bulk_docs. These methods are described at
http://wiki.apache.org/couchdb/HTTP_Bulk_Document_API. Some
experimentation will be required to find a good batch size: too small
and the job will take longer than it needs to; too large and the
server can crash by running out of memory. Unless your documents are
very large, or very small, I'd start with a couple of hundred docs and
then tweak up and down. Since this sounds like a one-off, you might
even skip this optimization phase; the difference between doing
singular PUTs through an update handler and doing 200 documents at a
time through _bulk_docs will be so huge that you might not need it to
go any faster.
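
To make the batch approach concrete, here's a sketch of the transform
step (the field name and coercion are assumptions carried over from
nothing in particular; the fetch and the POST are left to curl or any
HTTP client):

```javascript
// Given a batch of docs fetched in one call (e.g. from a view with
// include_docs=true), normalize the inconsistent field and build the
// request body that _bulk_docs expects.
function buildBulkPayload(docs) {
  const fixed = docs.map((doc) => ({
    ...doc,
    // Hypothetical fix-up: force the field to an integer.
    priority: parseInt(doc.priority, 10),
  }));
  // _bulk_docs expects {"docs": [...]}; each doc must keep its _id
  // and current _rev so CouchDB updates it rather than reporting a
  // conflict.
  return JSON.stringify({ docs: fixed });
}
```

You'd POST the resulting body to /dbname/_bulk_docs with Content-Type:
application/json, then move on to the next batch.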

There was a recent thread about adding this as a CouchDB feature. If
we did, it would work much the same as above. I'm wary, though, as it
would encourage the rewrite-all-the-documents approach. That should be
quite a rare event, since a schema-less, document-oriented approach
should largely relieve you of the pain of changing document contents.
In this thread's case, where a particular field has been used
inconsistently, a one-time fix-up makes sense (assuming that new
updates are consistent).

B.


On 6 March 2013 06:13, Anthony Ananich <anton.ananich@inpun.com> wrote:
> And how long does it take to add a document by HTTP PUT?
>
> On Wed, Mar 6, 2013 at 2:33 PM, svilen <az@svilendobrev.com> wrote:
>> +1. i'd like to know also about update_handlers as i may get into such
>> situation soon.
>>
>> not an answer:
>> if you're sure your transformation is correct, my lame take would be:
>> don't do anything.
>> 4 docs/s is about 14,400/hour - so by tomorrow it would be done.
>>
>> of course, no harm to find/learn - e.g. you may need to rerun it again..
>>
>> ciao
>> svilen
>>
>> On Wed, 6 Mar 2013 12:06:41 +0100
>> Daniel Gonzalez <gonvaled@gonvaled.com> wrote:
>>
>>> Hi,
>>>
>>> We have a problem in our data: we have been inconsistent in one of our
>>> fields, and we have named it in different ways. Besides, in some
>>> places we have used int, in other places string. I have created an
>>> update handler to correct this situation, and I am running it for our
>>> 100-thousand-document database, by doing PUT requests, as explained at
>>> http://wiki.apache.org/couchdb/Document_Update_Handlers
>>>
>>> What I am doing is:
>>>
>>>    1. get affected documents with a view
>>>    2. call the update handler.
>>>
>>> And this is running over an ssh tunnel.
>>>
>>> My problem is that this is veeeery slow. Currently I am running at 4
>>> docs/s. Is this normal?
>>>
>>> I could do this locally (no ssh tunnel), but I guess things would not
>>> improve much, since the data being transferred is not that big (no
>>> include_docs, and the view emits very little information). I have the
>>> impression that the bottleneck is couchdb itself: the update handler
>>> is just that slow.
>>>
>>> Am I right about this? Is there a way to speed this up?
>>>
>>> Thanks,
>>> Daniel
