incubator-couchdb-user mailing list archives

From Daniel Gonzalez <gonva...@gonvaled.com>
Subject Re: Update handler is very slow
Date Wed, 06 Mar 2013 14:21:47 GMT
I couldn't resist and have moved to a bulk read / modify / bulk write
approach, and the situation has improved dramatically: I am now running at
over 100 docs/s, compared to the 4 docs/s I was getting with the update handler.
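
In case it helps anyone else, the approach looks roughly like the sketch
below. The database, view and field names are placeholders, and the code is
untested, so adapt it to your own setup:

import requests

COUCH = "http://localhost:5984"
DB = "mydb"                             # placeholder database name
VIEW = "_design/fixup/_view/affected"   # view emitting only docs still needing the fix
BATCH = 200

def normalize(doc):
    # Placeholder transformation: collapse the differently named / typed
    # variants of the field into a single name and type.
    for old_name in ("Amount", "amount_str"):
        if old_name in doc:
            doc["amount"] = int(doc.pop(old_name))
    return doc

while True:
    # Bulk read: one request fetches a whole batch of documents.
    r = requests.get("%s/%s/%s" % (COUCH, DB, VIEW),
                     params={"include_docs": "true", "limit": BATCH})
    r.raise_for_status()
    rows = r.json()["rows"]
    if not rows:
        break
    # Modify locally, then bulk write: one POST updates the whole batch.
    docs = [normalize(row["doc"]) for row in rows]
    r = requests.post("%s/%s/_bulk_docs" % (COUCH, DB), json={"docs": docs})
    r.raise_for_status()
    # The loop terminates because the view stops emitting a document once it
    # has been fixed; a real run should also check the _bulk_docs response
    # for per-document conflicts.

With batches of a couple of hundred documents the round trips over the tunnel
drop from one per document to one per batch, which I assume is where most of
the speedup comes from.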

On Wed, Mar 6, 2013 at 2:28 PM, Daniel Gonzalez <gonvaled@gonvaled.com> wrote:

> Thanks Robert, that explains it.
>
> I was indeed under the impression that update handlers are faster than
> re-creating documents. Seeing CouchDB as a black box, that is what you
> would expect, since the update handler requires less information transfer
> and is performed largely inside CouchDB itself (with possibly some data
> coming along with the HTTP request).
>
> I understand now that the implementation details of the update handler
> make it slower (in the general case) than re-creation of documents, but
> since this is not plainly obvious, I think it should be mentioned in the
> documentation about update handlers.
>
> Actually, my first approach to solving the problem was to do exactly that
> (bulk read / modify / bulk write), but I discarded it because I had thought
> that an update handler would be *faster*. Then I implemented my solution
> and was surprised by how slow it was. Hence my mail.
>
> Now my database update is halfway through, and I will let it run until
> completion. For the next time, I hope to remember about this discussion.
>
> Thanks,
> Daniel
>
> On Wed, Mar 6, 2013 at 2:17 PM, Robert Newson <rnewson@apache.org> wrote:
>
>> Update handlers are very slow in comparison to a straight POST or PUT,
>> as they have to invoke some JavaScript on the server. This is, by some
>> margin, the slowest way to achieve your goal.
>>
>> The mistake here, though, is thinking that an update handler is the
>> right way to update every document in your system. Update handlers
>> exist to add a little server-side logic in cases where it's impossible
>> or awkward to do so in the client (i.e., when the client is not a
>> browser). Given their intrinsic slowness, I'd avoid them where I
>> could.
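>>
>> To illustrate, a minimal sketch (made-up names, untested) of that kind of
>> handler: a 'touch' update that stamps a server-side timestamp, pushed and
>> then called from a script:
>>
>> import requests
>>
>> couch = "http://localhost:5984/mydb"   # placeholder URL/database
>> ddoc = {
>>     "_id": "_design/app",
>>     "updates": {
>>         # the function body is JavaScript and runs inside CouchDB
>>         "touch": "function(doc, req) {"
>>                  "  if (!doc) { return [null, 'missing']; }"
>>                  "  doc.updated_at = new Date().toISOString();"
>>                  "  return [doc, 'ok']; }"
>>     }
>> }
>> requests.put(couch + "/_design/app", json=ddoc).raise_for_status()
>> # each call costs one HTTP round trip plus one JavaScript invocation
>> requests.put(couch +
>>              "/_design/app/_update/touch/some_doc_id").raise_for_status()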
>>
>> The fastest way to update documents is to use the bulk document API.
>> Ideally you want to fetch a batch of docs that need updating in one call,
>> transform them using any scripting language or tool, and then update
>> the batch by posting it to _bulk_docs. These methods are described in
>> http://wiki.apache.org/couchdb/HTTP_Bulk_Document_API. Some
>> experimentation will be required to find a good batch size; too small
>> and this will take longer than it could, too large and the server can
>> crash by running out of memory. Unless your documents are very large,
>> or very small, I'd start with a couple of hundred docs and then tweak
>> up and down. Since this sounds like a one-off, you might even skip
>> this optimization phase: the difference between doing individual PUTs
>> through an update handler and doing 200 documents through _bulk_docs
>> will be so huge that you might not need it to go any faster.
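>>
>> In outline it is something like this (untested, placeholder names, and
>> assuming every document needs the fix):
>>
>> import json
>> import requests
>>
>> couch = "http://localhost:5984/mydb"   # placeholder URL/database
>> batch, startkey = 200, None            # tune the batch size up or down
>>
>> def fix(doc):                          # stand-in for your transformation
>>     return doc
>>
>> while True:
>>     params = {"include_docs": "true", "limit": batch}
>>     if startkey is not None:
>>         # resume after the last document of the previous batch
>>         params["startkey"] = json.dumps(startkey)
>>         params["skip"] = 1
>>     rows = requests.get(couch + "/_all_docs", params=params).json()["rows"]
>>     if not rows:
>>         break
>>     startkey = rows[-1]["id"]
>>     docs = [fix(r["doc"]) for r in rows
>>             if not r["id"].startswith("_design/")]  # leave design docs alone
>>     if docs:
>>         requests.post(couch + "/_bulk_docs",
>>                       json={"docs": docs}).raise_for_status()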
>>
>> There was a recent thread about adding this as a CouchDB feature. If we
>> did, it would work much the same as above. I'm wary, though, as it would
>> encourage the rewrite-all-the-documents approach. That should be quite
>> a rare event, since a schema-less, document-oriented approach should
>> largely relieve you of the pain of changing document contents. In this
>> thread's case, where a particular field has been used inconsistently, a
>> one-time fix-up makes sense (assuming that new updates are consistent).
>>
>> B.
>>
>>
>> On 6 March 2013 06:13, Anthony Ananich <anton.ananich@inpun.com> wrote:
>> > And how long does it take to add a document by HTTP PUT?
>> >
>> > On Wed, Mar 6, 2013 at 2:33 PM, svilen <az@svilendobrev.com> wrote:
>> >> +1. I'd also like to know about update handlers, as I may get into such
>> >> a situation soon.
>> >>
>> >> Not an answer:
>> >> if you are sure your transformation is correct, my lame take would be:
>> >> don't do anything.
>> >> 4 docs/s is about 14,400/hour, so by tomorrow it would be done.
>> >>
>> >> Of course, there is no harm in finding out how to do it faster - e.g. you
>> >> may need to rerun it some day.
>> >>
>> >> ciao
>> >> svilen
>> >>
>> >> On Wed, 6 Mar 2013 12:06:41 +0100
>> >> Daniel Gonzalez <gonvaled@gonvaled.com> wrote:
>> >>
>> >>> Hi,
>> >>>
>> >>> We have a problem in our data: we have been inconsistent with one of our
>> >>> fields and have named it in different ways. In addition, in some
>> >>> places we have used an int, in other places a string. I have created an
>> >>> update handler to correct this situation, and I am running it against our
>> >>> database of 100 thousand documents by doing PUT requests, as explained at
>> >>> http://wiki.apache.org/couchdb/Document_Update_Handlers
>> >>>
>> >>> What I am doing is:
>> >>>
>> >>>    1. get affected documents with a view
>> >>>    2. call the update handler.
>> >>>
>> >>> And this is running over an ssh tunnel.
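>> >>>
>> >>> In outline (simplified, with placeholder names):
>> >>>
>> >>> import requests
>> >>>
>> >>> couch = "http://localhost:5984/mydb"   # placeholder; really over the tunnel
>> >>> rows = requests.get(couch + "/_design/fixup/_view/affected").json()["rows"]
>> >>> for row in rows:
>> >>>     # one HTTP round trip and one JavaScript invocation per document
>> >>>     requests.put("%s/_design/fixup/_update/normalize/%s"
>> >>>                  % (couch, row["id"])).raise_for_status()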
>> >>>
>> >>> My problem is that this is veeeery slow. Currently I am running at 4
>> >>> docs/s. Is this normal?
>> >>>
>> >>> I could do this locally (no ssh tunnel), but I guess things would not
>> >>> improve much, since the data being transferred is not that big (no
>> >>> include_docs, and the view emits very little information). I have the
>> >>> impression that the bottleneck is CouchDB itself: the update handler
>> >>> is just that slow.
>> >>>
>> >>> Am I right about this? Is there a way to speed this up?
>> >>>
>> >>> Thanks,
>> >>> Daniel
>>
>
>
