incubator-couchdb-user mailing list archives

From Robert Newson <rnew...@apache.org>
Subject Re: Update handler is very slow
Date Wed, 06 Mar 2013 14:39:57 GMT
I bet you could go faster too, but that's a huge improvement, congrats!

On 6 March 2013 08:21, Daniel Gonzalez <gonvaled@gonvaled.com> wrote:
> I couldn't resist and I have moved to a bulk read / modify / bulk write
> approach and the situation has dramatically improved: I am running now at
> over 100 docs/s, compared to 4 docs/s with the update handler.
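[Editor's note: a minimal sketch of the bulk read / modify / bulk write loop described above, using only the Python standard library. The database URL and field names are hypothetical, not taken from the thread.]

```python
import json
from urllib import request

COUCH = "http://localhost:5984/mydb"  # hypothetical database URL

def normalize(doc, old_names=("Amount", "amount_str"), new_name="amount"):
    """Fold inconsistently named field variants into one field, coercing to int."""
    for old in old_names:
        if old in doc:
            doc[new_name] = int(doc.pop(old))
    return doc

def bulk_update(docs):
    """Write a whole batch of modified docs back in one _bulk_docs round trip."""
    body = json.dumps({"docs": docs}).encode()
    req = request.Request(COUCH + "/_bulk_docs", data=body,
                          headers={"Content-Type": "application/json"})
    return json.load(request.urlopen(req))

def run(batch_size=200):
    """Bulk read one batch with include_docs=true, transform locally, write back."""
    url = "%s/_all_docs?include_docs=true&limit=%d" % (COUCH, batch_size)
    rows = json.load(request.urlopen(url))["rows"]
    return bulk_update([normalize(row["doc"]) for row in rows])
```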
>
> On Wed, Mar 6, 2013 at 2:28 PM, Daniel Gonzalez <gonvaled@gonvaled.com> wrote:
>
>> Thanks Robert, that explains it.
>>
>> I was indeed under the impression that update handlers are faster than
>> re-creating documents. Treating CouchDB as a black box, that is what you
>> would expect, since the update handler requires less data transfer
>> and is largely performed inside CouchDB itself (with perhaps some data
>> coming with the HTTP request).
>>
>> I understand now that the implementation details of the update handler
>> make it slower (in the general case) than re-creation of documents, but
>> since this is not plainly obvious, I think it should be mentioned in the
>> documentation about update handlers.
>>
>> Actually, my first approach to solve the problem was to do exactly that
>> (bulk read / modify / bulk write), but I discarded it because I had thought
>> that an update handler would be *faster*. Then I implemented my solution
>> and was surprised by how slow it was. Hence my mail.
>>
>> Now my database update is halfway through, and I will let it run until
>> completion. Next time, I hope to remember this discussion.
>>
>> Thanks,
>> Daniel
>>
>> On Wed, Mar 6, 2013 at 2:17 PM, Robert Newson <rnewson@apache.org> wrote:
>>
>>> Update handlers are very slow in comparison to a straight POST or PUT
>>> as they have to invoke some JavaScript on the server. This is, by some
>>> margin, the slowest way to achieve your goal.
>>>
>>> The mistake here, though, is thinking that an update handler is the
>>> right way to update every document in your system. Update handlers
>>> exist to add a little server-side logic in cases where it's impossible
>>> or awkward to do so in the client (i.e., when the client is not a
>>> browser). Given their intrinsic slowness, I'd avoid them where I
>>> could.
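[Editor's note: for context, an update handler is a JavaScript function stored in a design document and invoked with one HTTP request per document. A sketch of the shape involved; the design document and field names here are hypothetical.]

```python
import json

# Hypothetical design document: the "normalize" function runs as server-side
# JavaScript once per request, which is what makes this path comparatively slow.
DESIGN_DOC = {
    "_id": "_design/fixups",
    "updates": {
        "normalize": (
            "function(doc, req) {"
            "  if (!doc) return [null, 'missing'];"
            "  doc.amount = parseInt(doc.amount, 10);"
            "  return [doc, 'ok'];"
            "}"
        ),
    },
}

def handler_url(base, docid):
    """Each document costs one round trip: PUT <db>/_design/<ddoc>/_update/<func>/<docid>."""
    return "%s/_design/fixups/_update/normalize/%s" % (base, docid)

print(handler_url("http://localhost:5984/mydb", "doc-123"))
# http://localhost:5984/mydb/_design/fixups/_update/normalize/doc-123
```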
>>>
>>> The fastest way to update documents is to use the bulk document API.
>>> Ideally you want to fetch a batch of docs that need updating in one call,
>>> transform them using any scripting language or tool, and then update
>>> the batch by posting it to _bulk_docs. These methods are described in
>>> http://wiki.apache.org/couchdb/HTTP_Bulk_Document_API. Some
>>> experimentation will be required to find a good batch size; too small
>>> and this will take longer than it could, too large and the server can
>>> crash by running out of memory. Unless your documents are very large,
>>> or very small, I'd start with a couple of hundred docs and then tweak
>>> up and down. Since this sounds like a one-off, you might even skip
>>> this optimization phase; the difference between doing individual PUTs
>>> through an update handler and doing 200 documents through _bulk_docs
>>> will be so huge that you might not need it to go any faster.
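[Editor's note: the batch-size experiment described above amounts to chunking the list of affected doc ids; a trivial helper, where 200 is just the suggested starting point.]

```python
def batches(ids, size=200):
    """Yield successive chunks of ids, each sized for one _bulk_docs call."""
    for i in range(0, len(ids), size):
        yield ids[i:i + size]

# e.g. 100,000 affected docs in batches of 200 -> 500 _bulk_docs round trips
chunks = list(batches(list(range(100000))))
print(len(chunks))  # 500
```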
>>>
>>> There was a recent thread to add this as a CouchDB feature. If we did,
>>> it would work much the same as above. I'm wary, though, as it would
>>> encourage the rewrite-all-the-documents approach. That should be quite
>>> a rare event since a schema-less document-oriented approach should
>>> largely relieve you of the pain of changing document contents. In this
>>> thread's case (the inconsistent use of a particular field), a one-time
>>> fix-up makes sense (assuming that new updates are consistent).
>>>
>>> B.
>>>
>>>
>>> On 6 March 2013 06:13, Anthony Ananich <anton.ananich@inpun.com> wrote:
>>> > And how long does it take to add a document by HTTP PUT?
>>> >
>>> > On Wed, Mar 6, 2013 at 2:33 PM, svilen <az@svilendobrev.com> wrote:
>>> >> +1. I'd like to know about update handlers too, as I may get into
>>> >> such a situation soon.
>>> >>
>>> >> not an answer:
>>> >> if you're sure your transformation is correct, my lame take would be:
>>> >> don't do anything.
>>> >> 4 docs/s is about 14,400/hour, so by tomorrow it would be done.
>>> >>
>>> >> of course, no harm to find/learn - e.g. you may need to run it again..
>>> >>
>>> >> ciao
>>> >> svilen
>>> >>
>>> >> On Wed, 6 Mar 2013 12:06:41 +0100
>>> >> Daniel Gonzalez <gonvaled@gonvaled.com> wrote:
>>> >>
>>> >>> Hi,
>>> >>>
>>> >>> We have a problem in our data: we have been inconsistent in one of our
>>> >>> fields, and we have named it in different ways. Besides, in some
>>> >>> places we have used int, in other places string. I have created an
>>> >>> update handler to correct this situation, and I am running it for our
>>> >>> 100-thousand-document database by doing PUT requests, as explained at
>>> >>> http://wiki.apache.org/couchdb/Document_Update_Handlers
>>> >>>
>>> >>> What I am doing is:
>>> >>>
>>> >>>    1. get affected documents with a view
>>> >>>    2. call the update handler.
>>> >>>
>>> >>> And this is running over an ssh tunnel.
>>> >>>
>>> >>> My problem is that this is veeeery slow. Currently I am running at 4
>>> >>> docs/s. Is this normal?
>>> >>>
>>> >>> I could do this locally (no ssh tunnel), but I guess things would not
>>> >>> improve much, since the data being transferred is not that big (no
>>> >>> include_docs, and the view emits very little information). I have the
>>> >>> impression that the bottleneck is couchdb itself: the update handler
>>> >>> is just that slow.
>>> >>>
>>> >>> Am I right about this? Is there a way to speed this up?
>>> >>>
>>> >>> Thanks,
>>> >>> Daniel
>>>
>>
>>
