couchdb-user mailing list archives

From Daniel Gonzalez <gonva...@gonvaled.com>
Subject Re: Update handler is very slow
Date Wed, 06 Mar 2013 13:28:52 GMT
Thanks Robert, that explains it.

I was indeed under the impression that update handlers are faster than
re-creation of documents. Seeing CouchDB as a black box, that is what you
would expect, since the update handler requires less information transfer
and is largely performed inside CouchDB itself (possibly with some data
coming in the HTTP request).

I understand now that the implementation details of the update handler make
it slower (in the general case) than re-creation of documents, but since
this is not plainly obvious, I think it should be mentioned in the
documentation about update handlers.

Actually, my first approach to solve the problem was to do exactly that
(bulk read / modify / bulk write), but I discarded it because I had thought
that an update handler would be *faster*. Then I implemented my solution,
and was surprised by how slow it was. Hence my mail.

Now my database update is halfway through, and I will let it run until
completion. Next time, I hope to remember this discussion.

Thanks,
Daniel

On Wed, Mar 6, 2013 at 2:17 PM, Robert Newson <rnewson@apache.org> wrote:

> Update handlers are very slow in comparison to a straight POST or PUT
> as they have to invoke some JavaScript on the server. This is, by some
> margin, the slowest way to achieve your goal.
>
> The mistake here, though, is thinking that an update handler is the
> right way to update every document in your system. Update handlers
> exist to add a little server-side logic in cases where it's impossible
> or awkward to do so in the client (i.e., when the client is not a
> browser). Given their intrinsic slowness, I'd avoid them where I
> could.
>
> The fastest way to update documents is to use the bulk document API.
> Ideally you want to fetch a batch of docs that need updating in one call,
> transform them using any scripting language or tool, and then update
> the batch by posting it to _bulk_docs. These methods are described in
> http://wiki.apache.org/couchdb/HTTP_Bulk_Document_API. Some
> experimentation will be required to find a good batch size; too small
> and this will take longer than it could, too high and the server can
> crash by running out of memory. Unless your documents are very large
> or very small, I'd start with a couple of hundred docs and then tweak
> up and down. Since this sounds like a one-off, you might even skip
> this optimization phase; the difference between doing single PUTs
> through an update handler and doing 200 documents through _bulk_docs
> will be so huge that you might not need it to go any faster.
>
> There was a recent thread proposing to add this as a CouchDB feature. If we
> did, it would work much the same as above. I'm wary, though, as it would
> encourage the rewrite-all-the-documents approach. That should be quite
> a rare event since a schema-less document-oriented approach should
> largely relieve you of the pain of changing document contents. In this
> thread's case (the inconsistent use of a particular field), a one-time
> fix-up makes sense (assuming that new updates are consistent).
>
> B.
>
>
> On 6 March 2013 06:13, Anthony Ananich <anton.ananich@inpun.com> wrote:
> > And how long does it take to add a document by HTTP PUT?
> >
> > On Wed, Mar 6, 2013 at 2:33 PM, svilen <az@svilendobrev.com> wrote:
> >> +1. I'd also like to know about update handlers, as I may get into such a
> >> situation soon.
> >>
> >> Not an answer:
> >> if you're sure your transformation is correct, my lame take would be:
> >> don't do anything.
> >> 4 docs/s is ~14400/hour, so by tomorrow it would be done.
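The estimate above as quick arithmetic, assuming all 100,000 documents mentioned in the original message need rewriting (the view may well select fewer):

```python
def hours_needed(n_docs, docs_per_second):
    """Wall-clock hours to process n_docs at a steady rate."""
    return n_docs / docs_per_second / 3600


rate = 4  # docs/s, as reported in the original message
print(rate * 3600)                            # docs per hour: 14400
print(round(hours_needed(100_000, rate), 1))  # hours for the whole database: 6.9
```

So roughly seven hours at the observed rate, which fits "by tomorrow".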
> >>
> >> Of course, there's no harm in finding out; e.g. you may need to rerun it.
> >>
> >> ciao
> >> svilen
> >>
> >> On Wed, 6 Mar 2013 12:06:41 +0100
> >> Daniel Gonzalez <gonvaled@gonvaled.com> wrote:
> >>
> >>> Hi,
> >>>
> >>> We have a problem in our data: we have been inconsistent in one of our
> >>> fields, and we have named it in different ways. Besides, in some
> >>> places we have used int, in other places string. I have created an
> >>> update handler to correct this situation, and I am running it on our
> >>> 100,000-document database by doing PUT requests, as explained in
> >>> http://wiki.apache.org/couchdb/Document_Update_Handlers
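A sketch of the kind of per-document fix-up described here, written as a plain Python function so it can run client-side (for instance inside a bulk read/modify/write loop). The field names are hypothetical, since the message does not say what the field is called; it also assumes the string values are valid integers (anything else raises `ValueError`).

```python
# Hypothetical names: the thread does not say what the inconsistent field is called.
CANONICAL = "item_count"
ALIASES = ("itemCount", "item_cnt", "ItemCount")


def normalize(doc):
    """Rename alias fields to CANONICAL and coerce its value to int.

    Mutates `doc` in place and returns True if anything changed, so callers
    can skip writing back documents that were already consistent.
    """
    changed = False
    for alias in ALIASES:
        if alias in doc:
            value = doc.pop(alias)
            # If the canonical field already exists, keep it instead of overwriting.
            doc.setdefault(CANONICAL, value)
            changed = True
    value = doc.get(CANONICAL)
    if isinstance(value, str):
        doc[CANONICAL] = int(value.strip())  # "7" -> 7
        changed = True
    return changed
```

Returning a change flag lets a batch job send back only the documents that actually needed fixing.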
> >>>
> >>> What I am doing is:
> >>>
> >>>    1. get affected documents with a view
> >>>    2. call the update handler for each of them.
> >>>
> >>> And this is running over an ssh tunnel.
> >>>
> >>> My problem is that this is veeeery slow. Currently I am running at 4
> >>> docs/s. Is this normal?
> >>>
> >>> I could do this locally (no ssh tunnel), but I guess things would not
> >>> improve much, since the data being transferred is not that big (no
> >>> include_docs, and the view emits very little information). I have the
> >>> impression that the bottleneck is couchdb itself: the update handler
> >>> is just that slow.
> >>>
> >>> Am I right about this? Is there a way to speed this up?
> >>>
> >>> Thanks,
> >>> Daniel
>
