incubator-couchdb-user mailing list archives

From Lance Carlson <lancecarl...@gmail.com>
Subject Re: Mass updates
Date Mon, 13 May 2013 06:24:12 GMT
Made a lot of updates to my couchout project. It now includes a couchin
project as well. I might create another project for updating, but it's pretty
easy to write a Node.js script (or one in any language, for that matter) that
connects to Redis and decodes and encodes base64.
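A minimal sketch of the decode/encode step such a script would perform, assuming the queue holds base64-encoded JSON documents. The Redis connection is stubbed out with a plain list here (a real script would pop payloads from Redis instead); the document fields are illustrative:

```python
import base64
import json

# Stand-in for a Redis list of queued documents; a real script would
# pop these from Redis. The payloads are assumed to be
# base64-encoded JSON, as described above.
queue = [base64.b64encode(json.dumps({"_id": "doc1", "type": "user"}).encode())]

def process(payload):
    # Decode the base64 payload back into a document dict,
    # apply a change, and re-encode it for the return trip.
    doc = json.loads(base64.b64decode(payload))
    doc["migrated"] = True
    return base64.b64encode(json.dumps(doc).encode())

updated = [process(p) for p in queue]
```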


On Sat, May 11, 2013 at 2:27 AM, Andrey Kuprianov <
andrey.kouprianov@gmail.com> wrote:

> We do that, and we have a cron job to touch the views every 5 minutes. It's
> just that at that particular time we had to insert those 150k documents in
> one go (we were migrating from MySQL).
>
> Sent from my iPhone
>
> On 11 May, 2013, at 1:02 PM, Benoit Chesneau <bchesneau@gmail.com> wrote:
>
> > On May 9, 2013 1:17 PM, "Andrey Kuprianov" <andrey.kouprianov@gmail.com>
> > wrote:
> >>
> >> Rebuilding the views mentioned by James is hell! And the more docs and
> >> views you have, the longer your views will take to catch up with the
> >> updates. We don't have the best of servers, but ours (a dedicated one)
> >> took several hours to rebuild our views (not too many of them, either)
> >> after we inserted ~150k documents (we use full-text search with Lucene
> >> as well, so it also contributed to the overall server slowdown).
> >>
> >> So my suggestion is:
> >>
> >> 1. Once you want to migrate your stuff, make a copy of your db.
> >> 2. Do migration on the copy
> >> 3. Allow the views to rebuild (you need to query one view in each design
> >> document once to trigger the views to start catching up with the
> >> updates). You'd probably ask whether it is possible to limit Couch's
> >> resource usage while views are rebuilding, but I don't have an answer to
> >> that question. Maybe someone else can help here...
> >> 4. Switch database pointer from one DB to another.
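The copy-migrate-switch steps above can be sketched against CouchDB's HTTP API. The sketch below only builds the requests rather than sending them, and the database names, design doc, and view names are hypothetical:

```python
import json

COUCH = "http://localhost:5984"  # assumed CouchDB address

def copy_db(source, target):
    # Step 1: copy the db server-side via the _replicate endpoint.
    body = {"source": source, "target": target, "create_target": True}
    return ("POST", COUCH + "/_replicate", json.dumps(body))

def warm_views(db, ddoc, view):
    # Step 3: query one view per design doc to trigger an index rebuild;
    # limit=0 avoids transferring rows while still kicking off indexing.
    path = "%s/%s/_design/%s/_view/%s?limit=0" % (COUCH, db, ddoc, view)
    return ("GET", path, None)

# Steps 2 (migrate docs on the copy) and 4 (repoint the application)
# are app-specific and omitted here.
method, url, body = copy_db("mydb", "mydb_v2")
```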
> >
> > You don't need to wait until all the docs are there to trigger the view
> > update. Just trigger it more often, so the view calculation will happen
> > on a smaller set.
> >
> > You can even make it parallel by using different ddocs.
> >>
> >>
> >> On Thu, May 9, 2013 at 1:41 PM, Paul Davis <paul.joseph.davis@gmail.com
> >> wrote:
> >>
> >>> On Wed, May 8, 2013 at 10:24 PM, Charles S. Koppelman-Milstein
> >>> <ckoppel@alumni.gwu.edu> wrote:
> >>>> I am trying to understand whether Couch is the way to go to meet some
> > of
> >>>> my organization's needs.  It seems pretty terrific.
> >>>> The main concern I have is maintaining a consistent state across code
> >>>> releases.  Presumably, our data model will change over the course of
> >>>> time, and when it does, we need to make the several million old
> >>>> documents conform to the new model.
> >>>>
> >>>> Although I would love to pipe a view through an update handler and
> > call
> >>>> it a day, I don't believe that option exists.  The two ways I
> >>> understand to do this are:
> >>>>
> >>>> 1. Query all documents, update each doc client-side, and PUT those
> >>>> changes in the _bulk_docs API (presumably this should be done in
> > batches)
> >>>> 2. Query the ids for all docs, and one at a time, PUT them through an
> >>>> update handler
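Option 1 above can be sketched as a batched _bulk_docs payload builder. This only constructs the request bodies (no server involved), and the field rename in the migration function is a hypothetical example of a model change:

```python
import json

def migrate(doc):
    # Hypothetical model change: rename a field.
    doc["full_name"] = doc.pop("name", None)
    return doc

def bulk_docs_batches(docs, batch_size=500):
    # Build one _bulk_docs request body per batch of migrated docs;
    # each would be POSTed to /db/_bulk_docs.
    for i in range(0, len(docs), batch_size):
        batch = [migrate(d) for d in docs[i:i + batch_size]]
        yield json.dumps({"docs": batch})

docs = [{"_id": str(n), "_rev": "1-abc", "name": "u%d" % n} for n in range(3)]
payloads = list(bulk_docs_batches(docs, batch_size=2))
```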
> >>>
> >>> You are correct that there's no server-side way to do a migration like
> >>> the one you're asking about.
> >>>
> >>> The general pattern for these things is to write a view that only
> >>> includes the documents that need to be changed and then write
> >>> something that goes through and processes each doc in the view to the
> >>> desired form (that removes it from the view). This way you can easily
> >>> know when you're done working. It's definitely possible to write
> >>> something that stores state and/or just brute-forces a db scan each
> >>> time you run the migration.
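The shrinking-view pattern described here can be sketched as a design document whose map function emits only unmigrated docs; once a doc is updated it drops out of the view, so an empty view means the migration is done. The `schema_version` field is an illustrative convention, not a CouchDB built-in:

```python
import json

# Design doc whose "pending" view contains only documents that still
# need migrating. The map function is JavaScript, shipped as a string
# in the design document, per CouchDB convention.
ddoc = {
    "_id": "_design/migration",
    "views": {
        "pending": {
            "map": "function(doc) {"
                   "  if (!doc.schema_version || doc.schema_version < 2) {"
                   "    emit(doc._id, null);"
                   "  }"
                   "}"
        }
    },
}
payload = json.dumps(ddoc)
```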
> >>>
> >>> Performance-wise, your first suggestion would probably be the most
> >>> performant, although depending on document sizes and latencies it may
> >>> be possible to get better numbers using an update handler. I doubt
> >>> that, though, unless you have huge docs and a super slow connection
> >>> with high latencies.
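For comparison, option 2's update handler would live in a design document like this. Again, only the payload is constructed; the handler body follows CouchDB's update-function signature (it receives the stored doc and the request and returns the new doc plus a response), and the doc id in the URL is a placeholder:

```python
import json

# An update handler: each doc would be migrated with one PUT to
# /db/_design/migration/_update/fix/<docid>.
handler_ddoc = {
    "_id": "_design/migration",
    "updates": {
        "fix": "function(doc, req) {"
               "  if (!doc) return [null, 'missing'];"
               "  doc.schema_version = 2;"
               "  return [doc, 'ok'];"
               "}"
    },
}
update_url = "/mydb/_design/migration/_update/fix/some_doc_id"
```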
> >>>
> >>>> Are these options reasonably performant?  If we have to do a
> > mass-update
> >>>> once a deployment, it's not terrible if it's not lightning-speed, but
> > it
> >>>> shouldn't take terribly long.  Also, I have read that update handlers
> >>>> have indexes built against them.  If this is a fire-once option, is
> > that
> >>>> worthwhile?
> >>>
> >>> I'm not sure what you mean that update handlers have indexes built
> >>> against them. That doesn't match anything that currently exists in
> >>> CouchDB.
> >>>
> >>>> Which option is better?  Is there an even better way?
> >>>
> >>> There's nothing better than the general ideas you've listed.
> >>>
> >>>> Thanks,
> >>>> Charles
> >>>
>
