incubator-couchdb-user mailing list archives

From Andrey Kuprianov <andrey.koupria...@gmail.com>
Subject Re: Mass updates
Date Thu, 09 May 2013 11:16:42 GMT
Rebuilding the views mentioned by James is hell! And the more docs and
views you have, the longer the views take to catch up with the
updates. We don't have the best of servers, but our dedicated box took
several hours to rebuild our views (not too many of them, either) after
we inserted ~150k documents (we use full-text search with Lucene as
well, so it also contributed to the overall server slowdown).

So my suggestion is:

1. Once you want to migrate your stuff, make a copy of your db.
2. Do the migration on the copy.
3. Allow the views to rebuild (you need to query one view from each
design document once to trigger indexing, so the views start catching
up with the updates -- see the sketch below). You'd probably ask
whether it is possible to limit CouchDB's resource usage while views
are rebuilding, but I don't have an answer to that question. Maybe
someone else can help here...
4. Switch your database pointer from the old DB to the new one.
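
Here's an untested Node.js sketch of steps 1 and 3. The names are made
up for illustration (a local CouchDB on port 5984, a source db "mydb",
a copy "mydb_migrated", and a design doc "app" with a view "by_type");
adjust them to your setup:

// copy_and_warm.js -- copy the db, then query one view per design doc
// so the indexer starts building.
var http = require('http');

function couch(method, path, body, callback) {
  var req = http.request({
    host: 'localhost', port: 5984, method: method, path: path,
    headers: { 'Content-Type': 'application/json' }
  }, function (res) {
    var data = '';
    res.on('data', function (chunk) { data += chunk; });
    res.on('end', function () { callback(JSON.parse(data)); });
  });
  if (body) req.write(JSON.stringify(body));
  req.end();
}

// Step 1: copy the database with a one-shot local replication.
couch('POST', '/_replicate',
      { source: 'mydb', target: 'mydb_migrated', create_target: true },
      function (result) {
        console.log('replication finished:', result);
        // Step 3: hit one view per design doc with limit=0 so the
        // index builds without transferring any rows.
        couch('GET', '/mydb_migrated/_design/app/_view/by_type?limit=0',
              null, function (r) { console.log('view warmed:', r); });
      });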




On Thu, May 9, 2013 at 1:41 PM, Paul Davis <paul.joseph.davis@gmail.com> wrote:

> On Wed, May 8, 2013 at 10:24 PM, Charles S. Koppelman-Milstein
> <ckoppel@alumni.gwu.edu> wrote:
> > I am trying to understand whether Couch is the way to go to meet some of
> > my organization's needs.  It seems pretty terrific.
> > The main concern I have is maintaining a consistent state across code
> > releases.  Presumably, our data model will change over the course of
> > time, and when it does, we need to make the several million old
> > documents conform to the new model.
> >
> > Although I would love to pipe a view through an update handler and call
> > it a day, I don't believe that option exists.  The two ways I
> > understand to do this are:
> >
> > 1. Query all documents, update each doc client-side, and PUT those
> > changes in the _bulk_docs API (presumably this should be done in batches)
> > 2. Query the ids for all docs, and one at a time, PUT them through an
> > update handler
> >
>
> You are correct that there's no server-side way to do a migration
> like you're asking for.
>
> The general pattern for these things is to write a view that only
> includes the documents that need to be changed and then write
> something that goes through and processes each doc in the view into
> the desired form (which removes it from the view). This way you can
> easily know when you're done working. It's definitely possible to
> write something that stores state and/or just brute-forces a db scan
> each time you run the migration.
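>
> As an untested sketch (assuming, for the sake of the example, that
> your migration stamps a schema_version field on every converted doc),
> the map function could look like:
>
> // _design/migration, view "needs_migration": emits only docs that
> // haven't been converted yet, so an empty view means you're done.
> function (doc) {
>   if (!doc.schema_version || doc.schema_version < 2) {
>     emit(doc._id, null);
>   }
> }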
>
> Performance-wise, your first suggestion (_bulk_docs) would probably
> be the faster of the two. Depending on document sizes and latencies
> it may be possible to get better numbers using an update handler, but
> I doubt it unless you have huge docs and a super slow connection with
> high latencies.
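>
> Untested sketch of that loop (Node, localhost Couch, a db "mydb" and
> the needs_migration view from above -- all illustrative names):
>
> var http = require('http');
>
> function couch(method, path, body, cb) {
>   var req = http.request({ host: 'localhost', port: 5984,
>       method: method, path: path,
>       headers: { 'Content-Type': 'application/json' } },
>     function (res) {
>       var data = '';
>       res.on('data', function (chunk) { data += chunk; });
>       res.on('end', function () { cb(JSON.parse(data)); });
>     });
>   if (body) req.write(JSON.stringify(body));
>   req.end();
> }
>
> // Grab a batch of unmigrated docs; include_docs saves a second fetch.
> couch('GET', '/mydb/_design/migration/_view/needs_migration' +
>       '?limit=500&include_docs=true', null, function (result) {
>   var docs = result.rows.map(function (row) {
>     var doc = row.doc;
>     doc.schema_version = 2; // your real transformation goes here
>     return doc;
>   });
>   // One round trip writes the whole batch; rerun until the view is
>   // empty.
>   couch('POST', '/mydb/_bulk_docs', { docs: docs }, function (saved) {
>     console.log('updated', saved.length, 'docs');
>   });
> });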
>
> > Are these options reasonably performant?  If we have to do a mass-update
> > once per deployment, it's not terrible if it's not lightning-speed, but it
> > shouldn't take terribly long.  Also, I have read that update handlers
> > have indexes built against them.  If this is a fire-once option, is that
> > worthwhile?
> >
>
> I'm not sure what you mean that update handlers have indexes built
> against them. That doesn't match anything that currently exists in
> CouchDB.
>
> > Which option is better?  Is there an even better way?
> >
>
> There's nothing better than the general ideas you've listed.
>
> > Thanks,
> > Charles
>
