couchdb-user mailing list archives

From Robert Newson <>
Subject Re: Mass updates
Date Thu, 09 May 2013 11:18:54 GMT

On 9 May 2013 12:16, Andrey Kuprianov <> wrote:
> Rebuilding the views mentioned by James is hell! And the more docs and
> views you have, the longer your views will take to catch up with the
> updates. We don't have the best of servers, but ours (dedicated) took
> several hours to rebuild our views (not too many of them, either) after we
> inserted ~150k documents (we use full-text search with Lucene as well, so
> that also contributed to the overall server slowdown).
> So my suggestion is:
> 1. Once you want to migrate your stuff, make a copy of your db.
> 2. Do the migration on the copy.
> 3. Allow the views to rebuild (you need to query a single view in each
> design document once to trigger the view builds; see the sketch below).
> You might ask whether it's possible to limit CouchDB's resource usage
> while views are rebuilding, but I don't have an answer to that question.
> Maybe someone else can help here...
> 4. Switch the database pointer from the old DB to the new one.
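> For step 3, something like this warms every design document's views
> (a rough, untested Python sketch using the requests library; the URL
> and db name are made up):
>
>     import requests
>
>     COUCH = 'http://localhost:5984'
>     DB = 'mydb_copy'  # the migrated copy
>
>     # Querying any one view in a design document builds that whole
>     # design document's index group, so one request per ddoc is enough.
>     design_rows = requests.get(
>         '%s/%s/_all_docs' % (COUCH, DB),
>         params={'startkey': '"_design/"', 'endkey': '"_design0"',
>                 'include_docs': 'true'}).json()['rows']
>     for row in design_rows:
>         ddoc = row['doc']
>         views = list(ddoc.get('views', {}))
>         if views:
>             name = ddoc['_id'].split('/', 1)[1]
>             # limit=0 blocks until the index is built, returning no rows
>             requests.get('%s/%s/_design/%s/_view/%s'
>                          % (COUCH, DB, name, views[0]),
>                          params={'limit': 0})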
> On Thu, May 9, 2013 at 1:41 PM, Paul Davis <> wrote:
>> On Wed, May 8, 2013 at 10:24 PM, Charles S. Koppelman-Milstein
>> <> wrote:
>> > I am trying to understand whether Couch is the way to go to meet some of
>> > my organization's needs.  It seems pretty terrific.
>> > The main concern I have is maintaining a consistent state across code
>> > releases.  Presumably, our data model will change over the course of
>> > time, and when it does, we need to make the several million old
>> > documents conform to the new model.
>> >
>> > Although I would love to pipe a view through an update handler and call
>> > it a day, I don't believe that option exists.  The two ways I
>> > understand to do this are:
>> >
>> > 1. Query all documents, update each doc client-side, and POST those
>> > changes to the _bulk_docs API (presumably this should be done in batches)
>> > 2. Query the ids of all docs and, one at a time, PUT them through an
>> > update handler (sketched below)
>> >
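>> > Concretely, I picture option 2 as something like this (untested
>> > sketch; the handler name, field, and ids are made up):
>> >
>> >     import json, requests
>> >
>> >     COUCH = 'http://localhost:5984'
>> >     DB = 'mydb'
>> >
>> >     # Design doc with an update handler that rewrites one doc
>> >     # server-side per request.
>> >     ddoc = {
>> >         '_id': '_design/migrate',
>> >         'updates': {
>> >             'to_v2': """function(doc, req) {
>> >                 if (!doc) return [null, 'missing'];
>> >                 doc.schema_version = 2;  // new-model changes here
>> >                 return [doc, 'ok'];
>> >             }"""
>> >         }
>> >     }
>> >     requests.put('%s/%s/_design/migrate' % (COUCH, DB),
>> >                  data=json.dumps(ddoc),
>> >                  headers={'Content-Type': 'application/json'})
>> >
>> >     # Then one request per doc id (really streamed from _all_docs):
>> >     for doc_id in ['doc-1', 'doc-2']:
>> >         requests.put('%s/%s/_design/migrate/_update/to_v2/%s'
>> >                      % (COUCH, DB, doc_id))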
>> You are correct that there's no server-side way to do a migration like
>> the one you're asking for.
>> The general pattern for these things is to write a view that only
>> includes the documents that need to be changed, and then write
>> something that goes through and transforms each doc in the view into
>> the desired form (which removes it from the view). This way you can
>> easily know when you're done working. It's definitely possible to write
>> something that stores state and/or just brute-force a db scan each
>> time you run the migration.
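>> The view could be as simple as this (sketch only; I'm assuming a
>> made-up schema_version field marks migrated docs):
>>
>>     import json, requests
>>
>>     COUCH = 'http://localhost:5984'
>>     DB = 'mydb'
>>
>>     ddoc = {
>>         '_id': '_design/migration',
>>         'views': {
>>             'needs_migration': {
>>                 # Emit only docs still on the old model; once a doc
>>                 # is migrated it stops matching, so the view drains
>>                 # to empty and you know you're done.
>>                 'map': """function(doc) {
>>                     if (doc.schema_version !== 2) emit(doc._id, null);
>>                 }"""
>>             }
>>         }
>>     }
>>     requests.put('%s/%s/_design/migration' % (COUCH, DB),
>>                  data=json.dumps(ddoc),
>>                  headers={'Content-Type': 'application/json'})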
>> Performance-wise, your first suggestion would probably be the most
>> performant, although depending on document sizes and latencies it may
>> be possible to get better numbers using an update handler. I doubt it,
>> though, unless you have huge docs and a super slow connection with high
>> latencies.
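>> As a rough illustration of the first option, batched through
>> _bulk_docs against the view above (untested sketch; the batch size
>> and the transform are placeholders):
>>
>>     import json, requests
>>
>>     COUCH = 'http://localhost:5984'
>>     DB = 'mydb'
>>     VIEW = '%s/%s/_design/migration/_view/needs_migration' % (COUCH, DB)
>>
>>     while True:
>>         # Grab a batch of un-migrated docs; the view shrinks as we go.
>>         rows = requests.get(VIEW, params={'limit': 500,
>>                                           'include_docs': 'true'}
>>                             ).json()['rows']
>>         if not rows:
>>             break  # view is empty, so the migration is done
>>         for row in rows:
>>             row['doc']['schema_version'] = 2  # client-side transform
>>         docs = [row['doc'] for row in rows]
>>         # One request saves the whole batch; check the per-doc
>>         # results for conflicts in a real run.
>>         requests.post('%s/%s/_bulk_docs' % (COUCH, DB),
>>                       data=json.dumps({'docs': docs}),
>>                       headers={'Content-Type': 'application/json'}
>>                       ).raise_for_status()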
>> > Are these options reasonably performant?  If we have to do a mass update
>> > once per deployment, it's not terrible if it's not lightning-fast, but it
>> > shouldn't take terribly long, either.  Also, I have read that update
>> > handlers have indexes built against them.  If this is a fire-once option,
>> > is that worthwhile?
>> >
>> I'm not sure what you mean by update handlers having indexes built
>> against them. That doesn't match anything that currently exists in
>> CouchDB.
>> > Which option is better?  Is there an even better way?
>> >
>> There's nothing better than the general ideas you've listed.
>> > Thanks,
>> > Charles
