couchdb-user mailing list archives

From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: Mass updates
Date Thu, 09 May 2013 05:41:22 GMT
On Wed, May 8, 2013 at 10:24 PM, Charles S. Koppelman-Milstein
<ckoppel@alumni.gwu.edu> wrote:
> I am trying to understand whether Couch is the way to go to meet some of
> my organization's needs.  It seems pretty terrific.
> The main concern I have is maintaining a consistent state across code
> releases.  Presumably, our data model will change over the course of
> time, and when it does, we need to make the several million old
> documents conform to the new model.
>
> Although I would love to pipe a view through an update handler and call
> it a day, I don't believe that option exists.  The two ways I
> understand to do this are:
>
> 1. Query all documents, update each doc client-side, and PUT those
> changes in the _bulk_docs API (presumably this should be done in batches)
> 2. Query the ids for all docs, and one at a time, PUT them through an
> update handler
>

You are correct that there's no purely server-side way to do a
migration like the one you're asking for.

The general pattern for these things is to write a view that only
includes the documents that need to be changed, and then write
something that goes through the view and rewrites each doc into the
desired form (which removes it from the view). This way you can easily
tell when you're done: the view is empty. It's definitely also possible
to write something that stores state and/or just brute-forces a db scan
each time you run the migration.
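
A minimal sketch of that pattern follows. Everything named here is an
assumption for illustration: the `schema_version` field, the
`_design/migrations` design doc, the `needs_v2` view, and the
`fetch_batch` / `save_batch` callables, which stand in for whatever
HTTP client you use to read the view (with include_docs=true) and to
write the docs back.

```python
# Hypothetical design document: the view emits only documents that still
# need migrating, so a migrated doc drops out of the view on its own.
DESIGN_DOC = {
    "_id": "_design/migrations",
    "views": {
        "needs_v2": {
            "map": "function (doc) {"
                   " if ((doc.schema_version || 1) < 2) { emit(doc._id, null); }"
                   " }"
        }
    },
}


def migrate(doc):
    """Return a copy of doc in the new (hypothetical) shape."""
    new_doc = dict(doc)
    new_doc["schema_version"] = 2
    return new_doc


def run_migration(fetch_batch, save_batch, batch_size=1000):
    """Drain the view in batches and return how many docs were processed.

    fetch_batch(limit) returns up to `limit` docs currently in the view;
    save_batch(docs) writes the migrated docs back.
    Note: this assumes saves succeed. If a doc keeps failing (for example
    on a conflict) it stays in the view, so real code should cap retries.
    """
    total = 0
    while True:
        docs = fetch_batch(batch_size)
        if not docs:
            return total  # empty view means the migration is done
        save_batch([migrate(d) for d in docs])
        total += len(docs)
```

Because the view itself is the to-do list, the script can be stopped
and restarted at any point without keeping separate state.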

Performance-wise, your first suggestion would probably be the fastest.
Depending on document sizes and latencies it may be possible to get
better numbers using an update handler, but I doubt it unless you have
huge docs and a slow, high-latency connection.
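
For the batched approach, a rough sketch of posting updated docs to
_bulk_docs might look like the following. The base URL, database name,
and helper names are assumptions, and each doc is expected to carry
its current _id and _rev so the write is treated as an update.

```python
import json
import urllib.request
from itertools import islice


def chunked(iterable, size):
    """Yield lists of at most `size` items from iterable."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def bulk_docs_request(base_url, db, docs):
    """Build the POST request for one batch of updated docs."""
    body = json.dumps({"docs": docs}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{db}/_bulk_docs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_batch(base_url, db, docs):
    """Send one batch; return the result rows that report an error."""
    with urllib.request.urlopen(bulk_docs_request(base_url, db, docs)) as resp:
        results = json.load(resp)
    # _bulk_docs answers per document, so failures such as conflicts
    # have to be picked out of the response and retried by the caller.
    return [row for row in results if "error" in row]
```

Usage would be along the lines of
`for batch in chunked(updated_docs, 1000): post_batch("http://localhost:5984", "mydb", batch)`,
with the batch size tuned to your document sizes.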

> Are these options reasonably performant?  If we have to do a mass-update
> once a deployment, it's not terrible if it's not lightning-speed, but it
> shouldn't take terribly long.  Also, I have read that update handlers
> have indexes built against them.  If this is a fire-once option, is that
> worthwhile?
>

I'm not sure what you mean that update handlers have indexes built
against them. That doesn't match anything that currently exists in
CouchDB.

> Which option is better?  Is there an even better way?
>

There's nothing better than the general ideas you've listed.

> Thanks,
> Charles
