From: "Charles S. Koppelman-Milstein"
Date: Wed, 08 May 2013 23:24:37 -0400
To: user@couchdb.apache.org
Subject: Mass updates

I am trying to understand whether Couch is the way to go to meet some of my organization's needs. It seems pretty terrific. The main concern I have is maintaining a consistent state across code releases.
Presumably, our data model will change over time, and when it does, we will need to make several million old documents conform to the new model. Although I would love to pipe a view through an update handler and call it a day, I don't believe that option exists.

The two ways I understand to do this are:

1. Query all documents, update each doc client-side, and POST the changes to the _bulk_docs API (presumably in batches)
2. Query the ids of all docs and, one at a time, PUT each through an update handler

Are these options reasonably performant? If we have to do a mass update once per deployment, it needn't be lightning-fast, but it shouldn't take terribly long either. Also, I have read that update handlers have indexes built against them. If this is a fire-once operation, is that worthwhile?

Which option is better? Is there an even better way?

Thanks,
Charles
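For what it's worth, option 1 can be sketched in a few lines of Python. This is a minimal sketch, not a definitive implementation: the server URL, database name, batch size, and the `migrate_doc` transformation are all hypothetical placeholders, and in practice you would page through `_all_docs?include_docs=true` (e.g. with `startkey`/`limit`) rather than hold everything in memory:

```python
import json
from itertools import islice
from urllib import request

COUCH = "http://localhost:5984/mydb"  # placeholder server and database
BATCH_SIZE = 1000                     # tune to taste

def migrate_doc(doc):
    """Hypothetical migration: rename a field to match the new model."""
    if "old_field" in doc:
        doc["new_field"] = doc.pop("old_field")
    return doc

def batches(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def bulk_update(docs):
    """POST one batch of already-migrated docs to _bulk_docs.

    Each doc must carry its current _id and _rev, or the server
    will report a conflict for it in the response.
    """
    body = json.dumps({"docs": docs}).encode()
    req = request.Request(
        COUCH + "/_bulk_docs",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    return json.load(request.urlopen(req))
```

A driver loop would then fetch rows from `_all_docs?include_docs=true`, run `migrate_doc` over each `row["doc"]`, and feed `batches(migrated, BATCH_SIZE)` through `bulk_update`, checking each entry in the response for `"error": "conflict"` and retrying those docs after re-fetching their `_rev`.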