From: "Charles S. Koppelman-Milstein"
Date: Wed, 08 May 2013 23:24:37 -0400
To: user@couchdb.apache.org
Subject: Mass updates

I am trying to understand whether Couch is the way to go to meet some of my organization's needs. It seems pretty terrific. The main concern I have is maintaining a consistent state across code releases.
Presumably, our data model will change over time, and when it does, we will need to make several million old documents conform to the new model. Although I would love to pipe a view through an update handler and call it a day, I don't believe that option exists.

The two ways I understand to do this are:

1. Query all documents, update each doc client-side, and POST the changes to the _bulk_docs API (presumably in batches)
2. Query the ids of all docs and, one at a time, PUT each through an update handler

Are these options reasonably performant? If we have to do a mass update once per deployment, it needn't be lightning-fast, but it shouldn't take terribly long either. Also, I have read that update handlers have indexes built against them. If this is a fire-once operation, is that worthwhile?

Which option is better? Is there an even better way?

Thanks,
Charles
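For what it's worth, option 1 can be sketched in a few lines of Python. This is a minimal sketch, not a definitive implementation: the server URL, database name, batch size, and the `migrate_doc` transformation are all hypothetical placeholders, and in practice you would page through `_all_docs?include_docs=true` (e.g. with `startkey`/`limit`) rather than hold everything in memory:

```python
import json
from itertools import islice
from urllib import request

COUCH = "http://localhost:5984/mydb"  # placeholder server and database
BATCH_SIZE = 1000                     # tune to taste

def migrate_doc(doc):
    """Hypothetical migration: rename a field to match the new model."""
    if "old_field" in doc:
        doc["new_field"] = doc.pop("old_field")
    return doc

def batches(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def bulk_update(docs):
    """POST one batch of already-migrated docs to _bulk_docs.

    Each doc must carry its current _id and _rev, or the server
    will report a conflict for it in the response.
    """
    body = json.dumps({"docs": docs}).encode()
    req = request.Request(
        COUCH + "/_bulk_docs",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    return json.load(request.urlopen(req))
```

A driver loop would then fetch rows from `_all_docs?include_docs=true`, run `migrate_doc` over each `row["doc"]`, and feed `batches(migrated, BATCH_SIZE)` through `bulk_update`, checking each entry in the response for `"error": "conflict"` and retrying those docs after re-fetching their `_rev`.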