From: Andrey Kuprianov
Date: Thu, 9 May 2013 19:16:42 +0800
Subject: Re: Mass updates
To: user@couchdb.apache.org
References: <518B16F5.2040306@alumni.gwu.edu>

Rebuilding the views, as James mentioned, is hell! The more docs and
views you have, the longer the views take to catch up with the updates.
We don't have the best of servers, but ours (dedicated) took several
hours to rebuild our views (and we don't have that many) after we
inserted ~150k documents. We also use full-text search with Lucene, which
contributed to the overall server slowdown.

So my suggestion is:

1. When you want to migrate your data, make a copy of your db.
2. Do the migration on the copy.
3. Allow the views to rebuild (you need to query one view in each design
   document once to trigger the views to start catching up with the
   updates). You'd probably ask whether it is possible to limit Couch's
   resource usage while the views are rebuilding, but I don't have an
   answer to that question. Maybe someone else can help here...
4. Switch the database pointer from one DB to the other.

On Thu, May 9, 2013 at 1:41 PM, Paul Davis wrote:

> On Wed, May 8, 2013 at 10:24 PM, Charles S. Koppelman-Milstein
> wrote:
> > I am trying to understand whether Couch is the way to go to meet some of
> > my organization's needs.  It seems pretty terrific.
> > The main concern I have is maintaining a consistent state across code
> > releases.
> > Presumably, our data model will change over the course of
> > time, and when it does, we need to make the several million old
> > documents conform to the new model.
> >
> > Although I would love to pipe a view through an update handler and call
> > it a day, I don't believe that option exists.  The two ways I
> > understand to do this are:
> >
> > 1. Query all documents, update each doc client-side, and PUT those
> >    changes in the _bulk_docs API (presumably this should be done in
> >    batches)
> > 2. Query the ids for all docs, and one at a time, PUT them through an
> >    update handler
> >
>
> You are correct that there's no server-side way to do a migration like
> the one you're asking for.
>
> The general pattern for these things is to write a view that only
> includes the documents that need to be changed and then write
> something that goes through and processes each doc in the view to the
> desired form (which removes it from the view). This way you can easily
> know when you're done working. It's definitely possible to write
> something that stores state and/or just brute-forces a db scan each
> time you run the migration.
>
> Performance-wise, your first suggestion would probably be the most
> performant, although depending on document sizes and latencies it may
> be possible to get better numbers using an update handler. I doubt
> it, though, unless you have huge docs and a super slow connection with
> high latencies.
>
> > Are these options reasonably performant?  If we have to do a mass-update
> > once a deployment, it's not terrible if it's not lightning-speed, but it
> > shouldn't take terribly long.  Also, I have read that update handlers
> > have indexes built against them.  If this is a fire-once option, is that
> > worthwhile?
> >
>
> I'm not sure what you mean by update handlers having indexes built
> against them. That doesn't match anything that currently exists in
> CouchDB.
>
> > Which option is better?  Is there an even better way?
> >
>
> There's nothing better than the general ideas you listed.
>
> > Thanks,
> > Charles
>
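
The pattern Paul describes in the quoted reply (a view that lists only
the documents still needing migration, drained in batches through
_bulk_docs) might look roughly like the sketch below. Everything named
here is an assumption for illustration and not from the thread: the
schema_version field, the TARGET_VERSION constant, the map function, and
the helper names. The only CouchDB call used is POST /{db}/_bulk_docs
with a {"docs": [...]} body. Each doc has to keep its _id and _rev so
that CouchDB treats it as an update.

```python
import json
from itertools import islice
from urllib import request

TARGET_VERSION = 2  # hypothetical marker for the new data model

# Hypothetical map function for a "needs migration" view. A doc drops out
# of the view as soon as it has been rewritten, so an empty view means the
# migration is done.
MIGRATION_VIEW_MAP = (
    "function (doc) {"
    " if ((doc.schema_version || 1) < 2) { emit(doc._id, null); }"
    " }"
)


def needs_migration(doc):
    """Client-side mirror of the view's filter."""
    return doc.get("schema_version", 1) < TARGET_VERSION


def migrate_doc(doc):
    """Return a copy of doc in the new shape (placeholder transformation)."""
    new_doc = dict(doc)  # keeps _id and _rev intact
    new_doc["schema_version"] = TARGET_VERSION
    return new_doc


def batches(items, size):
    """Yield lists of at most `size` items from any iterable."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def bulk_payload(docs):
    """Build the _bulk_docs request body for one batch of old docs."""
    return {"docs": [migrate_doc(d) for d in docs if needs_migration(d)]}


def post_bulk(db_url, payload):
    """POST one batch to <db_url>/_bulk_docs and return the parsed reply."""
    req = request.Request(
        db_url.rstrip("/") + "/_bulk_docs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

A driver would read rows from the view with include_docs=true, feed the
docs through batches() and bulk_payload(), and call post_bulk() for each
batch until the view comes back empty. Docs reported as conflicts in the
reply simply remain in the view and get picked up on the next pass.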