From: Andrey Kuprianov
Date: Thu, 9 May 2013 20:16:42 +0800
Subject: Re: Mass updates
To: user@couchdb.apache.org

Regarding CPU usage limiting: I've just tried cpulimit and it works great.
http://superuser.com/questions/442970/limit-a-processes-cpu-usage-methods

On Thu, May 9, 2013 at 7:18 PM, Robert Newson wrote:

> http://wiki.apache.org/couchdb/How_to_deploy_view_changes_in_a_live_environment
>
> On 9 May 2013 12:16, Andrey Kuprianov wrote:
>
> > Rebuilding the views mentioned by James is hell! The more docs and
> > views you have, the longer it takes your views to catch up with the
> > updates. We don't have the best servers, but ours (dedicated) took
> > several hours to rebuild our views (not that many views, either) after
> > we inserted ~150k documents (we also use full-text search with Lucene,
> > which contributed to the overall server slowdown as well).
> >
> > So my suggestion is:
> >
> > 1. When you want to migrate your data, make a copy of your db.
> > 2. Do the migration on the copy.
> > 3. Allow the views to rebuild (you need to query a single view in each
> >    design document once to make the views start catching up with the
> >    updates). You'd probably ask whether it is possible to limit Couch's
> >    resource usage while views are rebuilding, but I don't have an answer
> >    to that question. Maybe someone else can help here...
> > 4. Switch the database pointer from one DB to the other.
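Step 3 works because all views in one design document share a single index, so querying any one of them brings the whole design document up to date. A minimal sketch of that warm-up step, assuming Python's standard library and a CouchDB reachable over HTTP; `warmup_paths` and `warm_views` are hypothetical helper names, and `limit=1` is simply a cheap query that still forces the index to update before the response comes back.

```python
import urllib.request
from urllib.parse import quote


def warmup_paths(db_name, design_docs):
    """Return one view-query path per design document that defines views.

    `design_docs` is assumed to be a list of design-document bodies as
    stored in CouchDB: dicts with an "_id" such as "_design/app" and an
    optional "views" mapping.
    """
    paths = []
    for ddoc in design_docs:
        views = ddoc.get("views") or {}
        if not views:
            continue  # e.g. a design doc holding only a validate_doc_update
        ddoc_name = ddoc["_id"][len("_design/"):]
        # Any view in the design doc rebuilds the shared index; pick one
        # deterministically.
        view_name = sorted(views)[0]
        paths.append(
            "/" + quote(db_name, safe="")
            + "/_design/" + quote(ddoc_name, safe="")
            + "/_view/" + quote(view_name, safe="")
            + "?limit=1"
        )
    return paths


def warm_views(base_url, paths, timeout=6 * 3600):
    """Query each path once, one design document at a time."""
    for path in paths:
        # Long socket timeout: the response is held back until the index
        # has caught up, which can take hours on a large database.
        with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
            resp.read()
```

For example, `warm_views("http://localhost:5984", warmup_paths("mydb_v2", ddocs))` would be run against the migrated copy before switching the application over in step 4. The design documents themselves can be listed with an `_all_docs` query using `startkey="_design/"`, `endkey="_design0"` and `include_docs=true`.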
> >
> > On Thu, May 9, 2013 at 1:41 PM, Paul Davis wrote:
> >
> >> On Wed, May 8, 2013 at 10:24 PM, Charles S. Koppelman-Milstein wrote:
> >> > I am trying to understand whether Couch is the way to go to meet some
> >> > of my organization's needs. It seems pretty terrific.
> >> > The main concern I have is maintaining a consistent state across code
> >> > releases. Presumably, our data model will change over the course of
> >> > time, and when it does, we need to make the several million old
> >> > documents conform to the new model.
> >> >
> >> > Although I would love to pipe a view through an update handler and
> >> > call it a day, I don't believe that option exists. The two ways I
> >> > understand to do this are:
> >> >
> >> > 1. Query all documents, update each doc client-side, and POST those
> >> >    changes to the _bulk_docs API (presumably in batches).
> >> > 2. Query the ids for all docs and, one at a time, PUT them through an
> >> >    update handler.
> >>
> >> You are correct that there's no server-side way to do a migration like
> >> the one you're asking for.
> >>
> >> The general pattern for these things is to write a view that only
> >> includes the documents that need to be changed, and then write
> >> something that goes through and processes each doc in the view into
> >> the desired form (which removes it from the view). This way you can
> >> easily know when you're done. It's definitely possible to write
> >> something that stores state and/or just brute-force a db scan each
> >> time you run the migration.
> >>
> >> Performance-wise, your first suggestion would probably be the most
> >> performant. Depending on document sizes and latencies, it may be
> >> possible to get better numbers using an update handler, but I doubt
> >> it unless you have huge docs and a super-slow connection with high
> >> latency.
> >>
> >> > Are these options reasonably performant?
> >> > If we have to do a mass update once per deployment, it's not terrible
> >> > if it's not lightning-fast, but it shouldn't take terribly long. Also,
> >> > I have read that update handlers have indexes built against them. If
> >> > this is a fire-once option, is that worthwhile?
> >>
> >> I'm not sure what you mean by update handlers having indexes built
> >> against them. That doesn't match anything that currently exists in
> >> CouchDB.
> >>
> >> > Which option is better? Is there an even better way?
> >>
> >> There's nothing better than the general ideas you listed.
> >>
> >> > Thanks,
> >> > Charles
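Paul's pattern (a view that lists only not-yet-migrated documents, drained in batches through `_bulk_docs`, i.e. Charles's first option) might be sketched in Python as follows. Everything named here is an assumption for illustration: the `schema_version` field, the `_design/migrate` design document, the `needs_v2` view, the batch size of 500 and all helper names. The HTTP transport is passed in as two callables so the loop itself can be exercised without a server.

```python
import json
import urllib.request

# Hypothetical design doc; upload once with PUT /<db>/_design/migrate.
# The map function (JavaScript, run by CouchDB) emits only documents still
# on the old model, so each successfully migrated doc drops out of the view.
MIGRATION_DDOC = {
    "_id": "_design/migrate",
    "views": {
        "needs_v2": {
            "map": "function (doc) { if ((doc.schema_version || 1) < 2) { emit(doc._id, null); } }"
        }
    },
}


def migrate_all(fetch_batch, save_batch, transform, batch_size=500):
    """Drain the 'needs migration' view; return the number of docs saved.

    fetch_batch(limit) -> docs still listed in the view (at most `limit`)
    save_batch(docs)   -> per-document results, as returned by _bulk_docs
    transform(doc)     -> new body; must keep "_id" and "_rev" (so the save
                          is an update, not a conflict) and must set
                          schema_version to 2 (so the doc leaves the view)
    """
    saved = 0
    while True:
        docs = fetch_batch(batch_size)
        if not docs:
            return saved  # view is empty: every document is migrated
        results = save_batch([transform(doc) for doc in docs])
        failed = [r for r in results if "error" in r]
        if len(failed) == len(docs):
            # No progress at all (e.g. persistent conflicts): stop, don't spin.
            raise RuntimeError(f"all {len(docs)} docs in batch failed: {failed[:3]}")
        saved += len(docs) - len(failed)
        # Failed docs stay in the view and are retried on the next pass.


def http_fetch_batch(base_url, db, limit):
    """Read up to `limit` unmigrated docs from the hypothetical needs_v2 view."""
    url = (f"{base_url}/{db}/_design/migrate/_view/needs_v2"
           f"?include_docs=true&limit={limit}")
    with urllib.request.urlopen(url) as resp:
        return [row["doc"] for row in json.load(resp)["rows"]]


def http_save_batch(base_url, db, docs):
    """POST a batch to _bulk_docs and return its per-document results."""
    req = urllib.request.Request(
        f"{base_url}/{db}/_bulk_docs",
        data=json.dumps({"docs": docs}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Wiring it up could look like `migrate_all(lambda n: http_fetch_batch(base, db, n), lambda d: http_save_batch(base, db, d), to_v2)`, where `to_v2` is a hypothetical per-release transform. Because the loop only stops once the view is empty, it doubles as the "easily know when you're done" check Paul describes, and running it against the copy from Andrey's step 2 keeps the resulting view rebuilds off the live database.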