From: Lance Carlson
Date: Mon, 13 May 2013 02:24:12 -0400
Subject: Re: Mass updates
To: "user@couchdb.apache.org"

I made a lot of updates to my couchout project; it now includes a couchin
project as well. I might create another project for updating, but it's
pretty easy to write a Node.js script (or one in any other language, for
that matter) that connects to Redis and decodes and encodes base64.

On Sat, May 11, 2013 at 2:27 AM, Andrey Kuprianov
<andrey.kouprianov@gmail.com> wrote:

> We do that, and we have a cron job that touches the views every 5 min.
> It's just that at that particular time we had to insert those 150k
> documents in one go (we were migrating from MySQL).
>
> On 11 May, 2013, at 1:02 PM, Benoit Chesneau wrote:
>
> > On May 9, 2013 1:17 PM, "Andrey Kuprianov" wrote:
> >>
> >> Rebuilding the views mentioned by James is hell! And the more docs
> >> and views you have, the longer your views will take to catch up with
> >> the updates. We don't have the best of servers, but ours (dedicated)
> >> took several hours to rebuild our views (not too many of them,
> >> either) after we inserted ~150k documents (we use full-text search
> >> with Lucene as well, so it also contributed to the overall server
> >> slowdown).
> >>
> >> So my suggestion is:
> >>
> >> 1. Once you want to migrate your stuff, make a copy of your db.
> >> 2. Do the migration on the copy.
> >> 3. Allow the views to rebuild (you need to query a single view in
> >> each design document once to trigger the views to start catching up
> >> with the updates). You'd probably ask whether it is possible to
> >> limit Couch's resource usage while views are rebuilding, but I don't
> >> have an answer to that question. Maybe someone else can help here...
> >> 4. Switch the database pointer from one DB to the other.
> >
> > You don't need to wait until all the docs are there to trigger the
> > view update. Just trigger it more often, so the view calculation
> > happens on a smaller set.
> >
> > You can even make it parallel by using different ddocs.
> >>
> >> On Thu, May 9, 2013 at 1:41 PM, Paul Davis wrote:
> >>
> >>> On Wed, May 8, 2013 at 10:24 PM, Charles S. Koppelman-Milstein
> >>> wrote:
> >>>> I am trying to understand whether Couch is the way to go to meet
> >>>> some of my organization's needs. It seems pretty terrific.
> >>>> The main concern I have is maintaining a consistent state across
> >>>> code releases. Presumably, our data model will change over the
> >>>> course of time, and when it does, we need to make the several
> >>>> million old documents conform to the new model.
> >>>>
> >>>> Although I would love to pipe a view through an update handler and
> >>>> call it a day, I don't believe that option exists. The two ways I
> >>>> understand to do this are:
> >>>>
> >>>> 1. Query all documents, update each doc client-side, and POST
> >>>> those changes to the _bulk_docs API (presumably this should be
> >>>> done in batches)
> >>>> 2. Query the ids for all docs, and one at a time, PUT them through
> >>>> an update handler
> >>>
> >>> You are correct that there's no server-side way to do a migration
> >>> like you're asking for.
> >>>
> >>> The general pattern for these things is to write a view that only
> >>> includes the documents that need to be changed and then write
> >>> something that goes through and processes each doc in the view to
> >>> the desired form (which removes it from the view). This way you can
> >>> easily know when you're done working. It's definitely possible to
> >>> write something that stores state and/or just brute-forces a db
> >>> scan each time you run the migration.
> >>>
> >>> Performance-wise, your first suggestion would probably be the most
> >>> performant, although depending on document sizes and latencies it
> >>> may be possible to get better numbers using an update handler. I
> >>> doubt it, though, unless you have huge docs and a super slow
> >>> connection with high latencies.
> >>>
> >>>> Are these options reasonably performant? If we have to do a
> >>>> mass-update once per deployment, it's not terrible if it's not
> >>>> lightning-speed, but it shouldn't take terribly long. Also, I have
> >>>> read that update handlers have indexes built against them. If
> >>>> this is a fire-once option, is that worthwhile?
> >>>
> >>> I'm not sure what you mean by update handlers having indexes built
> >>> against them. That doesn't match anything that currently exists in
> >>> CouchDB.
> >>>
> >>>> Which option is better? Is there an even better way?
> >>>
> >>> There's nothing better than the general ideas you listed.
> >>>
> >>>> Thanks,
> >>>> Charles
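
For anyone finding this thread later, here is a minimal sketch of the pattern from Paul Davis's reply above: query a view that lists only the documents still needing migration, rewrite each batch client-side, and send it back through _bulk_docs until the view is empty. This is an illustration, not code from anyone in the thread. The names migrate_doc, fetch_pending, post_bulk and run_migration, the schema_version field, and the batch size are all made up for the example. It also assumes the view emits each pending document exactly once and that a migrated document no longer appears in it.

```python
import json
import urllib.request
from typing import Callable, Dict, List, Optional

Doc = Dict[str, object]


def migrate_doc(doc: Doc) -> Doc:
    """Hypothetical transform: stamp the doc with the new schema version."""
    updated = dict(doc)  # keeps _id and _rev so CouchDB accepts the update
    updated["schema_version"] = 2
    return updated


def http_json(url: str, body: Optional[object] = None) -> object:
    """GET the URL, or POST when a body is given, and decode the JSON reply."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def fetch_pending(db_url: str, view_path: str, limit: int) -> List[Doc]:
    """Read up to `limit` docs from the view of not-yet-migrated documents."""
    url = f"{db_url}/{view_path}?include_docs=true&limit={limit}"
    return [row["doc"] for row in http_json(url)["rows"]]


def post_bulk(db_url: str, docs: List[Doc]) -> List[Dict[str, object]]:
    """Write a batch in one request; returns one result entry per doc."""
    return http_json(f"{db_url}/_bulk_docs", {"docs": docs})


def run_migration(
    fetch: Callable[[int], List[Doc]],
    post: Callable[[List[Doc]], List[Dict[str, object]]],
    batch_size: int = 500,
) -> int:
    """Migrate batch by batch until the view is empty; return docs saved."""
    total = 0
    while True:
        docs = fetch(batch_size)
        if not docs:
            return total  # an empty view means the migration is done
        results = post([migrate_doc(d) for d in docs])
        saved = sum(1 for r in results if r.get("ok"))
        if saved == 0:
            # Every write failed (e.g. conflicts); stop instead of looping.
            raise RuntimeError("no documents saved in this batch")
        total += saved
```

Against a real server this would be called as run_migration(lambda n: fetch_pending(DB, VIEW, n), lambda docs: post_bulk(DB, docs)), where DB and VIEW are placeholders for your database URL and a path such as _design/yourddoc/_view/yourview. Passing fetch and post in as callables keeps the loop testable without a running CouchDB. Smaller batches also mean the view index catches up in smaller steps, which fits Benoit's advice about triggering view updates more often.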