From: Jan Lehnardt
To: user@couchdb.apache.org
Cc: dev@couchdb.apache.org
Subject: Re: Best way to "migrate" (a la Rails) Couch documents
Date: Sat, 7 Mar 2009 11:55:06 +0100
Message-Id: <1373E91E-28E6-4D5D-8A40-35CE05947B5D@apache.org>
References: <49B165E3.4040801@proven-corporation.com>

CC'ing dev@ because it is a dev issue.

On 6 Mar 2009, at 19:17, Chris Anderson wrote:

> On Fri, Mar 6, 2009 at 10:05 AM, Jason Smith wrote:
>> Hi, list.
>>
>> While I am happy to be learning Couch for a new project, I am still
>> unsure about some tricks that I used with Django and Rails, such as
>> data migration:
>>
>> For example, suppose I change my code and instead of using a string
>> timestamp in my documents, I would prefer a hash with "day", "month",
>> and "year" keys. When I deploy the new code into production, obviously
>> I want the data structures to change for all existing documents.
>>
>> So my question is: What is the preferred or recommended method to do
>> this? So far, the only thing I can think of is to write some client
>> code to do the following:
>>
>> 1. Fetch _all_docs
>> 2. For each document that requires changing, modify it
>> 3. Either PUT the new documents up one by one, or POST them to
>>    _bulk_docs, depending on the situation.
>>
>> This solution doesn't strike me as particularly horrible, but I was
>> wondering if there is a better way, perhaps something server-side.
>
> This is basically the way to do it. If you want to be sure you've got
> it right, the thing to do is create a view that emits for all docs
> with the old timestamp format. Then you can process docs from that
> view, until it is empty. This way you can be sure no docs slip through
> the cracks.
>
> A migration function, written in JavaScript, and executed on the
> server, can fit the CouchDB model, it just has not been implemented
> yet. So the above is the way to proceed for the foreseeable future.

It occurred to me that the easiest way to implement this would be the
introduction of a "compaction function". Instead of sending an empty
POST request to `/db/_compact` a user sends a JSON body that includes a
compaction function and potentially options (or just the plain JS
function, doesn't matter).
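
To make the timestamp example from the quoted thread concrete, here is a
minimal sketch of the view-then-update approach Chris describes. The
field name `timestamp`, the "YYYY-MM-DD..." string format, and the
function names `mapOldFormat`, `migrateDoc` and `buildBulkBody` are all
assumptions for illustration; the thread only says "a string timestamp".
The sketch makes no HTTP requests, it only prepares the body that a
client would POST to `_bulk_docs`.

```javascript
// ASSUMPTION: old-format docs carry doc.timestamp as a string such as
// "2009-03-07T10:55:06Z". The thread does not specify the format.

// Map function for a view listing only docs still in the old format.
// CouchDB supplies emit() when this runs inside a view.
function mapOldFormat(doc) {
  if (typeof doc.timestamp === "string") {
    emit(doc._id, doc._rev);
  }
}

// Pure transform: returns a migrated copy of the doc, or null when the
// doc needs no change (or the string cannot be parsed).
function migrateDoc(doc) {
  if (typeof doc.timestamp !== "string") {
    return null;
  }
  var m = /^(\d{4})-(\d{2})-(\d{2})/.exec(doc.timestamp);
  if (!m) {
    return null;
  }
  // Copy via JSON so the input doc is left untouched; _id and _rev are
  // carried over so the write updates the existing revision.
  var copy = JSON.parse(JSON.stringify(doc));
  copy.timestamp = {
    year: Number(m[1]),
    month: Number(m[2]),
    day: Number(m[3])
  };
  return copy;
}

// Collects the migrated docs into a {"docs": [...]} body for _bulk_docs.
function buildBulkBody(docs) {
  var changed = [];
  docs.forEach(function (doc) {
    var migrated = migrateDoc(doc);
    if (migrated !== null) {
      changed.push(migrated);
    }
  });
  return { docs: changed };
}
```

A client would query the view, fetch the listed docs, POST the result of
`buildBulkBody` to `_bulk_docs`, and repeat until the view is empty. A
function shaped like `migrateDoc` is also roughly what a compaction
function as proposed here might look like, but that interface does not
exist, so treat it as illustration only.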

The compaction routine would then launch a query server, pipe the
latest revision of every document through the function, and write the
results into the new DB.

Of course, the current behaviour stays in place and remains the default
case. The proposed method would only help when changing documents in
large deployments.

One problem I see is timing issues with client code and multiple nodes.
Client libs wouldn't know when to expect which document structure, or
would have to be needlessly complex. But I think that's a deployment
issue in general. CouchDB could provide notifications to help with
that, but it could not solve the problem in general.

Is this worth thinking about?

Cheers
Jan