From: Antony Blakey
To: Tim Parkin, couchdb-dev@incubator.apache.org, couchdb-user@incubator.apache.org
Subject: Re: couchdb (_external consistency issues and proposals)
Date: Sun, 21 Dec 2008 14:40:57 +1030

(Posted to -dev because it has some development issues)

This is wrong BTW:

> elsif doc["Type"] == "user"
>   doc["Roles"] && doc["Roles"].each do |r|
>     db.execute("replace into links values (?, ?, ?)",
>       db_name, doc_id, r);
>   end

because it doesn't handle modifications correctly. In my production code I do this:

  db.execute("delete from links where db = ? and src = ?", db_name, doc_id);
  doc["Roles"] && doc["Roles"].each do |r|
    db.execute("insert into links values (?, ?, ?)", db_name, doc_id, r);
  end

i.e. always delete and recreate the derived document.
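
To make the modification problem concrete, here is a minimal self-contained sketch. It assumes the links table is unique on all three columns (db, src, dst) and uses a Ruby Set as a stand-in for the SQL table; index_with_replace and index_with_delete_insert are hypothetical names for illustration, not code from the thread.

```ruby
require "set"

# Stand-in for the SQL links table: a set of [db, src, dst] rows.
# Assumption: the real table is unique on all three columns.

# Emulates "replace into links values (?, ?, ?)": upserts a row per current
# role, but never removes rows for roles the document no longer has.
def index_with_replace(links, db_name, doc_id, doc)
  (doc["Roles"] || []).each { |r| links.add([db_name, doc_id, r]) }
  links
end

# Emulates "delete from links where db = ? and src = ?" followed by inserts:
# every row derived from this document is dropped, then rebuilt.
def index_with_delete_insert(links, db_name, doc_id, doc)
  links.delete_if { |db, src, _dst| db == db_name && src == doc_id }
  (doc["Roles"] || []).each { |r| links.add([db_name, doc_id, r]) }
  links
end

v1 = { "Type" => "user", "Roles" => ["admin", "editor"] }
v2 = { "Type" => "user", "Roles" => ["editor"] } # "admin" role removed

stale = Set.new
index_with_replace(stale, "mydb", "u1", v1)
index_with_replace(stale, "mydb", "u1", v2)

fresh = Set.new
index_with_delete_insert(fresh, "mydb", "u1", v1)
index_with_delete_insert(fresh, "mydb", "u1", v2)

puts stale.include?(["mydb", "u1", "admin"]) # the removed role is still indexed
puts fresh.include?(["mydb", "u1", "admin"]) # the removed role is gone
```

With the replace-only approach the row for the removed role survives the second update; deleting by (db, src) first avoids that at the cost of rewriting every row for the document.
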
You can do incremental updates by reading from your indexes before updating. You cannot reliably get the previous rev (for differencing) because it may not exist.

My code also doesn't handle a database being deleted and then re-created - the _external will think it has valid records, but they belong to a previous database. You could handle that through notifications, but once again I think it needs to be synchronous if you want to reason about it. A likely-to-work-most-of-the-time solution would be to detect update_seq < stored_update_seq. A better solution would be for each db to have a UUID, so that you don't have to rely on the name as the identity.

Also, if your _external doesn't get triggered for a long time, and while it's 'dormant' a document is deleted and the db is compacted, you could miss deletions. One solution to that is that every _external needs to be notified (synchronously) before a compaction so that it can update to the update_seq of the MVCC snapshot that the compaction will operate against. IMO a better solution is to have two UUIDs for the database - one is per database, and one is 'per compaction'. Thus an external will know if it needs to revalidate all the documents it has indexed to check for missed deletion updates. You could just have a per-compaction UUID, which would change if a db was deleted and then created, thus triggering the same codepath, but this is a lot more expensive than knowing that the entire db

Finally, note that this external operates for *every* database, whereas you may want to enable and configure it using a design document. Thus your external should always monitor updated design documents and check for enablement. You can record the configuration in the database (and cache it in the _external) and just ignore all other changes.
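
A rough sketch of that design-document check, assuming the external sees each updated document as a Ruby hash. The EnablementCache class and the "links_index" field name are hypothetical, made up for illustration; only the "_design/" id prefix is CouchDB's own convention. The sketch covers just the in-memory cache, not persisting the configuration.

```ruby
# Caches per-database configuration read from design documents.
class EnablementCache
  CONFIG_KEY = "links_index" # hypothetical field name inside a design doc

  def initialize
    @config = {} # db_name => { design_doc_id => config hash }
  end

  # Call for every updated document; anything that is not a design
  # document is ignored.
  def observe(db_name, doc)
    id = doc["_id"].to_s
    return unless id.start_with?("_design/")

    per_db = (@config[db_name] ||= {})
    if doc["_deleted"] || !doc[CONFIG_KEY].is_a?(Hash)
      per_db.delete(id) # design doc removed, or no longer carries config
    else
      per_db[id] = doc[CONFIG_KEY]
    end
  end

  # A database is enabled if at least one design doc carries config.
  def enabled?(db_name)
    !@config.fetch(db_name, {}).empty?
  end
end

cache = EnablementCache.new

cache.observe("mydb", { "_id" => "u1", "Type" => "user" })
before = cache.enabled?("mydb")

cache.observe("mydb", { "_id" => "_design/app",
                        "links_index" => { "types" => ["user"] } })
after = cache.enabled?("mydb")

cache.observe("mydb", { "_id" => "_design/app", "_deleted" => true })
removed = cache.enabled?("mydb")

puts [before, after, removed].inspect
```
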
Personally I don't bother because the lazy-creation means that no work is done unless I do an _external query, so databases which don't get queried don't incur a cost, and I have no configuration data.

That's another reason to prefer a passive UUID-based identity scheme for db-create/delete and compaction detection rather than a notification system.

It would be good if each DB had two UUIDs, one per-db and one per-compaction i.e. changed in the MVCC snapshot during a compaction, and that these be provided to every _external request.

Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

If at first you don’t succeed, try, try again. Then quit. No use being a damn fool about it
  -- W.C. Fields