Date: Mon, 12 Jan 2009 23:44:34 -0500
From: "Paul Davis"
To: user@couchdb.apache.org
Subject: Re: Can I guarantee uniqueness in a field without using _id?

On Mon, Jan 12, 2009 at 11:20 PM, Sunny Hirai wrote:
> Thanks for all the responses.
>
> Just to clarify, I know that CouchDB is not relational and I know the
> primary differences and limitations; however, I still have some questions if
> you will permit.
>

We permit questions. :) Especially well thought out ones like you have below.

> While it could be noted as a weakness of my implementation, the other thing
> to note is that _id can then no longer be used generically. For example, I
> can not include a reference from another SQL database unless I make the
> reference a very long String encoded with all the unique values which seems
> to be a bad way to relate tables/databases.
>

Hence the MD5 suggestion.
It's a length-limited string that guarantees global identity
(probabilistically, similar to UUIDs).

> Note that there is no way to handle two unique fields (e.g. "name" and
> "email" both unique).
>

Well, taking the MD5 of their string representation would be fine.
(Assumptions are obvious.)

> I know that CouchDB has different pros and cons from relational databases
> and I'm okay with there being cons. I just want to make sure that what I'm
> asking is (a) impossible to do because of the way CouchDB works or (b) a
> design choice that is not constrained by CouchDB.
>
> The reason I ask is that there appears to be some sort of a lock somewhere
> to assure that you don't end up with two documents with the same id. For
> example, if you PUT two new documents at exactly the same time in the same
> server, one of them will fail because it will not have a "_rev". This is
> actually an assertion I'm not sure is true.
>

Let me start with the fact that this is an assertion that is mostly true and
only became slightly untrue in the last week or so. More on this below.

> I understand that two documents PUT into two different servers that
> replicate the same database can conflict; however, can two of the documents
> conflict in the same database?
>
> If they CAN conflict, then the guaranteed uniqueness on a single server is
> not actually guaranteed upon a successful insert. They can both "succeed"
> but they can be in conflict. This could be bad for doing things like
> creating a new user as two users can be granted the same name successfully
> but only one actually gets it. In this case, the solutions to use _id to
> guarantee a unique name can actually fail anyways, even though it may be
> rare.
>

You're hitting on a part of the implementation that probably resides solely
in Damien's head at this moment. I haven't seen the end implementation, so
I'll only be able to give you my best guess on what will happen in the coming
days/weeks/months.
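
To make the MD5 suggestion above concrete, here is a minimal sketch in Python. The function name, the separator, the normalization, and the example document are all assumptions for illustration, not anything CouchDB prescribes. Note that hashing several fields together makes the combination unique, not each field on its own.

```python
import hashlib

# Separator assumed never to occur inside a field value, so that
# ("ab", "c") and ("a", "bc") do not collapse to the same string.
SEP = "\x1f"


def unique_id(*fields):
    """Return a 32-character hex MD5 digest of the given fields."""
    # Trim and lowercase so "Sunny " and "sunny" map to the same id
    # (an assumption about what counts as "the same" value).
    normalized = SEP.join(f.strip().lower() for f in fields)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


# Hypothetical user document keyed by the hash of its email address.
doc = {
    "_id": unique_id("sunny@example.com"),
    "name": "sunny",
    "email": "sunny@example.com",
}
```

The resulting _id has a fixed length regardless of how long the original values are, which is what makes it usable as a reference from another system.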
> On the other hand, if _id CANNOT conflict within the same server, then it
> appears there is some sort of lock somewhere. It might be very light, or
> small, or whatever, but then there is a lock.
>

It's optional now: sending the "X-Couch-Full-Commit: true" header will ensure
a full commit.

> So, in other words, I would like to know which one is true:
>
> A. there can be _id conflicts on the same server. In that case,
> _id doesn't guarantee uniqueness in the sense that two records can be
> inserted successfully, but only one is authoritative. Then I have to deal
> with this somehow anyways.
>
> B. there aren't conflicts on the same server so you are guaranteed
> uniqueness on the same server. The _id hack always works. In this case could
> we not consider a similar situation to guarantee unique fields, perhaps in
> the far (far) future? Even if not, I'd like to know that there can be no
> conflicts on the same server.
>
> C. Something else completely that allows both a conflict-free _id in a
> manner that is simultaneously lock free that I haven't thought of.
>

This is all from memory without reading or using any of this new code yet,
but the situation is something like the following. Remember, I'm not entirely
certain on all these things, it's 11:33, and I've had beers. Please no
pointing and laughing.

Briefly:

Old school style: Single-node CouchDB ensured global uniqueness when using
PUT. When using POST to _bulk_docs there were transactional semantics: if one
of the docs failed, all failed.

New school style: Giving transactional semantics on _bulk_docs is inefficient
when contemplating multi-node setups. A CouchDB multi-node setup refers to
Couch transparently and automagically hashing documents and distributing them
accordingly.

Uncertain style: Damien committed code to make the transaction semantics
optional using a header for the request. This was presented in terms of
_bulk_docs.
I have no idea how it affects PUT semantics on a single node or otherwise.

Certainly muddy waters uncertain style: Given that I have no idea on the
specifics of the header flag, if it's specified then I would be running under
the assumption that you will at least get a notification that something
conflicted, or the request might fail.

Moving on...

So the idea is that you're either going to wait for a possibly super long
time for a transaction, or write code that deals with conflicts. The
recommendation is that you write code that deals with conflicts.

I'm sure I futzed something in there, so wait for corrections before you come
to any grand conclusions ;)

HTH,
Paul Davis

> Thanks for the feedback.
>
> Sunny
>
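
As a rough illustration of the "write code that deals with conflicts" recommendation above, here is a minimal Python sketch. It assumes the server answers 201 Created when a new document is stored and 409 Conflict when the _id is already taken, which is how current single-node CouchDB releases respond to a PUT without a matching _rev. The `put_doc` callable and the in-memory fake server are hypothetical stand-ins for a real HTTP PUT so the example runs without a database.

```python
def create_unique(put_doc, doc_id, body):
    """Try to claim doc_id; return True if this request won, False if not.

    put_doc is a hypothetical callable(doc_id, body) -> HTTP status code,
    standing in for an HTTP PUT of a new document (no _rev) to the database.
    """
    status = put_doc(doc_id, body)
    if status == 201:
        return True   # document created: the id is ours
    if status == 409:
        return False  # conflict: someone else already holds this id
    raise RuntimeError("unexpected status %d" % status)


def make_fake_server():
    """In-memory stand-in for a single node: first writer of an id wins."""
    store = {}

    def put_doc(doc_id, body):
        if doc_id in store:
            return 409
        store[doc_id] = body
        return 201

    return put_doc


put = make_fake_server()
first = create_unique(put, "sunny", {"email": "sunny@example.com"})
second = create_unique(put, "sunny", {"email": "other@example.com"})
```

The caller decides what a lost race means for the application, for example asking the second user to pick another name, rather than relying on a long-held lock.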