Message-ID: <5871b9da0809180813pb68f9f0nde48e382080b2f7d@mail.gmail.com>
Date: Thu, 18 Sep 2008 17:13:29 +0200
From: "Ronny Hanssen"
To: couchdb-user@incubator.apache.org
Subject: Re: Bulk Load
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="----=_Part_22275_1596528.1221750809346"

------=_Part_22275_1596528.1221750809346
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

Ok, I get it... I understand _bulk_docs is atomic, but I missed that you
actually preserved the *original* doc.id (doh). I thought that by "clone"
you meant a new doc in CouchDB, with its own id, and I just couldn't
understand why you did that :). This now makes more sense to me. Sorry.

> As to replication, what you'd need is a flag that says if a particular
> node is the head node. Then your history docs should never clash. If
> you get conflicts on the head node you resolve them and store all
> conflicting previous revisions. In this manner your linked list
> becomes a linked directed acyclic graph. (Yay college) This does mean
> that at any given point in the history you could possibly have
> multiple versions of the same doc, but replication works.

Ok, but how is that flag supposed to be set?
At the time of inserting with _bulk_docs the system needs to update the
current doc, which means that any node racing during an update will flag
its own copy as the current one. That in turn means replication under
race conditions will conflict(?).

I am just asking because the single node case could be handled by the
internal CouchDB revision control, so the elaborate scheme you propose
isn't really helping for that scenario. My impression was that we cannot
use the internal CouchDB revision control because of the difficulty of
handling conflicts when multiple nodes are involved (because conflicts
could/would occur), and that this would be better handled by hand-coded
rev-control. It seems to me that there is no solution for doing this by
hand coding either. So it seems we are saying "don't use the built-in
rev-control for rev-control of data" to avoid people blaming CouchDB
when the built-in "revision control" conflicts.

Thanks for your patience guys.

~Ronny

2008/9/18 Paul Davis

> Ronny,
>
> There are two points that I think you're missing.
>
> 1. _bulk_docs is atomic. As in, if one doc fails, they all fail.
> 2. I was trying to make sure that the latest _id of a doc is constant.
>
> Think of this as a linked list. You grab the head document (most
> current revision) and clone it. Then we change the uuid of the second
> doc and make our pointer links to fit into the list. Then after making
> the necessary changes, we edit the head node to our desire. Now we
> post *both* (in the same HTTP request!) docs to _bulk_docs. This
> ensures that if someone else edited this particular doc, the revisions
> will be different and the second edit would fail. Thus, on success 2
> docs are inserted, on failure, 0 docs.
>
> As to replication, what you'd need is a flag that says if a particular
> node is the head node. Then your history docs should never clash. If
> you get conflicts on the head node you resolve them and store all
> conflicting previous revisions.
> In this manner your linked list
> becomes a linked directed acyclic graph. (Yay college) This does mean
> that at any given point in the history you could possibly have
> multiple versions of the same doc, but replication works.
>
> For views, you'd just want to have a flag that says "Not the most
> recent version." Then in your view you would know whether to emit
> key/value pairs for it. This could be something like "No next version
> pointer" or some such. Actually, this couldn't be a next pointer
> without two initial gets because you'd need to get the head node and
> next node. A boolean flag indicating head node status would be
> sufficient though. And then you could have a history view if you ever
> need to walk from tail to head.
>
> HTH,
> Paul
>
>
> On Wed, Sep 17, 2008 at 9:35 PM, Ronny Hanssen wrote:
> > Hm.
> >
> > In Paul's case I am not 100% sure what is going on. Here's a use case for
> > two concurrent edits:
> > * First two users get the original.
> > * Both make a copy which they save.
> > This means that there are two fresh docs in CouchDB (even on a single
> > node).
> > * Save the original using a new doc._id (which the copy is to persist in
> > copy.previous_version).
> > This means that the two new docs know where to find their previous
> > versions. The problem I have with this scheme is that every change of a
> > document means that it needs to store not only the new version, but also
> > its old version (in addition to the original). The fact that two racing
> > updates will generate 4(!) new docs in addition to the original document
> > is worrying. I guess Paul also wants the original to be marked as deleted
> > in the _bulk_docs? But in any case the previous versions are now two new
> > docs, and they look exactly the same, except for the doc._id, naturally...
> >
> > Wouldn't this be enough Paul?
> > 1. old = get_doc()
> > 2. update = clone(old);
> > 3. update.previous_version = old._id;
> > 4. post via _bulk_docs
> >
> > This way there won't be multiple old docs around.
> >
> > Jan's way ensures that for a view there is always only one current version
> > of a doc, since it is using the built-in rev-control. Competing updates on
> > the same node may fail, which is what CouchDB is designed to handle. If
> > on different nodes, then the rev-control history might come "out of synch"
> > via concurrent updates. How does CouchDB handle this? Which update wins?
> > On a single node this is intercepted when saving the doc. For multiple
> > nodes they might both get a response saying "save complete". So these then
> > need merging. How is that done? Jan further secures the previous version
> > by storing it as a new doc, allowing it to be persisted
> > beyond compaction. I guess Jan's sample would benefit nicely from
> > _bulk_docs too. I like this method due to the fact that it allows only one
> > current doc. But I worry about how revision control handles conflicts, Jan?
> >
> > Paul's and my updated suggestion always post new versions, not using the
> > revision system at all. The downside is that there may be multiple current
> > versions around... And this is a bit tricky I believe... Anyone?
> >
> > Paul's suggestion also keeps multiple copies of the previous version. I am
> > not sure why, Paul?
> >
> >
> > Regards,
> > Ronny
> >
> > 2008/9/17 Paul Davis
> >
> >> Good point chris.
> >>
> >> On Wed, Sep 17, 2008 at 11:39 AM, Chris Anderson wrote:
> >> > On Wed, Sep 17, 2008 at 11:34 AM, Paul Davis wrote:
> >> >> Alternatively something like the following might work:
> >> >>
> >> >> Keep an eye on the specifics of _bulk_docs though. There have been
> >> >> requests to make it non-atomic, but I think in the face of something
> >> >> like this we might make non-atomic _bulk_docs a non-default or some
> >> >> such.
> >> >
> >> > I think the need for non-transaction bulk-docs will be obviated when
> >> > we have the failure response say which docs caused failure, that way
> >> > one can retry once to save all the non-conflicting docs, and then loop
> >> > back through to handle the conflicts.
> >> >
> >> > upshot: I bet you can count on bulk docs being transactional.
> >> >
> >> >
> >> > --
> >> > Chris Anderson
> >> > http://jchris.mfdz.com
> >> >
> >>
> >

------=_Part_22275_1596528.1221750809346--
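
A minimal sketch of the head/history scheme Paul describes in this thread: copy the current head under a fresh _id, link the edited head back to that copy, and send both documents in one _bulk_docs request. The field name `is_head`, the function names, and the choice of Python are assumptions made for illustration; `previous_version` follows the name used earlier in the thread. The sketch only builds the request body and does no HTTP. It relies on the all-or-nothing _bulk_docs behaviour discussed above, which should be checked against the CouchDB version in use.

```python
import copy
import uuid


def build_versioned_update(head, changes, new_id=None):
    """Build a _bulk_docs body holding the edited head and a history copy.

    `head` is the current document as fetched from CouchDB (with _id and
    _rev). `is_head` is a hypothetical flag name, not a CouchDB feature.
    """
    # History copy: same content as the current head, stored under a new _id.
    history = copy.deepcopy(head)
    history["_id"] = new_id or uuid.uuid4().hex
    history.pop("_rev", None)  # a brand-new document has no revision yet
    history["is_head"] = False

    # Edited head: keeps the original _id and _rev, so a concurrent edit
    # by someone else leaves this update with a stale revision.
    updated = copy.deepcopy(head)
    updated.update(changes)
    updated["is_head"] = True
    updated["previous_version"] = history["_id"]

    return {"docs": [updated, history]}


def is_current(doc):
    """Python mirror of the view-side check: only head docs would be emitted."""
    return doc.get("is_head", False)
```

The returned dict would be serialised as JSON and POSTed to the database's _bulk_docs resource. Because the history copy is a deep copy of the old head, it keeps the old head's own `previous_version` pointer, so the chain from head to tail stays intact.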