From: Alexander Lamb
To: couchdb-user@incubator.apache.org
Subject: Re: Relying on revisions for rollbacks
Date: Tue, 18 Mar 2008 09:03:44 +0100

Just so I understand: attaching previous versions as attachments means either:

1) the last version of the document carries a list of attachments, one per previous version, or
2) the last version of the document carries the previous version as an attachment, which itself carries the previous version as an attachment, and so on.

If the answer is (2), then merging updates from several servers might really be difficult!

If the answer is (1), merging is simpler, but it is not very easy to generate a version number, except by using revision dates.

Ultimately, the reasons to keep revisions (in what I am considering using CouchDB for) are:

1) an audit trail (for legal reasons), which means not only "show me who changed what, and when, in document X" but also "show me a set of documents as they were on Jan-3-2008 10:28";
2) different document "statuses": archived (i.e. cannot be changed), published (for global use), published locally, and work in progress (visible only to the user editing it).

Point 2 is important because it means a document can be "live" with several different revisions, and depending on who you are in the system, you get to see one revision or another.

In practice, it should be easy to write views which say, for example: "give me all published documents plus all my work-in-progress documents". Since there could be many published revisions, the query is really "give me the last revision with published status plus the last revision with work-in-progress status".
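To make that concrete, here is a rough sketch of the kind of view I have in mind, written against the HTTP API in Python with the requests library. Everything in it is my own convention rather than anything CouchDB prescribes: each saved version is its own document carrying a doc_id that links the versions, a status field and an updated_at timestamp, and the server address and database name are placeholders.

    import json
    import requests  # any HTTP client would do; requests is just convenient

    COUCH = "http://127.0.0.1:5984"  # placeholder server address
    DB = "docs"                      # placeholder database name

    # A view keyed on [logical id, status, timestamp], so that for a given
    # document and status the most recent version sorts last.
    design = {
        "_id": "_design/versions",
        "language": "javascript",
        "views": {
            "by_status": {
                "map": (
                    "function(doc) {"
                    "  if (doc.doc_id && doc.status) {"
                    "    emit([doc.doc_id, doc.status, doc.updated_at], null);"
                    "  }"
                    "}"
                )
            }
        },
    }
    requests.put(f"{COUCH}/{DB}/_design/versions", json=design)

    # "Give me the latest published version of logical document 'contract-42'":
    params = {
        "startkey": json.dumps(["contract-42", "published", {}]),
        "endkey": json.dumps(["contract-42", "published"]),
        "descending": "true",
        "limit": 1,
        "include_docs": "true",
    }
    latest_published = requests.get(
        f"{COUCH}/{DB}/_design/versions/_view/by_status", params=params
    ).json()["rows"]

A second view keyed on [author, status, updated_at] would answer the "all my work-in-progress documents" half of the query in the same way.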
Then, when I have finished working on my "work in progress" document, I want to store it as "published" and delete all the work-in-progress revisions I created between the last published document and my new version...

In summary, what I am describing here is fairly generic document-management-system behaviour. Do we want this as custom-built code, as an actual part of CouchDB, or as an optional layer on top of CouchDB?

My 2 euro cents :-)

Alex

On 17 Mar 2008, at 20:52, Damien Katz wrote:

> On Mar 17, 2008, at 2:48 PM, Alan Bell wrote:
>
>> Jan Lehnardt wrote:
>>>
>>> You can do that, too. With attachments, you'd have it all in one
>>> place and would not need to write your views in a way that they
>>> don't pick up old revisions. That said, it is certainly possible to
>>> store older revisions in other documents, if that solves your
>>> problems.
>>>
>>> Cheers
>>> Jan
>>> --
>> Well, I might be missing something about the way CouchDB handles
>> attachments, but this doesn't sound good to me. Adding attachments
>> to hold the revision history means that the attachments have to be
>> replicated each time a revision happens.
>
> Right now, this is true. But with attachment-level incremental
> replication, only attachments that have changed will replicate.
>
>> Also, a replication conflict is pretty much the same thing as a
>> revision. A client application would have no knowledge of a
>> replication conflict happening, but this would be good to see in a
>> wiki-like page history. I can imagine that in a distributed system
>> it would be very hard for the clients to maintain a revision history
>> as attachments.
>
> I disagree about the difficulty. It's surprisingly simple
> conceptually.
>
> The first thing is, every time you update the document, simply
> attach the previous revision when you save. Eventually there will be
> a flag you can pass in to do this automatically.
>
> Then, if there is a replication conflict to resolve, simply open the
> two conflicting documents (manually if necessary), update your
> chosen winner with any info you want to preserve from the loser
> (data, revision histories, etc.), then delete the losing revision.
>
> And that's it. The thing about this system is you can get very
> simple or very complicated with the revision history aspects; it's
> up to the application developer. The nice thing is you generally
> don't need to worry about concurrent or distributed updates with
> other nodes attempting the same thing. The same rules still apply,
> and eventually the conflicts will be resolved.
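To make the above concrete, here is a minimal sketch of this attach-the-previous-revision pattern over the HTTP API, again in Python with requests. The attachment naming, the field conventions and the lack of error handling are mine; the ?conflicts=true query parameter and the standalone attachment PUT/DELETE calls are standard parts of the HTTP API in recent CouchDB releases.

    import json
    import requests  # assumed HTTP client

    COUCH = "http://127.0.0.1:5984"  # placeholder server and database
    DB = "docs"

    def save_with_history(doc_id, new_fields):
        """Update a document and attach the outgoing version to the new one."""
        url = f"{COUCH}/{DB}/{doc_id}"
        current = requests.get(url).json()      # the version we are replacing
        old_rev = current["_rev"]

        updated = dict(current, **new_fields)   # carries _id and _rev along
        new_rev = requests.put(url, json=updated).json()["rev"]
        # (A 409 here would mean someone else updated the document in between.)

        # Store the old version as an attachment named after its revision.
        requests.put(
            f"{url}/previous-{old_rev}",
            params={"rev": new_rev},
            data=json.dumps(current),
            headers={"Content-Type": "application/json"},
        )

    def resolve_conflict(doc_id):
        """Keep the current winner and delete the losing conflict revisions."""
        url = f"{COUCH}/{DB}/{doc_id}"
        doc = requests.get(url, params={"conflicts": "true"}).json()
        for losing_rev in doc.get("_conflicts", []):
            # Merge anything worth keeping from the loser first, then:
            requests.delete(url, params={"rev": losing_rev})

Note that the attachment upload bumps the document revision once more, so a real implementation would retry on conflicts rather than assume the two writes always land back to back.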
>> As for writing views to not pick up old revisions, I think all
>> applications should assume that all documents are at all times
>> carrying a bundle of prior versions and replication/save conflicts.
>> One of the nasty things in Notes is that most applications assume
>> that replication conflicts don't happen, and they can break when
>> they do. I think a major feature of CouchDB is sensible handling of
>> revisions and conflicts. Purging revisions and conflicts is going
>> to be necessary for some applications, but in others it is
>> desirable to retain all versions. It would be good at least to be
>> able to specify which databases to run compaction on and which to
>> exclude.
>
> The scheduling of compaction is something that will be external to
> the core database code. Much of the work here isn't in the actual
> file-level compaction code, but in creating tools to monitor things
> and initiate it with the desired options.
>
>> What is the proposed rule for compaction? Just deleting all
>> revisions it finds? Deleting old revisions over a certain age?
>
> For the first cut of compaction, it will unconditionally purge all
> previous revisions of a document from a database, leaving only the
> most recent revisions of the winner and its conflicts.
>
> Then we will provide a way to perform selective purging during
> compaction, probably with a user-provided function that will be fed
> each document at compaction time and will return true or false
> depending on whether the document should be kept or discarded. This
> is also how deletion "stubs" will be purged (keeping some meta
> information about deleted documents is necessary for replication).
>
>> Another thought: it would be nice perhaps to run compaction on some
>> servers but not on others for replicas of the same database. Thus a
>> bunch of offline clients could compact fairly frequently and
>> aggressively, while a central server that they all replicate with,
>> and that has lots of disk space, could retain all versions.
>
> OK, that's a neat use case, but I'm not sure how you would handle the
> intermediate edits replicating back to the server. Maybe they just
> get lost. It seems possible to support such a thing without a lot of
> work. We'll see what is possible.
>
>> I am thinking in particular of the scenario of OLPC XO laptops
>> replicating with a school server.
>>
>> Alan.
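Coming back to the selective-purge function Damien mentions: purely as a sketch of the idea, and not of any API that exists today, the keep/discard predicate for our use case might conceptually look like this (the status values, field names and cut-off date are mine):

    # Hypothetical keep/discard predicate of the kind described above.
    # Compaction would call it once per stored revision and keep the
    # revision only when it returns True.
    def keep_revision(doc, cutoff="2008-01-01T00:00:00Z"):
        status = doc.get("status")
        if status in ("archived", "published"):
            return True                  # audit trail: never purge these
        if doc.get("_deleted"):
            return False                 # drop deletion stubs here too, once
                                         # replication can cope with that
        # Discard old intermediate work-in-progress edits; ISO-8601
        # timestamps compare correctly as plain strings.
        return doc.get("updated_at", "") >= cutoff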