From: Robert Dionne
To: dev@couchdb.apache.org
Subject: Re: History Proposal
Date: Mon, 3 Aug 2009 12:17:39 -0400

On Aug 3, 2009, at 12:09 PM, Damien Katz wrote:

>
> On Aug 3, 2009, at 11:55 AM, Jason Davies wrote:
>
>> Hi Damien,
>>
>> On 3 Aug 2009, at 16:39, Damien Katz wrote:
>>
>>> On Aug 2, 2009, at 3:29 PM, Chris Anderson wrote:
>>>
>>>> On Sat, Aug 1, 2009 at 3:29 AM, Jason Davies wrote:
>>>>>
>>>>> On 31 Jul 2009, at 14:42, Benoit Chesneau wrote:
>>>>>
>>>>>> 2009/7/31 Jason Davies:
>>>>>>
>>>>>>> The main points of this proposal are:
>>>>>>>
>>>>>>> 1. Store the historical versions of documents in a separate
>>>>>>> database. This is for a number of reasons: a) keeping it separate
>>>>>>> means we don't clog up the main database with historical data
>>>>>>> b) history-specific views can be kept here c) non-intrusive
>>>>>>> implementation of this is easier.
>>>>>>> 2. The change will be made at the couch_db layer so that *any*
>>>>>>> change to any document in the target database will be mirrored to
>>>>>>> the history database.
>>>>>>
>>>>>> seems good.
>>>>>>
>>>>>>> 3. Each and every change to a document will result in a new
>>>>>>> document being created in the history database (with a new ID)
>>>>>>> containing an exact copy of that document e.g. {_id: , doc: }.
>>>>>>
>>>>>> How would you handle the case of attachments?
>>>>>> If attachments are copied for each revision of a doc, it would take
>>>>>> a lot of space. Maybe storing attachments in their own docs could
>>>>>> be a solution, though. So storing a revision would be:
>>>>>>
>>>>>> store attachments in different docs
>>>>>> create a doc {_id: , doc: , attachments: [, ...]}
>>>>>>
>>>>>> Attachments would be compared across revisions based on their
>>>>>> signature; if the signature changes, a new attachment doc is
>>>>>> created.
>>>>>>
>>>>>> Just a thought anyway.
>>>>>
>>>>> Good idea, the disk space issue would be quite important for larger
>>>>> databases with a larger number of changes. I wonder if some kind of
>>>>> alternative storage layer supporting diffs would help here.
>>>>> Probably something to consider as a future improvement.
>>>>>
>>>>>>
>>>>>>
>>>>>>> 4. Adding meta-data to changes can be handled by a custom _update
>>>>>>> handler (yet to be developed) to set fields such as
>>>>>>> "last_modified" and "last_modified_user".
>>>>
>>>> I've been quiet on this thread as I'm largely in agreement with
>>>> the proposal.
>>>>
>>>> I think the best route for implementation is to allow Erlang
>>>> callbacks on changes. This way we can write a simple history
>>>> function that copies off each change to a backup db, setting
>>>> timestamps and userCtx metadata on the way.
>>>>
>>>> The user interface could surface this function's activation in the
>>>> node config as a check box, and applications wouldn't need to know
>>>> about it at all. It should be possible to develop a generic
>>>> futon-like interface for browsing old documents to revert
>>>> individual changes, so users can work with non-backup-aware
>>>> applications.
>>>>
>>>> As far as keeping track of time ranges when backups are turned off,
>>>> the user interface could record a timestamped metadata document to
>>>> the backup db whenever the switch is flipped.
>>>
>>> Some comments about the proposal
>>>
>>> 1. The callbacks must be synchronous. Queueing them for writing
>>> later means the queue can get overloaded and changes lost.
>>> 2. Changes can still get lost. We don't have commits across dbs, so
>>> it's possible a crash during update will put the main and history
>>> dbs out of sync.
>>> 3. Replicated changes get lost. If a client makes 5 edits to a local
>>> replica of a document, then replicates it to a server db, only the
>>> most recent change gets recorded in the history.
>>>
>>> I would prefer to store the history as attachments to the main
>>> document.
>>
>> Can you expand on your last sentence in a bit more detail? I
>> assume you mean you would rather each document in the history db
>> mirrored each document in the target db, with attachments storing
>> historical versions?
>
> No, I mean the earlier revisions of the document, stored as
> attachments to the current revision.

This seems like a simple approach. If history is enabled, the
attachments could be generated when a document is updated, or lazily at
compaction time? I'm not sure how deletes would be handled.

>
> The history then replicates with the document, and is always
> available.
>
>>
>> To solve #3 we could also allow the history database to be
>> replicated for use-cases where the entire history is desirable on
>> all peers.
>
> The problem with the history database is that there are a lot of edge
> cases where the history gets out of sync, especially with
> distributed edits. The system breaks easily in the face of network
> and security errors.
>
> -Damien
>
>>
>> --
>> Jason Davies
>>
>> www.jasondavies.com
>>
>
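
To make the attachments idea above a bit more concrete, here is a minimal Python sketch of "earlier revisions stored as attachments to the current revision", generated eagerly at update time. It is an illustration only and makes several assumptions: documents are plain dicts rather than real CouchDB documents, `update_with_history` and `_next_rev` are hypothetical helpers (not CouchDB APIs), the `history/<rev>` attachment naming is invented here, and old revisions are stored as nested dicts instead of real attachment bodies. It does not address deletes or lazy generation at compaction time.

```python
import copy
import hashlib
import json


def _next_rev(prev_rev, body):
    """Build a rev id of the form '<n>-<digest>' (format assumed for the sketch)."""
    n = int(prev_rev.split("-")[0]) + 1 if prev_rev else 1
    digest = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode("utf-8")
    ).hexdigest()[:16]
    return f"{n}-{digest}"


def update_with_history(current, new_body):
    """Return the new revision of a document, carrying every earlier
    revision along as an attachment keyed by its rev id.

    `current` is the existing document (or None on first write);
    `new_body` holds the new user fields.
    """
    # Start from the history already attached to the current revision.
    history = dict(current.get("_attachments", {})) if current else {}
    if current:
        # Snapshot the outgoing revision without its own attachments,
        # so history does not nest inside history.
        old = {
            k: copy.deepcopy(v)
            for k, v in current.items()
            if k != "_attachments"
        }
        history[f"history/{current['_rev']}"] = old

    doc = copy.deepcopy(new_body)
    doc["_id"] = current["_id"] if current else new_body["_id"]
    doc["_rev"] = _next_rev(current["_rev"] if current else None, new_body)
    if history:
        doc["_attachments"] = history
    return doc
```

For example, three successive writes to the same document leave the third revision holding the first two under `_attachments`:

```python
d1 = update_with_history(None, {"_id": "a", "title": "v1"})
d2 = update_with_history(d1, {"title": "v2"})
d3 = update_with_history(d2, {"title": "v3"})
```

Because the history travels inside the document, whatever copies `d3` to another peer copies its history too, which is the property Damien describes. The cost is that the document grows with every edit, which is the disk space concern raised earlier in the thread.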