From: Jason Davies
To: dev@couchdb.apache.org
Date: Mon, 3 Aug 2009 18:21:34 +0100
Subject: Re: History Proposal

On 3 Aug 2009, at 17:26, Rune Skou Larsen wrote:

> Damien Katz wrote:
>>>>> 2009/7/31 Jason Davies :
>>>>> The main points of this proposal are:
>>>>>
>>>>> 1. Store the historical versions of documents in a separate database.
>>>>> This is for a number of reasons: a) keeping it separate means we don't
>>>>> clog up the main database with historical data; b) history-specific
>>>>> views can be kept here; c) a non-intrusive implementation of this is
>>>>> easier.
>>>>>
>> Some comments about the proposal:
>>
>> 1. The callbacks must be synchronous. Queueing them for writing later
>> means the queue can get overloaded and changes lost.
>> 2. Changes can still get lost. We don't have commits across dbs, so
>> it's possible a crash during update will put the main and history dbs
>> out of sync.
>> 3. Replicated changes get lost. If a client makes 5 edits to a local
>> replica of a document, then replicates it to a server db, only the
>> most recent change gets recorded in the history.
>>
>> I would prefer to store the history as attachments to the main document.
>>
>> -Damien
>>
> I agree that _all versions of a document should be in the same database_,
> because the commit scope of a change should include saving the undo
> history. What good is unreliable undo?
>
> But also for other reasons:
> 1) Future versions
> In my company, we need a system where we can replicate data to all
> CouchDB instances before it is used. This is also very common in
> the CMS world for scheduling a change to the website. So we need to
So we need to to > be able to store a future version, which becomes valid at a specified > time and make the "invisible" change between versions (we use a url > rewrite). Thats very tough if current data and history data are in > separate databases and in different formats. > > 2) Applying views > View'ing on historic docs should be as powerful as viewing "current" > docs. With the proposed format for historic documents, the same view > cannot be applied on current and history db. In fact, complex views > can't be used at all in the history db, since the one-dimensional > view-index must include time. > > I dream of a fully temporal couchdb, where all GET requests can =20 > include > the point in time for which I want to see the docs through my views, > lists and shows :-) > > Using attachments is not optimal, because there's still the "un-=20 > dynamic" > distinction between past, current and future, but its much better =20 > than a > seperate db. The attachments-proposal retains the possibility to > manipulate versions of the same doc in one commit-scope. We've just been discussing this some more on IRC and Beno=EEt suggested =20= adding a "_history" member to allow historical versions of documents =20 to be stored there (essentially as attachments, because doc._history =20 would by default only contain stubs). I'd prefer not to overpopulate =20= the "_" namespace so I'm not set on adding doc._history but let's run =20= with this for this discussion. The stubs would contain basic metadata: last modified timestamp and =20 userCtx that modified the doc (perhaps we can do away with =20 doc._history and add this metadata to the attachment metadata? Or =20 decide on a format for the attachment filename e.g. _history/=20 /.json?) This would then make it easy to write views that manipulated the =20 history via the doc._history stubs. 
I'm thinking we probably only want to send the stubs to the view server, as serialising all the historical data for each doc could get CPU-hungry.

The other question is whether to make this a db-wide setting (perhaps a special doc, _history_settings, so that it will be replicated), to put it in design docs, or to configure it on a per-doc level. Rune suggested something like { _history_settings: { num_docs: 10, ... } }. I would probably lean towards putting it in design docs, so that the decision can be made by the app developer.

There is a possibility that this could be implemented in the _update handler, but I'd strongly prefer to have a core module written in Erlang, for performance reasons and to make it easier for people to turn it on and off.

Finally, whartung pointed out this paper:
http://www.cs.tau.ac.il/~ohadrode/papers/btree_TOS.pdf
which contains some interesting info on using B-trees to support snapshots. Maybe someone can comment on the feasibility of supporting that?

Comments welcome!

-- 
Jason Davies

www.jasondavies.com
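Rune's `num_docs` suggestion in the email above could be read as a cap on retained versions. A minimal sketch, assuming (hypothetically) that `_history` maps each archived `_rev` to a stub with an ISO-8601 `timestamp`, and that each snapshot lives in an attachment named `history-<rev>.json`; `prune_history` is an illustrative helper, not a CouchDB API, and where the settings object comes from (special doc, design doc, or per doc) is deliberately left open, as in the email.

```python
from typing import Any


def prune_history(doc: dict[str, Any], settings: dict[str, Any]) -> dict[str, Any]:
    """Keep only the newest `num_docs` history stubs and their snapshot attachments.

    `settings` stands in for a hypothetical _history_settings object.
    Timestamps are compared as strings, which orders ISO-8601 values
    chronologically as long as they share one format and timezone.
    """
    num_docs = settings.get("num_docs")
    history = doc.get("_history", {})
    if num_docs is None or len(history) <= num_docs:
        return doc  # no limit configured, or already within it

    newest_first = sorted(history.items(), key=lambda kv: kv[1]["timestamp"], reverse=True)
    kept = dict(newest_first[:num_docs])

    # Drop the snapshot attachment of every pruned stub; keep all other attachments.
    dropped = {f"history-{rev}.json" for rev in history if rev not in kept}
    attachments = {
        name: att for name, att in doc.get("_attachments", {}).items() if name not in dropped
    }
    return {**doc, "_history": kept, "_attachments": attachments}
```

Pruning at write time keeps each document bounded in size, at the cost of discarding the oldest versions once the cap is reached; an app that needs a complete audit trail would simply leave `num_docs` unset.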