incubator-couchdb-dev mailing list archives

From Jason Davies <ja...@jasondavies.com>
Subject Re: History Proposal
Date Mon, 03 Aug 2009 17:21:34 GMT

On 3 Aug 2009, at 17:26, Rune Skou Larsen wrote:

> Damien Katz skrev:
>>>>> 2009/7/31 Jason Davies <jason@jasondavies.com>:
>>>>> The main points of this proposal are:
>>>>>
>>>>> 1. Store the historical versions of documents in a separate
>>>>> database. This is for a number of reasons: a) keeping it
>>>>> separate means we don't clog up the main database with
>>>>> historical data, b) history-specific views can be kept here,
>>>>> c) non-intrusive implementation of this is easier.
>>>>>
>> Some comments about the proposal
>>
>> 1. The callbacks must be synchronous. Queueing them for writing
>> later means the queue can get overloaded and changes lost.
>> 2. Changes can still get lost. We don't have commits across dbs, so
>> it's possible a crash during update will put the main and history dbs
>> out of sync.
>> 3. Replicated changes get lost. If a client makes 5 edits to a local
>> replica of a document, then replicates it to a server db, only the
>> most recent change gets recorded in the history.
>>
>> I would prefer to store the history as attachments to the main
>> document.
>>
>> -Damien
>>
> I agree that _all versions of a document should be in the same
> database_ because commit-scope of a change should include saving the
> undo-history. What good is unreliable undo?
>
> But also for other reasons:
> 1) Future versions
> In my company, we need a system where we can replicate data to all
> couchdb-instances before it should be used. This is also very common
> in the CMS-world for scheduling a change to the website. So we need to
> be able to store a future version, which becomes valid at a specified
> time, and make the change between versions "invisible" (we use a url
> rewrite). That's very tough if current data and history data are in
> separate databases and in different formats.
>
> 2) Applying views
> View'ing on historic docs should be as powerful as viewing "current"
> docs. With the proposed format for historic documents, the same view
> cannot be applied on current and history db. In fact, complex views
> can't be used at all in the history db, since the one-dimensional
> view-index must include time.
>
> I dream of a fully temporal couchdb, where all GET requests can
> include the point in time for which I want to see the docs through my
> views, lists and shows  :-)
>
> Using attachments is not optimal, because there's still the
> "un-dynamic" distinction between past, current and future, but it's
> much better than a separate db. The attachments-proposal retains the
> possibility to manipulate versions of the same doc in one commit-scope.

We've just been discussing this some more on IRC and Benoît suggested
adding a "_history" member to allow historical versions of documents
to be stored there (essentially as attachments, because doc._history
would by default only contain stubs).  I'd prefer not to overpopulate
the "_" namespace so I'm not set on adding doc._history but let's run
with this for this discussion.

The stubs would contain basic metadata: the last-modified timestamp and
the userCtx that modified the doc (perhaps we can do away with
doc._history and add this metadata to the attachment metadata?  Or
decide on a format for the attachment filename, e.g.
_history/<timestamp>/<userCtx>.json?)
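
To make the two options concrete, here is a rough sketch of what a
stub and the filename format could look like. Nothing here is agreed:
the field names "timestamp" and "user", the use of the userCtx name
only, and both helper functions are assumptions for illustration.

```python
import json
from urllib.parse import quote


def history_attachment_name(timestamp, user):
    """Build a name of the form _history/<timestamp>/<userCtx>.json.

    Both parts are percent-encoded so that characters such as ':' or '/'
    cannot be confused with the path separators (an assumption, not
    something decided in this thread).
    """
    return "_history/{}/{}.json".format(
        quote(timestamp, safe=""), quote(user, safe="")
    )


def history_stub(timestamp, user, body):
    """Build a hypothetical stub: metadata only, no document body."""
    data = json.dumps(body, sort_keys=True).encode("utf-8")
    return {
        "stub": True,
        "content_type": "application/json",
        "length": len(data),      # size of the serialised old version
        "timestamp": timestamp,   # assumed field: last-modified time
        "user": user,             # assumed field: name from the userCtx
    }
```

With either option a client can list a doc's versions from the
metadata alone and fetch a full historical body only when it needs it.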

This would then make it easy to write views that manipulate the
history via the doc._history stubs.  I'm thinking we probably only
want to send the stubs to the view server, as serialising all the
historical data for each doc could get CPU-hungry.
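
As an illustration of a view over the stubs, here is a sketch of a map
function. It is written in Python only to keep the example
self-contained (a real view would normally be JavaScript), and the
shape of doc._history, a mapping from version name to a stub with
"user" and "timestamp" fields, is an assumption.

```python
def map_history(doc):
    """Yield one (key, value) row per history stub of a document.

    Rows are keyed by [user, timestamp], so a range query could list
    every change made by one user in time order. Only the stub metadata
    is read; the historical bodies are never touched.
    """
    for name, stub in sorted(doc.get("_history", {}).items()):
        key = [stub["user"], stub["timestamp"]]
        yield key, {"_id": doc["_id"], "version": name}


# Hypothetical document carrying two history stubs.
example_doc = {
    "_id": "page-1",
    "_history": {
        "v1": {"user": "jason", "timestamp": "2009-08-01T10:00:00Z"},
        "v2": {"user": "rune", "timestamp": "2009-08-03T17:26:00Z"},
    },
}
rows = list(map_history(example_doc))
```

A document without a _history member simply produces no rows.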

The other question is where to configure this.  Do we make it a
db-wide setting, perhaps in a special doc so that it will be
replicated (_history_settings), do we put it in design docs, or do we
want to configure it on a per-doc level?  Rune suggested something
like { _history_settings: { num_docs: 10, ... } }.  I would probably
lean towards putting it in design docs, so that the decision can be
made by the app developer.
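
A sketch of how such a setting might be applied, assuming that
num_docs means "keep at most this many historical versions" (that
reading, the prune_history helper and the stub layout are assumptions,
not part of Rune's suggestion):

```python
def prune_history(history, settings):
    """Return a copy of history limited to the newest num_docs stubs.

    history maps a version name to a stub with an ISO 8601 "timestamp",
    which sorts correctly as a plain string when every stub uses the
    same format and time zone. With no num_docs setting, nothing is
    pruned.
    """
    limit = settings.get("num_docs")
    if limit is None:
        return dict(history)
    newest_first = sorted(
        history.items(), key=lambda item: item[1]["timestamp"], reverse=True
    )
    return dict(newest_first[:limit])


# Hypothetical settings as they might appear in a design doc.
design_doc = {"_id": "_design/app", "_history_settings": {"num_docs": 2}}
history = {
    "v1": {"timestamp": "2009-08-01T10:00:00Z"},
    "v2": {"timestamp": "2009-08-02T10:00:00Z"},
    "v3": {"timestamp": "2009-08-03T10:00:00Z"},
}
kept = prune_history(history, design_doc["_history_settings"])
```

Reading the settings from a design doc means they replicate along with
the rest of the app.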

There is a possibility that this could be implemented in the _update
handler but I'd strongly prefer to have a core module written in
Erlang for performance reasons, and to make it easier for people to
turn it on and off.

Finally, whartung pointed out this paper:
http://www.cs.tau.ac.il/~ohadrode/papers/btree_TOS.pdf
which contains some interesting info on using B-trees to support
snapshots.  Maybe someone can comment on the feasibility of supporting
that?

Comments welcomed!
--
Jason Davies

www.jasondavies.com

