couchdb-dev mailing list archives

From Robert Dionne <dio...@dionne-associates.com>
Subject Re: History Proposal
Date Mon, 03 Aug 2009 16:17:39 GMT



On Aug 3, 2009, at 12:09 PM, Damien Katz wrote:

>
> On Aug 3, 2009, at 11:55 AM, Jason Davies wrote:
>
>> Hi Damien,
>>
>> On 3 Aug 2009, at 16:39, Damien Katz wrote:
>>
>>> On Aug 2, 2009, at 3:29 PM, Chris Anderson wrote:
>>>
>>>> On Sat, Aug 1, 2009 at 3:29 AM, Jason Davies <jason@jasondavies.com> wrote:
>>>>>
>>>>> On 31 Jul 2009, at 14:42, Benoit Chesneau wrote:
>>>>>
>>>>>> 2009/7/31 Jason Davies <jason@jasondavies.com>:
>>>>>>
>>>>>>> The main points of this proposal are:
>>>>>>>
>>>>>>> 1. Store the historical versions of documents in a separate database.
>>>>>>> This is for a number of reasons: a) keeping it separate means we
>>>>>>> don't clog up the main database with historical data b) history-specific
>>>>>>> views can be kept here c) non-intrusive implementation of this is easier.
>>>>>>> 2. The change will be made at the couch_db layer so that *any* change to
>>>>>>> any document in the target database will be mirrored to the history
>>>>>>> database.
>>>>>>
>>>>>> Seems good.
>>>>>>
>>>>>>> 3. Each and every change to a document will result in a new document
>>>>>>> being created in the history database (with a new ID) containing an
>>>>>>> exact copy of that document e.g. {_id: <new ID>, doc: <exact copy of doc> }.
>>>>>>
>>>>>> How would you handle the case of attachments? If attachments are copied
>>>>>> for each revision of a doc, it would take a lot of space. Maybe
>>>>>> storing attachments in their own docs could be a solution, though. So
>>>>>> storing a revision would be:
>>>>>>
>>>>>> store attachments in different docs
>>>>>> create a doc  {_id: <id>, doc: <doc>, attachments: [<id1>, ...]}
>>>>>>
>>>>>> Attachments would be compared across revisions by their signature;
>>>>>> if the signature changes, a new attachment doc is created.
>>>>>>
>>>>>> Just a thought anyway.
>>>>>
>>>>> Good idea, the disk space issue would be quite important for larger
>>>>> databases with larger number of changes.  I wonder if some kind of
>>>>> alternative storage layer supporting diffs would help here.  Probably
>>>>> something to consider as a future improvement.
>>>>>
>>>>>>
>>>>>>
>>>>>>> 4. Adding meta-data to changes can be handled by a custom _update
>>>>>>> handler (yet to be developed) to set fields such as "last_modified"
>>>>>>> and "last_modified_user".
>>>>
>>>> I've been quiet on this thread as I'm largely in agreement with the
>>>> proposal.
>>>>
>>>> I think the best route for implementation is to allow Erlang callbacks
>>>> on changes. This way we can write a simple history function that
>>>> copies off each change to a backup db, setting timestamps and userCtx
>>>> metadata on the way.
>>>>
>>>> The user interface could surface this function's activation in the
>>>> node config as a check box, and applications wouldn't need to know
>>>> about it at all. It should be possible to develop a generic futon-like
>>>> interface for browsing old documents to revert individual changes, so
>>>> users can work with non-backup-aware applications.
>>>>
>>>> As far as keeping track of time ranges when backups are turned off,
>>>> the user interface could record a timestamped metadata document to the
>>>> backup db whenever the switch is flipped.
>>>
>>> Some comments about the proposal
>>>
>>> 1. The callbacks must be synchronous. Queueing them for writing later
>>> means the queue can get overloaded and changes lost.
>>> 2. Changes can still get lost. We don't have commits across dbs, so it's
>>> possible a crash during update will put the main and history dbs out of
>>> sync.
>>> 3. Replicated changes get lost. If a client makes 5 edits to a local
>>> replica of a document, then replicates it to a server db, only the most
>>> recent change gets recorded in the history.
>>>
>>> I would prefer to store the history as attachments to the main document.
>>
>> Can you expand on your last sentence in a bit more detail?  I assume
>> you mean you would rather each document in the history db mirrored each
>> document in the target db, with attachments storing historical versions?
>
> No, I mean the earlier revisions of the document, stored as attachments
> to the current revision.

This seems like a simple approach. If history is enabled, the attachments
could be generated when a document is updated, or perhaps lazily at
compaction time? I'm not sure how deletes would be handled.
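
To make the attachment idea concrete, here is a rough sketch in Python
rather than Erlang. It models a document as a plain dict and is not
CouchDB's actual API: the update_with_history function, the
"history/<rev>" attachment naming, and the integer revision counter are
all assumptions for illustration. It copies the outgoing revision into an
attachment at update time (the eager option, not the compaction one) and
leaves deletes unhandled.

```python
import json


def update_with_history(current, changes):
    """Return the next revision of `current`, keeping the old body as an attachment.

    Hypothetical layout: each earlier revision is stored under
    _attachments["history/<rev>"] as a JSON string.
    """
    attachments = dict(current.get("_attachments", {}))
    # Snapshot the old body without its attachments, so history is not
    # nested inside history.
    old_body = {k: v for k, v in current.items() if k != "_attachments"}
    attachments["history/" + current["_rev"]] = json.dumps(old_body, sort_keys=True)

    new_doc = dict(old_body)
    new_doc.update(changes)
    # Toy revision counter; real CouchDB revision ids look different.
    new_doc["_rev"] = str(int(current["_rev"]) + 1)
    new_doc["_attachments"] = attachments
    return new_doc


doc = {"_id": "a", "_rev": "1", "title": "draft"}
doc = update_with_history(doc, {"title": "final"})
doc = update_with_history(doc, {"title": "published"})
```

Because the earlier revisions travel inside the document, anything that
copies the document carries its history along with it.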

>
> The history then replicates with the document, and is always available.
>
>>
>> To solve #3 we could also allow the history database to be replicated
>> for use-cases where the entire history is desirable on all peers.
>
> The problem with the history database is that there are a lot of edge
> cases where the history gets out of sync, especially with distributed
> edits. The system breaks easily in the face of network and security
> errors.
>
> -Damien
>
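
Point 3 from Damien's earlier list (replicated changes getting lost) can
be shown with a toy model. This is a sketch in Python with a hypothetical
Db class and replicate function, not CouchDB code. It assumes, as
described above, that replication transfers only the current revision of
each document, and that the history hook fires once per write to the
server db.

```python
class Db:
    """Toy database: keeps only the current revision of each document."""

    def __init__(self, on_change=None):
        self.docs = {}
        self.on_change = on_change  # hypothetical synchronous change callback

    def put(self, doc):
        self.docs[doc["_id"]] = dict(doc)
        if self.on_change:
            self.on_change(dict(doc))


def replicate(source, target):
    # Only the current revision of each document is sent across.
    for doc in source.docs.values():
        target.put(doc)


history = []                          # stands in for the history db
server = Db(on_change=history.append)
local = Db()                          # local replica, no history hook

# Five edits are made against the local replica...
for n in range(1, 6):
    local.put({"_id": "a", "_rev": n, "body": "edit %d" % n})

# ...but the server's history hook only ever sees the last one.
replicate(local, server)
```

After the replication, history holds a single entry (the fifth edit); the
four intermediate edits never reach the history db.
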
>>
>> --
>> Jason Davies
>>
>> www.jasondavies.com
>>
>

