couchdb-dev mailing list archives

From Damien Katz <dam...@apache.org>
Subject Re: History Proposal
Date Mon, 03 Aug 2009 16:09:02 GMT

On Aug 3, 2009, at 11:55 AM, Jason Davies wrote:

> Hi Damien,
>
> On 3 Aug 2009, at 16:39, Damien Katz wrote:
>
>> On Aug 2, 2009, at 3:29 PM, Chris Anderson wrote:
>>
>>> On Sat, Aug 1, 2009 at 3:29 AM, Jason Davies <jason@jasondavies.com> wrote:
>>>>
>>>> On 31 Jul 2009, at 14:42, Benoit Chesneau wrote:
>>>>
>>>>> 2009/7/31 Jason Davies <jason@jasondavies.com>:
>>>>>
>>>>>> The main points of this proposal are:
>>>>>>
>>>>>> 1. Store the historical versions of documents in a separate
>>>>>> database. This is for a number of reasons: a) keeping it separate
>>>>>> means we don't clog up the main database with historical data
>>>>>> b) history-specific views can be kept here c) non-intrusive
>>>>>> implementation of this is easier.
>>>>>> 2. The change will be made at the couch_db layer so that *any*
>>>>>> change to any document in the target database will be mirrored to
>>>>>> the history database.
>>>>>
>>>>> Seems good.
>>>>>
>>>>>> 3. Each and every change to a document will result in a new
>>>>>> document being created in the history database (with a new ID)
>>>>>> containing an exact copy of that document e.g.
>>>>>> {_id: <new ID>, doc: <exact copy of doc>}.
>>>>>
>>>>> How would you handle the case of attachments? If attachments are
>>>>> copied for each revision of a doc, it would take a lot of space. Maybe
>>>>> storing attachments in their own docs could be a solution, though. So
>>>>> storing a revision would be:
>>>>>
>>>>> store attachments in separate docs
>>>>> create a doc {_id: <id>, doc: <doc>, attachments: [<id1>, ...]}
>>>>>
>>>>> Attachments would be compared across revisions by their signature;
>>>>> if the signature changes, a new attachment doc is created.
>>>>>
>>>>> Just a thought anyway.
>>>>
>>>> Good idea, the disk space issue would be quite important for larger
>>>> databases with larger number of changes.  I wonder if some kind of
>>>> alternative storage layer supporting diffs would help here.   
>>>> Probably
>>>> something to consider as a future improvement.
>>>>
>>>>>
>>>>>
>>>>>> 4. Adding meta-data to changes can be handled by a custom  
>>>>>> _update handler
>>>>>> (yet to be developed) to set fields such as "last_modified" and
>>>>>> "last_modified_user".
>>>
>>> I've been quiet on this thread as I'm largely in agreement with  
>>> the proposal.
>>>
>>> I think the best route for implementation is to allow Erlang  
>>> callbacks
>>> on changes. This way we can write a simple history function that
>>> copies off each change to a backup db, setting timestamps and  
>>> userCtx
>>> metadata on the way.
>>>
>>> The user interface could surface this function's activation in the
>>> node config as a check box, and applications wouldn't need to know
>>> about it at all. It should be possible to develop a generic
>>> futon-like interface for browsing old documents to revert individual
>>> changes, so users can work with non-backup-aware applications.
>>>
>>> As far as keeping track of time ranges when backups are turned off,
>>> the user interface could record a timestamped metadata document to  
>>> the
>>> backup db whenever the switch is flipped.
>>
>> Some comments about the proposal
>>
>> 1.  The callbacks must be synchronous. Queueing them for writing  
>> later means the queue can get overloaded and changes lost.
>> 2. Changes can still get lost. We don't have commits across dbs, so
>> it's possible a crash during update will put the main and history
>> dbs out of sync.
>> 3. Replicated changes get lost. If a client makes 5 edits to a local
>> replica of a document, then replicates it to a server db, only the
>> most recent change gets recorded in the history.
>>
>> I would prefer to store the history as attachments to the main  
>> document.
>
> Can you expand on your last sentence in a bit more detail?  I assume  
> you mean you would rather each document in the history db mirrored  
> each document in the target db, with attachments storing historical  
> versions?

No, I mean the earlier revisions of the document, stored as  
attachments to the current revision.

The history then replicates with the document, and is always available.
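
A rough sketch of what that could look like, in Python over plain dicts. This is an illustration only, not an existing CouchDB feature: the function name and the "history/<rev>" attachment naming are made up for the example, and it assumes the inline attachment form (a base64 "data" field plus "content_type" under "_attachments").

```python
import base64
import json


def archive_previous_revision(current_doc, new_body):
    """Build the next document body, keeping the old revision as an attachment.

    Hypothetical helper: nothing here talks to a server, it only shapes
    the JSON that a client would then save.
    """
    # Keep attachments already on the document, so older history entries survive.
    attachments = dict(current_doc.get("_attachments", {}))

    # Snapshot the old revision without its attachments, so history
    # entries do not nest inside each other.
    snapshot = {k: v for k, v in current_doc.items() if k != "_attachments"}
    raw = json.dumps(snapshot, sort_keys=True).encode("utf-8")

    # Assumed naming scheme: one attachment per superseded revision.
    name = "history/" + current_doc["_rev"]
    attachments[name] = {
        "content_type": "application/json",
        "data": base64.b64encode(raw).decode("ascii"),
    }

    next_doc = dict(new_body)
    next_doc["_id"] = current_doc["_id"]
    next_doc["_rev"] = current_doc["_rev"]  # the update is made against this revision
    next_doc["_attachments"] = attachments
    return next_doc
```

Because the old revisions travel inside the document itself, whatever replicates the document carries its history along with it.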

>
> To solve #3 we could also allow the history database to be  
> replicated for use-cases where the entire history is desirable on  
> all peers.

The problem with the history database is that there are a lot of edge
cases where the history gets out of sync, especially with distributed
edits. The system breaks easily in the face of network and security
errors.
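
The distributed-edit case (point 3 above) can be shown with a toy model. This is a simplified simulation, not CouchDB code: the Db class and replicate function are invented for the example, and replication is assumed to transfer only the current version of each document.

```python
class Db:
    """Toy database: holds the latest body per document id."""

    def __init__(self, history=None):
        self.docs = {}
        # Optional list standing in for a separate history database;
        # it receives a copy of every write this db sees.
        self.history = history

    def write(self, doc_id, body):
        self.docs[doc_id] = body
        if self.history is not None:
            self.history.append((doc_id, body))


def replicate(source, target):
    # Assumption of the model: only the current body of each
    # document is sent, not the intermediate edits.
    for doc_id, body in source.docs.items():
        target.write(doc_id, body)


local = Db()                      # client replica, no history db
server_history = []
server = Db(history=server_history)

for n in range(1, 6):             # five edits made offline
    local.write("doc1", {"text": "edit %d" % n})

replicate(local, server)
```

After the replication, server_history holds a single entry for "doc1", the fifth edit. The first four never reached the server, so a history kept only there cannot record them.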

-Damien

>
> --
> Jason Davies
>
> www.jasondavies.com
>

