couchdb-dev mailing list archives

From Jason Davies <ja...@jasondavies.com>
Subject Re: History Proposal
Date Mon, 03 Aug 2009 15:55:04 GMT
Hi Damien,

On 3 Aug 2009, at 16:39, Damien Katz wrote:

> On Aug 2, 2009, at 3:29 PM, Chris Anderson wrote:
>
>> On Sat, Aug 1, 2009 at 3:29 AM, Jason Davies <jason@jasondavies.com> wrote:
>>>
>>> On 31 Jul 2009, at 14:42, Benoit Chesneau wrote:
>>>
>>>> 2009/7/31 Jason Davies <jason@jasondavies.com>:
>>>>
>>>>> The main points of this proposal are:
>>>>>
>>>>> 1. Store the historical versions of documents in a separate database.
>>>>> This is for a number of reasons: a) keeping it separate means we don't
>>>>> clog up the main database with historical data b) history-specific
>>>>> views can be kept here c) non-intrusive implementation of this is
>>>>> easier.
>>>>> 2. The change will be made at the couch_db layer so that *any* change
>>>>> to any document in the target database will be mirrored to the history
>>>>> database.
>>>>
>>>> Seems good.
>>>>
>>>>> 3. Each and every change to a document will result in a new document
>>>>> being created in the history database (with a new ID) containing an
>>>>> exact copy of that document e.g. {_id: <new ID>, doc: <exact copy of doc>}.
>>>>
>>>> How would you handle the case of attachments? If attachments are copied
>>>> for each revision of a doc, it would take a lot of space. Maybe
>>>> storing attachments in their own docs could be a solution, though. So
>>>> storing a revision would be:
>>>>
>>>> store attachments in different docs
>>>> create a doc {_id: <id>, doc: <doc>, attachments: [<id1>, ...]}
>>>>
>>>> Attachments will be compared across revisions by their signature;
>>>> if the signature changes, a new attachment doc is created.
>>>>
>>>> Just a thought anyway.
>>>
>>> Good idea, the disk space issue would be quite important for larger
>>> databases with a larger number of changes.  I wonder if some kind of
>>> alternative storage layer supporting diffs would help here.  Probably
>>> something to consider as a future improvement.
>>>
>>>>
>>>>
>>>>> 4. Adding meta-data to changes can be handled by a custom _update
>>>>> handler (yet to be developed) to set fields such as "last_modified" and
>>>>> "last_modified_user".
>>
>> I've been quiet on this thread as I'm largely in agreement with the
>> proposal.
>>
>> I think the best route for implementation is to allow Erlang callbacks
>> on changes. This way we can write a simple history function that
>> copies off each change to a backup db, setting timestamps and userCtx
>> metadata on the way.
>>
>> The user interface could surface this function's activation in the
>> node config as a check box, and applications wouldn't need to know
>> about it at all. It should be possible to develop a generic futon-like
>> interface for browsing old documents to revert individual changes, so
>> users can work with non-backup-aware applications.
>>
>> As far as keeping track of time ranges when backups are turned off,
>> the user interface could record a timestamped metadata document to the
>> backup db whenever the switch is flipped.
>
> Some comments about the proposal
>
> 1. The callbacks must be synchronous. Queueing them for writing later
> means the queue can get overloaded and changes lost.
> 2. Changes can still get lost. We don't have commits across dbs, so
> it's possible a crash during update will put the main and history dbs
> out of sync.
> 3. Replicated changes get lost. If a client makes 5 edits to a local
> replica of a document, then replicates it to a server db, only the
> most recent change gets recorded in the history.
>
> I would prefer to store the history as attachments to the main
> document.

Can you expand on your last sentence in a bit more detail?  I assume
you mean you would rather each document in the history db mirror the
corresponding document in the target db, with attachments storing the
historical versions?
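
To make the two readings concrete, here is a minimal sketch that models
both layouts with plain Python dicts. It is not CouchDB code: the function
names, the "attachments" field and the "rev-N" naming are all assumptions
made for illustration, and a real attachment would be stored through
CouchDB's own attachment mechanism rather than as a nested dict.

```python
import copy
import uuid


def record_as_separate_doc(history_db, doc):
    """Point 3 of the proposal: every change becomes a new history doc
    with a fresh ID, wrapping an exact copy of the changed document."""
    entry = {"_id": uuid.uuid4().hex, "doc": copy.deepcopy(doc)}
    history_db[entry["_id"]] = entry
    return entry


def record_as_attachment(history_db, doc):
    """The other reading: one history doc per target doc (same _id),
    with each historical version kept as a named attachment.
    The "attachments" dict is a stand-in, not CouchDB's real format."""
    entry = history_db.setdefault(
        doc["_id"], {"_id": doc["_id"], "attachments": {}}
    )
    name = "rev-%d" % (len(entry["attachments"]) + 1)
    entry["attachments"][name] = copy.deepcopy(doc)
    return entry
```

Under these assumptions, three edits to one document give three history
docs in the first layout, and a single history doc carrying three
attachments in the second.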

To solve #3 we could also allow the history database to be replicated  
for use-cases where the entire history is desirable on all peers.
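
A rough sketch of that idea, again using plain dicts rather than real
databases. The replicate() helper and the "hist-..." IDs are hypothetical,
and the sketch assumes history docs are write-once with unique IDs, so
there is nothing to merge.

```python
def replicate(source, target):
    """One-way sync: copy every history doc the target does not have yet.
    Assumes history docs are never updated, so no conflicts can arise."""
    copied = 0
    for doc_id, doc in source.items():
        if doc_id not in target:
            target[doc_id] = dict(doc)
            copied += 1
    return copied


# Five edits made against a local replica, each mirrored to local history.
local_history = {
    "hist-local-%d" % n: {
        "_id": "hist-local-%d" % n,
        "doc": {"_id": "post-1", "edit": n},
    }
    for n in range(1, 6)
}

# Replicating only the target db: the server sees just the latest
# version, so its own history holds a single entry.
server_history = {
    "hist-server-1": {
        "_id": "hist-server-1",
        "doc": {"_id": "post-1", "edit": 5},
    },
}
```

Calling replicate(local_history, server_history) would then carry all five
edits over. Note that in this model the server's own entry for the final
edit sits next to the replicated one, so duplicates would still need to be
handled somehow.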

--
Jason Davies

www.jasondavies.com

