incubator-couchdb-dev mailing list archives

From Damien Katz <dam...@apache.org>
Subject Re: History Proposal
Date Mon, 03 Aug 2009 15:39:52 GMT

On Aug 2, 2009, at 3:29 PM, Chris Anderson wrote:

> On Sat, Aug 1, 2009 at 3:29 AM, Jason Davies<jason@jasondavies.com>  
> wrote:
>>
>> On 31 Jul 2009, at 14:42, Benoit Chesneau wrote:
>>
>>> 2009/7/31 Jason Davies <jason@jasondavies.com>:
>>>
>>>> The main points of this proposal are:
>>>>
>>>> 1. Store the historical versions of documents in a separate  
>>>> database.
>>>>  This
>>>> is for a number of reasons: a) keeping it separate means we don't  
>>>> clog up
>>>> the main database with historical data b) history-specific views  
>>>> can be
>>>> kept
>>>> here c) non-intrusive implementation of this is easier.
>>>> 2. The change will be made at the couch_db layer so that *any*  
>>>> change to
>>>> any
>>>> document in the target database will be mirrored to the history  
>>>> database.
>>>
>>> Seems good.
>>>
>>>> 3. Each and every change to a document will result in a new  
>>>> document
>>>> being
>>>> created in the history database (with a new ID) containing an  
>>>> exact copy
>>>> of
>>>> that document e.g. {_id: <new ID>, doc: <exact copy of doc> }.
>>>
>>> How would you handle the case of attachments? If attachments are
>>> copied for each revision of a doc, it would take a lot of space.
>>> Maybe storing attachments in their own docs could be a solution
>>> though. So storing a revision would be:
>>>
>>> store attachments in different docs
>>> create a doc  {_id: <id>, doc: <doc>, attachments: [<id1>, ...]}
>>>
>>> Attachments will be compared across revisions by their signature;
>>> if the signature changes, a new attachment doc is created.
>>>
>>> Just a thought anyway.
>>
>> Good idea, the disk space issue would be quite important for larger
>> databases with larger number of changes.  I wonder if some kind of
>> alternative storage layer supporting diffs would help here.  Probably
>> something to consider as a future improvement.
>>
>>>
>>>
>>>> 4. Adding meta-data to changes can be handled by a custom _update  
>>>> handler
>>>> (yet to be developed) to set fields such as "last_modified" and
>>>> "last_modified_user".
>
> I've been quiet on this thread as I'm largely in agreement with the  
> proposal.
>
> I think the best route for implementation is to allow Erlang callbacks
> on changes. This way we can write a simple history function that
> copies off each change to a backup db, setting timestamps and userCtx
> metadata on the way.
>
> The user interface could surface this function's activation in the
> node config as a check box, and applications wouldn't need to know
> about it at all. It should be possible to develop a generic futon-like
> interface for browsing old documents to revert individual changes, so
> users can work with non-backup-aware applications.
>
> As far as keeping track of time ranges when backups are turned off,
> the user interface could record a timestamped metadata document to the
> backup db whenever the switch is flipped.

Some comments about the proposal:

1. The callbacks must be synchronous. Queueing them for writing later
means the queue can get overloaded and changes lost.
2. Changes can still get lost. We don't have commits across dbs, so
it's possible a crash during an update will put the main and history dbs
out of sync.
3. Replicated changes get lost. If a client makes 5 edits to a local
replica of a document, then replicates it to a server db, only the
most recent change gets recorded in the history.
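
Point 3 can be illustrated with a toy model. The sketch below is not CouchDB code: the `Db` class, its `on_change` hook, and `replicate` are hypothetical stand-ins, and it assumes replication transfers only the current revision of each document.

```python
class Db:
    """Toy in-memory database with a synchronous change hook (hypothetical)."""

    def __init__(self, on_change=None):
        self.docs = {}
        self.on_change = on_change

    def save(self, doc_id, body, rev=None):
        current = self.docs.get(doc_id)
        if rev is None:
            # Local edit: bump the revision counter.
            rev = current["rev"] + 1 if current else 1
        self.docs[doc_id] = {"rev": rev, "body": dict(body)}
        if self.on_change:
            # The history hook runs once per write to *this* database.
            self.on_change(doc_id, rev, dict(body))
        return rev


def replicate(source, target):
    # Assumption: only the current revision of each document is transferred.
    for doc_id, doc in source.docs.items():
        target.save(doc_id, doc["body"], rev=doc["rev"])


server_history = []
server = Db(on_change=lambda doc_id, rev, body: server_history.append((doc_id, rev, body)))
local = Db()  # the client's local replica has no history hook

for n in range(1, 6):
    local.save("doc1", {"text": f"edit {n}"})

replicate(local, server)
print(len(server_history))  # 1
print(server_history[0])    # ('doc1', 5, {'text': 'edit 5'})
```

The server's hook fires once, for the fifth edit only; edits 1 through 4 never reach the history.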

I would prefer to store the history as attachments to the main document.
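
One possible reading of this, as a minimal sketch: before each update, the previous body is saved as an inline JSON attachment on the same document, so document and history are written together. The message does not specify a layout; the `update_with_history` helper and the `history-N.json` naming are assumptions made for illustration. Only the `_attachments` structure with `content_type` and base64 `data` follows CouchDB's inline attachment format.

```python
import base64
import json


def update_with_history(doc, changes):
    """Apply `changes` and keep the previous body as an inline attachment."""
    # Snapshot the user-visible fields (skip _id, _rev, _attachments).
    previous = {k: v for k, v in doc.items() if not k.startswith("_")}
    attachments = dict(doc.get("_attachments", {}))
    # Hypothetical naming scheme: history-1.json, history-2.json, ...
    count = sum(1 for name in attachments if name.startswith("history-"))
    name = f"history-{count + 1}.json"
    payload = json.dumps(previous, sort_keys=True).encode("utf-8")
    attachments[name] = {
        "content_type": "application/json",
        "data": base64.b64encode(payload).decode("ascii"),
    }
    return {**doc, **changes, "_attachments": attachments}


doc = {"_id": "doc1", "title": "Draft"}
doc = update_with_history(doc, {"title": "Final"})
old = json.loads(base64.b64decode(doc["_attachments"]["history-1.json"]["data"]))
print(doc["title"], old)  # Final {'title': 'Draft'}
```

Because the history lives in the document itself, one write covers both, which sidesteps the cross-db commit problem in point 2.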

-Damien

>
> Chris
>
>
>>>
>>> Why not add metadata when storing a revision? The obvious ones,
>>> I mean: userCtx and date.
>>
>> My idea was that userCtx and date could be stored using _update, or  
>> do you
>> think this should be done automatically?  It's certainly a  
>> possibility but I
>> wouldn't want to add unnecessary data if the user doesn't need it,  
>> although
>> I imagine in 99% of cases they would need the "date/time" of the  
>> change in
>> the history.
>>
>>>
>>>>
>>>> One use case we'd like to support is effectively (from the point  
>>>> of the
>>>> user) being able to "roll back" a view to a specific point in  
>>>> time, but
>>>> how
>>>> this would look in the history database has me stumped so far.   
>>>> Rolling
>>>> back
>>>> a specific doc is easy, but multiple docs, not so easy it seems.   
>>>> Any
>>>> suggestions welcome!
>>>>
>>>
>>> Could rolling back be handled by a view based on date in the
>>> history database?
>>
>> Indeed, but I haven't been able to come up with such a view without  
>> blowing
>> the reduce limitations.  I want to do something like fetch all the  
>> latest
>> history docs that were changed before some particular date.  As Jan  
>> pointed
>> out though, this could be solved using snapshot databases instead.
>>
>> --
>> Jason Davies
>>
>> www.jasondavies.com
>>
>>
>
>
>
> -- 
> Chris Anderson
> http://jchrisa.net
> http://couch.io

