couchdb-user mailing list archives

From Alexander Lamb <alexander.l...@rodanotech.ch>
Subject Re: Relying on revisions for rollbacks
Date Tue, 18 Mar 2008 08:03:44 GMT
Just so I understand:

attaching previous versions as attachments means either:

1) the latest version of the document carries a flat list of attachments, one per previous version,

or

2) the latest version of the document carries the previous version as an
attachment, which itself carries the previous version as an attachment,
and so on...

If the answer is (2), then merging updates from several servers might
be really difficult!

If the answer is (1), merging is simpler, but it is not easy to
generate a version number, except by using revision dates.

Ultimately, the reasons to keep revisions (for what I am considering
using CouchDB for) are:

1) an audit trail (for legal reasons), which means not only "show me who
changed what and when in document X" but also "show me a set of documents
as they were on Jan-3-2008 10:28"
2) different document "statuses": archived (i.e. cannot be changed),
published (for global use), published locally, and work in progress
(only for the user editing)

Point 2 is important because it means a document can be "live" with
several different revisions, and depending on who you are in the system,
you see one revision or another.

It also means that it should be easy to write views which say, for
example:

"give me all published documents + all my work in progress documents"

Since there could be many published revisions, it is really "give me
the last revision with published status + the last revision with
work in progress status".

Then, when I have finished working on my "work in progress" document, I
want to store it as "published" and delete all the work-in-progress
revisions I created between the last published document and my new
version...
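
Still assuming that hypothetical layout and the by_status view sketched
above (with "wip" standing in for whatever status string marks work in
progress), the publish step could then look roughly like this:

import json
import requests

COUCH = "http://localhost:5984"
DB = "docs"  # same hypothetical database as above

def publish(doc_id, author):
    # Newest-first list of this document's work-in-progress revisions.
    params = {
        "startkey": json.dumps([doc_id, "wip", {}]),
        "endkey": json.dumps([doc_id, "wip"]),
        "descending": "true",
        "include_docs": "true",
    }
    rows = requests.get(f"{COUCH}/{DB}/_design/revisions/_view/by_status",
                        params=params).json()["rows"]
    wip = [r["doc"] for r in rows if r["doc"].get("author") == author]
    if not wip:
        return None

    newest, rest = wip[0], wip[1:]
    newest["status"] = "published"
    requests.put(f"{COUCH}/{DB}/{newest['_id']}", json=newest)

    # Drop the intermediate work-in-progress revisions I created.
    for doc in rest:
        requests.delete(f"{COUCH}/{DB}/{doc['_id']}",
                        params={"rev": doc["_rev"]})
    return newest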

In summary, what I am describing here is fairly generic document
management system behaviour. Do we want this as custom-built code, as
part of CouchDB itself, or as an optional layer on top of CouchDB?

My 2 euro cents :-)

Alex

On Mar 17, 2008, at 8:52 PM, Damien Katz wrote:

> On Mar 17, 2008, at 2:48 PM, Alan Bell wrote:
>
>> Jan Lehnardt wrote:
>>>
>>> You can do that, too. With attachments, you'd have it all in one
>>> place and would not need to write your views in a way that they
>>> don't pick up old revisions. That said, it is certainly possible to
>>> store older revisions in other documents, if that solves your
>>> problems.
>>>
>>> Cheers
>>> Jan
>>> -- 
>> Well, I might be missing something about the way CouchDB handles
>> attachments, but this doesn't sound good to me. Adding attachments
>> to hold the revision history means that the attachments have to be
>> replicated each time a revision happens.
>
> Right now, this is true. But with attachment-level incremental
> replication, only attachments that have changed will replicate.
>
>> Also, a replication conflict is pretty much the same thing as a
>> revision; a client application would have no knowledge of a
>> replication conflict happening, but this would be good to see in a
>> wiki-like page history. I can imagine that in a distributed system
>> it would be very hard for the clients to maintain a revision history
>> as attachments.
>
> I disagree about the difficulty. It's surprisingly simple  
> conceptually.
>
> The first thing is, every time you update the document, simply  
> attach the previous revision when you save. Eventually there will be  
> a flag you can pass in to do this automatically.
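>
> A rough sketch of that step in Python over the HTTP API (the helper
> name, the attachment naming and the use of an inline base64
> attachment are just one way to illustrate it):
>
> import base64, json, requests
>
> def save_with_history(db_url, doc_id, new_fields):
>     """Update a document, attaching its previous body before saving."""
>     old = requests.get(f"{db_url}/{doc_id}").json()
>
>     new = dict(old)          # keeps _rev, so the PUT passes MVCC
>     new.update(new_fields)
>     # Keep existing attachment stubs and add the previous body as an
>     # inline (base64) attachment named after the revision it preserves.
>     attachments = dict(old.get("_attachments", {}))
>     attachments[f"rev-{old['_rev']}.json"] = {
>         "content_type": "application/json",
>         "data": base64.b64encode(json.dumps(old).encode()).decode(),
>     }
>     new["_attachments"] = attachments
>     return requests.put(f"{db_url}/{doc_id}", json=new).json()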
>
> Then, if there is a replication conflict to resolve, simply open the
> two conflicting documents (manually if necessary), update your chosen
> winner with any info you want to preserve from the loser (data,
> revision histories, etc.), then delete the loser revision.
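>
> In Python, that recipe could look roughly like this (merge stands in
> for whatever application-specific merging you need):
>
> import requests
>
> def resolve_conflict(db_url, doc_id, merge):
>     """Fold what you want to keep from the losers into the winner,
>     then delete the losing revisions."""
>     winner = requests.get(f"{db_url}/{doc_id}",
>                           params={"conflicts": "true"}).json()
>     for rev in winner.pop("_conflicts", []):
>         loser = requests.get(f"{db_url}/{doc_id}",
>                              params={"rev": rev}).json()
>         winner = merge(winner, loser)   # data, revision histories, etc.
>         requests.delete(f"{db_url}/{doc_id}", params={"rev": rev})
>     requests.put(f"{db_url}/{doc_id}", json=winner)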
>
> And that's it. The thing about this system is that you can get very
> simple or very complicated with the revision history aspects; it's up
> to the application developer. The nice thing is you generally don't
> need to worry about concurrent or distributed updates with other
> nodes attempting the same thing. The same rules still apply, and
> eventually the conflicts will be resolved.
>
>> As for writing views that don't pick up old revisions, I think all
>> applications should assume that all documents are at all times
>> carrying a bundle of prior versions and replication/save conflicts.
>> One of the nasty things in Notes is that most applications assume
>> that replication conflicts don't happen, and they can break when
>> they do. I think a major feature of CouchDB is sensible handling of
>> revisions and conflicts. Purging revisions and conflicts is going to
>> be necessary for some applications, but in others it is desirable to
>> retain all versions. It would be good at least to be able to specify
>> which databases to run compaction on and which to exclude.
>
> The scheduling of compaction is something that will be external to
> the core database code. Much of the work here isn't in the actual
> file-level compaction code, but in creating tools to monitor things
> and initiate it with the desired options.
>
>>
>> What is the proposed rule for compaction? Just deleting all  
>> revisions it finds? Deleting old revisions over a certain age?
>
>
> For the first cut of compaction, it will unconditionally purge all
> previous revisions of a document from a database, leaving only the
> most recent revisions of the winner and its conflicts.
>
> Then we will provide a way to perform selective purging during
> compaction, probably with a user-provided function that will be fed
> each document at compaction time and will return true or false to
> indicate whether the document should be kept or discarded. This is
> also how deletion "stubs" will be purged (keeping some meta info
> about deleted documents is necessary for replication).
>
>>
>> Another thought: it would perhaps be nice to run compaction on some
>> servers but not on others for replicas of the same database. That
>> way a bunch of offline clients could compact fairly frequently and
>> aggressively, while a central server they all replicate with, which
>> has lots of disk space, could retain all versions.
>
> Ok, that's a neat use case but I'm not sure how you would handle the  
> intermediate edits replicating back to the server. Maybe they just  
> get lost. It seems possible to support such a thing without a lot of  
> work. We'll see what is possible.
>
>
>> I am thinking in particular of the scenario of OLPC XO laptops  
>> replicating with a school server.
>
>>
>>
>> Alan.
>

