incubator-couchdb-dev mailing list archives

From Jason Davies <ja...@jasondavies.com>
Subject Re: History Proposal
Date Thu, 06 Aug 2009 16:04:34 GMT
Hi Brian,

On 4 Aug 2009, at 10:56, Brian Candler wrote:

> On Mon, Aug 03, 2009 at 06:21:34PM +0100, Jason Davies wrote:
>> Comments welcomed!
>
> ISTM that the "historical" versions are already stored, so why duplicate
> them in the form of an attachment to a new version? And what about
> historical versions of attachments anyway?
>
> Wouldn't it be simpler to:
>
> - keep the historical versions by _rev as they are now
>
> - somehow mark these historical versions as worth keeping or not
>  (could be as simple as reusing the _deleted flag)
>
> - make the "worth keeping" versions survive compaction
>
> Then when you PUT a document, you'd have two options: apply the _deleted
> flag automatically to the old revision, or not. This could be chosen by
> URL parameter perhaps.
>
> Some views might want access to historical revs, but perhaps this should
> be controlled by a view parameter to filter them out for views which are
> only interested in the most recent one. (Incidentally, I would like views
> to have access to live conflicting revs too, but that's a separate issue)


I like the simplicity of your idea, but I'd be interested to hear  
Damien's opinion on essentially using MVCC revisions as history too.   
Is there a potential difficulty with doing this that we're missing?

You said that it seems unnecessary to duplicate the historical
versions as attachments.  You may have a point, but with the way
things currently work the duplicates would be removed by compaction.
If I understand things correctly, an attachment is only written out
to disk when it is added, so it's not as if *all* historical versions
are appended to the database file every time a document is modified:
only a single old version would be appended (as an attachment), as
well as the new doc, of course.  The other good thing about storing
historical versions as attachments is that they would get replicated.
Currently we don't replicate old MVCC versions, so that would have to
change, in addition to preventing them from being compacted away as
you say.
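
To make the "keep by _rev" idea concrete, here is a toy in-memory
sketch.  It is not CouchDB's storage format or API: the names Doc,
Revision, keep, keep_old and compact are all made up for illustration,
and the rev strings are simplified.  It only shows old revisions being
flagged at PUT time and flagged ones surviving compaction.

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    rev: str
    body: dict
    keep: bool = False  # hypothetical "worth keeping" marker


@dataclass
class Doc:
    revisions: list = field(default_factory=list)  # oldest first
    next_seq: int = 1

    def put(self, body, keep_old=True):
        """Add a new revision; flag the previous one as history or not."""
        if self.revisions:
            self.revisions[-1].keep = keep_old
        rev = Revision(f"{self.next_seq}-toy", body)
        self.next_seq += 1
        self.revisions.append(rev)
        return rev.rev

    def compact(self):
        """Drop old revisions unless flagged; always keep the current one."""
        current = self.revisions[-1:]
        kept = [r for r in self.revisions[:-1] if r.keep]
        self.revisions = kept + current


doc = Doc()
doc.put({"v": 1})
doc.put({"v": 2}, keep_old=True)   # revision 1 is kept as history
doc.put({"v": 3}, keep_old=False)  # revision 2 is not worth keeping
doc.compact()
surviving = [r.rev for r in doc.revisions]
```

Here keep_old stands in for the per-PUT choice Brian suggests making
via a URL parameter; replication of the kept revisions is not
modelled at all.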

Good point about storing attachments in the history.  This could
potentially become a space issue, assuming we simply write the
historical versions as JSON docs with the attachments embedded as
base64.  A better approach would be to store hashes and store the
attachments themselves separately from the historical versions (using
some kind of prefix).  This way we only write a new historical
attachment if it changes.
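
A minimal sketch of the hash idea, assuming SHA-256 as the hash and a
made-up "history-att/" key prefix.  HistoryStore and snapshot are
hypothetical names and nothing here is CouchDB code; it just shows a
historical version referring to its attachments by hash, so that an
unchanged attachment is written only once.

```python
import hashlib


class HistoryStore:
    def __init__(self):
        self.blobs = {}    # "history-att/<sha256 hex>" -> attachment bytes
        self.history = []  # historical versions, oldest first
        self.blob_writes = 0

    def snapshot(self, body, attachments):
        """Record a historical version; store each attachment at most once."""
        refs = {}
        for name, data in attachments.items():
            key = "history-att/" + hashlib.sha256(data).hexdigest()
            if key not in self.blobs:  # only new content gets written
                self.blobs[key] = data
                self.blob_writes += 1
            refs[name] = key
        self.history.append({"body": body, "attachments": refs})


store = HistoryStore()
store.snapshot({"title": "a"}, {"logo.png": b"PNG-1"})
store.snapshot({"title": "b"}, {"logo.png": b"PNG-1"})  # attachment unchanged
store.snapshot({"title": "c"}, {"logo.png": b"PNG-2"})  # attachment changed
```

Three historical versions are recorded, but only two attachment
bodies are stored, because the second version reuses the first one's
hash.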

All in all, it seems to me that reusing _rev for history saves us
having to do an additional read and an additional write (reading
the old doc or attachment and then writing it as an attachment).  Is
this a good enough reason to reuse _rev for this?

Thanks,
--
Jason Davies

www.jasondavies.com

