incubator-couchdb-user mailing list archives

From Jan Lehnardt <>
Subject Re: Relying on revisions for rollbacks
Date Sat, 12 Apr 2008 09:14:15 GMT
Heya Ralf,

Thanks for your input and engaging in this discussion!

On Apr 12, 2008, at 04:36, Ralf Nieuwenhuijsen wrote:
> Hi,
> I've joined this mailing list because I wanted to reply to this
> discussion specifically. I was hoping you could clear a number of
> things up for me.
> 1. Why make compacting the default? Isn't it more likely that, in this
> day and age, most will prefer revisions for all data?

Because the storage system is pretty wasteful: you'd end up with
several gigabytes of database files for just a few hundred megabytes
of actual data, so we do need compaction in one form or another. A
compaction that retains revisions is a lot harder to write. Also,
dealing with revisions in a distributed setup is less than trivial and
would complicate the replication system quite a bit.
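The growth described here comes from append-only storage: every update writes a fresh copy of the document, so old revisions pile up until compaction rewrites the file keeping only the live data. A minimal Python sketch of that idea (the `AppendOnlyStore` class and its methods are illustrative, not CouchDB's actual internals):

```python
# Toy model of an append-only store: writes only ever append, so the
# file grows with every update; compaction rewrites it keeping only
# the latest revision of each document.

class AppendOnlyStore:
    def __init__(self):
        self.log = []  # every write is appended, never overwritten

    def put(self, doc_id, body):
        self.log.append((doc_id, body))

    def compact(self):
        # Keep only the most recent revision of each document.
        latest = {}
        for doc_id, body in self.log:
            latest[doc_id] = body
        self.log = [(doc_id, body) for doc_id, body in latest.items()]

store = AppendOnlyStore()
for i in range(100):
    store.put("doc", {"count": i})  # 100 revisions of one document

before = len(store.log)  # 100 entries on disk
store.compact()
after = len(store.log)   # 1 entry: only the latest revision survives
```

With a hundred updates to one document, ninety-nine of the stored copies are dead weight until compaction runs, which is why uncompacted files can dwarf the live data.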

> 2. Compacting seems like very specific behavior; wouldn't a built-in
> cron-like system be much more generic? It could allow for all kinds
> of background processing, like replication, full-text search using
> JavaScript, compacting, searching for dead URLs, etc.

Compacting is a manual process at the moment. If we introduced a
scheduling mechanism, it would certainly be more general-purpose, and
you could hook in all sorts of operations, including compaction.

> 3. Is support for some sort of reduce behavior, as part of the views,
> planned, and if so, what can we expect?


> 4. What is the default conflict behavior? Most recent version wins?

There's no 'recent' in a distributed system. At the moment, the  
revision with the most changes wins, if I remember correctly.
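The point of "the revision with the most changes wins" is that the winner is picked deterministically, so every node resolves the same conflict the same way without coordinating. A hedged sketch of that property (the `pick_winner` function and the tie-break on the revision string are illustrative assumptions, not CouchDB's exact algorithm):

```python
# Deterministic conflict resolution: any total ordering over the
# conflicting revisions works, as long as every node applies the
# same one. Here: longest edit history first, revision id as tie-break.

def pick_winner(conflicting):
    # conflicting: list of (num_edits, rev_id) pairs for one document
    return max(conflicting, key=lambda rev: (rev[0], rev[1]))

revs = [(3, "a91"), (5, "c07"), (5, "b22")]
winner = pick_winner(revs)  # (5, "c07"): most edits, then rev id
```

Because the rule depends only on the revisions themselves, two nodes that receive the same conflicting revisions in a different order still agree on the winner.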

> 5. Is it possible to merge on conflicts, and if not, how could
> attachments possibly model revisions properly? Wouldn't we lose a
> whole revision tree?

You don't merge, at least at the moment; you declare one revision to
be the winner when resolving the conflict. Since this is a manual
process, you can make sure you don't lose revision trees. Merging
might come in at some point, but no thought (at least publicly) has
gone into that yet.
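Manual resolution amounts to inspecting all conflicting revisions, declaring a winner, and removing the losers; because the application drives the process, it can archive the losing revisions first, which is how revision trees avoid being lost. A sketch under those assumptions (`resolve` and its data shapes are made up for illustration):

```python
# Application-driven conflict resolution: keep the chosen winner,
# move every losing revision into an archive before discarding it.

def resolve(conflicts, winner_rev, archive):
    kept = {}
    for rev, body in conflicts.items():
        if rev == winner_rev:
            kept[rev] = body
        else:
            archive.append((rev, body))  # preserve the losing revision
    return kept

conflicts = {"2-a": {"x": 1}, "2-b": {"x": 2}}
archive = []
resolved = resolve(conflicts, "2-b", archive)
```

Nothing forces the archive step, but since resolution is an explicit application action rather than something the database does silently, the application gets the chance to take it.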

> 6. Without merging, we need to store revisions in separate documents,
> thereby prohibiting useful doc-is for documents under revision.

I don't understand what you mean here :) What is 'doc-is' in this
context?
> 7. What added benefit do manual revisions have when we can just
> store extra revision data with each document anyway?
> I'm quite sure my understanding of CouchDB may be lacking, but to me
> it seems like guaranteed revisions are the killer feature.

The revisions are not, at least at this point, meant to implement
revision control systems. Rather, they exist to power replication and
to provide the optimistic concurrency control that allows any number
of parallel readers while serialised writes are happening.
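What the revision field buys in practice is an optimistic concurrency check: a write names the revision it was based on, and a stale write is rejected (CouchDB answers such a write with a 409 conflict). A toy sketch of the mechanism, using integer revisions instead of CouchDB's real revision strings:

```python
# Optimistic concurrency control: an update must name the revision it
# was based on; if the document has moved on, the update is rejected.

class ConflictError(Exception):
    pass

class Db:
    def __init__(self):
        self.docs = {}  # doc_id -> (rev, body)

    def put(self, doc_id, body, based_on_rev=None):
        current = self.docs.get(doc_id)
        current_rev = current[0] if current else None
        if based_on_rev != current_rev:
            raise ConflictError("document was updated concurrently")
        new_rev = (current_rev or 0) + 1
        self.docs[doc_id] = (new_rev, body)
        return new_rev

db = Db()
rev1 = db.put("doc", {"v": 0})        # create: based on no revision
rev2 = db.put("doc", {"v": 1}, rev1)  # update based on rev1 succeeds
try:
    db.put("doc", {"v": 2}, rev1)     # stale writer is rejected
    conflicted = False
except ConflictError:
    conflicted = True
```

Readers never block here: they simply see whichever revision was current when they read, while writers race and the loser retries against the new revision.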

> The alternative of a cron-like system could work much like the view
> documents. These documents could contain a source URL (possibly
> local), a schedule parameter, and a function that maps a document to
> an array of documents that is treated as a batch-put. This way we
> could easily set up replication, but also all kinds of delayed and/or
> scheduled processing of data.

Indeed. No planning has gone into such a thing at the moment. You
might want to open a feature request or come up with a patch.
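To make the proposal concrete, such a "task document" might pair a schedule with a map function whose output is saved back as one batch. None of this exists in CouchDB; the sketch below only illustrates the shape of the idea from the thread:

```python
# Hypothetical cron-like task document: an interval plus a function
# mapping each source document to zero or more output documents,
# collected into a single batch write.

def run_task(task, docs):
    out = []
    for doc in docs:
        out.extend(task["map"](doc))  # map each doc to output docs
    return out                        # would be saved as one batch-put

task = {
    "interval_seconds": 3600,  # how often the scheduler would fire
    "map": lambda doc: [{"id": doc["id"], "tagged": True}],
}
batch = run_task(task, [{"id": "a"}, {"id": "b"}])
```

Replication, full-text indexing, and dead-link checking would all fit this shape by varying the map function and the source.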

> Likewise, being able to define a conflict function that could merge
> data or decide who wins seems like a much better alternative to the
> 'atomic' batch-put operations, which break down when distributed
> (thereby no longer guaranteeing scalability; another killer feature).

Conflict resolution and merge functions do sound interesting, but I
don't understand the "not guaranteeing scalability" remark. In the
current implementation, this feature actually makes CouchDB scalable
by ensuring that all nodes participating in a cluster eventually end
up with the same data. If you really do need two-phase commit (if I
understand correctly, you want that), it would need to be part of
your application or an intermediate storage layer.

