couchdb-user mailing list archives

From "Ralf Nieuwenhuijsen" <>
Subject Re: Relying on revisions for rollbacks
Date Sun, 13 Apr 2008 00:33:17 GMT
> Because the storage system is pretty wasteful and you'd end up with
> several Gigabytes of database files for just a few hundred Megabytes of
> actual data. So we do need compaction in one form or another. A compaction
> that retains revisions is a lot harder to write.  Also, dealing with
> revisions in a distributed setup is less than trivial and would complicate
> the replication system quite a bit.

The gigabytes versus hundreds of megabytes seem acceptable to me, especially
when we can scale that easily. It also seems to depend on how often the data
changes. A simple way to compact revisions would be to store each revision
as a reverse-diff as well. The normal data can then be compacted, whereas
the reverse-diffs are simply kept. The older versions can then be
reconstructed from the most recent one.
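To make the reverse-diff idea concrete, here is a minimal sketch in Python. This is purely illustrative of the scheme suggested above, not how CouchDB stores anything; the function names and diff layout are made up:

```python
def make_reverse_diff(new_doc, old_doc):
    """Record just enough to rebuild old_doc from new_doc."""
    diff = {"put_back": {}, "drop": []}
    for key, old_value in old_doc.items():
        if key not in new_doc or new_doc[key] != old_value:
            diff["put_back"][key] = old_value   # field changed or deleted
    for key in new_doc:
        if key not in old_doc:
            diff["drop"].append(key)            # field added in new_doc
    return diff

def apply_reverse_diff(doc, diff):
    """Walk one step back: newer revision + its diff -> older revision."""
    older = {k: v for k, v in doc.items() if k not in diff["drop"]}
    older.update(diff["put_back"])
    return older
```

Compaction would then keep only the latest full document plus the chain of diffs; any older revision is recovered by applying the diffs in sequence.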

Question 1: How would manual revisions be any more space efficient?

> Compacting is a manual process at the moment. If we would introduce a
> scheduling mechanism, it would certainly be more general purpose and you
> could hook in all sorts of operations, including compaction.

Question 2: In which case 'compacting' (a.k.a. destroying the revisions)
would still be optional; something we can turn off?

Question 3: Can we use older revisions in views?


Question 4: It appears from the comments that this will behave much like a
combinator. So the algorithmic complexity of adding one new document would
be O(1)?

> You don't merge, at least at the moment, but declare one revision to be the
> winner when resolving the conflict. Since this is a manual process, you can
> make sure you don't lose revision trees. Merge might be in at some point,
> but no thoughts (at least public) went into that.

Question 5: Is manually implementing a conflict resolver possible at the
moment (I didn't find it on the wiki)? And if so, why not let that function
just return the winning _data_? That way we could easily implement a merger,
which would be a much saner approach for most documents.
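Such a resolver hook is hypothetical (CouchDB does not expose one today), but the idea sketched in Python might look like this, where the function receives both conflicting revisions and returns the winning data directly:

```python
def resolve_conflict(local_doc, remote_doc):
    """Hypothetical resolver: return the winning *data* instead of picking
    one whole revision. Fields set on only one side survive; a field
    changed on both sides falls back to the local value."""
    merged = {}
    for key in set(local_doc) | set(remote_doc):
        if key not in remote_doc:
            merged[key] = local_doc[key]     # only local has it
        elif key not in local_doc:
            merged[key] = remote_doc[key]    # only remote has it
        else:
            merged[key] = local_doc[key]     # both agree, or local wins
    return merged
```

With a hook shaped like this, a field-wise merger falls out almost for free, rather than having to declare one entire revision the winner.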

> I don't understand what you mean here :) What is 'doc-is' in this context?

Oops, I meant 'doc-IDs'. If I have several revisions of the same document
as separate documents, then the doc-ID can no longer be some nice name,
since doc-IDs have to be unique.

> > The alternative of a cron-like system could work much like the
> > view-documents. These documents could contain a source url (possibly
> > local), a schedule-parameter and a function that maps a document to an
> > array of documents that is treated as a batch-put. This way we could
> > easily set up replication, but also all kinds of delayed and/or
> > scheduled processing of data.
> >
> Indeed. No planning went into such a thing at the moment. You might want
> to open a feature request at or
> come up with a patch.

Perhaps I will look into it myself, if it turns out I need this desperately.
I don't have any Erlang experience, but I think my experience with Haskell
will pull me through ;-)
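The schedule-document idea quoted above could be sketched roughly like this in Python. All field names here are invented for illustration; nothing like this exists in CouchDB:

```python
# A hypothetical "schedule document", similar in spirit to a design
# document; the field names are made up for this sketch.
schedule_doc = {
    "_id": "_sched/mirror-news",
    "source": "http://example.com/news_db",  # source url (possibly local)
    "every_seconds": 300,                    # the schedule-parameter
    # maps one source document to an array of documents for a batch-put
    "map": lambda doc: [{"_id": "mirror-" + doc["_id"], "body": doc["body"]}],
}

def run_once(fetch, batch_put, sched):
    """One scheduler tick: fetch from the source, map each document to an
    array of output documents, and hand them all to one batch-put."""
    batch = []
    for doc in fetch(sched["source"]):
        batch.extend(sched["map"](doc))
    batch_put(batch)
    return batch
```

Plain replication is then just the identity `map`, while transformations, fan-out, or filtering are different map functions on the same scheduling machinery.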

> Conflict resolution and merge functions do sound interesting. I don't
> understand the "not guaranteeing scalability" remark though. In the current
> implementation, this feature actually makes CouchDB scalable by ensuring
> that all nodes participating in a cluster eventually end up with the same
> data. If you really do need two-phase commit (if I understand correctly, you
> want that), that would need to be part of your application or an intermediate
> storage layer.

No, no need for two-phase commits. Rather, I would suggest the complete
opposite extreme: no failed inserts/updates ever, including batch puts.
Just a generic merging conflict solver.

JSON seems very merge-friendly to me ;-) It would seem that 99% of all
documents and use cases could be handled with the same generic merge
function.
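As a sketch of what such a generic merge might look like, here is a three-way merge over JSON objects in Python, using the common ancestor revision to tell one-sided edits from genuine conflicts. This is my own illustration, not anything CouchDB provides:

```python
_MISSING = object()  # sentinel so that "absent" and "None" stay distinct

def merge_json(base, left, right):
    """Three-way merge of JSON objects: a field changed on only one side
    wins; identical changes agree; nested objects merge recursively; a
    genuine conflict deterministically keeps the left value."""
    merged = {}
    for key in set(base) | set(left) | set(right):
        b = base.get(key, _MISSING)
        l = left.get(key, _MISSING)
        r = right.get(key, _MISSING)
        if l == r:
            winner = l                    # both sides agree (or both deleted)
        elif l == b:
            winner = r                    # only the right side changed it
        elif r == b:
            winner = l                    # only the left side changed it
        elif isinstance(l, dict) and isinstance(r, dict):
            winner = merge_json(b if isinstance(b, dict) else {}, l, r)
        else:
            winner = l                    # true conflict: left wins
        if winner is not _MISSING:
            merged[key] = winner
    return merged
```

Only when the same scalar field is edited differently on both sides does any arbitrary tie-breaking happen at all, which is why most documents would merge cleanly.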

