couchdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Anderson <>
Subject Re: reiterating transactions vs. replication
Date Mon, 25 May 2009 06:22:02 GMT
On Sun, May 24, 2009 at 10:44 PM, Scott Shumaker <> wrote:
> Chris - you said:
> "Distributed bulk transactions would make for chaotic behavior, as
> someone's mostly unrelated change on a remote node could eventually
> replicate to me (months later) and knock an entire line of work that
> I've done into a conflict state."
> Months later?  If you're really out of sync with a remote node for
> several months, you should expect to get lots of conflicts if people
> are editing your shared documents, with or without distributed bulk
> transactions.

Right, but at least in those cases I can diff the document to figure
out what the conflict is. The bulk transactions you describe could
become conflicting and before I could save the doc I'm working on, I'd
have to figure which other doc was causing the conflict. I'm not sure
why not to just call the larger unit of data a single document, if
that's how you want to use it.

> And if you really have a system designed to be offline for several
> months, you're solving very different problems (albeit interesting
> ones) than traditional data 'replication' as most people think about
> it (when one entity controls the servers, even if they are
> geographically distributed) - and getting into DVCS-like territory.

The idea is to support offline access to distributed data across all
devices. Some devices are very small (like a phone) and others are
very large (like a datacenter). It is our aim to provide a unified API
across all of them.

> If that's the case, it does explain why there are some decisions in
> CouchDB I find strange, like the fact that authentication runs during
> replication.

Auth during replication has to happen because as far as CouchDB's
concerned replication is just another client. Having the same API for
usage and replication is part of what keeps CouchDB simple.

> You're using replication for a completely different
> purpose than I need it for - I need it for redundancy and read
> scaling, you're using it to synchronize disparate data sources.  Very
> different problems that call for very different solutions.
> On Sun, May 24, 2009 at 10:31 PM, Scott Shumaker <> wrote:
>> Inter-document dependencies come up pretty quickly when you start
>> trying to represent complex data structures in CouchDB.  There are
>> still a few cases we've encountered where there isn't a great way to
>> avoid needing transactions.  A few examples:
>> 1)
>> 'A' maintains an ordered list of 'B' elements, where the order is
>> user-defined - imagine allowing a user to re-order photos in a
>> slideshow.  You want to store the photos separately from the
>> slideshow, because they can be used in multiple slideshows.  Whenever
>> you add or remove a photo, you need to update the order list as well.
>> I've seen some people suggest some sort of gross solution where you
>> try to store floating point order id's inside the B elements and
>> change that to wedge an item in between another items (averaging the
>> two new siblings' orders), but this is incredibly brittle and breaks
>> down with enough re-ordering.
>> Another unpleasant approach is to create separate 'order objects' in
>> couchdb (representing the order of an item within a folder), storing
>> an internal version number (timestamp) inside the 'order object' - so
>> you never change the order node, you just create a new node.  Then,
>> you only use use the 'latest' version of this order node (either on
>> the client side or with a reduce).  To make this work, you need to
>> ensure that your 'internal version numbers' are monotonically
>> increasing.  This isn't a problem for some applications, and can be
>> solved in general with a specialized 'number server'.
>> 2)
>> Representing graph/linked-list datastructures.
>> If you delete a node from a linked list, you need to update two nodes
>> - the previous node and the node itself.  You can try the second
>> suggestion in the previous item to make this work (effectively, store
>> the link relationship as separate objects and generate new link
>> objects with incrementing version numbers)
>> I'm sure there are other cases - these two just have been a thorn in
>> our side.  But for a lot of non-trivial data applications,
>> transactions still end up being very useful.
>> On Fri, May 22, 2009 at 2:45 PM, Nathan Stott <> wrote:
>>> As a user, when I chose couchdb for my most recent project, I chose it
>>> because I didn't care about transactions.  I would've used RDBMS if that
>>> were important.
>>> I chose it because couch solved the problems I needed solved very well.
>>> I don't think transactions should be a big dev focus.
>>> On Fri, May 22, 2009 at 4:30 PM, Chris Anderson <> wrote:
>>>> On Thu, May 21, 2009 at 8:30 PM, Yuval Kogman <>
>>>> wrote:
>>>> > 2009/5/21 Adam Kocoloski <>:
>>>> >> Hi Yuval, thanks for this well-written proposal.  I don't really
want to
>>>> >> rehash all the discussion from back in February (see the thread
>>>> beginning at
>>>> >>
>>>>  for
>>>> >> a particularly detailed discussion), but I do want to comment on
>>>> aspect.
>>>> >>
>>>> >> Updating the replicator to be smart about atomic bulk transactions
>>>> doable
>>>> >> (although a major undertaking), but when you throw DB compaction
>>>> >> revision stemming into the mix things get really hairy.  Recall
>>>> CouchDB
>>>> >> revisions are used for concurrency control, not for maintaining
>>>> >>  Consider the following sequence of events:
>>>> >>
>>>> >> 1) Generate foo/1 and bar/1 in an atomic _bulk_docs operation
>>>> >> 2) Update foo -> foo/2
>>>> >> Compact the DB (foo/1 is deleted)
>>>> >> Start replicating to a mirror
>>>> >> Replication crashes before it reaches foo/2
>>>> >
>>>> > By crash you mean an error due to a conflict between foo/2 and foo/1'
>>>> > (the mirror's version of foo), right?
>>>> >
>>>> >> In your proposal, we should expect foo/1 to exist on the mirror,
>>>>  I
>>>> >> think this means we'd need to modify the compaction algorithm to
>>>> >> revisions of documents if a) the revision was part of an atomic
>>>> _bulk_docs,
>>>> >> and b) any of the documents in that transaction are still at the
>>>> revision
>>>> >> generated by the transaction.  Same thing goes for revision stemming
>>>> we
>>>> >> can never drop revisions if they were part of an atomic upload and
>>>> least
>>>> >> one of the document revs in the upload is still current.
>>>> >
>>>> > Yep. Personally I see this is a tradeoff, not a limitation per se. If
>>>> > you specify 'atomic' then you must pay more in terms of data size,
>>>> > performance, etc.
>>>> The problem as I see it is that someone else's bulk transaction will
>>>> have to sit around in my database, until I edit all the docs in it.
>>>> Hopefully I won't get any distributed conflicts on other old versions
>>>> of docs in the group because this would put edits that I've done
>>>> locally to other documents in the bulk group, somehow less valid.
>>>> Distributed bulk transactions would make for chaotic behavior, as
>>>> someone's mostly unrelated change on a remote node could eventually
>>>> replicate to me (months later) and knock an entire line of work that
>>>> I've done into a conflict state.
>>>> If you want atomicity, put it in a single document.
>>>> Chris
>>>> >
>>>> > In 0.8 you would have theoretically had to pay by default, but didn't
>>>> > because replication broke transactions.
>>>> >
>>>> > The basic algorithm is still the same, but the garbage collected unit
>>>> > is changed (instead of garbage collecting document revisions it
>>>> > garbage collects revision sets, with the current case being a set with
>>>> > one member. The rules still apply (if this object is wholly shadowed
>>>> > by non conflicting changes then it can be disposed of)). IIRC the
>>>> > algorithm is a copying garbage collector, so this is pretty easy to
>>>> > (you walk a DAG instead of a linked list).
>>>> >
>>>> > Under the proposed model you'd choose which operations are
>>>> > transactional and will have to pay for those.
>>>> >
>>>> >
>>>> > Anwyay, thanks for your link as well, I was reading through a rather
>>>> > boring thread and didn't see this one, so I guess I did miss out. It
>>>> > seemed to imply the discussion was done only on IRC.
>>>> >
>>>> > Anyway, here goes...
>>>> >
>>>> > The fundamental problem is that any consistent data model needs at the
>>>> > very least to have atomic primitives and ordered message passing (with
>>>> > transactional message handlers) at the per-partition level, or
>>>> > atomicity and consistency is restricted to a single document.
>>>> >
>>>> > What concerns me is Damien's post
>>>> > (
>>>> ):
>>>> >
>>>> >> No, CouchDB replication doesn't support replicating the transactions.
>>>> >> Never has, never will. That's more like transaction log replication
>>>> >> that's in traditonal dbs, a different beast.
>>>> >>
>>>> >> For the new bulk transaction model, I'm only proposing supporting
>>>> >> eventual consistency. All changes are safe to disk, but the db may
>>>> >> be in a consistent state right away.
>>>> >
>>>> > From what I know this assumption is wrong. Eventual consistency still
>>>> > needs atomic primitives, it's not about whether or not you have
>>>> > transactions, it's about what data they affect (eventual consistency
>>>> > involves breaking them down).
>>>> >
>>>> > Anyway, "never will" sounds pretty binding, but for the sake of argument:
>>>> >
>>>> > By using only insertions and idempotent updates for the bulk of the
>>>> > data changes and a message queue whose handlers use atomic updates to
>>>> > integrate this data one can implement a truly atomic distributed
>>>> > model, or an eventual consistency, but without this updates need to
>>>> > restricted to exactly one document.
>>>> >
>>>> > Eventual consistency is still possible using either locks or by
>>>> > breaking down what would have been large distributed transactions into
>>>> > smaller ones, but the key is that the code that will make things
>>>> > actually consistent must still have ACID guarantees (and be dispatched
>>>> > in order).
>>>> >
>>>> > The 0.9 model CouchDB is effectively MyISAM without data loss, but
>>>> > just because the data is around doesn't mean it's possible to know
>>>> > what to do with it (loss of context), or even fix it safely (the
>>>> > conflict resolution code is susceptible to conflicts too).
>>>> >
>>>> > Unfortunately for eventual consistency to actually work the breaking
>>>> > down of operations must be done on application level, the database
>>>> > can't decide which data can be deferred and which data cannot.
>>>> >
>>>> > All immutable data and all new data can obviously be added to the
>>>> > database outside of a transaction, but eventually a transaction
>>>> > linking this data must be part of an atomic mutation.
>>>> >
>>>> > The only way to support this without atomic operations on a unit
>>>> > larger than a document, is to have a "master" document for every
>>>> > transitive closure the graph structure requiring consistency, which
>>>> > effect only actually relates to immutable snapshot documents (e.g.
>>>> > where the ID is a hash of the data). If these closures overlap then
>>>> > single "master" for the whole graph will be needed.
>>>> >
>>>> >
>>>> > To illustrate, let's make up a social networking example. Let's say
>>>> > you are adding a friend on this social network, and that this
>>>> > operation involves 3 updates, one to add a link from your profile to
>>>> > your friend's ID, another for the inverse, and a third update to
>>>> > update to send a "hello" message to the friend, updating their inbox.
>>>> > The first update lives in one partition, and the second and third
>>>> > updates are on a second one.
>>>> >
>>>> > The back pointers in your new friends must be updated. In an fully
>>>> > transactional model this would lock the friend's document and yours
>>>> > the same time, in an eventual consistency model this would queue a
>>>> > message for the friend's partition, and a message handler on the
>>>> > friend's partition would update this atomically "eventually". It's
>>>> > fine for the link to be out of date for a while, but eventually it
>>>> > needs to be fixed (e.g. if you want to remove the friend, message
>>>> > them, etc).
>>>> >
>>>> > In couchdb 0.9 one of the writes will get a "conflict" error back, and
>>>> > they could refetch the updated version and try the edit again. The
>>>> > problem is that if the wrote the third update update to another
>>>> > document on the same node making assumptions about the same data, that
>>>> > write may have succeeded, leaving the data inconsistent. Under an
>>>> > eventual consistency model you still use transactions to do these
>>>> > updates, you just must design your model to break them down into
>>>> > smaller units.
>>>> >
>>>> > The reason a graph structure is more susceptible to inconsistency is
>>>> > that while in a relational model many data linkage operations can be
>>>> > done with a single insert/update (e.g. `insert into edges (node1_id,
>>>> > node2_id)`), in a document based database this type of opreation
>>>> > involves modifying all the affected documents. The chance of
>>>> > inconsistency is increased because contention is higher and there is
>>>> > more data that must be synchronized.
>>>> >
>>>> > However, in another post Damien said:
>>>> >
>>>> >> Which is why in general you want to avoid inter-document dependencies,
>>>> >> or be relaxed in how you deal with them.
>>>> >
>>>> > So I think I best shut up after this without some decision maker
>>>> > telling me not to, if my use case is not covered by the intended
>>>> > design then that's that, but I do think this thread sort of covers
>>>> > this:
>>>> >
>>>> >> As far as distributed transactions go, I'd be thrilled if we could
>>>> >> implement it and also support the rest of couchdb, like views and
>>>> >> directional replication. Please start up a discussion here in dev@
>>>> >> about it and see if you can work out a design.
>>>> >
>>>> > Without going too pie-in-the-sky.
>>>> >
>>>> > Cheers,
>>>> > Yuval
>>>> >
>>>> --
>>>> Chris Anderson

Chris Anderson

View raw message