couchdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Davis <>
Subject Re: reiterating transactions vs. replication
Date Mon, 25 May 2009 06:04:53 GMT
On Mon, May 25, 2009 at 1:44 AM, Scott Shumaker <> wrote:
> Chris - you said:
> "Distributed bulk transactions would make for chaotic behavior, as
> someone's mostly unrelated change on a remote node could eventually
> replicate to me (months later) and knock an entire line of work that
> I've done into a conflict state."
> Months later?  If you're really out of sync with a remote node for
> several months, you should expect to get lots of conflicts if people
> are editing your shared documents, with or without distributed bulk
> transactions.

I'm still mulling over Chris's concerns. My gut-brain tells me there's
a way around this but the idea hasn't floated up to the same brain
that controls actual thoughts.

> And if you really have a system designed to be offline for several
> months, you're solving very different problems (albeit interesting
> ones) than traditional data 'replication' as most people think about
> it (when one entity controls the servers, even if they are
> geographically distributed) - and getting into DVCS-like territory.

Heh. I think even in DVCS territory there'd still probably be a murder
if a dev didn't push for a couple months. Which is part of the idea
floating around. Ie, perhaps having a method for rejecting replication
that is too far out of whack (Think of git's "You can't push because
its not a fast forward") (This is all extreme hand waving so no one
take that too seriously).

> If that's the case, it does explain why there are some decisions in
> CouchDB I find strange, like the fact that authentication runs during
> replication.  You're using replication for a completely different
> purpose than I need it for - I need it for redundancy and read
> scaling, you're using it to synchronize disparate data sources.  Very
> different problems that call for very different solutions.

CouchDB has always had a focus on being able to be run in a completely
decentralized manner. As such there are features in CouchDB that
support this model. That said, it's also a goal of supporting as wide
a range of models as possible that fit into the general scheme of
things. For instance, if replication validation is something you want
to have configurable then you can create a ticket and dev@ thread
discussing the issue. I don't detect any issues in having that
disable-able but we should discuss that on a dedicated thread for
future-me's sanity.

Paul Davis

> On Sun, May 24, 2009 at 10:31 PM, Scott Shumaker <> wrote:
>> Inter-document dependencies come up pretty quickly when you start
>> trying to represent complex data structures in CouchDB.  There are
>> still a few cases we've encountered where there isn't a great way to
>> avoid needing transactions.  A few examples:
>> 1)
>> 'A' maintains an ordered list of 'B' elements, where the order is
>> user-defined - imagine allowing a user to re-order photos in a
>> slideshow.  You want to store the photos separately from the
>> slideshow, because they can be used in multiple slideshows.  Whenever
>> you add or remove a photo, you need to update the order list as well.
>> I've seen some people suggest some sort of gross solution where you
>> try to store floating point order id's inside the B elements and
>> change that to wedge an item in between another items (averaging the
>> two new siblings' orders), but this is incredibly brittle and breaks
>> down with enough re-ordering.
>> Another unpleasant approach is to create separate 'order objects' in
>> couchdb (representing the order of an item within a folder), storing
>> an internal version number (timestamp) inside the 'order object' - so
>> you never change the order node, you just create a new node.  Then,
>> you only use use the 'latest' version of this order node (either on
>> the client side or with a reduce).  To make this work, you need to
>> ensure that your 'internal version numbers' are monotonically
>> increasing.  This isn't a problem for some applications, and can be
>> solved in general with a specialized 'number server'.
>> 2)
>> Representing graph/linked-list datastructures.
>> If you delete a node from a linked list, you need to update two nodes
>> - the previous node and the node itself.  You can try the second
>> suggestion in the previous item to make this work (effectively, store
>> the link relationship as separate objects and generate new link
>> objects with incrementing version numbers)
>> I'm sure there are other cases - these two just have been a thorn in
>> our side.  But for a lot of non-trivial data applications,
>> transactions still end up being very useful.
>> On Fri, May 22, 2009 at 2:45 PM, Nathan Stott <> wrote:
>>> As a user, when I chose couchdb for my most recent project, I chose it
>>> because I didn't care about transactions.  I would've used RDBMS if that
>>> were important.
>>> I chose it because couch solved the problems I needed solved very well.
>>> I don't think transactions should be a big dev focus.
>>> On Fri, May 22, 2009 at 4:30 PM, Chris Anderson <> wrote:
>>>> On Thu, May 21, 2009 at 8:30 PM, Yuval Kogman <>
>>>> wrote:
>>>> > 2009/5/21 Adam Kocoloski <>:
>>>> >> Hi Yuval, thanks for this well-written proposal.  I don't really
want to
>>>> >> rehash all the discussion from back in February (see the thread
>>>> beginning at
>>>> >>
>>>>  for
>>>> >> a particularly detailed discussion), but I do want to comment on
>>>> aspect.
>>>> >>
>>>> >> Updating the replicator to be smart about atomic bulk transactions
>>>> doable
>>>> >> (although a major undertaking), but when you throw DB compaction
>>>> >> revision stemming into the mix things get really hairy.  Recall
>>>> CouchDB
>>>> >> revisions are used for concurrency control, not for maintaining
>>>> >>  Consider the following sequence of events:
>>>> >>
>>>> >> 1) Generate foo/1 and bar/1 in an atomic _bulk_docs operation
>>>> >> 2) Update foo -> foo/2
>>>> >> Compact the DB (foo/1 is deleted)
>>>> >> Start replicating to a mirror
>>>> >> Replication crashes before it reaches foo/2
>>>> >
>>>> > By crash you mean an error due to a conflict between foo/2 and foo/1'
>>>> > (the mirror's version of foo), right?
>>>> >
>>>> >> In your proposal, we should expect foo/1 to exist on the mirror,
>>>>  I
>>>> >> think this means we'd need to modify the compaction algorithm to
>>>> >> revisions of documents if a) the revision was part of an atomic
>>>> _bulk_docs,
>>>> >> and b) any of the documents in that transaction are still at the
>>>> revision
>>>> >> generated by the transaction.  Same thing goes for revision stemming
>>>> we
>>>> >> can never drop revisions if they were part of an atomic upload and
>>>> least
>>>> >> one of the document revs in the upload is still current.
>>>> >
>>>> > Yep. Personally I see this is a tradeoff, not a limitation per se. If
>>>> > you specify 'atomic' then you must pay more in terms of data size,
>>>> > performance, etc.
>>>> The problem as I see it is that someone else's bulk transaction will
>>>> have to sit around in my database, until I edit all the docs in it.
>>>> Hopefully I won't get any distributed conflicts on other old versions
>>>> of docs in the group because this would put edits that I've done
>>>> locally to other documents in the bulk group, somehow less valid.
>>>> Distributed bulk transactions would make for chaotic behavior, as
>>>> someone's mostly unrelated change on a remote node could eventually
>>>> replicate to me (months later) and knock an entire line of work that
>>>> I've done into a conflict state.
>>>> If you want atomicity, put it in a single document.
>>>> Chris
>>>> >
>>>> > In 0.8 you would have theoretically had to pay by default, but didn't
>>>> > because replication broke transactions.
>>>> >
>>>> > The basic algorithm is still the same, but the garbage collected unit
>>>> > is changed (instead of garbage collecting document revisions it
>>>> > garbage collects revision sets, with the current case being a set with
>>>> > one member. The rules still apply (if this object is wholly shadowed
>>>> > by non conflicting changes then it can be disposed of)). IIRC the
>>>> > algorithm is a copying garbage collector, so this is pretty easy to
>>>> > (you walk a DAG instead of a linked list).
>>>> >
>>>> > Under the proposed model you'd choose which operations are
>>>> > transactional and will have to pay for those.
>>>> >
>>>> >
>>>> > Anwyay, thanks for your link as well, I was reading through a rather
>>>> > boring thread and didn't see this one, so I guess I did miss out. It
>>>> > seemed to imply the discussion was done only on IRC.
>>>> >
>>>> > Anyway, here goes...
>>>> >
>>>> > The fundamental problem is that any consistent data model needs at the
>>>> > very least to have atomic primitives and ordered message passing (with
>>>> > transactional message handlers) at the per-partition level, or
>>>> > atomicity and consistency is restricted to a single document.
>>>> >
>>>> > What concerns me is Damien's post
>>>> > (
>>>> ):
>>>> >
>>>> >> No, CouchDB replication doesn't support replicating the transactions.
>>>> >> Never has, never will. That's more like transaction log replication
>>>> >> that's in traditonal dbs, a different beast.
>>>> >>
>>>> >> For the new bulk transaction model, I'm only proposing supporting
>>>> >> eventual consistency. All changes are safe to disk, but the db may
>>>> >> be in a consistent state right away.
>>>> >
>>>> > From what I know this assumption is wrong. Eventual consistency still
>>>> > needs atomic primitives, it's not about whether or not you have
>>>> > transactions, it's about what data they affect (eventual consistency
>>>> > involves breaking them down).
>>>> >
>>>> > Anyway, "never will" sounds pretty binding, but for the sake of argument:
>>>> >
>>>> > By using only insertions and idempotent updates for the bulk of the
>>>> > data changes and a message queue whose handlers use atomic updates to
>>>> > integrate this data one can implement a truly atomic distributed
>>>> > model, or an eventual consistency, but without this updates need to
>>>> > restricted to exactly one document.
>>>> >
>>>> > Eventual consistency is still possible using either locks or by
>>>> > breaking down what would have been large distributed transactions into
>>>> > smaller ones, but the key is that the code that will make things
>>>> > actually consistent must still have ACID guarantees (and be dispatched
>>>> > in order).
>>>> >
>>>> > The 0.9 model CouchDB is effectively MyISAM without data loss, but
>>>> > just because the data is around doesn't mean it's possible to know
>>>> > what to do with it (loss of context), or even fix it safely (the
>>>> > conflict resolution code is susceptible to conflicts too).
>>>> >
>>>> > Unfortunately for eventual consistency to actually work the breaking
>>>> > down of operations must be done on application level, the database
>>>> > can't decide which data can be deferred and which data cannot.
>>>> >
>>>> > All immutable data and all new data can obviously be added to the
>>>> > database outside of a transaction, but eventually a transaction
>>>> > linking this data must be part of an atomic mutation.
>>>> >
>>>> > The only way to support this without atomic operations on a unit
>>>> > larger than a document, is to have a "master" document for every
>>>> > transitive closure the graph structure requiring consistency, which
>>>> > effect only actually relates to immutable snapshot documents (e.g.
>>>> > where the ID is a hash of the data). If these closures overlap then
>>>> > single "master" for the whole graph will be needed.
>>>> >
>>>> >
>>>> > To illustrate, let's make up a social networking example. Let's say
>>>> > you are adding a friend on this social network, and that this
>>>> > operation involves 3 updates, one to add a link from your profile to
>>>> > your friend's ID, another for the inverse, and a third update to
>>>> > update to send a "hello" message to the friend, updating their inbox.
>>>> > The first update lives in one partition, and the second and third
>>>> > updates are on a second one.
>>>> >
>>>> > The back pointers in your new friends must be updated. In an fully
>>>> > transactional model this would lock the friend's document and yours
>>>> > the same time, in an eventual consistency model this would queue a
>>>> > message for the friend's partition, and a message handler on the
>>>> > friend's partition would update this atomically "eventually". It's
>>>> > fine for the link to be out of date for a while, but eventually it
>>>> > needs to be fixed (e.g. if you want to remove the friend, message
>>>> > them, etc).
>>>> >
>>>> > In couchdb 0.9 one of the writes will get a "conflict" error back, and
>>>> > they could refetch the updated version and try the edit again. The
>>>> > problem is that if the wrote the third update update to another
>>>> > document on the same node making assumptions about the same data, that
>>>> > write may have succeeded, leaving the data inconsistent. Under an
>>>> > eventual consistency model you still use transactions to do these
>>>> > updates, you just must design your model to break them down into
>>>> > smaller units.
>>>> >
>>>> > The reason a graph structure is more susceptible to inconsistency is
>>>> > that while in a relational model many data linkage operations can be
>>>> > done with a single insert/update (e.g. `insert into edges (node1_id,
>>>> > node2_id)`), in a document based database this type of opreation
>>>> > involves modifying all the affected documents. The chance of
>>>> > inconsistency is increased because contention is higher and there is
>>>> > more data that must be synchronized.
>>>> >
>>>> > However, in another post Damien said:
>>>> >
>>>> >> Which is why in general you want to avoid inter-document dependencies,
>>>> >> or be relaxed in how you deal with them.
>>>> >
>>>> > So I think I best shut up after this without some decision maker
>>>> > telling me not to, if my use case is not covered by the intended
>>>> > design then that's that, but I do think this thread sort of covers
>>>> > this:
>>>> >
>>>> >> As far as distributed transactions go, I'd be thrilled if we could
>>>> >> implement it and also support the rest of couchdb, like views and
>>>> >> directional replication. Please start up a discussion here in dev@
>>>> >> about it and see if you can work out a design.
>>>> >
>>>> > Without going too pie-in-the-sky.
>>>> >
>>>> > Cheers,
>>>> > Yuval
>>>> >
>>>> --
>>>> Chris Anderson

View raw message