couchdb-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Apache Wiki <>
Subject [Couchdb Wiki] Update of "Replications_and_conflicts" by BrianCandler
Date Fri, 30 Oct 2009 13:05:57 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Couchdb Wiki" for change notification.

The "Replications_and_conflicts" page has been changed by BrianCandler.
The comment on this change is: Initial version.


New page:
= Replication and conflict model =

Let's take the following example to illustrate replication and conflict

 * Alice has a document containing Bob's business card
 * She synchronizes it between her desktop PC and her laptop
 * On the desktop PC, she updates Bob's E-mail address. Without
 syncing again, she updates Bob's mobile number on the laptop.
 * Then she replicates the two to each other again

The question is, what happens to these conflicting updated documents?

== CouchDB replication ==

CouchDB works with JSON documents inside databases.  Replication of
databases takes place over HTTP, and can be either a "pull" or a "push", but
is unidirectional.  So the easiest way to perform a full sync is to do a
"push" followed by a "pull" (or vice versa).

So, Alice creates v1 and sync it. She updates to v2a on one side and v2b on
the other, and then replicates. What happens?

The answer is simple: ''both'' versions exist on both sides!

     DESKTOP                          LAPTOP
   | /db/bob |                                     INITIAL
   |   v1    |                                     CREATION

   +---------+                      +---------+
   | /db/bob |  ----------------->  | /db/bob |     PUSH
   |   v1    |                      |   v1    |
   +---------+                      +---------+

   +---------+                      +---------+  INDEPENDENT
   | /db/bob |                      | /db/bob |     LOCAL
   |   v2a   |                      |   v2b   |     EDITS
   +---------+                      +---------+

   +---------+                      +---------+
   | /db/bob |  ----------------->  | /db/bob |     PUSH
   |   v2a   |                      |   v2a   |
   +---------+                      |   v2b   |

   +---------+                      +---------+
   | /db/bob |  <-----------------  | /db/bob |     PULL
   |   v2a   |                      |   v2a   |
   |   v2b   |                      |   v2b   |
   +---------+                      +---------+

After all, this is not a filesystem, so there's no restriction that only one
document can exist with the name /db/bob. These are just "conflicting"
revisions under the same name.

Because the changes are always replicated, the data is safe. Both machines
have identical copies of both documents, so failure of a hard drive on
either side won't lose any of the changes.

Another thing to notice is that peers do not have to be configured or
tracked.  You can do regular replications to peers, or you can do one-off,
ad-hoc pushes or pulls.  After the replication has taken place, there is no
record kept of which peer any particular document or revision came from.

So the question now is: what happens when you try to read /db/bob? By
default, CouchDB picks one arbitrary revision as the "winner", using a
deterministic algorithm so that the same choice will be made on all peers.
The same happens with views: the deterministically-chosen winner is the only
revision fed into your map function.

Let's say that the winner is v2a. On the desktop, if Alice reads the
document she'll see v2a, which is what she saved there. But on the laptop,
after replication, she'll also see only v2a. It could look as if the changes
she made there have been lost - but of course they have not, they have just
been hidden away as a conflicting revision. But eventually she'll need these
changes merged into Bob's business card, otherwise they ''will'' effectively
have been lost.

Any sensible business-card application will, at minimum, have to present the
conflicting versions to Alice and allow her to create a new version
incorporating information from them all. Ideally it would merge the updates

== Conflict avoidance ==

When working on a single node, CouchDB will avoid creating conflicting
revisions by returning a 409 HTTP error.  This is because, when you PUT a
new version of a document, you must give the _rev of the previous version. 
If that _rev has already been superceded, the update is rejected with a 409.

So imagine two users on the same node are fetching Bob's business card,
updating it concurrently, and writing it back:

USER1    ----------->  GET /db/bob
         <-----------  {"_rev":"1-aaa", ...}

USER2    ----------->  GET /db/bob
         <-----------  {"_rev":"1-aaa", ...}

USER1    ----------->  PUT /db/bob?rev=1-aaa
         <-----------  {"_rev":"2-bbb", ...}

USER2    ----------->  PUT /db/bob?rev=1-aaa
         <-----------  409 Conflict  (not saved)

User2's changes are rejected, so it's up to the app to fetch /db/bob again,
and either:
 * apply the same changes as were applied to the earlier revision, and
 submit a new PUT
 * redisplay the document so the user has to edit it again
 * just overwrite it with the document being saved before (which is not
 advisable, as user1's changes will be silently lost)

So when working in this mode, your application still has to be able to
handle these conflicts and have a suitable retry strategy, but these
conflicts never end up inside the database itself.

== Conflicts in batches ==

There are two different ways that conflicts can end up in the database:
 1. Conflicting changes made on different databases, which are replicated
 to each other, as shown earlier.
 2. Changes are written to the database using _bulk_docs and all_or_nothing,
 which bypasses the 409 mechanism.

The _bulk_docs API lets you submit multiple updates (and/or deletes) in a
single HTTP POST.  Normally, these are treated as independent updates; some
in the batch may fail because the _rev is stale (just like a 409 from a PUT)
whilst others are written successfully.  The response from _bulk_docs lists
the success/fail separately for each document in the batch.

However there is another mode of working, whereby you specify
`{"all_or_nothing":true}` as part of the request.  This is CouchDB's nearest
equivalent of a "transaction", but it's not the same as a database
transaction which can fail and roll back.  Rather, it means that ''all'' of
the changes in the request will be forcibly applied to the database, even if
that introduces conflicts.  The only guarantee you are given is that they
will either all be applied to the database, or none of them (e.g.  if the
power is pulled out before the update is finished writing to disk). They
also replicate together as an atomic unit.

So this gives you a way to introduce conflicts within a single database
instance.  If you choose to do this instead of PUT, it means you don't have
to write any code for the possibility of getting a 409 response, because you
will never get one.  Rather, you have to deal with conflicts appearing later
in the database, which is what you'd have to do in a multi-master
application anyway.

POST /db/_bulk_docs
  "all_or_nothing": true,
  "docs": [
    {"_id":"x", "_rev":"1-xxx", ...},
    {"_id":"y", "_rev":"1-yyy", ...},

= Working with conflicting documents =

== HTTP API ==

The basic `GET /db/bob` operation will not show you any information about
conflicts. You see only the deterministically-chosen winner, and get no
indication as to whether other conflicting revisions exist or not.

If you do `GET /db/bob?conflicts=true`, and the document is in a conflict
state, then you will get the winner plus a _conflicts member containing an
array of the revs of the other, conflicting revision(s). You can then fetch
them individually using subsequent `GET /db/bob?rev=xxxx` operations.

As far as I can tell, from the list of _conflicts you cannot fetch all those
versions in one go with a multi-document fetch (_all_docs).  They have to be
individual GETs.

Your application can then choose to display them all to the user. Or it
could attempt to merge them, write back the merged version, and delete the
conflicting versions - that is, "resolve" the conflict.

If you are merging multiple conflicts into a single version, you need to
delete all the conflicting revisions explicitly.  However a single
_bulk_docs update can write the new version and simultaneously delete all
the other ones.

== View API ==

Views also only get the winning revision of a document. However they do also
get a _conflicts member if there are any conflicting revisions.  This means
you can write a view whose job is specifically to locate documents with

If you do this, you can have a separate "sweep" proces which periodically
scans your database, looks for documents which have conflicts, fetches
the conflicting revisions, and resolves them.

Whilst this keeps the main application simple, the problem with this
approach is that there will be a window between a conflict being introduced
and it being resolved. From a user's viewpoint, this may appear that
the document they just saved successfully may suddenly lose their changes,
only to be resurrected some time later. This may or may not be acceptable.

Also, it's easy to forget to start the sweeper, or not to implement it
properly, and this will introduce odd behaviour which will be hard to track

== Suggested code for fetch with conflict resolution ==

  1. GET docid?conflicts=true
  2. For each member in the _conflicts array:
       GET docid?rev=xxx
     If any errors occur at this stage, restart from step 1.
     (There could be a race where someone else has already resolved this
     conflict and deleted that rev)
  3. Perform application-specific merging
  4. Write _bulk_docs with an update to the first rev and deletes of
     the other revs.

This could either be done on every read (in which case you could replace all
calls to GET in your application with calls to a library which does the
above), or as part of your sweeper code.

And here is an example of this in Ruby using the low-level RestClient.

require 'rubygems'
require 'restclient'
require 'json'

# Write multiple documents as all_or_nothing, can introduce conflicts
def writem(docs)
  JSON.parse("#{DB}/_bulk_docs", {
    "all_or_nothing" => true,
    "docs" => docs,

# Write one document, return the rev
def write1(doc, id=nil, rev=nil)
  doc['_id'] = id if id
  doc['_rev'] = rev if rev

# Read a document, return *all* revs
def read1(id)
  retries = 0
  loop do
    # FIXME: escape id
    res = [JSON.parse(RestClient.get("#{DB}/#{id}?conflicts=true"))]
    if revs = res.first.delete('_conflicts')
        revs.each do |rev|
          res << JSON.parse(RestClient.get("#{DB}/#{id}?rev=#{rev}"))
        retries += 1
        raise if retries >= 5
    return res

# Create DB
RestClient.delete DB rescue nil
RestClient.put DB, {}.to_json

# Write a document
rev1 = write1({"hello"=>"xxx"},"test")
p read1("test")

# Make three conflicting versions

res = read1("test")
p res

# Now let's replace these three with one
res.first['hello'] = "foo+bar+baz"
res.each_with_index do |r,i|
  unless i == 0
    r.replace({'_id'=>r['_id'], '_rev'=>r['_rev'], '_deleted'=>true})

p read1("test")

An application written this way never has to deal with a PUT 409, and is
automatically multi-master capable.

You can see that it's straightforward enough when you know what you're
doing.  It's just that CouchDB doesn't currently provide a convenient HTTP
API for "fetch all conflicting revisions", nor "PUT to supercede these N
revisions", so you need to wrap these yourself.  I also don't know of any
client-side libraries which provide support for this.

== Merging and revision history ==

Actually performing the merge is an application-specific function. It
depends on the structure of your data. Sometimes it will be easy: e.g. if a
document contains a list which is only ever appended to, then you can
perform a union of the two list versions.

Some merge strategies look at the changes made to an object, compared to its
previous version.  This is how git's merge function works.  For example, to
merge Bob's business card versions v2a and v2b, you could look at the
differences between v1 and v2b, and then apply these changes to v2a as well.

With CouchDB, you can sometimes get hold of old revisions of a document. 
For example, if you fetch `/db/bob?rev=v2b&revs_info=true` you'll get a list
of the previous revision ids which ended up with revision v2b.  Doing the
same for v2a you can find their common ancestor revision.  However if the
database has been compacted, the content of that document revision will have
been lost.  revs_info will still show that v1 was an ancestor, but report it
as "missing".


     ,-> v2a                     v2a
     `-> v2b                     v2b

So if you want to work with diffs, the recommended way is to store those
diffs within the new revision itself.  That is: when you replace v1 with
v2a, include an extra field or attachment in v2a which says which fields
were changed from v1 to v2a.  This unfortunately does mean additional
book-keeping for your application.

= Comparison with other replicating data stores =

The same issues arise with other replicating systems, so it can be
instructive to look at these and see how they compare with CouchDB. Please
feel free to add other examples.

== Unison ==

[ Unison] is a bi-directional file
synchronisation tool. In this case, the business card would be a file,
say bob.vcf.

When you run unison, changes propagate both ways. If a file has changed on
one side but not the other, the new replaces the old. (Unison maintains a
local state file so that it knows whether a file has changed since the last
successful replication).

In our example it has changed on both sides. Only one file called "bob.vcf"
can exist within the filesystem.  Unison solves the problem by simply
ducking out: the user can choose to replace the remote version with the
local version, or vice versa (both of which would lose data), but the
default action is to leave both sides unchanged.

>>From Alice's point of view, at least this is a simple solution. Whenever
she's on the desktop she'll see the version she last edited on the desktop,
and whenever she's on the laptop she'll see the version she last edited

But because no replication has actually taken place, the data is not
protected. If her laptop hard drive dies, she'll lose all her changes made
on the laptop; ditto if her desktop hard drive dies.

It's up to her to copy across one of the versions manually (under a
different filename), merge the two, and then finally push the merged version
to the other side.

Note also that the original file (version v1) has been lost by this point.
So it's not going to be known from inspection alone which of v2a and v2b has
the most up-to-date E-mail address for Bob, and which has the most
up-to-date mobile number.  Alice has to remember which she entered last.

== Git ==

[ Git] is a well-known distributed source control system.
Like unison, git deals with files. However, git considers the state of a
whole set of files as a single object, the "tree". Whenever you save an
update, you create a "commit" which points to both the updated tree and the
previous commit(s), which in turn point to the previous tree(s). You
therefore have a full history of all the states of the files. This forms a
branch, and a pointer is kept to the tip of the branch, from which you can
work backwards to any previous state. The "pointer" is actually an SHA1 hash
of the tip commit.

If you are replicating with one or more peers, a separate branch is made for
each of the peers. For example, you might have
    master               -- my local branch
    remotes/foo/master   -- branch on peer 'foo'
    remotes/bar/master   -- branch on peer 'bar'

In the normal way of working, replication is a "pull", importing changes
from a remote peer into the local repository.  A "pull" does two things:
first "fetch" the state of the peer into the remote tracking branch for that
peer; and then attempt to "merge" those changes into the local branch.

Now let's consider the business card. Alice has created a git repo
containing bob.vcf, and cloned it across to the other machine. The
branches look like this, where AAAAAAAA is the SHA1 of the commit.

  ---------- desktop ----------           ---------- laptop ----------
  master: AAAAAAAA                        master: AAAAAAAA
  remotes/laptop/master: AAAAAAAA         remotes/desktop/master: AAAAAAAA

Now she makes a change on the desktop, and commits it into the desktop repo;
then she makes a different change on the laptop, and commits it into the
laptop repo.

  ---------- desktop ----------           ---------- laptop ----------
  master: BBBBBBBB                        master: CCCCCCCC
  remotes/laptop/master: AAAAAAAA         remotes/desktop/master: AAAAAAAA

Now on the desktop she does "git pull laptop". Firstly, the remote objects
are copied across into the local repo and the remote tracking branch is

  ---------- desktop ----------           ---------- laptop ----------
  master: BBBBBBBB                        master: CCCCCCCC
  remotes/laptop/master: CCCCCCCC         remotes/desktop/master: AAAAAAAA

  (note: repo still contains AAAAAAAA
  because commits BBBBBBBB and
  CCCCCCCC point to it)

Then git will attempt to merge the changes in. It can do this because it
knows the parent commit to CCCCCCCC is AAAAAAAA, so it takes a diff between
AAAAAAAA and CCCCCCCC and tries to apply it to BBBBBBBB. If this is
successful, then you'll get a new version with a merge commit.

  ---------- desktop ----------           ---------- laptop ----------
  master: DDDDDDDD                        master: CCCCCCCC
  remotes/laptop/master: CCCCCCCC         remotes/desktop/master: AAAAAAAA

Then Alice has to logon to the laptop and run "git pull desktop". A similar
process occurs. The remote tracking branch is updated:

  ---------- desktop ----------           ---------- laptop ----------
  master: DDDDDDDD                        master: CCCCCCCC
  remotes/laptop/master: CCCCCCCC         remotes/desktop/master: DDDDDDDD

then a merge takes place. This is a special-case: CCCCCCCC one of the parent
commits of DDDDDDDD, so the laptop can "fast forward" update from CCCCCCCC
to DDDDDDDD directly without having to do any complex merging. This leaves
the final state as:

  ---------- desktop ----------           ---------- laptop ----------
  master: DDDDDDDD                        master: DDDDDDDD
  remotes/laptop/master: CCCCCCCC         remotes/desktop/master: DDDDDDDD

Now this is all and good, but you may wonder how this is relevant when
thinking about couchdb.

Firstly, note what happens in the case when the merge algorithm fails. The
changes ''are'' still propagated from the remote repo into the local one,
and are available in the remote tracking branch; so unlike unison, you know
the data is protected. It's just that the local working copy may fail to
update, or may diverge from the remote version. It's up to you to create and
commit the combined version yourself, but you are guaranteed to have all the
history you might need to do this.

Note that whilst it's possible to build new merge algorithms into Git, the
standard ones are focussed on line-based changes to source code.  They don't
work well for XML or JSON if it's presented without any line breaks.

The other interesting consideration is multiple peers. In this case you have
multiple remote tracking branches, some of which may match your local
branch, some of which may be behind you, and some of which may be ahead of
you (i.e. contain changes that you haven't yet merged).

  master: AAAAAAAA
  remotes/foo/master: BBBBBBBB
  remotes/bar/master: CCCCCCCC
  remotes/baz/master: AAAAAAAA

Note that each peer is explicitly tracked, and therefore has to be
explicitly created.  If a peer becomes stale or is no longer needed, it's up
to you to remove it from your configuration and delete the remote tracking
branch.  This is different to couchdb, which doesn't keep any peer state in
the database.

== Amazon Dynamo ==

[ Dynamo]
is designed as an "always writeable" key-value store, so it has no equivalent to the
409 conflict avoidance. It encourages users to perform conflict resolution
on read: a read operation provides all the conflicting versions plus a
"context". The write operation supplies the same context and this
supercedes all the previous revisions in one go.

View raw message