couchdb-user mailing list archives

From Roger Binns <>
Subject Re: Two Concerns
Date Thu, 31 Dec 2009 00:48:21 GMT

Sean Clark Hess wrote:
> the author
> says that Couch has performance issues because of the json/http layer. Is he
> doing something wrong?

It looks like the wrapper for his programming language did not support the
bulk APIs, and possibly not streaming data either.  Yes, there will be
latency problems going over a network reading/writing items one at a time.
This applies to anything network related (eg SQL servers, web, file serving).
CouchDB provides bulk reading and writing, and streams data down as it is
found, so those problems will only be the result of a less than functional
access library.
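
To put rough numbers on that, here is a minimal sketch in Python.  It only
builds the JSON body for a POST to CouchDB's _bulk_docs endpoint and counts
round trips; no HTTP is sent.  The helper names (bulk_docs_body, round_trips),
the sample documents and the batch size of 500 are all made up for
illustration.

```python
import json
import math

def bulk_docs_body(docs):
    """Build the JSON body for one POST to /<db>/_bulk_docs."""
    return json.dumps({"docs": list(docs)})

def round_trips(doc_count, batch_size=1):
    """Number of HTTP requests needed to write doc_count documents."""
    return math.ceil(doc_count / batch_size)

# Hypothetical sample data: 1000 small documents.
docs = [{"_id": "item-%d" % i, "value": i} for i in range(1000)]

# One request body carrying the first 500 documents.
body = bulk_docs_body(docs[:500])

one_at_a_time = round_trips(len(docs))            # one request per document
batched = round_trips(len(docs), batch_size=500)  # 500 documents per request
```

Each request pays the network latency once, so the batched case pays it far
fewer times for the same data.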

> Second, the replication system seems to be hotly contested. I don't really
> understand how letting my data be inconsistent solves more problems than it
> creates. I would think that data inconsistency would only be acceptable for
> very specific apps.

For data to be consistent across any data storage system where a group of
servers provides the data, there are only two available approaches.  One is
to severely constrain the data (eg it can't reference other data, ids are
generated in some way that will never clash, only new items can be created -
existing ones can't be changed or deleted).  That isn't exactly useful.

The second is that there has to be some sort of locking or serialization
system across all the servers.  For example you can designate one as the
master, require all writes to go to it and have it replicate.  Or you can
have some sort of distributed lock manager.  This significantly affects
performance, and requires rather elaborate design and monitoring.

> If I DO need consistency, will it be easy to replicate/scale horizontally?

You need to be careful in exactly what you mean by consistency.  If for
example you mean that everyone always sees exactly the same view of data and
updates are transactional then you cannot use more than one CouchDB
server(*).  A multi-server solution is hard and expensive.  Oracle will be
happy to sell you one.

>  Or will it require as much or more work as a "normal" master-slave
> environment?

CouchDB has no notion of masters or slaves.  Anyone can replicate with
anyone else at any time.  The underlying structures are specifically
designed for this.

In normal use cases there are no conflicts when dealing with data 99.99% of
the time.  CouchDB optimises for this use case.  You add/modify/delete data
against the most convenient CouchDB instance as you see fit.  Then you
replicate as needed.  In a very small number of cases there will be
conflicts.  CouchDB lets you find those conflicts and address them as
needed.  (No information is lost or overwritten.)  Until you address the
conflicts it uses a heuristic of which version of the document to offer.
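
A sketch of how that can look from the client side, in Python.  Fetching a
document with ?conflicts=true adds a _conflicts list holding the revision ids
that were not picked.  The picker below is only a simplified illustration of
a deterministic choice (higher revision number first, ties broken by comparing
the hash part); it ignores deleted revisions and I am not claiming it matches
CouchDB's code exactly.  The sample document and the function names are
invented for the example.

```python
def rev_key(rev):
    """Split a revision id like '3-abc' into (3, 'abc') for comparison."""
    generation, _, digest = rev.partition("-")
    return (int(generation), digest)

def pick_winner(revs):
    """Pick one revision deterministically, so every instance agrees."""
    return max(revs, key=rev_key)

def conflicting_revs(doc):
    """Revision ids listed as conflicts, or an empty list if none."""
    return doc.get("_conflicts", [])

# Hypothetical document as returned by GET /<db>/post-1?conflicts=true
doc = {
    "_id": "post-1",
    "_rev": "3-bbb",
    "title": "Hello",
    "_conflicts": ["3-aaa", "2-fff"],
}

all_revs = [doc["_rev"]] + conflicting_revs(doc)
winner = pick_winner(all_revs)
```

The point is that the choice depends only on the revision ids, so two
instances holding the same set of revisions offer the same one without
talking to each other.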

You should also be careful in data design for replication.  For example if
you store a blog posting and its comments as a single document then you are
likely to get conflicts when comments are added/changed/deleted from
different instances.  The solution would be to have the post as one document
and each comment as a separate document.  That would be very replication
friendly and you'd only get conflicts if the same post or comment is changed
on different instances concurrently (a rare event and easy to reconcile).
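
The layout described above might look something like this in Python.  The
field names (type, post_id, author, text) and the id scheme are just one
possible convention, not anything CouchDB requires.

```python
import uuid

def make_post(post_id, title, body):
    """The post is one document on its own."""
    return {"_id": post_id, "type": "post", "title": title, "body": body}

def make_comment(post_id, author, text):
    """Each comment is a separate document that points back at its post."""
    return {
        "_id": "comment-" + uuid.uuid4().hex,  # random id, will not clash
        "type": "comment",
        "post_id": post_id,
        "author": author,
        "text": text,
    }

post = make_post("post-1", "Two Concerns", "Example body text")

# Two comments created independently, as if on two different instances.
on_a = make_comment(post["_id"], "alice", "Added on instance A")
on_b = make_comment(post["_id"], "bob", "Added on instance B")

# After replication both sides simply hold all three documents.
merged = {d["_id"]: d for d in (post, on_a, on_b)}
```

Neither comment touches the post document or the other comment, so
replicating in either direction has nothing to conflict on.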

(*) The Lounge project lets you have the appearance of a single server while
talking to multiple backends.  In theory it could have all sorts of hooks to
have redundant backends, monitoring, replication triggers etc.  In other
words, it would re-invent all the distributed locking and similar stuff
you'd get in a clustered database.
