incubator-couchdb-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jens Rantil <>
Subject Re: CouchDB sharding orchestration
Date Mon, 02 Dec 2013 23:47:20 GMT
Hi Vivek,

I was considering using the Go programming language; It is easy to deploy, highly concurrent,
fairly easy to learn, has great support for HTTP and has a fairly good JavaScript engine coming
up[1]. Also, I've lately looked at goraft[2], which could be useful for a high-availability
implementation of this. Are you familiar with Go?


So far I haven't written a single line of code, but since there is interest I just set up
a Github repos[1]. I named it "cushions" in lack of a better name (taken?). Feel free to send
me your Github username if you have one.

As of API I'm thinking view querying and document API being identical to CouchDB. The sharding
API will be HTTP/ReST based just like CouchDB under /_shards/ or similar.


On 2 dec 2013, at 02:58, Vivek Pathak <> wrote:

> Hi Jens,
> Quick question,  what is the language/environment you planned to use for the feature.
> On some thinking, this looks like something which might be a useful to me.  If your work
is public domain, and if I have the skills on your environment, I would like to contribute.
> Thanks
> Vivek
> On 11/28/13, 11:59 PM, Vivek Pathak wrote:
>> Hi
>> Just curious when you say this is a problem:
>>   2. Inconsistent views; While transferring documents from one shard to
>>   another there will be a moment when a document resides on two databases.
>> Are you talking about the view that shows which document do not belong - or you are
talking about general views?
>> If you mean general view, one possibility is to specifically ignore all documents
in the wrong shard.  This will give you a stale view - but that is exactly what you get if
you use stale=ok etc in single db case as well.
>> Thanks
>> Vivek
>> On 11/28/2013 06:28 PM, Jens Rantil wrote:
>>> Hey couchers,
>>> For the past week, I've been playing with the thought of implementing a
>>> sharding orchestrator for CouchDB instances. My use case is that I have a
>>> lot of data/documents that can't be stored on a single machine and I'd like
>>> to be able to query the data with fairly low latencies. Oh, there's a big
>>> write throughput constantly, too.
>>> Querying could be done by rereducing the view results coming in from the
>>> various shards. AFAIK, this is how BigCouch does it too.
>>> Now, the rebalancing act of all this is pretty straight forward; each shard
>>> is given a generated view that displays all documents that don't belong
>>> there in the shard (and thus need to be transferred). The orchestrator will
>>> then periodically check if any documents should be transferred to another
>>> server, replicate those documents and purge* them on the source server.
>>> * Purging since I am not interested in keeping them on disk on the source
>>> database anymore, not even tombstones. All the document data should only
>>> reside on the new shard.
>>> My issue with the described solution is that there will be a moment when
>>> views will be showing incorrect information.
>>> As I see it, there are two problems that needs to be solved with this
>>> solution:
>>>    1. Purging of documents: According to the purge
>>> documentation<>,

>>>    purging documents will require rebuilding entire views. That ain't good.
>>>    2. Inconsistent views; While transferring documents from one shard to
>>>    another there will be a moment when a document resides on two databases.
>>> 1) could possibly be solved by introducing an replica database of the
>>> source database so that views could be recreated offline. However, this
>>> makes replication way more tedious than I initially expected. 2) could be
>>> solved by simply locking view reads until the inconsistency is fixed. But
>>> hey, we're not bug fans of that kind of global locking. Possibly patching
>>> all map-functions in the views would also fix the issue of double
>>> documents. At the same time, that feels kind of like a hack.
>>> Now, I know BigCouch is around the corner. How does it cope with the above
>>> issues? Maybe it's introduced some new magic for this that currently is not
>>> in CouchDB? I've been trying to Google this, but so far haven't found too
>>> much. Feel free to point me in the right direction if you know of any
>>> material.
>>> Cheers,
>>> Jens

View raw message