couchdb-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Vivek Pathak <vpat...@orgmeta.com>
Subject Re: CouchDB sharding orchestration
Date Fri, 29 Nov 2013 04:59:48 GMT

Hi

Just curious when you say this is a problem:

    2. Inconsistent views; While transferring documents from one shard to
    another there will be a moment when a document resides on two databases.

Are you talking about the view that shows which document do not belong - 
or you are talking about general views?

If you mean general view, one possibility is to specifically ignore all 
documents in the wrong shard.  This will give you a stale view - but 
that is exactly what you get if you use stale=ok etc in single db case 
as well.

Thanks
Vivek



On 11/28/2013 06:28 PM, Jens Rantil wrote:
> Hey couchers,
>
> For the past week, I've been playing with the thought of implementing a
> sharding orchestrator for CouchDB instances. My use case is that I have a
> lot of data/documents that can't be stored on a single machine and I'd like
> to be able to query the data with fairly low latencies. Oh, there's a big
> write throughput constantly, too.
>
> Querying could be done by rereducing the view results coming in from the
> various shards. AFAIK, this is how BigCouch does it too.
>
> Now, the rebalancing act of all this is pretty straight forward; each shard
> is given a generated view that displays all documents that don't belong
> there in the shard (and thus need to be transferred). The orchestrator will
> then periodically check if any documents should be transferred to another
> server, replicate those documents and purge* them on the source server.
>
> * Purging since I am not interested in keeping them on disk on the source
> database anymore, not even tombstones. All the document data should only
> reside on the new shard.
>
> My issue with the described solution is that there will be a moment when
> views will be showing incorrect information.
>
> As I see it, there are two problems that needs to be solved with this
> solution:
>
>     1. Purging of documents: According to the purge
> documentation<http://docs.couchdb.org/en/latest/api/database/misc.html#post--db-_purge>,
>     purging documents will require rebuilding entire views. That ain't good.
>     2. Inconsistent views; While transferring documents from one shard to
>     another there will be a moment when a document resides on two databases.
>
> 1) could possibly be solved by introducing an replica database of the
> source database so that views could be recreated offline. However, this
> makes replication way more tedious than I initially expected. 2) could be
> solved by simply locking view reads until the inconsistency is fixed. But
> hey, we're not bug fans of that kind of global locking. Possibly patching
> all map-functions in the views would also fix the issue of double
> documents. At the same time, that feels kind of like a hack.
>
> Now, I know BigCouch is around the corner. How does it cope with the above
> issues? Maybe it's introduced some new magic for this that currently is not
> in CouchDB? I've been trying to Google this, but so far haven't found too
> much. Feel free to point me in the right direction if you know of any
> material.
>
> Cheers,
> Jens
>


Mime
View raw message