couchdb-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Vivek Pathak <vpat...@orgmeta.com>
Subject Re: CouchDB sharding orchestration
Date Mon, 02 Dec 2013 01:58:13 GMT
Hi Jens,

Quick question,  what is the language/environment you planned to use for 
the feature.

On some thinking, this looks like something which might be a useful to 
me.  If your work is public domain, and if I have the skills on your 
environment, I would like to contribute.

Thanks
Vivek


On 11/28/13, 11:59 PM, Vivek Pathak wrote:
>
> Hi
>
> Just curious when you say this is a problem:
>
>    2. Inconsistent views; While transferring documents from one shard to
>    another there will be a moment when a document resides on two 
> databases.
>
> Are you talking about the view that shows which document do not belong 
> - or you are talking about general views?
>
> If you mean general view, one possibility is to specifically ignore 
> all documents in the wrong shard.  This will give you a stale view - 
> but that is exactly what you get if you use stale=ok etc in single db 
> case as well.
>
> Thanks
> Vivek
>
>
>
> On 11/28/2013 06:28 PM, Jens Rantil wrote:
>> Hey couchers,
>>
>> For the past week, I've been playing with the thought of implementing a
>> sharding orchestrator for CouchDB instances. My use case is that I 
>> have a
>> lot of data/documents that can't be stored on a single machine and 
>> I'd like
>> to be able to query the data with fairly low latencies. Oh, there's a 
>> big
>> write throughput constantly, too.
>>
>> Querying could be done by rereducing the view results coming in from the
>> various shards. AFAIK, this is how BigCouch does it too.
>>
>> Now, the rebalancing act of all this is pretty straight forward; each 
>> shard
>> is given a generated view that displays all documents that don't belong
>> there in the shard (and thus need to be transferred). The 
>> orchestrator will
>> then periodically check if any documents should be transferred to 
>> another
>> server, replicate those documents and purge* them on the source server.
>>
>> * Purging since I am not interested in keeping them on disk on the 
>> source
>> database anymore, not even tombstones. All the document data should only
>> reside on the new shard.
>>
>> My issue with the described solution is that there will be a moment when
>> views will be showing incorrect information.
>>
>> As I see it, there are two problems that needs to be solved with this
>> solution:
>>
>>     1. Purging of documents: According to the purge
>> documentation<http://docs.couchdb.org/en/latest/api/database/misc.html#post--db-_purge>,

>>
>>     purging documents will require rebuilding entire views. That 
>> ain't good.
>>     2. Inconsistent views; While transferring documents from one 
>> shard to
>>     another there will be a moment when a document resides on two 
>> databases.
>>
>> 1) could possibly be solved by introducing an replica database of the
>> source database so that views could be recreated offline. However, this
>> makes replication way more tedious than I initially expected. 2) 
>> could be
>> solved by simply locking view reads until the inconsistency is fixed. 
>> But
>> hey, we're not bug fans of that kind of global locking. Possibly 
>> patching
>> all map-functions in the views would also fix the issue of double
>> documents. At the same time, that feels kind of like a hack.
>>
>> Now, I know BigCouch is around the corner. How does it cope with the 
>> above
>> issues? Maybe it's introduced some new magic for this that currently 
>> is not
>> in CouchDB? I've been trying to Google this, but so far haven't found 
>> too
>> much. Feel free to point me in the right direction if you know of any
>> material.
>>
>> Cheers,
>> Jens
>>
>


Mime
View raw message