couchdb-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jens Rantil <jens.ran...@gmail.com>
Subject Re: CouchDB sharding orchestration
Date Fri, 29 Nov 2013 10:57:25 GMT
I was talking about general views. By ignoring, I guess you mean
ignoring them in the map function? Absolutely, that would work. My
only concern with that solution is that my orchestrator will have to
patch the map functions when the sharding topology changes. This feels
a bit hacky.

I've now slept on the problem and realized that all my views will have
to be rewritten for every topology change. Since requirements the
database to be fairly responsive, I think the best way is to set up a
brand new topology using filtered continuous replication and then with
a moments notice flip over to the new set of databases and drop the
old ones.

If the same machines are to be used for topology changes, the
databases could double in size throughout the transition. Probably a
good idea to let the orchestrator also orchestrate DB compression...

Jens

Sent from my iPhone 6

> 29 nov 2013 kl. 06:00 skrev Vivek Pathak <vpathak@orgmeta.com>:
>
>
> Hi
>
> Just curious when you say this is a problem:
>
>   2. Inconsistent views; While transferring documents from one shard to
>   another there will be a moment when a document resides on two databases.
>
> Are you talking about the view that shows which document do not belong - or you are talking
about general views?
>
> If you mean general view, one possibility is to specifically ignore all documents in
the wrong shard.  This will give you a stale view - but that is exactly what you get if you
use stale=ok etc in single db case as well.
>
> Thanks
> Vivek
>
>
>
>> On 11/28/2013 06:28 PM, Jens Rantil wrote:
>> Hey couchers,
>>
>> For the past week, I've been playing with the thought of implementing a
>> sharding orchestrator for CouchDB instances. My use case is that I have a
>> lot of data/documents that can't be stored on a single machine and I'd like
>> to be able to query the data with fairly low latencies. Oh, there's a big
>> write throughput constantly, too.
>>
>> Querying could be done by rereducing the view results coming in from the
>> various shards. AFAIK, this is how BigCouch does it too.
>>
>> Now, the rebalancing act of all this is pretty straight forward; each shard
>> is given a generated view that displays all documents that don't belong
>> there in the shard (and thus need to be transferred). The orchestrator will
>> then periodically check if any documents should be transferred to another
>> server, replicate those documents and purge* them on the source server.
>>
>> * Purging since I am not interested in keeping them on disk on the source
>> database anymore, not even tombstones. All the document data should only
>> reside on the new shard.
>>
>> My issue with the described solution is that there will be a moment when
>> views will be showing incorrect information.
>>
>> As I see it, there are two problems that needs to be solved with this
>> solution:
>>
>>    1. Purging of documents: According to the purge
>> documentation<http://docs.couchdb.org/en/latest/api/database/misc.html#post--db-_purge>,
>>    purging documents will require rebuilding entire views. That ain't good.
>>    2. Inconsistent views; While transferring documents from one shard to
>>    another there will be a moment when a document resides on two databases.
>>
>> 1) could possibly be solved by introducing an replica database of the
>> source database so that views could be recreated offline. However, this
>> makes replication way more tedious than I initially expected. 2) could be
>> solved by simply locking view reads until the inconsistency is fixed. But
>> hey, we're not bug fans of that kind of global locking. Possibly patching
>> all map-functions in the views would also fix the issue of double
>> documents. At the same time, that feels kind of like a hack.
>>
>> Now, I know BigCouch is around the corner. How does it cope with the above
>> issues? Maybe it's introduced some new magic for this that currently is not
>> in CouchDB? I've been trying to Google this, but so far haven't found too
>> much. Feel free to point me in the right direction if you know of any
>> material.
>>
>> Cheers,
>> Jens
>

Mime
View raw message