incubator-couchdb-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jens Rantil <jens.ran...@gmail.com>
Subject Re: CouchDB sharding orchestration
Date Mon, 02 Dec 2013 23:48:36 GMT
Woops, link to the repo: https://github.com/JensRantil/cushions

Cheers,
Jens


On Tue, Dec 3, 2013 at 12:47 AM, Jens Rantil <jens.rantil@gmail.com> wrote:

> Hi Vivek,
>
> I was considering using the Go programming language; It is easy to deploy,
> highly concurrent, fairly easy to learn, has great support for HTTP and has
> a fairly good JavaScript engine coming up[1]. Also, I've lately looked at
> goraft[2], which could be useful for a high-availability implementation of
> this. Are you familiar with Go?
>
> [1] https://github.com/robertkrimen/otto
> [2] https://github.com/goraft/raft
>
> So far I haven't written a single line of code, but since there is
> interest I just set up a Github repos[1]. I named it "cushions" in lack of
> a better name (taken?). Feel free to send me your Github username if you
> have one.
>
> As of API I'm thinking view querying and document API being identical to
> CouchDB. The sharding API will be HTTP/ReST based just like CouchDB under
> /_shards/ or similar.
>
> Cheers,
> Jens
>
> On 2 dec 2013, at 02:58, Vivek Pathak <vpathak@orgmeta.com> wrote:
>
> Hi Jens,
>
> Quick question,  what is the language/environment you planned to use for
> the feature.
>
> On some thinking, this looks like something which might be a useful to me.
>  If your work is public domain, and if I have the skills on your
> environment, I would like to contribute.
>
> Thanks
> Vivek
>
>
> On 11/28/13, 11:59 PM, Vivek Pathak wrote:
>
>
> Hi
>
> Just curious when you say this is a problem:
>
>   2. Inconsistent views; While transferring documents from one shard to
>   another there will be a moment when a document resides on two databases.
>
> Are you talking about the view that shows which document do not belong -
> or you are talking about general views?
>
> If you mean general view, one possibility is to specifically ignore all
> documents in the wrong shard.  This will give you a stale view - but that
> is exactly what you get if you use stale=ok etc in single db case as well.
>
> Thanks
> Vivek
>
>
>
> On 11/28/2013 06:28 PM, Jens Rantil wrote:
>
> Hey couchers,
>
> For the past week, I've been playing with the thought of implementing a
> sharding orchestrator for CouchDB instances. My use case is that I have a
> lot of data/documents that can't be stored on a single machine and I'd like
> to be able to query the data with fairly low latencies. Oh, there's a big
> write throughput constantly, too.
>
> Querying could be done by rereducing the view results coming in from the
> various shards. AFAIK, this is how BigCouch does it too.
>
> Now, the rebalancing act of all this is pretty straight forward; each shard
> is given a generated view that displays all documents that don't belong
> there in the shard (and thus need to be transferred). The orchestrator will
> then periodically check if any documents should be transferred to another
> server, replicate those documents and purge* them on the source server.
>
> * Purging since I am not interested in keeping them on disk on the source
> database anymore, not even tombstones. All the document data should only
> reside on the new shard.
>
> My issue with the described solution is that there will be a moment when
> views will be showing incorrect information.
>
> As I see it, there are two problems that needs to be solved with this
> solution:
>
>    1. Purging of documents: According to the purge
> documentation<
> http://docs.couchdb.org/en/latest/api/database/misc.html#post--db-_purge>,
>
>    purging documents will require rebuilding entire views. That ain't good.
>    2. Inconsistent views; While transferring documents from one shard to
>    another there will be a moment when a document resides on two databases.
>
> 1) could possibly be solved by introducing an replica database of the
> source database so that views could be recreated offline. However, this
> makes replication way more tedious than I initially expected. 2) could be
> solved by simply locking view reads until the inconsistency is fixed. But
> hey, we're not bug fans of that kind of global locking. Possibly patching
> all map-functions in the views would also fix the issue of double
> documents. At the same time, that feels kind of like a hack.
>
> Now, I know BigCouch is around the corner. How does it cope with the above
> issues? Maybe it's introduced some new magic for this that currently is not
> in CouchDB? I've been trying to Google this, but so far haven't found too
> much. Feel free to point me in the right direction if you know of any
> material.
>
> Cheers,
> Jens
>
>
>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message