couchdb-dev mailing list archives

From Randall Leeds <randall.le...@gmail.com>
Subject Re: BigCouch vs. CouchDB Lounge vs. Cassandra
Date Mon, 06 Sep 2010 05:05:43 GMT
Wee! I like this question. :)

I've probably been the most active Lounge developer for a little while
now, and though I'm not intimately familiar with BigCouch yet, I think
I'm pretty clear on its overall architecture and design. I'll try to
highlight some of the differences for you without really selling
either system.

On simplicity and production use:

Lounge is "fairly simple" in that it runs outside CouchDB. It is not
simple to set up or maintain. Thankfully, since 0.11 added filtered
replication there is no longer a need to patch CouchDB. If you like
compiling nginx from source and you want to roll some of your own
management scripts then Lounge is a great choice. Meebo skipped the
0.11 cycle and that's why I haven't been pushing a lot of changes. I
hope to change that in a couple weeks after I get back from Couch Camp
and bring Lounge up to date with 1.0.1.

BigCouch is, obviously, in production use by Cloudant. Therefore, it's
clearly been battle tested a bit. It has the advantage of having
Dynamo-style R/W/N consistency settings. I've always said that if
someone needed this feature it'd be easy enough to add to the Lounge
and I stand by that. However, either way you will be rolling your own
management scripts since, as I understand it, this is part of the
value add of the commercial Cloudant offering.
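To make the R/W/N settings concrete, here is a minimal sketch of the usual Dynamo-style quorum arithmetic. It is not BigCouch code; the function names are hypothetical, and it only illustrates the common rule that reads see the latest write when R + W > N.

```python
def quorum_overlaps(n: int, r: int, w: int) -> bool:
    """Return True when every read quorum must intersect every write quorum.

    n: number of replicas holding each document
    r: replicas that must answer a read
    w: replicas that must acknowledge a write
    """
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must each be between 1 and n")
    # If r + w > n, any r replicas and any w replicas share at least one node.
    return r + w > n


def write_succeeded(acks: int, w: int) -> bool:
    """A write counts as successful once at least w replicas acknowledged it."""
    return acks >= w


if __name__ == "__main__":
    print(quorum_overlaps(3, 2, 2))  # overlapping quorums
    print(quorum_overlaps(3, 1, 1))  # fast, but reads may be stale
```

With N=3, choosing R=2 and W=2 gives overlapping quorums, while R=1 and W=1 trades that guarantee for lower latency.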

On stability:

Until I've personally reviewed the changes to the core parts of
CouchDB I won't outright recommend you put your production data in
BigCouch, but I'm 99% sure it'd be fine. You probably won't lose data
but maybe you'll get a few 50x responses. I could probably say the
same about Lounge, though. I'll be playing with BigCouch a bit myself
in the next few weeks and will have a better idea.

On scalability:

In short, CouchDB can scale just like Cassandra and there are (at
least two) viable options for doing so. Nothing has out-of-the-box
easy management in place. There will be some operational overhead if
you decide to manage it yourself. If you don't want that, take a
hosted solution like Cloudant.

If you do not need the scale immediately the best option is to not
worry about it. Like Mr. Miyagi said in Karate Kid 2, "The best way to
avoid a punch is not be there." Set up your single couch instance or
set up two with bidirectional, continuous replication so you have
redundant copies of your production data. If you're worried about
consistency, use one of them as a write master and the other as a read
slave/hot standby. You can always migrate later.
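As a rough sketch of the two-node setup, the snippet below builds the request bodies for CouchDB's `_replicate` endpoint with `"continuous": true`, one per direction. The host names and the `stuff` database are placeholders, the helper names are hypothetical, and nothing is sent unless you call `start_replication` yourself.

```python
import json
import urllib.request


def replication_request(source: str, target: str) -> dict:
    """Body for POST /_replicate asking for continuous replication."""
    return {"source": source, "target": target, "continuous": True}


def bidirectional_replication(db_a: str, db_b: str) -> list:
    """Two one-way replications that together keep both databases in sync."""
    return [replication_request(db_a, db_b), replication_request(db_b, db_a)]


def start_replication(server_url: str, body: dict) -> dict:
    """POST the body to <server_url>/_replicate (requires a reachable CouchDB)."""
    req = urllib.request.Request(
        server_url.rstrip("/") + "/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Placeholder URLs; replace with your own nodes before sending anything.
    a = "http://couch-a.example:5984/stuff"
    b = "http://couch-b.example:5984/stuff"
    for body in bidirectional_replication(a, b):
        print(json.dumps(body))
```

Continuous replications started this way do not survive a server restart, so whatever management scripts you write will need to re-issue them.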

If you still care about the technical differences, read on.

--------------------------------------------------------------------------

I believe BigCouch *does* use filtered replication. Lounge does not,
instead opting for database suffixes to distinguish shards. A database
in BigCouch called "stuff" will be called "stuff" on every node. In
Lounge you will see "stuff0", "stuff1", ... I don't think there's
anything more to say about that. It's a choice, but I do not see it as
a particularly significant one.

BigCouch has dynamic re-sharding with key ranges much like Cassandra
that can be split as needed. Lounge is almost more like Riak in that
it buckets the whole hashed key space into a fixed number of shards
and then distributes membership of them to nodes. Arguments can be
made for both of these approaches. If you "overshard" enough in Lounge
there shouldn't be much concern about the lack of key-range
splitting/merging.
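The fixed-shard approach can be sketched as follows. This is only an illustration, not Lounge's actual code: the CRC32 hash, the round-robin assignment, and the function names are all assumptions, chosen to show how a document ID maps to a suffixed database such as "stuff0" and how shards are then handed out to nodes.

```python
import zlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Hash the document ID into one of a fixed number of shards."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def shard_db_name(db: str, doc_id: str, num_shards: int) -> str:
    """Name of the suffixed database holding this document, e.g. 'stuff3'."""
    return f"{db}{shard_for(doc_id, num_shards)}"


def assign_shards(num_shards: int, nodes: list) -> dict:
    """Distribute shards over nodes round-robin (an assumed policy)."""
    if not nodes:
        raise ValueError("need at least one node")
    return {shard: nodes[shard % len(nodes)] for shard in range(num_shards)}


if __name__ == "__main__":
    shards = 16  # "oversharded": many more shards than nodes
    shard_map = assign_shards(shards, ["node1", "node2"])
    name = shard_db_name("stuff", "some-doc-id", shards)
    print(name, "lives on", shard_map[shard_for("some-doc-id", shards)])
```

Because the document-to-shard mapping never changes, growing the cluster only means moving whole shard databases to new nodes and updating the shard map, which is why oversharding up front softens the lack of key-range splitting.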

BigCouch uses distributed Erlang for internode communication. I don't
*think* replication happens this way, so that really just means the
proxying is faster and you can round-robin all your BigCouch nodes.
Lounge always goes directly to the node with the data, through nginx,
and this is also very fast.

The most significant difference is, of course, that BigCouch is
written in Erlang. For that reason I'd predict better longevity for the
project simply because it stands a chance of being included in CouchDB
(in whole or in part).

Oh. To its credit, Cassandra makes nice use of the JMX console. :)

If I have said anything false or misleading I encourage anyone to
please chime in.

-Randall

On Sun, Sep 5, 2010 at 20:49, ithkuil <ithkuil@gmail.com> wrote:
>
> What advantages does BigCouch have over Lounge?  Lounge seems fairly simple
> which is a big plus, but since Cloudant is using BigCouch in their
> commercial product that looks like a bigger plus.
>
> Do either of these solutions take advantage of new features like replication
> filters?
>
> What is the direction of internal CouchDB development in regards to
> "complete" partitioning functionality?  Is the need for Lounge or BigCouch
> (for many use cases) really a clue that if I need a completely partitioned
> distributed database I should look at something like Cassandra (do not
> like)?
>
> I'm sorry if you are tired of answering this question.  Please consider just
> ignoring it until you are in a really good mood.  That could be two weeks
> down the line if you like, or never.  Also, I know this could be on the user
> list, but I am asking here because I want to know what CouchDB internal
> developers think of the options and the direction for the future.
>
> Also, here is a tiny virtual representation of me which you can imagine
> stabbing in the eye with a tiny pencil, if that helps:
>
>  O
> \|/
>  |
> / \
>
> --
> View this message in context: http://couchdb-development.1959287.n2.nabble.com/BigCouch-vs-CouchDB-Lounge-vs-Cassandra-tp5501938p5501938.html
> Sent from the CouchDB Development mailing list archive at Nabble.com.
>
