Mailing-List: contact cassandra-dev-help@incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: cassandra-dev@incubator.apache.org
MIME-Version: 1.0
In-Reply-To: <35bb42690910291420m2a45ca6fn4caf0e5547cdbc53@mail.gmail.com>
References: <860544ed0910291348t75fbc295v485207768e1c3346@mail.gmail.com>
	<f4d6a21a0910291351l6955c357ld7cc82dd0cbcf9a2@mail.gmail.com>
	<C872342E-30CF-40DF-8D5F-86A7692F5C49@digg.com>
 <e06563880910291415l795100ekc084f67e74ab50f4@mail.gmail.com>
	<35bb42690910291420m2a45ca6fn4caf0e5547cdbc53@mail.gmail.com>
From: Jonathan Ellis <jbellis@gmail.com>
Date: Thu, 29 Oct 2009 15:59:18 -0600
Message-ID: <e06563880910291459h2aaa0ccg9e40b0b920190555@mail.gmail.com>
Subject: Re: HBase vs. Cassandra: new article!
To: cassandra-user@incubator.apache.org, chris@chriswere.com
Cc: cassandra-dev@incubator.apache.org
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

Okay, here are some corrections.  It's a bit choppy because it's just
that: a list of corrections.

Again, this is just trying to address factual errors; I disagree with
many of the expressed opinions, too. :)

> Cassandra relies mostly on Key-Value pairs for storage

No more than hbase does.  Cassandra's columnfamily model does away
with historical values, and adds supercolumns, but the two have a lot
more in common with each other than with actual k/v stores.
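As a rough picture of that difference, here is a minimal sketch that
models rows as nested Python dicts. The function names, the dict
layout, and max_versions=3 are illustrative assumptions, not either
project's storage format or API.  An HBase-style cell keeps several
timestamped versions, a Cassandra-style column keeps only one, and a
supercolumn adds one more level of nesting inside a row.

```python
from typing import Dict, List, Tuple

# Illustrative shapes only; neither project stores rows as Python dicts.
# HBase-style row: column -> several (timestamp, value) versions, newest first.
HBaseRow = Dict[str, List[Tuple[int, bytes]]]
# Cassandra-style standard row: column name -> a single (timestamp, value).
CassandraRow = Dict[str, Tuple[int, bytes]]
# Cassandra-style super row: supercolumn name -> a standard row of subcolumns.
SuperRow = Dict[str, CassandraRow]


def hbase_put(row: HBaseRow, column: str, ts: int, value: bytes,
              max_versions: int = 3) -> None:
    """Add a version and keep only the newest max_versions of them."""
    versions = row.setdefault(column, [])
    versions.append((ts, value))
    versions.sort(key=lambda v: v[0], reverse=True)  # newest first
    del versions[max_versions:]


def cassandra_put(row: CassandraRow, column: str, ts: int, value: bytes) -> None:
    """Keep one value per column; an older write never replaces a newer one."""
    current = row.get(column)
    if current is None or ts > current[0]:
        row[column] = (ts, value)


def super_put(row: SuperRow, super_column: str, column: str, ts: int,
              value: bytes) -> None:
    """A supercolumn is just one more level of nesting inside the row."""
    cassandra_put(row.setdefault(super_column, {}), column, ts, value)
```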

> it's a fact that far more people are using HBase than Cassandra at this moment

While it's possible that more people are using HBase right now, with
90 people in the cassandra IRC channel and 55 in hbase's, I'm
comfortable that Cassandra's community is healthy.

> despite both being similarly recent

HBase is roughly 2x as old as Cassandra.

> HBase values strong consistency and High Availability while Cassandra values Availability and Partitioning tolerance

HBase actually picks CP.

> Efficiently running MapReduce on Cassandra, on the other hand, is difficult because all of its keys are in one big "space", so the MapReduce framework doesn't know how to split and divide the data natively. There needs to be some hackery in place to handle all of that.

Writing a hadoop input generator is a Feature, to use the article's
terminology.  It doesn't have to be hackish; in fact, trunk now has a
key range splitter that could easily be adapted to Hadoop.
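The idea behind a key range splitter fits in a few lines.  This is a
minimal sketch, assuming keys sort in storage order (as with an
order-preserving partitioner) and that a sample of row keys is
available; split_key_ranges and in_range are hypothetical names, not
the trunk splitter's API.

```python
from typing import List, Optional, Tuple

# A split is a half-open key range [start, end); None means unbounded on that side.
KeyRange = Tuple[Optional[str], Optional[str]]


def split_key_ranges(sample_keys: List[str], n_splits: int) -> List[KeyRange]:
    """Cut the key space into at most n_splits contiguous, non-overlapping ranges.

    Hypothetical helper: assumes row keys sort in the order they are stored
    and that sample_keys is a representative sample of them.
    """
    if n_splits < 1:
        raise ValueError("n_splits must be at least 1")
    keys = sorted(set(sample_keys))
    # Pick boundary keys spaced evenly through the sorted sample.
    boundaries = set()
    for i in range(1, n_splits):
        idx = i * len(keys) // n_splits
        if idx > 0:  # idx 0 would produce an empty first range
            boundaries.add(keys[idx])
    ordered = sorted(boundaries)
    starts: List[Optional[str]] = [None, *ordered]
    ends: List[Optional[str]] = [*ordered, None]
    return list(zip(starts, ends))


def in_range(key: str, r: KeyRange) -> bool:
    """True if key falls inside the half-open range r."""
    start, end = r
    return (start is None or key >= start) and (end is None or key < end)
```

Each (start, end) pair would back one Hadoop input split, so every map
task scans a disjoint slice of the key space rather than "one big
space".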

Quoting an old patchset to "prove" that cassandra can only poorly
interface to hadoop is weak.

> Cassandra is only a Ruby gem install away.

Or a tar download, or a deb package...

> You still have to do quite a bit of manual configuration

Other than columnfamily definition (which must also be done for
hbase), I'm not sure what the author was thinking of here.
bin/cassandra works out of the box, and (unlike hbase) there is only
one type of process to deal with, which is a huge win for ops in
production.

> in HBase, if a region server is down, writes will be blocked for affected data until the data is redistributed

(that is why hbase really has CP out of CAP, not CA)

> Cassandra, however, has an internal method of resolving up-to-dateness issues with vector clocks -- a complex but workable solution where basically the latest timestamp wins

No; Cassandra uses latest-timestamp-wins, which is totally different
from vector clocks.
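To make the distinction concrete, a minimal sketch (hypothetical
helper names, not Cassandra code): latest-timestamp-wins resolves two
versions by comparing one timestamp, while vector clocks compare
per-node counters and can detect that two updates were concurrent,
leaving that conflict for something else to reconcile.

```python
from typing import Dict, Tuple

# Latest-timestamp-wins: each version carries a single (timestamp, value).
Version = Tuple[int, str]
# Vector clock: node id -> number of updates that node has coordinated.
VectorClock = Dict[str, int]


def lww_resolve(a: Version, b: Version) -> Version:
    """Keep the version with the larger timestamp; the other is silently dropped.

    Ties are broken by comparing values so every replica picks the same
    winner (a choice made for this sketch).
    """
    return max(a, b)


def vc_compare(a: VectorClock, b: VectorClock) -> str:
    """Return 'equal', 'before', 'after', or 'concurrent' for clock a vs. clock b."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # neither dominates: a real conflict to surface
```

Two writes made independently through different nodes, say clocks
{"n1": 1} and {"n2": 1}, compare as concurrent, so a vector-clock
system has to keep both versions; latest-timestamp-wins just keeps
whichever carries the larger timestamp.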

> Another architectural quibble is that Cassandra only supports one table per install. That means you can't denormalize your data to make it more usable in analytical scenarios.

Not even a kernel of truth there: you can define multiple keyspaces,
each with as many columnfamilies as you like.  wtf?

> Cassandra is really more of a Key Value store than a Data Warehouse.

Again: wtf?

> Furthermore, schema changes require a cluster restart

This part is true, for now.  But it's misleading, since "schema
change" means "adding CFs or keyspaces," not merely "modifying
columns" as in traditional dbs.
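A minimal sketch of where the schema actually applies, with a
hypothetical Store class standing in for a cluster (Keyspace1 and
Standard1 are only example names): the schema lists keyspaces and
their columnfamilies, so a row can gain a column nobody has written
before with no schema change; only a new columnfamily or keyspace
touches the schema.

```python
from typing import Dict, Set, Tuple


class Store:
    """Hypothetical model of what the schema covers; not Cassandra's API."""

    def __init__(self, schema: Dict[str, Set[str]]) -> None:
        # keyspace -> names of its columnfamilies; there are no column definitions
        self.schema = schema
        # (keyspace, columnfamily, row key) -> {column name: value}
        self.rows: Dict[Tuple[str, str, str], Dict[str, bytes]] = {}

    def insert(self, keyspace: str, cf: str, row_key: str,
               column: str, value: bytes) -> None:
        """Any column name is accepted; only the keyspace and columnfamily must exist."""
        if cf not in self.schema.get(keyspace, set()):
            raise KeyError(f"undefined columnfamily {keyspace}.{cf}")
        self.rows.setdefault((keyspace, cf, row_key), {})[column] = value
```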

> it's difficult to claim that Cassandra implements the BigTable model

We never claimed to be a pure bigtable clone.  We don't want to be,
because of the single points of failure and operational complexity
involved.

> Cassandra is optimized for small datacenters (hundreds of nodes) connected by very fast fiber. HBase, being based on research originally published by Google, is happy to handle replication to thousands of planet-strewn nodes across the 'slow', unpredictable Internet

Cassandra has multi-datacenter support already.  HBase didn't, last I
checked.  So this is weird.

> This first diagram is a model of the Cassandra replication scheme.

Note that all these steps happen in parallel.
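A minimal sketch of what "in parallel" means here (replicate and
write_to_replica are hypothetical names, and a thread pool stands in
for Cassandra's messaging layer): the coordinator sends the write to
all replicas at once and reports success as soon as W of them
acknowledge, so one slow replica does not add its latency to the
others.

```python
import concurrent.futures as futures
import time
from typing import Callable, List


def replicate(write_to_replica: Callable[[str], bool], replicas: List[str],
              w: int, timeout: float) -> bool:
    """Send a write to every replica concurrently; succeed once w replicas ack.

    write_to_replica is a hypothetical callback standing in for the network
    call to one replica; it returns True on success.
    """
    if not 1 <= w <= len(replicas):
        raise ValueError("w must be between 1 and the number of replicas")
    pool = futures.ThreadPoolExecutor(max_workers=len(replicas))
    pending = [pool.submit(write_to_replica, r) for r in replicas]  # all at once
    acks = 0
    try:
        for fut in futures.as_completed(pending, timeout=timeout):
            # A replica that raised counts as a failed write.
            if fut.exception() is None and fut.result():
                acks += 1
                if acks >= w:
                    return True
    except futures.TimeoutError:
        pass  # fewer than w acks arrived in time
    finally:
        # Don't wait for stragglers; their writes finish in the background.
        pool.shutdown(wait=False)
    return False


if __name__ == "__main__":
    def fake_write(replica: str) -> bool:
        time.sleep(0.5 if replica == "far-away" else 0.01)
        return True

    start = time.monotonic()
    ok = replicate(fake_write, ["local-1", "local-2", "far-away"], w=2, timeout=2.0)
    print(ok, f"{time.monotonic() - start:.2f}s")  # does not wait for far-away
```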

> it's impossible to tell when the required number of replicas will be up-to-date. This can be extremely painful in a live situation -- when one of your DCs goes down, you often want to know *exactly* when to expect data consistency

Cassandra provides consistency when R + W > N (read replica count +
write replica count > replication factor).  If you do both writes and
reads with QUORUM, for example, you can expect data consistency as
soon as enough replicas are up to form a quorum (which may not even
require the downed DC to be back online).  That is not "impossible to
tell" at all.
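The arithmetic is easy to check mechanically.  A minimal sketch
(helper names are hypothetical; QUORUM is taken to mean a strict
majority of the N replicas, N/2 + 1 with integer division): when
R + W > N, every set of R replicas read from shares at least one node
with every set of W replicas written to, so a read always reaches at
least one up-to-date copy.

```python
from itertools import combinations


def quorum(n: int) -> int:
    """Replicas needed for a QUORUM read or write: a strict majority of n."""
    return n // 2 + 1


def is_consistent(r: int, w: int, n: int) -> bool:
    """The R + W > N rule: every read then overlaps the latest successful write."""
    return r + w > n


def always_overlap(r: int, w: int, n: int) -> bool:
    """Brute-force check: does every set of r replicas share a node with every set of w?"""
    nodes = range(n)
    return all(set(read) & set(write)
               for read in combinations(nodes, r)
               for write in combinations(nodes, w))


if __name__ == "__main__":
    n = 3  # replication factor
    q = quorum(n)
    print(q, is_consistent(q, q, n))  # prints: 2 True
```

With N = 3 and both reads and writes at QUORUM (2 replicas each), one
replica can be down and reads still see the latest write.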

> It's important to note that Cassandra relies on high-speed fiber between datacenters.

Simply flat-out wrong.

> If your writes are taking 1 or 2 ms, that's fine. But when a DC goes out and you have to revert to a secondary one in China instead of 20 miles away, the incredible latency will lead to write timeouts and highly inconsistent data.

Sure, "incredible" latency of 100ms or so is bad, but it's not the end
of the world, and won't cause either write timeouts or inconsistent
data, assuming that you are in fact using R + W > N.