hbase-user mailing list archives

From Bernd Fondermann <bernd.fonderm...@googlemail.com>
Subject Re: HBase and Cassandra on StackOverflow
Date Tue, 30 Aug 2011 11:29:25 GMT
On Tue, Aug 30, 2011 at 11:47, Andrew Purtell <apurtell@apache.org> wrote:
> Hi Chris,
>
> Appreciate your answer on the post.
>
> Personally speaking, however, the endless Cassandra vs. HBase discussion is
> tiresome, and blog posts or emails in this regard rarely shed any light.
> Often, Cassandra proponents mis-state their case out of ignorance of HBase
> or due to commercial or personal agendas. It is difficult to find clear-eyed
> analysis among the partisans. I'm not sure it will make any difference
> posting a rebuttal to some random thing jbellis says. Better to focus on
> improving HBase than play whack-a-mole.
>
>
> Regarding some of the specific points in that post:
>
> HBase is proven in production deployments larger than the largest publicly
> reported Cassandra cluster: ~1K nodes versus 400 or 700 or somesuch. But
> basically this is the same order of magnitude, with HBase having a slight
> edge. I don't see a meaningful difference here. Stating otherwise is false.
>
> HBase supports replication between clusters (i.e. data centers). I believe,
> though I admit I'm not super familiar with the Cassandra option here, that
> the main difference is that HBase provides a simple mechanism and the user
> must build a replication architecture useful for them, while Cassandra
> attempts to hide some of that complexity. I do not know if they succeed
> there, but large scale cross data center replication is rarely one size
> fits all, so I doubt it.
>
> Cassandra does not have strong consistency in the sense that HBase provides.
> It can provide strong consistency, but at the cost of failing any read if
> there is insufficient quorum. HBase/HDFS does not have that limitation. On
> the other hand, HBase has its own and different scenarios where data may
> not be immediately available. The differences between the systems are
> nuanced and which to use depends on the use case requirements.
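
The quorum trade-off described above can be sketched in a few lines. This is
only an illustration of the usual R + W > N rule for quorum-replicated stores,
not Cassandra code; the function names and the example numbers are
hypothetical.

```python
def quorum_overlaps(n: int, r: int, w: int) -> bool:
    """True when every read quorum must intersect every write quorum.

    n = replicas per key, r = replicas a read waits for,
    w = replicas a write waits for. With r + w > n, a read always
    contacts at least one replica that saw the latest acknowledged write.
    """
    return r + w > n


def read_can_succeed(live_replicas: int, r: int) -> bool:
    """A read at quorum size r fails outright if fewer than r replicas answer."""
    return live_replicas >= r


# Assumed example: 3 replicas, quorum reads and quorum writes.
N, R, W = 3, 2, 2
print(quorum_overlaps(N, R, W))   # overlapping quorums
print(read_can_succeed(2, R))     # one replica down: the read still works
print(read_can_succeed(1, R))     # two replicas down: the read is refused
```

With R = W = 1 the same cluster keeps answering when replicas are down, but
the quorums no longer overlap, so a read may return stale data.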
>
> Cassandra's RandomPartitioner / hash based partitioning means efficient
> MapReduce or table scanning is not possible, whereas HBase's distributed
> ordered tree is naturally efficient for such use cases, which I believe
> explains why Hadoop users often prefer it. This may or may not be a problem
> for any given use case. Using an ordered partitioner with Cassandra used to
> require frequent manual rebalancing to avoid blowing up nodes. I don't know
> if more recent versions still have this mis-feature.
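
To make the scanning point concrete, here is a toy comparison of hash
placement and ordered range placement. It is a sketch under loud assumptions:
the node count, the split points, and the modulo placement are invented for
illustration and are not how either system actually assigns keys.

```python
import hashlib
from bisect import bisect_right

NODES = 4                      # assumed cluster size
SPLITS = ["g", "n", "t"]       # hypothetical boundaries between key ranges


def hash_node(key: str) -> int:
    """Hash placement: neighbouring keys land on unrelated nodes."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest, "big") % NODES


def range_node(key: str) -> int:
    """Ordered placement: each node owns one contiguous, sorted key range."""
    return bisect_right(SPLITS, key)


def nodes_for_scan(keys, locate):
    """Set of nodes a scan over `keys` has to contact."""
    return {locate(k) for k in keys}


# A scan over 100 adjacent row keys.
keys = [f"user{i:04d}" for i in range(100)]
print(len(nodes_for_scan(keys, range_node)))  # contiguous range: a single node
print(len(nodes_for_scan(keys, hash_node)))   # scattered across the cluster
```

The flip side is the hot-spotting mentioned above: with ordered placement,
all of those adjacent keys load the same node.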
>
> Cassandra is no less complex than HBase. Its complexity is merely "hidden":
> with Hadoop/HBase the layering is obvious -- HDFS, HBase, etc. -- but the
> Cassandra internals are no less layered. An impartial analysis of
> implementation and algorithms will reveal that Cassandra's theory of
> operation in its full detail is substantially more complex. Compare the
> BigTable and Dynamo papers and this is clear. There are actually more
> opportunities for something to go wrong with Cassandra.
>
> While we are looking at codebases, it should be noted that HBase has
> substantially more unit tests.
>
> With Cassandra, all RPC is via Thrift with various wrappers, so actually
> all Cassandra clients are second class in the sense that jbellis means when
> he states "Non-Java clients are not second-class citizens".
>
> The master-slave versus peer-to-peer argument is larger than Cassandra vs.
> HBase, and not nearly as one sided as claimed. The famous (infamous?)
> global failure of Amazon's S3 in 2008, a fully peer-to-peer system, due to
> a single flipped bit in a gossip message demonstrates how in peer-to-peer
> systems every node can be a single point of failure. There is no obvious
> winner; instead, there is a series of trade-offs. Claiming otherwise is
> intellectually dishonest. Master-slave architectures seem easier to operate
> and reason about in my experience. Of course, I'm partial there.
>
> I have just scratched the surface.

+1, insightful.

Thanks for posting this.

  Bernd
