hbase-user mailing list archives

From Time Less <timelessn...@gmail.com>
Subject Re: HBase and Cassandra on StackOverflow
Date Wed, 31 Aug 2011 05:34:51 GMT
Most of your points are dead-on.

> Cassandra is no less complex than HBase. All of this complexity is
> "hidden" in the sense that with Hadoop/HBase the layering is obvious --
> HDFS, HBase, etc. -- but the Cassandra internals are no less layered.
> Operationally, however, HBase is more complex.  Admins have to configure
> and manage ZooKeeper, HDFS, and HBase.  Could this be improved?

I strongly disagree with the premise[1]. I was personally involved in the
Digg Cassandra rollout, was in roughly weekly contact with the Digg
Cassandra administrator until a couple of months ago, and have very close
ties to the SimpleGeo Cassandra admin, so I know it is a fickle beast.
Having also spent a good amount of time at StumbleUpon and Mozilla (and now
Riot Games), I see first-hand that HBase is far more stable and -- dare I
say it? -- operationally simpler.

So okay, HBase is "harder to set up" if following a step-by-step guide on a
wiki is "hard,"[2] but it's FAR easier to administer. Cassandra is rife with
cascading cluster failure scenarios. I would not recommend running Cassandra
in a highly-available high-volume data scenario, but I wouldn't hesitate to
run HBase in one.

I do not know if this is a guaranteed (provable due to architecture) result,
or just the result of the Cassandra community being... how shall I say...
hostile to administrators. But then, to me it doesn't matter. Results do.

Tim Ellis
Data Architect, Riot Games

[1] That said, the other part of your statement is spot-on, too. It's surely
possible to improve the HBase architecture or simplify it.
[2] I went from having never set up HBase nor ever used Chef to having
functional Chef recipes that installed a functional HBase/HDFS cluster in
about 2 weeks. From my POV, the biggest stumbling point was that HDFS, by
default, stores critical data in the underlying filesystem's /tmp directory,
which is, for lack of a better word, insane. If I had to suggest how to
simplify "HBase installation," I'd ask for sane HDFS config files that are
widely available and difficult to ignore.
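
For illustration only, here is a minimal sketch of the kind of hdfs-site.xml
override footnote [2] is asking for. It is not taken from the original
message. It assumes the property names used by Hadoop releases of that era
(dfs.name.dir and dfs.data.dir; later releases renamed them
dfs.namenode.name.dir and dfs.datanode.data.dir), and the /data/hdfs/...
paths are placeholders. Left unset, both properties resolve under
hadoop.tmp.dir, which defaults to /tmp/hadoop-${user.name}.

```xml
<?xml version="1.0"?>
<!-- hdfs-site.xml: illustrative fragment; the paths are placeholders -->
<configuration>
  <!-- NameNode metadata (fsimage, edit log): keep it off /tmp -->
  <property>
    <name>dfs.name.dir</name>
    <value>/data/hdfs/name</value>
  </property>
  <!-- DataNode block storage: keep it off /tmp -->
  <property>
    <name>dfs.data.dir</name>
    <value>/data/hdfs/data</value>
  </property>
</configuration>
```

Both properties accept a comma-separated list of directories, so a real
deployment would likely list more than one disk here.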
