From Ian Varley <ivar...@salesforce.com>
Subject Re: write throughput in cassandra, understanding hbase
Date Tue, 22 Jan 2013 19:23:45 GMT
One notable design difference is that in Cassandra, every piece of data is handled by a quorum
of independent peers, whereas in HBase, every piece of data is handled by a single process
at any given time (the RegionServer). HBase deals with data replication by delegating the
actual file storage down to the underlying distributed file system (HDFS) which makes its
own replicas in a pipelined way (typically, 3 of them).
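To make the HBase side concrete: the RegionServer hands its writes to HDFS, which pushes each block through a chain of DataNodes and reports success only after the acknowledgement has travelled back up that chain. The sketch below is a toy model of that pipeline under stated assumptions: a replication factor of 3, no failures, and one whole "block" per call (real HDFS streams packets and overlaps forwarding with acks). `DataNode` and `pipeline_write` are hypothetical names, not the HDFS API.

```python
from typing import List


class DataNode:
    """Hypothetical stand-in for one HDFS DataNode holding block replicas."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: List[bytes] = []


def pipeline_write(pipeline: List[DataNode], data: bytes) -> List[str]:
    """Store data on the head node, forward it downstream, and return the
    ack chain. The head acks only after everything downstream has acked."""
    if not pipeline:
        return []
    head, downstream = pipeline[0], pipeline[1:]
    head.blocks.append(data)
    acks = pipeline_write(downstream, data)  # forward and wait for downstream acks
    return acks + [head.name]


nodes = [DataNode(f"dn{i}") for i in range(3)]  # assumed replication factor 3
print(pipeline_write(nodes, b"row-1 edit"))  # ['dn2', 'dn1', 'dn0']
```

The writer sees one success only after all three copies exist, which is a different cost profile from a client that returns after the first replica answers.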

In Cassandra, by contrast, the client deals directly with multiple replicas, and a common
configuration is to "ACK" a write back to the client as successful as soon as 1 of the N replicas
you're sending it to has succeeded. You can also wait for more ACKs, for stronger guarantees:
if the # of write ACKs you wait for (W), plus the # of replicas you read from for a
successful read (R), exceeds the total number of replicas for any datum (N), then your data
is fully consistent. If not, then it's "eventually consistent". As you'd imagine, waiting for fewer
ACKs is faster than being fully consistent (but, generally speaking, harder to program against, because
there are many more possible failure scenarios to think of).
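The W + R > N rule can be checked mechanically: if every possible set of W write replicas shares at least one member with every possible set of R read replicas, a read is guaranteed to reach at least one replica that acknowledged the latest write (it still has to pick the newest version among the replies, e.g. by timestamp). A minimal sketch, assuming the simplified model described here (a fixed set of N replicas, no failures, no hinted handoff); the function names are hypothetical.

```python
from itertools import combinations


def quorums_overlap(n: int, w: int, r: int) -> bool:
    """The rule from the message: reads are guaranteed to overlap writes when W + R > N."""
    return w + r > n


def every_read_sees_write(n: int, w: int, r: int) -> bool:
    """Brute force: does every possible W-replica write set share a replica
    with every possible R-replica read set?"""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


# N=3, W=1, R=1 is the "ACK after one replica" setup; W=2, R=2 is a quorum on both sides.
for n, w, r in [(3, 1, 1), (3, 2, 2), (3, 1, 3), (5, 2, 3), (5, 3, 3)]:
    print(f"N={n} W={w} R={r}: rule={quorums_overlap(n, w, r)} "
          f"brute_force={every_read_sees_write(n, w, r)}")
```

The brute-force check agrees with the simple inequality because of the pigeonhole principle: two subsets of an N-element set whose sizes sum to more than N cannot be disjoint.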

You can, of course, change most of these parameters, in either system; which is why it's really
important to know if you're comparing apples to apples. :) One key difference, though, is
that there's no "Eventual Consistency" option in HBase: writes are always atomic and consistent
(to a single row). That's also why you can do stuff like atomic increment and check & put.
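Because exactly one process owns a given row at a time, that process can serialize every mutation to the row locally, without coordinating with peers. The sketch below models that idea with a hypothetical `ToyRegion` class; `check_and_put` and `increment` mirror the operations named above but are illustrative names, not the HBase client API, and a single region-wide lock stands in for the finer-grained (per-row) locking a real server would use.

```python
import threading
from typing import Any, Dict, Optional, Tuple


class ToyRegion:
    """Hypothetical model of one region: a single owner serializes all mutations."""

    def __init__(self) -> None:
        self._cells: Dict[Tuple[str, str], Any] = {}
        self._lock = threading.Lock()  # one lock for brevity; real servers lock per row

    def check_and_put(self, row: str, column: str,
                      expected: Optional[Any], value: Any) -> bool:
        """Write value only if the cell currently equals expected (None = absent)."""
        with self._lock:
            if self._cells.get((row, column)) != expected:
                return False
            self._cells[(row, column)] = value
            return True

    def increment(self, row: str, column: str, amount: int = 1) -> int:
        """Atomic read-modify-write on a counter cell; returns the new value."""
        with self._lock:
            new_value = self._cells.get((row, column), 0) + amount
            self._cells[(row, column)] = new_value
            return new_value

    def get(self, row: str, column: str) -> Optional[Any]:
        with self._lock:
            return self._cells.get((row, column))


region = ToyRegion()
workers = [
    threading.Thread(target=lambda: [region.increment("user1", "hits") for _ in range(1000)])
    for _ in range(8)
]
for t in workers:
    t.start()
for t in workers:
    t.join()
print(region.get("user1", "hits"))                            # 8000: no increment is lost
print(region.check_and_put("user1", "owner", None, "alice"))  # True: cell was empty
print(region.check_and_put("user1", "owner", None, "bob"))    # False: alice already set it
```

With several independent replicas accepting writes, the same read-then-write would need agreement among them, which is exactly the coordination an eventually consistent setup avoids.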


On Jan 22, 2013, at 1:12 PM, S Ahmed wrote:

Thanks, I think Lars's comment hints at what might be one reason.

I don't have a cluster set up to test; I'm really an enthusiast (I'm
currently going through the codebase and trying to get a low-level feel for
what's going on) and want to know what the possible technical reason is
(Cassandra and HBase are designed differently, so I was curious what
could be at the root of the issue).

I'm not here to start a flame war or anything, so please don't take it that way.

Where do you see that HBase is doing only 2-3k writes/s?
I must have misread it, or that was from another benchmark.

What I was thinking is that designs have tradeoffs, and possibly
Cassandra's design prioritized write throughput at the cost of x, while
HBase's design was better suited for y (maybe range scans?).

On Tue, Jan 22, 2013 at 2:06 PM, Kevin O'dell <kevin.odell@cloudera.com> wrote:

Hi S Ahmed,

 How are you today?  I wanted to echo what Lars said: most of these tests
have an agenda.  With that being said, have you done any of your own
internal testing?  If so, do you have configs, row keys, or results that you
can share with us so that we can help you tune your cluster for success?

On Tue, Jan 22, 2013 at 2:03 PM, lars hofhansl <larsh@apache.org> wrote:

Where do you see that HBase is doing only 2-3k writes/s?
How was the data distributed? Was the table split?
Cassandra uses a random partitioner by default, which will nicely
distribute the data over the cluster but won't allow you to perform range scans
over your data.
HBase always partitions by key ranges, so that the keys can be range
scanned. If that is not done correctly and you create monotonically
increasing keys, you'll hotspot a single region server.
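The hotspot can be made concrete. The sketch below assumes a hypothetical table pre-split at four split points (five regions), routes keys with a binary search over those split points (range partitioning), and compares that with hash placement in the spirit of a random partitioner. MD5 is used only as a convenient deterministic hash; the split points and function names are illustrative assumptions.

```python
import bisect
import hashlib
from collections import Counter
from typing import List


def region_for(key: str, split_keys: List[str]) -> int:
    """Range partitioning: region i holds keys in [split_keys[i-1], split_keys[i]).
    Keys stay sorted across regions, so a range scan touches only adjacent regions."""
    return bisect.bisect_right(split_keys, key)


def hashed_region(key: str, num_regions: int) -> int:
    """Hash placement: spreads keys evenly but destroys key order,
    so a range scan would have to visit every region."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_regions


# Hypothetical pre-split table with 5 regions. Keys are zero-padded so that
# lexicographic (byte) order matches numeric order.
split_keys = ["0200", "0400", "0600", "0800"]

# The newest batch of monotonically increasing keys (e.g. sequence numbers).
recent_keys = [f"{i:04d}" for i in range(900, 1000)]

print("range:", Counter(region_for(k, split_keys) for k in recent_keys))  # Counter({4: 100})
print("hash: ", Counter(hashed_region(k, len(split_keys) + 1) for k in recent_keys))
```

With increasing keys, every new write lands in the last region, so one RegionServer takes the whole write load no matter how many servers the cluster has. Hashing spreads the same writes out, at the price of losing the key order that makes range scans cheap.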

Even then, you can do more than this on a single RegionServer.

Also note that many of the benchmarks have agendas and cherry pick the
They probably "forgot" to disable Nagle's and to distribute the table.
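Nagle's algorithm holds back small TCP writes while earlier data is still unacknowledged; combined with delayed ACKs on the receiving side, it can add noticeable latency to small request/response RPCs such as individual puts. The usual fix is the `TCP_NODELAY` socket option. The sketch below shows that option on a plain Python socket to illustrate the mechanism only; it is not how the HBase or Hadoop clients are configured (they expose their own settings, not shown here), and `make_low_latency_socket` is a hypothetical helper.

```python
import socket


def make_low_latency_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled (TCP_NODELAY)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # With TCP_NODELAY set, small writes are sent immediately instead of
    # being held back to coalesce with later data.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


sock = make_low_latency_socket()
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)  # True
sock.close()
```

A benchmark that leaves Nagle's enabled on one system and not the other is measuring socket settings as much as storage design.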

-- Lars

From: S Ahmed <sahmed1020@gmail.com>
To: user@hbase.apache.org
Sent: Tuesday, January 22, 2013 10:38 AM
Subject: RE: write throughput in cassandra, understanding hbase

I've read articles online where I see Cassandra doing something like 20K writes per
second, and HBase around 2-3K.

I understand both systems have their strengths, but I am curious as to what
is holding HBase back from reaching similar results.

Is it HDFS that is the issue?  Or does HBase do certain things (to its
advantage) that slow the write path down?

Kevin O'Dell
Customer Operations Engineer, Cloudera

