incubator-cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: Cassandra versus HBase performance study
Date Sun, 21 Feb 2010 22:18:18 GMT
On Wed, Feb 3, 2010 at 7:45 PM, Brian Frank Cooper
<cooperb@yahoo-inc.com> wrote:
> One thing that is puzzling is the scan performance. The scan experiment is to
> scan between 1-100 records on each request. My 6 node Cassandra cluster is
> only getting up to about 230 operations/sec, compared to >1400 ops/sec for
> other systems. The latency is quite a bit higher. A chart with these results
> is here:
>
> http://www.brianfrankcooper.net/pubs/scans.png
>
> Is this the expected performance? I'm using the OrderPreservingPartitioner
> with InitialToken values that should evenly partition the data (and the
> amount of data in /var/cassandra/data is about the same on all servers). I'm
> using get_range_slice() from Java (code snippet below).

This got some attention for 0.6, since we have added Hadoop support in
that release.  (0.6 is branched now; beta / RC coming soon.)  It turns
out the (or more likely, "a" :) main bottleneck was that our memtables
were not kept ordered by key, so they had to be sorted for each range
query.  Switching from NonBlockingHashMap to ConcurrentSkipListMap
made things much faster.  (CASSANDRA-799)
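
To illustrate why key ordering matters here, below is a minimal sketch,
not the actual Cassandra memtable code.  It assumes String keys and
values, and uses java.util.concurrent.ConcurrentHashMap as a stand-in
for NonBlockingHashMap; the class and method names are hypothetical.
With an unordered map, each range query has to visit every key and sort
the matches, while a skip list map can hand back an already-ordered
view of the range.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical illustration only; not taken from the Cassandra source.
public class MemtableRangeSketch {

    // Unordered map: visit every key, keep those in [start, end], then sort.
    static List<String> rangeFromHashMap(Map<String, String> memtable,
                                         String start, String end) {
        List<String> keys = new ArrayList<>();
        for (String key : memtable.keySet()) {
            if (key.compareTo(start) >= 0 && key.compareTo(end) <= 0) {
                keys.add(key);
            }
        }
        Collections.sort(keys); // this sort is repeated on every range query
        return keys;
    }

    // Ordered map: subMap() returns a view of [start, end] already in key order.
    static List<String> rangeFromSkipList(ConcurrentSkipListMap<String, String> memtable,
                                          String start, String end) {
        return new ArrayList<>(memtable.subMap(start, true, end, true).keySet());
    }

    public static void main(String[] args) {
        Map<String, String> unordered = new ConcurrentHashMap<>();
        ConcurrentSkipListMap<String, String> ordered = new ConcurrentSkipListMap<>();
        for (String key : new String[] {"k3", "k1", "k4", "k2"}) {
            unordered.put(key, "value-" + key);
            ordered.put(key, "value-" + key);
        }
        // Both calls return the same keys; only the amount of work differs.
        System.out.println(rangeFromHashMap(unordered, "k2", "k3"));
        System.out.println(rangeFromSkipList(ordered, "k2", "k3"));
    }
}
```

The skip list version pays for ordering once at insert time rather
than on every scan, which is the trade-off described above.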

We're planning on optimizing this more for 0.7, and we've added range
queries to our stress test tool (CASSANDRA-765) for that.

-Jonathan
