cassandra-user mailing list archives

From Brian Frank Cooper <coop...@yahoo-inc.com>
Subject RE: Cassandra versus HBase performance study
Date Fri, 05 Feb 2010 22:51:35 GMT
Yes, I had used the default of 0.01.

These boxes have 8 GB of RAM and I was giving 6 GB to the JVM (-Xmx). Does Cassandra do any
read caching of data? From the text in storage-conf.xml, it seems that the keys cache fraction
refers only to indexing the keys, not caching the content. So I would imagine increasing the keys
cached fraction would decrease the memory available for data caching; I wonder what effect that
would have on performance. Anyway, I will put it on the stack of "experiments to try."

Brian

-----Original Message-----
From: Jonathan Ellis [mailto:jbellis@gmail.com] 
Sent: Friday, February 05, 2010 2:07 PM
To: cassandra-user@incubator.apache.org
Subject: Re: Cassandra versus HBase performance study

Great!

Are you using the default keys cache fraction of 0.01?  If you have
the room on your JVM heap, I'd recommend 0.2 or more.  (This will only
affect read performance, not write.)

-Jonathan

On Fri, Feb 5, 2010 at 3:54 PM, Brian Frank Cooper
<cooperb@yahoo-inc.com> wrote:
> Yes, 0.5 is significantly faster. I have uploaded a new PDF of the slides, so if you
> grab it again and look at slides 16 and 17 you'll see some direct comparisons of 0.4.2 and
> 0.5. The older slides (9 and 10) still reflect 0.4.2. At some point I'll replace those with
> data from version 0.5, as well as update the paper; I just haven't gotten to it yet...
>
> Brian
>
> -----Original Message-----
> From: Ian Holsman [mailto:ian@holsman.net]
> Sent: Thursday, February 04, 2010 2:40 PM
> To: cassandra-user@incubator.apache.org
> Subject: Re: Cassandra versus HBase performance study
>
> Hi Brian.
> Were there any performance changes on the other tests with v0.5?
> The graphs on the other pages look remarkably identical.
>
> On Feb 4, 2010, at 11:45 AM, Brian Frank Cooper wrote:
>
>> 0.5 does seem to be significantly faster - the latency is better and it provides
>> significantly more throughput. I'm updating my charts with new values now.
>>
>> One thing that is puzzling is the scan performance. The scan experiment is to scan
>> between 1 and 100 records on each request. My 6-node Cassandra cluster is only getting up
>> to about 230 operations/sec, compared to >1400 ops/sec for other systems. The latency is
>> also quite a bit higher. A chart with these results is here:
>>
>> http://www.brianfrankcooper.net/pubs/scans.png
>>
>> Is this the expected performance? I'm using the OrderPreservingPartitioner with InitialToken
>> values that should evenly partition the data (and the amount of data in /var/cassandra/data
>> is about the same on all servers). I'm using get_range_slice() from Java (code snippet below).
>>
>> At the max throughput (230 ops/sec), when latency is over 1.2 sec, CPU usage varies
>> from ~5% to ~72% on different boxes. Disk busy varies from 60% to 90% (and the machine with
>> the busiest disk is not the one with the highest CPU usage). Network utilization (eth0 %util
>> both in and out) varies from 15% to 40% on different boxes. So clearly there is some imbalance
>> (and the workload itself is skewed via a Zipfian distribution), but I'm surprised that the
>> latencies are so high even in this case.
>>
>> Code snippet - fields is a Set<String> listing the columns I want; recordcount
>> is the number of records to return.
>>
>> SlicePredicate predicate;
>> if (fields==null)
>> {
>>       predicate = new SlicePredicate(null,new SliceRange(new byte[0], new byte[0],false,1000000));
>> }
>> else
>> {
>>       Vector<byte[]> fieldlist=new Vector<byte[]>();
>>       for (String s : fields)
>>       {
>>               fieldlist.add(s.getBytes("UTF-8"));
>>       }
>>       predicate = new SlicePredicate(fieldlist,null);
>> }
>> ColumnParent parent = new ColumnParent("data", null);
>>
>> List<KeySlice> results = client.get_range_slice(table,parent,predicate,startkey,"",recordcount,ConsistencyLevel.ONE);
>>
>> Thanks!
>>
>> Brian
>>
>> ________________________________________
>> From: Brian Frank Cooper
>> Sent: Saturday, January 30, 2010 7:56 AM
>> To: cassandra-user@incubator.apache.org
>> Subject: RE: Cassandra versus HBase performance study
>>
>> Good idea, we'll benchmark 0.5 next.
>>
>> brian
>>
>> -----Original Message-----
>> From: Jonathan Ellis [mailto:jbellis@gmail.com]
>> Sent: Friday, January 29, 2010 1:13 PM
>> To: cassandra-user@incubator.apache.org
>> Subject: Re: Cassandra versus HBase performance study
>>
>> Thanks for posting your results; it is an interesting read and we are
>> pleased to beat HBase in most workloads. :)
>>
>> Since you originally benchmarked 0.4.2, you might be interested in the
>> speed gains in 0.5.  A couple graphs here:
>> http://spyced.blogspot.com/2010/01/cassandra-05.html
>>
>> 0.6 (beta in a few weeks?) is looking even better. :)
>>
>> -Jonathan
>
> --
> Ian Holsman
> Ian@Holsman.net
>
>
>
>
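
For reference, a rough sketch of where the keys cache fraction discussed in this thread would be set. It assumes the 0.5-era storage-conf.xml layout, in which KeysCachedFraction is an element inside a Keyspace block. The keyspace name "usertable" is a placeholder (hypothetical), the column family name "data" is taken from the code snippet in the thread, and 0.2 is simply the value Jonathan suggests. Check the storage-conf.xml shipped with your version before relying on this.

```xml
<Keyspaces>
  <!-- "usertable" is a placeholder keyspace name, not taken from the thread -->
  <Keyspace Name="usertable">
    <!-- Fraction of keys per SSTable whose locations are kept in memory.
         This caches key locations only, not column values.
         The default discussed in the thread is 0.01; 0.2 is the suggested value. -->
    <KeysCachedFraction>0.2</KeysCachedFraction>
    <!-- "data" matches the ColumnParent used in the code snippet -->
    <ColumnFamily CompareWith="BytesType" Name="data"/>
  </Keyspace>
</Keyspaces>
```

Since the value is a fraction of keys rather than a byte budget, its memory cost grows with the number of keys per SSTable, so a higher setting should be weighed against the JVM heap size (6 GB in the setup described above).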
