cassandra-user mailing list archives

From Ian Holsman <...@holsman.net>
Subject Re: Cassandra versus HBase performance study
Date Thu, 04 Feb 2010 22:40:22 GMT
Hi Brian.
Were there any performance changes on the other tests with v0.5?
The graphs on the other pages look remarkably identical.

On Feb 4, 2010, at 11:45 AM, Brian Frank Cooper wrote:

> 0.5 does seem to be significantly faster - the latency is better and it
> provides significantly more throughput. I'm updating my charts with new
> values now.
> 
> One thing that is puzzling is the scan performance. The scan experiment
> is to scan between 1-100 records on each request. My 6 node Cassandra
> cluster is only getting up to about 230 operations/sec, compared to >1400
> ops/sec for other systems. The latency is quite a bit higher. A chart
> with these results is here:
> 
> http://www.brianfrankcooper.net/pubs/scans.png
> 
> Is this the expected performance? I'm using the OrderPreservingPartitioner
> with InitialToken values that should evenly partition the data (and the
> amount of data in /var/cassandra/data is about the same on all servers).
> I'm using get_range_slice() from Java (code snippet below).
> 
> At the max throughput (230 ops/sec), when latency is over 1.2 sec, CPU
> usage varies from ~5% to ~72% on different boxes. Disk busy varies from
> 60% to 90% (and the machine with the busiest disk is not the one with
> highest CPU usage.) Network utilization (eth0 %util both in and out)
> varies from 15%-40% on different boxes. So clearly there is some imbalance
> (and the workload itself is skewed via a Zipfian distribution) but I'm
> surprised that the latencies are so high even in this case.
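
As a rough cross-check on the figures quoted above, Little's law (average requests in flight = throughput x average latency) gives a sense of how many scans were outstanding at peak. This is only a sketch: the class and method names are made up for illustration, and 1.2 sec is treated as the average latency even though the message says only that latency was "over" that value, so the result is a lower bound.

```java
public class InFlightEstimate {
    // Little's law: average requests in flight = throughput * average latency.
    static double inFlight(double opsPerSec, double latencySec) {
        return opsPerSec * latencySec;
    }

    public static void main(String[] args) {
        double opsPerSec = 230.0;  // peak scan throughput quoted in the message
        double latencySec = 1.2;   // assumption: used as the average latency (a lower bound)
        System.out.printf("approx. requests in flight: %.0f%n",
                inFlight(opsPerSec, latencySec));
    }
}
```

With the quoted numbers this works out to roughly 276 scans in flight across the cluster, assuming the load is steady.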
> 
> Code snippet - fields is a Set<String> listing the columns I want;
> recordcount is the number of records to return.
> 
> SlicePredicate predicate;
> if (fields==null)
> {
> 	// No field list given: slice every column of each row (up to 1000000)
> 	predicate = new SlicePredicate(null,new SliceRange(new byte[0], new byte[0],false,1000000));
> }
> else
> {
> 	// Restrict the slice to the named columns only
> 	Vector<byte[]> fieldlist=new Vector<byte[]>();
> 	for (String s : fields)
> 	{
> 		fieldlist.add(s.getBytes("UTF-8"));
> 	}
> 	predicate = new SlicePredicate(fieldlist,null);
> }
> ColumnParent parent = new ColumnParent("data", null);
> 
> // Scan up to recordcount rows starting at startkey; the empty finish key leaves the range open-ended
> List<KeySlice> results = client.get_range_slice(table,parent,predicate,startkey,"",recordcount,ConsistencyLevel.ONE);
> 
> Thanks!
> 
> Brian
> 
> ________________________________________
> From: Brian Frank Cooper
> Sent: Saturday, January 30, 2010 7:56 AM
> To: cassandra-user@incubator.apache.org
> Subject: RE: Cassandra versus HBase performance study
> 
> Good idea, we'll benchmark 0.5 next.
> 
> brian
> 
> -----Original Message-----
> From: Jonathan Ellis [mailto:jbellis@gmail.com]
> Sent: Friday, January 29, 2010 1:13 PM
> To: cassandra-user@incubator.apache.org
> Subject: Re: Cassandra versus HBase performance study
> 
> Thanks for posting your results; it is an interesting read and we are
> pleased to beat HBase in most workloads. :)
> 
> Since you originally benchmarked 0.4.2, you might be interested in the
> speed gains in 0.5.  A couple graphs here:
> http://spyced.blogspot.com/2010/01/cassandra-05.html
> 
> 0.6 (beta in a few weeks?) is looking even better. :)
> 
> -Jonathan

--
Ian Holsman
Ian@Holsman.net



