cassandra-user mailing list archives

From Huming Wu <huming...@gmail.com>
Subject Cassandra performance
Date Mon, 17 Aug 2009 19:14:41 GMT
I ran some performance tests and I am not impressed :). The data set is
880K unique keys with 4 columns each: 2 columns are strings and the
other 2 are integers (from the client side; to the backend it is all
byte[]). After a high-throughput write phase (very fast), 220MB had been
injected via batch_insert. I restarted Cassandra and started a client
calling get_slice at 5000 rps with 100 connections. Here are some
graphs over 2 days:

1. rps/qps:  http://farm3.static.flickr.com/2585/3831093496_068b90caa0_o.png
2. latency:  http://farm4.static.flickr.com/3421/3830297179_8decd66e34_o.png
3. CPU: http://farm4.static.flickr.com/3432/3831093584_b5bd459f55_o.png
4. mem: http://farm4.static.flickr.com/3526/3830356879_d09ac2695c_o.png

A couple of observations:

a) Reads are too CPU intensive. With the actual peak around 3000 rps,
CPU usage is already at 70%. I doubt I can double the rps and keep
the same read latency.
b) The memory footprint is too big given the data size. I used
incremental GC. I am pretty new to Java, especially to performance
tuning, so maybe something is not right in the settings. Here is
the JVM config:

-Xmx6000m -Xms6000m -XX:+HeapDumpOnOutOfMemoryError -XX:NewSize=1000m
-XX:MaxNewSize=1000m -XX:SurvivorRatio=8 -XX:+UseConcMarkSweepGC
-XX:+CMSIncrementalMode
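
For reference, here is a rough sketch of the heap layout those flags imply, set against the 220MB of injected data. It assumes HotSpot's usual meaning of SurvivorRatio (eden size divided by the size of one survivor space); the function name heap_layout_mb is only illustrative, and actual sizes may differ slightly because the JVM aligns generation boundaries.

```python
# Back-of-envelope heap layout from the flags above (all sizes in MB).
# Assumption: SurvivorRatio = eden / one survivor space, with two survivor
# spaces in the young generation, so young = eden + 2 * survivor.

def heap_layout_mb(heap_mb, new_mb, survivor_ratio):
    survivor = new_mb / (survivor_ratio + 2)  # one survivor space
    eden = new_mb - 2 * survivor              # rest of the young generation
    old = heap_mb - new_mb                    # old generation managed by CMS
    return {"eden": eden, "survivor_each": survivor, "old": old}

layout = heap_layout_mb(heap_mb=6000, new_mb=1000, survivor_ratio=8)
print(layout)

# Ratio of the fixed heap (-Xms == -Xmx) to the injected data size.
heap_to_data = 6000 / 220
print("heap / injected data: %.1fx" % heap_to_data)
```

Since -Xms equals -Xmx, the heap size is fixed at 6000MB regardless of how much data is loaded, so the heap size alone says little about how much memory the data actually needs.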

The machines have 8 cores and 8GB of RAM. Here are some configuration
parameters (the client is doing non-blocking get_slice):
    <ReplicationFactor>2</ReplicationFactor>
    <MemtableSizeInMB>1024</MemtableSizeInMB>
    <MemtableObjectCountInMillions>2</MemtableObjectCountInMillions>
    <KeysCachedFraction>1</KeysCachedFraction>
    <ConcurrentReads>8</ConcurrentReads>
    <ConcurrentWrites>32</ConcurrentWrites>
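
The numbers above can be turned into a rough per-read budget. This is only a sketch under loud assumptions: that the 70% CPU figure is averaged over all 8 cores, that all of it is spent on reads, that CPU cost scales linearly with rps, and that each of the 8 ConcurrentReads threads serves one read at a time. The function names are made up for illustration.

```python
# Rough per-read budgets derived from the reported numbers.
# Assumptions (not from measurements): CPU usage is averaged over all cores,
# is entirely due to reads, and grows linearly with request rate.

def cpu_ms_per_read(cores, cpu_util, rps):
    # Core-milliseconds of CPU consumed per read.
    return cores * cpu_util / rps * 1000.0

def linear_max_rps(cpu_util, rps):
    # Request rate at which CPU would reach 100% under linear scaling.
    return rps / cpu_util

def service_time_budget_ms(threads, rps):
    # Little's law: the mean time a read may occupy a thread before
    # `threads` concurrent reads can no longer sustain `rps`.
    return threads / rps * 1000.0

cpu_cost = cpu_ms_per_read(cores=8, cpu_util=0.70, rps=3000)
ceiling = linear_max_rps(cpu_util=0.70, rps=3000)
budget_3000 = service_time_budget_ms(threads=8, rps=3000)
budget_5000 = service_time_budget_ms(threads=8, rps=5000)

print("CPU per read: %.2f core-ms" % cpu_cost)
print("rps at 100%% CPU (linear): %.0f" % ceiling)
print("per-read budget at 3000 rps: %.2f ms" % budget_3000)
print("per-read budget at 5000 rps: %.2f ms" % budget_5000)
```

Under these assumptions the CPU would saturate below the 5000 rps target, which is consistent with the observed peak of around 3000 rps.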

Performance under high throughput is very important to us. I did
some preliminary tests on sustained put and get, and the performance is
worse. But I thought I would start the report with read-only first.

Any comments on those numbers?

Thanks,
Huming

p.s. I am using trunk as of Aug. 12

svn info
Path: .
URL: https://svn.apache.org/repos/asf/incubator/cassandra/trunk
Repository Root: https://svn.apache.org/repos/asf
Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
Revision: 803947
Node Kind: directory
Schedule: normal
Last Changed Author: jbellis
Last Changed Rev: 803716
Last Changed Date: 2009-08-12 21:27:24 +0000 (Wed, 12 Aug 2009)
