cassandra-user mailing list archives

From: Huming Wu <>
Subject: Cassandra performance
Date: Mon, 17 Aug 2009 19:14:41 GMT
I did some performance tests and I am not impressed :). The data set is
880K unique keys, each with 4 columns: 2 strings and 2 integers (from
the client side; to the backend it is all byte[]). After a very fast
high-throughput insert phase, 220MB were injected via batch_insert. I
then restarted Cassandra and started a client calling get_slice at
5000 rps over 100 connections (a simplified sketch of the client loop
is below the graph list). Here are some graphs over 2 days:

[Graphs: 1. rps/qps, 2. latency, 3. CPU, 4. mem]
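
In case it helps, the client driver is essentially a paced loop like
the sketch below. doRead() is a stand-in for the actual Thrift
get_slice call (its exact signature depends on the trunk revision),
and the "key" + number naming is just illustrative:

import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

public class GetSliceLoadGen {
    static final int CONNECTIONS = 100;  // one worker thread per connection
    static final int TARGET_RPS = 5000;  // pacing target
    static final int NUM_KEYS = 880000;  // unique keys in the data set

    static final AtomicLong count = new AtomicLong();
    static final AtomicLong latencyNanos = new AtomicLong();

    // Stand-in for the real Thrift get_slice call; stubbed out here
    // because the method signature varies across trunk revisions.
    static void doRead(String key) {
        // client.get_slice(...) goes here
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(CONNECTIONS);
        Random rnd = new Random();
        long intervalNanos = 1000000000L / TARGET_RPS; // ~200us between requests
        long next = System.nanoTime();
        while (true) {
            while (System.nanoTime() < next) { } // spin until the next slot
            next += intervalNanos;
            final String key = "key" + rnd.nextInt(NUM_KEYS); // hypothetical key scheme
            pool.execute(new Runnable() {
                public void run() {
                    long t0 = System.nanoTime();
                    doRead(key);
                    latencyNanos.addAndGet(System.nanoTime() - t0);
                    long n = count.incrementAndGet();
                    if (n % TARGET_RPS == 0) { // report every TARGET_RPS requests
                        System.out.println("avg latency (us): "
                                + latencyNanos.get() / n / 1000);
                    }
                }
            });
        }
    }
}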

A couple of observations:

a) Reads are too CPU-intensive. With the actual peak around 3000 rps,
CPU usage is already at 70%; scaling that linearly would saturate the
CPU somewhere around 4300 rps, so I doubt I can double the rps and
keep the same read latency.
b) The memory footprint is too big given the data size. I used the
incremental GC. I am pretty new to Java, especially performance
tuning, so maybe something is off in my settings. But here is the JVM
config:

-Xmx6000m -Xms6000m -XX:+HeapDumpOnOutOfMemoryError -XX:NewSize=1000m
-XX:MaxNewSize=1000m -XX:SurvivorRatio=8 -XX:+UseConcMarkSweepGC
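
For what it's worth, my reading of those flags (corrections welcome,
since I'm new to this):

-Xmx6000m -Xms6000m              # fixed 6GB heap (max == initial, no resizing)
-XX:+HeapDumpOnOutOfMemoryError  # dump the heap on OOM for post-mortem analysis
-XX:NewSize=1000m -XX:MaxNewSize=1000m   # fixed 1GB young generation
-XX:SurvivorRatio=8              # eden is 8x the size of each survivor space
-XX:+UseConcMarkSweepGC          # concurrent mark-sweep for the old generation

One caveat: as far as I can tell, -XX:+UseConcMarkSweepGC alone is
plain CMS, not the incremental collector; incremental CMS would also
need -XX:+CMSIncrementalMode.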

The machines have 8 cores and 8GB RAM. Here are some configuration
parameters (the client is doing non-blocking get_slice):

Performance under high throughput is very important to us. I did some
preliminary tests on sustained puts and gets and the performance was
worse, but I thought I'd start the report with reads only.

Any comments on those numbers?


P.S. I am using trunk as of Aug. 12:

svn info
Path: .
Repository Root:
Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
Revision: 803947
Node Kind: directory
Schedule: normal
Last Changed Author: jbellis
Last Changed Rev: 803716
Last Changed Date: 2009-08-12 21:27:24 +0000 (Wed, 12 Aug 2009)
