incubator-cassandra-user mailing list archives

From Benjamin Black...@b3k.us>
Subject Re: Cassandra performance
Date Sat, 18 Sep 2010 01:46:06 GMT
It appears you are doing several things that all but guarantee
terrible performance, so I am not surprised you are getting it.

On Tue, Sep 14, 2010 at 3:40 PM, Kamil Gorlo <kgs4242@gmail.com> wrote:
> My main tool for benchmarks was stress.py (or an equivalent written
> in C++ to deal with python2.5's lack of multiprocessing). I will focus
> only on reads (random with normal distribution, which is the default
> in stress.py) because writes were /quite/ good.
>
> I have 8 machines (Xen guests with a dedicated pair of 2TB SATA disks
> combined in RAID-0 for every guest). Every machine has 4 individual
> cores of 2.4 GHz and 4GB RAM.
>

First problem: I/O in Xen is very poor and Cassandra is generally very
sensitive to I/O performance.

> Cassandra commitlog and data dirs were on the same disk,

This is not recommended if you want the best performance.  You should
have a dedicated commitlog drive.

> I gave 2.5GB
> for Heap for Cassandra, key and row caches were disabled (standard
> Keyspace1 schema, all tests use Standard1 CF).
> All other options were
> defaults. I've disabled the caches because I was testing random (or
> semi random - normal distribution) reads so they wouldn't help so much
> (and also because 4GB of RAM is not a lot).
>

Disabling row cache in this case makes sense, but disabling key cache
is probably hurting your performance quite a bit.  If you wrote 20GB
of data per node, with narrow rows as you describe, and had default
memtable settings, you now have a huge number of sstables on disk.
You did not indicate that you ran nodetool compact to trigger a major
compaction, so I'm assuming you did not.

> For first test I installed Cassandra on only one machine to test it
> and remember results for further comparisons with large cluster and
> other DBs.
>
> 1) RF was set to 1. I've inserted ~20GB of data (this is the number
> reported in the load column from nodetool ring output) using stress.py
> (100 columns per row). Then I've tested reads and got 200 rows/second
> (reading 100 columns per row, CL=ONE, disks were bottleneck, util was
> 100%). There was no other operation pending during reads (compaction,
> insertion, etc..).
>

This is normal behavior under random reads for _any_ database.  If
the dataset can't fit in RAM, you are I/O bound.  I don't know why you
would expect anything else.  You did not indicate your disk access
mode, but if it is mmap and you are not using code that calls
mlockall, then with that size dataset you are almost certainly
swapping, as well.  You can check that with vmstat.
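
As a rough sketch of that check (not part of any Cassandra tooling):
the helper names below are made up for illustration, and the parsing
assumes the usual Linux vmstat layout with "si" and "so" columns for
memory swapped in and out.  Sustained non-zero values there while the
read test runs would suggest the node is swapping.

```python
import subprocess


def sample_vmstat(interval=1, count=5):
    """Run `vmstat <interval> <count>` and return its raw text output."""
    result = subprocess.run(
        ["vmstat", str(interval), str(count)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def swap_activity(vmstat_output):
    """Return a list of (si, so) pairs, one per sample row in the output."""
    rows = [line.split() for line in vmstat_output.strip().splitlines()]
    # The column-name row is the one that contains both "si" and "so".
    header = next(row for row in rows if "si" in row and "so" in row)
    si_idx, so_idx = header.index("si"), header.index("so")
    samples = []
    for row in rows:
        # Sample rows have one integer per column; anything else is a header.
        if len(row) == len(header) and all(tok.isdigit() for tok in row):
            samples.append((int(row[si_idx]), int(row[so_idx])))
    return samples


def is_swapping(samples):
    """True if any sample after the first shows swap-in or swap-out."""
    # vmstat's first report is an average since boot, so skip it when
    # later samples are available.
    recent = samples[1:] if len(samples) > 1 else samples
    return any(si > 0 or so > 0 for si, so in recent)
```

Something like is_swapping(swap_activity(sample_vmstat())) run during
the benchmark gives a yes/no answer; watching the si and so columns
by eye works just as well.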

Given the combination of very little RAM relative to the dataset, very
limited disk I/O capacity, key caching disabled, a large number of
sstables, and likely mmap I/O without mlockall, you have created about
the worst possible setup.  If you are _actually_ dealing with that
much data AND random reads, then you either need enough RAM to hold it
all, or you need SSDs.  And that is not specific to Cassandra.
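
A back-of-the-envelope sketch of why roughly 200 rows/second is about
what that hardware can deliver.  The disk count, RAM, heap, and data
size come from the quoted message; the ~100 random IOPS per SATA disk
and the one-seek-per-read cost are rule-of-thumb assumptions, not
measurements, and the function names are made up for illustration.
The hit ratio also treats reads as uniformly random, whereas stress.py
used a normal distribution, so the real figure may differ.

```python
def page_cache_hit_ratio(ram_gb, heap_gb, data_gb):
    """Rough hit ratio for uniformly random reads: the share of the
    dataset that fits in the RAM left over after the JVM heap."""
    cache_gb = max(ram_gb - heap_gb, 0.0)
    return min(cache_gb / data_gb, 1.0)


def random_read_ceiling(disks, iops_per_disk, seeks_per_read,
                        cache_hit_ratio=0.0):
    """Upper bound on rows/second when every cache miss costs seeks."""
    disk_reads_per_sec = disks * iops_per_disk / seeks_per_read
    miss_ratio = 1.0 - cache_hit_ratio
    if miss_ratio <= 0.0:
        return float("inf")  # everything is served from memory
    return disk_reads_per_sec / miss_ratio


# Figures from the quoted message: 4GB RAM, 2.5GB heap, ~20GB of data,
# two SATA disks in RAID-0.
hit = page_cache_hit_ratio(ram_gb=4.0, heap_gb=2.5, data_gb=20.0)

# ASSUMPTIONS: ~100 random IOPS per disk and one seek per row read.
ceiling = random_read_ceiling(disks=2, iops_per_disk=100,
                              seeks_per_read=1, cache_hit_ratio=hit)

print(f"estimated cache hit ratio: {hit:.3f}")
print(f"estimated read ceiling: {ceiling:.0f} rows/second")
```

Under those assumptions the ceiling lands a little above 200
rows/second.  Raising seeks_per_read, for example to account for an
extra index lookup when the key cache is off, only lowers it.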

If you are saying you similarly misconfigured MySQL and still got
better performance, then kudos.  You are very lucky.


b
