cassandra-user mailing list archives

From Jonathan Ellis <>
Subject Re: Stress testing disk configurations. Your thoughts?
Date Mon, 18 Apr 2011 13:24:48 GMT
Separate commitlog matters the most when you are

(a) doing mixed read/write workload (i.e. most real-world scenarios) and
(b) using full CL durability (batch mode rather than default periodic sync)
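The sync behavior in (b) is controlled in cassandra.yaml; roughly like this in the 0.7-era file (values shown are illustrative):

```yaml
# Periodic mode (the default): fsync the commitlog every N ms.
# A crash can lose up to commitlog_sync_period_in_ms of acked writes.
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000

# Batch mode: a write is not acked until the commitlog is fsynced.
# Cassandra waits up to the batch window to group writes per sync --
# this is the mode where a dedicated commitlog spindle pays off most.
# commitlog_sync: batch
# commitlog_sync_batch_window_in_ms: 50
```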

If your hot data set fits in memory, reads are about as fast as
writes. Otherwise they will be substantially slower since they have to
do random i/o.
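A back-of-the-envelope sketch of why random i/o dominates: a single 10K RPM SAS spindle services on the order of 100-150 random reads per second (a rule-of-thumb figure assumed here, not a measured one), so a cold read workload is capped by seek rate no matter how fast writes go:

```python
def random_read_ceiling(disks, iops_per_disk=125):
    """Rough ceiling on random reads/sec for a set of spindles.

    iops_per_disk is a rule-of-thumb figure for a 10K RPM SAS drive,
    assumed for illustration rather than measured.
    """
    return disks * iops_per_disk

# With the 5 data disks from the configurations in the quoted message:
print(random_read_ceiling(5))  # 625
```

If the hot set fits in the page cache, that ceiling never comes into play, which is why in-memory reads keep pace with writes.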

I definitely recommend #2 over #3, btw.
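For concreteness, options #2 and #3 from the quoted message differ only in how the data disks appear in cassandra.yaml; Cassandra spreads sstables across whatever directories are listed. Mount points below are hypothetical:

```yaml
# Option 2: commitlog on the OS disk, one 5-disk RAID 0 data volume.
commitlog_directory: /var/lib/cassandra/commitlog
data_file_directories:
    - /mnt/raid0/cassandra/data

# Option 3: commitlog on the OS disk, five separate data directories.
# commitlog_directory: /var/lib/cassandra/commitlog
# data_file_directories:
#     - /mnt/d1/cassandra/data
#     - /mnt/d2/cassandra/data
#     - /mnt/d3/cassandra/data
#     - /mnt/d4/cassandra/data
#     - /mnt/d5/cassandra/data
```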

On Thu, Apr 14, 2011 at 11:34 AM, Nathan Milford <> wrote:
> Ahoy,
> I'm building out a new 0.7.4 cluster to migrate our 0.6.6 cluster to.
> While I'm waiting for the dev-side to get time to work on their side of the
> project I have a 10 node cluster evenly split across two data centers (NY &
> LA) and was looking to do some testing while I could.
> My primary focus is on disk configurations.  Space isn't a huge issue, our
> current data set is ~30G on each node and I imagine that'll go up since I
> intend on tweaking the RF on the new cluster.
> Each node has 6 x 146G 10K SAS drives.  I want to test:
> 1) 6 disks in R0 where everything is written to the same stripe
> 2) 1 disk for OS+Commitlog and 5 disks in R0 for data.
> 3) 1 disk for OS+Commitlog and 5 individual disks defined
> as separate data_file_directories.
> I suspect I'll see best performance with option 3, but the issue has become
> political/religious and there are internal doubts that separating the commit
> log and data will truly improve performance despite documentation and logic
> indicating otherwise.  Thus the test :)
> Right now I've been tinkering and not being very scientific while I work out
> a testing methodology and get used to the tools.  I've just been running
> zznate's cassandra-stress against a single node and measuring the time it
> takes to read and write N rows.
> Unscientifically I've found that they all perform about the same. It's hard
> to judge because, when writing to a single node, reads take an order of
> magnitude longer than writes.  Writing 10M rows may take ~500 seconds, but
> reading them back takes ~5000 seconds.  I'm sure this will even out when I
> test across more than one node.
> Early next week I'll be able to test against all 10 nodes with a realistic
> replication factor.
> I'd really love to hear some people's thoughts on methodology and what I
> should be looking at/for, other than iostat and the time the test takes to
> insert/read.
> Thanks,
> nathan
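One way to make runs comparable across configurations is to reduce each one to operations per second rather than wall-clock time. A minimal sketch, using the approximate single-node figures from the message above:

```python
def throughput(rows, seconds):
    """Operations per second for a stress run."""
    return rows / seconds

# Approximate figures from the single-node test quoted above.
rows = 10_000_000
write_ops = throughput(rows, 500)   # ~20,000 writes/sec
read_ops = throughput(rows, 5000)   # ~2,000 reads/sec

# Reads are ~10x slower in this single-node test.
print(write_ops, read_ops, write_ops / read_ops)  # 20000.0 2000.0 10.0
```

Tracking ops/sec (plus iostat's await and %util per device) per configuration gives numbers you can put side by side, rather than comparing total runtimes of runs that may not be identical.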

Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
