cassandra-user mailing list archives

From André Cruz <>
Subject Advice on architecture
Date Tue, 27 Mar 2012 17:10:04 GMT

I'm developing a system that will require me to store large (<=4MB) columns in Cassandra.
Right now I'm storing one column per row, in a single CF. The machines I have at my disposal
are 32GB RAM machines with 10 SATA drives each. I would prefer a larger number of smaller
nodes, but this is what I have to work with. The issues I'm weighing are: RAID0 vs. separate
data dirs, and SizeTiered vs. Leveled compaction. I will have approximately twice as many
writes as reads.

RAID0 would help me use the total disk space at each node more efficiently, but tests have
shown that under write load it behaves much worse than using separate data dirs, one per
disk. I used a 3-node cluster, and the RAID0 node kept falling behind the other two nodes,
which had separate data dirs. The problem with separate data dirs is that it seems difficult
for Cassandra to use the space efficiently because of compactions. I first tried the new
Leveled compaction strategy, which seemed promising since it creates "small" SSTables that
can be spread across the data dirs, but the IO this strategy requires under write load is
enormous. It was compacting constantly, and that hurt write throughput because it slowed
the flushing of memtables. I then tried SizeTiered compaction, which performed better, but
since it tends to create large SSTables, those cannot be split across the multiple data
dirs.
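
For reference, the separate-data-dirs layout I'm testing looks roughly like this (a minimal
sketch of the relevant cassandra.yaml settings; the mount points are just examples, one per
physical disk):

```yaml
# cassandra.yaml -- one data directory per SATA disk instead of one RAID0 volume.
# Paths are illustrative; there would be one entry per disk (10 in total).
data_file_directories:
    - /data1/cassandra/data
    - /data2/cassandra/data
    - /data3/cassandra/data
commitlog_directory: /commitlog/cassandra   # ideally on its own spindle
```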

What I'm thinking of doing now is using multiple data dirs with SizeTiered compaction, and
dividing the input data across several (64) different CFs. This way smaller SSTables will
be created, and these can be split across the multiple data dirs. This should let me make
better use of the available capacity, and I will not need as much free space for compactions
as I would if the SSTables were larger.
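
The splitting scheme I have in mind can be sketched like this (a minimal illustration on
the client side; the CF naming and the md5-based choice are assumptions of mine, not
anything Cassandra-specific):

```python
import hashlib

# Hypothetical sketch: shard rows across 64 column families so that each
# CF's SSTables stay small enough to be spread over the data dirs.
CF_COUNT = 64

def cf_for_key(row_key: str) -> str:
    """Pick one of the 64 column families by hashing the row key."""
    h = int(hashlib.md5(row_key.encode("utf-8")).hexdigest(), 16)
    return "blobs_%02d" % (h % CF_COUNT)
```

Every read and write for a given row key would then go to the CF returned by
cf_for_key, so the mapping stays deterministic across the cluster.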

Am I missing something here? Is this the best way to deal with this (abnormal) use case?

Thanks and best regards,
André Cruz