incubator-cassandra-user mailing list archives

From "Hiller, Dean" <Dean.Hil...@nrel.gov>
Subject Re: What is the ideal value for sstable_size_in_mb when using LeveledCompactionStrategy ?
Date Wed, 18 Sep 2013 19:15:55 GMT
 1.  With Cassandra, always raise your file descriptor limits on Linux; even in 0.7 that was the recommendation, so that Cassandra can open a very large number of files.
 2.  We use 50 MB for our LCS sstable size with no performance issues. We had it at 10 MB before, also with no issues, but of course with a huge number of files given our 300T per node.
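One way to check the first point on a running node is to read the "Max open files" row of `/proc/<pid>/limits` for the Cassandra JVM. The Python sketch below only parses that text; the sample input and the 100000 threshold are illustrative assumptions, not figures from this thread.

```python
from typing import Optional, Tuple


def _limit_value(raw: str) -> Optional[int]:
    """Convert a limits field to an int; 'unlimited' becomes None."""
    return None if raw == "unlimited" else int(raw)


def parse_max_open_files(limits_text: str) -> Tuple[Optional[int], Optional[int]]:
    """Return the (soft, hard) 'Max open files' limits from /proc/<pid>/limits text.

    Raises ValueError if the row is missing.
    """
    prefix = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(prefix):
            # Remaining columns: <soft> <hard> <units>
            fields = line[len(prefix):].split()
            return _limit_value(fields[0]), _limit_value(fields[1])
    raise ValueError("no 'Max open files' row found")


def fd_limit_ok(limits_text: str, minimum: int = 100_000) -> bool:
    """True if the soft open-file limit is unlimited or at least `minimum`.

    The default threshold is an assumption chosen for illustration.
    """
    soft, _ = parse_max_open_files(limits_text)
    return soft is None or soft >= minimum


# Hypothetical sample; on a real node read the file for the Cassandra PID instead.
SAMPLE = """Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max open files            1024                 4096                 files
"""

if __name__ == "__main__":
    print(parse_max_open_files(SAMPLE))
    print(fd_limit_ok(SAMPLE))
```

With the sample above, the soft limit of 1024 is far below the assumed threshold, which is the situation the advice is meant to prevent.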

Dean

From: Jayadev Jayaraman <jdisalive@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Wednesday, September 18, 2013 1:02 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: What is the ideal value for sstable_size_in_mb when using LeveledCompactionStrategy?

We have set up a 24-node Cassandra cluster (m1.xlarge instances, 1.7 TB per node) on Amazon EC2:

version=1.2.9
replication factor = 2
snitch=EC2Snitch
placement_strategy=NetworkTopologyStrategy (with 12 nodes each in 2 availability zones)

Background on our use case:

We plan to use Hadoop with sstableloader to load 10 GB+ of analytics data per day (100 million+ row keys, roughly 5 columns per day on average). We chose LeveledCompactionStrategy in the hope that it limits the number of SSTables that must be read to retrieve a sliced predicate for a row. We don't want too many file descriptors (> 1000) held open on SSTables by the Cassandra JVM, as this has caused us network/unreachability issues before. We hit this on Cassandra 0.8.9 with SizeTieredCompactionStrategy; to mitigate it, we ran minor compaction daily and major compaction semi-regularly to keep as few SSTable files on disk as possible.
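As a rough check on what this load means for each node, the sketch below spreads the daily volume over the cluster using the replication factor from the setup above. It assumes balanced token ownership and ignores compression, compaction overhead and tombstones, so the result is an order-of-magnitude estimate only.

```python
def per_node_daily_ingest_gb(daily_gb: float, replication_factor: int, nodes: int) -> float:
    """Approximate GB written to each node per day.

    Every byte is stored replication_factor times, and (assuming balanced
    token ownership) those copies are spread evenly across all nodes.
    """
    if nodes <= 0 or replication_factor <= 0:
        raise ValueError("nodes and replication_factor must be positive")
    return daily_gb * replication_factor / nodes


if __name__ == "__main__":
    # Figures from the message: 10 GB/day, replication factor 2, 24 nodes.
    gb = per_node_daily_ingest_gb(10, 2, 24)
    print(f"{gb:.2f} GB per node per day")
```

Under these assumptions each node receives a little under 1 GB of new data per day, which is the volume that has to be cut into SSTables of the chosen size.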

If we use LeveledCompactionStrategy with a small value for sstable_size_in_mb (default = 5 MB), wouldn't that result in a very large number of SSTable files on disk? How does that affect the number of open file descriptors? (From the docs, I get the impression that the number of SSTable seeks per query is reduced by a large margin.) But if we use a larger value for sstable_size_in_mb, say around 200 MB, there will be 800 MB of small uncompacted SSTables on disk per column family, and file descriptors will inevitably be open on them.
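The trade-off in this question can be put into rough numbers. The sketch below estimates how many SSTables a node would hold for a given data size and sstable_size_in_mb, and how many levels leveled compaction would need, assuming the commonly described LCS layout in which level 1 holds 10 SSTables and each further level is ten times larger. It ignores level 0 and compression, uses the 1.7 TB per-node disk size as an upper bound on data, and counts SSTables rather than open files, since each SSTable consists of several component files.

```python
import math


def lcs_estimate(data_mb: float, sstable_size_mb: float, fanout: int = 10):
    """Estimate (sstable_count, levels) for leveled compaction.

    Assumes level i (i >= 1) holds up to fanout**i SSTables of
    sstable_size_mb each; level 0 is not modelled.
    """
    if data_mb <= 0 or sstable_size_mb <= 0:
        raise ValueError("sizes must be positive")
    sstables = math.ceil(data_mb / sstable_size_mb)
    levels, capacity = 0, 0
    # Add levels until their combined capacity covers every SSTable.
    while capacity < sstables:
        levels += 1
        capacity += fanout ** levels
    return sstables, levels


if __name__ == "__main__":
    node_data_mb = 1_700_000  # 1.7 TB per node, treated as an upper bound
    for size_mb in (5, 50, 200):
        count, levels = lcs_estimate(node_data_mb, size_mb)
        print(f"sstable_size_in_mb={size_mb}: ~{count} SSTables, {levels} levels")
```

Under these assumptions, 5 MB gives about 340000 SSTables across 6 levels, 50 MB about 34000 across 5, and 200 MB about 8500 across 4. Because a row can appear in at most one SSTable per level above level 0, a tenfold larger SSTable size cuts the file count tenfold but removes only about one level from the worst-case read path.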

All in all, can someone help us figure out what we should set sstable_size_in_mb to? I suspect it's not a good idea to set it to a large value, but I don't know how things perform if we set it to a small value. Do we have to run major compaction regularly in this case too?
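Whatever value is chosen, it is set per table. The sketch below only builds a CQL 3 `ALTER TABLE` statement, assuming the `compaction` map syntax with `class` and `sstable_size_in_mb` keys; the keyspace and table names are hypothetical placeholders, the 50 MB value is taken from the reply above, and the statement would still have to be run through cqlsh or a driver.

```python
import re

# Plain unquoted CQL identifiers only; this sketch does no quoting or escaping.
_IDENTIFIER = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def lcs_alter_statement(keyspace: str, table: str, sstable_size_mb: int) -> str:
    """Build an ALTER TABLE statement that sets LeveledCompactionStrategy
    with the given sstable_size_in_mb."""
    for name in (keyspace, table):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    if sstable_size_mb <= 0:
        raise ValueError("sstable_size_mb must be positive")
    return (
        f"ALTER TABLE {keyspace}.{table} WITH compaction = "
        f"{{'class': 'LeveledCompactionStrategy', "
        f"'sstable_size_in_mb': {sstable_size_mb}}};"
    )


if __name__ == "__main__":
    # 'analytics' and 'daily_events' are hypothetical names.
    print(lcs_alter_statement("analytics", "daily_events", 50))
```

Validating identifiers instead of interpolating arbitrary strings keeps the sketch from producing malformed CQL when given unexpected names.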

Thanks
Jayadev
