incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Freeing up disk space on Cassandra 1.1.5 with Size-Tiered compaction.
Date Fri, 23 Nov 2012 01:16:05 GMT
> From what I know, having too much data on one node is bad. I'm not really sure why, but I
> think that performance will go down due to the size of indexes and bloom filters (I may be
> wrong on the reasons, but I'm quite sure you can't store too much data per node).
If you have many hundreds of millions of rows on a node the memory needed for bloom filters
and index sampling can be significant. These can both be tuned. 
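
For example (the CF name and values here are only illustrative), the bloom filter false positive chance can be raised per CF from cassandra-cli:

    update column family MyCF with bloom_filter_fp_chance = 0.1;

and index sampling can be loosened in conf/cassandra.yaml:

    # default is 128; a higher value uses less heap at the cost of slightly slower reads
    index_interval: 512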

If you have 1.1TB per node, the time to do a compaction, repair or upgrade may be very significant.
Also, the time taken to copy this data should you need to remove or replace a node may be prohibitive.


> 2. Switch to Leveled compaction strategy.
I would avoid making a change like that on an unstable / at risk system. 

> - Our usage pattern is write once, read once (export) and delete once!

The column TTL may be of use to you; it removes the need to do a delete.
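
For example (names below are placeholders, and this assumes a CF with UTF8 keys/values; your client API exposes the same option on writes), from cassandra-cli:

    use MyKeyspace;
    set MyCF['some-row-key']['payload'] = 'exported event data' with ttl = 86400;

After 86400 seconds the column expires on its own and the space is reclaimed by a later compaction, with no delete to issue.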

> - We were thinking of relying on the automatic minor compactions to free up space for
> us but as..
There are some usage patterns which make life harder for Size-Tiered compaction. For example, if you have very
long-lived rows that are written to and deleted a lot, row fragments that have been around
for a while will end up in bigger files, and those files get compacted less often.

In this situation, if you are running low on disk space and you think there is a lot of deleted
data in there, I would run a major compaction. A word of warning though: if you do this you will
need to continue to do it regularly. Major compaction creates a single big file that will
not get compacted often. There are ways to resolve this, and moving to Leveled compaction may help in the
future.
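
If you go that way it is one nodetool call per node (keyspace and CF names below are placeholders):

    nodetool -h <host> compact MyKeyspace MyCF

You can watch progress with nodetool compactionstats and check the result with nodetool cfstats.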

If you are stuck and worried about disk space it's what I would do. Once you are stable again,
then look at Leveled compaction: http://www.datastax.com/dev/blog/when-to-use-leveled-compaction
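
For reference, the switch itself is a per-CF change, e.g. from cassandra-cli (the CF name and sstable size below are only illustrative):

    update column family MyCF
      with compaction_strategy = 'LeveledCompactionStrategy'
      and compaction_strategy_options = {sstable_size_in_mb: 10};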

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 23/11/2012, at 9:18 AM, Alain RODRIGUEZ <arodrime@gmail.com> wrote:

> Hi Alexandru,
> 
> "We are running a 3 node Cassandra 1.1.5 cluster with a 3TB Raid 0 disk per node for
the data dir and separate disk for the commitlog, 12 cores, 24 GB RAM"
> 
> I think you should tune your architecture in a very different way. From what I know, having
> too much data on one node is bad. I'm not really sure why, but I think that performance will
> go down due to the size of indexes and bloom filters (I may be wrong on the reasons, but I'm
> quite sure you can't store too much data per node).
> 
> Anyway, I think 6 nodes with half of these resources (6 cores / 12GB each) would be better, if
> you have the choice.
> 
> "(12GB to Cassandra heap)."
> 
> The max heap recommended is 8GB, because if you use more than these 8GB the GC will
> start decreasing your performance.
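> 
> For example (the exact values are only illustrative), in conf/cassandra-env.sh:
> 
>     # cap the heap at 8GB even though the box has 24GB of RAM
>     MAX_HEAP_SIZE="8G"
>     HEAP_NEWSIZE="800M"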
> 
> "We now have 1.1 TB worth of data per node (RF = 2)."
> 
> You should use RF=3, unless either consistency or avoiding a SPOF doesn't matter to you.
> 
> With RF=2 you are obliged to write at CL.ONE to remove the single point of failure.
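> 
> If you do raise it, something like this (the keyspace name is a placeholder, and this assumes SimpleStrategy) from cassandra-cli:
> 
>     update keyspace MyKeyspace with strategy_options = {replication_factor: 3};
> 
> followed by a "nodetool repair MyKeyspace" on each node so the new replicas get their data.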
> 
> "1. Start issuing regular major compactions (nodetool compact).
>      - This is not recommended: 
>             - Stops minor compactions.
>             - Major performance hit on node (very bad for us because need to be taking
> data all the time)."
> 
> Actually, major compaction *does not* stop minor compactions. What happens is that due
> to the size of the sstable that remains after your major compaction, it will never
> be compacted with the upcoming new sstables, and because of that, your read performance will
> go down until you run another major compaction.
> 
> "2. Switch to Leveled compaction strategy.
>       - It is mentioned to help with deletes and disk space usage. Can someone confirm?"
> 
> From what I know, Leveled compaction will not free disk space. It will allow you to use
> a greater percentage of your total disk space (50% max for size-tiered compaction vs about
> 80% for leveled compaction).
> 
> "Our usage pattern is write once, read once (export) and delete once! "
> 
> In this case, I think that leveled compaction fits your needs.
> 
> "Can anyone suggest which (if any) is better? Are there better solutions?"
> 
> Are your sstables compressed? You have 2 types of built-in compression and you may use
> them depending on the data model of each of your CFs.
> 
> see: http://www.datastax.com/docs/1.1/operations/tuning#configure-compression
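> 
> For example (the CF name is a placeholder), enabling Snappy compression from cassandra-cli:
> 
>     update column family MyCF with compression_options = {sstable_compression: SnappyCompressor, chunk_length_kb: 64};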
> 
> Alain
> 
> 2012/11/22 Alexandru Sicoe <adsicoe@gmail.com>
> We are running a 3 node Cassandra 1.1.5 cluster with a 3TB Raid 0 disk per node for the
> data dir and separate disk for the commitlog, 12 cores, 24 GB RAM (12GB to Cassandra heap).
> 

