Thanks Maki and Tyler. 

Re: Q1: I think it's time for me to look into LeveledCompaction. But I'm happy to know I can run major compactions as often as I like if I can afford it.

Re: Q2: Other than the high I/O impact, if there won't be any data corruption or consistency issues, I think I can afford this too.

Thanks,
Eran Chinthaka Withana


On Wed, Feb 29, 2012 at 7:17 PM, Tyler Hobbs <tyler@datastax.com> wrote:
At this point, using LeveledCompaction is a much better way to have good guarantees about how many sstables your reads will hit (and thus better latency guarantees) than SizeTiered with periodic major compactions.
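
As a rough illustration of why leveled compaction bounds the number of sstables a read can hit: the sketch below assumes each level holds ten times as many sstables as the one before it and that sstables within a level do not overlap, so a read for one key checks at most one sstable per level (plus whatever is waiting in L0). The 5 MB sstable size and the fan-out of 10 are assumptions for illustration, not values taken from this thread, and `leveled_levels` is a hypothetical helper, not part of Cassandra.

```python
import math


def leveled_levels(total_mb, sstable_mb=5, fanout=10):
    """Estimate how many levels (L1 and up) are needed to hold total_mb of data.

    Assumes level N holds fanout**N fixed-size sstables (an assumption
    for this sketch, not a value read from any Cassandra config).
    """
    sstables = math.ceil(total_mb / sstable_mb)
    level = 0
    capacity = 0
    # Keep adding levels until their combined capacity covers all sstables.
    while capacity < sstables:
        level += 1
        capacity += fanout ** level
    return level


# Under these assumptions, about 100 GB of data fits in 5 levels,
# so a single-key read would check at most about 5 sstables above L0.
levels_for_100gb = leveled_levels(100 * 1024)
```

With size-tiered compaction there is no such per-level bound, which is why periodic major compactions are the only way to cap it there.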


On Wed, Feb 29, 2012 at 8:49 PM, Maki Watanabe <watanabe.maki@gmail.com> wrote:
DataStax does not recommend running major compactions now:
 http://www.datastax.com/docs/1.0/operations/tuning
But if you can afford it, major compaction will improve read latency, as you have seen.

Major compaction is expensive, so you will not want to run it during
high-traffic hours. You should also not run it on more than one node
of a replica set at the same time. And you should not run repair and
major compaction at the same time on the same (affected) node, because
both tasks require massive I/O.
Within these constraints, the more often you run major compaction, the
better your read latency will be.
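
One way to keep cron-scheduled repair and major compaction from overlapping on the same node is to have both jobs take the same lock before calling nodetool. The following is only a minimal sketch under assumptions: the lock path and the `run_exclusive` helper are hypothetical, it relies on POSIX `flock` (so Linux/Unix only), and it does nothing to stop two replicas from compacting at once; that would still need staggered schedules across nodes.

```python
import fcntl
import subprocess
from contextlib import contextmanager

# Hypothetical lock file shared by the repair and compaction cron jobs.
LOCK_PATH = "/tmp/cassandra-maintenance.lock"


@contextmanager
def maintenance_lock(path=LOCK_PATH):
    """Try to take an exclusive lock without blocking.

    Yields True if the lock was acquired, False if another job holds it.
    """
    with open(path, "w") as handle:
        try:
            fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            yield False
            return
        try:
            yield True
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def run_exclusive(command, path=LOCK_PATH, runner=subprocess.call):
    """Run command only if no other maintenance job holds the lock.

    Returns the runner's result, or None if the job was skipped.
    """
    with maintenance_lock(path) as acquired:
        if not acquired:
            return None
        return runner(command)
```

The compaction cron job would then call something like `run_exclusive(["nodetool", "-h", "localhost", "compact"])` and the repair job `run_exclusive(["nodetool", "-h", "localhost", "repair"])`; whichever starts second is skipped rather than piling more I/O onto the node.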

2012/3/1 Eran Chinthaka Withana <eran.chinthaka@gmail.com>:
> Hi,
>
> I have two questions on major compactions (the ones users initiate using
> nodetool), and I would really appreciate it if someone could help.
>
> 1. I've noticed that when I run compactions, the read latency improves even
> more than I expected (which is good :) ). The improvement is so tempting that
> I'd like to run this almost every week :). I understand that after a compaction
> Cassandra will create one giant SSTable, and if something happens to it
> things can go a little bit crazy. So, from your experience, how often should we
> be running compactions? What parameters will influence this frequency?
>
> 2. I'm thinking of scheduling compactions using a cron job. But the issue is
> that I have also scheduled repairs using a cron job, to run once per GC period
> (10 days by default). Now the obvious question is: what will happen if a node
> is running both the compactions AND the repair at the same time? Is this
> something we should avoid at all costs? What will be the implications?
>
> Thanks,
> Eran Chinthaka Withana
>



--
w3m



--
Tyler Hobbs
DataStax