This is already a lot better. While compacting, the CPU load remains quite low. However, I still get some overload spikes that cause timeouts. Are there other settings I can tune to make compaction more stable?

2011/11/22 Jonathan Ellis <jbellis@gmail.com>
m1.small is still... small. Start by turning
compaction_throughput_mb_per_sec all the way down to 1 MB/s.
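Changes to cassandra.yaml take effect when the node restarts, so this means editing the file on each node and restarting it. A minimal sketch of that edit, assuming a POSIX shell; set_yaml_option is a hypothetical helper, and the path to cassandra.yaml is passed in because it differs between installs:

```shell
# POSIX sh sketch. set_yaml_option is a hypothetical helper, not part of
# Cassandra: it sets a top-level "key: value" line in a YAML file.
set_yaml_option() {
    file=$1
    key=$2
    value=$3
    tmp="$file.tmp.$$"
    if grep -q "^$key:" "$file"; then
        # Rewrite the existing, uncommented entry, then copy the result back
        # into the original file so its owner and permissions are kept.
        sed "s/^$key:.*/$key: $value/" "$file" > "$tmp" &&
            cat "$tmp" > "$file" &&
            rm -f "$tmp"
    else
        # Key missing (or only present as a comment): append it.
        printf '%s: %s\n' "$key" "$value" >> "$file"
    fi
}
```

Usage, one node at a time: set_yaml_option /path/to/cassandra.yaml compaction_throughput_mb_per_sec 1, then restart Cassandra on that node.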

On Tue, Nov 22, 2011 at 9:58 AM, Alain RODRIGUEZ <arodrime@gmail.com> wrote:
> I followed your advice and installed a 3-node m1.small cluster. The
> problem is still there. I get fewer timeouts because there are fewer
> compactions (more memory is usable before flushing), but when a
> compaction starts, CPU usage can reach 95%, which produces timeouts.
> Compactions run faster, so I get fewer timeouts, but there are still
> some.
> Is there really no way to make compaction a low-CPU background task?
> What information can I give you to help understand what is going on
> with these timeouts?
>
> 2011/11/15 Dan Hendry <dan.hendry.junk@gmail.com>
>>
>> I really don’t recommend using t1.micros. The problem with them is that
>> they have CPU bursting, basically meaning you get lots of CPU resources for
>> a short time but if you use more than you have been allocated you get
>> basically nothing for 10+ seconds afterwards. By ‘basically nothing’ I
>> really mean that – the machine is effectively dead. The biggest problem with
>> this (which we found out the hard way, within a test environment thankfully)
>> is that it makes capacity planning extremely difficult – the line between
>> having a cluster with sufficient capacity and being overloaded is extremely
>> abrupt and very difficult to see coming. Moreover, once you are over
>> capacity, the ‘dead periods’ caused by CPU bursting make things spiral out
>> of control rapidly, due to overly aggressive client retries and hinted
>> handoff increasing overall load (although the HH problem might have improved
>> with 1.0.x). I would recommend m1.smalls at the very least.
>>
>> If you are set on micros, make sure you only ever trigger compaction on
>> one node at a time (or better, consider if you even need to trigger major
>> compactions at all), set compaction_throughput_mb_per_sec (cassandra.yaml)
>> as low as you possibly can (1 is the minimum I believe), try disabling
>> hinted handoff (on all nodes), and use lower read/write consistency levels
>> if you can.
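Two of these steps can be scripted. A rough sketch, assuming a POSIX shell and nodetool on the PATH; both function names are hypothetical, and the sequential loop relies on nodetool compact not returning until the major compaction has finished, which is worth confirming on your version:

```shell
# POSIX sh sketch; both function names are hypothetical helpers.

# Set hinted_handoff_enabled to false in one node's cassandra.yaml
# (path passed in). Do this on every node; it applies after a restart.
disable_hinted_handoff() {
    file=$1
    tmp="$file.tmp.$$"
    if grep -q '^hinted_handoff_enabled:' "$file"; then
        sed 's/^hinted_handoff_enabled:.*/hinted_handoff_enabled: false/' \
            "$file" > "$tmp" && cat "$tmp" > "$file" && rm -f "$tmp"
    else
        printf 'hinted_handoff_enabled: false\n' >> "$file"
    fi
}

# Run a major compaction on each host given as an argument, one after the
# other. Assumes nodetool compact blocks until the compaction is done, so
# only one node is compacting at any moment; stops at the first failure.
compact_one_node_at_a_time() {
    for host in "$@"; do
        echo "compacting $host"
        nodetool -h "$host" compact || return 1
    done
}
```

Running the compactions strictly in sequence keeps the other replicas free to answer reads while one node is busy, which is the point of Dan's "one node at a time" advice.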
>>
>> Dan
>>
>> From: Alain RODRIGUEZ [mailto:arodrime@gmail.com]
>> Sent: November-15-11 6:34
>> To: user@cassandra.apache.org
>> Subject: Compaction -> CPU load 100% -> time out
>>
>> Hi, I'm running a 3-node Cassandra 1.0.2 cluster on 3 Amazon EC2 t1.micro instances.
>>
>> I managed to fix some OOM errors I had, but I still get some CPU load spikes.
>>
>> I know that t1.micro instances have limited resources, but I think they
>> could be enough if they were well managed.
>>
>> My application works well, except when Cassandra needs to run a
>> compaction on a node. When it does, Cassandra uses 100% of the CPU,
>> generating a lot of timeouts. My timeout is configured to 250 ms with at
>> most 2 attempts. I'm running in production: our current system uses MySQL
>> and we are trying to replace MySQL with Cassandra. Cassandra mustn't slow
>> down the production environment while we use both databases in parallel,
>> which is why I can't increase the timeout.
>>
>> Running compaction in the background somehow could be a good idea. After
>> searching on this subject, I tried adding
>> JVM_OPTS="$JVM_OPTS -Dcassandra.compaction.priority=1" to cassandra-env.sh.
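For reference, the change described above as a small sketch that appends the flag to cassandra-env.sh only if it is not already there. add_compaction_priority_flag is a hypothetical helper and the file path is passed in; it only records the tweak, since whether 1.0.2 still honours this property is the open question here:

```shell
# POSIX sh sketch; add_compaction_priority_flag is a hypothetical helper.
add_compaction_priority_flag() {
    env_file=$1
    # Single quotes keep $JVM_OPTS literal so it is expanded by
    # cassandra-env.sh at startup, not by this script.
    line='JVM_OPTS="$JVM_OPTS -Dcassandra.compaction.priority=1"'
    # -F matches the line as a fixed string, so $ and quotes are not regex.
    if ! grep -qF -- "$line" "$env_file"; then
        printf '%s\n' "$line" >> "$env_file"
    fi
}
```

Calling it twice leaves a single copy of the line, so it is safe to rerun on every node.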
>>
>> This option was added in Cassandra 0.6.3; is it still useful? It
>> doesn't solve my problem.
>>
>> Anyway, this doesn't help during a nodetool repair: the CPU load still
>> reaches 100%.
>>
>> Is there a way to turn these occasional tasks into background tasks
>> that use only the spare CPU?
>>
>> Is there a way to get Cassandra working properly on EC2 t1.micros?
>>
>> Thanks,
>>
>> Alain
>>
>



--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com