incubator-cassandra-user mailing list archives

From Tamar Fraenkel <ta...@tok-media.com>
Subject Re: High CPU usage during repair
Date Mon, 11 Feb 2013 09:40:36 GMT
Thank you very much! Due to monetary limitations I will keep the m1.large
for now, but I will try the throughput modification.
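
For reference, here is a minimal sketch of the throughput modification discussed in the quoted thread: lower the limit on the running node with nodetool, then update cassandra.yaml so the value survives a restart. The yaml path and the helper name set_yaml_throughput are assumptions for illustration, not something confirmed in this thread; adjust them for your install.

```shell
#!/bin/sh
# Sketch only. Assumptions (not from the thread): the helper name and the
# cassandra.yaml location shown in the usage comments below.

# Rewrite the compaction_throughput_mb_per_sec line in a cassandra.yaml file,
# leaving every other line untouched.
set_yaml_throughput() {
    yaml_file="$1"
    mb_per_sec="$2"
    tmp_file="${yaml_file}.tmp"
    sed "s/^compaction_throughput_mb_per_sec:.*/compaction_throughput_mb_per_sec: ${mb_per_sec}/" \
        "$yaml_file" > "$tmp_file" && mv "$tmp_file" "$yaml_file"
}

# Usage (commented out so that sourcing this file has no side effects):
#   nodetool -h localhost setcompactionthroughput 12                # running node
#   set_yaml_throughput /etc/cassandra/conf/cassandra.yaml 12       # next restart
```

The nodetool call only affects the running process, which is why the yaml file is changed as well.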
Tamar

*Tamar Fraenkel *
Senior Software Engineer, TOK Media

tamar@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956




On Mon, Feb 11, 2013 at 11:30 AM, aaron morton <aaron@thelastpickle.com> wrote:

>  What machine size?
>>
> m1.large
>
> If you are seeing high CPU, move to an m1.xlarge; that's the sweet spot.
>
> That's normally ok. How many are waiting?
>>
>> I have seen 4 this morning
>
> That's not really abnormal.
> The pending task count goes up when a file *may* be eligible for
> compaction, not when there is a compaction task waiting.
>
> If you suddenly create a number of new SSTables for a CF the pending count
> will rise, however one of the tasks may compact all the sstables waiting
> for compaction. So the count will suddenly drop as well.
>
> Just to make sure I understand you correctly, you suggest that I change
> throughput to 12 regardless of whether repair is ongoing or not. I will do
> it using nodetool and change the yaml file in case a restart will occur in
> the future?
>
> Yes.
> If you are seeing performance degrade during compaction or repair try
> reducing the throughput.
>
> I would attribute most of the problems you have described to using
> m1.large.
>
> Cheers
>
>
>    -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 11/02/2013, at 9:16 AM, Tamar Fraenkel <tamar@tok-media.com> wrote:
>
> Hi!
> Thanks for the response.
> See my answers and questions below.
> Thanks!
> Tamar
>
>  *Tamar Fraenkel *
> Senior Software Engineer, TOK Media
>
> tamar@tok-media.com
> Tel:   +972 2 6409736
> Mob:  +972 54 8356490
> Fax:   +972 2 5612956
>
>
>
>
> On Sun, Feb 10, 2013 at 10:04 PM, aaron morton <aaron@thelastpickle.com> wrote:
>
>> During repair I see high CPU consumption,
>>
>> Repair reads the data and computes a hash, this is a CPU intensive
>> operation.
>> Is the CPU overloaded, or is it just under load?
>>
>  Usually just load, but in the past two weeks I have seen CPU of over 90%!
>
>> I run Cassandra  version 1.0.11, on 3 node setup on EC2 instances.
>>
>> What machine size?
>>
> m1.large
>
>>
>> there are compactions waiting.
>>
>> That's normally ok. How many are waiting?
>>
>> I have seen 4 this morning
>
>> I thought of adding a call to my repair script, before repair starts to
>> do:
>> nodetool setcompactionthroughput 0
>> and then when repair finishes call
>> nodetool setcompactionthroughput 16
>>
>> That will remove throttling on compaction and the validation compaction
>> used for the repair. Which may in turn add additional IO load, CPU load and
>> GC pressure. You probably do not want to do this.
>>
>> Try reducing the compaction throughput to say 12 normally and see the
>> effect.
>>
>> Just to make sure I understand you correctly, you suggest that I change
> throughput to 12 regardless of whether repair is ongoing or not. I will do
> it using nodetool and change the yaml file in case a restart will occur in
> the future?
>
>> Cheers
>>
>>
>>    -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 11/02/2013, at 1:01 AM, Tamar Fraenkel <tamar@tok-media.com> wrote:
>>
>> Hi!
>> I run repair weekly, using a scheduled cron job.
>> During repair I see high CPU consumption, and messages in the log file
>> "INFO [ScheduledTasks:1] 2013-02-10 11:48:06,396 GCInspector.java (line
>> 122) GC for ParNew: 208 ms for 1 collections, 1704786200 used; max is
>> 3894411264"
>> From time to time, there are also messages of the form
>> "INFO [ScheduledTasks:1] 2012-12-04 13:34:52,406 MessagingService.java
>> (line 607) 1 READ messages dropped in last 5000ms"
>>
>> Using opscenter, jmx and nodetool compactionstats I can see that during
>> the time the CPU consumption is high, there are compactions waiting.
>>
>> I run Cassandra  version 1.0.11, on 3 node setup on EC2 instances.
>> I have the default settings:
>> compaction_throughput_mb_per_sec: 16
>> in_memory_compaction_limit_in_mb: 64
>> multithreaded_compaction: false
>> compaction_preheat_key_cache: true
>>
>> I am thinking of the following solution, and wanted to ask if I am on the
>> right track:
>> I thought of adding a call to my repair script, before repair starts to
>> do:
>> nodetool setcompactionthroughput 0
>> and then when repair finishes call
>> nodetool setcompactionthroughput 16
>>
>> Is this the right solution?
>> Thanks,
>> Tamar
>>
>> *Tamar Fraenkel *
>> Senior Software Engineer, TOK Media
>>
>>
>> tamar@tok-media.com
>> Tel:   +972 2 6409736
>> Mob:  +972 54 8356490
>> Fax:   +972 2 5612956
>>
>>
>>
>>
>
>
