incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: High CPU usage during repair
Date Mon, 11 Feb 2013 09:30:11 GMT
> What machine size?
> m1.large 
If you are seeing high CPU, move to an m1.xlarge; that's the sweet spot. 

> That's normally ok. How many are waiting?
> 
> I have seen 4 this morning 
That's not really abnormal. 
The pending task count goes up when a file *may* be eligible for compaction, not when there
is a compaction task waiting. 

If you suddenly create a number of new SSTables for a CF, the pending count will rise; however,
a single task may compact all the SSTables waiting for compaction, so the count can drop just
as suddenly. 
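
A quick way to keep an eye on that number is to pull it out of `nodetool compactionstats`. A minimal sketch, run here against a captured sample of the 1.0.x output rather than a live node (on a real node you would pipe the command itself):

```shell
# Parse the pending task count from `nodetool compactionstats` output.
# Live usage would be:  nodetool -h localhost compactionstats | awk ...
# Here a captured sample stands in for the live command.
sample='pending tasks: 4
          compaction type        keyspace   column family'
pending=$(printf '%s\n' "$sample" | awk -F': ' '/^pending tasks/ {print $2}')
echo "$pending"   # 4
```

Watching this over time shows the rise-then-drop pattern described above, rather than a queue that only grows.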

> Just to make sure I understand you correctly: you suggest that I change the throughput to
12 regardless of whether a repair is ongoing. I will do it using nodetool, and also change
the yaml file so the setting survives a future restart? 
Yes. 
If you are seeing performance degrade during compaction or repair, try reducing the throughput.
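
For example (the 12 MB/s figure is from the advice above; the cassandra.yaml location varies by install, so the edit is demonstrated here on a copy of the relevant line):

```shell
# On the running node, lower the throttle immediately:
#   nodetool setcompactionthroughput 12
# Then persist the same value in cassandra.yaml (often under /etc/cassandra/)
# so a restart does not revert to the default of 16. Demonstrated on a copy:
YAML=$(mktemp)
printf 'compaction_throughput_mb_per_sec: 16\n' > "$YAML"
sed -i 's/^compaction_throughput_mb_per_sec:.*/compaction_throughput_mb_per_sec: 12/' "$YAML"
new_line=$(cat "$YAML")
echo "$new_line"   # compaction_throughput_mb_per_sec: 12
rm -f "$YAML"
```

Doing both keeps the runtime value and the on-disk config in agreement, which answers the restart question above.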


I would attribute most of the problems you have described to using m1.large. 

Cheers
 

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 11/02/2013, at 9:16 AM, Tamar Fraenkel <tamar@tok-media.com> wrote:

> Hi!
> Thanks for the response.
> See my answers and questions below.
> Thanks!
> Tamar
> 
> Tamar Fraenkel 
> Senior Software Engineer, TOK Media 
> 
> 
> tamar@tok-media.com
> Tel:   +972 2 6409736 
> Mob:  +972 54 8356490 
> Fax:   +972 2 5612956 
> 
> 
> 
> 
> On Sun, Feb 10, 2013 at 10:04 PM, aaron morton <aaron@thelastpickle.com> wrote:
>> During repair I see high CPU consumption, 
> Repair reads the data and computes a hash; this is a CPU-intensive operation.
> Is the CPU over loaded or is just under load?
>  Usually just load, but in the past two weeks I have seen CPU of over 90%!
>> I run Cassandra version 1.0.11 on a 3-node setup on EC2 instances.
> 
> What machine size?
> m1.large 
> 
>> there are compactions waiting.
> That's normally ok. How many are waiting?
> 
> I have seen 4 this morning 
>> I thought of adding a call to my repair script, before repair starts to do:
>> nodetool setcompactionthroughput 0
>> and then when repair finishes call
>> nodetool setcompactionthroughput 16
> That will remove throttling on both compaction and the validation compaction used for the
repair, which may in turn add additional IO load, CPU load, and GC pressure. You probably do
not want to do this. 
> 
> Try reducing the compaction throughput to say 12 normally and see the effect.
> 
> Just to make sure I understand you correctly: you suggest that I change the throughput to
12 regardless of whether a repair is ongoing. I will do it using nodetool, and also change
the yaml file so the setting survives a future restart? 
> Cheers
> 
> 
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 11/02/2013, at 1:01 AM, Tamar Fraenkel <tamar@tok-media.com> wrote:
> 
>> Hi!
>> I run repair weekly, using a scheduled cron job.
>> During repair I see high CPU consumption, and messages in the log file
>> "INFO [ScheduledTasks:1] 2013-02-10 11:48:06,396 GCInspector.java (line 122) GC for
ParNew: 208 ms for 1 collections, 1704786200 used; max is 3894411264"
>> From time to time, there are also messages of the form
>> "INFO [ScheduledTasks:1] 2012-12-04 13:34:52,406 MessagingService.java (line 607)
1 READ messages dropped in last 5000ms"
>> 
>> Using opscenter, jmx and nodetool compactionstats I can see that during the time
the CPU consumption is high, there are compactions waiting.
>> 
>> I run Cassandra version 1.0.11 on a 3-node setup on EC2 instances.
>> I have the default settings:
>> compaction_throughput_mb_per_sec: 16
>> in_memory_compaction_limit_in_mb: 64
>> multithreaded_compaction: false
>> compaction_preheat_key_cache: true
>> 
>> I am thinking on the following solution, and wanted to ask if I am on the right track:
>> I thought of adding a call to my repair script, before repair starts to do:
>> nodetool setcompactionthroughput 0
>> and then when repair finishes call
>> nodetool setcompactionthroughput 16
>> 
>> Is this a right solution?
>> Thanks,
>> Tamar
>> 
>> Tamar Fraenkel 
>> Senior Software Engineer, TOK Media 
>> 
>> 
>> 
>> tamar@tok-media.com
>> Tel:   +972 2 6409736 
>> Mob:  +972 54 8356490 
>> Fax:   +972 2 5612956 
>> 
>> 
> 
> 

