incubator-cassandra-user mailing list archives

From aaron morton <>
Subject Re: Why the StageManager thread pools have 60 seconds keepalive time?
Date Sun, 19 Aug 2012 08:21:09 GMT
You're seeing dropped mutations reported by nodetool tpstats?

Take a look at the logs. Look for messages from the MessagingService with the pattern
"{} {} messages dropped in last {}ms". They will be followed by info about the thread pool
(TP) stats.
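
A minimal sketch of scanning log lines for that pattern, assuming the three placeholders are a
count, a message type and an interval in milliseconds. The class name, the regular expression
and the sample log line are illustrative assumptions, not taken from the Cassandra source.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DroppedMessageScanner {
    // Assumed shape of "{} {} messages dropped in last {}ms": count, type, interval.
    private static final Pattern DROPPED =
        Pattern.compile("(\\d+) (\\w+) messages dropped in last (\\d+)ms");

    /** The three fields extracted from one "messages dropped" log line. */
    static final class Dropped {
        final int count;
        final String type;
        final long intervalMs;

        Dropped(int count, String type, long intervalMs) {
            this.count = count;
            this.type = type;
            this.intervalMs = intervalMs;
        }
    }

    /** Returns the parsed fields, or an empty Optional if the line does not match. */
    static Optional<Dropped> parse(String logLine) {
        Matcher m = DROPPED.matcher(logLine);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new Dropped(
            Integer.parseInt(m.group(1)), m.group(2), Long.parseLong(m.group(3))));
    }

    public static void main(String[] args) {
        // Hypothetical log line, made up for illustration only.
        String sample = "INFO MessagingService 12 MUTATION messages dropped in last 5000ms";
        Optional<Dropped> hit = parse(sample);
        if (hit.isPresent()) {
            Dropped d = hit.get();
            System.out.println(d.count + " " + d.type + " dropped in " + d.intervalMs + " ms");
        }
    }
}
```

Feeding each line of the system log through parse() would show which message types are being
dropped and how often.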

First would be the workload. Are you sending very big batch_mutate or multiget requests? Each
row in the requests turns into a command in the appropriate thread pool. This can result in
other requests waiting a long time for their commands to get processed. 
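
If very large batches turn out to be the cause, one client-side option is to split the rows
into smaller chunks before sending them. This is only a sketch: BatchSplitter and partition are
hypothetical names, the chunk size is something you would have to tune, and the actual send
(batch_mutate or multiget through whatever client you use) is left out.

```java
import java.util.ArrayList;
import java.util.List;

class BatchSplitter {
    /** Splits items into consecutive chunks of at most chunkSize elements each. */
    static <T> List<List<T>> partition(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, items.size());
            // Copy the view so each chunk is independent of the original list.
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> rowKeys = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            rowKeys.add("row-" + i); // hypothetical row keys
        }
        for (List<String> chunk : partition(rowKeys, 4)) {
            // A real client would issue one smaller request per chunk here.
            System.out.println(chunk);
        }
    }
}
```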

Next I would look for GC activity and check that memtable_flush_queue_size is set high enough
(see the comments in the yaml for docs).

After that I would look at winding concurrent_writes (and I assume concurrent_reads) back.
Any time I see weirdness I look for config changes and see what happens when they are returned
to the default or near default. Do you have 16 _physical_ cores?

Hope that helps. 
Aaron Morton
Freelance Developer

On 18/08/2012, at 10:01 AM, Guillermo Winkler <> wrote:

> Aaron, thanks for your answer.
> I'm actually tracking a problem where mutations get dropped and cfstats shows no activity
> whatsoever. I have 100 threads for the mutation pool and no running or pending tasks, but
> some mutations get dropped nonetheless.
> I'm thinking about some scheduling problems but not really sure yet.
> Have you ever seen a case of dropped mutations with the system under light load?
> Thanks,
> Guille
> On Thu, Aug 16, 2012 at 8:22 PM, aaron morton <> wrote:
> That's some pretty old code. I would guess it was done that way to conserve resources.
> And _I think_ thread creation is pretty lightweight.
> Jonathan / Brandon / others - opinions ? 
> Cheers
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> On 17/08/2012, at 8:09 AM, Guillermo Winkler <> wrote:
>> Hi, I have a Cassandra cluster where I'm seeing a lot of thread thrashing from the
>> mutation pool.
>> MutationStage:72031
>> Threads get created and disposed of in batches of 100 every few minutes. Since
>> it's a 16-core server, concurrent_writes is set to 100 in cassandra.yaml.
>> concurrent_writes: 100
>> I've seen that in the StageManager class these pools are created with a 60-second keepalive:
>> DebuggableThreadPoolExecutor -> allowCoreThreadTimeOut(true);
>> StageManager -> public static final long KEEPALIVE = 60; // seconds to keep "extra" threads alive for when idle
>> Is there a reason for it to be this way?
>> Why not have a fixed-size pool with Integer.MAX_VALUE as the keepalive, since corePoolSize
>> and maxPoolSize are set to the same size?
>> Thanks,
>> Guille
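
The keepalive behaviour asked about in the original question can be reproduced with plain
java.util.concurrent. In ThreadPoolExecutor the keepalive normally applies only to threads
above corePoolSize, and allowCoreThreadTimeOut(true) extends it to core threads as well. The
sketch below is an illustration only: KeepAliveDemo and its methods are hypothetical names, it
does not use Cassandra's DebuggableThreadPoolExecutor, and it assumes a 100 ms keepalive in
place of 60 seconds so that the effect shows up quickly.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class KeepAliveDemo {
    /** Builds a pool with corePoolSize == maximumPoolSize, as in the question. */
    static ThreadPoolExecutor newPool(int size, long keepAliveMs, boolean coreTimeout) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            size, size, keepAliveMs, TimeUnit.MILLISECONDS,
            new LinkedBlockingQueue<Runnable>());
        pool.allowCoreThreadTimeOut(coreTimeout);
        return pool;
    }

    /** Runs a burst of tasks, idles past the keepalive, and reports surviving threads. */
    static int poolSizeAfterIdle(boolean coreTimeout) {
        final int size = 4;
        ThreadPoolExecutor pool = newPool(size, 100, coreTimeout);
        try {
            CountDownLatch done = new CountDownLatch(size);
            for (int i = 0; i < size; i++) {
                pool.execute(done::countDown); // each submit starts a core thread
            }
            done.await();
            Thread.sleep(1000); // stay idle well past the 100 ms keepalive
            return pool.getPoolSize();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("core timeout on:  " + poolSizeAfterIdle(true) + " threads left");
        System.out.println("core timeout off: " + poolSizeAfterIdle(false) + " threads left");
    }
}
```

With the core timeout on, idle threads are discarded and have to be recreated for the next
burst of work, which would match the rising thread numbers in names like MutationStage:72031.
With it off, the keepalive value has no effect when corePoolSize and maxPoolSize are equal, so
a fixed-size pool does not need Integer.MAX_VALUE as the keepalive.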
