incubator-cassandra-user mailing list archives

From Nate McCall <zznat...@gmail.com>
Subject Re: about FlushWriter "All time blocked"
Date Thu, 27 Jun 2013 19:52:17 GMT
A non-zero pending-task count is too transient to alert on. Try
monitoring tpstats at a (much) higher frequency and look for a
sustained threshold over a duration.

Then, using a percentage of the configured maximum as the threshold -
75% of memtable_flush_queue_size in this case - alert when the count
has been higher than 3 for more than N seconds. (Start with N=60 and
go from there.)
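The sustained-threshold check above could be sketched like this. This is a hypothetical monitoring helper, not anything shipped with Cassandra; the actual sampling of `nodetool tpstats` output is assumed to happen elsewhere and feed in (timestamp, pending) pairs:

```python
# Sketch of a sustained-threshold alert check.
# `samples` is a list of (timestamp_seconds, pending_count) tuples,
# collected by polling `nodetool tpstats` at high frequency.

def sustained_above(samples, threshold, duration):
    """Return True if the pending count stayed above `threshold`
    continuously for at least `duration` seconds."""
    run_start = None
    for ts, pending in samples:
        if pending > threshold:
            if run_start is None:
                run_start = ts          # a run above the threshold begins
            if ts - run_start >= duration:
                return True             # sustained long enough: alert
        else:
            run_start = None            # dropped below: reset the run
    return False

# With memtable_flush_queue_size = 4, 75% gives a threshold of 3:
alert = sustained_above([(0, 4), (30, 4), (60, 4)], threshold=3, duration=60)
```

A brief dip below the threshold resets the run, so a single transient spike between two 5-minute snapshots will not page anyone.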

Also, that is a very high 'all time blocked' to 'completed' ratio for
FlushWriter. If iostat looks healthy, I'd do as Aaron suggested above:
turn up memtable_flush_queue_size and experiment with raising
memtable_flush_writers (incrementally and separately, of course, so
you can see the effect of each).
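For reference, the two settings in question live in cassandra.yaml. A sketch with illustrative values only (in 1.2 the defaults are a queue size of 4 and one flush writer per data directory):

```yaml
# Number of memtables allowed to wait in the flush queue before
# writes block on the commit log (default: 4).
memtable_flush_queue_size: 8

# Number of concurrent flush writer threads (default: one per
# data directory). Raise only if the disks can keep up.
memtable_flush_writers: 2
```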

On Thu, Jun 27, 2013 at 2:27 AM, Arindam Barua <abarua@247-inc.com> wrote:
> In our performance tests, we are seeing similar FlushWriter, MutationStage, and MemtablePostFlusher pending tasks become non-zero. We collect snapshots every 5 minutes, and they seem to clear after ~10-15 minutes. (The FlushWriter has an 'All time blocked' count of 540 in the example below.)
>
> We do not use secondary indexes or snapshots. We do not use SSDs. We have a 4-node cluster with around 30-40 GB of data on each node. Each node has three 1-TB disks in a RAID 0 setup.
>
> Currently we monitor tpstats every 5 minutes and alert if FlushWriter or MutationStage has a non-zero Pending count. Is this already a cause for concern, or should we alert only if the count exceeds some larger number, say 10, or remains non-zero for longer than a specified time?
>
> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
> ReadStage                         0         0       15685133         0                 0
> RequestResponseStage              0         0       29880863         0                 0
> MutationStage                     0         0       40457340         0                 0
> ReadRepairStage                   0         0         704322         0                 0
> ReplicateOnWriteStage             0         0              0         0                 0
> GossipStage                       0         0        2283062         0                 0
> AntiEntropyStage                  0         0              0         0                 0
> MigrationStage                    0         0             70         0                 0
> MemtablePostFlusher               1         1           1837         0                 0
> StreamStage                       0         0              0         0                 0
> FlushWriter                       1         1           1446         0               540
> MiscStage                         0         0              0         0                 0
> commitlog_archiver                0         0              0         0                 0
> InternalResponseStage             0         0             43         0                 0
> HintedHandoff                     0         0              3         0                 0
>
> Thanks,
> Arindam
>
> -----Original Message-----
> From: aaron morton [mailto:aaron@thelastpickle.com]
> Sent: Tuesday, June 25, 2013 10:29 PM
> To: user@cassandra.apache.org
> Subject: Re: about FlushWriter "All time blocked"
>
>> FlushWriter                       0         0            191         0                12
>
> This means there were 12 times the code wanted to put a memtable in the queue to be flushed to disk but the queue was full.
>
> The length of this queue is controlled by memtable_flush_queue_size (https://github.com/apache/cassandra/blob/cassandra-1.2/conf/cassandra.yaml#L299) and memtable_flush_writers.
>
> When this happens, an internal lock around the commit log is held, which prevents writes from being processed.
>
> In general it means the IO system cannot keep up. It can sometimes happen when snapshot is used, as all the CFs are flushed to disk at once. I also suspect it happens sometimes when a commit log segment is flushed and there are a lot of dirty CFs, but I've never proved it.
>
> Increase memtable_flush_queue_size following the help in the yaml file. If you do not use secondary indexes, are you using snapshots?
>
> Hope that helps.
> A
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 24/06/2013, at 3:41 PM, yue.zhang <yue.zhang@chinacache.com> wrote:
>
>> 3 nodes
>> CentOS
>> CPU: 8 cores, memory: 32 GB
>> Cassandra 1.2.5
>> My scenario: many counter increments; every node has one client program; throughput is 400 writes/sec per client (it's very slow).
>>
>> my question:
>> nodetool tpstats
>> ---------------------------------
>> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
>> ReadStage                         0         0           8453         0                 0
>> RequestResponseStage              0         0      138303982         0                 0
>> MutationStage                     0         0      172002988         0                 0
>> ReadRepairStage                   0         0              0         0                 0
>> ReplicateOnWriteStage             0         0       82246354         0                 0
>> GossipStage                       0         0        1052389         0                 0
>> AntiEntropyStage                  0         0              0         0                 0
>> MigrationStage                    0         0              0         0                 0
>> MemtablePostFlusher               0         0            670         0                 0
>> FlushWriter                       0         0            191         0                12
>> MiscStage                         0         0              0         0                 0
>> commitlog_archiver                0         0              0         0                 0
>> InternalResponseStage             0         0              0         0                 0
>> HintedHandoff                     0         0             56         0                 0
>> -----------------------------------
>> FlushWriter "All time blocked" = 12. I restarted the node, but it didn't help. Is this normal?
>>
>> thx
>>
>> -heipark
>>
>>
>
>
