cassandra-user mailing list archives

From Patrick McFadin <pmcfa...@gmail.com>
Subject Re: Flush activity and dropped messages
Date Fri, 26 Aug 2016 20:18:55 GMT
It's not that your disks are getting full. I suspect you don't have enough
throughput to handle the stress that compaction and memtable flushing
produce. Blocked flush writers are almost always a disk problem.

Any storage with the words SAN, NAS, NFS or SATA in its description is
going to make your life miserable. Avoid it at all costs.

Take a look at this doc and scan down to the section on disks:
https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html
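
To sanity-check whether the data disk is the bottleneck while flushes back
up, watch the device with iostat -x during a flush storm, or sample the
counters yourself. A rough, self-contained sketch using the third-party
psutil library ("sdb" is a placeholder for whatever device backs your data
directory):

    import time
    import psutil  # third-party: pip install psutil

    DEVICE = "sdb"     # placeholder: device backing the Cassandra data dir
    INTERVAL_S = 10    # sampling window in seconds

    # Diff two snapshots of the kernel's per-disk I/O counters.
    before = psutil.disk_io_counters(perdisk=True)[DEVICE]
    time.sleep(INTERVAL_S)
    after = psutil.disk_io_counters(perdisk=True)[DEVICE]

    mb_per_s = (after.write_bytes - before.write_bytes) / 1e6 / INTERVAL_S
    # busy_time (ms the device spent doing I/O) is Linux-only in psutil.
    busy_pct = (after.busy_time - before.busy_time) / (INTERVAL_S * 1000) * 100
    print("%.1f MB/s written, ~%.0f%% busy" % (mb_per_s, busy_pct))

If the device sits near 100% busy whenever FlushWriter reports blocked
tasks, the disk is your problem.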

Patrick


On Fri, Aug 26, 2016 at 9:07 AM, Vasileios Vlachos <vasileiosvlachos@gmail.com> wrote:

> Hi Patrick and thanks for your reply,
>
> We are monitoring disk usage (among other things) and we don't seem to be
> running out of space at the moment. We have separate partitions/disks for
> the commitlog and the data directories. Which one do you suspect, and why?
>
> Regards,
> Vasilis
>
> On 25 Aug 2016 4:01 pm, "Patrick McFadin" <pmcfadin@gmail.com> wrote:
>
> This looks like you've run out of disk. What are your hardware specs?
>
> Patrick
>
>
> On Thursday, August 25, 2016, Benedict Elliott Smith <benedict@apache.org>
> wrote:
>
>> The simple answer is that you should upgrade from 2.0 to avoid this
>> behaviour. You are correct that when the commit log gets full, memtables
>> are flushed to make room. 2.0, however, has several interrelated problems
>> here:
>>
>> There is a maximum flush queue length property (I cannot recall its
>> name), and once there are this many memtables flushing, no more writes can
>> take place on the box whatsoever. You cannot simply increase this length,
>> though, because doing so shrinks the maximum size of any single memtable
>> (it is, iirc, total_memtable_space / (1 + flush_writers + max_queue_length)),
>> which worsens write amplification from compaction.
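>>
>> To make that arithmetic concrete, a quick sketch with what I believe are
>> the 2.0 defaults:
>>
>>     # Per-memtable ceiling in 2.0, as described above. Assumes an 8G heap
>>     # (so ~2G of total memtable space) and the defaults of
>>     # memtable_flush_writers=1 and memtable_flush_queue_size=4 (iirc that
>>     # queue property is the one I mean).
>>     total_memtable_space_mb = 8192 // 4   # heap/4
>>     flush_writers = 1
>>     max_queue_length = 4
>>     per_memtable_mb = total_memtable_space_mb / (1 + flush_writers + max_queue_length)
>>     print(per_memtable_mb)  # ~341 MB; raising the queue length shrinks this further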
>>
>> This is because the memory management for memtables in 2.0 was really
>> terrible, and this queue length was used to try to ensure the space
>> allocated was not exceeded.
>>
>> Compounding this, when clearing the commit log, 2.0 will flush all
>> memtables with data in them regardless of whether it is useful to do so,
>> meaning that having more tables (that are actively written to) than your
>> max queue length will necessarily cause stalls every time you run out of
>> commit log space.
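>>
>> In other words (hypothetical numbers, a sketch of the stall condition
>> rather than real Cassandra code):
>>
>>     actively_written_tables = 10   # tables receiving writes (hypothetical)
>>     max_queue_length = 4           # the flush queue length property
>>     # When the commit log is recycled, every dirty memtable is flushed at
>>     # once; with more dirty memtables than queue slots, writes stall.
>>     will_stall = actively_written_tables > max_queue_length
>>     print(will_stall)  # True -> a stall on every commit log recycle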
>>
>> In 2.1, none of these concerns apply.
>>
>>
>> On 24 August 2016 at 23:40, Vasileios Vlachos <vasileiosvlachos@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> We have an 8-node cluster spread out in 2 DCs, 4 nodes in each one. We
>>> run C* 2.0.17 on Ubuntu 12.04 at the moment.
>>>
>>> Our C# application often throws errors, which correlate with dropped
>>> messages (usually counter mutations) in the Cassandra logs. We think that
>>> if a specific mutation stays in the queue for more than 5 seconds,
>>> Cassandra drops it. This is also suggested by these lines in system.log:
>>>
>>> ERROR [ScheduledTasks:1] 2016-08-23 13:29:51,454 MessagingService.java (line 912) MUTATION messages were dropped in last 5000 ms: 317 for internal timeout and 0 for cross node timeout
>>> ERROR [ScheduledTasks:1] 2016-08-23 13:29:51,454 MessagingService.java (line 912) COUNTER_MUTATION messages were dropped in last 5000 ms: 6 for internal timeout and 0 for cross node timeout
>>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51 StatusLogger.java:
>>> Pool Name                    Active   Pending      Completed   Blocked  All Time Blocked
>>> ReadStage                         0         0      245177190         0                 0
>>> RequestResponseStage              0         0     3530334509         0                 0
>>> ReadRepairStage                   0         0        1549567         0                 0
>>> MutationStage                    48      1380     2540965500         0                 0
>>> ReplicateOnWriteStage             0         0      189615571         0                 0
>>> GossipStage                       0         0       20586077         0                 0
>>> CacheCleanupExecutor              0         0              0         0                 0
>>> MigrationStage                    0         0            106         0                 0
>>> MemoryMeter                       0         0         303029         0                 0
>>> ValidationExecutor                0         0              0         0                 0
>>> FlushWriter                       1         5         322604         1              8227
>>> InternalResponseStage             0         0             35         0                 0
>>> AntiEntropyStage                  0         0              0         0                 0
>>> MemtablePostFlusher               1         5         424104         0                 0
>>> MiscStage                         0         0              0         0                 0
>>> PendingRangeCalculator            0         0             37         0                 0
>>> commitlog_archiver                0         0              0         0                 0
>>> CompactionExecutor                4         4        5144499         0                 0
>>> HintedHandoff                     0         0           3194         0                 0
>>> CompactionManager                 1         4
>>> Commitlog                       n/a         0
>>> MessagingService                n/a       0/0
>>> Cache Type                     Size                 Capacity               KeysToSave
>>> KeyCache                  104828280                104857600                      all
>>> RowCache                          0                        0                      all
>>>
>>> So far we have noticed that when our application throws errors, Cassandra
>>> has FlushWriter Active=1 and Pending=4+ tasks (as shown above). Just
>>> before the tpstats-like output that appears in system.log, we seem to
>>> have consecutive flushes which take anywhere between 7 and 20 seconds, as
>>> shown below:
>>>
>>> INFO [FlushWriter:26567] 2016-08-22 21:15:30,550 Memtable.java (line 402) Completed flushing /var/lib/cassandra/data/LIVEStats/statsperdealer/LIVEStats-statsperdealer-jb-50127-Data.db (640031 bytes) for commitlog position ReplayPosition(segmentId=1465909831301, position=225)
>>> INFO [FlushWriter:26567] 2016-08-22 21:15:30,678 Memtable.java (line 402) Completed flushing /var/lib/cassandra/data/LIVEStats/statsperdealerpervrm/LIVEStats-statsperdealerpervrm-jb-33131-Data.db (893342 bytes) for commitlog position ReplayPosition(segmentId=1465909831301, position=225)
>>> INFO [FlushWriter:26567] 2016-08-22 21:15:30,706 Memtable.java (line 402) Completed flushing /var/lib/cassandra/data/LIVEStats/statsperserver/LIVEStats-statsperserver-jb-48158-Data.db (4862 bytes) for commitlog position ReplayPosition(segmentId=1465909831301, position=225)
>>> INFO [FlushWriter:26567] 2016-08-22 21:15:33,881 Memtable.java (line 402) Completed flushing /var/lib/cassandra/data/Organami/sessionstate/Organami-sessionstate-jb-55718-Data.db (76781692 bytes) for commitlog position ReplayPosition(segmentId=1465909831301, position=225)
>>> INFO [FlushWriter:26567] 2016-08-22 21:15:35,314 Memtable.java (line 402) Completed flushing /var/lib/cassandra/data/LIVEStock/stock/LIVEStock-stock-jb-36747-Data.db (9302046 bytes) for commitlog position ReplayPosition(segmentId=1465909831301, position=8363)
>>> INFO [FlushWriter:26567] 2016-08-22 21:15:35,341 Memtable.java (line 402) Completed flushing /var/lib/cassandra/data/system/compactions_in_progress/system-compactions_in_progress-jb-1018208-Data.db (161 bytes) for commitlog position ReplayPosition(segmentId=1465909831301, position=226441)
>>> INFO [FlushWriter:26567] 2016-08-22 21:15:38,719 Memtable.java (line 402) Completed flushing /var/lib/cassandra/data/DataMining/quotebyquotereference/DataMining-quotebyquotereference-jb-579845-Data.db (18754778 bytes) for commitlog position ReplayPosition(segmentId=1465909831301, position=325555)
>>> INFO [FlushWriter:26567] 2016-08-22 21:15:38,770 Memtable.java (line 402) Completed flushing /var/lib/cassandra/data/system/compactions_in_progress/system-compactions_in_progress-jb-1018209-Data.db (167 bytes) for commitlog position ReplayPosition(segmentId=1465909831301, position=429568)
>>> INFO [FlushWriter:26567] 2016-08-22 21:15:38,793 Memtable.java (line 402) Completed flushing /var/lib/cassandra/data/system/compactions_in_progress/system-compactions_in_progress-jb-1018210-Data.db (161 bytes) for commitlog position ReplayPosition(segmentId=1465909831301, position=430537)
>>> INFO [FlushWriter:26567] 2016-08-22 21:15:38,813 Memtable.java (line 402) Completed flushing /var/lib/cassandra/data/system/compactions_in_progress/system-compactions_in_progress-jb-1018211-Data.db (158 bytes) for commitlog position ReplayPosition(segmentId=1465909831301, position=433145)
>>>
>>> Even though no individual memtable seems to take very long to flush, the
>>> commitlog segmentId for all of the flushes shown above is the same. We
>>> think this happens because the commitlog reached its size limit and is
>>> rotating this particular segment, which triggers flush activity for all
>>> memtables referenced in that segment.
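>>>
>>> One way to check this hypothesis is to group the "Completed flushing"
>>> lines by segmentId. A rough sketch (it assumes each log entry sits on a
>>> single line, as above, and reads system.log from the current directory):
>>>
>>>     import re
>>>     from collections import defaultdict
>>>
>>>     # Group "Completed flushing" entries by commitlog segmentId, so a
>>>     # burst of flushes against one recycled segment stands out.
>>>     PATTERN = re.compile(
>>>         r"Completed flushing (?P<path>\S+) \((?P<size>\d+) bytes\) for "
>>>         r"commitlog position ReplayPosition\(segmentId=(?P<segment>\d+),"
>>>     )
>>>
>>>     flushes = defaultdict(list)
>>>     with open("system.log") as log:
>>>         for line in log:
>>>             m = PATTERN.search(line)
>>>             if m:
>>>                 flushes[m.group("segment")].append(m.group("path"))
>>>
>>>     for segment, paths in sorted(flushes.items()):
>>>         print(segment, len(paths), "memtables flushed")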
>>>
>>> Our questions:
>>>
>>> 1. Will writes be blocked for the tables listed in the FlushWriter output
>>> above while this commitlog segment is being rotated? If so, will the
>>> blocking last for the entire rotation, or just for the time taken to
>>> flush the respective memtable to disk?
>>>
>>> 2. We have also noticed that flushes can be triggered by MemoryMeter
>>> (probably when the occupancy fraction reaches the specified limit, at
>>> which point the largest memtable is flushed to disk). Would it be wise to
>>> increase the total commitlog size (commitlog_total_space_in_mb, we
>>> believe), so that memory pressure becomes the main trigger for flush
>>> activity? When flushes are triggered due to "memory pressure" we never
>>> seem to get errors in our application.
>>>
>>> 3. Is there a recommended value for the ratio (total size of commitlog
>>> segments)/(total size of memtables)? We use the defaults for both, which
>>> makes the total size of commitlog segments 1G (32 files of 32M) and the
>>> total space used for memtables 2G, since we have allocated 8G in total to
>>> the heap (see the arithmetic sketch after these questions).
>>>
>>> 4. The total space for all memtables seems to be Heap/4; this is 2G for
>>> us. Is there a maximum size for an individual memtable?
>>>
>>> 5. What is the max queue size for MemtablePostFlusher, and what exactly
>>> happens after a flush (which I presume relates to MemtablePostFlusher)? Is
>>> it concerning that we also have five pending tasks there?
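>>>
>>> For concreteness, the arithmetic sketch referenced in question 3
>>> (illustrative only; all numbers are our defaults from above):
>>>
>>>     commitlog_total_mb = 32 * 32    # 32 segments of 32M each = 1024M (1G)
>>>     memtable_total_mb = 8192 // 4   # heap/4 with an 8G heap = 2048M (2G)
>>>     print(commitlog_total_mb / memtable_total_mb)  # current ratio = 0.5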
>>>
>>> Many thanks,
>>>
>>>
>>> Vasilis
>>>
>>
>>
>
