incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Cassandra Messages Dropped
Date Sun, 23 Sep 2012 21:39:30 GMT
> To put it another way, Cassandra will lock down all tables until all pending flush requests fit in the pending queue.
This was the first issue I looked at in my Cassandra SF talk http://www.datastax.com/events/cassandrasummit2012/presentations

I've seen it occur more often with lots-o-secondary indexes. 
 

> We had plenty of write I/O available.  We also had free memory.  I increased the memtable_flush_writers to "2" and memtable_flush_queue_size to "8".  We haven't had any timeouts for a number of days now.
Cool. 

Cheers
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 24/09/2012, at 6:09 AM, Michael Theroux <mtheroux2@yahoo.com> wrote:

> There were no errors in the log (other than the "messages dropped" message pasted below), and the node does recover.  We have only a small number of secondary indexes (3 in the whole system).
> 
> However, I went through the cassandra code, and I believe I've worked through this problem.
> 
> Just to finish out this thread, I realized that when you see:
> 
> 	INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) FlushWriter                       1         5         0
> 
> It is an issue.  Cassandra will at various times enqueue many memtables for flushing.  By default, the queue size for this is 4.  If more than 5 memtables get queued for flushing (4 + 1 for the one currently being flushed), a lock will be acquired and held across all tables until all memtables that need to be flushed are enqueued.  If it takes more than rpc_timeout_in_ms to flush enough information to allow all the pending memtables to be enqueued, a "messages dropped" message will occur.  To put it another way, Cassandra will lock down all tables until all pending flush requests fit in the pending queue.  If your queue size is 4, and 8 tables need to be flushed, Cassandra will lock down all tables until a minimum of 3 memtables are flushed.
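> 
> As a rough sketch in plain Java (this is not the actual Cassandra code; the class name and the 500ms flush time are made up for illustration), the behaviour is like producers blocking on a bounded queue with one consumer:
> 
>     import java.util.concurrent.ArrayBlockingQueue;
>     import java.util.concurrent.BlockingQueue;
> 
>     public class FlushQueueSketch {
>         public static void main(String[] args) throws InterruptedException {
>             // memtable_flush_queue_size = 4 slots, one flush writer draining them
>             final BlockingQueue<String> flushQueue = new ArrayBlockingQueue<String>(4);
> 
>             Thread flushWriter = new Thread(new Runnable() {
>                 public void run() {
>                     try {
>                         while (true) {
>                             String memtable = flushQueue.take();
>                             Thread.sleep(500); // pretend each flush takes ~500ms
>                             System.out.println("flushed " + memtable);
>                         }
>                     } catch (InterruptedException e) { /* done */ }
>                 }
>             });
>             flushWriter.start();
> 
>             // 8 memtables need flushing: the first 5 enqueue immediately
>             // (4 queued + 1 taken by the writer), then each put() blocks
>             // until a flush completes, just like the lock stalls writes.
>             for (int i = 1; i <= 8; i++) {
>                 long start = System.currentTimeMillis();
>                 flushQueue.put("memtable-" + i); // blocks while the queue is full
>                 System.out.println("enqueued memtable-" + i + " after " + (System.currentTimeMillis() - start) + "ms");
>             }
>             flushWriter.interrupt();
>         }
>     }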
> 
> With this in mind, I went through the cassandra log and found this was indeed the case, looking at log entries similar to these:
> 
>  INFO [OptionalTasks:1] 2012-09-16 05:54:29,750 ColumnFamilyStore.java (line 643) Enqueuing flush of Memtable-p@1525015234(18686281/341486464 serialized/live bytes, 29553 ops)
> ...
> INFO [FlushWriter:29] 2012-09-16 05:54:29,768 Memtable.java (line 266) Writing Memtable-p@1525015234(18686281/341486464 serialized/live bytes, 29553 ops)
> ...
> INFO [FlushWriter:29] 2012-09-16 05:54:30,254 Memtable.java (line 307) Completed flushing /data/cassandra/data/open/people/open-p-hd-441-Data.db
> 
> I was able to figure out what the rpc_timeout_in_ms needed to be to temporarily prevent the problem.
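> 
> For example, from the entries above, one flush of that memtable took roughly:
> 
>     05:54:30,254 - 05:54:29,750 = ~504 ms
> 
> So if, say, 3 memtables are stuck behind a full queue, writes could stall for about 3 x 504 ms = ~1.5 s, and rpc_timeout_in_ms needs to be comfortably above that (illustrative numbers; the real worst case depends on memtable sizes).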
> 
> We had plenty of write I/O available.  We also had free memory.  I increased the memtable_flush_writers to "2" and memtable_flush_queue_size to "8".  We haven't had any timeouts for a number of days now.
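> 
> For reference, those settings live in cassandra.yaml; the values we ended up with were:
> 
>     memtable_flush_writers: 2
>     memtable_flush_queue_size: 8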
> 
> Thanks for your help,
> -Mike
> 
> On Sep 18, 2012, at 5:14 AM, aaron morton wrote:
> 
>> Any errors in the log ?
>> 
>> The node recovers ? 
>> 
>> Do you use secondary indexes ? If so check the comments for memtable_flush_queue_size in the yaml. If this value is too low writes may back up (the comment there suggests setting it to at least the maximum number of secondary indexes on a single CF, since flushing a CF also flushes its index CFs). But I would not expect it to cause dropped messages. 
>> 
>>> nodetool info also shows we have over a gig of available memory on the JVM heap of each node.
>> 
>> Not all memory is created equal :)
>> ParNew is kicking in to GC the Eden space in the New Heap. 
>>  
>> It may just be that the node is getting hammered by something and IO is getting overwhelmed. If you can put the logs up someone might take a look. 
>> 
>> Cheers
>> 
>> -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 18/09/2012, at 3:46 PM, Michael Theroux <mtheroux2@yahoo.com> wrote:
>> 
>>> Thanks for the response.
>>> 
>>> We are on version 1.1.2.  We don't see the MutationStage back up.  The dump from the messages dropped error doesn't show a backup, and watching "nodetool tpstats" doesn't show any backup there either.
>>> 
>>> nodetool info also shows we have over a gig of available memory on the JVM heap of each node.
>>> 
>>> The earliest GCInspector traces I see before one of the more recent incidents in which messages were dropped are:
>>> 
>>> 	INFO [ScheduledTasks:1] 2012-09-18 02:25:53,928 GCInspector.java (line 122) GC for ParNew: 396 ms for 1 collections, 2064505088 used; max is 4253024256
>>> 
>>> 	INFO [ScheduledTasks:1] 2012-09-18 02:25:55,929 GCInspector.java (line 122) GC for ParNew: 485 ms for 1 collections, 1961875064 used; max is 4253024256
>>> 
>>> 	INFO [ScheduledTasks:1] 2012-09-18 02:25:57,930 GCInspector.java (line 122) GC for ParNew: 265 ms for 1 collections, 1968074096 used; max is 4253024256
>>> 
>>> But this was 45 minutes before messages were dropped.
>>> 
>>> It's appreciated,
>>> -Mike
>>>  
>>> On Sep 17, 2012, at 11:27 PM, aaron morton wrote:
>>> 
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) MemtablePostFlusher               1         5         0
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) FlushWriter                       1         5         0
>>>> Looks suspiciously like 
>>>> http://mail-archives.apache.org/mod_mbox/cassandra-user/201209.mbox/%3C9FB0E801-B1ED-41C4-9939-BAFBDDF15FD9@thelastpickle.com%3E
>>>> 
>>>> What version are you on ? 
>>>> 
>>>> Are there any ERROR log messages before this ? 
>>>> 
>>>> Are you seeing MutationStage back up ? 
>>>> 
>>>> Are you see log messages from GCInspector ?
>>>> 
>>>> Cheers
>>>> 
>>>> -----------------
>>>> Aaron Morton
>>>> Freelance Developer
>>>> @aaronmorton
>>>> http://www.thelastpickle.com
>>>> 
>>>> On 18/09/2012, at 2:16 AM, Michael Theroux <mtheroux2@yahoo.com> wrote:
>>>> 
>>>>> Hello,
>>>>> 
>>>>> While under load, we have occasionally been seeing "messages dropped" errors in our cassandra log.  Doing some research, I understand this is part of Cassandra's design to shed load, and we should look at the tpstats-like output to determine what should be done to resolve the situation.  Typically, you will see lots of messages blocked or pending, and that might be an indicator that a specific piece of hardware needs to be improved/tuned/upgraded.
>>>>> 
>>>>> However, looking at the output we are getting, I'm finding it difficult to see what needs to be tuned, as it looks to me like cassandra is handling the load within the mutation stage:
>>>>> 
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,266 MessagingService.java (line 658) 119 MUTATION messages dropped in last 5000ms
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,645 StatusLogger.java (line 57) Pool Name                    Active   Pending   Blocked
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,836 StatusLogger.java (line 72) ReadStage                         3         3         0
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,837 StatusLogger.java (line 72) RequestResponseStage              0         0         0
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,837 StatusLogger.java (line 72) ReadRepairStage                   0         0         0
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,837 StatusLogger.java (line 72) MutationStage                     0         0         0
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,838 StatusLogger.java (line 72) ReplicateOnWriteStage             0         0         0
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,838 StatusLogger.java (line 72) GossipStage                       0         0         0
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) AntiEntropyStage                  0         0         0
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) MigrationStage                    0         0         0
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) StreamStage                       0         0         0
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) MemtablePostFlusher               1         5         0
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) FlushWriter                       1         5         0
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) MiscStage                         0         0         0
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) commitlog_archiver                0         0         0
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,841 StatusLogger.java (line 72) InternalResponseStage             0         0         0
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,841 StatusLogger.java (line 72) AntiEntropySessions               0         0         0
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,851 StatusLogger.java (line 72) HintedHandoff                     0         0         0
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,851 StatusLogger.java (line 77) CompactionManager                 0         0
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,852 StatusLogger.java (line 89) MessagingService                n/a       0,0
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,852 StatusLogger.java (line 99) Cache Type                     Size                 Capacity               KeysToSave                                                         Provider
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java (line 100) KeyCache                    2184533                  2184533                      all
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java (line 106) RowCache                          0                        0                      all              org.apache.cassandra.cache.SerializingCacheProvider
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java (line 113) ColumnFamily                Memtable ops,data
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java (line 116) system.NodeIdInfo                         0,0
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,854 StatusLogger.java (line 116) system.IndexInfo                          0,0
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,854 StatusLogger.java (line 116) system.LocationInfo                       0,0
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,854 StatusLogger.java (line 116) system.Versions                           0,0
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,855 StatusLogger.java (line 116) system.schema_keyspaces                   0,0
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,855 StatusLogger.java (line 116) system.Migrations                         0,0
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,855 StatusLogger.java (line 116) system.schema_columnfamilies                 0,0
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,856 StatusLogger.java (line 116) system.schema_columns                     0,0
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,856 StatusLogger.java (line 116) system.HintsColumnFamily                  0,0
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,856 StatusLogger.java (line 116) system.Schema                             0,0
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,857 StatusLogger.java (line 116) prod.comp                           0,0
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,857 StatusLogger.java (line 116) prod.byp                      224,61863
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,857 StatusLogger.java (line 116) prod.byn                      30,6106
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,857 StatusLogger.java (line 116) prod.p                     3354,3792954
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,858 StatusLogger.java (line 116) prod.rel                788,153294
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,858 StatusLogger.java (line 116) prod.i                        46,271517
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,859 StatusLogger.java (line 116) prod.users                     18489,5650405
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,859 StatusLogger.java (line 116) prod.caches                       40,7512998
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,859 StatusLogger.java (line 116) prod.sessions                  8085,13063121
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,859 StatusLogger.java (line 116) prod.content                             0,0
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,860 StatusLogger.java (line 116) prod.enr                  436,341190
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,860 StatusLogger.java (line 116) prod.byt                 23800,9880339
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,861 StatusLogger.java (line 116) prod.aliases                  63603,10475769
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,861 StatusLogger.java (line 116) prod.ran                    1500,329387
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,861 StatusLogger.java (line 116) prod.cmap                        0,0
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,861 StatusLogger.java (line 116) prod.pmap                73032,61237450
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,862 StatusLogger.java (line 116) prod.pic                   5421,4438066
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,862 StatusLogger.java (line 116) prod.watchdog                            0,0
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,863 StatusLogger.java (line 116) prod.s                         31,40924
>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,863 StatusLogger.java (line 116) prod.m                 21690,64262284
>>>>> 
>>>>> 
>>>>> We are using a write consistency level of LOCAL_QUORUM over a six-node cluster.  Any observations are welcome,
>>>>> 
>>>>> Thanks,
>>>>> -Mike
>>>> 
>>> 
>> 
> 

