cassandra-commits mailing list archives

From "Stefan Podkowinski (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-13432) MemtableReclaimMemory can get stuck because of lack of timeout in getTopLevelColumns()
Date Fri, 09 Jun 2017 14:05:21 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-13432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16044470#comment-16044470 ]

Stefan Podkowinski commented on CASSANDRA-13432:
------------------------------------------------

I'm not really sure whether this qualifies as a bug or an improvement, tbh. The changes seem
reasonable, but keep in mind that if this goes in, it would become part of what is potentially
the last release before 2.x goes EOL. Is the case for the patch really strong enough to justify
introducing more restrictive behaviour during query execution? Shouldn't we rather suggest that
people upgrade to 3.0 when hitting this issue (or, even better, change their data model)?

> MemtableReclaimMemory can get stuck because of lack of timeout in getTopLevelColumns()
> --------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-13432
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13432
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: cassandra 2.1.15
>            Reporter: Corentin Chary
>             Fix For: 2.1.x
>
>
> This might affect 3.x too, I'm not sure.
> {code}
> $ nodetool tpstats
> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
> MutationStage                     0         0       32135875         0                 0
> ReadStage                       114         0       29492940         0                 0
> RequestResponseStage              0         0       86090931         0                 0
> ReadRepairStage                   0         0         166645         0                 0
> CounterMutationStage              0         0              0         0                 0
> MiscStage                         0         0              0         0                 0
> HintedHandoff                     0         0             47         0                 0
> GossipStage                       0         0         188769         0                 0
> CacheCleanupExecutor              0         0              0         0                 0
> InternalResponseStage             0         0              0         0                 0
> CommitLogArchiver                 0         0              0         0                 0
> CompactionExecutor                0         0          86835         0                 0
> ValidationExecutor                0         0              0         0                 0
> MigrationStage                    0         0              0         0                 0
> AntiEntropyStage                  0         0              0         0                 0
> PendingRangeCalculator            0         0             92         0                 0
> Sampler                           0         0              0         0                 0
> MemtableFlushWriter               0         0            563         0                 0
> MemtablePostFlush                 0         0           1500         0                 0
> MemtableReclaimMemory             1        29            534         0                 0
> Native-Transport-Requests        41         0       54819182         0              1896
> {code}
> {code}
> "MemtableReclaimMemory:195" - Thread t@6268
>    java.lang.Thread.State: WAITING
> 	at sun.misc.Unsafe.park(Native Method)
> 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
> 	at org.apache.cassandra.utils.concurrent.WaitQueue$AbstractSignal.awaitUninterruptibly(WaitQueue.java:283)
> 	at org.apache.cassandra.utils.concurrent.OpOrder$Barrier.await(OpOrder.java:417)
> 	at org.apache.cassandra.db.ColumnFamilyStore$Flush$1.runMayThrow(ColumnFamilyStore.java:1151)
> 	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
>    Locked ownable synchronizers:
> 	- locked <6e7b1160> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> "SharedPool-Worker-195" - Thread t@989
>    java.lang.Thread.State: RUNNABLE
> 	at org.apache.cassandra.db.RangeTombstoneList.addInternal(RangeTombstoneList.java:690)
> 	at org.apache.cassandra.db.RangeTombstoneList.insertFrom(RangeTombstoneList.java:650)
> 	at org.apache.cassandra.db.RangeTombstoneList.add(RangeTombstoneList.java:171)
> 	at org.apache.cassandra.db.RangeTombstoneList.add(RangeTombstoneList.java:143)
> 	at org.apache.cassandra.db.DeletionInfo.add(DeletionInfo.java:240)
> 	at org.apache.cassandra.db.ArrayBackedSortedColumns.delete(ArrayBackedSortedColumns.java:483)
> 	at org.apache.cassandra.db.ColumnFamily.addAtom(ColumnFamily.java:153)
> 	at org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:184)
> 	at org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:156)
> 	at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:146)
> 	at org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:125)
> 	at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:99)
> 	at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
> 	at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
> 	at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:263)
> 	at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:108)
> 	at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:82)
> 	at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:69)
> 	at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:316)
> 	at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:62)
> 	at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:2015)
> 	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1858)
> 	at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:353)
> 	at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:85)
> 	at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:47)
> 	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 	at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164)
> 	at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105)
> 	at java.lang.Thread.run(Thread.java:745)
>    Locked ownable synchronizers:
> 	- None
> "SharedPool-Worker-206" - Thread t@1014
>    java.lang.Thread.State: RUNNABLE
> 	at org.apache.cassandra.db.RangeTombstoneList.addInternal(RangeTombstoneList.java:690)
> 	at org.apache.cassandra.db.RangeTombstoneList.insertFrom(RangeTombstoneList.java:650)
> 	at org.apache.cassandra.db.RangeTombstoneList.add(RangeTombstoneList.java:171)
> 	at org.apache.cassandra.db.RangeTombstoneList.add(RangeTombstoneList.java:143)
> 	at org.apache.cassandra.db.DeletionInfo.add(DeletionInfo.java:240)
> 	at org.apache.cassandra.db.ArrayBackedSortedColumns.delete(ArrayBackedSortedColumns.java:483)
> 	at org.apache.cassandra.db.ColumnFamily.addAtom(ColumnFamily.java:153)
> 	at org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:184)
> 	at org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:156)
> 	at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:146)
> 	at org.apache.cassandra.utils.MergeIterator$ManyToOne.<init>(MergeIterator.java:89)
> 	at org.apache.cassandra.utils.MergeIterator.get(MergeIterator.java:48)
> 	at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:105)
> 	at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:82)
> 	at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:69)
> 	at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:316)
> 	at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:62)
> 	at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:2015)
> 	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1858)
> 	at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:353)
> 	at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:85)
> 	at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:47)
> 	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 	at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164)
> 	at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105)
> 	at java.lang.Thread.run(Thread.java:745)
>    Locked ownable synchronizers:
> 	- None
> {code}
> As you can see, MemtableReclaimMemory is waiting for the read barrier to be released, but
> two queries that are currently executing are holding it.
> Since most of the time is spent pretty low in the stack, these read operations will never
> time out (they are reading rows with tons of tombstones).
> We also cannot easily detect or purge the offending row, because there is no easy way to
> find out which partition is currently being read.
> The TombstoneFailureThreshold should also protect us, but it is probably being checked
> too high in the call stack.
> It looks like RangeTombstoneList or DeletionInfo should also check
> DatabaseDescriptor.getTombstoneFailureThreshold().
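
To illustrate the suggestion in the last line of the description, here is a minimal, self-contained Java sketch of a tombstone list that enforces a failure threshold at the point where range tombstones are accumulated, rather than higher in the call stack. It is not Cassandra code and not the proposed patch: `TombstoneGuardSketch`, `GuardedTombstoneList`, `Range` and `TooManyTombstonesException` are hypothetical names, and the constructor argument only stands in for the value that `DatabaseDescriptor.getTombstoneFailureThreshold()` would supply.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these types exist in Cassandra. */
public class TombstoneGuardSketch {

    /** Thrown when a single read accumulates more range tombstones than allowed. */
    static class TooManyTombstonesException extends RuntimeException {
        TooManyTombstonesException(int seen, int limit) {
            super("read aborted after " + seen + " range tombstones (limit " + limit + ")");
        }
    }

    /** Simplified stand-in for a range tombstone covering [start, end]. */
    static final class Range {
        final int start;
        final int end;
        final long markedAt;

        Range(int start, int end, long markedAt) {
            this.start = start;
            this.end = end;
            this.markedAt = markedAt;
        }
    }

    /** Simplified stand-in for a tombstone list that checks the limit on every add. */
    static final class GuardedTombstoneList {
        private final List<Range> ranges = new ArrayList<>();
        private final int failureThreshold; // assumed to come from configuration

        GuardedTombstoneList(int failureThreshold) {
            this.failureThreshold = failureThreshold;
        }

        /** Adds a tombstone, failing fast once the threshold would be exceeded. */
        void add(Range range) {
            if (ranges.size() >= failureThreshold) {
                throw new TooManyTombstonesException(ranges.size() + 1, failureThreshold);
            }
            ranges.add(range);
        }

        int size() {
            return ranges.size();
        }
    }

    public static void main(String[] args) {
        GuardedTombstoneList list = new GuardedTombstoneList(1000);
        try {
            // Simulate a read that keeps collecting disjoint range tombstones.
            for (int i = 0; ; i++) {
                list.add(new Range(2 * i, 2 * i + 1, i));
            }
        } catch (TooManyTombstonesException e) {
            // The read fails here instead of spinning inside the list.
            System.out.println(e.getMessage());
        }
    }
}
```

Because the check sits in the innermost accumulation step, the read in this sketch is aborted by the same code that would otherwise keep it busy, so anything waiting for that read to finish is not held up indefinitely. Whether a check in this position is appropriate for a late 2.1.x release is exactly the question raised in the comment above.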



