cassandra-commits mailing list archives

From "Mark Manley (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CASSANDRA-11447) Flush writer deadlock in Cassandra 2.2.5
Date Mon, 28 Mar 2016 19:20:25 GMT
Mark Manley created CASSANDRA-11447:
---------------------------------------

             Summary: Flush writer deadlock in Cassandra 2.2.5
                 Key: CASSANDRA-11447
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11447
             Project: Cassandra
          Issue Type: Bug
            Reporter: Mark Manley
         Attachments: cassandra.jstack.out

When writing heavily to one of my Cassandra tables, I got a deadlock similar to CASSANDRA-9882:

{code}
"MemtableFlushWriter:4589" #34721 daemon prio=5 os_prio=0 tid=0x0000000005fc11d0 nid=0x7664
waiting for monitor entry [0x00007fb83f0e5000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.cassandra.db.compaction.WrappingCompactionStrategy.handleNotification(WrappingCompactionStrategy.java:266)
        - waiting to lock <0x0000000400956258> (a org.apache.cassandra.db.compaction.WrappingCompactionStrategy)
        at org.apache.cassandra.db.lifecycle.Tracker.notifyAdded(Tracker.java:400)
        at org.apache.cassandra.db.lifecycle.Tracker.replaceFlushed(Tracker.java:332)
        at org.apache.cassandra.db.compaction.AbstractCompactionStrategy.replaceFlushed(AbstractCompactionStrategy.java:235)
        at org.apache.cassandra.db.ColumnFamilyStore.replaceFlushed(ColumnFamilyStore.java:1580)
        at org.apache.cassandra.db.Memtable$FlushRunnable.runMayThrow(Memtable.java:362)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
        at org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1139)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
{code}
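For illustration, here is a minimal, self-contained sketch of the lock pattern the trace suggests (not Cassandra's actual code; the class, lock, and thread names are only stand-ins): the flush thread blocks trying to enter the strategy's monitor while the monitor's current holder is itself waiting on the flush to complete, so neither can make progress.

{code}
import java.util.concurrent.CountDownLatch;

public class FlushDeadlockSketch {
    // Stand-in for the WrappingCompactionStrategy monitor seen in the jstack output.
    private static final Object strategyLock = new Object();
    private static final CountDownLatch flushDone = new CountDownLatch(1);

    public static void main(String[] args) throws InterruptedException {
        Thread flushWriter = new Thread(() -> {
            // Flush path: Tracker.notifyAdded -> handleNotification (synchronized).
            // Blocks here forever, because the lock holder below never releases.
            synchronized (strategyLock) {
                flushDone.countDown();
            }
        }, "MemtableFlushWriter");

        Thread lockHolder = new Thread(() -> {
            synchronized (strategyLock) {       // holds the strategy monitor...
                try {
                    flushDone.await();          // ...while waiting on the flush: deadlock
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "CompactionExecutor");

        lockHolder.start();
        Thread.sleep(100);                      // let the holder grab the lock first
        flushWriter.start();

        flushWriter.join(2000);                 // times out; the flush never finishes
        System.out.println("flushWriter still blocked: " + flushWriter.isAlive());
    }
}
{code}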

The compaction strategies in this keyspace are mixed: one table uses LCS and the rest
use DTCS.  None of the tables here, save for the LCS one, seems to have a large SSTable count:

{code}
		Table: active_counters
		SSTable count: 2
--

		Table: aggregation_job_entries
		SSTable count: 2
--

		Table: dsp_metrics_log
		SSTable count: 207
--

		Table: dsp_metrics_ts_5min
		SSTable count: 3
--

		Table: dsp_metrics_ts_day
		SSTable count: 2
--

		Table: dsp_metrics_ts_hour
		SSTable count: 2
{code}

Yet the symptoms are similar to those in CASSANDRA-9882.

The "dsp_metrics_ts_5min" table had had a major compaction shortly before all this to get
rid of the 400+ SStable files before this system went into use, but they should have been
eliminated.

Have other people seen this?  I am attaching a stack trace.
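For what it's worth, besides jstack, the JVM can report deadlocked threads in-process via java.lang.management. A minimal sketch (not part of Cassandra; note it only catches cycles on monitors and ownable synchronizers, so latch-style waits like the one above may not show up):

{code}
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class DeadlockProbe {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Returns null when no deadlock cycle is detected.
        long[] ids = mx.findDeadlockedThreads();
        if (ids == null) {
            System.out.println("No deadlocked threads found");
            return;
        }
        for (ThreadInfo info : mx.getThreadInfo(ids, Integer.MAX_VALUE)) {
            System.out.printf("%s blocked on %s held by %s%n",
                    info.getThreadName(), info.getLockName(), info.getLockOwnerName());
        }
    }
}
{code}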

Thanks!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
