cassandra-commits mailing list archives

From "Alexey Plotnik (JIRA)" <>
Subject [jira] [Created] (CASSANDRA-6744) Network streaming is locked by a cleanup compaction
Date Thu, 20 Feb 2014 02:17:19 GMT
Alexey Plotnik created CASSANDRA-6744:

             Summary: Network streaming is locked by a cleanup compaction
                 Key: CASSANDRA-6744
             Project: Cassandra
          Issue Type: Bug
          Components: Core
         Environment: CentOS 6.4
            Reporter: Alexey Plotnik
         Attachments: receiver.dump, sender.dump

When I rebalanced my Cassandra cluster by moving SSTables from one node to another, I saw that
the streaming process sometimes got stuck without any exceptions in the logs on either side. It
looked like a pause. I examined thread dumps from the node that sends the data and found that
it is waiting for a response here:

"Streaming to /" - Thread t@26058
   java.lang.Thread.State: RUNNABLE
	at org.apache.cassandra.streaming.FileStreamTask.receiveReply(
	at org.apache.cassandra.streaming.FileStreamTask.runMayThrow(
	at java.util.concurrent.ThreadPoolExecutor.runWorker(
	at java.util.concurrent.ThreadPoolExecutor$
public class FileStreamTask extends WrappedRunnable {
    protected void receiveReply() throws IOException
        MessagingService.validateMagic(input.readInt()); // <-- stuck here

OK, it is waiting for a reply from the opposite endpoint.

Let's go further. After examining the receiving endpoint's thread dump, I found where it is stuck:
"Thread-104503" - Thread t@268602
   java.lang.Thread.State: WAITING
	at org.apache.cassandra.db.index.SecondaryIndexManager.maybeBuildSecondaryIndexes(
	at org.apache.cassandra.streaming.StreamInSession.closeIfFinished(
It is building secondary indexes (my CF has none). *SecondaryIndexManager.maybeBuildSecondaryIndexes*
creates a *Future* and waits for it. Inside the *Future*, it synchronizes on the shared compaction lock:
compactionLock.readLock().lock(); // line #797

The same lock is used by the cleanup process, because *performCleanup()* calls
*performAllSSTableOperation()*, which acquires the lock.
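
The contention described above can be reproduced in isolation. The following is a minimal sketch, not Cassandra code: the class name, the method name, and the static compactionLock field are hypothetical stand-ins, and it assumes that the cleanup side holds the write half of a ReentrantReadWriteLock while the index-build side asks for the read half. A timed tryLock is used only so that the demonstration terminates; the plain lock() call in the report would wait for as long as the cleanup runs.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class LockContentionSketch {
    // Hypothetical stand-in for the shared compactionLock named in the report.
    static final ReentrantReadWriteLock compactionLock = new ReentrantReadWriteLock();

    /** Returns true if a reader obtained the read lock while another thread held the write lock. */
    static boolean readerAcquiresWhileWriterHolds(long timeoutMs) {
        CountDownLatch writerHolds = new CountDownLatch(1);
        CountDownLatch writerMayRelease = new CountDownLatch(1);

        // Plays the role of the cleanup: takes the write lock (an assumption) and keeps it.
        Thread cleanup = new Thread(() -> {
            compactionLock.writeLock().lock();
            try {
                writerHolds.countDown();
                writerMayRelease.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                compactionLock.writeLock().unlock();
            }
        });
        cleanup.start();

        try {
            writerHolds.await();
            // Plays the role of the index-build step on the receiving node.
            boolean acquired = compactionLock.readLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS);
            if (acquired) {
                compactionLock.readLock().unlock();
            }
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            writerMayRelease.countDown();
            try {
                cleanup.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        // Prints "reader acquired lock: false"
        System.out.println("reader acquired lock: " + readerAcquiresWhileWriterHolds(200));
    }
}
```

The reader times out because a ReentrantReadWriteLock never grants the read lock to another thread while the write lock is held.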

I created this ticket because on large nodes (1 TB to 2 TB), especially those hosted on network
storage, the delay can reach up to a few days. Correct me if I am wrong: we should not take the lock
in the secondary index rebuild stage when the CF has no secondary indexes. It is not a problem when
the cleanup process is paused by streaming, but not vice versa, because the streaming process has much more

Both thread dumps attached.
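
One way to read the suggestion above is as an early return placed before the lock is touched. This is a hedged sketch only: the class, the Outcome enum, the method signature, and the column name are all hypothetical and do not reflect the real SecondaryIndexManager API or any actual patch.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class IndexBuildGuardSketch {
    enum Outcome { SKIPPED, BUILT }

    // Hypothetical stand-in for the shared compactionLock named in the report.
    static final ReentrantReadWriteLock compactionLock = new ReentrantReadWriteLock();

    /** Takes the read lock only when there is at least one index to build. */
    static Outcome maybeBuildSecondaryIndexes(Collection<String> indexedColumns) {
        if (indexedColumns.isEmpty()) {
            // Nothing to build, so never contend with a running cleanup.
            return Outcome.SKIPPED;
        }
        compactionLock.readLock().lock();
        try {
            // The actual index build would happen here (omitted in this sketch).
            return Outcome.BUILT;
        } finally {
            compactionLock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        // Prints "SKIPPED" then "BUILT"
        System.out.println(maybeBuildSecondaryIndexes(Collections.<String>emptyList()));
        System.out.println(maybeBuildSecondaryIndexes(Arrays.asList("hypothetical_col")));
    }
}
```

With a guard of this kind, a CF without secondary indexes would never wait on the cleanup at this step; a CF that does have indexes would still wait as before.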

