cassandra-commits mailing list archives

From "Alexey Plotnik (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-6744) Network streaming is locked while cleanup is running
Date Fri, 21 Feb 2014 01:03:15 GMT


Alexey Plotnik commented on CASSANDRA-6744:

It looks like concurrent_compactors causes the locking. But streaming is a network process and compaction
is a disk-related process. My concern is that streaming and compaction are different kinds of
tasks, and they shouldn't have to wait on each other.
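
To illustrate the kind of stall described here, the following is a minimal sketch, not Cassandra code. It assumes that the index-build Future is submitted to the same bounded pool that cleanup runs on, and it models that pool as a single worker thread. The class and method names (SaturatedExecutorSketch, run) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SaturatedExecutorSketch {

    // Returns the order in which the three steps completed.
    public static List<String> run(long cleanupMillis) throws Exception {
        List<String> events = Collections.synchronizedList(new ArrayList<>());

        // Assumption: one worker thread stands in for a pool bounded by concurrent_compactors.
        ExecutorService compactionPool = Executors.newFixedThreadPool(1);
        try {
            // A long-running "cleanup" task occupies the only worker.
            compactionPool.submit(() -> {
                try {
                    Thread.sleep(cleanupMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                events.add("cleanup finished");
            });

            // The "streaming" side queues a trivial index check on the same pool...
            Future<?> indexCheck = compactionPool.submit(() -> {
                events.add("index check finished");
            });

            // ...and blocks here until cleanup has released the worker.
            indexCheck.get();
            events.add("stream session closed");
        } finally {
            compactionPool.shutdown();
        }
        return events;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(200));
    }
}
```

In this sketch the stream session can only close after the cleanup task completes, even though the queued task itself does almost no work.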

> Network streaming is locked while cleanup is running
> ----------------------------------------------------
>                 Key: CASSANDRA-6744
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: CentOS 6.4
>            Reporter: Alexey Plotnik
>            Assignee: Yuki Morishita
>         Attachments: receiver.dump, sender.dump
> When I rebalanced my Cassandra cluster by moving SSTables from one node to another, I saw
> that the streaming process sometimes got stuck, with no exceptions in the logs on either side.
> It looked like a pause. I examined thread dumps from the node that sends the data and found that
> it waits for a response here:
> {noformat}
> "Streaming to /" - Thread t@26058
>    java.lang.Thread.State: RUNNABLE
>    ...
> 	at org.apache.cassandra.streaming.FileStreamTask.receiveReply(
> 	at org.apache.cassandra.streaming.FileStreamTask.runMayThrow(
> 	at
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(
> 	at java.util.concurrent.ThreadPoolExecutor$
> 	at
> {noformat}
> Source:
> {code:title=org.apache.cassandra.streaming.FileStreamTask|borderStyle=solid}
> public class FileStreamTask extends WrappedRunnable {
> ....
>     protected void receiveReply() throws IOException
>     {
>         MessagingService.validateMagic(input.readInt()); // <-- stuck here
> {code}
> OK, it waits for an answer from the opposite endpoint.
> Let's go further. After examining the thread dump of the receiving endpoint, I found the place
> where it gets stuck:
> {noformat}
> "Thread-104503" - Thread t@268602
>    java.lang.Thread.State: WAITING
> 	at org.apache.cassandra.db.index.SecondaryIndexManager.maybeBuildSecondaryIndexes(
> 	at org.apache.cassandra.streaming.StreamInSession.closeIfFinished(
> 	at
> 	at
> 	at
> 	at
> {noformat}
> It is building secondary indexes (my CF has no secondary indexes). *SecondaryIndexManager.maybeBuildSecondaryIndexes*
> creates a *Future* and waits for it. Inside the *Future*, it synchronizes on the common lock
> of *CompactionManager*:
> {code:title=org.apache.cassandra.db.compaction.CompactionManager|borderStyle=solid}
> compactionLock.readLock().lock(); // line #797
> {code}
> The same lock is used by the cleanup process, because *performCleanup()* executes the *performAllSSTableOperation()*
> method, which acquires the lock.
> I created this ticket because on large nodes (1-2 TB), especially those hosted on network
> storage, the delay can reach a few days. Correct me if I am wrong: we shouldn't lock in the secondary
> index rebuild stage when the CF has no secondary indexes. It is not a problem
> if the cleanup process is paused by the streaming stage, but the reverse is a problem, because
> streaming is much more important for the cluster.
> Both thread dumps attached.
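
The suggestion in the quoted description, to avoid the shared lock when a column family has no secondary indexes, could be sketched roughly as follows. This is an illustration only and not the actual Cassandra implementation or the eventual fix. The class SkipIndexBuildSketch, the method maybeBuildIndexes, and its parameters are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SkipIndexBuildSketch {

    // Returns true if an index build was scheduled and awaited,
    // false if it was skipped because there is nothing to index.
    public static boolean maybeBuildIndexes(ExecutorService sharedPool,
                                            Set<String> indexedColumns) throws Exception {
        if (indexedColumns.isEmpty()) {
            // No secondary indexes: return before touching the shared pool or any lock.
            return false;
        }
        // Placeholder for the real index build (hypothetical).
        Runnable build = () -> { };
        Future<?> future = sharedPool.submit(build);
        future.get(); // may block while other tasks occupy the pool
        return true;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            System.out.println(maybeBuildIndexes(pool, Set.of()));      // skipped
            System.out.println(maybeBuildIndexes(pool, Set.of("col"))); // built
        } finally {
            pool.shutdown();
        }
    }
}
```

With an early return like this, a stream session for a column family without secondary indexes would never queue behind a running cleanup.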

This message was sent by Atlassian JIRA