cassandra-commits mailing list archives

From "Omid Aladini (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9458) Race condition causing StreamSession to get stuck in WAIT_COMPLETE
Date Wed, 27 May 2015 12:04:17 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560844#comment-14560844 ]

Omid Aladini commented on CASSANDRA-9458:
-----------------------------------------

Thanks for checking the log and the patch. You're right: all the relevant calls to maybeCompleted
are synchronised on the object.

{quote}
Do you have secondary indexes? Right now, streaming is considered completed after secondary
indexes are built in that finalise phase (CASSANDRA-9308).
{quote}

There are secondary indexes, and I see a bunch of "submitting index build of" messages in the
full log, so it's possible that the index build is just taking longer than the timeout. I'll
disable the timeout (and enable TCP keep-alive via CASSANDRA-9455) to see if that resolves it.
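
For illustration only, here is a minimal Java sketch of what "no read timeout plus TCP keep-alive" means at the socket level. It is not Cassandra's streaming code and not the CASSANDRA-9455 patch. The class and method names are hypothetical, and it assumes that disabling the timeout corresponds to a read timeout of 0, which for java.net.Socket means blocking reads never time out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;
import java.net.SocketException;

// Hypothetical helper, not part of Cassandra.
public final class StreamingSocketSketch {

    // Applies a read timeout and enables keep-alive probes on the socket.
    // readTimeoutMs == 0 means reads block indefinitely (no SocketTimeoutException).
    static Socket configure(Socket socket, int readTimeoutMs) {
        try {
            socket.setSoTimeout(readTimeoutMs);
            // With no read timeout, keep-alive is what lets the OS notice a dead peer.
            socket.setKeepAlive(true);
            return socket;
        } catch (SocketException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // An unconnected socket is enough to show the options being applied.
        try (Socket s = configure(new Socket(), 0)) {
            System.out.println("soTimeout=" + s.getSoTimeout()
                    + " keepAlive=" + s.getKeepAlive());
        }
    }
}
```

With the timeout disabled, a slow index build would no longer trip a read timeout, while keep-alive still covers the case where the peer has actually gone away.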

> Race condition causing StreamSession to get stuck in WAIT_COMPLETE
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-9458
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9458
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Omid Aladini
>            Assignee: Omid Aladini
>            Priority: Critical
>             Fix For: 2.1.x, 2.0.x
>
>         Attachments: 9458-v1.txt
>
>
> I think there is a race condition in StreamSession where one side of the stream could
> get stuck in WAIT_COMPLETE although both have sent COMPLETE messages. Consider a scenario
> that node B is being bootstrapped and it only receives files during the session:
> 1- During a stream session A sends some files to B and B sends no files to A.
> 2- Once B completes the last task (receiving), StreamSession::maybeComplete is invoked.
> 3- While B is sending the COMPLETE message via StreamSession::maybeComplete, it also
> receives the COMPLETE message from A and therefore StreamSession::complete() is invoked.
> 4- Therefore both maybeComplete() and complete() functions have branched into the state
> != State.WAIT_COMPLETE case and both set the state to WAIT_COMPLETE.
> 5- Now B is waiting to receive COMPLETE although it's already received it and nothing
> triggers checking the state again, until it times out after streaming_socket_timeout_in_ms.
> In the log below:
> https://gist.github.com/omidaladini/003de259958ad8dfb07e
> although the node has received COMPLETE, "SocketTimeoutException" is thrown after streaming_socket_timeout_in_ms
> (30 minutes here).
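
The interleaving hypothesised in steps 3-5 of the description, and the reason synchronisation rules it out, can be sketched as a simplified Java model. This is not the actual StreamSession source. The class, the three-value State enum and the method bodies are assumptions made for illustration; only the names maybeCompleted, complete and WAIT_COMPLETE come from the discussion.

```java
// Simplified, hypothetical model of the completion handshake on node B.
public class StreamSessionSketch {
    public enum State { STREAMING, WAIT_COMPLETE, COMPLETE }

    private State state = State.STREAMING;

    // Last local task finished (the real session would also send COMPLETE here).
    public synchronized void maybeCompleted() { advance(); }

    // COMPLETE message received from the peer.
    public synchronized void complete() { advance(); }

    public synchronized State state() { return state; }

    // First caller moves to WAIT_COMPLETE; second caller sees it and finishes.
    private void advance() {
        if (state == State.WAIT_COMPLETE) {
            state = State.COMPLETE;
        } else if (state == State.STREAMING) {
            state = State.WAIT_COMPLETE;
        }
    }

    // Hand-written replay of the hypothesised race: both callers read the
    // state before either one writes, as could happen WITHOUT synchronisation.
    public static State replayHypothesisedRace() {
        State shared = State.STREAMING;
        State seenByMaybeCompleted = shared;
        State seenByComplete = shared;
        if (seenByMaybeCompleted != State.WAIT_COMPLETE) shared = State.WAIT_COMPLETE;
        if (seenByComplete != State.WAIT_COMPLETE) shared = State.WAIT_COMPLETE;
        return shared; // stuck: nothing re-checks the state
    }

    // Runs both events on separate threads against the synchronised model.
    public static State runSynchronized() {
        StreamSessionSketch session = new StreamSessionSketch();
        Thread local = new Thread(session::maybeCompleted);
        Thread remote = new Thread(session::complete);
        local.start();
        remote.start();
        try {
            local.join();
            remote.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return session.state();
    }

    public static void main(String[] args) {
        System.out.println("unsynchronised replay: " + replayHypothesisedRace());
        System.out.println("synchronised run:      " + runSynchronized());
    }
}
```

In this model the replay ends in WAIT_COMPLETE, which matches the stuck session in the description. When both methods hold the object's monitor, the check and the state change happen as one step, so whichever event arrives second always observes WAIT_COMPLETE and finishes the session.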



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
