Return-Path: X-Original-To: apmail-cassandra-commits-archive@www.apache.org Delivered-To: apmail-cassandra-commits-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 27124C058 for ; Tue, 2 Jul 2013 09:51:32 +0000 (UTC) Received: (qmail 77882 invoked by uid 500); 2 Jul 2013 09:51:31 -0000 Delivered-To: apmail-cassandra-commits-archive@cassandra.apache.org Received: (qmail 76214 invoked by uid 500); 2 Jul 2013 09:51:26 -0000 Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@cassandra.apache.org Delivered-To: mailing list commits@cassandra.apache.org Received: (qmail 76118 invoked by uid 99); 2 Jul 2013 09:51:23 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 02 Jul 2013 09:51:23 +0000 Date: Tue, 2 Jul 2013 09:51:23 +0000 (UTC) From: "Sylvain Lebresne (JIRA)" To: commits@cassandra.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Updated] (CASSANDRA-5699) Streaming (2.0) can deadlock MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/CASSANDRA-5699?page=3Dcom.atla= ssian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-5699: ---------------------------------------- Attachment: 5699.txt bq. It's somewhat confusing that we use StreamInit both as leader and as fo= llower, to mean different things I guess, though if you see StreamInit as a way for both node to open/initia= lize their outgoing connection to the other node (which is what this is doi= ng), it's not really that different. And it's somewhat consistent with the = rest of the protocol, where each side will sent a Prepare and then finally = a Complete. Anyway, updated the patch with the sentByInitiator renaming. bq. What is going on with the switch from Set ongoingSessions to Map<= InetAddress, StreamSession> ongoingSessions in SRF? ongoingSession only role initialy was to track how many sessions for a Stre= amResultFuture were live to know when we're done (SRF.maybeComplete()). In = the original patch of CASSANDRA-5286, it wasn't even there, and there was j= ust an AtomicInteger decremented each time a session completed. I was sligh= tly afraid that a race could have us counting the same session done twice, = so I changed it to a Set and added a simple UUID per session that was= only used for that purpose (it was never send to the other side or anythin= g). Anyway, that UUID addition was stupid in the first place since for a SR= F, there is only one StreamSession per remote host, so it should have been = a Set to start with and we have no use for a StreamSession UUI= D. But on top of that, this patch add the need to be able to find back a Strea= mSession in a SRF given the remote endpoint (in IncomingStreamConnection, w= hen the initiator gets the other side StreamInit), so we really need a map = now. bq. Suggest renaming, e.g. onSessionEstablished Agreed, though I've renamed to onInitializationComplete to be consistent wi= th the wording of the javadoc. bq. Somewhat confused by logic in complete() =E2=80=93 "I received a Comple= te message. If I'm already waiting for a complete message, close the sessio= n. Otherwise, wait for [another] Complete message" ? Each side will send a CompleteMessage to say "I'm done on my side", and the= session will be considered really complete (and closed) when we got the Co= mplete for both side. So WAIT_COMPLETE means "I've seen only one complete m= essage so far". Hence the logic of complete() is "I just received a complet= e from the other side, if I had already completed my side, close the sessio= n, otherwise move to the "I'm waiting for the other complete" state". bq. It looks like there may be synchronization issues with "state;" I think we're good, because the only case where we can race to set a state = is during the completion phase, for WAIT_COMPLETE and COMPLETE, and those a= re protected. For other states, there are set sequentially on each node by = construction. bq. Why is init broken out from construction? SRF takes his StreamSession in his ctor so we can't have StreamSessions tak= e their SRF without some other refactor. Now don't get me wrong, it bugs m= e too. And I do think there is a few things we can do to make that whole ne= w streaming API simpler/cleaner (including probably renaming StreamRepairFu= ture). And I'm volonteering to give to a shot to that. But in another follo= w ticket because: # I really think that kind of code cleanup shouldn't block beta1 (but this = ticket, the fact that streaming deadlock, do is a blocker) # I've already rewrote quite a bit of Yuki's code without him having the ch= ange to chime in. Would feel fair to wait to him to be back before refactor= ing parts that are not really crucial for beta1. bq. Why do we include a String description in SIM? This the description of what the operation is doing ("Repair", "Bootstrap",= "Restore replica count", ...) bq. When does follower immediately have something to stream to leader A StreamSession in the initiator handle both incoming and outgoing streams.= Sometimes we only have outgoing ones (Unbootstrap/Restore replica count/Bu= lkloading), sometimes only incoming ones (Bootstrap) and sometimes both (Repair indeed but also the SS= .RangeRelocator for moves and vnodes tokens relocation). Does that answer y= our question? =20 > Streaming (2.0) can deadlock > ---------------------------- > > Key: CASSANDRA-5699 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5699 > Project: Cassandra > Issue Type: Bug > Reporter: Sylvain Lebresne > Assignee: Sylvain Lebresne > Fix For: 2.0 beta 1 > > Attachments: 5699.txt > > > The new streaming implementation (CASSANDRA-5286) creates 2 threads per h= ost for streaming, one for the incoming stream and one for the outgoing one= . However, both currently share the same socket, but since we use synchrono= us I/O, a read can block a write, which can result in a deadlock if 2 nodes= are both blocking on a read a the same time, thus blocking their respectiv= e writes (this is actually fairly easy to reproduce with a simple repair). > So instead attaching a patch that uses one socket per thread. > The patch also correct the stream throughput throttling calculation that = was 8000 times lower than what it should be. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrato= rs For more information on JIRA, see: http://www.atlassian.com/software/jira