cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sylvain Lebresne (JIRA)" <>
Subject [jira] [Updated] (CASSANDRA-5699) Streaming (2.0) can deadlock
Date Tue, 02 Jul 2013 09:51:23 GMT


Sylvain Lebresne updated CASSANDRA-5699:

    Attachment: 5699.txt

bq. It's somewhat confusing that we use StreamInit both as leader and as follower, to mean
different things

I guess, though if you see StreamInit as a way for both node to open/initialize their outgoing
connection to the other node (which is what this is doing), it's not really that different.
And it's somewhat consistent with the rest of the protocol, where each side will sent a Prepare
and then finally a Complete.  Anyway, updated the patch with the sentByInitiator renaming.

bq. What is going on with the switch from Set<UUID> ongoingSessions to Map<InetAddress,
StreamSession> ongoingSessions in SRF?

ongoingSession only role initialy was to track how many sessions for a StreamResultFuture
were live to know when we're done (SRF.maybeComplete()). In the original patch of CASSANDRA-5286,
it wasn't even there, and there was just an AtomicInteger decremented each time a session
completed. I was slightly afraid that a race could have us counting the same session done
twice, so I changed it to a Set<UUID> and added a simple UUID per session that was only
used for that purpose (it was never send to the other side or anything). Anyway, that UUID
addition was stupid in the first place since for a SRF, there is only one StreamSession per
remote host, so it should have been a Set<InetAddress> to start with and we have no
use for a StreamSession UUID.

But on top of that, this patch add the need to be able to find back a StreamSession in a SRF
given the remote endpoint (in IncomingStreamConnection, when the initiator gets the other
side StreamInit), so we really need a map now.

bq. Suggest renaming, e.g. onSessionEstablished

Agreed, though I've renamed to onInitializationComplete to be consistent with the wording
of the javadoc.

bq. Somewhat confused by logic in complete() – "I received a Complete message. If I'm already
waiting for a complete message, close the session. Otherwise, wait for [another] Complete
message" ?

Each side will send a CompleteMessage to say "I'm done on my side", and the session will be
considered really complete (and closed) when we got the Complete for both side. So WAIT_COMPLETE
means "I've seen only one complete message so far". Hence the logic of complete() is "I just
received a complete from the other side, if I had already completed my side, close the session,
otherwise move to the "I'm waiting for the other complete" state".

bq. It looks like there may be synchronization issues with "state;"

I think we're good, because the only case where we can race to set a state is during the completion
phase, for WAIT_COMPLETE and COMPLETE, and those are protected. For other states, there are
set sequentially on each node by construction.

bq. Why is init broken out from construction?

SRF takes his StreamSession in his ctor so we can't have StreamSessions take their SRF without
some other refactor.  Now don't get me wrong, it bugs me too. And I do think there is a few
things we can do to make that whole new streaming API simpler/cleaner (including probably
renaming StreamRepairFuture). And I'm volonteering to give to a shot to that. But in another
follow ticket because:
# I really think that kind of code cleanup shouldn't block beta1 (but this ticket, the fact
that streaming deadlock, do is a blocker)
# I've already rewrote quite a bit of Yuki's code without him having the change to chime in.
Would feel fair to wait to him to be back before refactoring parts that are not really crucial
for beta1.

bq. Why do we include a String description in SIM?

This the description of what the operation is doing ("Repair", "Bootstrap", "Restore replica
count", ...)

bq. When does follower immediately have something to stream to leader

A StreamSession in the initiator handle both incoming and outgoing streams. Sometimes we only
have outgoing ones (Unbootstrap/Restore replica count/Bulkloading), sometimes only
incoming ones (Bootstrap) and sometimes both (Repair indeed but also the SS.RangeRelocator
for moves and vnodes tokens relocation). Does that answer your question?

> Streaming (2.0) can deadlock
> ----------------------------
>                 Key: CASSANDRA-5699
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>             Fix For: 2.0 beta 1
>         Attachments: 5699.txt
> The new streaming implementation (CASSANDRA-5286) creates 2 threads per host for streaming,
one for the incoming stream and one for the outgoing one. However, both currently share the
same socket, but since we use synchronous I/O, a read can block a write, which can result
in a deadlock if 2 nodes are both blocking on a read a the same time, thus blocking their
respective writes (this is actually fairly easy to reproduce with a simple repair).
> So instead attaching a patch that uses one socket per thread.
> The patch also correct the stream throughput throttling calculation that was 8000 times
lower than what it should be.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see:

View raw message