cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-8343) Secondary index creation causes moves/bootstraps to fail
Date Wed, 17 Feb 2016 11:10:18 GMT


Sylvain Lebresne commented on CASSANDRA-8343:

I'm not super familiar with how the streaming protocol number is used. If there is some protocol
version negotiation between nodes that makes it possible to bump the number without breaking
any backward compatibility, then that would be fine for trunk (well, assuming we do carefully
test this). Otherwise, we'd have to wait for 4.0.

bq. we've had this problem since forever \[...\] there is the workaround of increasing {{streaming_socket_timeout}}

I agree that this probably means it's not worth making overly risky changes for this before trunk.
But really, it feels to me that the main problem is how the code handles this kind of problem.
Assuming we probably surface the timeout on the sending side, there is no reason not to properly
close the session and move on on the receiving side when this happens (we could still log an
error or warning on that receiving side explaining what happened, and noting that if the sender timed out,
the user may want to increase {{streaming_socket_timeout}}). We can also document in the yaml that
{{streaming_socket_timeout}} should be high enough to let secondary indexes/MVs be built.

IMO, if we handle the case better (by not breaking anything but logging enough info that the
user understands what happened and that this is really not a big deal), it's fine if we only
fix it properly in 4.0 (we do need to have a better solution eventually of course).
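
To illustrate the receiving-side handling described above, here is a minimal Java sketch. It is not Cassandra's streaming code: {{Channel}}, {{Session}}, {{finish()}} and the warning text are hypothetical names made up for this example. It only shows the shape of the idea: if the write back to the sender fails because the sender already timed out, log a warning that points at {{streaming_socket_timeout}} and still close the session.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ReceiverCloseSketch {
    /** Hypothetical stand-in for the connection back to the sending node. */
    interface Channel {
        void sendComplete() throws IOException;
    }

    /** Hypothetical, simplified session; not Cassandra's real session class. */
    static class Session {
        final List<String> log = new ArrayList<>();
        boolean closed = false;

        void close() {
            closed = true;
        }

        /** Called once all files are received and index builds are done. */
        void finish(Channel channel) {
            try {
                channel.sendComplete();
            } catch (IOException e) {
                // The sender most likely timed out and closed its socket.
                log.add("WARN: could not notify sender (" + e.getMessage()
                        + "); if the sender timed out, consider increasing"
                        + " streaming_socket_timeout");
            } finally {
                // Always close, so nothing waits on this session forever.
                close();
            }
        }
    }

    public static void main(String[] args) {
        Session session = new Session();
        // Simulate a socket that the sender has already closed.
        session.finish(() -> { throw new IOException("socket closed by peer"); });
        System.out.println("closed=" + session.closed + ", warnings=" + session.log.size());
    }
}
```

The point of the {{finally}} block is that closing the session does not depend on the final write succeeding, so a sender-side timeout turns into a logged warning rather than a stuck operation.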

> Secondary index creation causes moves/bootstraps to fail
> --------------------------------------------------------
>                 Key: CASSANDRA-8343
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Michael Frisch
>            Assignee: Paulo Motta
> Node moves/bootstraps are failing if the stream timeout is set to a value in which secondary
> index creation cannot complete.  This happens because at the end of the very last stream the
> StreamInSession.closeIfFinished() function calls maybeBuildSecondaryIndexes on every column
> family.  If the stream time + all CF's index creation takes longer than your stream timeout
> then the socket closes from the sender's side, the receiver of the stream tries to write to
> said socket because it's not null, an IOException is thrown but not caught in closeIfFinished(),
> the exception is caught somewhere and not logged, AbstractStreamSession.close() is never called,
> and the CountDownLatch is never decremented.  This causes the move/bootstrap to continue forever
> until the node is restarted.
> This problem of stream time + secondary index creation time exists on decommissioning/unbootstrap
> as well but since it's on the sending side the timeout triggers the onFailure() callback which
> does decrement the CountDownLatch leading to completion.
> A cursory glance at the 2.0 code leads me to believe this problem would exist there as well.
> Temporary workaround: set a really high/infinite stream timeout.
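
The hang described in the report can be reproduced in isolation with a small Java sketch. This is an illustration under stated assumptions, not the actual Cassandra code: {{writeToClosedSocket()}}, {{finishWithoutCleanup()}}, {{finishWithCleanup()}} and {{completes()}} are hypothetical names, and the 100 ms wait stands in for the move/bootstrap waiting on its latch. The first variant skips {{countDown()}} when the write throws, so the waiter never completes; the second counts down in a {{finally}} block.

```java
import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LatchHangSketch {
    /** Hypothetical stand-in for writing to a socket the sender already closed. */
    static void writeToClosedSocket() throws IOException {
        throw new IOException("socket closed by sender");
    }

    /** Mirrors the reported behaviour: the exception skips countDown(). */
    static void finishWithoutCleanup(CountDownLatch latch) throws IOException {
        writeToClosedSocket();
        latch.countDown(); // never reached when the write fails
    }

    /** Counts down even when the write fails. */
    static void finishWithCleanup(CountDownLatch latch) throws IOException {
        try {
            writeToClosedSocket();
        } finally {
            latch.countDown();
        }
    }

    /** Returns true if the latch was released, false if the waiter gave up. */
    static boolean completes(boolean withCleanup) {
        CountDownLatch latch = new CountDownLatch(1);
        Thread streamThread = new Thread(() -> {
            try {
                if (withCleanup) {
                    finishWithCleanup(latch);
                } else {
                    finishWithoutCleanup(latch);
                }
            } catch (IOException swallowed) {
                // Caught and not logged, as in the report.
            }
        });
        streamThread.start();
        try {
            streamThread.join();
            // A real move/bootstrap waits without a timeout; the sketch waits briefly.
            return latch.await(100, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("without cleanup completes: " + completes(false)); // false
        System.out.println("with cleanup completes: " + completes(true));     // true
    }
}
```

With an untimed {{await()}}, the first case would block until the process is restarted, which matches the symptom reported for moves and bootstraps.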

This message was sent by Atlassian JIRA
