cassandra-commits mailing list archives

From "Paulo Motta (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-8343) Secondary index creation causes moves/bootstraps to fail
Date Fri, 12 Feb 2016 15:38:18 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15144718#comment-15144718
] 

Paulo Motta commented on CASSANDRA-8343:
----------------------------------------

Surprisingly enough I didn't manage to reproduce this issue in 2.1 because the {{streaming_socket_timeout}}
parameter was not being enforced due to the use of a {{ReadableByteChannel}} created via {{socket.getChannel()}},
which never times out on reads (see [this article|https://technfun.wordpress.com/2009/01/29/networking-in-java-non-blocking-nio-blocking-nio-and-io/]
for background). The workaround is to create the {{ReadableByteChannel}} via {{Channels.newChannel(socket.getInputStream())}}
instead, so the socket {{SO_TIMEOUT}} is respected.
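
The difference can be illustrated with a minimal sketch. This is not the code from the patch: the class and method names ({{TimeoutAwareChannel}}, {{open}}, {{readTimesOut}}) are hypothetical, and the loopback server exists only to provide a peer that never sends data. It assumes a blocking socket, and it also sets {{SO_TIMEOUT}} explicitly before wrapping the stream.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical helper, for illustration only.
class TimeoutAwareChannel {

    // Wrap the socket's InputStream instead of using socket.getChannel(),
    // so that blocking reads go through the stream and honor SO_TIMEOUT.
    static ReadableByteChannel open(Socket socket, int timeoutMillis) throws IOException {
        socket.setSoTimeout(timeoutMillis);
        return Channels.newChannel(socket.getInputStream());
    }

    // Connects to a loopback peer that never writes and reports whether
    // the read gave up with a SocketTimeoutException.
    static boolean readTimesOut(int timeoutMillis) throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel accepted = server.accept()) {
                ReadableByteChannel in = open(accepted.socket(), timeoutMillis);
                try {
                    in.read(ByteBuffer.allocate(16));
                    return false; // data or EOF arrived, no timeout
                } catch (SocketTimeoutException e) {
                    return true;  // SO_TIMEOUT was enforced
                }
            }
        }
    }
}
```

Reading from {{accepted}} directly (the channel that {{socket.getChannel()}} returns) would block until the peer writes or closes, regardless of the timeout set on the socket.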

Even after this fix, the socket {{SO_TIMEOUT}} was never being set on the receiving side,
so I also set it when attaching the socket on the receiving side.

After the previous fixes, I managed to reproduce this issue on a [bootstrap dtest|https://github.com/pauloricardomg/cassandra-dtest/commit/301e332758b3873d2bb61259343375107caf437b]
by introducing a sleep delay (via a system property) on the {{OnCompletionRunnable}} larger
than {{streaming_socket_timeout}}.

This problem will probably happen more often on 3.0 because of MVs, since they're rebuilt
by the receiving node at the end of the stream session.
I think we should keep finishing the stream session only after the secondary indexes/MVs
are rebuilt, to avoid leaving the node in an inconsistent state in case the rebuild fails after
the session is completed.

The proposed solution is to introduce a {{KeepAlive}} message and send it to the peer
every {{streaming_socket_timeout/2}} after reaching the {{WAIT_COMPLETE}} state,
so that the socket stays active and does not throw a {{SocketTimeoutException}} that
fails the stream session.
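
A rough sketch of the scheduling part, assuming a plain {{ScheduledExecutorService}}: the {{KeepAliveScheduler}} class and the {{sendKeepAlive}} callback are hypothetical names, and how the message is serialized and written to the peer is left out. The actual patch may be structured differently.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, for illustration only.
class KeepAliveScheduler implements AutoCloseable {
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();

    // Half of the socket timeout, with a floor of 1 ms so the period stays valid.
    static long intervalMillis(long socketTimeoutMillis) {
        return Math.max(1, socketTimeoutMillis / 2);
    }

    // Intended to be called once the session reaches WAIT_COMPLETE;
    // sendKeepAlive stands in for writing a KeepAlive message to the peer.
    ScheduledFuture<?> start(long socketTimeoutMillis, Runnable sendKeepAlive) {
        long interval = intervalMillis(socketTimeoutMillis);
        return executor.scheduleAtFixedRate(
                sendKeepAlive, interval, interval, TimeUnit.MILLISECONDS);
    }

    // Stop sending keep-alives when the session completes or fails.
    @Override
    public void close() {
        executor.shutdownNow();
    }
}
```

Sending at half the timeout leaves room for one delayed message before the peer's read would time out.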

I initially created a fix for 2.1 (even though it's near EOL, I think {{streaming_socket_timeout}}
not working is critical enough to be fixed on 2.1), and after review I will create patches for
the other versions.

||2.1||dtest||
|[branch|https://github.com/apache/cassandra/compare/cassandra-2.1...pauloricardomg:2.1-8343]|[branch|https://github.com/riptano/cassandra-dtest/compare/master...pauloricardomg:8343]|
|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.1-8343-testall/lastCompletedBuild/testReport/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.1-8343-dtest/lastCompletedBuild/testReport/]|

> Secondary index creation causes moves/bootstraps to fail
> --------------------------------------------------------
>
>                 Key: CASSANDRA-8343
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8343
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Michael Frisch
>            Assignee: Paulo Motta
>
> Node moves/bootstraps are failing if the stream timeout is set to a value in which secondary
index creation cannot complete.  This happens because at the end of the very last stream the
StreamInSession.closeIfFinished() function calls maybeBuildSecondaryIndexes on every column
family.  If the stream time + all CF's index creation takes longer than your stream timeout
then the socket closes from the sender's side, the receiver of the stream tries to write to
said socket because it's not null, an IOException is thrown but not caught in closeIfFinished(),
the exception is caught somewhere and not logged, AbstractStreamSession.close() is never called,
and the CountDownLatch is never decremented.  This causes the move/bootstrap to continue forever
until the node is restarted.
> This problem of stream time + secondary index creation time exists on decommissioning/unbootstrap
as well but since it's on the sending side the timeout triggers the onFailure() callback which
does decrement the CountDownLatch leading to completion.
> A cursory glance at the 2.0 code leads me to believe this problem would exist there as
well.
> Temporary workaround: set a really high/infinite stream timeout.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
