cassandra-commits mailing list archives

From "Paulo Motta (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-10674) Materialized View SSTable streaming/leaving status race on decommission
Date Wed, 25 Nov 2015 23:27:11 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15027795#comment-15027795 ]

Paulo Motta commented on CASSANDRA-10674:
-----------------------------------------

I agree with [~tjake] that the simplest thing to do here is to force the mutation into the
local batchlog when the node is not a base replica of the mutation, and to log a warning if there
are no pending ranges (since they may still be being calculated or may not yet have fully
propagated via gossip). I implemented a patch based on this approach:

||3.0||trunk||
|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:3.0-10674]|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-10674]|
|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-10674-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-10674-testall/lastCompletedBuild/testReport/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-10674-dtest/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-10674-dtest/lastCompletedBuild/testReport/]|
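
For readers who do not want to open the branches, the decision described above can be sketched roughly as follows. This is only an illustration of the logic as stated in this comment, not the code from the patch: the class, enum, and method names are hypothetical, and nodes are represented as plain strings.

```java
import java.util.Set;

// Hypothetical sketch; none of these names are taken from the Cassandra code base.
final class ViewWritePathSketch {
    enum Path {
        USUAL_PAIRED_PATH, // the normal base/view replica pairing (not modeled here)
        LOCAL_BATCHLOG     // mutation is forced into the local batchlog
    }

    static final class Decision {
        final Path path;
        final String warning; // null when nothing needs to be logged

        Decision(Path path, String warning) {
            this.path = path;
            this.warning = warning;
        }
    }

    static Decision decide(String localNode, Set<String> baseReplicas, Set<String> pendingEndpoints) {
        if (baseReplicas.contains(localNode)) {
            // The node is a base replica of the mutation: nothing special to do.
            return new Decision(Path.USUAL_PAIRED_PATH, null);
        }
        // Not a base replica: always go through the local batchlog. Warn if no
        // pending ranges are known yet, since they may still be being calculated
        // or may not have propagated via gossip.
        String warning = pendingEndpoints.isEmpty()
                ? "Received view mutation while not a base replica and no pending ranges are known"
                : null;
        return new Decision(Path.LOCAL_BATCHLOG, warning);
    }
}
```

The point of the sketch is that the missing leaving status no longer causes an exception: the mutation is kept in the local batchlog, and the warning makes the race visible in the logs.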

[~jkni] could you verify the jepsen tests with this approach and check if the warning is being
printed?

bq. Second and more importantly we should probably add an acknowledgement to the streaming
operation that it was processed by the receiver correctly. 

It seems the stream receive task (and thus the stream session) is only completed on [2.1|https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/streaming/StreamReceiveTask.java#L175]
and [2.2|https://github.com/apache/cassandra/blob/cassandra-2.2/src/java/org/apache/cassandra/streaming/StreamReceiveTask.java#L171]
after the files are processed (otherwise it just hangs), but on [3.0|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/streaming/StreamReceiveTask.java#L231]
it is always completed even if there was a failure, which seems more critical. In any case,
we should probably fail the stream session if there is a problem while processing the received
data. I created CASSANDRA-10774 to investigate and address that.
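
As a rough illustration of the behaviour suggested here (not the actual StreamReceiveTask code, and not what CASSANDRA-10774 will necessarily do), the receiving side would report completion only when every received file was processed, and report failure otherwise. All names below are hypothetical, and files are represented as plain strings.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch; this does not mirror the real streaming classes.
final class ReceiveTaskSketch {
    enum State { COMPLETED, FAILED }

    // Applies each received file; any processing error fails the task
    // instead of letting it be reported as completed.
    static State processReceived(List<String> files, Consumer<String> applyFile) {
        try {
            for (String file : files) {
                applyFile.accept(file);
            }
            return State.COMPLETED;
        } catch (RuntimeException e) {
            // In a real implementation the session would be failed here so
            // that the sender learns the data was not applied.
            return State.FAILED;
        }
    }
}
```

With this shape, the sender never sees a successful session for data that the receiver did not manage to apply.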

> Materialized View SSTable streaming/leaving status race on decommission
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-10674
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10674
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Coordination, Distributed Metadata
>            Reporter: Joel Knighton
>            Assignee: Paulo Motta
>             Fix For: 3.0.1, 3.1
>
>         Attachments: leaving-node-debug.log, receiving-node-debug.log
>
>
> On decommission of a node in a cluster with materialized views, it is possible for the
decommissioning node to begin streaming sstables for an MV base table before the receiving
node is aware of the leaving status.
> The materialized view base/view replica pairing checks pending endpoints to handle the
case when an sstable is received from a leaving node; without the leaving message, this check
breaks and an exception is thrown. The streamed sstable is never applied.
> Logs from a decommissioning node and a node receiving such a stream are attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
