cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Paulo Motta (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-12905) Retry acquire MV lock on failure instead of throwing WTE on streaming
Date Fri, 09 Dec 2016 13:46:59 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-12905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15735349#comment-15735349
] 

Paulo Motta edited comment on CASSANDRA-12905 at 12/9/16 1:46 PM:
------------------------------------------------------------------

Thanks for your great investigative work and proposed patch. Please find a bellow a summary
of the problems identified to make sure we're on the same page:
1 - Streaming of MVs may never complete due to lock contention throwing WTE during MV mutation
 application (bug).
2 - MV hint delivery may throw WTE on MV lock contention, even if the node is not overloaded
(bug)
3 - Streaming of MVs is slow due to MV mutation application (performance limitation)

Your original patch fixes 1 and 2 by retrying to acquire MV lock on contention instead of
throwing WTE for mutations originating from streaming or hints.

While 3 is not actually a bug but a potential performance limitation, it seems from your results
that the cost of performing bootstrap (and large stream sessions in general) of large base
tables may become prohibitive due to a large amount of mutations going through the write path.

While your proposed approach of performing sstable-based streaming for base and MVs may be
valid in the consistent range movement case like bootstrap or decommission, it is not correct
in all scenarios for repair or inconsistent range movements like replace and removenode, because
it may generate permanent inconsistencies in the views which are only fixable by dropping
and re-creating the view. We may address this by allowing operators to disable mutation-based
streaming for MVs and streaming if they know what they are doing and can deal with the inconsistencies
that may arise with their own ways (for instance, by ensuring writes are append-only or dealing
with it in the application layer), but in the general case I'm afraid we will have to live
with this until we find a better way to deal with it globally, with things like CASSANDRA-9779,
CASSANDRA-9308, providing a custom tool to repair inconsistencies in views, or even redesigning
MV consistency ensuring mechanisms.

With this said, since this is blocking release I propose we fix 1 and 2 on this ticket with
your original patch, and open a new ticket to discuss ways to mitigate and/or fix 3, without
the pressure of having to block release on this and taking the risk of delivering a half-baked
solution.

Overall your original patch (without the sstable-based streaming fix) and approach looks good,
I have the following comments:
- invert the variable from {{dontTimeOut}} to {{droppable}} to facilitate understanding, since
this is the term used for requests that can throw {{WriteTimeoutException}}
- I think we can make hint mutation requests {{deferrable}} and {{non-droppable}} to avoid
blocking the thread on failure to acquire the lock - for this we need to do hint replying
asynchronously on {{HintVerbHandler}} (similar to what is done on {{MutationVerbHandler}}).
The right thing to do would be to make hints droppable similar to remote mutations, but this
would require us to send the hint timestamp down to {{keyspace.mutateMV}}, so if you find
a non-disruptive way to do this than even better, otherwise we can leave it as {{deferrable}}
and {{non-droppable}} for the time being.
- Instead of making {{applyBlocking}} throw {{ExecutionException}}, I think we can handle
this on {{applyBlocking}} itself instead of handling on consumers with {{Throwables.propagate(e.getCause())}}.

Since those are mostly cosmetic changes I will kick-off an initial round of tests with your
initial patch to make sure everything is working as expected:
||3.0||
|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:3.0-12905]|
|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-12905-testall/lastCompletedBuild/testReport/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-12905-dtest/lastCompletedBuild/testReport/]|

Please let me if you need help or will not have time to address those and I will take care
of the changes myself if you agree with those. Thanks again!


was (Author: pauloricardomg):
Thanks for your great investigative work and proposed patch. Please find a bellow a summary
of the problems identified to make sure we're on the same page:
1 - Streaming of MVs may never complete due to lock contention throwing WTE during MV mutation
 application (bug).
2 - MV hint delivery may throw WTE on MV lock contention, even if the node is not overloaded
(bug)
3 - Streaming of MVs is slow due to MV mutation application (performance limitation)

Your original patch fixes 1 and 2 by retrying to acquire MV lock on contention instead of
throwing WTE for mutations originating from streaming or hints.

While 3 is not actually a bug but a potential performance limitation, it seems from your results
that the cost of performing bootstrap (and large stream sessions in general) of large base
tables may become prohibitive due to a large amount of mutations going through the write path.

While your proposed approach of performing sstable-based streaming for base and MVs may be
valid in the consistent range movement case like bootstrap or decommission, it is not correct
in all scenarios for repair or inconsistent range movements like replace and removenode, because
it may generate permanent inconsistencies in the views which are only fixable by dropping
and re-creating the view. We may address this by allowing operators to disable mutation-based
streaming for MVs if they know what they are doing and  can deal with the inconsistencies
that may arise with their own ways (for instance, by ensuring writes are append-only or dealing
with it in the application layer), but in the general case I'm afraid we will have to live
with this until we find a better way to deal with it globally, with things like CASSANDRA-9779,
CASSANDRA-9308, providing a custom tool to repair inconsistencies in views, or even redesigning
MV consistency ensuring mechanisms.

With this said, since this is blocking release I propose we fix 1 and 2 on this ticket with
your original patch, and open a new ticket to discuss ways to mitigate and/or fix 3, without
the pressure of having to block release on this and taking the risk of delivering a half-baked
solution.

Overall your original patch (without the sstable-based streaming fix) and approach looks good,
I have the following comments:
- invert the variable from {{dontTimeOut}} to {{droppable}} to facilitate understanding, since
this is the term used for requests that can throw {{WriteTimeoutException}}
- I think we can make hint mutation requests {{deferrable}} and {{non-droppable}} to avoid
blocking the thread on failure to acquire the lock - for this we need to do hint replying
asynchronously on {{HintVerbHandler}} (similar to what is done on {{MutationVerbHandler}}).
The right thing to do would be to make hints droppable similar to remote mutations, but this
would require us to send the hint timestamp down to {{keyspace.mutateMV}}, so if you find
a non-disruptive way to do this than even better, otherwise we can leave it as {{deferrable}}
and {{non-droppable}} for the time being.
- Instead of making {{applyBlocking}} throw {{ExecutionException}}, I think we can handle
this on {{applyBlocking}} itself instead of handling on consumers with {{Throwables.propagate(e.getCause())}}.

Since those are mostly cosmetic changes I will kick-off an initial round of tests with your
initial patch to make sure everything is working as expected:
||3.0||
|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:3.0-12905]|
|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-12905-testall/lastCompletedBuild/testReport/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-12905-dtest/lastCompletedBuild/testReport/]|

Please let me if you need help or will not have time to address those and I will take care
of the changes myself if you agree with those. Thanks again!

> Retry acquire MV lock on failure instead of throwing WTE on streaming
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-12905
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12905
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Streaming and Messaging
>         Environment: centos 6.7 x86_64
>            Reporter: Nir Zilka
>            Assignee: Benjamin Roth
>            Priority: Critical
>             Fix For: 3.10
>
>
> Hello,
> I performed two upgrades to the current cluster (currently 15 nodes, 1 DC, private VLAN),
> first it was 2.2.5.1 and repair worked flawlessly,
> second upgrade was to 3.0.9 (with upgradesstables) and also repair worked well,
> then i upgraded 2 weeks ago to 3.9 - and the repair problems started.
> there are several errors types from the system.log (different nodes) :
> - Sync failed between /xxx.xxx.xxx.xxx and /xxx.xxx.xxx.xxx
> - Streaming error occurred on session with peer xxx.xxx.xxx.xxx Operation timed out -
received only 0 responses
> - Remote peer xxx.xxx.xxx.xxx failed stream session
> - Session completed with the following error
> org.apache.cassandra.streaming.StreamException: Stream failed
> ----
> i use 3.9 default configuration with the cluster settings adjustments (3 seeds, GossipingPropertyFileSnitch).
> streaming_socket_timeout_in_ms is the default (86400000).
> i'm afraid from consistency problems while i'm not performing repair.
> Any ideas?
> Thanks,
> Nir.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message