ignite-issues mailing list archives

From "Amelchev Nikita (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (IGNITE-11460) MVCC: Possible race on coordinator changing on client reconnection.
Date Thu, 28 Mar 2019 10:04:00 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-11460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16803774#comment-16803774 ]

Amelchev Nikita edited comment on IGNITE-11460 at 3/28/19 10:03 AM:
--------------------------------------------------------------------

Hi, [~amashenkov]

I have investigated the issue once more and suggest the following:

1. Fix the current incorrect behavior in the case where the coordinator was set on disconnect
and the listener continues to process events. To do so, we need to check the {{ctx.clientDisconnected()}}
flag and skip overriding the disconnected coordinator. I added extra synchronization for
the case where the coordinator could be overridden in the window between checking this flag and
setting the new coordinator, because these two steps run in different threads.

2. Fix the discovery logic for the case where earlier cluster events can be processed after the
{{onLocalJoin}} method has been called. I have filed IGNITE-11624 for this case.
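
A minimal sketch of the guard described in item 1, not the actual PR code. The {{Coordinator}} and {{CoordinatorHolder}} classes, the {{mux}} lock, and the local {{clientDisconnected}} field are hypothetical stand-ins (in Ignite the flag comes from {{ctx.clientDisconnected()}}). The point is only that the flag check and the coordinator assignment happen under one lock, so a delayed discovery event cannot override the coordinator set on disconnect:

```java
/** Hypothetical stand-in for an MVCC coordinator descriptor (not Ignite's actual class). */
final class Coordinator {
    /** Marker installed while the client is disconnected. */
    static final Coordinator DISCONNECTED = new Coordinator(-1L);

    /** Simplified topology version. */
    final long topVer;

    Coordinator(long topVer) {
        this.topVer = topVer;
    }
}

/** Hypothetical holder showing the check-then-set guard; all names are illustrative. */
final class CoordinatorHolder {
    /** Lock shared by the notifier thread and the discovery event thread. */
    private final Object mux = new Object();

    /** Assumed local copy of the disconnect flag, guarded by mux. */
    private boolean clientDisconnected;

    /** Current coordinator, guarded by mux. */
    private Coordinator curCrd = new Coordinator(0L);

    /** Called from the notifier thread when the client disconnects. */
    void onDisconnected() {
        synchronized (mux) {
            clientDisconnected = true;
            curCrd = Coordinator.DISCONNECTED;
        }
    }

    /**
     * Called from the discovery event thread.
     * Returns false when the update is skipped because the client is disconnected.
     */
    boolean onCoordinatorChanged(Coordinator newCrd) {
        synchronized (mux) {
            // The check and the assignment are atomic with respect to onDisconnected().
            if (clientDisconnected)
                return false;

            curCrd = newCrd;

            return true;
        }
    }

    /** Called once the client has rejoined the cluster. */
    void onReconnected(Coordinator newCrd) {
        synchronized (mux) {
            clientDisconnected = false;
            curCrd = newCrd;
        }
    }

    Coordinator current() {
        synchronized (mux) {
            return curCrd;
        }
    }
}
```

Without the shared lock, the event thread could read the flag as false, the notifier thread could then install the disconnected coordinator, and the event thread would overwrite it with a stale one.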

I have fixed the PR and tested it.
Could you take a look, please?


was (Author: nsamelchev):
Hi, [~amashenkov]

I have investigated the issue once more and suggest the following:

1. Fix the current incorrect behavior in the case where the coordinator was set on disconnect
and the listener continues to process events. To do so, we need to check the {{ctx.clientDisconnected()}}
flag and skip overriding the disconnected coordinator. I added extra synchronization for
the case where the coordinator could be overridden in the window between checking this flag and
setting the new coordinator, because these two steps run in different threads.

2. Fix the discovery logic for the case where earlier cluster events can be processed after the
{{onLocalJoin}} method has been called. I have filed the issue for this case.

I have fixed the PR and tested it.
Could you take a look, please?

> MVCC: Possible race on coordinator changing on client reconnection.
> -------------------------------------------------------------------
>
>                 Key: IGNITE-11460
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11460
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Amelchev Nikita
>            Assignee: Amelchev Nikita
>            Priority: Major
>              Labels: MakeTeamcityGreenAgain
>             Fix For: 2.8
>
>         Attachments: stacktraces.log
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> I found that the wrong coordinator can be set when a client reconnects:
> {noformat}
> assert newCrd.topologyVersion().compareTo(curCrd.topologyVersion()) > 0;
> java.lang.AssertionError
>     at org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl.onCoordinatorChanged(MvccProcessorImpl.java:541)
>     at org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl.onLocalJoin(MvccProcessorImpl.java:416)
>     at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery0(GridDiscoveryManager.java:851)
>     at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.lambda$onDiscovery$0(GridDiscoveryManager.java:601)
>     at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body0(GridDiscoveryManager.java:2681)
>     at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body(GridDiscoveryManager.java:2719)
>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>     at java.lang.Thread.run(Thread.java:748)
> {noformat}
> I have attached a reproducer in the PR.
> The main reason is that the coordinator can be changed from the discovery event thread after the
> client has already disconnected (the disconnection is processed in the notifier thread, which
> changes the coordinator in the onDisconnected method).
> The coordinator can be changed in two places:
> 1. notifier disco thread: onDisconnected method
> 2. event disco thread: onDiscovery listener.
> Events can be processed with some delay and override the coordinator that was set in the notifier
> thread.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
