ignite-issues mailing list archives

From "Stanislav Lukyanov (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (IGNITE-7753) Processors are incorrectly initialized if a node joins during cluster activation
Date Mon, 19 Feb 2018 14:45:00 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-7753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16369189#comment-16369189 ]

Stanislav Lukyanov edited comment on IGNITE-7753 at 2/19/18 2:44 PM:
---------------------------------------------------------------------

The bug is caused by a coding error in GridClusterStateProcessor.onStateFinishMessage:
the future that holds the activation result is always completed with false
(joinFut.onDone(false)), regardless of the actual outcome.
The patch below fixes the problem:

--- modules/core/src/main/java/org/apache/ignite/internal/processors/cluster/GridClusterStateProcessor.java (revision 1a6e54489d58ceb50521523c00383b13d6e3bd8b)
+++ modules/core/src/main/java/org/apache/ignite/internal/processors/cluster/GridClusterStateProcessor.java (date 1519047408803)
@@ -389,7 +389,7 @@
             TransitionOnJoinWaitFuture joinFut = this.joinFut;
 
             if (joinFut != null)
-                joinFut.onDone(false);
+                joinFut.onDone(msg.clusterActive());
 
             GridFutureAdapter<Void> transitionFut = transitionFuts.remove(state.transitionRequestId());
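
To illustrate the effect of the one-line change, here is a minimal, self-contained sketch.
It does not use Ignite's classes: JoinFutureSketch and both of its methods are hypothetical
names, and a plain java.util.concurrent.CompletableFuture stands in for
TransitionOnJoinWaitFuture. It only shows how hard-coding the result hides the actual
outcome from whoever waits on the future.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for the joining node's wait future; these are not Ignite's classes.
class JoinFutureSketch {
    /** Models the handler before the patch: the result is hard-coded to false. */
    static boolean finishBeforePatch(boolean clusterActive) {
        CompletableFuture<Boolean> joinFut = new CompletableFuture<>();

        joinFut.complete(false); // The actual outcome (clusterActive) is ignored.

        return joinFut.join();
    }

    /** Models the handler after the patch: the result carries the actual cluster state. */
    static boolean finishAfterPatch(boolean clusterActive) {
        CompletableFuture<Boolean> joinFut = new CompletableFuture<>();

        joinFut.complete(clusterActive);

        return joinFut.join();
    }

    public static void main(String[] args) {
        // The cluster was activated successfully in both cases.
        System.out.println("before patch: " + finishBeforePatch(true));
        System.out.println("after patch: " + finishAfterPatch(true));
    }
}
```

With a successful activation, the waiter in the first variant still sees false, so it has no
way to learn that the cluster became active.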
 




> Processors are incorrectly initialized if a node joins during cluster activation
> --------------------------------------------------------------------------------
>
>                 Key: IGNITE-7753
>                 URL: https://issues.apache.org/jira/browse/IGNITE-7753
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.3, 2.4, 2.5
>            Reporter: Stanislav Lukyanov
>            Assignee: Stanislav Lukyanov
>            Priority: Major
>
> If a node joins during the cluster activation process (while the related exchange operation
> is in progress), some of that node's GridProcessor instances will be incorrectly initialized.
> While GridClusterStateProcessor will correctly report the active cluster state, other
> processors that are sensitive to the cluster state, e.g. GridServiceProcessor, will not be
> initialized.
> A reproducer is below.
> =======================
> Ignite server = IgnitionEx.start("examples/config/persistentstore/example-persistent-store.xml", "server");
> CyclicBarrier barrier = new CyclicBarrier(2);
> // Activate the cluster from a separate thread so that activation overlaps with the client join.
> Thread activationThread = new Thread(() -> {
>     try {
>         barrier.await();
>         server.active(true);
>     }
>     catch (Exception e) {
>         e.printStackTrace();
>     }
> });
> activationThread.start();
> // Release the activation thread and start the client at roughly the same time.
> barrier.await();
> IgnitionEx.setClientMode(true);
> Ignite client = IgnitionEx.start("examples/config/persistentstore/example-persistent-store.xml", "client");
> activationThread.join();
> // Hangs here if the client joined in the middle of the activation exchange.
> client.services().deployClusterSingleton("myClusterSingleton", new SimpleMapServiceImpl<>());
> =======================
> Here a single server node is started; then a client node is started while the cluster is
> being activated, and the client then attempts to deploy a service. As a result, the thread
> calling the deploy method hangs forever with a stack trace like this:
> =======================
> "main@1" prio=5 tid=0x1 nid=NA waiting
>   java.lang.Thread.State: WAITING
> 	  at sun.misc.Unsafe.park(Unsafe.java:-1)
> 	  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> 	  at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> 	  at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
> 	  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
> 	  at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
> 	  at org.apache.ignite.internal.util.IgniteUtils.awaitQuiet(IgniteUtils.java:7505)
> 	  at org.apache.ignite.internal.processors.service.GridServiceProcessor.serviceCache(GridServiceProcessor.java:290)
> 	  at org.apache.ignite.internal.processors.service.GridServiceProcessor.writeServiceToCache(GridServiceProcessor.java:728)
> 	  at org.apache.ignite.internal.processors.service.GridServiceProcessor.deployAll(GridServiceProcessor.java:634)
> 	  at org.apache.ignite.internal.processors.service.GridServiceProcessor.deployAll(GridServiceProcessor.java:600)
> 	  at org.apache.ignite.internal.processors.service.GridServiceProcessor.deployMultiple(GridServiceProcessor.java:488)
> 	  at org.apache.ignite.internal.processors.service.GridServiceProcessor.deployClusterSingleton(GridServiceProcessor.java:469)
> 	  at org.apache.ignite.internal.IgniteServicesImpl.deployClusterSingleton(IgniteServicesImpl.java:120)
> =======================
> The behavior depends on timing: the client has to join in the middle of the activation's
> exchange process. Putting Thread.sleep(4000) into GridDhtPartitionsExchangeFuture.onClusterStateChangeRequest
> seems to be enough to trigger it on a development laptop.
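
The stack trace in the description shows the deploying thread parked in CountDownLatch.await.
The sketch below models that kind of hang in isolation. It assumes, as a simplification, that
the latch is released only when the processor is told the cluster is active. LatchHangSketch,
onActivate, deploy and joinAndDeploy are hypothetical names, not Ignite's API, and a bounded
wait is used so that the example terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical model of a state-sensitive processor; names are illustrative, not Ignite's.
class LatchHangSketch {
    /** Released once the processor has been initialized for an active cluster. */
    private final CountDownLatch startLatch = new CountDownLatch(1);

    /** Assumed to be called only if the node believes the cluster is active. */
    void onActivate() {
        startLatch.countDown();
    }

    /** Waits for initialization; returns false if the latch was not released in time. */
    boolean deploy(long timeoutMs) {
        try {
            return startLatch.await(timeoutMs, TimeUnit.MILLISECONDS);
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();

            return false;
        }
    }

    /** Joins with the given reported state and then tries to deploy. */
    static boolean joinAndDeploy(boolean reportedActive) {
        LatchHangSketch proc = new LatchHangSketch();

        if (reportedActive)
            proc.onActivate();

        return proc.deploy(200);
    }

    public static void main(String[] args) {
        System.out.println("reported active: " + joinAndDeploy(true));
        System.out.println("reported inactive: " + joinAndDeploy(false));
    }
}
```

With an unbounded await, as in the trace, the second case would never return.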



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
