ignite-issues mailing list archives

From "Alexey Goncharuk (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (IGNITE-10935) "Invalid node order" error occurs while cycle cluster nodes restart
Date Wed, 30 Jan 2019 08:44:00 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-10935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755861#comment-16755861 ]

Alexey Goncharuk commented on IGNITE-10935:
-------------------------------------------

Several issues were discovered and fixed in the attached PR:
1) Pending messages were incorrectly initialized during processing of NodeAddedMessage. A non-null
discardId caused the SkipIterator to skip all pending messages immediately after join.
2) The collection of failed nodes was not set on pending messages, causing the new coordinator to
skip correct NodeAddedMessage processing.
3) A node could skip processing of the second NodeAddedMessage if the local node order was greater
than the order in the received message.
4) HandshakeRequest did not check which node was responding to the request, and the receiving
node did not check the previous node ID.
5) When a node decided to segment itself in the CONNECTING state, it failed to do so, leaving a
zombie node in the ring.
6) Promotion of the local node to the first coordinator was done in a non-thread-safe way
with regard to the ring message worker.
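
To illustrate item 1, here is a minimal sketch of how a skip-until-ID iteration can end up dropping every pending message. It is not the Ignite implementation: the class PendingMessagesSketch, the method deliverable and the string IDs are hypothetical, and the sketch assumes that the iterator skips everything up to and including discardId, and that the discardId left over from initialization did not match any queued message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PendingMessagesSketch {
    /**
     * Returns the message IDs that a skip-until-discardId iteration would deliver.
     * Hypothetical helper, assumed semantics: when discardId is non-null, every
     * message up to and including the one with that ID is skipped.
     */
    static List<String> deliverable(List<String> pendingIds, String discardId) {
        List<String> out = new ArrayList<>();
        boolean skip = discardId != null;

        for (String id : pendingIds) {
            if (skip) {
                // Stop skipping once the discarded message has been passed.
                if (id.equals(discardId))
                    skip = false;

                continue;
            }

            out.add(id);
        }

        return out;
    }

    public static void main(String[] args) {
        List<String> pending = Arrays.asList("m1", "m2", "m3");

        // No discardId: nothing is skipped.
        System.out.println(deliverable(pending, null));    // [m1, m2, m3]

        // discardId present in the queue: only later messages are delivered.
        System.out.println(deliverable(pending, "m1"));    // [m2, m3]

        // discardId absent from the queue: the skip flag never clears.
        System.out.println(deliverable(pending, "stale")); // []
    }
}
```

Under these assumptions, the last call shows the failure mode: a non-null discardId that never matches leaves the skip flag set, so the joining node sees an empty set of pending messages.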

> "Invalid node order" error occurs while cycle cluster nodes restart
> -------------------------------------------------------------------
>
>                 Key: IGNITE-10935
>                 URL: https://issues.apache.org/jira/browse/IGNITE-10935
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Dmitry Sherstobitov
>            Assignee: Alexey Goncharuk
>            Priority: Critical
>             Fix For: 2.8
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Same scenario as in https://issues.apache.org/jira/browse/IGNITE-10878
> {code:java}
> Exception in thread "tcp-disco-msg-worker-#2" java.lang.AssertionError: Invalid node order: TcpDiscoveryNode [id=9a332aa3-3d60-469a-9ff5-3deee8918451, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 172.17.0.1, 172.25.1.40], sockAddrs=[/172.25.1.40:47501, /0:0:0:0:0:0:0:1%lo:47501, /127.0.0.1:47501, /172.17.0.1:47501], discPort=47501, order=0, intOrder=16, lastExchangeTime=1547486771047, loc=false, ver=2.4.13#20190114-sha1:a7667ae6, isClient=false]
> at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing$1.apply(TcpDiscoveryNodesRing.java:51)
> at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing$1.apply(TcpDiscoveryNodesRing.java:48)
> at org.apache.ignite.internal.util.lang.GridFunc.isAll(GridFunc.java:2030)
> at org.apache.ignite.internal.util.IgniteUtils.arrayList(IgniteUtils.java:9635)
> at org.apache.ignite.internal.util.IgniteUtils.arrayList(IgniteUtils.java:9608)
> at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing.nodes(TcpDiscoveryNodesRing.java:625)
> at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing.visibleNodes(TcpDiscoveryNodesRing.java:145)
> at org.apache.ignite.spi.discovery.tcp.ServerImpl.notifyDiscovery(ServerImpl.java:1429)
> at org.apache.ignite.spi.discovery.tcp.ServerImpl.access$2400(ServerImpl.java:176)
> at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processNodeAddFinishedMessage(ServerImpl.java:4565)
> at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2732)
> at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2554)
> at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(ServerImpl.java:6955)
> at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2634)
> at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
