ignite-issues mailing list archives

From "Yakov Zhdanov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (IGNITE-4501) Improvement of connection in a cluster of new node
Date Wed, 05 Apr 2017 22:45:41 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15957944#comment-15957944 ]

Yakov Zhdanov commented on IGNITE-4501:
---------------------------------------

Alexander, 

I checked out your changes to finalize and commit them, but discovered this failure in

org.apache.ignite.spi.discovery.tcp.TcpDiscoverySelfTest#testFailedNodes4

{noformat}
[02:41:39,056][ERROR][tcp-disco-msg-worker-#1169%tcp.TcpDiscoverySelfTest2%][TcpDiscoverySelfTest$TestFailedNodesSpi]
TcpDiscoverSpi's message worker thread failed abnormally. Stopping the node in order to prevent
cluster wide instability.
java.lang.AssertionError: Duplicate order [this=TcpDiscoveryNode [id=4215172c-d71b-4fd1-8baf-73ad97600002,
addrs=[127.0.0.1], sockAddrs=[/127.0.0.1:47502], discPort=47502, order=1, intOrder=1, lastExchangeTime=1491432076544,
loc=true, ver=2.0.0#19700101-sha1:00000000, clusterRegionId=-9223372036854775808, isClient=false],
other=TcpDiscoveryNode [id=7063b039-493e-4aad-9036-30c962100000, addrs=[127.0.0.1], sockAddrs=[/127.0.0.1:47500],
discPort=47500, order=1, intOrder=1, lastExchangeTime=1491432076533, loc=false, ver=2.0.0#19700101-sha1:00000000,
clusterRegionId=-9223372036854775808, isClient=false]]
	at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNode.compareTo(TcpDiscoveryNode.java:563)
	at org.apache.ignite.spi.discovery.tcp.internal.RegionNodeComparator.compare(RegionNodeComparator.java:33)
	at org.apache.ignite.spi.discovery.tcp.internal.RegionNodeComparator.compare(RegionNodeComparator.java:26)
	at java.util.TreeMap.compare(TreeMap.java:1291)
	at java.util.TreeMap.getHigherEntry(TreeMap.java:463)
	at java.util.TreeMap$NavigableSubMap.absLowest(TreeMap.java:1423)
	at java.util.TreeMap$NavigableSubMap$EntrySetView.isEmpty(TreeMap.java:1639)
	at java.util.TreeMap$NavigableSubMap.isEmpty(TreeMap.java:1498)
	at java.util.TreeSet.isEmpty(TreeSet.java:216)
	at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing.serverNodes(TcpDiscoveryNodesRing.java:654)
	at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing.nextNode(TcpDiscoveryNodesRing.java:512)
	at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.sendMessageAcrossRing(ServerImpl.java:2676)
	at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processHeartbeatMessage(ServerImpl.java:4940)
	at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2547)
	at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2349)
	at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(ServerImpl.java:6398)
	at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2435)
	at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
[02:41:39,059][ERROR][tcp-disco-msg-worker-#1169%tcp.TcpDiscoverySelfTest2%][TcpDiscoverySelfTest$TestFailedNodesSpi]
Runtime error caught during grid runnable execution: IgniteSpiThread [name=tcp-disco-msg-worker-#1169%tcp.TcpDiscoverySelfTest2%]
java.lang.AssertionError: Duplicate order [this=TcpDiscoveryNode [id=4215172c-d71b-4fd1-8baf-73ad97600002,
addrs=[127.0.0.1], sockAddrs=[/127.0.0.1:47502], discPort=47502, order=1, intOrder=1, lastExchangeTime=1491432076544,
loc=true, ver=2.0.0#19700101-sha1:00000000, clusterRegionId=-9223372036854775808, isClient=false],
other=TcpDiscoveryNode [id=7063b039-493e-4aad-9036-30c962100000, addrs=[127.0.0.1], sockAddrs=[/127.0.0.1:47500],
discPort=47500, order=1, intOrder=1, lastExchangeTime=1491432076533, loc=false, ver=2.0.0#19700101-sha1:00000000,
clusterRegionId=-9223372036854775808, isClient=false]]
	at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNode.compareTo(TcpDiscoveryNode.java:563)
	at org.apache.ignite.spi.discovery.tcp.internal.RegionNodeComparator.compare(RegionNodeComparator.java:33)
	at org.apache.ignite.spi.discovery.tcp.internal.RegionNodeComparator.compare(RegionNodeComparator.java:26)
	at java.util.TreeMap.compare(TreeMap.java:1291)
	at java.util.TreeMap.getHigherEntry(TreeMap.java:463)
	at java.util.TreeMap$NavigableSubMap.absLowest(TreeMap.java:1423)
	at java.util.TreeMap$NavigableSubMap$EntrySetView.isEmpty(TreeMap.java:1639)
	at java.util.TreeMap$NavigableSubMap.isEmpty(TreeMap.java:1498)
	at java.util.TreeSet.isEmpty(TreeSet.java:216)
	at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing.serverNodes(TcpDiscoveryNodesRing.java:654)
	at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing.nextNode(TcpDiscoveryNodesRing.java:512)
	at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.sendMessageAcrossRing(ServerImpl.java:2676)
	at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processHeartbeatMessage(ServerImpl.java:4940)
	at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2547)
	at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2349)
	at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(ServerImpl.java:6398)
	at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2435)
	at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
[02:41:39,059][WARN ][test-runner-#801%tcp.TcpDiscoverySelfTest%][TcpDiscoverySelfTest2] Grid
startup routine has been interrupted (will rollback).
Exception in thread "tcp-disco-msg-worker-#1169%tcp.TcpDiscoverySelfTest2%" java.lang.AssertionError:
Duplicate order [this=TcpDiscoveryNode [id=4215172c-d71b-4fd1-8baf-73ad97600002, addrs=[127.0.0.1],
sockAddrs=[/127.0.0.1:47502], discPort=47502, order=1, intOrder=1, lastExchangeTime=1491432076544,
loc=true, ver=2.0.0#19700101-sha1:00000000, clusterRegionId=-9223372036854775808, isClient=false],
other=TcpDiscoveryNode [id=7063b039-493e-4aad-9036-30c962100000, addrs=[127.0.0.1], sockAddrs=[/127.0.0.1:47500],
discPort=47500, order=1, intOrder=1, lastExchangeTime=1491432076533, loc=false, ver=2.0.0#19700101-sha1:00000000,
clusterRegionId=-9223372036854775808, isClient=false]]
	at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNode.compareTo(TcpDiscoveryNode.java:563)
	at org.apache.ignite.spi.discovery.tcp.internal.RegionNodeComparator.compare(RegionNodeComparator.java:33)
	at org.apache.ignite.spi.discovery.tcp.internal.RegionNodeComparator.compare(RegionNodeComparator.java:26)
	at java.util.TreeMap.compare(TreeMap.java:1291)
	at java.util.TreeMap.getHigherEntry(TreeMap.java:463)
	at java.util.TreeMap$NavigableSubMap.absLowest(TreeMap.java:1423)
	at java.util.TreeMap$NavigableSubMap$EntrySetView.isEmpty(TreeMap.java:1639)
	at java.util.TreeMap$NavigableSubMap.isEmpty(TreeMap.java:1498)
	at java.util.TreeSet.isEmpty(TreeSet.java:216)
	at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing.serverNodes(TcpDiscoveryNodesRing.java:654)
	at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing.nextNode(TcpDiscoveryNodesRing.java:512)
	at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.sendMessageAcrossRing(ServerImpl.java:2676)
	at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processHeartbeatMessage(ServerImpl.java:4940)
	at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2547)
	at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2349)
[02:41:39,060][INFO ][node-stop-thread][TcpDiscoverySelfTest$TestFailedNodesSpi] Stopped the
node successfully in response to TcpDiscoverySpi's message worker thread abnormal termination.
	at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(ServerImpl.java:6398)
	at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2435)
	at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
{noformat}

This test is also unstable on my machine in master, but there it never throws an assertion
of this kind. Could you please take a look?
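
For reference, the trace shows RegionNodeComparator.compare delegating to TcpDiscoveryNode.compareTo, which raises "Duplicate order" for two different nodes that both report order=1, intOrder=1 and the same clusterRegionId. Below is a minimal sketch of that failure mode. It is not the Ignite code: the Node class, the REGION_THEN_ORDER comparator and the failsOnInsert helper are hypothetical stand-ins, and the fallback logic is an assumption inferred from the stack trace only.

```java
import java.util.Comparator;
import java.util.TreeSet;

final class DuplicateOrderSketch {
    /** Hypothetical stand-in for a discovery node; not the real TcpDiscoveryNode. */
    static final class Node implements Comparable<Node> {
        final String id;
        final long internalOrder;
        final long regionId;

        Node(String id, long internalOrder, long regionId) {
            this.id = id;
            this.internalOrder = internalOrder;
            this.regionId = regionId;
        }

        /** Orders by internal order and rejects two distinct nodes with the same order. */
        @Override public int compareTo(Node other) {
            // Thrown explicitly (instead of using 'assert') so the sketch behaves the same without -ea.
            if (internalOrder == other.internalOrder && !id.equals(other.id))
                throw new AssertionError("Duplicate order [this=" + id + ", other=" + other.id + ']');

            return Long.compare(internalOrder, other.internalOrder);
        }
    }

    /** Assumed shape of a region comparator: region first, then fall back to compareTo. */
    static final Comparator<Node> REGION_THEN_ORDER = (a, b) -> {
        int res = Long.compare(a.regionId, b.regionId);

        return res != 0 ? res : a.compareTo(b);
    };

    /** Returns true if adding the nodes to a sorted set trips the duplicate-order check. */
    static boolean failsOnInsert(Node... nodes) {
        TreeSet<Node> set = new TreeSet<>(REGION_THEN_ORDER);

        try {
            for (Node n : nodes)
                set.add(n);

            return false;
        }
        catch (AssertionError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        long region = Long.MIN_VALUE; // Same clusterRegionId value as in the log above.

        System.out.println(failsOnInsert(new Node("n1", 1, region), new Node("n2", 1, region)));
        System.out.println(failsOnInsert(new Node("n1", 1, region), new Node("n2", 2, region)));
    }
}
```

If the real comparator has this shape, any moment at which two nodes in the ring carry the same internal order would be enough to fail the message worker as soon as the sorted view is touched.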

> Improvement of connection in a cluster of new node
> --------------------------------------------------
>
>                 Key: IGNITE-4501
>                 URL: https://issues.apache.org/jira/browse/IGNITE-4501
>             Project: Ignite
>          Issue Type: Improvement
>          Components: messaging
>    Affects Versions: 1.8
>            Reporter: Vyacheslav Daradur
>            Assignee: Alexander Menshikov
>             Fix For: 2.0
>
>
> h3. Main description:
> Cluster nodes are connected in a ring.
> For example, we have 6 nodes: A, B, C, D, E, F.
> They can form a ring in any order: A-B-C-D-E-F-A, or A-F-B-E-C-D-A, etc.
> If a node leaves the topology, its adjacent nodes must reconnect.
> If nodes A, B, C are in one physical location, nodes D, E, F are in another, and the two locations lose connectivity to each other, there are many ways the ring may have to reconnect.
> In the best case, with the ring A-B-CxD-E-FxA ('x' means disconnect), we have only one reconnection (C connects to A, or F connects to D, depending on which part of the cluster stays alive).
> Conversely, with an interleaved ring AxFxBxExCxDxA, we have many reconnections (A to B, B to C, C to A; in general n/2 reconnections, where n is the number of nodes).
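
The two cases above can be checked with a short sketch. It assumes that a surviving node has to reconnect exactly when its successor in the ring is on the lost side. The class and method names are hypothetical and not part of Ignite.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RingReconnects {
    /** Counts surviving nodes whose successor in the ring is not among the survivors. */
    static int reconnects(List<String> ring, Set<String> alive) {
        int cnt = 0;

        for (int i = 0; i < ring.size(); i++) {
            String node = ring.get(i);
            String next = ring.get((i + 1) % ring.size()); // The ring wraps around.

            if (alive.contains(node) && !alive.contains(next))
                cnt++;
        }

        return cnt;
    }

    public static void main(String[] args) {
        Set<String> alive = new HashSet<>(Arrays.asList("A", "B", "C"));

        // Ring grouped by location: A-B-C-D-E-F-A.
        System.out.println(reconnects(Arrays.asList("A", "B", "C", "D", "E", "F"), alive)); // 1

        // Interleaved ring: A-F-B-E-C-D-A.
        System.out.println(reconnects(Arrays.asList("A", "F", "B", "E", "C", "D"), alive)); // 3
    }
}
```

With A, B, C surviving, the grouped ring needs one reconnection and the interleaved ring needs three, which is n/2 for n = 6.
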
> h3. Approach:
> It is necessary to develop an approach for inserting a node at the correct place, so that the ring topology is built correctly.
> h3. Solutions:
> The main idea is sorting according to latency.
> * Group nodes into arcs by an ARC_ID (manually?).
> * Implement a NodeComparator (nodes on the same host : nodes on the same subnet : other nodes). We will use it when we connect a new node.
> * [dev list thread|http://mail-archives.apache.org/mod_mbox/ignite-dev/201612.mbox/%3CCAN+WSNyWYXSXEBpGErVt72zTgi2pTQzUWLv8JY=Ke83-5-Rh9g@mail.gmail.com%3E]
> Update Dec 29, Yakov Zhdanov:
> # Introduce a CLUSTER_REGION_ID node attribute. This can be done by adding a public static final constant to TcpDiscoverySpi.
> # Alter org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing#nextNode(java.util.Collection<org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNode>) to order nodes based on the per-node attribute value.
> # Node comparison should be stable and consistent. E.g., if CLUSTER_REGION_IDs are equal, then we should compare the nodes' IDs. This way we have a consistent order on all nodes in the topology.
> # Also, nextNode() has to group nodes on the same host and in the same subnet. This can be postponed and implemented after the other points are done.
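
The comparison rule in the third point above (region ID first, node ID as a tie-breaker) could look roughly like the sketch below. It is an illustration under assumptions, not the patch under review: the Node class and the BY_REGION_THEN_ID comparator are hypothetical, and the region ID is assumed to be a long, as the clusterRegionId values in the log suggest.

```java
import java.util.Comparator;
import java.util.TreeSet;
import java.util.UUID;

final class RegionOrderSketch {
    /** Hypothetical stand-in for a discovery node: only the fields the comparison needs. */
    static final class Node {
        final UUID id;
        final long regionId;

        Node(UUID id, long regionId) {
            this.id = id;
            this.regionId = regionId;
        }
    }

    /**
     * Region ID first, node ID as tie-breaker. Two nodes compare as equal only if
     * both the region and the ID match, so every node sorts the ring the same way.
     */
    static final Comparator<Node> BY_REGION_THEN_ID =
        Comparator.<Node>comparingLong(n -> n.regionId).thenComparing(n -> n.id);

    public static void main(String[] args) {
        TreeSet<Node> ring = new TreeSet<>(BY_REGION_THEN_ID);

        ring.add(new Node(new UUID(0, 3), 2L));
        ring.add(new Node(new UUID(0, 2), 1L));
        ring.add(new Node(new UUID(0, 1), 1L));

        // Nodes of region 1 come first, ordered by ID; the region 2 node comes last.
        for (Node n : ring)
            System.out.println(n.regionId + " " + n.id);
    }
}
```

Because the tie-breaker uses the node ID rather than the join order, the result does not depend on when a node entered the topology.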



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
