[ https://issues.apache.org/jira/browse/IGNITE-10354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16703138#comment-16703138
]
Roman Guseinov commented on IGNITE-10354:
-----------------------------------------
[~ilyak], thank you for the review. I've made the requested changes. Do I need to rerun any tests?
> Failing client node due to not receiving metrics updates
> --------------------------------------------------------
>
> Key: IGNITE-10354
> URL: https://issues.apache.org/jira/browse/IGNITE-10354
> Project: Ignite
> Issue Type: Bug
> Components: clients
> Affects Versions: 2.6
> Reporter: Roman Guseinov
> Assignee: Roman Guseinov
> Priority: Major
> Attachments: ClientDisconnectedTest.java
>
>
> In some cases, after a coordinator change, a server can fail the client node before the client
establishes a connection to another server in the cluster.
> {code:java}
> [2018-11-21 12:21:45,769][WARN ][tcp-disco-msg-worker-#15%server-b%][TestTcpDiscoverySpi]
Failing client node due to not receiving metrics updates from client node within 'IgniteConfiguration.clientFailureDetectionTimeout'
(consider increasing configuration property) [timeout=10000, node=TcpDiscoveryNode [id=dc739711-f685-45e8-9017-1f91b1d86c8c,
addrs=[0:0:0:0:0:0:0:1, 10.0.75.1, 127.0.0.1, 192.168.1.51, 192.168.192.1], sockAddrs=[/0:0:0:0:0:0:0:1:0,
LAPTOP-6FN8RAOS/10.0.75.1:0, /127.0.0.1:0, /192.168.192.1:0, /192.168.1.51:0], discPort=0,
order=2, intOrder=2, lastExchangeTime=1542774105666, loc=false, ver=2.4.0#20180830-sha1:345c0a7c,
isClient=true]]
> [2018-11-21 12:21:45,791][INFO ][tcp-client-disco-msg-worker-#10%client%][TestTcpDiscoverySpi]
Client node disconnected from cluster, will try to reconnect with new id [newId=46812956-2fc4-4b74-9909-d523a547ba0e,
prevId=dc739711-f685-45e8-9017-1f91b1d86c8c, locNode=TcpDiscoveryNode [id=dc739711-f685-45e8-9017-1f91b1d86c8c,
addrs=[0:0:0:0:0:0:0:1, 10.0.75.1, 127.0.0.1, 192.168.1.51, 192.168.192.1], sockAddrs=[/0:0:0:0:0:0:0:1:0,
LAPTOP-6FN8RAOS/10.0.75.1:0, /127.0.0.1:0, /192.168.192.1:0, /192.168.1.51:0], discPort=0,
order=2, intOrder=0, lastExchangeTime=1542774104031, loc=true, ver=2.4.0#20180830-sha1:345c0a7c,
isClient=true]]
> {code}
> It looks like a race condition.
> Steps to reproduce:
> 1. Start server A.
> 2. Start client.
> 3. Start server B.
> 4. Stop server A.
> If Thread.sleep(10000) is added between steps 3 and 4, the client node is not disconnected
from the cluster.
> A reproducer is attached [^ClientDisconnectedTest.java].
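
The reproduction steps above might be sketched roughly as follows. This is not the attached ClientDisconnectedTest.java, only a minimal illustration that assumes ignite-core is on the classpath. The instance names "server-b" and "client" and the 10000 ms timeout are taken from the log above; the name "server-a", the class name, and the local address range are assumptions made for illustration.

```java
import java.util.Collections;

import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;

// Hypothetical class name; a sketch of steps 1-4, not the attached reproducer.
public class ClientDisconnectSketch {
    // Builds a node configuration that discovers peers on localhost only (assumed address range).
    static IgniteConfiguration config(String name, boolean client) {
        TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
        ipFinder.setAddresses(Collections.singletonList("127.0.0.1:47500..47509"));

        TcpDiscoverySpi spi = new TcpDiscoverySpi();
        spi.setIpFinder(ipFinder);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setIgniteInstanceName(name);
        cfg.setClientMode(client);
        // Matches the timeout=10000 reported in the warning above.
        cfg.setClientFailureDetectionTimeout(10_000L);
        cfg.setDiscoverySpi(spi);
        return cfg;
    }

    public static void main(String[] args) throws Exception {
        Ignition.start(config("server-a", false)); // 1. Start server A (name assumed).
        Ignition.start(config("client", true));    // 2. Start client.
        Ignition.start(config("server-b", false)); // 3. Start server B.

        // Per the report, uncommenting this pause avoids the disconnect:
        // Thread.sleep(10_000);

        Ignition.stop("server-a", true);           // 4. Stop server A (the coordinator).

        // Leave time for the warning to show up in the logs, then shut down.
        Thread.sleep(15_000);
        Ignition.stopAll(true);
    }
}
```

Because the issue is described as a race condition, a run of this sketch may or may not produce the "Failing client node" warning.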
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)