ignite-issues mailing list archives

From "Sergey Chugunov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (IGNITE-8633) Node fails to bail out of wrong BLT, instead hanging around indefinitely
Date Wed, 30 May 2018 13:46:00 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-8633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495180#comment-16495180 ]

Sergey Chugunov commented on IGNITE-8633:
-----------------------------------------

Hi [~ilyak], 

I tried to reproduce this behavior with the testing framework and everything worked as expected: both
A and C were rejected when trying to join B because of BaselineTopology inconsistency.
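The rejection corresponds to the rule spelled out in C's error message in the quoted description: a joining node is accepted only if the cluster's BLT branching history contains the joining node's BLT branching point hash. A minimal Java sketch of just that rule, reconstructed from the error text rather than from Ignite's sources (the class and method names are hypothetical, and the real validation checks more than this single condition):

```java
import java.util.List;

// Hypothetical, simplified model of the BaselineTopology (BLT) compatibility rule
// quoted in the error message. This is NOT Ignite's implementation.
public class BltCompatibilitySketch {

    /**
     * A joining node is accepted only if its BLT branching point hash
     * appears somewhere in the cluster's BLT branching history.
     */
    static boolean isCompatible(List<Long> clusterBranchingHistory, long joiningBranchingPointHash) {
        return clusterBranchingHistory.contains(joiningBranchingPointHash);
    }

    public static void main(String[] args) {
        // Values copied from the error reported when node C tries to join B.
        List<Long> clusterHistory = List.of(-1349069127L);
        long joiningHash = 631694798L;

        if (isCompatible(clusterHistory, joiningHash))
            System.out.println("Joining node accepted");
        else
            System.out.println("Joining node rejected: BLT branching histories have diverged");
    }
}
```

With the two hashes from C's error, `isCompatible` returns false and the sketch prints the rejection line.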

The attached logs suggest that Discovery never reached the BLT checks and instead got stuck at some
earlier point. Could you please turn on debug logging for the TCP discovery package (org.apache.ignite.spi.discovery.tcp)
and run the test in your environment again?
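The `[time][LEVEL][thread][Category]` format of the attached logs suggests the nodes use Ignite's default java.util.logging (JUL) based logger. Assuming that, and assuming Ignite's debug level maps to JUL's `FINE`, the setting amounts to a package-level JUL logger level plus a handler that lets `FINE` records through. The Java sketch below loads such a configuration from an in-memory properties string only to demonstrate its effect; in practice the same lines would go into the logging properties file the node is started with (for example via `-Djava.util.logging.config.file`). Handler names in a real Ignite configuration may differ, and `org.apache.ignite.internal.SomeComponent` is a made-up logger name used for contrast.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.logging.Level;
import java.util.logging.LogManager;
import java.util.logging.Logger;

public class DiscoveryDebugConfig {

    // JUL properties: everything stays at INFO, only the TCP discovery package goes to FINE.
    // The same lines could be placed in the file passed via -Djava.util.logging.config.file.
    static final String LOGGING_PROPERTIES = String.join("\n",
            "handlers=java.util.logging.ConsoleHandler",
            ".level=INFO",
            // The handler must also accept FINE, otherwise debug records are dropped here.
            "java.util.logging.ConsoleHandler.level=FINE",
            "org.apache.ignite.spi.discovery.tcp.level=FINE");

    /** Replaces the current JUL configuration with the properties above. */
    static void applyConfig() {
        try {
            LogManager.getLogManager().readConfiguration(
                    new ByteArrayInputStream(LOGGING_PROPERTIES.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        applyConfig();

        // Loggers named after classes in the package inherit the package's FINE level.
        Logger discoLog = Logger.getLogger("org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi");
        // Hypothetical logger outside the package: keeps the root INFO level.
        Logger otherLog = Logger.getLogger("org.apache.ignite.internal.SomeComponent");

        System.out.println("TcpDiscoverySpi FINE enabled: " + discoLog.isLoggable(Level.FINE));
        System.out.println("Other component FINE enabled: " + otherLog.isLoggable(Level.FINE));
    }
}
```

Run on its own, the sketch reports FINE as enabled for the TcpDiscoverySpi logger and disabled for the other one. Whether a configuration applied programmatically survives node startup depends on how Ignite initialises its logger, so passing the properties file on the command line is the safer route.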

> Node fails to bail out of wrong BLT, instead hanging around indefinitely
> ------------------------------------------------------------------------
>
>                 Key: IGNITE-8633
>                 URL: https://issues.apache.org/jira/browse/IGNITE-8633
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.4
>            Reporter: Ilya Kasnacheev
>            Assignee: Sergey Chugunov
>            Priority: Major
>         Attachments: 8633.zip
>
>
> Follow-up on a related but not identical problem: https://stackoverflow.com/questions/50234056/how-to-give-multiple-static-ip-in-apache-ignite-cache-configuration-xml-file/50270676?noredirect=1#comment88095814_50270676
> I have three nodes: A, B and C.
> I started A and C and activated the cluster.
> Then I stopped them both, started B and activated it on its own.
> Now I have two BLT clusters: (A, C) and (B).
> However, when I start B and then try to launch A or C, the behavior is inconsistent:
> When I launch C, I get the error:
> {code}
> org.apache.ignite.spi.IgniteSpiException: BaselineTopology of joining node (8c1e210f-52bb-424f-9c7c-a2e7b1bab546) is not compatible with BaselineTopology in the cluster. Branching history of cluster BlT ([-1349069127]) doesn't contain branching point hash of joining node BlT (631694798). Consider cleaning persistent storage of the node and adding it to the cluster again.
> {code}
> But when I launch A, it never enters the topology, yet it never fails either. Instead, A and B keep pinging each other indefinitely:
> {code}
> [20:16:38,596][WARNING][main][TcpDiscoverySpi] Node has not been connected to topology and will repeat join process. Check remote nodes logs for possible error messages. Note that large topology may require significant time to start. Increase 'TcpDiscoverySpi.networkTimeout' configuration property if getting this message on the starting nodes [networkTimeout=5000]
> [20:17:29,514][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/172.25.1.36, rmtPort=49030]
> [20:17:29,522][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/172.25.1.36, rmtPort=49030]
> [20:17:29,523][INFO][tcp-disco-sock-reader-#26][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/172.25.1.36:49030, rmtPort=49030]
> [20:17:29,524][INFO][tcp-disco-sock-reader-#26][TcpDiscoverySpi] Received ping request from the remote node [rmtNodeId=37104137-a21e-4b6f-a70b-09164300bbfc, rmtAddr=/172.25.1.36:49030, rmtPort=49030]
> [20:17:29,525][INFO][tcp-disco-sock-reader-#26][TcpDiscoverySpi] Finished writing ping response [rmtNodeId=37104137-a21e-4b6f-a70b-09164300bbfc, rmtAddr=/172.25.1.36:49030, rmtPort=49030]
> [20:17:29,526][INFO][tcp-disco-sock-reader-#26][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/172.25.1.36:49030, rmtPort=49030
> [20:18:30,733][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/172.25.1.36, rmtPort=50857]
> [20:18:30,733][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/172.25.1.36, rmtPort=50857]
> [20:18:30,733][INFO][tcp-disco-sock-reader-#47][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/172.25.1.36:50857, rmtPort=50857]
> [20:18:30,734][INFO][tcp-disco-sock-reader-#47][TcpDiscoverySpi] Received ping request from the remote node [rmtNodeId=37104137-a21e-4b6f-a70b-09164300bbfc, rmtAddr=/172.25.1.36:50857, rmtPort=50857]
> [20:18:30,734][INFO][tcp-disco-sock-reader-#47][TcpDiscoverySpi] Finished writing ping response [rmtNodeId=37104137-a21e-4b6f-a70b-09164300bbfc, rmtAddr=/172.25.1.36:50857, rmtPort=50857]
> [20:18:30,734][INFO][tcp-disco-sock-reader-#47][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/172.25.1.36:50857, rmtPort=50857
> {code}
> {code}
> [20:16:28,793][INFO][tcp-disco-msg-worker-#3][GridSnapshotAwareClusterStateProcessorImpl] Received state change finish message: true
> [20:16:28,803][INFO][exchange-worker-#62][time] Finished exchange init [topVer=AffinityTopologyVersion [topVer=1, minorTopVer=1], crd=true]
> [20:16:28,812][INFO][exchange-worker-#62][GridCachePartitionExchangeManager] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=1, minorTopVer=1], evt=DISCOVERY_CUSTOM_EVT, node=37104137-a21e-4b6f-a70b-09164300bbfc]
> [20:16:28,818][INFO][sys-#68][GridSnapshotAwareClusterStateProcessorImpl] Successfully performed final activation steps [nodeId=37104137-a21e-4b6f-a70b-09164300bbfc, client=false, topVer=AffinityTopologyVersion [topVer=1, minorTopVer=1]]
> [20:16:33,571][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/172.25.1.35, rmtPort=42500]
> [20:16:33,579][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/172.25.1.35, rmtPort=42500]
> [20:16:33,580][INFO][tcp-disco-sock-reader-#9][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/172.25.1.35:42500, rmtPort=42500]
> [20:16:33,592][INFO][tcp-disco-sock-reader-#9][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/172.25.1.35:42500, rmtPort=42500
> [20:16:39,801][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/172.25.1.35, rmtPort=42714]
> [20:16:39,801][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/172.25.1.35, rmtPort=42714]
> [20:16:39,802][INFO][tcp-disco-sock-reader-#10][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/172.25.1.35:42714, rmtPort=42714]
> [20:16:39,806][INFO][tcp-disco-sock-reader-#10][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/172.25.1.35:42714, rmtPort=42714
> {code}
> I don't think this is expected behaviour. I will attach config and work directories.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
