ignite-dev mailing list archives

From "Vladimir Ozerov (JIRA)" <j...@apache.org>
Subject [jira] [Created] (IGNITE-4003) Slow or faulty client can stall the whole cluster.
Date Thu, 29 Sep 2016 12:45:22 GMT
Vladimir Ozerov created IGNITE-4003:
---------------------------------------

             Summary: Slow or faulty client can stall the whole cluster.
                 Key: IGNITE-4003
                 URL: https://issues.apache.org/jira/browse/IGNITE-4003
             Project: Ignite
          Issue Type: Bug
          Components: cache, general
    Affects Versions: 1.7
            Reporter: Vladimir Ozerov
            Priority: Critical
             Fix For: 1.8


Steps to reproduce:
1) Start two server nodes and put some data into a cache.
2) Start a client inside a Docker subnet that is not visible from the outside. The client will join
the cluster.
3) Try to put something into the cache, or start another node to force a rebalance.

The cluster gets stuck at this point. Root cause: the servers keep trying to establish an outgoing
connection to the client, but fail because the Docker subnet is not visible from the outside. This can
stop virtually all cluster operations.
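
The stall mechanism can be illustrated without Ignite at all. The sketch below is a simplified model, not Ignite code: it assumes a fixed-size worker pool, and `Thread.sleep` stands in for a connect attempt that blocks until its timeout expires. The class name, method names, and parameters are all hypothetical. Once every worker is busy sending to the unreachable client, unrelated work submitted afterwards has to wait for a full connect timeout before it even starts.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Simplified model of the stall; none of these names come from Ignite.
class StalledPoolSketch {
    /** Stand-in for a connect attempt to an unreachable address: blocks for the whole timeout. */
    static void sendToUnreachableClient(long connectTimeoutMs) throws InterruptedException {
        Thread.sleep(connectTimeoutMs);
    }

    /**
     * Submits {@code blockedSends} sends to the unreachable client, then one unrelated task.
     * Returns how long (in ms) the unrelated task waited before a worker picked it up.
     */
    static long delayOfUnrelatedWork(int poolSize, int blockedSends, long connectTimeoutMs)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            for (int i = 0; i < blockedSends; i++) {
                Callable<Void> send = () -> {
                    sendToUnreachableClient(connectTimeoutMs);
                    return null;
                };
                pool.submit(send);
            }

            long submitted = System.nanoTime();
            // The unrelated task only records the moment it actually starts running.
            Callable<Long> probe = System::nanoTime;
            Future<Long> started = pool.submit(probe);
            return TimeUnit.NANOSECONDS.toMillis(started.get() - submitted);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("one worker free:     " + delayOfUnrelatedWork(2, 1, 400) + " ms");
        System.out.println("all workers blocked: " + delayOfUnrelatedWork(2, 2, 400) + " ms");
    }
}
```

In this model the delay grows with every additional blocked send queued ahead of the unrelated task, which matches the symptom described above: the cluster is not dead, but ordinary operations queue behind sends that cannot succeed.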

Typical stack trace:

{code}
org.apache.ignite.IgniteCheckedException: Failed to send message (node may have left the grid
or TCP connection cannot be established due to firewall issues) [node=TcpDiscoveryNode [id=a15d74c2-1ec2-4349-9640-aeacd70d8714,
addrs=[127.0.0.1, 172.17.0.6], sockAddrs=[/127.0.0.1:0, /127.0.0.1:0, /172.17.0.6:0], discPort=0,
order=7241, intOrder=3707, lastExchangeTime=1474096941045, loc=false, ver=1.5.23#20160526-sha1:259146da,
isClient=true], topic=T4 [topic=TOPIC_CACHE, id1=949732fd-1360-3a58-8d9e-0ff6ea6182cc, id2=a15d74c2-1ec2-4349-9640-aeacd70d8714,
id3=2], msg=GridContinuousMessage [type=MSG_EVT_NOTIFICATION, routineId=7e13c48e-6933-48b2-9f15-8d92007930db,
data=null, futId=null], policy=2]
	at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1129)
[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.managers.communication.GridIoManager.sendOrderedMessage(GridIoManager.java:1347)
[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1227)
~[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1198)
~[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1180)
~[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendNotification(GridContinuousProcessor.java:841)
~[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.addNotification(GridContinuousProcessor.java:800)
~[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.onEntryUpdate(CacheContinuousQueryHandler.java:787)
[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.access$700(CacheContinuousQueryHandler.java:91)
[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler$1.onEntryUpdated(CacheContinuousQueryHandler.java:412)
[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.onEntryUpdated(CacheContinuousQueryManager.java:343)
[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.onEntryUpdated(CacheContinuousQueryManager.java:250)
[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.initialValue(GridCacheMapEntry.java:3476)
[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtForceKeysFuture$MiniFuture.onResult(GridDhtForceKeysFuture.java:548)
[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtForceKeysFuture.onResult(GridDhtForceKeysFuture.java:207)
[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.processForceKeyResponse(GridDhtPreloader.java:636)
[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.access$1000(GridDhtPreloader.java:81)
[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.onMessage(GridDhtPreloader.java:202)
[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.onMessage(GridDhtPreloader.java:200)
[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$MessageHandler.apply(GridDhtPreloader.java:877)
[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$MessageHandler.apply(GridDhtPreloader.java:859)
[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:582)
[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:280)
[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:204)
[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$000(GridCacheIoManager.java:80)
[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:163)
[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1058)
[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:836)
[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.managers.communication.GridIoManager.access$1700(GridIoManager.java:104)
[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.managers.communication.GridIoManager$5.run(GridIoManager.java:799)
[ignite-core-1.5.23.jar:1.5.23]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_51]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_51]
	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_51]
Caused by: org.apache.ignite.spi.IgniteSpiException: Failed to send message to remote node:
TcpDiscoveryNode [id=a15d74c2-1ec2-4349-9640-aeacd70d8714, addrs=[127.0.0.1, 172.17.0.6],
sockAddrs=[/127.0.0.1:0, /127.0.0.1:0, /172.17.0.6:0], discPort=0, order=7241, intOrder=3707,
lastExchangeTime=1474096941045, loc=false, ver=1.5.23#20160526-sha1:259146da, isClient=true]
	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1986)
~[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:1926)
~[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1124)
[ignite-core-1.5.23.jar:1.5.23]
	... 32 common frames omitted
Caused by: org.apache.ignite.IgniteCheckedException: Failed to connect to node (is node still
alive?). Make sure that each GridComputeTask and GridCacheTransaction has a timeout set in
order to prevent parties from waiting forever in case of network issues [nodeId=a15d74c2-1ec2-4349-9640-aeacd70d8714,
addrs=[/172.17.0.6:47100, /127.0.0.1:47100]]
	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2489)
~[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2130)
~[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2024)
~[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1960)
~[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:1926)
~[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1124)
[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.managers.communication.GridIoManager.sendOrderedMessage(GridIoManager.java:1347)
[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1227)
~[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1198)
~[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1180)
~[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendNotification(GridContinuousProcessor.java:841)
~[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.addNotification(GridContinuousProcessor.java:800)
~[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.onEntryUpdate(CacheContinuousQueryHandler.java:787)
[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.access$700(CacheContinuousQueryHandler.java:91)
[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler$1.onEntryUpdated(CacheContinuousQueryHandler.java:412)
[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.onEntryUpdated(CacheContinuousQueryManager.java:343)
[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.onEntryUpdated(CacheContinuousQueryManager.java:250)
[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.initialValue(GridCacheMapEntry.java:3476)
[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture$MiniFuture.onResult(GridDhtLockFuture.java:1213)
~[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.onResult(GridDhtLockFuture.java:529)
~[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTransactionalCacheAdapter.processDhtLockResponse(GridDhtTransactionalCacheAdapter.java:639)
~[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTransactionalCacheAdapter.access$100(GridDhtTransactionalCacheAdapter.java:89)
~[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTransactionalCacheAdapter$5.apply(GridDhtTransactionalCacheAdapter.java:151)
~[ignite-core-1.5.23.jar:1.5.23]
	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTransactionalCacheAdapter$5.apply(GridDhtTransactionalCacheAdapter.java:149)
~[ignite-core-1.5.23.jar:1.5.23]
	... 12 common frames omitted
	Suppressed: org.apache.ignite.IgniteCheckedException: Failed to connect to address: /172.17.0.6:47100
		at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2494)
~[ignite-core-1.5.23.jar:1.5.23]
		... 35 common frames omitted
	Caused by: java.net.SocketTimeoutException: null
		at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:118)
		at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2353)
		... 35 common frames omitted
	Suppressed: org.apache.ignite.IgniteCheckedException: Failed to connect to address: /127.0.0.1:47100
		at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2494)
~[ignite-core-1.5.23.jar:1.5.23]
		... 35 common frames omitted
	Caused by: org.apache.ignite.IgniteCheckedException: Remote node ID is not as expected [expected=a15d74c2-1ec2-4349-9640-aeacd70d8714,
rcvd=48cccf25-7c29-4048-bd52-704acdb552e6]
		at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeHandshake(TcpCommunicationSpi.java:2604)
		at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2361)
		... 35 common frames omitted
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
