ignite-issues mailing list archives

From "Semen Boikov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (IGNITE-4003) Slow or faulty client can stall the whole cluster.
Date Tue, 11 Apr 2017 10:52:41 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15964169#comment-15964169 ]

Semen Boikov commented on IGNITE-4003:
--------------------------------------

Andrey,

I reviewed the change and found only one potential implementation issue:
{noformat}
                            nioSrvr.sendSystem(ses, new RecoveryLastReceivedMessage(-1), new IgniteInClosure<IgniteInternalFuture<?>>() {
                                @Override public void apply(IgniteInternalFuture<?> msgFut) {
                                    try {
                                        msgFut.get();
                                    } catch (IgniteCheckedException e) {
                                        if (log.isDebugEnabled())
                                            log.debug("Failed to send recovery handshake " +
                                                    "[rmtNode=" + rmtNode.id() + ", err=" + e + ']');

                                        recoveryDesc.release();
                                    } finally {
                                        fut.onDone();

                                        clientFuts.remove(connKey, fut);

                                        ses.close();
                                    }
                                }
                            });
{noformat}

It seems unnecessary to call 'recoveryDesc.release()' here, since the descriptor should be released
on session close.
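
To illustrate the point, here is a minimal sketch of why an explicit release is redundant when the session already releases the descriptor on close. The `Descriptor` and `Session` classes below are hypothetical stand-ins written for this example; they are not the actual Ignite classes, and the real release logic may differ.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class ReleaseOnCloseSketch {
    /** Hypothetical stand-in for a recovery descriptor (not Ignite's class). */
    static final class Descriptor {
        private final AtomicBoolean reserved = new AtomicBoolean();

        boolean reserve() { return reserved.compareAndSet(false, true); }

        /** Returns true only if this call actually freed the reservation. */
        boolean release() { return reserved.compareAndSet(true, false); }

        boolean isReserved() { return reserved.get(); }
    }

    /** Hypothetical session that owns the descriptor and frees it when closed. */
    static final class Session {
        private final Descriptor desc;
        private boolean closed;

        Session(Descriptor desc) { this.desc = desc; }

        void close() {
            if (closed)
                return;

            closed = true;

            desc.release(); // Assumption: close is the single place that releases.
        }
    }

    /** Closing the session alone leaves the descriptor free. */
    static boolean closeAloneReleases() {
        Descriptor desc = new Descriptor();
        desc.reserve();

        new Session(desc).close();

        return !desc.isReserved();
    }

    /** An explicit release before close turns the release inside close into a no-op. */
    static boolean explicitReleaseIsRedundant() {
        Descriptor desc = new Descriptor();
        desc.reserve();
        Session ses = new Session(desc);

        boolean freedByListener = desc.release(); // Explicit call, as in the reviewed code.
        ses.close();                              // Release inside close has nothing left to do.

        return freedByListener && !desc.isReserved();
    }

    public static void main(String[] args) {
        System.out.println("close alone releases: " + closeAloneReleases());
        System.out.println("explicit release redundant: " + explicitReleaseIsRedundant());
    }
}
```

In this sketch both paths end with the descriptor free, so the explicit call adds nothing beyond what `close()` already does.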

Another, more serious issue from my point of view: the TcpCommunicationSpi code was already overcomplicated,
and now it has become even more unclear. I need to think about how it can be simplified.

Thanks

> Slow or faulty client can stall the whole cluster.
> --------------------------------------------------
>
>                 Key: IGNITE-4003
>                 URL: https://issues.apache.org/jira/browse/IGNITE-4003
>             Project: Ignite
>          Issue Type: Bug
>          Components: cache, general
>    Affects Versions: 1.7
>            Reporter: Vladimir Ozerov
>            Assignee: Semen Boikov
>            Priority: Critical
>             Fix For: 2.1
>
>
> Steps to reproduce:
> 1) Start two server nodes and add some data to the cache.
> 2) Start a client from a Docker subnet that is not visible from the outside. The client will join the cluster.
> 3) Try to put something into the cache, or start another node to force a rebalance.
> The cluster gets stuck at this point. Root cause: the servers constantly try to establish outgoing connections to the client, but fail because the Docker subnet is not visible from the outside. This can stop virtually all cluster operations.
> Typical thread dump:
> {code}
> org.apache.ignite.IgniteCheckedException: Failed to send message (node may have left
the grid or TCP connection cannot be established due to firewall issues) [node=TcpDiscoveryNode
[id=a15d74c2-1ec2-4349-9640-aeacd70d8714, addrs=[127.0.0.1, 172.17.0.6], sockAddrs=[/127.0.0.1:0,
/127.0.0.1:0, /172.17.0.6:0], discPort=0, order=7241, intOrder=3707, lastExchangeTime=1474096941045,
loc=false, ver=1.5.23#20160526-sha1:259146da, isClient=true], topic=T4 [topic=TOPIC_CACHE,
id1=949732fd-1360-3a58-8d9e-0ff6ea6182cc, id2=a15d74c2-1ec2-4349-9640-aeacd70d8714, id3=2],
msg=GridContinuousMessage [type=MSG_EVT_NOTIFICATION, routineId=7e13c48e-6933-48b2-9f15-8d92007930db,
data=null, futId=null], policy=2]
> 	at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1129)
[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.managers.communication.GridIoManager.sendOrderedMessage(GridIoManager.java:1347)
[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1227)
~[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1198)
~[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1180)
~[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendNotification(GridContinuousProcessor.java:841)
~[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.addNotification(GridContinuousProcessor.java:800)
~[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.onEntryUpdate(CacheContinuousQueryHandler.java:787)
[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.access$700(CacheContinuousQueryHandler.java:91)
[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler$1.onEntryUpdated(CacheContinuousQueryHandler.java:412)
[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.onEntryUpdated(CacheContinuousQueryManager.java:343)
[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.onEntryUpdated(CacheContinuousQueryManager.java:250)
[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.initialValue(GridCacheMapEntry.java:3476)
[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtForceKeysFuture$MiniFuture.onResult(GridDhtForceKeysFuture.java:548)
[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtForceKeysFuture.onResult(GridDhtForceKeysFuture.java:207)
[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.processForceKeyResponse(GridDhtPreloader.java:636)
[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.access$1000(GridDhtPreloader.java:81)
[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.onMessage(GridDhtPreloader.java:202)
[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.onMessage(GridDhtPreloader.java:200)
[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$MessageHandler.apply(GridDhtPreloader.java:877)
[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$MessageHandler.apply(GridDhtPreloader.java:859)
[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:582)
[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:280)
[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:204)
[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$000(GridCacheIoManager.java:80)
[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:163)
[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1058)
[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:836)
[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.managers.communication.GridIoManager.access$1700(GridIoManager.java:104)
[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.managers.communication.GridIoManager$5.run(GridIoManager.java:799)
[ignite-core-1.5.23.jar:1.5.23]
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_51]
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_51]
> 	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_51]
> Caused by: org.apache.ignite.spi.IgniteSpiException: Failed to send message to remote
node: TcpDiscoveryNode [id=a15d74c2-1ec2-4349-9640-aeacd70d8714, addrs=[127.0.0.1, 172.17.0.6],
sockAddrs=[/127.0.0.1:0, /127.0.0.1:0, /172.17.0.6:0], discPort=0, order=7241, intOrder=3707,
lastExchangeTime=1474096941045, loc=false, ver=1.5.23#20160526-sha1:259146da, isClient=true]
> 	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1986)
~[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:1926)
~[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1124)
[ignite-core-1.5.23.jar:1.5.23]
> 	... 32 common frames omitted
> Caused by: org.apache.ignite.IgniteCheckedException: Failed to connect to node (is node
still alive?). Make sure that each GridComputeTask and GridCacheTransaction has a timeout
set in order to prevent parties from waiting forever in case of network issues [nodeId=a15d74c2-1ec2-4349-9640-aeacd70d8714,
addrs=[/172.17.0.6:47100, /127.0.0.1:47100]]
> 	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2489)
~[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2130)
~[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2024)
~[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1960)
~[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:1926)
~[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1124)
[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.managers.communication.GridIoManager.sendOrderedMessage(GridIoManager.java:1347)
[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1227)
~[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1198)
~[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1180)
~[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendNotification(GridContinuousProcessor.java:841)
~[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.addNotification(GridContinuousProcessor.java:800)
~[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.onEntryUpdate(CacheContinuousQueryHandler.java:787)
[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.access$700(CacheContinuousQueryHandler.java:91)
[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler$1.onEntryUpdated(CacheContinuousQueryHandler.java:412)
[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.onEntryUpdated(CacheContinuousQueryManager.java:343)
[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.onEntryUpdated(CacheContinuousQueryManager.java:250)
[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.initialValue(GridCacheMapEntry.java:3476)
[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture$MiniFuture.onResult(GridDhtLockFuture.java:1213)
~[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.onResult(GridDhtLockFuture.java:529)
~[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTransactionalCacheAdapter.processDhtLockResponse(GridDhtTransactionalCacheAdapter.java:639)
~[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTransactionalCacheAdapter.access$100(GridDhtTransactionalCacheAdapter.java:89)
~[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTransactionalCacheAdapter$5.apply(GridDhtTransactionalCacheAdapter.java:151)
~[ignite-core-1.5.23.jar:1.5.23]
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTransactionalCacheAdapter$5.apply(GridDhtTransactionalCacheAdapter.java:149)
~[ignite-core-1.5.23.jar:1.5.23]
> 	... 12 common frames omitted
> 	Suppressed: org.apache.ignite.IgniteCheckedException: Failed to connect to address:
/172.17.0.6:47100
> 		at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2494)
~[ignite-core-1.5.23.jar:1.5.23]
> 		... 35 common frames omitted
> 	Caused by: java.net.SocketTimeoutException: null
> 		at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:118)
> 		at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2353)
> 		... 35 common frames omitted
> 	Suppressed: org.apache.ignite.IgniteCheckedException: Failed to connect to address:
/127.0.0.1:47100
> 		at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2494)
~[ignite-core-1.5.23.jar:1.5.23]
> 		... 35 common frames omitted
> 	Caused by: org.apache.ignite.IgniteCheckedException: Remote node ID is not as expected
[expected=a15d74c2-1ec2-4349-9640-aeacd70d8714, rcvd=48cccf25-7c29-4048-bd52-704acdb552e6]
> 		at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeHandshake(TcpCommunicationSpi.java:2604)
> 		at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2361)
> 		... 35 common frames omitted
> {code}
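
The root cause described in the issue, a pool thread held up by a connection attempt that only fails after a timeout, can be illustrated with a small simulation. This is a sketch under stated assumptions: the connect attempt is simulated with `Thread.sleep`, a single-thread executor stands in for the message-processing pool, and all names are hypothetical rather than taken from Ignite.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class BlockedPoolSketch {
    /** Simulates a connect attempt to an unreachable address that fails only after the timeout. */
    static void simulatedConnect(long timeoutMs) {
        try {
            Thread.sleep(timeoutMs);
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Returns how long (in ms) an unrelated task waited behind the blocked send. */
    static long delayOfUnrelatedTask(long connectTimeoutMs) {
        // Assumption: one worker thread stands in for the shared processing pool.
        ExecutorService pool = Executors.newSingleThreadExecutor();

        try {
            long start = System.nanoTime();

            // The "send to client" task occupies the only worker until the connect times out.
            pool.submit(() -> simulatedConnect(connectTimeoutMs));

            // An unrelated task queued afterwards cannot start until the worker is free.
            Future<Long> unrelated = pool.submit(() -> System.nanoTime() - start);

            return TimeUnit.NANOSECONDS.toMillis(unrelated.get());
        }
        catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
        finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("Unrelated task delayed by ~" + delayOfUnrelatedTask(300) + " ms");
    }
}
```

The unrelated task is delayed by roughly the full connect timeout. With repeated retries against the same unreachable client, as in the trace above, such delays would accumulate across the pool's threads.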



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
