ignite-user mailing list archives

From Matt Nohelty <nolt2...@gmail.com>
Subject Re: What happens when a client gets disconnected
Date Tue, 23 Apr 2019 17:30:04 GMT
What period of time are you asking about?  We deploy fairly regularly so
our application servers (i.e. the Ignite clients) get restarted at least
weekly which will trigger a disconnect and reconnect event for each.  We
have not noticed any issues during our regular release process but in this
case we are shutting down the Ignite clients gracefully with Ignite#close.
However, it's also possible that something bad happens on an application
server, causing it to crash.  This is the scenario where we've seen
blocking across the cluster.  We'd obviously like our application servers
to be as independent of one another as possible and it's problematic if an
issue on one server is allowed to ripple across all of them.

I should have mentioned it in my initial post but we are currently using
version 2.4.  I received the following response on my Stack Overflow post:
"When topology changes, partition map exchange is triggered internally. It
blocks all operations on the cluster. Also in old versions ongoing
rebalancing was cancelled. But in the latest versions client
connection/disconnection doesn't affect some processes like this. So, it's
worth trying the most fresh release"

This comment also mentions PME so it sounds like you both are referencing
the same behavior.  However, this comment also states that client
connect/disconnect events do not trigger PME in the more recent versions
of Ignite.  Can anyone confirm that this is true, and if so, which version
was this change made in?

Thank you very much for the help.

On Tue, Apr 23, 2019 at 10:00 AM Ilya Kasnacheev <ilya.kasnacheev@gmail.com>
wrote:

> Hello!
> What's the period of time?
> When a client disconnects, the topology changes, which triggers a wait
> for PME; this delays all further operations until PME is finished.
> Avoid having short-lived clients.
> Regards,
> --
> Ilya Kasnacheev
> On Tue, 23 Apr 2019 at 03:40, Matt Nohelty <nolt2232@gmail.com> wrote:
>> I already posted this question to Stack Overflow here
>> https://stackoverflow.com/questions/55801760/what-happens-in-apache-ignite-when-a-client-gets-disconnected
>> but this mailing list is probably more appropriate.
>> We use Apache Ignite for caching and are seeing some unexpected behavior
>> across all of the clients of the cluster when one of the clients fails. The
>> Ignite cluster itself has three servers and there are approximately 12
>> servers connecting to that cluster as clients. The cluster has persistence
>> disabled and many of the caches have near caching enabled.
>> What we are seeing is that when one of the clients fails (out of memory,
>> high CPU, network connectivity, etc.), threads on all the other clients
>> block for a period of time. During these times, the Ignite servers
>> themselves seem fine but I see things like the following in the logs:
>> Topology snapshot [ver=123, servers=3, clients=11, CPUs=XXX, offheap=XX.XGB, heap=XXX.GB]
>> Topology snapshot [ver=124, servers=3, clients=10, CPUs=XXX, offheap=XX.XGB, heap=XXX.GB]
>> The topology itself is clearly changing when a client
>> connects/disconnects but is there anything happening internally inside the
>> cluster that could cause blocking on other clients? I would expect
>> re-balancing of data when a server disconnects but not a client.
>> From a thread dump, I see many threads stuck in the following state:
>> java.lang.Thread.State: TIMED_WAITING (parking)
>> at sun.misc.Unsafe.park(Native Method)
>> - parking to wait for  <0x000000078a86ff18> (a java.util.concurrent.CountDownLatch$Sync)
>> at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>> at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>> at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>> at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>> at org.apache.ignite.internal.util.IgniteUtils.await(IgniteUtils.java:7452)
>> at org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.awaitAllReplies(GridReduceQueryExecutor.java:1056)
>> at org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.query(GridReduceQueryExecutor.java:733)
>> at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing$8.iterator(IgniteH2Indexing.java:1339)
>> at org.apache.ignite.internal.processors.cache.QueryCursorImpl.iterator(QueryCursorImpl.java:95)
>> at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing$9.iterator(IgniteH2Indexing.java:1403)
>> at org.apache.ignite.internal.processors.cache.QueryCursorImpl.iterator(QueryCursorImpl.java:95)
>> at java.lang.Iterable.forEach(Iterable.java:74)...
>> Any ideas, suggestions, or further avenues to investigate would be much
>> appreciated.
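
One way to check whether the blocking lines up with client departures is to scan the logs for "Topology snapshot" entries where the client count drops while the server count stays the same. The following is only a rough sketch, assuming the log line format quoted above (ver, servers, clients in that order); the class and method names (TopologySnapshots, parse, clientDepartures) are made up for illustration and are not part of Ignite.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TopologySnapshots {
    /** One parsed "Topology snapshot" entry: version plus server and client counts. */
    public static final class Snapshot {
        public final long ver;
        public final int servers;
        public final int clients;

        Snapshot(long ver, int servers, int clients) {
            this.ver = ver;
            this.servers = servers;
            this.clients = clients;
        }
    }

    // Assumption: fields appear as in the quoted log lines (ver, servers, clients).
    // Other Ignite versions may print extra fields, so adjust the pattern if needed.
    private static final Pattern SNAPSHOT = Pattern.compile(
        "Topology snapshot \\[ver=(\\d+), servers=(\\d+), clients=(\\d+)");

    /** Extracts every snapshot found in the text, in order of appearance. */
    public static List<Snapshot> parse(String log) {
        List<Snapshot> out = new ArrayList<>();
        Matcher m = SNAPSHOT.matcher(log);
        while (m.find()) {
            out.add(new Snapshot(
                Long.parseLong(m.group(1)),
                Integer.parseInt(m.group(2)),
                Integer.parseInt(m.group(3))));
        }
        return out;
    }

    /** Returns the topology versions at which clients left but the server count was unchanged. */
    public static List<Long> clientDepartures(List<Snapshot> snapshots) {
        List<Long> versions = new ArrayList<>();
        for (int i = 1; i < snapshots.size(); i++) {
            Snapshot prev = snapshots.get(i - 1);
            Snapshot cur = snapshots.get(i);
            if (cur.servers == prev.servers && cur.clients < prev.clients) {
                versions.add(cur.ver);
            }
        }
        return versions;
    }

    public static void main(String[] args) {
        String log =
            "Topology snapshot [ver=123, servers=3, clients=11, CPUs=XXX, offheap=XX.XGB, heap=XXX.GB]\n"
          + "Topology snapshot [ver=124, servers=3, clients=10, CPUs=XXX, offheap=XX.XGB, heap=XXX.GB]\n";
        // Prints the versions where a client dropped out: [124]
        System.out.println(clientDepartures(parse(log)));
    }
}
```

Comparing the timestamps of the reported versions against thread dumps like the one above would show whether the stalls start at the moment a client leaves the topology.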
