ignite-dev mailing list archives

From Александр Меньшиков <sharple...@gmail.com>
Subject Re: IGNITE-4365: Data grid in deadlock on stop by DataStreamerImpl
Date Thu, 29 Jun 2017 13:11:38 GMT
I don't have one. I took all of my information from the thread dump that you
added to the ticket: one thread is stuck in DataStreamerImpl#doFlush() (called
by GridDistributedCacheAdapter.GlobalRemoveAllJob#localExecute()), and the
other is stuck in GridCacheGateway#onStopped() (called by
GridCacheProcessor#onExchangeDone()).
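
To make the interaction concrete, here is a minimal standalone sketch. It is not Ignite code: the class GatewayFlushSketch and all of its members are hypothetical, and it assumes the gateway behaves like a read-write lock (operations hold the read side, onStopped() takes the write side). It also includes the volatile flag from option 1 of my first message. Without the `stopping` check the loop would never exit and onStopped() would block forever.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the gateway + streamer interaction; not Ignite's real classes.
class GatewayFlushSketch {
    // Assumption: the gateway acts like a read-write lock.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Flag proposed in option 1: set when a stop has been requested.
    private volatile boolean stopping;

    // Stands in for a streamer future that never completes.
    private final CompletableFuture<Void> pending = new CompletableFuture<>();

    /** Returns true if the flush completed, false if it was abandoned because of a stop. */
    boolean flushUnderGateway(CountDownLatch entered) throws InterruptedException {
        lock.readLock().lock(); // the operation enters the gateway
        try {
            entered.countDown();
            while (true) { // mirrors the "while(true)" loop in doFlush()
                if (pending.isDone())
                    return true;
                if (stopping) // without this check the loop never ends
                    return false;
                Thread.sleep(10);
            }
        }
        finally {
            lock.readLock().unlock();
        }
    }

    /** Blocks until every operation has left the gateway. */
    void onStopped() {
        stopping = true;
        lock.writeLock().lock();
        lock.writeLock().unlock();
    }

    /** Runs the flush on another thread, then stops the gateway; returns the flush result. */
    static boolean demo() {
        try {
            GatewayFlushSketch g = new GatewayFlushSketch();
            CountDownLatch entered = new CountDownLatch(1);
            CompletableFuture<Boolean> flush = CompletableFuture.supplyAsync(() -> {
                try {
                    return g.flushUnderGateway(entered);
                }
                catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            });
            entered.await(); // the read lock is now held by the flushing thread
            g.onStopped();   // returns only because the flush loop observes the flag
            return flush.get(5, TimeUnit.SECONDS);
        }
        catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("flush completed: " + demo());
    }
}
```

In this sketch the flush is simply abandoned on stop. Option 2 would instead complete the pending future with an exception so that the loop exits through the isDone() branch.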

I read that the problem is hard to reproduce (Alexey Kuznetsov's first comment
in JIRA), so I decided to approach it from a different angle.
The code still looks dangerous, so I don't think the problem has resolved
itself.

The thread dump involves 2 tests:
1) GridCacheNearTxForceKeyTest
2) CrossCacheTxRandomOperationsTest

Both passed in a single run.

2017-06-29 15:31 GMT+03:00 Yakov Zhdanov <yzhdanov@apache.org>:

> Alex, can you please share a test that demonstrates the hang?
>
> --Yakov
>
> 2017-06-29 14:27 GMT+03:00 Александр Меньшиков <sharplermc@gmail.com>:
>
>> Hello,
>>
>> I want to work on ticket IGNITE-4365
>> <https://issues.apache.org/jira/browse/IGNITE-4365>. The problem comes
>> from DataStreamerImpl.
>> Some methods use DataStreamerImpl while holding the lock
>> (GridCacheGateway), but DataStreamerImpl#doFlush() contains a
>> "while(true)" loop. If someone calls GridCacheGateway#onStopped() in the
>> meantime, the application can get stuck: one thread spins in the loop in
>> DataStreamerImpl#doFlush() while another waits to acquire the lock in
>> GridCacheGateway#onStopped().
>>
>> So I need an expert opinion on DataStreamerImpl#doFlush().
>> 1) Can I just drop the unfinished futures in DataStreamerImpl#doFlush() when
>> someone calls GridCacheGateway#onStopped()? I can detect this by adding a
>> volatile boolean flag to GridCacheGateway.
>> 2) Or is it better to change how the futures are completed in
>> DataStreamerImpl#load0(), so that they finish via onDone with an exception
>> or something like that?
>>
>> Methods that use or might use DataStreamerImpl under the lock:
>>
>> 1) GridCacheAdapter#localLoad()
>> 2) GridCacheAdapter#localLoadAndUpdate()
>> 3) GridCacheAdapter#localLoadCache()
>> 4) GridDistributedCacheAdapter.GlobalRemoveAllJob#localExecute() (this is
>> exactly what happens in the thread dump in the ticket)
>>
>
>
