ignite-dev mailing list archives

From Semyon Boikov <sboi...@gridgain.com>
Subject Re: Issue with near cache update
Date Wed, 25 Mar 2015 18:38:49 GMT
Reviewed, looks good, thanks for the fix.

On Wed, Mar 25, 2015 at 9:24 PM, Alexey Goncharuk <agoncharuk@apache.org>
wrote:

> Semyon,
>
> I was looking at one of the timed-out tests and found this piece of a
> thread dump interesting:
>
> [20:08:23]Thread [name="ignite-#16529%sys-near.GridCacheNearRemoveFailureTest0%", id=21488, state=WAITING, blockCnt=1, waitCnt=11284]
> [20:08:23]    Lock [object=o.a.i.i.processors.affinity.GridAffinityAssignmentCache$AffinityReadyFuture@23931c53, ownerName=null, ownerId=-1]
> [20:08:23]        at sun.misc.Unsafe.park(Native Method)
> [20:08:23]        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> [20:08:23]        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
> [20:08:23]        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
> [20:08:23]        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
> [20:08:23]        at o.a.i.i.util.future.GridFutureAdapter.get(GridFutureAdapter.java:102)
> [20:08:23]        at o.a.i.i.processors.affinity.GridAffinityAssignmentCache.awaitTopologyVersion(GridAffinityAssignmentCache.java:400)
> [20:08:23]        at o.a.i.i.processors.affinity.GridAffinityAssignmentCache.cachedAffinity(GridAffinityAssignmentCache.java:362)
> [20:08:23]        at o.a.i.i.processors.affinity.GridAffinityAssignmentCache.nodes(GridAffinityAssignmentCache.java:327)
> [20:08:23]        at o.a.i.i.processors.cache.GridCacheAffinityManager.nodes(GridCacheAffinityManager.java:187)
> [20:08:23]        at o.a.i.i.processors.cache.GridCacheAffinityManager.primary(GridCacheAffinityManager.java:205)
> [20:08:23]        at o.a.i.i.processors.cache.distributed.near.GridNearCacheEntry.primaryNode(GridNearCacheEntry.java:630)
> [20:08:23]        at o.a.i.i.processors.cache.distributed.near.GridNearCacheEntry.resetFromPrimary(GridNearCacheEntry.java:219)
> [20:08:23]        - locked o.a.i.i.processors.cache.distributed.near.GridNearCacheEntry@1333a4f6
> [20:08:23]        at o.a.i.i.processors.cache.distributed.near.GridNearTxPrepareFuture$MiniFuture.onResult(GridNearTxPrepareFuture.java:935)
> [20:08:23]        at o.a.i.i.processors.cache.distributed.near.GridNearTxPrepareFuture.onResult(GridNearTxPrepareFuture.java:254)
> [20:08:23]        at o.a.i.i.processors.cache.transactions.IgniteTxHandler.processNearTxPrepareResponse(IgniteTxHandler.java:363)
> [20:08:23]        at o.a.i.i.processors.cache.transactions.IgniteTxHandler.access$000(IgniteTxHandler.java:49)
> [20:08:23]        at o.a.i.i.processors.cache.transactions.IgniteTxHandler$2.apply(IgniteTxHandler.java:77)
> [20:08:23]        at o.a.i.i.processors.cache.transactions.IgniteTxHandler$2.apply(IgniteTxHandler.java:75)
> [20:08:23]        at o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:299)
> [20:08:23]        at o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:212)
> [20:08:23]        at o.a.i.i.processors.cache.GridCacheIoManager.access$300(GridCacheIoManager.java:44)
> [20:08:23]        at o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:132)
> [20:08:23]        at o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:664)
> [20:08:23]        at o.a.i.i.managers.communication.GridIoManager.access$1500(GridIoManager.java:57)
> [20:08:23]        at o.a.i.i.managers.communication.GridIoManager$5.run(GridIoManager.java:627)
> [20:08:23]        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> [20:08:23]        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> [20:08:23]        at java.lang.Thread.run(Thread.java:745)
>
> This thread waits for the new topology version to become ready, but that
> version will not be ready until the update is completed, so the thread
> cannot make progress. I analyzed all usages of the primaryNode(UUID)
> method, and in every case the correct topology version is already
> available in the context of the call. I added a topology version argument
> to primaryNode(...) and propagated the correct version there. Can you
> review my changes in ignite-589?
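A minimal sketch of the difference described above, under stated assumptions: the class AffinitySketch and its methods are hypothetical and are not Ignite APIs, and affinity is reduced to a single primary node ID per topology version. Waiting on a version that is not ready yet can block indefinitely, while resolving the primary for a version the caller already holds returns at once.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical, simplified model (not Ignite code): one primary node per
// topology version, plus futures that complete when a version becomes ready.
final class AffinitySketch {
    private final Map<Long, CompletableFuture<String>> assignments = new ConcurrentHashMap<>();

    private CompletableFuture<String> future(long topVer) {
        return assignments.computeIfAbsent(topVer, v -> new CompletableFuture<>());
    }

    // Marks the assignment for topVer as calculated.
    void onAssignmentReady(long topVer, String primary) {
        future(topVer).complete(primary);
    }

    // Pattern from the thread dump: wait for a version that may not be ready.
    // If that version needs the calling thread to finish first, the wait never
    // ends; a timeout is added here only so the sketch cannot hang.
    String primaryNodeAwaiting(long topVer, long timeoutMs) {
        try {
            return future(topVer).get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    // Pattern from the fix: the caller passes a version it already knows to
    // be ready (e.g. the one its operation was mapped on), so nothing blocks.
    String primaryNode(long topVer) {
        CompletableFuture<String> f = future(topVer);
        if (!f.isDone())
            throw new IllegalStateException("Assignment not ready for version " + topVer);
        return f.join();
    }

    public static void main(String[] args) {
        AffinitySketch aff = new AffinitySketch();
        aff.onAssignmentReady(1, "A");                        // version 1 is ready; version 2 is not
        System.out.println(aff.primaryNode(1));               // prints "A" immediately
        System.out.println(aff.primaryNodeAwaiting(2, 50));   // prints "null" after about 50 ms
    }
}
```

The explicit-version method fails fast instead of waiting, which mirrors the observation that every caller of primaryNode(...) already has a usable topology version in hand.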
>
> 2015-03-24 23:02 GMT-07:00 Semyon Boikov <sboikov@gridgain.com>:
>
> > Yes, this is possible; I will implement it today.
> >
> > On Tue, Mar 24, 2015 at 6:38 PM, Dmitriy Setrakyan <dsetrakyan@apache.org>
> > wrote:
> >
> > > I think we can do better than flushing the near cache on every topology
> > > version change.
> > >
> > > Let's say that the topology version in the near cache entry is 1 and
> > > the actual topology version is 4. Then we could check whether the key's
> > > assignment changed between versions 1 and 4. For example, if the
> > > primary node for the key did not change in versions 2, 3, and 4, then
> > > there is no point in flushing the near cache entry.
> > >
> > > Would this be possible to implement?
> > >
> > > D.
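A minimal sketch of the proposed check, assuming a per-key history of primary node IDs indexed by topology version is available; the class AffinityHistoryCheck and its methods are hypothetical, not Ignite APIs. The entry is kept only if the primary is the same in every version from the entry's version up to the current one.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not Ignite code) of the proposal: keep the primary node
// of a key for each topology version and keep the near entry only if the
// primary was the same in every version from the entry's version to the current one.
final class AffinityHistoryCheck {
    // Topology version -> primary node ID, simplified to a single key.
    private final Map<Long, String> primaryByVersion = new HashMap<>();

    void record(long topVer, String primary) {
        primaryByVersion.put(topVer, primary);
    }

    boolean primaryUnchanged(long entryTopVer, long currentTopVer) {
        String initial = primaryByVersion.get(entryTopVer);
        if (initial == null)
            return false; // No history for the entry's version: flush to be safe.

        // Every intermediate version must be checked, not only the current one,
        // otherwise a primary change A -> C -> A would go unnoticed.
        for (long v = entryTopVer + 1; v <= currentTopVer; v++) {
            if (!initial.equals(primaryByVersion.get(v)))
                return false;
        }
        return true;
    }

    public static void main(String[] args) {
        AffinityHistoryCheck stable = new AffinityHistoryCheck();
        for (long v = 1; v <= 4; v++)
            stable.record(v, "A");
        System.out.println(stable.primaryUnchanged(1, 4)); // prints "true": keep the entry

        AffinityHistoryCheck flapping = new AffinityHistoryCheck();
        flapping.record(1, "A");
        flapping.record(2, "C");
        flapping.record(3, "A");
        System.out.println(flapping.primaryUnchanged(1, 3)); // prints "false": flush
    }
}
```

Because every intermediate version is inspected, the case where the primary goes from A to C and back to A is still detected, while topology changes that leave the key's primary alone no longer force a flush.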
> > >
> > > On Tue, Mar 24, 2015 at 8:11 AM, Semyon Boikov <sboikov@gridgain.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > Today I investigated failures in the failover suite and found an
> > > > issue with near cache updates. Currently, when a near cache entry is
> > > > initialized, we store the primary node ID, and when a value is
> > > > requested from the near cache entry, we check that the stored node is
> > > > still primary (NearCacheEntry.valid()).
> > > > The following scenario is possible (it reproduces in our test):
> > > > - there are two nodes: A is primary, B is near
> > > > - a near cache entry is initialized on B, and A is stored in the near
> > > >   cache entry as primary
> > > > - a new node C joins the grid and becomes the new primary
> > > > - the value is updated from C; C is not aware of near reader B, so the
> > > >   value in the near cache on B is not updated
> > > > - node C leaves the grid, and A becomes primary again
> > > > - the value is requested from the near cache entry on B; B sees that
> > > >   the stored node A is still primary and returns the outdated value
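The scenario above can be replayed in a minimal sketch, assuming a single key and a near entry that only remembers the primary node ID; NodeIdCheckScenario and NearEntry are hypothetical names, not Ignite classes. It shows why a node-ID comparison accepts a stale value once the primary has moved away and back.

```java
// Hypothetical model (not Ignite code) of the reported scenario for one key.
final class NodeIdCheckScenario {

    // Near entry on node B that remembers which node was primary when it was loaded.
    static final class NearEntry {
        final String value;
        final String primaryAtInit;

        NearEntry(String value, String primaryAtInit) {
            this.value = value;
            this.primaryAtInit = primaryAtInit;
        }

        // Original check: the entry is valid while the remembered node is still primary.
        boolean valid(String currentPrimary) {
            return primaryAtInit.equals(currentPrimary);
        }
    }

    // Replays the steps from the message and returns the value B reads locally.
    static String readOnB() {
        String primary = "A";                                     // A is primary, B is a near node
        String latestValue = "v1";
        NearEntry nearOnB = new NearEntry(latestValue, primary);  // B caches v1 and remembers A

        primary = "C";                                            // C joins and becomes primary
        latestValue = "v2";                                       // update via C; C does not know about
                                                                  // reader B, so nearOnB is not touched
        primary = "A";                                            // C leaves, A is primary again

        // B's validity check only compares node IDs, so it trusts its cached copy.
        return nearOnB.valid(primary) ? nearOnB.value : latestValue;
    }

    public static void main(String[] args) {
        System.out.println(readOnB()); // prints "v1", although the latest value is "v2"
    }
}
```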
> > > >
> > > > As a simple fix, I changed GridNearCacheEntry to store the current
> > > > topology version at the moment the entry is initialized from the
> > > > primary, and NearCacheEntry.valid() now checks that the topology
> > > > version has not changed. Assuming the topology does not change often,
> > > > this fix should not impact near cache performance.
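A minimal sketch of the fix described above, assuming topology versions are plain increasing longs; TopVerNearEntry is a hypothetical stand-in, not the actual GridNearCacheEntry code. Any topology change after initialization invalidates the entry, so the join-and-leave sequence from the scenario forces a re-read even though A is primary again.

```java
// Hypothetical sketch (not the actual GridNearCacheEntry code) of the fix:
// remember the topology version at initialization time and treat any change
// of that version as invalidating the near entry.
final class TopVerNearEntry {
    private final String value;
    private final long topVerAtInit;

    TopVerNearEntry(String value, long topVerAtInit) {
        this.value = value;
        this.topVerAtInit = topVerAtInit;
    }

    // Valid only if no topology change happened since the entry was loaded.
    boolean valid(long currentTopVer) {
        return topVerAtInit == currentTopVer;
    }

    String value() {
        return value;
    }

    public static void main(String[] args) {
        long topVer = 1;                                      // A is primary, B is a near node
        TopVerNearEntry onB = new TopVerNearEntry("v1", topVer);

        topVer++;                                             // C joins: version 2
        topVer++;                                             // C leaves: version 3

        // A is primary again, but the version moved on, so B must re-read from A.
        String read = onB.valid(topVer) ? onB.value() : "<re-read from primary>";
        System.out.println(read); // prints "<re-read from primary>"
    }
}
```

The trade-off is the one noted in the message: every topology change, even one that does not move the key, discards the near entry.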
> > > >
> > > > The only case when the topology can change often is the use of
> > > > client nodes. Once support for client nodes is fully implemented, we
> > > > will need some way to check that the cache affinity topology has not
> > > > changed.
> > > >
> > > > Thoughts?
> > > >
> > >
> >
>
