ignite-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrey Mashenkov <andrey.mashen...@gmail.com>
Subject Re: Ignite cache transaction timeout
Date Thu, 12 Oct 2017 13:08:42 GMT
Hi,

Transaction is timed out as transaction initiator node doen't receive
response from other node in time by some reason.
It may be due to an entry is locked by smbd or network issues or GC issues
on other node or some bug.

There is no starvation is stripe pool actually, as thread queue is empty,
but the long running operation that can be an issue
as it can hold a locks and make other messages delayed.

Also, transaction can be blocked with running partition map exchange on
unstable topology.
Is it possible there are too many operations or large entries involved in
transaction?


On Thu, Oct 12, 2017 at 8:36 AM, iostream <sidds.moona@gmail.com> wrote:

> Hi,
>
> We are observing cache transaction timeout in our Ignite v2.1 cluster. The
> setup comprises 20 ignite servers and 20 clients. Error log -
>
> spring_264202322.log:Caused by:
> org.apache.ignite.internal.transactions.IgniteTxTimeoutCheckedException:
> Cache transaction timed out: GridNearTxLocal [mappings=IgniteTxMappingsImpl
> [], nearLocallyMapped=false, colocatedLocallyMapped=false,
> needCheckBackup=null, hasRemoteLocks=true,
> thread=DefaultMessageListenerContainer-10, mappings=IgniteTxMappingsImpl
> [],
> super=GridDhtTxLocalAdapter [nearOnOriginatingNode=false, nearNodes=[],
> dhtNodes=[], explicitLock=false, super=IgniteTxLocalAdapter
> [completedBase=null, sndTransformedVals=false, depEnabled=false,
> txState=IgniteTxStateImpl [activeCacheIds=GridIntList [idx=2,
> arr=[-1836347052,159420608]], recovery=false, txMap=[IgniteTxEntry
> [key=com.walmart.node.commons.manager.ignite.setup.model.
> key.fulfill_order_key
> [idHash=1384394993, hash=789085753, fulfill_order_id=79218428],
> cacheId=-1836347052, txKey=IgniteTxKey
> [key=com.walmart.node.commons.manager.ignite.setup.model.
> key.fulfill_order_key
> [idHash=1384394993, hash=789085753, fulfill_order_id=79218428],
> cacheId=-1836347052], val=[op=TRANSFORM, val=null], prevVal=[op=TRANSFORM,
> val=null], oldVal=[op=NOOP, val=null], entryProcessorsCol=[IgniteBiTuple
> [val1=org.apache.ignite.internal.processors.query.h2.
> DmlStatementsProcessor$ModifyingEntryProcessor@75e749b8,
> val2=[Ljava.lang.Object;@4ee393d4]], ttl=-1, conflictExpireTime=-1,
> conflictVer=null, explicitVer=null, dhtVer=null, filters=[],
> filtersPassed=false, filtersSet=true, entry=GridDhtDetachedCacheEntry
> [super=GridDistributedCacheEntry [super=GridCacheMapEntry
> [key=com.walmart.node.commons.manager.ignite.setup.model.
> key.fulfill_order_key
> [idHash=1384394993, hash=789085753, fulfill_order_id=79218428],
> val=com.walmart.node.commons.manager.ignite.setup.model.fulfill_order
> [idHash=1411260389, hash=1683738400, create_userid=HOORDPRO,
> dspns_type_cd=1], startVer=1507208251042, ver=GridCacheVersion
> [topVer=117883976, order=1507204044952, nodeOrder=112], hash=789085753,
> extras=null, flags=0]]], prepared=0, locked=true,
> nodeId=d02e4a23-abe5-470d-a414-bfb9816ff494, locMapped=false,
> expiryPlc=null, transferExpiryPlc=false, flags=2, partUpdateCntr=0,
> serReadVer=null, xidVer=GridCacheVersion [topVer=117883976,
> order=1507208250156, nodeOrder=401]], IgniteTxEntry
> [key=com.walmart.node.commons.manager.ignite.setup.model.
> key.fulfill_order_line_key
> [idHash=669957679, hash=1510291529, fulfill_order_line_nbr=1,
> fulfill_order_id=79218428], cacheId=159420608, txKey=IgniteTxKey
> [key=com.walmart.node.commons.manager.ignite.setup.model.
> key.fulfill_order_line_key
> [idHash=669957679, hash=1510291529, fulfill_order_line_nbr=1,
> fulfill_order_id=79218428], cacheId=159420608], val=[op=TRANSFORM,
> val=null], prevVal=[op=TRANSFORM, val=null], oldVal=[op=NOOP, val=null],
> entryProcessorsCol=[IgniteBiTuple
> [val1=org.apache.ignite.internal.processors.query.h2.
> DmlStatementsProcessor$ModifyingEntryProcessor@688b99e3,
> val2=[Ljava.lang.Object;@3a24ca15]], ttl=-1, conflictExpireTime=-1,
> conflictVer=null, explicitVer=null, dhtVer=null, filters=[],
> filtersPassed=false, filtersSet=true, entry=GridDhtDetachedCacheEntry
> [super=GridDistributedCacheEntry [super=GridCacheMapEntry
> [key=com.walmart.node.commons.manager.ignite.setup.model.
> key.fulfill_order_line_key
> [idHash=669957679, hash=1510291529, fulfill_order_line_nbr=1,
> fulfill_order_id=79218428],
> val=com.walmart.node.commons.manager.ignite.setup.model.fulfill_order_line
> [idHash=518069123, hash=-1823543951, create_userid=HOORDPRO,
> src_item_secondary_desc=null, s], startVer=1507208251774,
> ver=GridCacheVersion [topVer=117883976, order=1507204044404,
> nodeOrder=112],
> hash=1510291529, extras=null, flags=0]]], prepared=0, locked=true,
> nodeId=d02e4a23-abe5-470d-a414-bfb9816ff494, locMapped=false,
> expiryPlc=null, transferExpiryPlc=false, flags=2, partUpdateCntr=0,
> serReadVer=null, xidVer=GridCacheVersion [topVer=117883976,
> order=1507208250156, nodeOrder=401]]]], super=IgniteTxAdapter
> [xidVer=GridCacheVersion [topVer=117883976, order=1507208250156,
> nodeOrder=401], writeVer=null, implicit=false, loc=true, threadId=167,
> startTime=1507741430563, nodeId=a951003b-310c-45db-88d9-622d30a7558f,
> startVer=GridCacheVersion [topVer=117883976, order=1507208250156,
> nodeOrder=401], endVer=null, isolation=REPEATABLE_READ,
> concurrency=PESSIMISTIC, timeout=120000, sysInvalidate=false, sys=false,
> plc=2, commitVer=null, finalizing=NONE, invalidParts=null,
> state=MARKED_ROLLBACK, timedOut=true, topVer=AffinityTopologyVersion
> [topVer=406, minorTopVer=0], duration=143967ms, onePhaseCommit=false],
> size=2]]]
>
> Looking at "hasRemoteLocks=true" in the error log, I want to understand if
> the transaction timed out because some other thread has already occupied a
> lock on this key? Was this the reason for transaction timing out?
>
> I also saw the following warning in ignite server logs -
>
> [13:11:55,559][WARNING][grid-timeout-worker-#31%null%][G] >>> Possible
> starvation in striped pool.
>     Thread name: sys-stripe-5-#6%null%
>     Queue: []
>     Deadlock: false
>     Completed: 254308
> Thread [name="sys-stripe-5-#6%null%", id=20, state=RUNNABLE, blockCnt=97,
> waitCnt=254278]
>         at
> o.a.i.i.processors.cache.persistence.tree.io.CacheVersionIO.read(
> CacheVersionIO.java:235)
>         at
> o.a.i.i.processors.cache.persistence.CacheDataRowAdapter.readFullRow(
> CacheDataRowAdapter.java:317)
>         at
> o.a.i.i.processors.cache.persistence.CacheDataRowAdapter.initFromLink(
> CacheDataRowAdapter.java:158)
>         at
> o.a.i.i.processors.cache.persistence.CacheDataRowAdapter.initFromLink(
> CacheDataRowAdapter.java:101)
>         at
> o.a.i.i.processors.query.h2.database.H2RowFactory.getRow(
> H2RowFactory.java:62)
>         at
> o.a.i.i.processors.query.h2.database.io.H2ExtrasLeafIO.
> getLookupRow(H2ExtrasLeafIO.java:124)
>         at
> o.a.i.i.processors.query.h2.database.io.H2ExtrasLeafIO.
> getLookupRow(H2ExtrasLeafIO.java:36)
>         at
> o.a.i.i.processors.query.h2.database.H2Tree.getRow(H2Tree.java:123)
>         at
> o.a.i.i.processors.query.h2.database.H2Tree.getRow(H2Tree.java:40)
>         at
> o.a.i.i.processors.cache.persistence.tree.BPlusTree.
> getRow(BPlusTree.java:4372)
>         at
> o.a.i.i.processors.query.h2.database.H2Tree.compare(H2Tree.java:199)
>         at
> o.a.i.i.processors.query.h2.database.H2Tree.compare(H2Tree.java:40)
>         at
> o.a.i.i.processors.cache.persistence.tree.BPlusTree.
> compare(BPlusTree.java:4359)
>         at
> o.a.i.i.processors.cache.persistence.tree.BPlusTree.
> findInsertionPoint(BPlusTree.java:4279)
>         at
> o.a.i.i.processors.cache.persistence.tree.BPlusTree.
> access$1500(BPlusTree.java:81)
>         at
> o.a.i.i.processors.cache.persistence.tree.BPlusTree$
> Replace.run0(BPlusTree.java:342)
>         at
> o.a.i.i.processors.cache.persistence.tree.BPlusTree$
> Replace.run0(BPlusTree.java:330)
>         at
> o.a.i.i.processors.cache.persistence.tree.BPlusTree$
> GetPageHandler.run(BPlusTree.java:4697)
>         at
> o.a.i.i.processors.cache.persistence.tree.BPlusTree$
> GetPageHandler.run(BPlusTree.java:4682)
>         at
> o.a.i.i.processors.cache.persistence.tree.util.PageHandler.writePage(
> PageHandler.java:342)
>         at
> o.a.i.i.processors.cache.persistence.DataStructure.
> write(DataStructure.java:261)
>         at
> o.a.i.i.processors.cache.persistence.tree.BPlusTree.
> access$11200(BPlusTree.java:81)
>         at
> o.a.i.i.processors.cache.persistence.tree.BPlusTree$
> Put.tryReplace(BPlusTree.java:2875)
>         at
> o.a.i.i.processors.cache.persistence.tree.BPlusTree.
> putDown(BPlusTree.java:2279)
>         at
> o.a.i.i.processors.cache.persistence.tree.BPlusTree.
> putDown(BPlusTree.java:2266)
>         at
> o.a.i.i.processors.cache.persistence.tree.BPlusTree.
> putDown(BPlusTree.java:2266)
>         at
> o.a.i.i.processors.cache.persistence.tree.BPlusTree.
> putDown(BPlusTree.java:2266)
>         at
> o.a.i.i.processors.cache.persistence.tree.BPlusTree.
> doPut(BPlusTree.java:2006)
>         at
> o.a.i.i.processors.cache.persistence.tree.BPlusTree.
> put(BPlusTree.java:1977)
>         at
> o.a.i.i.processors.query.h2.database.H2TreeIndex.put(H2TreeIndex.java:207)
>         at
> o.a.i.i.processors.query.h2.opt.GridH2Table.addToIndex(
> GridH2Table.java:729)
>
> Is there any correlation between the starving striped pool size and cache
> transaction taking too long?
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>



-- 
Best regards,
Andrey V. Mashenkov

Mime
View raw message