ignite-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Evgenii Zhuravlev <e.zhuravlev...@gmail.com>
Subject Re: Grid freezing
Date Mon, 23 Oct 2017 13:05:53 GMT
Hi!

How many nodes do you have? How many CPUs? Could you provide a code of
com.company.node.ignite.IgnitePortionDequeuer
or at least a dequeuePortions method?

Evgenii

2017-10-20 0:24 GMT+03:00 smurphy <smurphy@trustwave.com>:

> threaddump.tdump
> <http://apache-ignite-users.70518.x6.nabble.com/file/
> t1317/threaddump.tdump>
>
> I am using Ignite v2.1 and my code using Optimistic/Serializable
> transactions and is locking up. When it does, there are a lot of `WARNING:
> Found long running transaction` and `WARNING: Found long running cache
> future` messages in the logs (see below). Eventually the grid freezes and
> there are a lot of "IgniteInterruptedException: Got interrupted while
> waiting for future to complete." (see below). Also, see attached thread
> dump
> when the grid is finally stopped.
>
> Using visor, I could see the active threads increase until all are
> consumed.
> I boosted the thread pool count to 100 and saw the active threads rise
> steadily until all were consumed. At theat point, the grid freezes up.
>
> Using visor ping multiple times, the servers always ping successfully but
> the client fails intermittently.
>
> LONG RUNNING TRANSACTION:
>
> 2017-10-19 13:03:06,698 WARN  [ ] grid-timeout-worker-#15%null%
> {Log4JLogger.java:480} - Found long running transaction
> [startTime=12:30:33.985, curTime=13:03:06.682, tx=GridNearTxLocal
> [mappings=IgniteTxMappingsSingleImpl [mapping=GridDistributedTxMapping
> [entries=[IgniteTxEntry [key=KeyCacheObjectImpl [part=126,
> val=GridCacheInternalKeyImpl [name=SCAN_IDX, grpName=default-ds-group],
> hasValBytes=true], cacheId=1481046058, txKey=IgniteTxKey
> [key=KeyCacheObjectImpl [part=126, val=GridCacheInternalKeyImpl
> [name=SCAN_IDX, grpName=default-ds-group], hasValBytes=true],
> cacheId=1481046058], val=[op=TRANSFORM, val=null], prevVal=[op=NOOP,
> val=null], oldVal=[op=NOOP, val=null], entryProcessorsCol=[IgniteBiTuple
> [val1=GetAndIncrementProcessor [], val2=[Ljava.lang.Object;@49d9dc6c]],
> ttl=-1, conflictExpireTime=-1, conflictVer=null, explicitVer=null,
> dhtVer=null, filters=[], filtersPassed=false, filtersSet=true,
> entry=GridDhtDetachedCacheEntry [super=GridDistributedCacheEntry
> [super=GridCacheMapEntry [key=KeyCacheObjectImpl [part=126,
> val=GridCacheInternalKeyImpl [name=SCAN_IDX, grpName=default-ds-group],
> hasValBytes=true], val=null, startVer=1508434155232, ver=GridCacheVersion
> [topVer=119473256, order=1508434155232, nodeOrder=2243], hash=-375255214,
> extras=null, flags=0]]], prepared=0, locked=false,
> nodeId=36b4d422-d011-4f77-919a-b8ffb089614b, locMapped=false,
> expiryPlc=null, transferExpiryPlc=false, flags=0, partUpdateCntr=0,
> serReadVer=null, xidVer=GridCacheVersion [topVer=119473256,
> order=1508434155230, nodeOrder=2243]]], explicitLock=false, dhtVer=null,
> last=false, nearEntries=0, clientFirst=false,
> node=36b4d422-d011-4f77-919a-b8ffb089614b]], nearLocallyMapped=false,
> colocatedLocallyMapped=false, needCheckBackup=null, hasRemoteLocks=false,
> thread=<failed to find active thread 109>,
> mappings=IgniteTxMappingsSingleImpl [mapping=GridDistributedTxMapping
> [entries=[IgniteTxEntry [key=KeyCacheObjectImpl [part=126,
> val=GridCacheInternalKeyImpl [name=SCAN_IDX, grpName=default-ds-group],
> hasValBytes=true], cacheId=1481046058, txKey=IgniteTxKey
> [key=KeyCacheObjectImpl [part=126, val=GridCacheInternalKeyImpl
> [name=SCAN_IDX, grpName=default-ds-group], hasValBytes=true],
> cacheId=1481046058], val=[op=TRANSFORM, val=null], prevVal=[op=NOOP,
> val=null], oldVal=[op=NOOP, val=null], entryProcessorsCol=[IgniteBiTuple
> [val1=GetAndIncrementProcessor [], val2=[Ljava.lang.Object;@49d9dc6c]],
> ttl=-1, conflictExpireTime=-1, conflictVer=null, explicitVer=null,
> dhtVer=null, filters=[], filtersPassed=false, filtersSet=true,
> entry=GridDhtDetachedCacheEntry [super=GridDistributedCacheEntry
> [super=GridCacheMapEntry [key=KeyCacheObjectImpl [part=126,
> val=GridCacheInternalKeyImpl [name=SCAN_IDX, grpName=default-ds-group],
> hasValBytes=true], val=null, startVer=1508434155232, ver=GridCacheVersion
> [topVer=119473256, order=1508434155232, nodeOrder=2243], hash=-375255214,
> extras=null, flags=0]]], prepared=0, locked=false,
> nodeId=36b4d422-d011-4f77-919a-b8ffb089614b, locMapped=false,
> expiryPlc=null, transferExpiryPlc=false, flags=0, partUpdateCntr=0,
> serReadVer=null, xidVer=GridCacheVersion [topVer=119473256,
> order=1508434155230, nodeOrder=2243]]], explicitLock=false, dhtVer=null,
> last=false, nearEntries=0, clientFirst=false,
> node=36b4d422-d011-4f77-919a-b8ffb089614b]], super=GridDhtTxLocalAdapter
> [nearOnOriginatingNode=false, nearNodes=[], dhtNodes=[],
> explicitLock=false,
> super=IgniteTxLocalAdapter [completedBase=null, sndTransformedVals=false,
> depEnabled=false, txState=IgniteTxImplicitSingleStateImpl [init=true,
> recovery=false], super=IgniteTxAdapter [xidVer=GridCacheVersion
> [topVer=119473256, order=1508434155230, nodeOrder=2243], writeVer=null,
> implicit=true, loc=true, threadId=109, startTime=1508434233985,
> nodeId=cb0332bd-1c18-4695-a75f-0d0ed4637b55, startVer=GridCacheVersion
> [topVer=119473256, order=1508434155230, nodeOrder=2243], endVer=null,
> isolation=READ_COMMITTED, concurrency=OPTIMISTIC, timeout=0,
> sysInvalidate=false, sys=true, plc=2, commitVer=GridCacheVersion
> [topVer=119473256, order=1508434155230, nodeOrder=2243], finalizing=NONE,
> invalidParts=null, state=PREPARING, timedOut=false,
> topVer=AffinityTopologyVersion [topVer=2243, minorTopVer=0],
> duration=1952707ms, onePhaseCommit=true], size=1]]]]
>
> LONG RUNNING CACHE FUTURE:
> 2017-10-19 13:03:06,709 WARN  [ ] grid-timeout-worker-#15%null%
> {Log4JLogger.java:480} - Found long running cache future
> [startTime=12:30:33.985, curTime=13:03:06.682, fut=GridNearTxFinishFuture
> [futId=d7d7ea53f51-2939d9ae-16bb-41f2-a756-32150fde4ecf,
> tx=GridNearTxLocal
> [mappings=IgniteTxMappingsSingleImpl [mapping=GridDistributedTxMapping
> [entries=[IgniteTxEntry [key=KeyCacheObjectImpl [part=126,
> val=GridCacheInternalKeyImpl [name=SCAN_IDX, grpName=default-ds-group],
> hasValBytes=true], cacheId=1481046058, txKey=IgniteTxKey
> [key=KeyCacheObjectImpl [part=126, val=GridCacheInternalKeyImpl
> [name=SCAN_IDX, grpName=default-ds-group], hasValBytes=true],
> cacheId=1481046058], val=[op=TRANSFORM, val=null], prevVal=[op=NOOP,
> val=null], oldVal=[op=NOOP, val=null], entryProcessorsCol=[IgniteBiTuple
> [val1=GetAndIncrementProcessor [], val2=[Ljava.lang.Object;@49d9dc6c]],
> ttl=-1, conflictExpireTime=-1, conflictVer=null, explicitVer=null,
> dhtVer=null, filters=[], filtersPassed=false, filtersSet=true,
> entry=GridDhtDetachedCacheEntry [super=GridDistributedCacheEntry
> [super=GridCacheMapEntry [key=KeyCacheObjectImpl [part=126,
> val=GridCacheInternalKeyImpl [name=SCAN_IDX, grpName=default-ds-group],
> hasValBytes=true], val=null, startVer=1508434155232, ver=GridCacheVersion
> [topVer=119473256, order=1508434155232, nodeOrder=2243], hash=-375255214,
> extras=null, flags=0]]], prepared=0, locked=false,
> nodeId=36b4d422-d011-4f77-919a-b8ffb089614b, locMapped=false,
> expiryPlc=null, transferExpiryPlc=false, flags=0, partUpdateCntr=0,
> serReadVer=null, xidVer=GridCacheVersion [topVer=119473256,
> order=1508434155230, nodeOrder=2243]]], explicitLock=false, dhtVer=null,
> last=false, nearEntries=0, clientFirst=false,
> node=36b4d422-d011-4f77-919a-b8ffb089614b]], nearLocallyMapped=false,
> colocatedLocallyMapped=false, needCheckBackup=null, hasRemoteLocks=false,
> thread=<failed to find active thread 109>,
> mappings=IgniteTxMappingsSingleImpl [mapping=GridDistributedTxMapping
> [entries=[IgniteTxEntry [key=KeyCacheObjectImpl [part=126,
> val=GridCacheInternalKeyImpl [name=SCAN_IDX, grpName=default-ds-group],
> hasValBytes=true], cacheId=1481046058, txKey=IgniteTxKey
> [key=KeyCacheObjectImpl [part=126, val=GridCacheInternalKeyImpl
> [name=SCAN_IDX, grpName=default-ds-group], hasValBytes=true],
> cacheId=1481046058], val=[op=TRANSFORM, val=null], prevVal=[op=NOOP,
> val=null], oldVal=[op=NOOP, val=null], entryProcessorsCol=[IgniteBiTuple
> [val1=GetAndIncrementProcessor [], val2=[Ljava.lang.Object;@49d9dc6c]],
> ttl=-1, conflictExpireTime=-1, conflictVer=null, explicitVer=null,
> dhtVer=null, filters=[], filtersPassed=false, filtersSet=true,
> entry=GridDhtDetachedCacheEntry [super=GridDistributedCacheEntry
> [super=GridCacheMapEntry [key=KeyCacheObjectImpl [part=126,
> val=GridCacheInternalKeyImpl [name=SCAN_IDX, grpName=default-ds-group],
> hasValBytes=true], val=null, startVer=1508434155232, ver=GridCacheVersion
> [topVer=119473256, order=1508434155232, nodeOrder=2243], hash=-375255214,
> extras=null, flags=0]]], prepared=0, locked=false,
> nodeId=36b4d422-d011-4f77-919a-b8ffb089614b, locMapped=false,
> expiryPlc=null, transferExpiryPlc=false, flags=0, partUpdateCntr=0,
> serReadVer=null, xidVer=GridCacheVersion [topVer=119473256,
> order=1508434155230, nodeOrder=2243]]], explicitLock=false, dhtVer=null,
> last=false, nearEntries=0, clientFirst=false,
> node=36b4d422-d011-4f77-919a-b8ffb089614b]], super=GridDhtTxLocalAdapter
> [nearOnOriginatingNode=false, nearNodes=[], dhtNodes=[],
> explicitLock=false,
> super=IgniteTxLocalAdapter [completedBase=null, sndTransformedVals=false,
> depEnabled=false, txState=IgniteTxImplicitSingleStateImpl [init=true,
> recovery=false], super=IgniteTxAdapter [xidVer=GridCacheVersion
> [topVer=119473256, order=1508434155230, nodeOrder=2243], writeVer=null,
> implicit=true, loc=true, threadId=109, startTime=1508434233985,
> nodeId=cb0332bd-1c18-4695-a75f-0d0ed4637b55, startVer=GridCacheVersion
> [topVer=119473256, order=1508434155230, nodeOrder=2243], endVer=null,
> isolation=READ_COMMITTED, concurrency=OPTIMISTIC, timeout=0,
> sysInvalidate=false, sys=true, plc=2, commitVer=GridCacheVersion
> [topVer=119473256, order=1508434155230, nodeOrder=2243], finalizing=NONE,
> invalidParts=null, state=PREPARING, timedOut=false,
> topVer=AffinityTopologyVersion [topVer=2243, minorTopVer=0],
> duration=1952717ms, onePhaseCommit=true], size=1]]], commit=true,
> mappings=IgniteTxMappingsSingleImpl [mapping=GridDistributedTxMapping
> [entries=[IgniteTxEntry [key=KeyCacheObjectImpl [part=126,
> val=GridCacheInternalKeyImpl [name=SCAN_IDX, grpName=default-ds-group],
> hasValBytes=true], cacheId=1481046058, txKey=IgniteTxKey
> [key=KeyCacheObjectImpl [part=126, val=GridCacheInternalKeyImpl
> [name=SCAN_IDX, grpName=default-ds-group], hasValBytes=true],
> cacheId=1481046058], val=[op=TRANSFORM, val=null], prevVal=[op=NOOP,
> val=null], oldVal=[op=NOOP, val=null], entryProcessorsCol=[IgniteBiTuple
> [val1=GetAndIncrementProcessor [], val2=[Ljava.lang.Object;@49d9dc6c]],
> ttl=-1, conflictExpireTime=-1, conflictVer=null, explicitVer=null,
> dhtVer=null, filters=[], filtersPassed=false, filtersSet=true,
> entry=GridDhtDetachedCacheEntry [super=GridDistributedCacheEntry
> [super=GridCacheMapEntry [key=KeyCacheObjectImpl [part=126,
> val=GridCacheInternalKeyImpl [name=SCAN_IDX, grpName=default-ds-group],
> hasValBytes=true], val=null, startVer=1508434155232, ver=GridCacheVersion
> [topVer=119473256, order=1508434155232, nodeOrder=2243], hash=-375255214,
> extras=null, flags=0]]], prepared=0, locked=false,
> nodeId=36b4d422-d011-4f77-919a-b8ffb089614b, locMapped=false,
> expiryPlc=null, transferExpiryPlc=false, flags=0, partUpdateCntr=0,
> serReadVer=null, xidVer=GridCacheVersion [topVer=119473256,
> order=1508434155230, nodeOrder=2243]]], explicitLock=false, dhtVer=null,
> last=false, nearEntries=0, clientFirst=false,
> node=36b4d422-d011-4f77-919a-b8ffb089614b]], trackable=true,
> finishOnePhaseCalled=false, innerFuts=[], super=GridCompoundIdentityFuture
> [super=GridCompoundFuture [rdc=AlwaysTrueReducer [], initFlag=0,
> lsnrCalls=0, done=false, cancelled=false, err=null, futs=[]]]]]
>
> IGNITECHECKEDEXCEPTION:
> Oct 19 13:29:22 prod-srvr apache-ignite[46822]: SEVERE: Failed to execute
> job [jobId=ac08ea53f51-cb0332bd-1c18-4695-a75f-0d0ed4637b55,
> ses=GridJobSessionImpl [ses=GridTaskSessionImpl
> [taskName=com.company.fragment.node.callable.DequeuePortionsCallable,
> dep=GridDeployment [ts=1508433959490, depMode=SHARED,
> clsLdr=sun.misc.Launcher$AppClassLoader@764c12b6,
> clsLdrId=fa68ba53f51-85f1f563-695c-43f9-a263-e9f67e8925b7, userVer=0,
> loc=true,
> sampleClsName=o.a.i.i.processors.task.GridTaskProcessor$
> TaskDiscoveryListener$1,
> pendingUndeploy=false, undeployed=false, usage=64],
> taskClsName=com.company.fragment.node.callable.DequeuePortionsCallable,
> sesId=9c08ea53f51-cb0332bd-1c18-4695-a75f-0d0ed4637b55,
> startTime=1508434417976, endTime=9223372036854775807,
> taskNodeId=cb0332bd-1c18-4695-a75f-0d0ed4637b55,
> clsLdr=sun.misc.Launcher$AppClassLoader@764c12b6, closed=false,
> cpSpi=null,
> failSpi=null, loadSpi=null, usage=1, fullSup=false, internal=false,
> subjId=cb0332bd-1c18-4695-a75f-0d0ed4637b55, mapFut=IgniteFuture
> [orig=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null,
> hash=2051666823]], execName=null],
> jobId=ac08ea53f51-cb0332bd-1c18-4695-a75f-0d0ed4637b55]]
> Oct 19 13:29:22 prod-srvr apache-ignite[46822]: class
> org.apache.ignite.IgniteException: class
> org.apache.ignite.IgniteInterruptedException: Got interrupted while
> waiting
> for future to complete.
> Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at
> org.apache.ignite.internal.processors.closure.GridClosureProcessor$C2.
> execute(GridClosureProcessor.java:1850)
> Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at
> org.apache.ignite.internal.processors.job.GridJobWorker$
> 2.call(GridJobWorker.java:566)
> Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at
> org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.
> java:6608)
> Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at
> org.apache.ignite.internal.processors.job.GridJobWorker.
> execute0(GridJobWorker.java:560)
> Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at
> org.apache.ignite.internal.processors.job.GridJobWorker.
> body(GridJobWorker.java:489)
> Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
> Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at
> org.apache.ignite.internal.processors.job.GridJobProcessor.
> processJobExecuteRequest(GridJobProcessor.java:1181)
> Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at
> org.apache.ignite.internal.processors.job.GridJobProcessor$
> JobExecutionListener.onMessage(GridJobProcessor.java:1908)
> Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at
> org.apache.ignite.internal.managers.communication.
> GridIoManager.invokeListener(GridIoManager.java:1556)
> Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at
> org.apache.ignite.internal.managers.communication.GridIoManager.
> processRegularMessage0(GridIoManager.java:1184)
> Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at
> org.apache.ignite.internal.managers.communication.
> GridIoManager.access$4200(GridIoManager.java:126)
> Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at
> org.apache.ignite.internal.managers.communication.GridIoManager$9.run(
> GridIoManager.java:1097)
> Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at
> java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at
> java.lang.Thread.run(Unknown Source)
> Oct 19 13:29:22 prod-srvr apache-ignite[46822]: Caused by:
> javax.cache.CacheException: class
> org.apache.ignite.IgniteInterruptedException: Got interrupted while
> waiting
> for future to complete.
> Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at
> org.apache.ignite.internal.processors.cache.GridCacheUtils.
> convertToCacheException(GridCacheUtils.java:1312)
> Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at
> org.apache.ignite.internal.processors.cache.IgniteCacheProxy.
> cacheException(IgniteCacheProxy.java:2630)
> Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at
> org.apache.ignite.internal.processors.cache.IgniteCacheProxy.get(
> IgniteCacheProxy.java:1188)
> Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at
> com.company.fragment.node.ignite.IgnitePortionDequeuer.dequeuePortionIds(
> IgnitePortionDequeuer.java:276)
> Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at
> com.company.fragment.node.ignite.IgnitePortionDequeuer._dequeuePortionIds(
> IgnitePortionDequeuer.java:196)
> Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at
> com.company.fragment.node.ignite.IgnitePortionDequeuer.dequeuePortions(
> IgnitePortionDequeuer.java:146)
> Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at
> com.company.fragment.node.callable.DequeuePortionsCallable.call(
> DequeuePortionsCallable.java:44)
> Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at
> com.company.fragment.node.callable.DequeuePortionsCallable.call(
> DequeuePortionsCallable.java:35)
> Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at
> org.apache.ignite.internal.processors.closure.GridClosureProcessor$C2.
> execute(GridClosureProcessor.java:1847)
> Oct 19 13:29:22 prod-srvr apache-ignite[46822]: ... 14 more
> Oct 19 13:29:22 prod-srvr apache-ignite[46822]: Caused by: class
> org.apache.ignite.IgniteInterruptedException: Got interrupted while
> waiting
> for future to complete.
> Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at
> org.apache.ignite.internal.util.IgniteUtils$3.apply(IgniteUtils.java:796)
> Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at
> org.apache.ignite.internal.util.IgniteUtils$3.apply(IgniteUtils.java:794)
> Oct 19 13:29:22 prod-srvr apache-ignite[46822]: ... 23 more
>
>
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>

Mime
View raw message