ignite-issues mailing list archives

From "Alexey Zinoviev (Jira)" <j...@apache.org>
Subject [jira] [Commented] (IGNITE-9359) OptimizeMakeChangeGAExample hangs forever with additional nodes in topology
Date Tue, 19 Nov 2019 10:51:00 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-9359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16977368#comment-16977368 ]

Alexey Zinoviev commented on IGNITE-9359:
-----------------------------------------

[~netmille] Do you have any ideas on how to fix this?

> OptimizeMakeChangeGAExample hangs forever with additional nodes in topology
> ---------------------------------------------------------------------------
>
>                 Key: IGNITE-9359
>                 URL: https://issues.apache.org/jira/browse/IGNITE-9359
>             Project: Ignite
>          Issue Type: Bug
>          Components: ml
>    Affects Versions: 2.6
>            Reporter: Alex Volkov
>            Assignee: Turik Campbell
>            Priority: Major
>
> To reproduce this issue, follow these steps:
> 1. Run two nodes using the ignite.sh script.
> For example:
> {code:java}
> bin/ignite.sh examples/config/example-ignite.xml -J-Xmx1g -J-Xms1g -J-DCONSISTENT_ID=node1 -J-DIGNITE_QUIET=false
> {code}
> 2. Run HelloWorldGAExample from the IDEA IDE.
> *Expected result:*
> The example runs and completes successfully.
> *Actual result:*
> The example log contains many NullPointerExceptions:
> {code:java}
> [2018-08-23 17:38:59,246][ERROR][pub-#20][GridJobWorker] Failed to execute job due to unexpected runtime exception [jobId=2a309376561-70889d5c-33f2-4c96-bf1e-f280c0ac4a1c, ses=GridJobSessionImpl [ses=GridTaskSessionImpl [taskName=o.a.i.ml.genetic.FitnessTask, dep=GridDeployment [ts=1535035116486, depMode=SHARED, clsLdr=sun.misc.Launcher$AppClassLoader@18b4aac2, clsLdrId=4baf8376561-70889d5c-33f2-4c96-bf1e-f280c0ac4a1c, userVer=0, loc=true, sampleClsName=o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionFullMap, pendingUndeploy=false, undeployed=false, usage=2], taskClsName=o.a.i.ml.genetic.FitnessTask, sesId=b4209376561-70889d5c-33f2-4c96-bf1e-f280c0ac4a1c, startTime=1535035123014, endTime=9223372036854775807, taskNodeId=70889d5c-33f2-4c96-bf1e-f280c0ac4a1c, clsLdr=sun.misc.Launcher$AppClassLoader@18b4aac2, closed=false, cpSpi=null, failSpi=null, loadSpi=null, usage=1, fullSup=false, internal=false, topPred=o.a.i.i.cluster.ClusterGroupAdapter$AttributeFilter@5668ad01, subjId=70889d5c-33f2-4c96-bf1e-f280c0ac4a1c, mapFut=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null, hash=574227802]IgniteFuture [orig=], execName=null], jobId=2a309376561-70889d5c-33f2-4c96-bf1e-f280c0ac4a1c], err=null]
> java.lang.NullPointerException
> at org.apache.ignite.ml.genetic.FitnessJob.execute(FitnessJob.java:76)
> at org.apache.ignite.ml.genetic.FitnessJob.execute(FitnessJob.java:35)
> at org.apache.ignite.internal.processors.job.GridJobWorker$2.call(GridJobWorker.java:568)
> at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6749)
> at org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:562)
> at org.apache.ignite.internal.processors.job.GridJobWorker.body(GridJobWorker.java:491)
> at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> and the example then hangs at this error:
> {code:java}
> [2018-08-23 17:38:59,582][WARN ][sys-#54][AlwaysFailoverSpi] Received topology with only nodes that job had failed on (forced to fail) [failedNodes=[3db84480-08b8-4d54-9d3a-e23b53761f29, 70889d5c-33f2-4c96-bf1e-f280c0ac4a1c, 4f815cff-f77c-4a41-9ae1-ebb00b1dd44c]]
> class org.apache.ignite.cluster.ClusterTopologyException: Failed to failover a job to another node (failover SPI returned null) [job=org.apache.ignite.ml.genetic.FitnessJob@1045c79e, node=TcpDiscoveryNode [id=4f815cff-f77c-4a41-9ae1-ebb00b1dd44c, addrs=ArrayList [0:0:0:0:0:0:0:1, 127.0.0.1, 172.25.4.42, 172.25.4.92], sockAddrs=HashSet [/172.25.4.92:47501, /172.25.4.42:47501, /0:0:0:0:0:0:0:1:47501, /127.0.0.1:47501], discPort=47501, order=2, intOrder=2, lastExchangeTime=1535035115978, loc=false, ver=2.7.0#19700101-sha1:00000000, isClient=false]]
> at org.apache.ignite.internal.util.IgniteUtils$7.apply(IgniteUtils.java:853)
> at org.apache.ignite.internal.util.IgniteUtils$7.apply(IgniteUtils.java:851)
> at org.apache.ignite.internal.util.IgniteUtils.convertException(IgniteUtils.java:985)
> at org.apache.ignite.internal.IgniteComputeImpl.execute(IgniteComputeImpl.java:541)
> at org.apache.ignite.ml.genetic.GAGrid.calculateFitness(GAGrid.java:102)
> at org.apache.ignite.ml.genetic.GAGrid.evolve(GAGrid.java:171)
> at org.apache.ignite.examples.ml.genetic.change.OptimizeMakeChangeGAExample.main(OptimizeMakeChangeGAExample.java:148)
> Caused by: class org.apache.ignite.internal.cluster.ClusterTopologyCheckedException: Failed to failover a job to another node (failover SPI returned null) [job=org.apache.ignite.ml.genetic.FitnessJob@1045c79e, node=TcpDiscoveryNode [id=4f815cff-f77c-4a41-9ae1-ebb00b1dd44c, addrs=ArrayList [0:0:0:0:0:0:0:1, 127.0.0.1, 172.25.4.42, 172.25.4.92], sockAddrs=HashSet [/172.25.4.92:47501, /172.25.4.42:47501, /0:0:0:0:0:0:0:1:47501, /127.0.0.1:47501], discPort=47501, order=2, intOrder=2, lastExchangeTime=1535035115978, loc=false, ver=2.7.0#19700101-sha1:00000000, isClient=false]]
> at org.apache.ignite.internal.processors.task.GridTaskWorker.checkTargetNode(GridTaskWorker.java:1235)
> at org.apache.ignite.internal.processors.task.GridTaskWorker.failover(GridTaskWorker.java:1203)
> at org.apache.ignite.internal.processors.task.GridTaskWorker.onResponse(GridTaskWorker.java:938)
> at org.apache.ignite.internal.processors.task.GridTaskProcessor.processJobExecuteResponse(GridTaskProcessor.java:1077)
> at org.apache.ignite.internal.processors.task.GridTaskProcessor$JobMessageListener.onMessage(GridTaskProcessor.java:1312)
> at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
> at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
> at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:125)
> at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1091)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: class org.apache.ignite.IgniteException: Failed to deserialize object [typeName=org.apache.ignite.ml.genetic.FitnessJob]
> at org.apache.ignite.internal.processors.job.GridJobWorker.initialize(GridJobWorker.java:459)
> at org.apache.ignite.internal.processors.job.GridJobProcessor.processJobExecuteRequest(GridJobProcessor.java:1119)
> at org.apache.ignite.internal.processors.job.GridJobProcessor$JobExecutionListener.onMessage(GridJobProcessor.java:1923)
> ... 7 more
> Caused by: class org.apache.ignite.IgniteCheckedException: Failed to deserialize object [typeName=org.apache.ignite.ml.genetic.FitnessJob]
> at org.apache.ignite.internal.util.IgniteUtils.unmarshal(IgniteUtils.java:10025)
> at org.apache.ignite.internal.processors.job.GridJobWorker.initialize(GridJobWorker.java:440)
> ... 9 more
> Caused by: class org.apache.ignite.binary.BinaryObjectException: Failed to deserialize object [typeName=org.apache.ignite.ml.genetic.FitnessJob]
> at org.apache.ignite.internal.binary.BinaryClassDescriptor.read(BinaryClassDescriptor.java:914)
> at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1764)
> at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1716)
> at org.apache.ignite.internal.binary.GridBinaryMarshaller.deserialize(GridBinaryMarshaller.java:313)
> at org.apache.ignite.internal.binary.BinaryMarshaller.unmarshal0(BinaryMarshaller.java:102)
> at org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.unmarshal(AbstractNodeNameAwareMarshaller.java:82)
> at org.apache.ignite.internal.util.IgniteUtils.unmarshal(IgniteUtils.java:10019)
> ... 10 more
> Caused by: class org.apache.ignite.binary.BinaryObjectException: Failed to read field [name=fitnessFuncton]
> at org.apache.ignite.internal.binary.BinaryFieldAccessor.read(BinaryFieldAccessor.java:191)
> at org.apache.ignite.internal.binary.BinaryClassDescriptor.read(BinaryClassDescriptor.java:875)
> ... 16 more
> Caused by: class org.apache.ignite.binary.BinaryInvalidTypeException: org.apache.ignite.examples.ml.genetic.change.OptimizeMakeChangeFitnessFunction
> at org.apache.ignite.internal.binary.BinaryContext.descriptorForTypeId(BinaryContext.java:707)
> at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1757)
> at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1716)
> at org.apache.ignite.internal.binary.BinaryReaderExImpl.readField(BinaryReaderExImpl.java:1984)
> at org.apache.ignite.internal.binary.BinaryFieldAccessor$DefaultFinalClassAccessor.read0(BinaryFieldAccessor.java:702)
> at org.apache.ignite.internal.binary.BinaryFieldAccessor.read(BinaryFieldAccessor.java:187)
> ... 17 more
> Caused by: java.lang.ClassNotFoundException: org.apache.ignite.examples.ml.genetic.change.OptimizeMakeChangeFitnessFunction
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:348)
> at org.apache.ignite.internal.util.IgniteUtils.forName(IgniteUtils.java:8665)
> at org.apache.ignite.internal.MarshallerContextImpl.getClass(MarshallerContextImpl.java:349)
> at org.apache.ignite.internal.binary.BinaryContext.descriptorForTypeId(BinaryContext.java:698)
> ... 22 more
> {code}
> Please let me know if you need full logs.
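
The innermost cause in the trace above is a java.lang.ClassNotFoundException for org.apache.ignite.examples.ml.genetic.change.OptimizeMakeChangeFitnessFunction, raised from Class.forName while a node deserializes FitnessJob. As a minimal sketch, using only the JDK, the check below reports whether a given class loader can resolve a class by name. The class ClassAvailabilityCheck and the method isClassAvailable are hypothetical helpers introduced here for illustration; they are not part of Ignite, and the sketch only helps confirm the symptom, it does not fix the issue.

```java
class ClassAvailabilityCheck {
    /**
     * Returns true if the given loader can resolve the named class.
     * The class is looked up without being initialized.
     */
    static boolean isClassAvailable(String clsName, ClassLoader ldr) {
        try {
            Class.forName(clsName, false, ldr);

            return true;
        }
        catch (ClassNotFoundException | LinkageError e) {
            // The class (or one of its dependencies) is not visible to this loader.
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the class named in the ClassNotFoundException from the issue log.
        String clsName = args.length > 0
            ? args[0]
            : "org.apache.ignite.examples.ml.genetic.change.OptimizeMakeChangeFitnessFunction";

        ClassLoader ldr = ClassLoader.getSystemClassLoader();

        System.out.println(clsName + " available: " + isClassAvailable(clsName, ldr));
    }
}
```

Run with the same classpath that a node started via ignite.sh uses, this would presumably report the fitness function class as unavailable, which is consistent with the trace; that expectation is an assumption drawn from the log, not something verified here.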



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
