Date: Thu, 23 Aug 2018 07:05:00 +0000 (UTC)
From: "Alex Volkov (JIRA)"
To: issues@ignite.apache.org
Reply-To: dev@ignite.apache.org
Subject: [jira] [Created] (IGNITE-9354) HelloWorldGAExample hangs forever with additional nodes in topology

Alex Volkov created IGNITE-9354:
-----------------------------------

             Summary: HelloWorldGAExample hangs forever with additional nodes in topology
                 Key: IGNITE-9354
                 URL: https://issues.apache.org/jira/browse/IGNITE-9354
             Project: Ignite
          Issue Type: Bug
          Components: ml
    Affects Versions: 2.6
            Reporter: Alex Volkov
         Attachments: log.zip

To reproduce this issue, please follow these steps:

1. Start two nodes using the ignite.sh script. For example:
{code:java}
bin/ignite.sh examples/config/example-ignite.xml -J-Xmx1g -J-Xms1g -J-DCONSISTENT_ID=node1 -J-DIGNITE_QUIET=false
{code}
2. Run HelloWorldGAExample from the IDEA IDE.

*Expected result:*

The example runs and completes successfully.

*Actual result:*

The example log contains many NullPointerExceptions:
{code:java}
[2018-08-23 09:49:25,029][ERROR][pub-#19][GridJobWorker] Failed to execute job due to unexpected runtime exception [jobId=c296b856561-e5eca24b-6f5a-4d3e-9e9e-94ad404b44d1, ses=GridJobSessionImpl [ses=GridTaskSessionImpl [taskName=o.a.i.ml.genetic.FitnessTask, dep=GridDeployment [ts=1535006960878, depMode=SHARED, clsLdr=sun.misc.Launcher$AppClassLoader@18b4aac2, clsLdrId=8d16b856561-e5eca24b-6f5a-4d3e-9e9e-94ad404b44d1, userVer=0, loc=true, sampleClsName=o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionFullMap, pendingUndeploy=false, undeployed=false, usage=2], taskClsName=o.a.i.ml.genetic.FitnessTask, sesId=b196b856561-e5eca24b-6f5a-4d3e-9e9e-94ad404b44d1, startTime=1535006964236, endTime=9223372036854775807, taskNodeId=e5eca24b-6f5a-4d3e-9e9e-94ad404b44d1, clsLdr=sun.misc.Launcher$AppClassLoader@18b4aac2, closed=false, cpSpi=null, failSpi=null, loadSpi=null, usage=1, fullSup=false, internal=false, topPred=o.a.i.i.cluster.ClusterGroupAdapter$AttributeFilter@2d746ce4, subjId=e5eca24b-6f5a-4d3e-9e9e-94ad404b44d1, mapFut=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null, hash=679592043]IgniteFuture [orig=], execName=null], jobId=c296b856561-e5eca24b-6f5a-4d3e-9e9e-94ad404b44d1], err=null]
java.lang.NullPointerException
	at org.apache.ignite.ml.genetic.FitnessJob.execute(FitnessJob.java:76)
	at org.apache.ignite.ml.genetic.FitnessJob.execute(FitnessJob.java:35)
	at org.apache.ignite.internal.processors.job.GridJobWorker$2.call(GridJobWorker.java:568)
	at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6749)
	at org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:562)
	at org.apache.ignite.internal.processors.job.GridJobWorker.body(GridJobWorker.java:491)
	at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
{code}

and the example hangs on this one:
{code:java}
[2018-08-23 09:49:35,229][WARN ][pub-#17][AlwaysFailoverSpi] Received topology with only nodes that job had failed on (forced to fail) [failedNodes=[eac48ea7-da79-453a-a94c-291039c5cc15, 0907d876-e0ce-4fda-966d-ad91a03f9722, e5eca24b-6f5a-4d3e-9e9e-94ad404b44d1]]
class org.apache.ignite.cluster.ClusterTopologyException: Failed to failover a job to another node (failover SPI returned null) [job=org.apache.ignite.ml.genetic.FitnessJob@35f8a9d3, node=TcpDiscoveryNode [id=e5eca24b-6f5a-4d3e-9e9e-94ad404b44d1, addrs=ArrayList [0:0:0:0:0:0:0:1, 127.0.0.1, 172.25.4.42, 172.25.4.92], sockAddrs=HashSet [/172.25.4.42:47502, /172.25.4.92:47502, /0:0:0:0:0:0:0:1:47502, /127.0.0.1:47502], discPort=47502, order=3, intOrder=3, lastExchangeTime=1535006974981, loc=true, ver=2.7.0#19700101-sha1:00000000, isClient=false]]
	at org.apache.ignite.internal.util.IgniteUtils$7.apply(IgniteUtils.java:853)
	at org.apache.ignite.internal.util.IgniteUtils$7.apply(IgniteUtils.java:851)
	at org.apache.ignite.internal.util.IgniteUtils.convertException(IgniteUtils.java:985)
	at org.apache.ignite.internal.IgniteComputeImpl.execute(IgniteComputeImpl.java:541)
	at org.apache.ignite.ml.genetic.GAGrid.calculateFitness(GAGrid.java:102)
	at org.apache.ignite.ml.genetic.GAGrid.evolve(GAGrid.java:171)
	at org.apache.ignite.examples.ml.genetic.helloworld.HelloWorldGAExample.main(HelloWorldGAExample.java:90)
Caused by: class org.apache.ignite.internal.cluster.ClusterTopologyCheckedException: Failed to failover a job to another node (failover SPI returned null) [job=org.apache.ignite.ml.genetic.FitnessJob@35f8a9d3, node=TcpDiscoveryNode [id=e5eca24b-6f5a-4d3e-9e9e-94ad404b44d1, addrs=ArrayList [0:0:0:0:0:0:0:1, 127.0.0.1, 172.25.4.42, 172.25.4.92], sockAddrs=HashSet [/172.25.4.42:47502, /172.25.4.92:47502, /0:0:0:0:0:0:0:1:47502, /127.0.0.1:47502], discPort=47502, order=3, intOrder=3, lastExchangeTime=1535006974981, loc=true, ver=2.7.0#19700101-sha1:00000000, isClient=false]]
	at org.apache.ignite.internal.processors.task.GridTaskWorker.checkTargetNode(GridTaskWorker.java:1235)
	at org.apache.ignite.internal.processors.task.GridTaskWorker.failover(GridTaskWorker.java:1203)
	at org.apache.ignite.internal.processors.task.GridTaskWorker.onResponse(GridTaskWorker.java:938)
	at org.apache.ignite.internal.processors.task.GridTaskProcessor.processJobExecuteResponse(GridTaskProcessor.java:1077)
	at org.apache.ignite.internal.processors.job.GridJobWorker.finishJob(GridJobWorker.java:931)
	at org.apache.ignite.internal.processors.job.GridJobWorker.finishJob(GridJobWorker.java:779)
	at org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:631)
	at org.apache.ignite.internal.processors.job.GridJobWorker.body(GridJobWorker.java:491)
	at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: class org.apache.ignite.compute.ComputeUserUndeclaredException: Failed to execute job due to unexpected runtime exception [jobId=f296b856561-e5eca24b-6f5a-4d3e-9e9e-94ad404b44d1, ses=GridJobSessionImpl [ses=GridTaskSessionImpl [taskName=org.apache.ignite.ml.genetic.FitnessTask, dep=GridDeployment [ts=1535006960878, depMode=SHARED, clsLdr=sun.misc.Launcher$AppClassLoader@18b4aac2, clsLdrId=8d16b856561-e5eca24b-6f5a-4d3e-9e9e-94ad404b44d1, userVer=0, loc=true, sampleClsName=org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionFullMap, pendingUndeploy=false, undeployed=false, usage=2], taskClsName=org.apache.ignite.ml.genetic.FitnessTask, sesId=b196b856561-e5eca24b-6f5a-4d3e-9e9e-94ad404b44d1, startTime=1535006964236, endTime=9223372036854775807, taskNodeId=e5eca24b-6f5a-4d3e-9e9e-94ad404b44d1, clsLdr=sun.misc.Launcher$AppClassLoader@18b4aac2, closed=false, cpSpi=null, failSpi=null, loadSpi=null, usage=1, fullSup=false, internal=false, topPred=org.apache.ignite.internal.cluster.ClusterGroupAdapter$AttributeFilter@2d746ce4, subjId=e5eca24b-6f5a-4d3e-9e9e-94ad404b44d1, mapFut=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null, hash=1579959210]IgniteFuture [orig=], execName=null], jobId=f296b856561-e5eca24b-6f5a-4d3e-9e9e-94ad404b44d1], err=null]
	at org.apache.ignite.internal.processors.job.GridJobWorker.handleThrowable(GridJobWorker.java:689)
	at org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:621)
	... 5 more
Caused by: java.lang.NullPointerException
	at org.apache.ignite.ml.genetic.FitnessJob.execute(FitnessJob.java:76)
	at org.apache.ignite.ml.genetic.FitnessJob.execute(FitnessJob.java:35)
	at org.apache.ignite.internal.processors.job.GridJobWorker$2.call(GridJobWorker.java:568)
	at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6749)
	at org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:562)
	... 5 more
{code}

Please let me know if you need the full node and example logs.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
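
For readers of the AlwaysFailoverSpi warning above, here is a minimal sketch of why it lists exactly three failed nodes. It is a simplified model in plain Java, not Ignite's implementation: the class `FailoverSketch` and both methods are hypothetical names, and the node IDs are shortened from the log. It assumes the job throws on every node it is sent to, as the repeated NullPointerException in FitnessJob suggests, and that failover only considers nodes the job has not failed on yet.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

public class FailoverSketch {
    /** Returns the first topology node the job has not failed on yet, or empty if none is left. */
    public static Optional<String> pickFailoverNode(List<String> topology, Set<String> failedNodes) {
        for (String nodeId : topology) {
            if (!failedNodes.contains(nodeId))
                return Optional.of(nodeId);
        }
        return Optional.empty();
    }

    /** Simulates a job that fails on every node; returns how many nodes were tried before giving up. */
    public static int runAlwaysFailingJob(List<String> topology) {
        Set<String> failedNodes = new LinkedHashSet<>();
        Optional<String> target = pickFailoverNode(topology, failedNodes);
        int attempts = 0;

        while (target.isPresent()) {
            attempts++;
            // Assumption: the job throws on this node, so the node joins the failed set.
            failedNodes.add(target.get());
            target = pickFailoverNode(topology, failedNodes);
        }

        // No candidate is left; in the log this is the point where failover returns null.
        return attempts;
    }

    public static void main(String[] args) {
        // Two nodes started with ignite.sh plus the node started by the example (IDs shortened).
        List<String> topology = Arrays.asList("eac48ea7", "0907d876", "e5eca24b");
        System.out.println("attempts=" + runAlwaysFailingJob(topology));
    }
}
```

With the three-node topology from the log, the model tries each node once and then has no node left to fail over to, which matches the `failedNodes` list in the warning. It does not explain why the example hangs afterwards instead of exiting.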