Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 8453A200D19 for ; Fri, 6 Oct 2017 11:39:42 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 82DED1609DF; Fri, 6 Oct 2017 09:39:42 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id A1F881609D0 for ; Fri, 6 Oct 2017 11:39:41 +0200 (CEST) Received: (qmail 25623 invoked by uid 500); 6 Oct 2017 09:39:40 -0000 Mailing-List: contact user-help@ignite.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@ignite.apache.org Delivered-To: mailing list user@ignite.apache.org Received: (qmail 25613 invoked by uid 99); 6 Oct 2017 09:39:40 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 06 Oct 2017 09:39:40 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id DD6671812A4 for ; Fri, 6 Oct 2017 09:39:39 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 2.172 X-Spam-Level: ** X-Spam-Status: No, score=2.172 tagged_above=-999 required=6.31 tests=[DKIM_ADSP_CUSTOM_MED=0.001, NML_ADSP_CUSTOM_MED=1.2, SPF_HELO_PASS=-0.001, SPF_SOFTFAIL=0.972] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id i3-lDcugEyp4 for ; Fri, 6 Oct 2017 09:39:36 +0000 (UTC) Received: from n6.nabble.com (n6.nabble.com [162.255.23.37]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTP id E492B5FD10 for ; Fri, 6 Oct 2017 09:39:35 +0000 (UTC) Received: from n6.nabble.com (localhost [127.0.0.1]) by n6.nabble.com (Postfix) with ESMTP id A1DB22167FB3 for ; Fri, 6 Oct 2017 02:39:35 -0700 (MST) Date: Fri, 6 Oct 2017 02:39:35 -0700 (MST) From: =?UTF-8?Q?Art=D1=91m_Basov?= To: user@ignite.apache.org Message-ID: <1507282775619-0.post@n6.nabble.com> Subject: Race in Service Grid deployment? MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit archived-at: Fri, 06 Oct 2017 09:39:42 -0000 So we found this one while testing nodes shutdown and startup under light but steady load. Setup: We have 2 types of nodes let's call them Vehicle node (VehicleService deployed there) and client node. Client node have constant load applied to Vehicle grid (could be many nodes. We will stick with 1 since it reproducible even there) via following code: public static void main(String[] args) throws IgniteException, InterruptedException { Ignite ignite = Ignition.start("config/client-node-config.xml"); VehicleService vehicleService = ignite.services().serviceProxy(VehicleService.SERVICE_NAME, VehicleService.class, false); int cnt = 0; while(!Thread.currentThread().isInterrupted()) { String answer = null; try { answer = "" + vehicleService.testService(); } catch (Exception e) { answer = e.getMessage(); } System.out.println("#"+ cnt++ + "a=" + answer); TimeUnit.MILLISECONDS.sleep(10); } } From the VehicleService side testService() is dead simple: @Override public int testService() { return 1; } And i added some delay in init() of VehicleService: public void init(ServiceContext ctx) throws Exception { System.out.println("Initializing Vehicle Service on node:" + ignite.cluster().localNode() + " within 10 seconds"); SECONDS.sleep(10); } Steps: 1) Start Vehicle service node Observation: VehicleService deployed 2) Start main() provided above Observation: output, as expected will be printing counter+ 1 as result of service execution - OK 3) Stop VehicleService node Observation: output of main() : [12:06:06] Topology snapshot [ver=7, servers=1, clients=0, CPUs=4, heap=3.5GB] #286a=Node has left grid: 34c87a31-12b5-4e9f-a172-3f881d12d949 [12:06:06,040][SEVERE][pub-#53%null%][GridTaskWorker] Failed to obtain remote job result policy for result from ComputeTask.result(..) method (will fail the whole task): GridJobResultImpl [job=C2V2 [c=o.a.i.i.processors.service.GridServiceProcessor$ServiceTopologyCallable@3f672204], sib=GridJobSiblingImpl [sesId=520eee0fe51-710996c1-c154-4fca-af30-0a067ec16009, jobId=620eee0fe51-710996c1-c154-4fca-af30-0a067ec16009, nodeId=34c87a31-12b5-4e9f-a172-3f881d12d949, isJobDone=false], jobCtx=GridJobContextImpl [jobId=620eee0fe51-710996c1-c154-4fca-af30-0a067ec16009, timeoutObj=null, attrs={}], node=TcpDiscoveryNode [id=34c87a31-12b5-4e9f-a172-3f881d12d949, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1, 192.168.6.251], sockAddrs=[/0:0:0:0:0:0:0:1:47500, /127.0.0.1:47500, BASOV/192.168.6.251:47500], discPort=47500, order=4, intOrder=3, lastExchangeTime=1507280749135, loc=false, ver=1.9.0#20170302-sha1:a8169d0a, isClient=false], ex=class o.a.i.cluster.ClusterTopologyException: Node has left grid: 34c87a31-12b5-4e9f-a172-3f881d12d949, hasRes=true, isCancelled=false, isOccupied=true] class org.apache.ignite.cluster.ClusterTopologyException: Node has left grid: 34c87a31-12b5-4e9f-a172-3f881d12d949 at org.apache.ignite.internal.processors.task.GridTaskWorker.onNodeLeft(GridTaskWorker.java:1460) at org.apache.ignite.internal.processors.task.GridTaskProcessor$TaskDiscoveryListener$1.run(GridTaskProcessor.java:1248) at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6674) at org.apache.ignite.internal.processors.closure.GridClosureProcessor$1.body(GridClosureProcessor.java:783) at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:748) #287a=Failed to find deployed service: VehicleService #288a=Failed to find deployed service: VehicleService which is expected behavior. 4) Start VehicleService node. VehicleService deployed Observation: suddenly, output contains: Initializing Vehicle Service on node:TcpDiscoveryNode [id=dd38ff63-b3b2-486a-8974-5bebcb576cd8, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1, 192.168.6.251], sockAddrs=[BASOV/192.168.6.251:47501, /0:0:0:0:0:0:0:1:47501, /127.0.0.1:47501], discPort=47501, order=2, intOrder=2, lastExchangeTime=1507280863055, loc=true, ver=1.9.0#20170302-sha1:a8169d0a, isClient=false] within 10 seconds [12:07:43,691][SEVERE][pub-#46%null%][GridJobWorker] Failed to execute job [jobId=d7180f0fe51-0fd41e8b-385e-4480-8510-dca328846008, ses=GridJobSessionImpl [ses=GridTaskSessionImpl [taskName=o.a.i.i.processors.service.GridServiceProxy$ServiceProxyCallable, dep=GridDeployment [ts=1507280863375, depMode=SHARED, clsLdr=sun.misc.Launcher$AppClassLoader@18b4aac2, clsLdrId=9a2b0f0fe51-dd38ff63-b3b2-486a-8974-5bebcb576cd8, userVer=0, loc=true, sampleClsName=o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionFullMap, pendingUndeploy=false, undeployed=false, usage=1], taskClsName=o.a.i.i.processors.service.GridServiceProxy$ServiceProxyCallable, sesId=c7180f0fe51-0fd41e8b-385e-4480-8510-dca328846008, startTime=1507280863660, endTime=9223372036854775807, taskNodeId=0fd41e8b-385e-4480-8510-dca328846008, clsLdr=sun.misc.Launcher$AppClassLoader@18b4aac2, closed=false, cpSpi=null, failSpi=null, loadSpi=null, usage=1, fullSup=false, internal=false, subjId=0fd41e8b-385e-4480-8510-dca328846008, mapFut=IgniteFuture [orig=GridFutureAdapter [resFlag=0, res=null, startTime=1507280863678, endTime=0, ignoreInterrupts=false, state=INIT]]], jobId=d7180f0fe51-0fd41e8b-385e-4480-8510-dca328846008]] class org.apache.ignite.IgniteException: Service not found: VehicleService at org.apache.ignite.internal.processors.closure.GridClosureProcessor$C2V2.execute(GridClosureProcessor.java:2059) at org.apache.ignite.internal.processors.job.GridJobWorker$2.call(GridJobWorker.java:560) at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6618) at org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:554) at org.apache.ignite.internal.processors.job.GridJobWorker.body(GridJobWorker.java:483) at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) at org.apache.ignite.internal.processors.job.GridJobProcessor.processJobExecuteRequest(GridJobProcessor.java:1180) at org.apache.ignite.internal.processors.job.GridJobProcessor$JobExecutionListener.onMessage(GridJobProcessor.java:1894) at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1222) at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:850) at org.apache.ignite.internal.managers.communication.GridIoManager.access$2100(GridIoManager.java:108) at org.apache.ignite.internal.managers.communication.GridIoManager$7.run(GridIoManager.java:790) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:748) Caused by: class org.apache.ignite.internal.processors.service.GridServiceNotFoundException: Service not found: VehicleService at org.apache.ignite.internal.processors.service.GridServiceProxy$ServiceProxyCallable.call(GridServiceProxy.java:397) at org.apache.ignite.internal.processors.closure.GridClosureProcessor$C2V2.execute(GridClosureProcessor.java:2056) ... 14 more Executing Vehicle Service on node:TcpDiscoveryNode [id=dd38ff63-b3b2-486a-8974-5bebcb576cd8, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1, 192.168.6.251], sockAddrs=[BASOV/192.168.6.251:47501, /0:0:0:0:0:0:0:1:47501, /127.0.0.1:47501], discPort=47501, order=2, intOrder=2, lastExchangeTime=1507280873066, loc=true, ver=1.9.0#20170302-sha1:a8169d0a, isClient=false] From my understanding, this means that Client node got the information about VehicleService deployment before VehicleService deployment node. Steps 1 and 3 is somewhat optional - i had reproduced this issue without them as well. Question is - is this is expected behaviour (race intentionally allowed) or is a bug? Here is sample project - It is somewhat messy. Sorry about that. MicroServicesExample-master.zip It is modified example https://github.com/dmagda/MicroServicesExample. You only interested in VehicleServiceNodeStartup and ServiceRaceTest entry points. -- Sent from: http://apache-ignite-users.70518.x6.nabble.com/