Date: Wed, 1 Apr 2015 14:44:54 +0000 (UTC)
From: "Jordan Zimmerman (JIRA)"
To: dev@curator.apache.org
Reply-To: dev@curator.apache.org
Subject: [jira] [Commented] (CURATOR-196) this.client.create().creatingParentsIfNeeded() throw Puzzling EXCEPTION

    [ https://issues.apache.org/jira/browse/CURATOR-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390651#comment-14390651 ]

Jordan Zimmerman commented on CURATOR-196:
------------------------------------------

I don't see where the bug is. From what I can tell you are calling create() for a node that already exists. The most likely problem is code like this:
{code}
if(this.client.checkExists().forPath(path)!=null) {
    ....
} else {
    this.client.create(). ....
}
{code}
This type of check is not correct.
After the call to checkExists() another client may have created the node and, so, the create() call will fail. The only safe way to handle this is to always try to create() and, if that throws NodeExists, call setData(). Or, acquire a Curator lock first.

> this.client.create().creatingParentsIfNeeded() throw Puzzling EXCEPTION
> ------------------------------------------------------------------------
>
>                 Key: CURATOR-196
>                 URL: https://issues.apache.org/jira/browse/CURATOR-196
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Framework
>    Affects Versions: 2.6.0
>         Environment: RedHat
>            Reporter: HuanWang
>
> Scene One: In a single test, I want to register with ZooKeeper. The code is as below:
> {code}
>     private void startWorker() {
>         try {
>             LOG.info("Start With Worker IP:" + this.workerIP);
>
>             this.client.makeDir(SuperionConstant.ZOOKEEPER_WORKER_MONITOR_PATH);
>             this.client.makeDir(SuperionConstant.ZOOKEEPER_WORKER_PATH);
>
>             this.workerMonitorPath = SuperionConstant.ZOOKEEPER_WORKER_MONITOR_PATH + "/" + this.workerIP;
>             /** Ephemeral Node: /workersMonitor/192.168.0.2 */
>             this.client.createEphemeralNode(this.workerMonitorPath);
>
>
>             this.workerPath = SuperionConstant.ZOOKEEPER_WORKER_PATH + "/" + this.workerIP;
>             /** worker Node: /workers/192.168.0.2 */
>             this.client.makeDir(this.workerPath);
>
>             String workerStatePath = this.workerPath + "/" + "state";
>             /** Persistent Node: /workers/192.168.0.2/state */
>             this.client.makeDir(workerStatePath);
>
>             /** Persistent Node: /workers/192.168.0.2/state/ProcessID */
>             String workerStatePidPath = workerStatePath + "/" + "ProcessID";
>             this.client.writeInt32(workerStatePidPath, workerPID);
>
>             //this.client.makeDir(SuperionConstant.ZOOKEEPER_JOB_PATH);
>             /** Persistent Node: /jobs/tmp */
>             this.client.makeDir(SuperionConstant.ZOOKEEPER_JOB_TMP_PATH);
>             /** Persistent Node: /jobs/state */
>             this.client.makeDir(SuperionConstant.ZOOKEEPER_JOB_STATE_PATH);
>
>             //register the worker in Zookeeper success
>             this.containerManager.setBlockNewContainerRequests(false);
>         } catch (Exception e) {
>             String errorMsg = "Worker Register Error Happen, Maker Sure Zookeeper Server Can Be Connected";
>             LOG.error(errorMsg, e);
>             throw new SuperionRuntimeException(errorMsg,e);
>         }
>     }
> {code}
> ==========================================================
> The function I use is creatingParentsIfNeeded().
> ==========================================================
> {code}
>     public synchronized void writeData(String path,byte data[]) throws Exception {
>         System.out.println(path+" : writeData");
>         if(this.client.checkExists().forPath(path)!=null) {
>             //node exit
>             System.out.println(path+" : checkExist");
>             this.client.setData().forPath(path, data);
>         } else {
>             //node not exit, create new
>             System.out.println(path+ " : node not exit");
>             this.client.create().creatingParentsIfNeeded()
>             .withMode(CreateMode.PERSISTENT).forPath(path, data);
>         //    this.client.create().withMode(CreateMode.PERSISTENT).forPath(path, data);
>             System.out.println(path+ " : creatingParentsIfNeeded");
>         }
> {code}
> ======================================================
> But sometimes (not every time) it throws NodeExistsException:
> =======================================================
> {code}
> 015-03-31 15:29:49,452 INFO [main-EventThread] state.ConnectionStateManager (ConnectionStateManager.java:postState(228)) - State change: CONNECTED
> /workersMonitor : checkExist
> /workers : writeData
> /workers : checkExist
> /workers/10.24.76.52 : writeData
> /workers/10.24.76.52 : node not exit
> /workers/10.24.76.52 : creatingParentsIfNeeded
> /workers/10.24.76.52/state : writeData
> /workers/10.24.76.52/state : node not exit
> 2015-03-31 15:29:50,508 ERROR [main] zookeeper.ZookeeperService (ZookeeperService.java:startWorker(331)) - Worker Register Error Happen, Maker Sure Zookeeper Server Can Be Connected
> org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /workers/10.24.76.52/state
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
>     at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:688)
>     at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:672)
>     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:668)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:453)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:443)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:44)
>     at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperClient.writeData(ZookeeperClient.java:125)
>     at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperClient.makeDir(ZookeeperClient.java:169)
>     at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.startWorker(ZookeeperService.java:315)
>     at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.serviceStart(ZookeeperService.java:86)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>     at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
>     at com.suning.cybertron.superion.worker.containermanager.ContainerManagerImpl.serviceStart(ContainerManagerImpl.java:230)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>     at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
>     at com.suning.cybertron.superion.worker.Worker.serviceStart(Worker.java:143)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>     at com.suning.cybertron.superion.worker.Worker.initAndStartNodeManager(Worker.java:182)
>     at com.suning.cybertron.superion.worker.Worker.main(Worker.java:227)
> 2015-03-31 15:29:50,510 INFO [main] service.AbstractService (AbstractService.java:noteFailure(272)) - Service com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService failed in state STARTED; cause: com.suning.cybertron.superion.exception.SuperionRuntimeException: Worker Register Error Happen, Maker Sure Zookeeper Server Can Be Connected
> com.suning.cybertron.superion.exception.SuperionRuntimeException: Worker Register Error Happen, Maker Sure Zookeeper Server Can Be Connected
>     at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.startWorker(ZookeeperService.java:332)
>     at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.serviceStart(ZookeeperService.java:86)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>     at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
>     at com.suning.cybertron.superion.worker.containermanager.ContainerManagerImpl.serviceStart(ContainerManagerImpl.java:230)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>     at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
>     at com.suning.cybertron.superion.worker.Worker.serviceStart(Worker.java:143)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>     at com.suning.cybertron.superion.worker.Worker.initAndStartNodeManager(Worker.java:182)
>     at com.suning.cybertron.superion.worker.Worker.main(Worker.java:227)
> Caused by: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /workers/10.24.76.52/state
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
>     at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:688)
>     at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:672)
>     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:668)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:453)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:443)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:44)
>     at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperClient.writeData(ZookeeperClient.java:125)
>     at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperClient.makeDir(ZookeeperClient.java:169)
>     at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.startWorker(ZookeeperService.java:315)
>     ... 10 more
> 2015-03-31 15:29:50,557 INFO [main] zookeeper.ZooKeeper (ZooKeeper.java:close(684)) - Session: 0x34a75c727c204a4 closed
> 2015-03-31 15:29:50,557 INFO [main-EventThread] zookeeper.ClientCnxn (ClientCnxn.java:run(512)) - EventThread shut down
> 2015-03-31 15:29:50,558 INFO [main] service.AbstractService (AbstractService.java:noteFailure(272)) - Service com.suning.cybertron.superion.worker.containermanager.ContainerManagerImpl failed in state STARTED; cause: com.suning.cybertron.superion.exception.SuperionRuntimeException: Worker Register Error Happen, Maker Sure Zookeeper Server Can Be Connected
> com.suning.cybertron.superion.exception.SuperionRuntimeException: Worker Register Error Happen, Maker Sure Zookeeper Server Can Be Connected
>     at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.startWorker(ZookeeperService.java:332)
>     at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.serviceStart(ZookeeperService.java:86)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>     at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
>     at com.suning.cybertron.superion.worker.containermanager.ContainerManagerImpl.serviceStart(ContainerManagerImpl.java:230)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>     at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
>     at com.suning.cybertron.superion.worker.Worker.serviceStart(Worker.java:143)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>     at com.suning.cybertron.superion.worker.Worker.initAndStartNodeManager(Worker.java:182)
>     at com.suning.cybertron.superion.worker.Worker.main(Worker.java:227)
> Caused by: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /workers/10.24.76.52/state
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
>     at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:688)
>     at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:672)
>     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:668)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:453)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:443)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:44)
>     at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperClient.writeData(ZookeeperClient.java:125)
>     at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperClient.makeDir(ZookeeperClient.java:169)
>     at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.startWorker(ZookeeperService.java:315)
>     ... 10 more
> 2015-03-31 15:29:50,561 INFO [main] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:isEnabled(168)) - Neither virutal-memory nor physical-memory monitoring is needed. Not running the monitor-thread
> 2015-03-31 15:29:50,562 INFO [main] service.AbstractService (AbstractService.java:noteFailure(272)) - Service NodeManager failed in state STARTED; cause: com.suning.cybertron.superion.exception.SuperionRuntimeException: Worker Register Error Happen, Maker Sure Zookeeper Server Can Be Connected
> com.suning.cybertron.superion.exception.SuperionRuntimeException: Worker Register Error Happen, Maker Sure Zookeeper Server Can Be Connected
>     at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.startWorker(ZookeeperService.java:332)
>     at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.serviceStart(ZookeeperService.java:86)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>     at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
>     at com.suning.cybertron.superion.worker.containermanager.ContainerManagerImpl.serviceStart(ContainerManagerImpl.java:230)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>     at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
>     at com.suning.cybertron.superion.worker.Worker.serviceStart(Worker.java:143)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>     at com.suning.cybertron.superion.worker.Worker.initAndStartNodeManager(Worker.java:182)
>     at com.suning.cybertron.superion.worker.Worker.main(Worker.java:227)
> Caused by: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /workers/10.24.76.52/state
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
>     at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:688)
>     at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:672)
>     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:668)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:453)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:443)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:44)
>     at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperClient.writeData(ZookeeperClient.java:125)
>     at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperClient.makeDir(ZookeeperClient.java:169)
>     at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.startWorker(ZookeeperService.java:315)
>     ... 10 more
> 2015-03-31 15:29:50,562 INFO [Public Localizer] localizer.ResourceLocalizationService (ResourceLocalizationService.java:run(642)) - Public cache exiting
> 2015-03-31 15:29:50,563 INFO [main] impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(200)) - Stopping Worker metrics system...
> 2015-03-31 15:29:50,564 INFO [main] impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(206)) - Worker metrics system stopped.
> 2015-03-31 15:29:50,564 INFO [main] impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(572)) - Worker metrics system shutdown complete.
> 2015-03-31 15:29:50,564 FATAL [main] worker.Worker (Worker.java:initAndStartNodeManager(184)) - Error starting NodeManager
> com.suning.cybertron.superion.exception.SuperionRuntimeException: Worker Register Error Happen, Maker Sure Zookeeper Server Can Be Connected
>     at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.startWorker(ZookeeperService.java:332)
>     at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.serviceStart(ZookeeperService.java:86)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>     at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
>     at com.suning.cybertron.superion.worker.containermanager.ContainerManagerImpl.serviceStart(ContainerManagerImpl.java:230)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>     at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
>     at com.suning.cybertron.superion.worker.Worker.serviceStart(Worker.java:143)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>     at com.suning.cybertron.superion.worker.Worker.initAndStartNodeManager(Worker.java:182)
>     at com.suning.cybertron.superion.worker.Worker.main(Worker.java:227)
> Caused by: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /workers/10.24.76.52/state
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
>     at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:688)
>     at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:672)
>     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:668)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:453)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:443)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:44)
>     at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperClient.writeData(ZookeeperClient.java:125)
>     at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperClient.makeDir(ZookeeperClient.java:169)
>     at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.startWorker(ZookeeperService.java:315)
>     ... 10 more
> {code}
> ===================================================
> Scene Two: When starting a job:
> ===================================================
> {code}
>     private void startJob(ZookeeperEvent zookeeperEvent) {
>
>         StartJobZookeeperEvent startJobZookeeperEvent = (StartJobZookeeperEvent) zookeeperEvent;
>         String jobInstanceId = startJobZookeeperEvent
>                 .getStartContainerRequest().getContainerId().getApplicationId()
>                 .getJobInstanceZKId();
>
>         String jobTmpEphemeral = SuperionConstant.ZOOKEEPER_JOB_TMP_PATH + "/" + jobInstanceId;
>         String jobStatePersistent = SuperionConstant.ZOOKEEPER_JOB_STATE_PATH + "/" + jobInstanceId;
>
>         String jobStateWorkerIP = jobStatePersistent + "/" + SuperionConstant.JobState.WorkerIP;
>         String jobStateJobStatus = jobStatePersistent + "/" + SuperionConstant.JobState.JobStatus;
>         String jobStateJobErrorMsg = jobStatePersistent + "/" + SuperionConstant.JobState.JobErrorMsg;
>         String jobStateCreateTime = jobStatePersistent + "/" + SuperionConstant.JobState.CreateTime;
>
>         try {
>             /** Ephemeral Node: /job/tmp/jobInstanceId */
>             this.client.createEphemeralNode(jobTmpEphemeral);
>             if(this.client.checkExists(jobTmpEphemeral) == null)
>                 throw new Exception("ephemeral node["+jobTmpEphemeral+"] create fail");
>
>             /** update job state----------------- */
>             /** Persistent Node: /jobs/state/jobInstanceId */
>             this.client.makeDir(jobStatePersistent);
>             /** Persistent Node: /jobs/state/jobInstanceId/WorkerIP */
>             this.client.writeString(jobStateWorkerIP, this.workerIP);
>
>             /** Persistent Node: /jobs/state/jobInstanceId/CreateTime */
>             this.client.writeInt64(jobStateCreateTime, System.currentTimeMillis());
>             /* start container request */
>             StartContainerResponse response = this.containerManager.startContainers(
>                     startJobZookeeperEvent.getStartContainerRequest());
>
>             int jobStatusInt = SuperionConstant.JOB_STATUS_TAKED;
>
>             //TODO whtest
>
>             if(!response.isSuccess()) {
>                 // jobStatusInt = SuperionConstant.JOB_STATUS_PARAMETER_CHECK_ERROR;
>                 LOG.error(startJobZookeeperEvent.getStartContainerRequest().getContainerId().toString() + " start exception",
>                         response.getFailureReason());
>                 String jobErrorMsg = response.getFailureReason().getMessage();
>                 throw new Exception(jobErrorMsg,response.getFailureReason());
>                 /** Persistent Node: /jobs/state/jobInstanceId/JobErrorMsg */
>                 // this.client.writeString(jobStateJobErrorMsg, jobErrorMsg);
>
>             }
>
>             /** Persistent Node: /jobs/state/jobInstanceId/JobStatus */
>             this.client.writeInt32(jobStateJobStatus, jobStatusInt);
>         } catch (Exception e) {
>             LOG.error("exception happened when start job" , e);
>
>             if(e instanceof KeeperException.NodeExistsException){
>                 /*
>                 * node exit exception when /job/tmp/jobInstanceId create
>                 * if /job/tmp/jobInstanceId create then return
>                 * */
>                 KeeperException.NodeExistsException nodeExists = (KeeperException.NodeExistsException)e;
>                 String existsPath = nodeExists.getPath();
>
>                 if(existsPath != null && existsPath.startsWith(SuperionConstant.ZOOKEEPER_JOB_TMP_PATH)) {
>                     return;
>                 }
>             }
>             try{
>                 String jobErrorMsg = e.getMessage();
>                 /** Persistent Node: /jobs/state/jobInstanceId/JobErrorMsg */
>                 this.client.writeString(jobStateJobErrorMsg, jobErrorMsg);
>                 /** Persistent Node: /jobs/state/jobInstanceId/JobStatus */
>                 this.client.writeInt32(jobStateJobStatus, SuperionConstant.JOB_STATUS_PARAMETER_CHECK_ERROR);
>             } catch(Exception ignoreE) {
>                 LOG.warn("Ignore Exception", ignoreE);//ignore
>             } finally {
>                 try {
>                     this.client.deleteEphemeralNode(jobTmpEphemeral);
>                 } catch(Exception exception) {
>                     LOG.warn("Ignore Exception", exception);//ignore
>                 }
>             }
>         }
>     }
> {code}
> ====================================================
> When we looked at the logs, we found that some jobs (not every one) throw the exception:
> ==================================================
> {code}
> ource_visiblity as resource9_2_ from job_depend_resource jobdependr0_ where jobdependr0_.job_id=?
> 2015-03-28 00:01:58,651 INFO [AsyncDispatcher event handler] containermanager.ContainerManagerImpl (ContainerManagerImpl.java:startContainerInternal(319)) - Start request for container_20150327000156_5755_0299_0144_ by user bicbt
> 2015-03-28 00:01:58,652 INFO [AsyncDispatcher event handler] containermanager.ContainerManagerImpl (ContainerManagerImpl.java:startContainerInternal(343)) - Creating a new application reference for app application_20150327000156_5755
> 2015-03-28 00:01:58,652 INFO [AsyncDispatcher event handler] worker.WorkerAuditLogger (WorkerAuditLogger.java:logSuccess(98)) - USER=bicbt OPERATION=Start Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_20150327000156_5755 CONTAINERID=container_20150327000156_5755_0299_0144_
> 2015-03-28 00:01:58,675 ERROR [AsyncDispatcher event handler] zookeeper.ZookeeperService (ZookeeperService.java:startJob(178)) - exception happened when start job
> org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /jobs/state/1_299_20150328000156_144_0/JobStatus
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
>     at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:688)
>     at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:672)
>     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:668)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:453)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:443)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:44)
>     at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperClient.writeData(ZookeeperClient.java:119)
>     at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperClient.writeInt32(ZookeeperClient.java:126)
>     at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.startJob(ZookeeperService.java:176)
>     at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.handle(ZookeeperService.java:104)
>     at com.suning.cybertron.superion.worker.containermanager.zookeeper.ZookeeperService.handle(ZookeeperService.java:30)
>     at com.suning.cybertron.superion.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:138)
>     at com.suning.cybertron.superion.event.AsyncDispatcher$1.run(AsyncDispatcher.java:85)
>     at java.lang.Thread.run(Thread.java:745)
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
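
For illustration, the advice in the comment at the top of this message (always try create() and, if that throws NodeExists, call setData()) could be sketched roughly as below. This is only a sketch under assumptions: FakeStore and the local NodeExistsException are hypothetical in-memory stand-ins so the example runs without a ZooKeeper server, and they are not Curator's API. In the reporter's writeData(), the corresponding calls would presumably be this.client.create().creatingParentsIfNeeded().withMode(CreateMode.PERSISTENT).forPath(path, data) inside the try, and this.client.setData().forPath(path, data) in a catch of KeeperException.NodeExistsException.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class CreateOrSetDataSketch {

    /** Hypothetical stand-in for KeeperException.NodeExistsException. */
    static class NodeExistsException extends Exception {
        NodeExistsException(String path) {
            super("NodeExists for " + path);
        }
    }

    /** Hypothetical in-memory stand-in for the ZooKeeper tree; not Curator's API. */
    static class FakeStore {
        private final ConcurrentMap<String, byte[]> nodes = new ConcurrentHashMap<>();

        /** Atomic create: succeeds only if the path is absent, otherwise throws. */
        void create(String path, byte[] data) throws NodeExistsException {
            if (nodes.putIfAbsent(path, data) != null) {
                throw new NodeExistsException(path);
            }
        }

        /** Simplified overwrite; a real setData() fails if the node is missing. */
        void setData(String path, byte[] data) {
            nodes.put(path, data);
        }

        byte[] getData(String path) {
            return nodes.get(path);
        }
    }

    /**
     * No checkExists() beforehand: always attempt the create and let the
     * store report that the node already exists, then fall back to setData().
     */
    static void writeData(FakeStore store, String path, byte[] data) {
        try {
            store.create(path, data);
        } catch (NodeExistsException e) {
            store.setData(path, data);
        }
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore();
        String path = "/workers/10.24.76.52/state";
        writeData(store, path, "first".getBytes(StandardCharsets.UTF_8));
        // Second call hits the NodeExists branch and overwrites the data.
        writeData(store, path, "second".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(store.getData(path), StandardCharsets.UTF_8));
    }
}
```

The point of this shape is that there is no window between a check and the create in which another client can slip in. The sketch does not handle a node being deleted between the failed create and the setData; if that can happen, a retry loop or the Curator lock mentioned in the comment would be needed.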