apex-dev mailing list archives

From "Vlad Rozov (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (APEXCORE-770) Application is killed due to NPE in ApplicationMaster
Date Fri, 28 Jul 2017 16:41:00 GMT

     [ https://issues.apache.org/jira/browse/APEXCORE-770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vlad Rozov updated APEXCORE-770:
--------------------------------
    Description: 
In my Apex application, I was randomly deleting containers (all except the app master).

The application was killed unexpectedly with the following exception:

{noformat}
2017-07-25 11:24:51,681 WARN com.datatorrent.stram.StreamingAppMasterService: Failed to stop container container_e47_1499808956620_0716_01_000090
org.apache.hadoop.yarn.exceptions.YarnException: Container container_e47_1499808956620_0716_01_000090 is neither started nor scheduled to start
	at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:45)
	at org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl.stopContainerAsync(NMClientAsyncImpl.java:234)
	at com.datatorrent.stram.StreamingAppMasterService.sendContainerAskToRM(StreamingAppMasterService.java:1175)
	at com.datatorrent.stram.StreamingAppMasterService.execute(StreamingAppMasterService.java:865)
	at com.datatorrent.stram.StreamingAppMasterService.run(StreamingAppMasterService.java:671)
	at com.datatorrent.stram.StreamingAppMaster.main(StreamingAppMaster.java:106)
2017-07-25 11:24:51,681 INFO com.datatorrent.stram.StreamingAppMasterService: Requested stop container container_e47_1499808956620_0716_01_000090
2017-07-25 11:24:51,681 INFO org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl: Processing Event EventType: STOP_CONTAINER for Container container_e47_1499808956620_0716_01_000090
2017-07-25 11:24:51,681 INFO org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl: Container container_e47_1499808956620_0716_01_000090 is already stopped or failed
2017-07-25 11:24:51,686 INFO com.datatorrent.stram.StreamingContainerManager: Initiating recovery for container_e47_1499808956620_0716_01_000090@node21.morado.com:8041
2017-07-25 11:24:51,686 INFO com.datatorrent.stram.StreamingContainerManager: Affected operators [PTOperator[id=38,name=passthrough,state=ACTIVE], PTOperator[id=105,name=passthrough.output#unifier,state=ACTIVE], PTOperator[id=97,name=console,state=ACTIVE], PTOperator[id=106,name=passthrough.output#unifier,state=ACTIVE], PTOperator[id=103,name=console,state=ACTIVE], PTOperator[id=107,name=passthrough.output#unifier,state=ACTIVE], PTOperator[id=100,name=console,state=ACTIVE], PTOperator[id=108,name=passthrough.output#unifier,state=ACTIVE], PTOperator[id=99,name=console,state=ACTIVE], PTOperator[id=109,name=passthrough.output#unifier,state=ACTIVE], PTOperator[id=101,name=console,state=ACTIVE], PTOperator[id=110,name=passthrough.output#unifier,state=ACTIVE], PTOperator[id=102,name=console,state=ACTIVE], PTOperator[id=111,name=passthrough.output#unifier,state=ACTIVE], PTOperator[id=98,name=console,state=ACTIVE], PTOperator[id=112,name=passthrough.output#unifier,state=ACTIVE], PTOperator[id=104,name=console,state=ACTIVE], PTOperator[id=68,name=randomGenerator.out#unifier,state=ACTIVE]]
2017-07-25 11:24:52,260 ERROR com.datatorrent.stram.StreamingContainerManager: Unknown container container_e47_1499808956620_0716_01_000090
2017-07-25 11:24:52,263 INFO com.datatorrent.stram.StreamingContainerParent: child msg: [container_e47_1499808956620_0716_01_000090] Exiting heartbeat loop.. context: PTContainer[id=38(container_e47_1499808956620_0716_01_000090),state=KILLED,operators=[PTOperator[id=38,name=passthrough,state=PENDING_DEPLOY], PTOperator[id=68,name=randomGenerator.out#unifier,state=PENDING_DEPLOY]]]
2017-07-25 11:24:52,697 INFO com.datatorrent.stram.ResourceRequestHandler: Strict anti-affinity = [] for container with operators PTOperator[id=38,name=passthrough,state=PENDING_DEPLOY],PTOperator[id=68,name=randomGenerator.out#unifier,state=PENDING_DEPLOY]
2017-07-25 11:24:52,698 INFO com.datatorrent.stram.ResourceRequestHandler: Found host null
2017-07-25 11:24:52,698 INFO com.datatorrent.stram.BlacklistBasedResourceRequestHandler: No node specific request
2017-07-25 11:24:53,710 INFO org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl: Replacing token for : node18.morado.com:8041
2017-07-25 11:24:53,710 INFO com.datatorrent.stram.StreamingAppMasterService: Got new container., containerId=container_e47_1499808956620_0716_02_000034, containerNode=node18.morado.com:8041, containerNodeURI=node18.morado.com:8042, containerResourceMemory4096, priority32
2017-07-25 11:24:53,710 INFO com.datatorrent.stram.StreamingContainerManager: Removing container agent container_e47_1499808956620_0716_01_000090
2017-07-25 11:24:53,711 INFO com.datatorrent.stram.LaunchContainerRunnable: Setting up container launch context for containerid=container_e47_1499808956620_0716_02_000034
2017-07-25 11:24:53,711 INFO com.datatorrent.stram.LaunchContainerRunnable: CLASSPATH: ./*:$HADOOP_CONF_DIR:$HADOOP_COMMON_HOME/*:$HADOOP_COMMON_HOME/lib/*:$HADOOP_HDFS_HOME/*:$HADOOP_HDFS_HOME/lib/*:$HADOOP_YARN_HOME/*:$HADOOP_YARN_HOME/lib/*:.
2017-07-25 11:24:53,946 INFO com.datatorrent.common.util.BasicContainerOptConfigurator: property map for operator {Generic=null, -Xmx=1920m}
2017-07-25 11:24:53,947 INFO com.datatorrent.common.util.BasicContainerOptConfigurator: property map for operator {Generic=null, -Xmx=768m}
2017-07-25 11:24:53,947 INFO com.datatorrent.stram.LaunchContainerRunnable: Jvm opts  -Xmx3355443200  for container container_e47_1499808956620_0716_02_000034
2017-07-25 11:24:53,947 INFO com.datatorrent.stram.LaunchContainerRunnable: Launching on node: node18.morado.com:8041 command: $JAVA_HOME/bin/java  -Xmx3355443200  -Ddt.attr.APPLICATION_PATH=hdfs://node18.morado.com:8020/user/vinay/datatorrent/apps/application_1499808956620_0716 -Djava.io.tmpdir=$PWD/tmp -Ddt.cid=container_e47_1499808956620_0716_02_000034 -Dhadoop.root.logger=INFO,RFA -Dhadoop.log.dir=<LOG_DIR> -Dapex.application.name=$'SlowConsumerTimeoutWindowCountSet.apa' com.datatorrent.stram.engine.StreamingContainer 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr
2017-07-25 11:24:53,947 INFO org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_e47_1499808956620_0716_02_000034
2017-07-25 11:24:53,947 INFO com.datatorrent.stram.StreamingAppMasterService: Completed containerId=container_e47_1499808956620_0716_01_000090, state=COMPLETE, exitStatus=0, diagnostics=
2017-07-25 11:24:53,947 INFO com.datatorrent.stram.StreamingAppMasterService: Container completed successfully., containerId=container_e47_1499808956620_0716_01_000090
2017-07-25 11:24:53,947 INFO org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: Opening proxy : node18.morado.com:8041
2017-07-25 11:24:53,948 ERROR com.datatorrent.stram.StreamingAppMaster: Exiting Application Master
java.lang.NullPointerException
	at com.datatorrent.stram.StreamingAppMasterService$AllocatedContainer.access$1000(StreamingAppMasterService.java:1251)
	at com.datatorrent.stram.StreamingAppMasterService.execute(StreamingAppMasterService.java:1014)
	at com.datatorrent.stram.StreamingAppMasterService.run(StreamingAppMasterService.java:671)
	at com.datatorrent.stram.StreamingAppMaster.main(StreamingAppMaster.java:106)
{noformat}

  was:
In my Apex application, I was randomly deleting containers (all except the app master).

The application was killed unexpectedly with the following exception:

{noformat}
2017-07-26 11:33:08,477 WARN com.datatorrent.stram.StreamingContainerManager: Trying to get unknown container container_e47_1499808956620_0862_01_000001
2017-07-26 11:33:11,881 WARN com.datatorrent.stram.StreamingAppMasterService: Failed to stop container container_e47_1499808956620_0862_01_000044
org.apache.hadoop.yarn.exceptions.YarnException: Container container_e47_1499808956620_0862_01_000044 is neither started nor scheduled to start
	at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:45)
	at org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl.stopContainerAsync(NMClientAsyncImpl.java:234)
	at com.datatorrent.stram.StreamingAppMasterService.sendContainerAskToRM(StreamingAppMasterService.java:1175)
	at com.datatorrent.stram.StreamingAppMasterService.execute(StreamingAppMasterService.java:865)
	at com.datatorrent.stram.StreamingAppMasterService.run(StreamingAppMasterService.java:671)
	at com.datatorrent.stram.StreamingAppMaster.main(StreamingAppMaster.java:106)
{noformat}


> Application is killed due to NPE in ApplicationMaster
> -----------------------------------------------------
>
>                 Key: APEXCORE-770
>                 URL: https://issues.apache.org/jira/browse/APEXCORE-770
>             Project: Apache Apex Core
>          Issue Type: Bug
>            Reporter: Vinay Bangalore Srikanth
>            Assignee: Sandesh
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
