hadoop-common-user mailing list archives

From "Roland DePratti" <roland.depra...@cox.net>
Subject RE: Yarn AM is abending job when submitting a remote job to cluster
Date Sun, 22 Feb 2015 12:42:42 GMT
Ulul,

 

I appreciate your help and your trying out my use case. I think I have a lot of
good details for you.

 

Here is my command:

 

hadoop jar avgwordlength.jar solution.AvgWordLength -conf
~/conf/hadoop-cluster.xml /user/cloudera/shakespeare wordlengths7

Since my last email, I examined the syslogs (I ran both jobs with debug
turned on) for both the remote abend and the local successful run on the
cluster server.
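
For reference, one way AM debug logging can be turned on for a single run is
through the generic options, e.g. with the stock yarn.app.mapreduce.am.log.level
property. This is only a sketch (the output directory name is a placeholder),
not necessarily how the runs above were produced:

hadoop jar avgwordlength.jar solution.AvgWordLength \
  -conf ~/conf/hadoop-cluster.xml \
  -D yarn.app.mapreduce.am.log.level=DEBUG \
  /user/cloudera/shakespeare wordlengths-debug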

I have attached both logs, plus a file where I posted my manual comparison
findings, and the config XML file.

Briefly, here is what I found (more details in the Comparison Log w/ Notes
file):

1. Both logs follow the same steps with the same outcome from the beginning
to line 1590.

2. At line 1590 both logs record an AMRMTokenSelector "Looking for Token
with service" message.

   - The successful job does this on the cluster server (192.168.2.253)
     since it was run locally.

   - The abending job does this on the client VM (192.168.2.185).

3. After that point the logs are not the same until JobHistory kicks in.

   - The abending log spends a lot of time trying to handle the error.

   - The successful job begins processing the job:

     - At line 1615 it sets up the queue (root.cloudera).

     - At line 1651 JOB_SETUP_Complete is reported.

     - Neither of these messages appears in the abended log.

 

My guess is this is a setup problem that I produced; I just can't find it.

 

-        rd

 

 

From: Ulul [mailto:hadoop@ulul.org] 
Sent: Saturday, February 21, 2015 9:50 PM
To: user@hadoop.apache.org
Subject: Re: Yarn AM is abending job when submitting a remote job to cluster

 

Hi Roland

I tried to reproduce your problem with a single-node setup submitting a job
to a remote cluster (please note I'm an HDP user; it's a sandbox submitting
to a 3-VM cluster).
It worked like a charm...
I did run into problems when submitting the job as another user, but that was
a permission problem; it does not look like your AMRMToken problem.

We are probably submitting our jobs differently though. I use hadoop jar
--config <conf dir>; you seem to be using something different since you have
the -conf generic option.
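
For what it's worth, the two styles look roughly like this (a sketch; the jar,
driver class and paths are placeholders, and note that --config is an option of
the hadoop launcher script itself, while -conf is a generic option parsed by
GenericOptionsParser, so it needs a ToolRunner-based driver):

# point the whole client at an alternate configuration directory
hadoop --config ~/remote-conf jar myjob.jar my.Driver <input> <output>

# pass a per-job configuration file through the -conf generic option
hadoop jar myjob.jar my.Driver -conf ~/conf/hadoop-cluster.xml <input> <output>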

Would you please share your job command?

Ulul 

On 20/02/2015 03:09, Roland DePratti wrote:

Xuan,

 

Thanks for asking. Here is the RM log. It almost looks like the log
completes successfully (see red highlighting).

 

 

 

2015-02-19 19:55:43,315 INFO
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new
applicationId: 12
2015-02-19 19:55:44,659 INFO
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Application
with id 12 submitted by user cloudera
2015-02-19 19:55:44,659 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Storing
application with id application_1424003606313_0012
2015-02-19 19:55:44,659 INFO
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=cloudera
IP=192.168.2.185    OPERATION=Submit Application Request
TARGET=ClientRMService    RESULT=SUCCESS
APPID=application_1424003606313_0012
2015-02-19 19:55:44,659 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
application_1424003606313_0012 State change from NEW to NEW_SAVING
2015-02-19 19:55:44,659 INFO
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing
info for app: application_1424003606313_0012
2015-02-19 19:55:44,660 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
application_1424003606313_0012 State change from NEW_SAVING to SUBMITTED
2015-02-19 19:55:44,666 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
Accepted application application_1424003606313_0012 from user: cloudera, in
queue: default, currently num of applications: 1
2015-02-19 19:55:44,667 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
application_1424003606313_0012 State change from SUBMITTED to ACCEPTED
2015-02-19 19:55:44,667 INFO
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService:
Registering app attempt : appattempt_1424003606313_0012_000001
2015-02-19 19:55:44,667 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl
: appattempt_1424003606313_0012_000001 State change from NEW to SUBMITTED
2015-02-19 19:55:44,667 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
Added Application Attempt appattempt_1424003606313_0012_000001 to scheduler
from user: cloudera
2015-02-19 19:55:44,669 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl
: appattempt_1424003606313_0012_000001 State change from SUBMITTED to
SCHEDULED
2015-02-19 19:55:50,671 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
container_1424003606313_0012_01_000001 Container Transitioned from NEW to
ALLOCATED
2015-02-19 19:55:50,671 INFO
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=cloudera
OPERATION=AM Allocated Container    TARGET=SchedulerApp    RESULT=SUCCESS
APPID=application_1424003606313_0012
CONTAINERID=container_1424003606313_0012_01_000001
2015-02-19 19:55:50,671 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode:
Assigned container container_1424003606313_0012_01_000001 of capacity
<memory:1024, vCores:1> on host hadoop0.rdpratti.com:8041, which has 1
containers, <memory:1024, vCores:1> used and <memory:433, vCores:1>
available after allocation
2015-02-19 19:55:50,672 INFO
org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerI
nRM: Sending NMToken for nodeId : hadoop0.rdpratti.com:8041 for container :
container_1424003606313_0012_01_000001
2015-02-19 19:55:50,672 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
container_1424003606313_0012_01_000001 Container Transitioned from ALLOCATED
to ACQUIRED
2015-02-19 19:55:50,673 INFO
org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerI
nRM: Clear node set for appattempt_1424003606313_0012_000001
2015-02-19 19:55:50,673 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl
: Storing attempt: AppId: application_1424003606313_0012 AttemptId:
appattempt_1424003606313_0012_000001 MasterContainer: Container:
[ContainerId: container_1424003606313_0012_01_000001, NodeId:
hadoop0.rdpratti.com:8041, NodeHttpAddress: hadoop0.rdpratti.com:8042,
Resource: <memory:1024, vCores:1>, Priority: 0, Token: Token { kind:
ContainerToken, service: 192.168.2.253:8041 }, ]
2015-02-19 19:55:50,673 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl
: appattempt_1424003606313_0012_000001 State change from SCHEDULED to
ALLOCATED_SAVING
2015-02-19 19:55:50,673 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl
: appattempt_1424003606313_0012_000001 State change from ALLOCATED_SAVING to
ALLOCATED
2015-02-19 19:55:50,673 INFO
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher:
Launching masterappattempt_1424003606313_0012_000001
2015-02-19 19:55:50,674 INFO
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting
up container Container: [ContainerId:
container_1424003606313_0012_01_000001, NodeId: hadoop0.rdpratti.com:8041,
NodeHttpAddress: hadoop0.rdpratti.com:8042, Resource: <memory:1024,
vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service:
192.168.2.253:8041 }, ] for AM appattempt_1424003606313_0012_000001
2015-02-19 19:55:50,675 INFO
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Command
to launch container container_1424003606313_0012_01_000001 :
$JAVA_HOME/bin/java -Dlog4j.configuration=container-log4j.properties
-Dyarn.app.container.log.dir=<LOG_DIR> -Dyarn.app.container.log.filesize=0
-Dhadoop.root.logger=INFO,CLA  -Djava.net.preferIPv4Stack=true -Xmx209715200
org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1><LOG_DIR>/stdout
2><LOG_DIR>/stderr 
2015-02-19 19:55:50,675 INFO
org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManage
r: Create AMRMToken for ApplicationAttempt:
appattempt_1424003606313_0012_000001
2015-02-19 19:55:50,675 INFO
org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManage
r: Creating password for appattempt_1424003606313_0012_000001
2015-02-19 19:55:50,688 INFO
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Done
launching container Container: [ContainerId:
container_1424003606313_0012_01_000001, NodeId: hadoop0.rdpratti.com:8041,
NodeHttpAddress: hadoop0.rdpratti.com:8042, Resource: <memory:1024,
vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service:
192.168.2.253:8041 }, ] for AM appattempt_1424003606313_0012_000001
2015-02-19 19:55:50,688 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl
: appattempt_1424003606313_0012_000001 State change from ALLOCATED to
LAUNCHED
2015-02-19 19:55:50,928 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
container_1424003606313_0012_01_000001 Container Transitioned from ACQUIRED
to RUNNING
2015-02-19 19:55:57,941 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
container_1424003606313_0012_01_000001 Container Transitioned from RUNNING
to COMPLETED
2015-02-19 19:55:57,941 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt:
Completed container: container_1424003606313_0012_01_000001 in state:
COMPLETED event:FINISHED
2015-02-19 19:55:57,942 INFO
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=cloudera
OPERATION=AM Released Container    TARGET=SchedulerApp    RESULT=SUCCESS
APPID=application_1424003606313_0012
CONTAINERID=container_1424003606313_0012_01_000001
2015-02-19 19:55:57,942 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode:
Released container container_1424003606313_0012_01_000001 of capacity
<memory:1024, vCores:1> on host hadoop0.rdpratti.com:8041, which currently
has 0 containers, <memory:0, vCores:0> used and <memory:1457, vCores:2>
available, release resources=true
2015-02-19 19:55:57,942 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
Application attempt appattempt_1424003606313_0012_000001 released container
container_1424003606313_0012_01_000001 on node: host:
hadoop0.rdpratti.com:8041 #containers=0 available=1457 used=0 with event:
FINISHED
2015-02-19 19:55:57,942 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl
: Updating application attempt appattempt_1424003606313_0012_000001 with
final state: FAILED, and exit status: 1
2015-02-19 19:55:57,942 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl
: appattempt_1424003606313_0012_000001 State change from LAUNCHED to
FINAL_SAVING
2015-02-19 19:55:57,942 INFO
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService:
Unregistering app attempt : appattempt_1424003606313_0012_000001
2015-02-19 19:55:57,943 INFO
org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManage
r: Application finished, removing password for
appattempt_1424003606313_0012_000001
2015-02-19 19:55:57,943 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl
: appattempt_1424003606313_0012_000001 State change from FINAL_SAVING to
FAILED
2015-02-19 19:55:57,943 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
Application appattempt_1424003606313_0012_000001 is done. finalState=FAILED
2015-02-19 19:55:57,943 INFO
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService:
Registering app attempt : appattempt_1424003606313_0012_000002
2015-02-19 19:55:57,943 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo:
Application application_1424003606313_0012 requests cleared
2015-02-19 19:55:57,943 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl
: appattempt_1424003606313_0012_000002 State change from NEW to SUBMITTED
2015-02-19 19:55:57,943 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
Added Application Attempt appattempt_1424003606313_0012_000002 to scheduler
from user: cloudera
2015-02-19 19:55:57,943 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl
: appattempt_1424003606313_0012_000002 State change from SUBMITTED to
SCHEDULED
2015-02-19 19:55:58,941 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
Null container completed...
2015-02-19 19:56:03,950 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
container_1424003606313_0012_02_000001 Container Transitioned from NEW to
ALLOCATED
2015-02-19 19:56:03,950 INFO
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=cloudera
OPERATION=AM Allocated Container    TARGET=SchedulerApp    RESULT=SUCCESS
APPID=application_1424003606313_0012
CONTAINERID=container_1424003606313_0012_02_000001
2015-02-19 19:56:03,950 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode:
Assigned container container_1424003606313_0012_02_000001 of capacity
<memory:1024, vCores:1> on host hadoop0.rdpratti.com:8041, which has 1
containers, <memory:1024, vCores:1> used and <memory:433, vCores:1>
available after allocation
2015-02-19 19:56:03,950 INFO
org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerI
nRM: Sending NMToken for nodeId : hadoop0.rdpratti.com:8041 for container :
container_1424003606313_0012_02_000001
2015-02-19 19:56:03,951 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
container_1424003606313_0012_02_000001 Container Transitioned from ALLOCATED
to ACQUIRED
2015-02-19 19:56:03,951 INFO
org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerI
nRM: Clear node set for appattempt_1424003606313_0012_000002
2015-02-19 19:56:03,951 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl
: Storing attempt: AppId: application_1424003606313_0012 AttemptId:
appattempt_1424003606313_0012_000002 MasterContainer: Container:
[ContainerId: container_1424003606313_0012_02_000001, NodeId:
hadoop0.rdpratti.com:8041, NodeHttpAddress: hadoop0.rdpratti.com:8042,
Resource: <memory:1024, vCores:1>, Priority: 0, Token: Token { kind:
ContainerToken, service: 192.168.2.253:8041 }, ]
2015-02-19 19:56:03,952 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl
: appattempt_1424003606313_0012_000002 State change from SCHEDULED to
ALLOCATED_SAVING
2015-02-19 19:56:03,952 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl
: appattempt_1424003606313_0012_000002 State change from ALLOCATED_SAVING to
ALLOCATED
2015-02-19 19:56:03,952 INFO
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher:
Launching masterappattempt_1424003606313_0012_000002
2015-02-19 19:56:03,953 INFO
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting
up container Container: [ContainerId:
container_1424003606313_0012_02_000001, NodeId: hadoop0.rdpratti.com:8041,
NodeHttpAddress: hadoop0.rdpratti.com:8042, Resource: <memory:1024,
vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service:
192.168.2.253:8041 }, ] for AM appattempt_1424003606313_0012_000002
2015-02-19 19:56:03,953 INFO
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Command
to launch container container_1424003606313_0012_02_000001 :
$JAVA_HOME/bin/java -Dlog4j.configuration=container-log4j.properties
-Dyarn.app.container.log.dir=<LOG_DIR> -Dyarn.app.container.log.filesize=0
-Dhadoop.root.logger=INFO,CLA  -Djava.net.preferIPv4Stack=true -Xmx209715200
org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1><LOG_DIR>/stdout
2><LOG_DIR>/stderr 
2015-02-19 19:56:03,953 INFO
org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManage
r: Create AMRMToken for ApplicationAttempt:
appattempt_1424003606313_0012_000002
2015-02-19 19:56:03,953 INFO
org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManage
r: Creating password for appattempt_1424003606313_0012_000002
2015-02-19 19:56:03,974 INFO
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Done
launching container Container: [ContainerId:
container_1424003606313_0012_02_000001, NodeId: hadoop0.rdpratti.com:8041,
NodeHttpAddress: hadoop0.rdpratti.com:8042, Resource: <memory:1024,
vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service:
192.168.2.253:8041 }, ] for AM appattempt_1424003606313_0012_000002
2015-02-19 19:56:03,974 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl
: appattempt_1424003606313_0012_000002 State change from ALLOCATED to
LAUNCHED
2015-02-19 19:56:04,947 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
container_1424003606313_0012_02_000001 Container Transitioned from ACQUIRED
to RUNNING
2015-02-19 19:56:10,956 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
container_1424003606313_0012_02_000001 Container Transitioned from RUNNING
to COMPLETED
2015-02-19 19:56:10,956 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt:
Completed container: container_1424003606313_0012_02_000001 in state:
COMPLETED event:FINISHED
2015-02-19 19:56:10,956 INFO
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=cloudera
OPERATION=AM Released Container    TARGET=SchedulerApp    RESULT=SUCCESS
APPID=application_1424003606313_0012
CONTAINERID=container_1424003606313_0012_02_000001
2015-02-19 19:56:10,956 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode:
Released container container_1424003606313_0012_02_000001 of capacity
<memory:1024, vCores:1> on host hadoop0.rdpratti.com:8041, which currently
has 0 containers, <memory:0, vCores:0> used and <memory:1457, vCores:2>
available, release resources=true
2015-02-19 19:56:10,956 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl
: Updating application attempt appattempt_1424003606313_0012_000002 with
final state: FAILED, and exit status: 1
2015-02-19 19:56:10,956 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
Application attempt appattempt_1424003606313_0012_000002 released container
container_1424003606313_0012_02_000001 on node: host:
hadoop0.rdpratti.com:8041 #containers=0 available=1457 used=0 with event:
FINISHED
2015-02-19 19:56:10,956 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl
: appattempt_1424003606313_0012_000002 State change from LAUNCHED to
FINAL_SAVING
2015-02-19 19:56:10,956 INFO
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService:
Unregistering app attempt : appattempt_1424003606313_0012_000002
2015-02-19 19:56:10,957 INFO
org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManage
r: Application finished, removing password for
appattempt_1424003606313_0012_000002
2015-02-19 19:56:10,957 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl
: appattempt_1424003606313_0012_000002 State change from FINAL_SAVING to
FAILED
2015-02-19 19:56:10,957 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Updating
application application_1424003606313_0012 with final state: FAILED
2015-02-19 19:56:10,957 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
application_1424003606313_0012 State change from ACCEPTED to FINAL_SAVING
2015-02-19 19:56:10,957 INFO
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore:
Updating info for app: application_1424003606313_0012
2015-02-19 19:56:10,957 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
Application appattempt_1424003606313_0012_000002 is done. finalState=FAILED
2015-02-19 19:56:10,957 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo:
Application application_1424003606313_0012 requests cleared
2015-02-19 19:56:10,990 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Application
application_1424003606313_0012 failed 2 times due to AM Container for
appattempt_1424003606313_0012_000002 exited with  exitCode: 1 due to:
Exception from container-launch.
Container id: container_1424003606313_0012_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1: 
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
    at org.apache.hadoop.util.Shell.run(Shell.java:455)
    at
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
    at
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchCon
tainer(DefaultContainerExecutor.java:197)
    at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.Containe
rLaunch.call(ContainerLaunch.java:299)
    at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.Containe
rLaunch.call(ContainerLaunch.java:81)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:11
45)
    at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:6
15)
    at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 1
.Failing this attempt.. Failing the application.
2015-02-19 19:56:10,990 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
application_1424003606313_0012 State change from FINAL_SAVING to FAILED
2015-02-19 19:56:10,991 WARN
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=cloudera
OPERATION=Application Finished - Failed    TARGET=RMAppManager
RESULT=FAILURE    DESCRIPTION=App failed with state: FAILED
PERMISSIONS=Application application_1424003606313_0012 failed 2 times due to
AM Container for appattempt_1424003606313_0012_000002 exited with  exitCode:
1 due to: Exception from container-launch.
Container id: container_1424003606313_0012_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1: 
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
    at org.apache.hadoop.util.Shell.run(Shell.java:455)
    at
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
    at
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchCon
tainer(DefaultContainerExecutor.java:197)
    at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.Containe
rLaunch.call(ContainerLaunch.java:299)
    at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.Containe
rLaunch.call(ContainerLaunch.java:81)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:11
45)
    at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:6
15)
    at java.lang.Thread.run(Thread.java:745)




 

 

From: Xuan Gong [mailto:xgong@hortonworks.com] 
Sent: Thursday, February 19, 2015 8:23 PM
To: user@hadoop.apache.org
Subject: Re: Yarn AM is abending job when submitting a remote job to cluster

 

Hey, Roland:

    Could you also check the RM logs for this application, please? Maybe we
could find something there.

 

Thanks

 

Xuan Gong

 

From: Roland DePratti <roland.depratti@cox.net>
Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Date: Thursday, February 19, 2015 at 5:11 PM
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: RE: Yarn AM is abending job when submitting a remote job to cluster

 

No, I hear you.  

 

I was just stating that since HDFS works, there is something right about the
connectivity, that's all, i.e. the server is reachable and Hadoop was able to
process the request; but like you said, that doesn't mean YARN works.

 

I tried both your solution and Alex's solution, unfortunately without any
improvement.

 

Here is the command I am executing:

 

hadoop jar avgWordlength.jar  solution.AvgWordLength -conf
~/conf/hadoop-cluster.xml /user/cloudera/shakespeare wordlength4

 

Here is the new hadoop-cluster.xml:

 

<?xml version="1.0" encoding="UTF-8"?>

<!--generated by Roland-->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop0.rdpratti.com:8020</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.address</name>
    <value>hadoop0.rdpratti.com:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>hadoop0.rdpratti.com:8032</value>
  </property>
</configuration>





I also deleted the .staging directory under the submitting user, and
restarted the Job History Server.
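
Roughly, the cleanup amounted to something like the following (a sketch; the
path matches the staging directory that appears in the AM log below):

hdfs dfs -rm -r /user/cloudera/.staging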

 

Resubmitted the job with the same result. Here is the log:

 

2015-02-19 19:56:05,061 INFO [main]
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for
application appattempt_1424003606313_0012_000002
2015-02-19 19:56:05,468 WARN [main] org.apache.hadoop.conf.Configuration:
job.xml:an attempt to override final parameter:
hadoop.ssl.require.client.cert;  Ignoring.
2015-02-19 19:56:05,471 WARN [main] org.apache.hadoop.conf.Configuration:
job.xml:an attempt to override final parameter:
mapreduce.job.end-notification.max.retry.interval;  Ignoring.
2015-02-19 19:56:05,471 WARN [main] org.apache.hadoop.conf.Configuration:
job.xml:an attempt to override final parameter: hadoop.ssl.client.conf;
Ignoring.
2015-02-19 19:56:05,473 WARN [main] org.apache.hadoop.conf.Configuration:
job.xml:an attempt to override final parameter:
hadoop.ssl.keystores.factory.class;  Ignoring.
2015-02-19 19:56:05,476 WARN [main] org.apache.hadoop.conf.Configuration:
job.xml:an attempt to override final parameter: hadoop.ssl.server.conf;
Ignoring.
2015-02-19 19:56:05,490 WARN [main] org.apache.hadoop.conf.Configuration:
job.xml:an attempt to override final parameter:
mapreduce.job.end-notification.max.attempts;  Ignoring.
2015-02-19 19:56:05,621 INFO [main]
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Executing with tokens:
2015-02-19 19:56:05,621 INFO [main]
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Kind: YARN_AM_RM_TOKEN,
Service: , Ident:
(org.apache.hadoop.yarn.security.AMRMTokenIdentifier@3909f88f)
2015-02-19 19:56:05,684 INFO [main]
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Using mapred
newApiCommitter.
2015-02-19 19:56:05,923 WARN [main] org.apache.hadoop.conf.Configuration:
job.xml:an attempt to override final parameter:
hadoop.ssl.require.client.cert;  Ignoring.
2015-02-19 19:56:05,925 WARN [main] org.apache.hadoop.conf.Configuration:
job.xml:an attempt to override final parameter:
mapreduce.job.end-notification.max.retry.interval;  Ignoring.
2015-02-19 19:56:05,929 WARN [main] org.apache.hadoop.conf.Configuration:
job.xml:an attempt to override final parameter: hadoop.ssl.client.conf;
Ignoring.
2015-02-19 19:56:05,930 WARN [main] org.apache.hadoop.conf.Configuration:
job.xml:an attempt to override final parameter:
hadoop.ssl.keystores.factory.class;  Ignoring.
2015-02-19 19:56:05,934 WARN [main] org.apache.hadoop.conf.Configuration:
job.xml:an attempt to override final parameter: hadoop.ssl.server.conf;
Ignoring.
2015-02-19 19:56:05,958 WARN [main] org.apache.hadoop.conf.Configuration:
job.xml:an attempt to override final parameter:
mapreduce.job.end-notification.max.attempts;  Ignoring.
2015-02-19 19:56:06,529 WARN [main] org.apache.hadoop.util.NativeCodeLoader:
Unable to load native-hadoop library for your platform... using builtin-java
classes where applicable
2015-02-19 19:56:06,719 INFO [main]
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter set in
config null
2015-02-19 19:56:06,837 INFO [main]
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter is
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
2015-02-19 19:56:06,881 INFO [main]
org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class
org.apache.hadoop.mapreduce.jobhistory.EventType for class
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler
2015-02-19 19:56:06,882 INFO [main]
org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class
org.apache.hadoop.mapreduce.v2.app.job.event.JobEventType for class
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher
2015-02-19 19:56:06,882 INFO [main]
org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class
org.apache.hadoop.mapreduce.v2.app.job.event.TaskEventType for class
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher
2015-02-19 19:56:06,883 INFO [main]
org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class
org.apache.hadoop.mapreduce.v2.app.job.event.TaskAttemptEventType for class
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher
2015-02-19 19:56:06,884 INFO [main]
org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class
org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventType for class
org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler
2015-02-19 19:56:06,885 INFO [main]
org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class
org.apache.hadoop.mapreduce.v2.app.speculate.Speculator$EventType for class
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$SpeculatorEventDispatcher
2015-02-19 19:56:06,885 INFO [main]
org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class
org.apache.hadoop.mapreduce.v2.app.rm.ContainerAllocator$EventType for class
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter
2015-02-19 19:56:06,886 INFO [main]
org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncher$EventType for
class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerLauncherRouter
2015-02-19 19:56:06,899 INFO [main]
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Recovery is enabled. Will
try to recover from previous life on best effort basis.
2015-02-19 19:56:06,918 INFO [main]
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Previous history file is at
hdfs://hadoop0.rdpratti.com:8020/user/cloudera/.staging/job_1424003606313_00
12/job_1424003606313_0012_1.jhist
2015-02-19 19:56:07,377 INFO [main]
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Read completed tasks from
history 0
2015-02-19 19:56:07,423 INFO [main]
org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class
org.apache.hadoop.mapreduce.v2.app.job.event.JobFinishEvent$Type for class
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler
2015-02-19 19:56:07,453 INFO [main]
org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from
hadoop-metrics2.properties
2015-02-19 19:56:07,507 INFO [main]
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period
at 10 second(s).
2015-02-19 19:56:07,507 INFO [main]
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MRAppMaster metrics
system started
2015-02-19 19:56:07,515 INFO [main]
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Adding job token for
job_1424003606313_0012 to jobTokenSecretManager
2015-02-19 19:56:07,536 INFO [main]
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Not uberizing
job_1424003606313_0012 because: not enabled; too much RAM;
2015-02-19 19:56:07,555 INFO [main]
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Input size for job
job_1424003606313_0012 = 5343207. Number of splits = 5
2015-02-19 19:56:07,557 INFO [main]
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Number of reduces for
job job_1424003606313_0012 = 1
2015-02-19 19:56:07,557 INFO [main]
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl:
job_1424003606313_0012Job Transitioned from NEW to INITED
2015-02-19 19:56:07,558 INFO [main]
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster launching
normal, non-uberized, multi-container job job_1424003606313_0012.
2015-02-19 19:56:07,618 INFO [main] org.apache.hadoop.ipc.CallQueueManager:
Using callQueue class java.util.concurrent.LinkedBlockingQueue
2015-02-19 19:56:07,630 INFO [Socket Reader #1 for port 46841]
org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 46841
2015-02-19 19:56:07,648 INFO [main]
org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding
protocol org.apache.hadoop.mapreduce.v2.api.MRClientProtocolPB to the server
2015-02-19 19:56:07,648 INFO [IPC Server Responder]
org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2015-02-19 19:56:07,649 INFO [main]
org.apache.hadoop.mapreduce.v2.app.client.MRClientService: Instantiated
MRClientService at hadoop0.rdpratti.com/192.168.2.253:46841
2015-02-19 19:56:07,650 INFO [IPC Server listener on 46841]
org.apache.hadoop.ipc.Server: IPC Server listener on 46841: starting
2015-02-19 19:56:07,721 INFO [main] org.mortbay.log: Logging to
org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
org.mortbay.log.Slf4jLog
2015-02-19 19:56:07,727 INFO [main] org.apache.hadoop.http.HttpRequestLog:
Http request log for http.requests.mapreduce is not defined
2015-02-19 19:56:07,739 INFO [main] org.apache.hadoop.http.HttpServer2:
Added global filter 'safety'
(class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2015-02-19 19:56:07,745 INFO [main] org.apache.hadoop.http.HttpServer2:
Added filter AM_PROXY_FILTER
(class=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter) to
context mapreduce
2015-02-19 19:56:07,745 INFO [main] org.apache.hadoop.http.HttpServer2:
Added filter AM_PROXY_FILTER
(class=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter) to
context static
2015-02-19 19:56:07,749 INFO [main] org.apache.hadoop.http.HttpServer2:
adding path spec: /mapreduce/*
2015-02-19 19:56:07,749 INFO [main] org.apache.hadoop.http.HttpServer2:
adding path spec: /ws/*
2015-02-19 19:56:07,760 INFO [main] org.apache.hadoop.http.HttpServer2:
Jetty bound to port 39939
2015-02-19 19:56:07,760 INFO [main] org.mortbay.log: jetty-6.1.26.cloudera.4
2015-02-19 19:56:07,789 INFO [main] org.mortbay.log: Extract
jar:file:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/jars/hadoop-yarn-c
ommon-2.5.0-cdh5.3.0.jar!/webapps/mapreduce to
/tmp/Jetty_0_0_0_0_39939_mapreduce____.o5qk0w/webapp
2015-02-19 19:56:08,156 INFO [main] org.mortbay.log: Started
HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:39939
2015-02-19 19:56:08,157 INFO [main] org.apache.hadoop.yarn.webapp.WebApps:
Web app /mapreduce started at 39939
2015-02-19 19:56:08,629 INFO [main] org.apache.hadoop.yarn.webapp.WebApps:
Registered webapp guice modules
2015-02-19 19:56:08,634 INFO [main] org.apache.hadoop.ipc.CallQueueManager:
Using callQueue class java.util.concurrent.LinkedBlockingQueue
2015-02-19 19:56:08,635 INFO [Socket Reader #1 for port 43858]
org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 43858
2015-02-19 19:56:08,639 INFO [IPC Server Responder]
org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2015-02-19 19:56:08,642 INFO [IPC Server listener on 43858]
org.apache.hadoop.ipc.Server: IPC Server listener on 43858: starting
2015-02-19 19:56:08,663 INFO [main]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor:
nodeBlacklistingEnabled:true
2015-02-19 19:56:08,663 INFO [main]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor:
maxTaskFailuresPerNode is 3
2015-02-19 19:56:08,663 INFO [main]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor:
blacklistDisablePercent is 33
2015-02-19 19:56:08,797 WARN [main] org.apache.hadoop.conf.Configuration:
job.xml:an attempt to override final parameter:
hadoop.ssl.require.client.cert;  Ignoring.
2015-02-19 19:56:08,798 WARN [main] org.apache.hadoop.conf.Configuration:
job.xml:an attempt to override final parameter:
mapreduce.job.end-notification.max.retry.interval;  Ignoring.
2015-02-19 19:56:08,798 WARN [main] org.apache.hadoop.conf.Configuration:
job.xml:an attempt to override final parameter: hadoop.ssl.client.conf;
Ignoring.
2015-02-19 19:56:08,798 WARN [main] org.apache.hadoop.conf.Configuration:
job.xml:an attempt to override final parameter:
hadoop.ssl.keystores.factory.class;  Ignoring.
2015-02-19 19:56:08,799 WARN [main] org.apache.hadoop.conf.Configuration:
job.xml:an attempt to override final parameter: hadoop.ssl.server.conf;
Ignoring.
2015-02-19 19:56:08,809 WARN [main] org.apache.hadoop.conf.Configuration:
job.xml:an attempt to override final parameter:
mapreduce.job.end-notification.max.attempts;  Ignoring.
2015-02-19 19:56:08,821 INFO [main] org.apache.hadoop.yarn.client.RMProxy:
Connecting to ResourceManager at quickstart.cloudera/192.168.2.185:8030
2015-02-19 19:56:08,975 WARN [main]
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:cloudera (auth:SIMPLE)
cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token
.SecretManager$InvalidToken): appattempt_1424003606313_0012_000002 not found
in AMRMTokenSecretManager.
2015-02-19 19:56:08,976 WARN [main] org.apache.hadoop.ipc.Client: Exception
encountered while connecting to the server :
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.Secre
tManager$InvalidToken): appattempt_1424003606313_0012_000002 not found in
AMRMTokenSecretManager.
2015-02-19 19:56:08,976 WARN [main]
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:cloudera (auth:SIMPLE)
cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token
.SecretManager$InvalidToken): appattempt_1424003606313_0012_000002 not found
in AMRMTokenSecretManager.
2015-02-19 19:56:08,981 ERROR [main]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Exception while
registering
org.apache.hadoop.security.token.SecretManager$InvalidToken:
appattempt_1424003606313_0012_000002 not found in AMRMTokenSecretManager.
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
        at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAcces
sorImpl.java:57)
        at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstruc
torAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at
org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
        at
org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104)
        at
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientI
mpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:109
)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57
)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl
.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocati
onHandler.java:187)
        at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHand
ler.java:102)
        at com.sun.proxy.$Proxy36.registerApplicationMaster(Unknown Source)
        at
org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.register(RMCommunicator
.java:161)
        at
org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.serviceStart(RMCommunic
ator.java:122)
        at
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.serviceStart(RMCo
ntainerAllocator.java:238)
        at
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.serv
iceStart(MRAppMaster.java:807)
        at
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at
org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.jav
a:120)
        at
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java
:1075)
        at
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1478)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.ja
va:1642)
        at
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMa
ster.java:1474)
        at
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1407)
Caused by:
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.Secre
tManager$InvalidToken): appattempt_1424003606313_0012_000002 not found in
AMRMTokenSecretManager.
        at org.apache.hadoop.ipc.Client.call(Client.java:1411)
        at org.apache.hadoop.ipc.Client.call(Client.java:1364)
        at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.jav
a:206)
        at com.sun.proxy.$Proxy35.registerApplicationMaster(Unknown Source)
        at
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientI
mpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106
)
        ... 22 more
2015-02-19 19:56:08,983 INFO [main]
org.apache.hadoop.service.AbstractService: Service RMCommunicator failed in
state STARTED; cause:
org.apache.hadoop.yarn.exceptions.YarnRuntimeException:
org.apache.hadoop.security.token.SecretManager$InvalidToken:
appattempt_1424003606313_0012_000002 not found in AMRMTokenSecretManager.
org.apache.hadoop.yarn.exceptions.YarnRuntimeException:
org.apache.hadoop.security.token.SecretManager$InvalidToken:
appattempt_1424003606313_0012_000002 not found in AMRMTokenSecretManager.

 

 

 

 

From: Ulul [mailto:hadoop@ulul.org] 
Sent: Thursday, February 19, 2015 5:08 PM
To: user@hadoop.apache.org
Subject: Re: Yarn AM is abending job when submitting a remote job to cluster

 

Is your point that using the hdfs:// prefix is valid since your HDFS
client works?
fs.defaultFS defines the namenode address and the filesystem type. It doesn't
imply that the prefix should be used for YARN and MapReduce options that are
not directly linked to HDFS.





On 19/02/2015 22:56, Ulul wrote:

In that case it's just between your HDFS client, the NN and the DNs; no YARN
or MR component is involved.
The fact that this works has no bearing on your MR job not succeeding.





On 19/02/2015 22:45, roland.depratti wrote:

Thanks for looking at my problem.

 

I can run an HDFS command from the client, with the config file listed, that
does a cat on a file in HDFS on the remote cluster and returns the contents
of that file to the client.
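
As a sketch of that kind of check (the file name is only a placeholder):

hadoop fs -conf ~/conf/hadoop-cluster.xml -cat /user/cloudera/shakespeare/<file>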

 

- rd

 

 

Sent from my Verizon Wireless 4G LTE smartphone



-------- Original message --------
From: Ulul <hadoop@ulul.org>
Date: 02/19/2015 4:03 PM (GMT-05:00)
To: user@hadoop.apache.org 
Subject: Re: Yarn AM is abending job when submitting a remote job to cluster


Hi
Doesn't seem like an SSL error to me (the log states that attempts to
override final properties are ignored).

On the other hand, the configuration seems wrong:
mapreduce.jobtracker.address and yarn.resourcemanager.address should
only contain an IP or a hostname. You should remove 'hdfs://', though the
log doesn't suggest it has anything to do with your problem....
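
For illustration, the corrected properties would look something like this (a
sketch; hostname and port taken from the values quoted below, minus the
hdfs:// prefix):

  <property>
    <name>mapreduce.jobtracker.address</name>
    <value>mycluster:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>mycluster:8032</value>
  </property>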

And what do you mean by an "HDFS job"?

Ulul

On 19/02/2015 04:22, daemeon reiydelle wrote:
> I would guess you do not have your SSL certs set up, client or server,
> based on the error.
>
> “Life should not be a journey to the grave with the intention of
> arriving safely in a pretty and well preserved body, but rather to skid
> in broadside in a cloud of smoke, thoroughly used up, totally worn out,
> and loudly proclaiming “Wow! What a Ride!”
> - Hunter Thompson
>
> Daemeon C.M. Reiydelle
> USA (+1) 415.501.0198
> London (+44) (0) 20 8144 9872
>
> On Wed, Feb 18, 2015 at 5:19 PM, Roland DePratti 
> <roland.depratti@cox.net> wrote:
>
>     I have been searching for a handle on a problem with very few
>     clues. Any help pointing me in the right direction will be
>     huge.
>
>     I have not received any input from the Cloudera Google groups.
>     Perhaps this is more YARN-based and I am hoping I have more luck here.
>
>     Any help is greatly appreciated.
>
>     I am running a Hadoop cluster using CDH5.3. I also have a client
>     machine with a standalone one node setup (VM).
>
>     All environments are running CentOS 6.6.
>
>     I have submitted some Java mapreduce jobs locally on both the
>     cluster and the standalone environment with successful completions.
>
>     I can submit a remote HDFS job from client to cluster using -conf
>     hadoop-cluster.xml (see below) and get data back from the cluster
>     with no problem.
>
>     When submitting the mapreduce jobs remotely, I get an AM
>     error:
>
>     AM fails the job with the error:
>
>
>                SecretManager$InvalidToken:
>     appattempt_1424003606313_0001_000002 not found in
>     AMRMTokenSecretManager
>
>
>     I searched /var/log/secure on the client and cluster with no
>     unusual messages.
>
>     Here is the contents of hadoop-cluster.xml:
>
>     <?xml version="1.0" encoding="UTF-8"?>
>
>     <!--generated by Roland-->
>     <configuration>
>       <property>
>         <name>fs.defaultFS</name>
>         <value>hdfs://mycluser:8020</value>
>       </property>
>       <property>
>     <name>mapreduce.jobtracker.address</name>
>         <value>hdfs://mycluster:8032</value>
>       </property>
>       <property>
>     <name>yarn.resourcemanager.address</name>
>         <value>hdfs://mycluster:8032</value>
>       </property>
>     </configuration>
>
>     Here is the output from the job log on the cluster:
>
>     2015-02-15 07:51:06,544 INFO [main]
>     org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created
>     MRAppMaster for application appattempt_1424003606313_0001_000002
>
>     2015-02-15 07:51:06,949 WARN [main]
>     org.apache.hadoop.conf.Configuration: job.xml:an attempt to
>     override final parameter: hadoop.ssl.require.client.cert;  Ignoring.
>
>     2015-02-15 07:51:06,952 WARN [main]
>     org.apache.hadoop.conf.Configuration: job.xml:an attempt to
>     override final parameter:
>     mapreduce.job.end-notification.max.retry.interval; Ignoring.
>
>     2015-02-15 07:51:06,952 WARN [main]
>     org.apache.hadoop.conf.Configuration: job.xml:an attempt to
>     override final parameter: hadoop.ssl.client.conf;  Ignoring.
>
>     2015-02-15 07:51:06,954 WARN [main]
>     org.apache.hadoop.conf.Configuration: job.xml:an attempt to
>     override final parameter: hadoop.ssl.keystores.factory.class; 
>     Ignoring.
>
>     2015-02-15 07:51:06,957 WARN [main]
>     org.apache.hadoop.conf.Configuration: job.xml:an attempt to
>     override final parameter: hadoop.ssl.server.conf;  Ignoring.
>
>     2015-02-15 07:51:06,973 WARN [main]
>     org.apache.hadoop.conf.Configuration: job.xml:an attempt to
>     override final parameter:
>     mapreduce.job.end-notification.max.attempts; Ignoring.
>
>     2015-02-15 07:51:07,241 INFO [main]
>     org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Executing with tokens:
>
>     2015-02-15 07:51:07,241 INFO [main]
>     org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Kind:
>     YARN_AM_RM_TOKEN, Service: , Ident:
>     (org.apache.hadoop.yarn.security.AMRMTokenIdentifier@33be1aa0)
>
>     2015-02-15 07:51:07,332 INFO [main]
>     org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Using mapred
>     newApiCommitter.
>
>     2015-02-15 07:51:07,627 WARN [main]
>     org.apache.hadoop.conf.Configuration: job.xml:an attempt to
>     override final parameter: hadoop.ssl.require.client.cert;  Ignoring.
>
>     2015-02-15 07:51:07,632 WARN [main]
>     org.apache.hadoop.conf.Configuration: job.xml:an attempt to
>     override final parameter:
>     mapreduce.job.end-notification.max.retry.interval; Ignoring.
>
>     2015-02-15 07:51:07,632 WARN [main]
>     org.apache.hadoop.conf.Configuration: job.xml:an attempt to
>     override final parameter: hadoop.ssl.client.conf;  Ignoring.
>
>     2015-02-15 07:51:07,639 WARN [main]
>     org.apache.hadoop.conf.Configuration: job.xml:an attempt to
>     override final parameter: hadoop.ssl.keystores.factory.class; 
>     Ignoring.
>
>     2015-02-15 07:51:07,645 WARN [main]
>     org.apache.hadoop.conf.Configuration: job.xml:an attempt to
>     override final parameter: hadoop.ssl.server.conf;  Ignoring.
>
>     2015-02-15 07:51:07,663 WARN [main]
>     org.apache.hadoop.conf.Configuration: job.xml:an attempt to
>     override final parameter:
>     mapreduce.job.end-notification.max.attempts; Ignoring.
>
>     2015-02-15 07:51:08,237 WARN [main]
>     org.apache.hadoop.util.NativeCodeLoader: Unable to load
>     native-hadoop library for your platform... using builtin-java
>     classes where applicable
>
>     2015-02-15 07:51:08,429 INFO [main]
>     org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter
>     set in config null
>
>     2015-02-15 07:51:08,499 INFO [main]
>     org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter is
>     org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
>
>     2015-02-15 07:51:08,526 INFO [main]
>     org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class
>     org.apache.hadoop.mapreduce.jobhistory.EventType for class
>     org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler
>
>     2015-02-15 07:51:08,527 INFO [main]
>     org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class
>     org.apache.hadoop.mapreduce.v2.app.job.event.JobEventType for
>     class
>     org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher
>
>     2015-02-15 07:51:08,561 INFO [main]
>     org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class
>     org.apache.hadoop.mapreduce.v2.app.job.event.TaskEventType for
>     class
>     org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher
>
>     2015-02-15 07:51:08,562 INFO [main]
>     org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class
>     org.apache.hadoop.mapreduce.v2.app.job.event.TaskAttemptEventType
>     for class
>
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher
>
>     2015-02-15 07:51:08,566 INFO [main]
>     org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class
>     org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventType for
>     class org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler
>
>     2015-02-15 07:51:08,568 INFO [main]
>     org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class
>     org.apache.hadoop.mapreduce.v2.app.speculate.Speculator$EventType
>     for class
>
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$SpeculatorEventDispatcher
>
>     2015-02-15 07:51:08,568 INFO [main]
>     org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class
>     org.apache.hadoop.mapreduce.v2.app.rm.ContainerAllocator$EventType
>     for class
>
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter
>
>     2015-02-15 07:51:08,570 INFO [main]
>     org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class
>
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncher$EventType
>     for class
>     org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerLauncherRouter
>
>     2015-02-15 07:51:08,599 INFO [main]
>     org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Recovery is
>     enabled. Will try to recover from previous life on best effort basis.
>
>     2015-02-15 07:51:08,642 INFO [main]
>     org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Previous history
>     file is at
>     hdfs://mycluster.com:8020/user/cloudera/.staging/job_1424003606313_0001/job_1424003606313_0001_1.jhist
>
>     2015-02-15 07:51:09,147 INFO [main]
>     org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Read
>     completed tasks from history 0
>
>     2015-02-15 07:51:09,193 INFO [main]
>     org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class
>     org.apache.hadoop.mapreduce.v2.app.job.event.JobFinishEvent$Type
>     for class
>     org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler
>
>     2015-02-15 07:51:09,222 INFO [main]
>     org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties
>     from hadoop-metrics2.properties
>
>     2015-02-15 07:51:09,277 INFO [main]

 

 

 

