hive-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: Seeing issues Jobs failing using yarn for setting spark.master=yarn-client in Hive or in mapred for mapreduce.framework.name
Date Sun, 22 May 2016 23:01:09 GMT
Sorted it out. Sometimes simplest of things can derail one :)

It turned out that when running in cluster mode, the local directories used
by the Spark executors and the Spark driver are the local directories
configured for YARN in yarn-site.xml (yarn.nodemanager.local-dirs).
If the user specifies spark.local.dir, it will be ignored.

In yarn-client mode, the Spark executors will use the local directories
configured for YARN while the Spark driver will use those defined in
spark.local.dir. This is because the Spark driver does not run on the YARN
cluster in yarn-client mode, only the
Spark executors do.

So all I did was change the following settings. In yarn-site.xml:

<property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/tmp</value>
</property>

and in mapred-site.xml I set:

<property>
    <name>mapreduce.cluster.local.dir</name>
    <value>/tmp</value>
</property>
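
For what it's worth, the two roots seen in the logs below line up with the
stock defaults: yarn.nodemanager.local-dirs defaults to
${hadoop.tmp.dir}/nm-local-dir, and hadoop.tmp.dir defaults to
/tmp/hadoop-${user.name}, which would account for the container executor
resolving /tmp/hadoop-hduser/nm-local-dir while the localizer wrote under
/data6/hduser/tmp/nm-local-dir. A quick sketch to confirm the relevant keys
agree (assuming HADOOP_CONF_DIR points at the config the daemons actually
load, and that each value sits on the line after its name):

  # print the local-dir settings from the live config files
  grep -A1 -E 'yarn.nodemanager.local-dirs|mapreduce.cluster.local.dir' \
      "$HADOOP_CONF_DIR"/yarn-site.xml "$HADOOP_CONF_DIR"/mapred-site.xml

  # hadoop.tmp.dir (core-site.xml) drives both defaults when they are unset
  grep -A1 'hadoop.tmp.dir' "$HADOOP_CONF_DIR"/core-site.xml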

HTH





Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 22 May 2016 at 15:53, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:

> I started seeing this issue when I tried to use TEZ, as well as Spark and
> mr, as execution engines for Hive.
>
>
>
> Anyway I got rid of TEZ for now.
>
>
>
> The thing I have noticed is that with set spark.master=yarn-client; in Hive,
> jobs fail whether Hive uses mr or spark as the execution engine. The same
> goes if I set the following in mapred-site.xml:
>
>
>
> <property>
>
>    <name>mapreduce.framework.name</name>
>
>    <value>yarn</value>
>
> </property>
>
>
>
> When I use “set spark.master=local” or <value>local</value>, it works.
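>
> A minimal sketch of the combinations in question, from the Hive session
> (hive.execution.engine is the standard Hive switch; illustrative rather
> than a transcript):
>
>   -- Hive on Spark: fails against YARN, works locally
>   set hive.execution.engine=spark;
>   set spark.master=yarn-client;
>   -- set spark.master=local;   (this combination works)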
>
>
>
> These are the diagnostics from yarn logs.
>
>
> [inline image: YARN application diagnostics screenshot]
>
> If I look at the logs I can see where the failure is coming from.
>
>
>
> From the resource manager log, my notes:
>
>
>
> 2016-05-22 13:24:02,631 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Application
> application_1463911910089_0003 failed 2 times due to AM Container for
> appattempt_1463911910089_0003_000002 exited with  exitCode: -1
>
> For more detailed output, check application tracking page:
> http://rhes564:8088/proxy/application_1463911910089_0003/Then, click on
> links to logs of each attempt.
>
> *Diagnostics: File
> /data6/hduser/tmp/nm-local-dir/usercache/hduser/appcache/application_1463911910089_0003/container_1463911910089_0003_02_000001
> does not exist*
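>
> To pull the full container logs from the command line rather than the
> tracking page (assuming log aggregation is enabled), the usual route is:
>
>   yarn logs -applicationId application_1463911910089_0003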
>
>
>
> From the node manager log, my notes:
>
>
>
>
>
> *--yarn stuff*
>
> 2016-05-22 13:23:55,630 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
> Resource
> hdfs://rhes564:9000/tmp/hadoop-yarn/staging/hduser/.staging/job_1463911910089_0003/job.splitmetainfo
> transitioned from INIT to DOWNLOADING. *It is in
> /tmp/nm-local-dir/usercache/hduser/appcache/application_1463843859823_0003/filecache/10/job.splitmetainfo*
>
> 2016-05-22 13:23:55,631 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
> Resource
> hdfs://rhes564:9000/tmp/hadoop-yarn/staging/hduser/.staging/job_1463911910089_0003/job.jar
> transitioned from INIT to DOWNLOADING. *It is in
> /tmp/nm-local-dir/usercache/hduser/appcache/application_1463843859823_0003/filecache/11/job.jar/job.jar*
>
> 2016-05-22 13:23:55,631 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
> Resource
> hdfs://rhes564:9000/tmp/hadoop-yarn/staging/hduser/.staging/job_1463911910089_0003/job.split
> transitioned from INIT to DOWNLOADING
>
> 2016-05-22 13:23:55,631 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
> Resource
> hdfs://rhes564:9000/tmp/hadoop-yarn/staging/hduser/.staging/job_1463911910089_0003/job.xml
> transitioned from INIT to DOWNLOADING
>
> *The localized copies land under:*
>
> /tmp/nm-local-dir/usercache/hduser/appcache/application_1463843859823_0003/filecache/13/job.xml
> /tmp/nm-local-dir/usercache/hduser/appcache/application_1463843859823_0003/filecache/11/job.jar
> /tmp/nm-local-dir/usercache/hduser/appcache/application_1463843859823_0003/filecache/11/job.jar/job.jar
> /tmp/nm-local-dir/usercache/hduser/appcache/application_1463843859823_0003/filecache/10/job.splitmetainfo
> /tmp/nm-local-dir/usercache/hduser/appcache/application_1463843859823_0003/filecache/12/job.split
>
>
>
> *Hive stuff*
>
> 2016-05-22 13:23:55,631 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
> Resource
> hdfs://rhes564:9000/tmp/hive/hduser/848605bf-4c31-4835-8c2c-1822ab5778d5/hive_2016-05-22_13-23-51_579_632928559015756974-1/-mr-10005/f57bfa89-069e-4346-9334-ce333a930113/
> *reduce.xml* transitioned from INIT to DOWNLOADING
>
> 2016-05-22 13:23:55,631 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
> Resource
> hdfs://rhes564:9000/tmp/hive/hduser/848605bf-4c31-4835-8c2c-1822ab5778d5/hive_2016-05-22_13-23-51_579_632928559015756974-1/-mr-10005/f57bfa89-069e-4346-9334-ce333a930113/
> *map.xml* transitioned from INIT to DOWNLOADING
>
> 2016-05-22 13:23:55,631 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
> Created localizer for container_1463911910089_0003_01_000001
>
> 2016-05-22 13:23:55,637 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
> Writing credentials to the nmPrivate file
> /data6/hduser/tmp/nm-local-dir/nmPrivate/container_1463911910089_0003_01_000001.tokens.
> Credentials list:
>
> 2016-05-22 13:23:55,643 INFO
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
> Initializing user hduser
>
> 2016-05-22 13:23:55,650 INFO
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Copying
> from
> */data6/hduser/tmp/nm-local-dir/nmPrivate/container_1463911910089_0003_01_000001.tokens*
> to
> /tmp/hadoop-hduser/nm-local-dir/usercache/hduser/appcache/application_1463911910089_0003/container_1463911910089_0003_01_000001.tokens
>
> *Source Ok*
>
> *ls -ls
> /data6/hduser/tmp/nm-local-dir/nmPrivate/container_1463911910089_0003_01_000001.tokens*
>
> *8 -rw-r--r-- 1 hduser hadoop 105 May 22 13:23
> /data6/hduser/tmp/nm-local-dir/nmPrivate/container_1463911910089_0003_01_000001.tokens*
>
> *Target file copy fails*
>
> ls -l
> /tmp/hadoop-hduser/nm-local-dir/usercache/hduser/appcache/application_1463911910089_0003/container_1463911910089_0003_01_000001.tokens
>
>
>
> ls:
> /tmp/hadoop-hduser/nm-local-dir/usercache/hduser/appcache/application_1463911910089_0003/container_1463911910089_0003_01_000001.tokens:
> No such file or directory
>
>
>
> *But these empty directories are created*
>
>
>
> ltr
> /tmp/hadoop-hduser/nm-local-dir/usercache/hduser/appcache/application_1463911910089_0003
>
> drwx--x--- 2 hduser hadoop 4096 May 22 13:23 filecache
>
> drwx--x--- 2 hduser hadoop 4096 May 22 13:23
> container_1463911910089_0003_01_000001
>
> drwx--x--- 2 hduser hadoop 4096 May 22 13:23
> container_1463911910089_0003_02_000001
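>
> *The interesting bit is that files end up split across two different
> nm-local-dir roots. A quick way to see both side by side (a sketch; paths
> taken from the logs above):*
>
>   ls -ld /data6/hduser/tmp/nm-local-dir /tmp/hadoop-hduser/nm-local-dir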
>
>
>
>
>
>
>
> *cat yarn-hduser-resourcemanager-rhes564.log*
>
>
>
> 2016-05-22 13:23:51,850 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated
> new applicationId: 3
>
> 2016-05-22 13:23:54,711 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Application
> with id 3 submitted by user hduser
>
> 2016-05-22 13:23:54,711 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Storing
> application with id application_1463911910089_0003
>
> 2016-05-22 13:23:54,711 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
> application_1463911910089_0003 State change from NEW to NEW_SAVING
>
> 2016-05-22 13:23:54,711 INFO
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hduser
> IP=50.140.197.217       OPERATION=Submit Application Request
> TARGET=ClientRMService  RESULT=SUCCESS
> APPID=application_1463911910089_0003
>
> 2016-05-22 13:23:54,711 INFO
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore:
> Storing info for app: application_1463911910089_0003
>
> 2016-05-22 13:23:54,711 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
> application_1463911910089_0003 State change from NEW_SAVING to SUBMITTED
>
> 2016-05-22 13:23:54,711 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
> Application added - appId: application_1463911910089_0003 user: hduser
> leaf-queue of parent: root #applications: 1
>
> 2016-05-22 13:23:54,711 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
> Accepted application application_1463911910089_0003 from user: hduser, in
> queue: default
>
> 2016-05-22 13:23:54,711 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
> application_1463911910089_0003 State change from SUBMITTED to ACCEPTED
>
> 2016-05-22 13:23:54,712 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService:
> Registering app attempt : appattempt_1463911910089_0003_000001
>
> 2016-05-22 13:23:54,712 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
> appattempt_1463911910089_0003_000001 State change from NEW to SUBMITTED
>
> 2016-05-22 13:23:54,712 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Application application_1463911910089_0003 from user: hduser activated in
> queue: default
>
> 2016-05-22 13:23:54,712 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Application added - appId: application_1463911910089_0003 user:
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue$User@60864b2b,
> leaf-queue: default #user-pending-applications: 0
> #user-active-applications: 1 #queue-pending-applications: 0
> #queue-active-applications: 1
>
> 2016-05-22 13:23:54,712 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
> Added Application Attempt appattempt_1463911910089_0003_000001 to scheduler
> from user hduser in queue default
>
> 2016-05-22 13:23:54,713 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
> appattempt_1463911910089_0003_000001 State change from SUBMITTED to
> SCHEDULED
>
> 2016-05-22 13:23:55,607 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
> container_1463911910089_0003_01_000001 Container Transitioned from NEW to
> ALLOCATED
>
> 2016-05-22 13:23:55,607 INFO
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hduser
> OPERATION=AM Allocated Container        TARGET=SchedulerApp
> RESULT=SUCCESS  APPID=application_1463911910089_0003
> CONTAINERID=container_1463911910089_0003_01_000001
>
> 2016-05-22 13:23:55,607 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode:
> Assigned container container_1463911910089_0003_01_000001 of capacity
> <memory:4096, vCores:1> on host rhes564:49141, which has 1 containers,
> <memory:4096, vCores:1> used and <memory:4096, vCores:7> available after
> allocation
>
> 2016-05-22 13:23:55,607 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> assignedContainer application attempt=appattempt_1463911910089_0003_000001
> container=Container: [ContainerId: container_1463911910089_0003_01_000001,
> NodeId: rhes564:49141, NodeHttpAddress: rhes564:8042, Resource:
> <memory:4096, vCores:1>, Priority: 0, Token: null, ] queue=default:
> capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0, vCores:0>,
> usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=1, numContainers=0
> clusterResource=<memory:8192, vCores:8>
>
> 2016-05-22 13:23:55,608 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
> Re-sorting assigned queue: root.default stats: default: capacity=1.0,
> absoluteCapacity=1.0, usedResources=<memory:4096, vCores:1>,
> usedCapacity=0.5, absoluteUsedCapacity=0.5, numApps=1, numContainers=1
>
> 2016-05-22 13:23:55,608 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
> assignedContainer queue=root usedCapacity=0.5 absoluteUsedCapacity=0.5
> used=<memory:4096, vCores:1> cluster=<memory:8192, vCores:8>
>
> 2016-05-22 13:23:55,609 INFO
> org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM:
> Sending NMToken for nodeId : rhes564:49141 for container :
> container_1463911910089_0003_01_000001
>
> 2016-05-22 13:23:55,611 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
> container_1463911910089_0003_01_000001 Container Transitioned from
> ALLOCATED to ACQUIRED
>
> 2016-05-22 13:23:55,611 INFO
> org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM:
> Clear node set for appattempt_1463911910089_0003_000001
>
> 2016-05-22 13:23:55,611 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
> Storing attempt: AppId: application_1463911910089_0003 AttemptId:
> appattempt_1463911910089_0003_000001 MasterContainer: Container:
> [ContainerId: container_1463911910089_0003_01_000001, NodeId:
> rhes564:49141, NodeHttpAddress: rhes564:8042, Resource: <memory:4096,
> vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service:
> 50.140.197.217:49141 }, ]
>
> 2016-05-22 13:23:55,611 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
> appattempt_1463911910089_0003_000001 State change from SCHEDULED to
> ALLOCATED_SAVING
>
> 2016-05-22 13:23:55,611 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
> appattempt_1463911910089_0003_000001 State change from ALLOCATED_SAVING to
> ALLOCATED
>
> 2016-05-22 13:23:55,612 INFO
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher:
> Launching masterappattempt_1463911910089_0003_000001
>
> 2016-05-22 13:23:55,614 INFO
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher:
> Setting up container Container: [ContainerId:
> container_1463911910089_0003_01_000001, NodeId: rhes564:49141,
> NodeHttpAddress: rhes564:8042, Resource: <memory:4096, vCores:1>, Priority:
> 0, Token: Token { kind: ContainerToken, service: 50.140.197.217:49141 },
> ] for AM appattempt_1463911910089_0003_000001
>
> 2016-05-22 13:23:55,614 INFO
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher:
> Command to launch container container_1463911910089_0003_01_000001 :
> $JAVA_HOME/bin/java -Dlog4j.configuration=container-log4j.properties
> -Dyarn.app.container.log.dir=<LOG_DIR> -Dyarn.app.container.log.filesize=0
> -Dhadoop.root.logger=INFO,CLA  -Xmx1024m
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1><LOG_DIR>/stdout
> 2><LOG_DIR>/stderr
>
> 2016-05-22 13:23:55,615 INFO
> org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager:
> Create AMRMToken for ApplicationAttempt:
> appattempt_1463911910089_0003_000001
>
> 2016-05-22 13:23:55,615 INFO
> org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager:
> Creating password for appattempt_1463911910089_0003_000001
>
> 2016-05-22 13:23:55,630 INFO
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Done
> launching container Container: [ContainerId:
> container_1463911910089_0003_01_000001, NodeId: rhes564:49141,
> NodeHttpAddress: rhes564:8042, Resource: <memory:4096, vCores:1>, Priority:
> 0, Token: Token { kind: ContainerToken, service: 50.140.197.217:49141 },
> ] for AM appattempt_1463911910089_0003_000001
>
> 2016-05-22 13:23:55,630 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
> appattempt_1463911910089_0003_000001 State change from ALLOCATED to LAUNCHED
>
> 2016-05-22 13:23:56,610 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
> container_1463911910089_0003_01_000001 Container Transitioned from ACQUIRED
> to RUNNING
>
> 2016-05-22 13:23:58,616 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
> container_1463911910089_0003_01_000001 Container Transitioned from RUNNING
> to COMPLETED
>
> 2016-05-22 13:23:58,617 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp:
> Completed container: container_1463911910089_0003_01_000001 in state:
> COMPLETED event:FINISHED
>
> 2016-05-22 13:23:58,617 INFO
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hduser
> OPERATION=AM Released Container TARGET=SchedulerApp     RESULT=SUCCESS
> APPID=application_1463911910089_0003
> CONTAINERID=container_1463911910089_0003_01_000001
>
> 2016-05-22 13:23:58,617 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode:
> Released container container_1463911910089_0003_01_000001 of capacity
> <memory:4096, vCores:1> on host rhes564:49141, which currently has 0
> containers, <memory:0, vCores:0> used and <memory:8192, vCores:8>
> available, release resources=true
>
> 2016-05-22 13:23:58,617 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
> Updating application attempt appattempt_1463911910089_0003_000001 with
> final state: FAILED, and exit status: -1
>
> 2016-05-22 13:23:58,617 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> default used=<memory:0, vCores:0> numContainers=0 user=hduser
> user-resources=<memory:0, vCores:0>
>
> 2016-05-22 13:23:58,617 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
> appattempt_1463911910089_0003_000001 State change from LAUNCHED to
> FINAL_SAVING
>
> 2016-05-22 13:23:58,617 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> completedContainer container=Container: [ContainerId:
> container_1463911910089_0003_01_000001, NodeId: rhes564:49141,
> NodeHttpAddress: rhes564:8042, Resource: <memory:4096, vCores:1>, Priority:
> 0, Token: Token { kind: ContainerToken, service: 50.140.197.217:49141 },
> ] queue=default: capacity=1.0, absoluteCapacity=1.0,
> usedResources=<memory:0, vCores:0>, usedCapacity=0.0,
> absoluteUsedCapacity=0.0, numApps=1, numContainers=0 cluster=<memory:8192,
> vCores:8>
>
> 2016-05-22 13:23:58,617 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService:
> Unregistering app attempt : appattempt_1463911910089_0003_000001
>
> 2016-05-22 13:23:58,617 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
> completedContainer queue=root usedCapacity=0.0 absoluteUsedCapacity=0.0
> used=<memory:0, vCores:0> cluster=<memory:8192, vCores:8>
>
> 2016-05-22 13:23:58,618 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
> Re-sorting completed queue: root.default stats: default: capacity=1.0,
> absoluteCapacity=1.0, usedResources=<memory:0, vCores:0>, usedCapacity=0.0,
> absoluteUsedCapacity=0.0, numApps=1, numContainers=0
>
> 2016-05-22 13:23:58,618 INFO
> org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager:
> Application finished, removing password for
> appattempt_1463911910089_0003_000001
>
> 2016-05-22 13:23:58,618 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
> Application attempt appattempt_1463911910089_0003_000001 released container
> container_1463911910089_0003_01_000001 on node: host: rhes564:49141
> #containers=0 available=8192 used=0 with event: FINISHED
>
> 2016-05-22 13:23:58,618 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
> appattempt_1463911910089_0003_000001 State change from FINAL_SAVING to
> FAILED
>
> 2016-05-22 13:23:58,618 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: The number
> of failed attempts is 1. The max attempts is 2
>
> 2016-05-22 13:23:58,618 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
> Application Attempt appattempt_1463911910089_0003_000001 is done.
> finalState=FAILED
>
> 2016-05-22 13:23:58,618 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService:
> Registering app attempt : appattempt_1463911910089_0003_000002
>
> 2016-05-22 13:23:58,618 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo:
> Application application_1463911910089_0003 requests cleared
>
> 2016-05-22 13:23:58,618 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
> appattempt_1463911910089_0003_000002 State change from NEW to SUBMITTED
>
> 2016-05-22 13:23:58,618 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Application removed - appId: application_1463911910089_0003 user: hduser
> queue: default #user-pending-applications: 0 #user-active-applications: 0
> #queue-pending-applications: 0 #queue-active-applications: 0
>
> 2016-05-22 13:23:58,619 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Application application_1463911910089_0003 from user: hduser activated in
> queue: default
>
> 2016-05-22 13:23:58,619 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Application added - appId: application_1463911910089_0003 user:
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue$User@4a7e445b,
> leaf-queue: default #user-pending-applications: 0
> #user-active-applications: 1 #queue-pending-applications: 0
> #queue-active-applications: 1
>
> 2016-05-22 13:23:58,619 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
> Added Application Attempt appattempt_1463911910089_0003_000002 to scheduler
> from user hduser in queue default
>
> 2016-05-22 13:23:58,620 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
> appattempt_1463911910089_0003_000002 State change from SUBMITTED to
> SCHEDULED
>
> 2016-05-22 13:23:59,619 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
> Null container completed...
>
> 2016-05-22 13:23:59,619 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
> container_1463911910089_0003_02_000001 Container Transitioned from NEW to
> ALLOCATED
>
> 2016-05-22 13:23:59,619 INFO
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hduser
> OPERATION=AM Allocated Container        TARGET=SchedulerApp
> RESULT=SUCCESS  APPID=application_1463911910089_0003
> CONTAINERID=container_1463911910089_0003_02_000001
>
> 2016-05-22 13:23:59,620 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode:
> Assigned container container_1463911910089_0003_02_000001 of capacity
> <memory:4096, vCores:1> on host rhes564:49141, which has 1 containers,
> <memory:4096, vCores:1> used and <memory:4096, vCores:7> available after
> allocation
>
> 2016-05-22 13:23:59,620 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> assignedContainer application attempt=appattempt_1463911910089_0003_000002
> container=Container: [ContainerId: container_1463911910089_0003_02_000001,
> NodeId: rhes564:49141, NodeHttpAddress: rhes564:8042, Resource:
> <memory:4096, vCores:1>, Priority: 0, Token: null, ] queue=default:
> capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0, vCores:0>,
> usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=1, numContainers=0
> clusterResource=<memory:8192, vCores:8>
>
> 2016-05-22 13:23:59,620 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
> Re-sorting assigned queue: root.default stats: default: capacity=1.0,
> absoluteCapacity=1.0, usedResources=<memory:4096, vCores:1>,
> usedCapacity=0.5, absoluteUsedCapacity=0.5, numApps=1, numContainers=1
>
> 2016-05-22 13:23:59,620 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
> assignedContainer queue=root usedCapacity=0.5 absoluteUsedCapacity=0.5
> used=<memory:4096, vCores:1> cluster=<memory:8192, vCores:8>
>
> 2016-05-22 13:23:59,621 INFO
> org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM:
> Sending NMToken for nodeId : rhes564:49141 for container :
> container_1463911910089_0003_02_000001
>
> 2016-05-22 13:23:59,623 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
> container_1463911910089_0003_02_000001 Container Transitioned from
> ALLOCATED to ACQUIRED
>
> 2016-05-22 13:23:59,623 INFO
> org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM:
> Clear node set for appattempt_1463911910089_0003_000002
>
> 2016-05-22 13:23:59,623 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
> Storing attempt: AppId: application_1463911910089_0003 AttemptId:
> appattempt_1463911910089_0003_000002 MasterContainer: Container:
> [ContainerId: container_1463911910089_0003_02_000001, NodeId:
> rhes564:49141, NodeHttpAddress: rhes564:8042, Resource: <memory:4096,
> vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service:
> 50.140.197.217:49141 }, ]
>
> 2016-05-22 13:23:59,623 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
> appattempt_1463911910089_0003_000002 State change from SCHEDULED to
> ALLOCATED_SAVING
>
> 2016-05-22 13:23:59,623 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
> appattempt_1463911910089_0003_000002 State change from ALLOCATED_SAVING to
> ALLOCATED
>
> 2016-05-22 13:23:59,624 INFO
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher:
> Launching masterappattempt_1463911910089_0003_000002
>
> 2016-05-22 13:23:59,626 INFO
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher:
> Setting up container Container: [ContainerId:
> container_1463911910089_0003_02_000001, NodeId: rhes564:49141,
> NodeHttpAddress: rhes564:8042, Resource: <memory:4096, vCores:1>, Priority:
> 0, Token: Token { kind: ContainerToken, service: 50.140.197.217:49141 },
> ] for AM appattempt_1463911910089_0003_000002
>
> 2016-05-22 13:23:59,626 INFO
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher:
> Command to launch container container_1463911910089_0003_02_000001 :
> $JAVA_HOME/bin/java -Dlog4j.configuration=container-log4j.properties
> -Dyarn.app.container.log.dir=<LOG_DIR> -Dyarn.app.container.log.filesize=0
> -Dhadoop.root.logger=INFO,CLA  -Xmx1024m
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1><LOG_DIR>/stdout
> 2><LOG_DIR>/stderr
>
> 2016-05-22 13:23:59,626 INFO
> org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager:
> Create AMRMToken for ApplicationAttempt:
> appattempt_1463911910089_0003_000002
>
> 2016-05-22 13:23:59,626 INFO
> org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager:
> Creating password for appattempt_1463911910089_0003_000002
>
> 2016-05-22 13:23:59,639 INFO
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Done
> launching container Container: [ContainerId:
> container_1463911910089_0003_02_000001, NodeId: rhes564:49141,
> NodeHttpAddress: rhes564:8042, Resource: <memory:4096, vCores:1>, Priority:
> 0, Token: Token { kind: ContainerToken, service: 50.140.197.217:49141 },
> ] for AM appattempt_1463911910089_0003_000002
>
> 2016-05-22 13:23:59,639 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
> appattempt_1463911910089_0003_000002 State change from ALLOCATED to LAUNCHED
>
> 2016-05-22 13:24:00,623 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
> container_1463911910089_0003_02_000001 Container Transitioned from ACQUIRED
> to RUNNING
>
> 2016-05-22 13:24:02,629 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
> container_1463911910089_0003_02_000001 Container Transitioned from RUNNING
> to COMPLETED
>
> 2016-05-22 13:24:02,629 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp:
> Completed container: container_1463911910089_0003_02_000001 in state:
> COMPLETED event:FINISHED
>
> 2016-05-22 13:24:02,629 INFO
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hduser
> OPERATION=AM Released Container TARGET=SchedulerApp     RESULT=SUCCESS
> APPID=application_1463911910089_0003
> CONTAINERID=container_1463911910089_0003_02_000001
>
> 2016-05-22 13:24:02,629 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode:
> Released container container_1463911910089_0003_02_000001 of capacity
> <memory:4096, vCores:1> on host rhes564:49141, which currently has 0
> containers, <memory:0, vCores:0> used and <memory:8192, vCores:8>
> available, release resources=true
>
> 2016-05-22 13:24:02,629 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
> Updating application attempt appattempt_1463911910089_0003_000002 with
> final state: FAILED, and exit status: -1
>
> 2016-05-22 13:24:02,630 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> default used=<memory:0, vCores:0> numContainers=0 user=hduser
> user-resources=<memory:0, vCores:0>
>
> 2016-05-22 13:24:02,630 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
> appattempt_1463911910089_0003_000002 State change from LAUNCHED to
> FINAL_SAVING
>
> 2016-05-22 13:24:02,630 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> completedContainer container=Container: [ContainerId:
> container_1463911910089_0003_02_000001, NodeId: rhes564:49141,
> NodeHttpAddress: rhes564:8042, Resource: <memory:4096, vCores:1>, Priority:
> 0, Token: Token { kind: ContainerToken, service: 50.140.197.217:49141 },
> ] queue=default: capacity=1.0, absoluteCapacity=1.0,
> usedResources=<memory:0, vCores:0>, usedCapacity=0.0,
> absoluteUsedCapacity=0.0, numApps=1, numContainers=0 cluster=<memory:8192,
> vCores:8>
>
> 2016-05-22 13:24:02,630 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService:
> Unregistering app attempt : appattempt_1463911910089_0003_000002
>
> 2016-05-22 13:24:02,630 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
> completedContainer queue=root usedCapacity=0.0 absoluteUsedCapacity=0.0
> used=<memory:0, vCores:0> cluster=<memory:8192, vCores:8>
>
> 2016-05-22 13:24:02,630 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
> Re-sorting completed queue: root.default stats: default: capacity=1.0,
> absoluteCapacity=1.0, usedResources=<memory:0, vCores:0>, usedCapacity=0.0,
> absoluteUsedCapacity=0.0, numApps=1, numContainers=0
>
> 2016-05-22 13:24:02,630 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
> Application attempt appattempt_1463911910089_0003_000002 released container
> container_1463911910089_0003_02_000001 on node: host: rhes564:49141
> #containers=0 available=8192 used=0 with event: FINISHED
>
> 2016-05-22 13:24:02,630 INFO
> org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager:
> Application finished, removing password for
> appattempt_1463911910089_0003_000002
>
> 2016-05-22 13:24:02,630 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
> appattempt_1463911910089_0003_000002 State change from FINAL_SAVING to
> FAILED
>
> 2016-05-22 13:24:02,630 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: The number
> of failed attempts is 2. The max attempts is 2
>
> 2016-05-22 13:24:02,631 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Updating
> application application_1463911910089_0003 with final state: FAILED
>
> 2016-05-22 13:24:02,631 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
> application_1463911910089_0003 State change from ACCEPTED to FINAL_SAVING
>
> 2016-05-22 13:24:02,631 INFO
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore:
> Updating info for app: application_1463911910089_0003
>
> 2016-05-22 13:24:02,631 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
> Application Attempt appattempt_1463911910089_0003_000002 is done.
> finalState=FAILED
>
> 2016-05-22 13:24:02,631 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Application
> application_1463911910089_0003 failed 2 times due to AM Container for
> appattempt_1463911910089_0003_000002 exited with  exitCode: -1
>
> For more detailed output, check application tracking page:
> http://rhes564:8088/proxy/application_1463911910089_0003/Then, click on
> links to logs of each attempt.
>
> *Diagnostics: File
> /data6/hduser/tmp/nm-local-dir/usercache/hduser/appcache/application_1463911910089_0003/container_1463911910089_0003_02_000001
> does not exist*
>
> Failing this attempt. Failing the application.
>
> 2016-05-22 13:24:02,631 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo:
> Application application_1463911910089_0003 requests cleared
>
> 2016-05-22 13:24:02,631 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
> application_1463911910089_0003 State change from FINAL_SAVING to FAILED
>
> 2016-05-22 13:24:02,631 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Application removed - appId: application_1463911910089_0003 user: hduser
> queue: default #user-pending-applications: 0 #user-active-applications: 0
> #queue-pending-applications: 0 #queue-active-applications: 0
>
> 2016-05-22 13:24:02,631 WARN
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hduser
> OPERATION=Application Finished - Failed TARGET=RMAppManager
> RESULT=FAILURE  DESCRIPTION=App failed with state: FAILED  PERMISSIONS=Application
> application_1463911910089_0003 failed 2 times due to AM Container for
> appattempt_1463911910089_0003_000002 exited with  exitCode: -1
>
> For more detailed output, check application tracking page:
> http://rhes564:8088/proxy/application_1463911910089_0003/Then, click on
> links to logs of each attempt.
>
> Diagnostics: File
> /data6/hduser/tmp/nm-local-dir/usercache/hduser/appcache/application_1463911910089_0003/container_1463911910089_0003_02_000001
> does not exist
>
> Failing this attempt. Failing the application.
> APPID=application_1463911910089_0003
>
> 2016-05-22 13:24:02,631 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
> Application removed - appId: application_1463911910089_0003 user: hduser
> leaf-queue of parent: root #applications: 0
>
> 2016-05-22 13:24:02,632 INFO
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary:
> appId=application_1463911910089_0003,name=select min(id)\,
> max(id)...oraclehadoop.dummy(Stage-1),user=hduser,queue=default,state=FAILED,trackingUrl=
> http://rhes564:8088/cluster/app/application_1463911910089_0003,appMasterHost=N/A,startTime=1463919834711,finishTime=1463919842630,finalStatus=FAILED
>
> 2016-05-22 13:24:02,842 INFO
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hduser
> IP=50.140.197.217       OPERATION=Kill Application Request
> TARGET=ClientRMService  RESULT=SUCCESS
> APPID=application_1463911910089_0003
>
> 2016-05-22 13:24:03,632 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
> Null container completed...
>
>
>
>
>
>
>
>
>
> *cat yarn-hduser-nodemanager-rhes564.log*
>
>
>
>
>
> 2016-05-22 13:23:55,621 INFO SecurityLogger.org.apache.hadoop.ipc.Server:
> Auth successful for appattempt_1463911910089_0003_000001 (auth:SIMPLE)
>
> 2016-05-22 13:23:55,628 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
> Start request for container_1463911910089_0003_01_000001 by user hduser
>
> 2016-05-22 13:23:55,629 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
> Creating a new application reference for app application_1463911910089_0003
>
> 2016-05-22 13:23:55,629 INFO
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hduser
> IP=50.140.197.217       OPERATION=Start Container Request
> TARGET=ContainerManageImpl      RESULT=SUCCESS
> APPID=application_1463911910089_0003
> CONTAINERID=container_1463911910089_0003_01_000001
>
> 2016-05-22 13:23:55,629 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
> Application application_1463911910089_0003 transitioned from NEW to INITING
>
> 2016-05-22 13:23:55,629 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
> Adding container_1463911910089_0003_01_000001 to application
> application_1463911910089_0003
>
> 2016-05-22 13:23:55,630 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
> Application application_1463911910089_0003 transitioned from INITING to
> RUNNING
>
> 2016-05-22 13:23:55,630 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
> Container container_1463911910089_0003_01_000001 transitioned from NEW to
> LOCALIZING
>
> 2016-05-22 13:23:55,630 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got
> event CONTAINER_INIT for appId application_1463911910089_0003
>
> 2016-05-22 13:23:55,650 INFO
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
> Localizer CWD set to
> /tmp/hadoop-hduser/nm-local-dir/usercache/hduser/appcache/application_1463911910089_0003
> =
> file:/tmp/hadoop-hduser/nm-local-dir/usercache/hduser/appcache/application_1463911910089_0003
>
> 2016-05-22 13:23:55,704 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
> Resource
> hdfs://rhes564:9000/tmp/hadoop-yarn/staging/hduser/.staging/job_1463911910089_0003/job.splitmetainfo(->/data6/hduser/tmp/nm-local-dir/usercache/hduser/appcache/application_1463911910089_0003/filecache/10/job.splitmetainfo)
> transitioned from DOWNLOADING to LOCALIZED
>
> 2016-05-22 13:23:55,874 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
> Resource
> hdfs://rhes564:9000/tmp/hadoop-yarn/staging/hduser/.staging/job_1463911910089_0003/job.jar(->/data6/hduser/tmp/nm-local-dir/usercache/hduser/appcache/application_1463911910089_0003/filecache/11/job.jar)
> transitioned from DOWNLOADING to LOCALIZED
>
> 2016-05-22 13:23:55,888 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
> Resource
> hdfs://rhes564:9000/tmp/hadoop-yarn/staging/hduser/.staging/job_1463911910089_0003/job.split(->/data6/hduser/tmp/nm-local-dir/usercache/hduser/appcache/application_1463911910089_0003/filecache/12/job.split)
> transitioned from DOWNLOADING to LOCALIZED
>
> 2016-05-22 13:23:55,903 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
> Resource
> hdfs://rhes564:9000/tmp/hadoop-yarn/staging/hduser/.staging/job_1463911910089_0003/
> *job.xml*(->  OK
> /data6/hduser/tmp/nm-local-dir/usercache/hduser/appcache/application_1463911910089_0003/filecache/13/job.xml)
> transitioned from DOWNLOADING to LOCALIZED
>
> 2016-05-22 13:23:55,922 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
> Resource
> hdfs://rhes564:9000/tmp/hive/hduser/848605bf-4c31-4835-8c2c-1822ab5778d5/hive_2016-05-22_13-23-51_579_632928559015756974-1/-mr-10005/f57bfa89-069e-4346-9334-ce333a930113/
> *reduce.xml*(-> OK    /data6/hduser/tmp/nm-local-dir/usercache/hduser/filecache/15/reduce.xml)
> transitioned from DOWNLOADING to LOCALIZED
>
> 2016-05-22 13:23:55,948 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
> Resource
> hdfs://rhes564:9000/tmp/hive/hduser/848605bf-4c31-4835-8c2c-1822ab5778d5/hive_2016-05-22_13-23-51_579_632928559015756974-1/-mr-10005/f57bfa89-069e-4346-9334-ce333a930113/
> *map.xml*(->  OK
> /data6/hduser/tmp/nm-local-dir/usercache/hduser/filecache/16/map.xml)
> transitioned from DOWNLOADING to LOCALIZED
>
>
>
> *Note that there is only a filecache sub-directory under
> /data6/hduser/tmp/nm-local-dir/usercache/hduser/appcache/application_1463911910089_0003,
> NO container_xxxx!*
>
>
>
> ltr
> /data6/hduser/tmp/nm-local-dir/usercache/hduser/appcache/application_1463911910089_0003/
>
> total 8
>
> drwxr-xr-x 6 hduser hadoop 4096 May 22 13:23 filecache
>
>
>
> ltr
> /data6/hduser/tmp/nm-local-dir/usercache/hduser/appcache/application_1463911910089_0003/filecache/
>
> drwxr-xr-x 2 hduser hadoop 4096 May 22 13:23 13
>
> drwxr-xr-x 2 hduser hadoop 4096 May 22 13:23 12
>
> drwxr-xr-x 3 hduser hadoop 4096 May 22 13:23 11
>
> drwxr-xr-x 2 hduser hadoop 4096 May 22 13:23 10
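>
> *Meanwhile the container_xxxx directories were created, empty, under the
> other root (/tmp/hadoop-hduser/nm-local-dir, per the listing further up),
> so the two halves of the container setup never meet:*
>
>   ls /tmp/hadoop-hduser/nm-local-dir/usercache/hduser/appcache/application_1463911910089_0003/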
>
>
>
>
>
> 2016-05-22 13:23:55,948 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
> Container container_1463911910089_0003_01_000001 transitioned from
> LOCALIZING to LOCALIZED
>
> 2016-05-22 13:23:55,984 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
> Container container_1463911910089_0003_01_000001 transitioned from
> LOCALIZED to RUNNING
>
> 2016-05-22 13:23:55,985 WARN
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
> Failed to launch container.
>
> java.io.FileNotFoundException: *File
> /data6/hduser/tmp/nm-local-dir/usercache/hduser/appcache/application_1463911910089_0003/container_1463911910089_0003_01_000001
> does not exist*
>
>         at
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534)
>
>         at
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
>
>         at
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
>
>         at
> org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1051)
>
>         at
> org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:157)
>
>         at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:197)
>
>         at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:724)
>
>         at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:720)
>
>         at
> org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>
>         at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:720)
>
>         at
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createDir(DefaultContainerExecutor.java:513)
>
>         at
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:161)
>
>         at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>
>         at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>
>         at java.lang.Thread.run(Thread.java:745)
>
> 2016-05-22 13:23:55,986 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
> Container container_1463911910089_0003_01_000001 transitioned from RUNNING
> to EXITED_WITH_FAILURE
>
> 2016-05-22 13:23:55,986 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
> Cleaning up container container_1463911910089_0003_01_000001
>
> 2016-05-22 13:23:56,738 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
> Starting resource-monitoring for container_1463911910089_0003_01_000001
>
> 2016-05-22 13:23:58,128 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
> Could not get pid for container_1463911910089_0003_01_000001. Waited for
> 2000 ms.
>
> 2016-05-22 13:23:58,141 WARN
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hduser
> OPERATION=Container Finished - Failed   TARGET=ContainerImpl
> RESULT=FAILURE  DESCRIPTION=Container failed with state:
> EXITED_WITH_FAILURE       APPID=application_1463911910089_0003
> CONTAINERID=container_1463911910089_0003_01_000001
>
> 2016-05-22 13:23:58,141 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
> Container container_1463911910089_0003_01_000001 transitioned from
> EXITED_WITH_FAILURE to DONE
>
> 2016-05-22 13:23:58,141 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
> Removing container_1463911910089_0003_01_000001 from application
> application_1463911910089_0003
>
> 2016-05-22 13:23:58,141 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got
> event CONTAINER_STOP for appId application_1463911910089_0003
>
> 2016-05-22 13:23:59,619 INFO
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed
> completed containers from NM context:
> [container_1463911910089_0003_01_000001]
>
> 2016-05-22 13:23:59,631 INFO SecurityLogger.org.apache.hadoop.ipc.Server:
> Auth successful for appattempt_1463911910089_0003_000002 (auth:SIMPLE)
>
> 2016-05-22 13:23:59,637 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
> Start request for container_1463911910089_0003_02_000001 by user hduser
>
> 2016-05-22 13:23:59,637 INFO
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hduser
> IP=50.140.197.217       OPERATION=Start Container Request
> TARGET=ContainerManageImpl      RESULT=SUCCESS
> APPID=application_1463911910089_0003
> CONTAINERID=container_1463911910089_0003_02_000001
>
> 2016-05-22 13:23:59,637 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
> Adding container_1463911910089_0003_02_000001 to application
> application_1463911910089_0003
>
> 2016-05-22 13:23:59,638 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
> Container container_1463911910089_0003_02_000001 transitioned from NEW to
> LOCALIZING
>
> 2016-05-22 13:23:59,638 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got
> event CONTAINER_INIT for appId application_1463911910089_0003
>
> 2016-05-22 13:23:59,639 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
> Container container_1463911910089_0003_02_000001 transitioned from
> LOCALIZING to LOCALIZED
>
> 2016-05-22 13:23:59,668 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
> Container container_1463911910089_0003_02_000001 transitioned from
> LOCALIZED to RUNNING
>
> 2016-05-22 13:23:59,670 WARN
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
> Failed to launch container.
>
> java.io.FileNotFoundException: File
> /data6/hduser/tmp/nm-local-dir/usercache/hduser/appcache/application_1463911910089_0003/container_1463911910089_0003_02_000001
> does not exist
>
>         at
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534)
>
>         at
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
>
>         at
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
>
>         at
> org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1051)
>
>         at
> org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:157)
>
>         at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:197)
>
>         at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:724)
>
>         at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:720)
>
>         at
> org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>
>         at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:720)
>
>         at
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createDir(DefaultContainerExecutor.java:513)
>
>         at
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:161)
>
>         at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>
>         at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>
>         at java.lang.Thread.run(Thread.java:745)
>
> 2016-05-22 13:23:59,671 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
> Container container_1463911910089_0003_02_000001 transitioned from RUNNING
> to EXITED_WITH_FAILURE
>
> 2016-05-22 13:23:59,671 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
> Cleaning up container container_1463911910089_0003_02_000001
>
> 2016-05-22 13:24:00,207 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
> Starting resource-monitoring for container_1463911910089_0003_02_000001
>
> 2016-05-22 13:24:00,207 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
> Stopping resource-monitoring for container_1463911910089_0003_01_000001
>
> 2016-05-22 13:24:01,813 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
> Could not get pid for container_1463911910089_0003_02_000001. Waited for
> 2000 ms.
>
> 2016-05-22 13:24:01,834 WARN
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hduser
> OPERATION=Container Finished - Failed   TARGET=ContainerImpl
> RESULT=FAILURE  DESCRIPTION=Container failed with state:
> EXITED_WITH_FAILURE       APPID=application_1463911910089_0003
> CONTAINERID=container_1463911910089_0003_02_000001
>
> 2016-05-22 13:24:01,834 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
> Container container_1463911910089_0003_02_000001 transitioned from
> EXITED_WITH_FAILURE to DONE
>
> 2016-05-22 13:24:01,834 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
> Removing container_1463911910089_0003_02_000001 from application
> application_1463911910089_0003
>
> 2016-05-22 13:24:01,834 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got
> event CONTAINER_STOP for appId application_1463911910089_0003
>
> 2016-05-22 13:24:03,209 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
> Stopping resource-monitoring for container_1463911910089_0003_02_000001
>
> 2016-05-22 13:24:03,633 INFO
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed
> completed containers from NM context:
> [container_1463911910089_0003_02_000001]
>
> 2016-05-22 13:24:03,633 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
> Application application_1463911910089_0003 transitioned from RUNNING to
> APPLICATION_RESOURCES_CLEANINGUP
>
> 2016-05-22 13:24:03,633 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got
> event APPLICATION_STOP for appId application_1463911910089_0003
>
> 2016-05-22 13:24:03,633 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
> Application application_1463911910089_0003 transitioned from
> APPLICATION_RESOURCES_CLEANINGUP to FINISHED
>
> 2016-05-22 13:24:03,634 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler:
> Scheduling Log Deletion for application: application_1463911910089_0003,
> with delay of 10800 seconds
>
> Thanks
>
