hadoop-common-user mailing list archives

From "roland.depratti" <roland.depra...@cox.net>
Subject Re: Yarn AM is abending job when submitting a remote job to cluster
Date Thu, 19 Feb 2015 14:12:50 GMT
Alex,

That sounds like a very likely situation.

I read in the first jira that tokens are now used in nonsecure setups, which explains my earlier
ssl question.

Is the solution simply to delete those staging files from the cluster?

- rd 


Sent from my Verizon Wireless 4G LTE smartphone


-------- Original message --------
From: Alexander Alten-Lorenz <wget.null@gmail.com> 
Date: 02/19/2015 7:43 AM (GMT-05:00)
To: user@hadoop.apache.org 
Subject: Re: Yarn AM is abending job when submitting a remote job to cluster 

Hi,

https://issues.apache.org/jira/browse/YARN-1116 <https://issues.apache.org/jira/browse/YARN-1058>

It looks like the history server had an unclean shutdown, or a previous job didn’t
finish or wasn’t cleaned up after finishing (2015-02-15 07:51:07,241 INFO [main]
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Kind: YARN_AM_RM_TOKEN, Service: , Ident:
(org.apache.hadoop.yarn.security.AMRMTokenIdentifier@33be1aa0)
…. Previous history file is at hdfs://mycluster.com:8020/user/cloudera/.staging/job_1424003606313_0001/job_1424003606313_0001_1.jhist).
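
To check this against an AM log, a minimal sketch along these lines could pull out the attempt number and the staging directory the AM is trying to recover from. It assumes the log has been saved as plain text with unwrapped lines; summarize_am_log and the two regexes are hypothetical helpers written for this thread, not part of Hadoop.

```python
import re

# Patterns for two MRAppMaster log messages quoted in this thread (assumed format).
ATTEMPT_RE = re.compile(r"appattempt_(\d+)_(\d+)_(\d+)")
HISTORY_RE = re.compile(r"Previous history file is at (\S+)")


def summarize_am_log(log_text):
    """Return (attempt_number, staging_dir); each is None if not found in the log."""
    attempt = None
    staging_dir = None

    m = ATTEMPT_RE.search(log_text)
    if m:
        attempt = int(m.group(3))  # last field is the attempt counter

    m = HISTORY_RE.search(log_text)
    if m:
        # The .jhist file sits directly inside the job's staging directory.
        staging_dir = m.group(1).rsplit("/", 1)[0]

    return attempt, staging_dir


# Two lines taken from the log in this thread, joined back into single lines.
sample = (
    "2015-02-15 07:51:06,544 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: "
    "Created MRAppMaster for application appattempt_1424003606313_0001_000002\n"
    "2015-02-15 07:51:08,642 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: "
    "Previous history file is at hdfs://mycluster.com:8020/user/cloudera/.staging/"
    "job_1424003606313_0001/job_1424003606313_0001_1.jhist\n"
)

attempt, staging_dir = summarize_am_log(sample)
print(attempt, staging_dir)
```

An attempt number above 1 together with a "Previous history file" line would be consistent with a retried AM reading the first attempt's history from the staging directory.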

BR,
Alex


> On 19 Feb 2015, at 13:27, Roland DePratti <roland.depratti@cox.net> wrote:
> 
> Daemeon,
>  
> Thanks for the reply. I have about 6 months of exposure to Hadoop and am new to SSL, so I did
> some digging after reading your message.
>  
> In the HDFS config, hadoop.ssl.enabled is using the default, which is ‘false’
> (which I understand sets it for all Hadoop daemons).
>  
> I assumed this meant that it is not in use and not a factor in job submission (ssl certs
not needed).
>  
> Am I misunderstanding, and are you saying that it needs to be set to ‘true’, with valid
> certs and stores set up, for me to submit a remote job (this is a POC setup with no exposure
> outside my environment)?
>  
> -  rd
>  
> From: daemeon reiydelle [mailto:daemeonr@gmail.com] 
> Sent: Wednesday, February 18, 2015 10:22 PM
> To: user@hadoop.apache.org
> Subject: Re: Yarn AM is abending job when submitting a remote job to cluster
>  
> I would guess you do not have your ssl certs set up, client or server, based on the error.

> 
> 
> .......
> “Life should not be a journey to the grave with the intention of arriving safely in a
> pretty and well preserved body, but rather to skid in broadside in a cloud of smoke,
> thoroughly used up, totally worn out, and loudly proclaiming “Wow! What a Ride!”
> - Hunter Thompson
> 
> Daemeon C.M. Reiydelle
> USA (+1) 415.501.0198
> London (+44) (0) 20 8144 9872
>  
> On Wed, Feb 18, 2015 at 5:19 PM, Roland DePratti <roland.depratti@cox.net> wrote:
> I have been searching for a handle on a problem with very few clues. Any help pointing
> me in the right direction would be huge.
> I have not received any input from the Cloudera Google groups. Perhaps this is more YARN
> based, and I am hoping to have more luck here.
> Any help is greatly appreciated.
>  
> I am running a Hadoop cluster using CDH5.3. I also have a client machine with a standalone
one node setup (VM).
>  
> All environments are running CentOS 6.6.
>  
> I have submitted some Java mapreduce jobs locally on both the cluster and the standalone
> environment, and they completed successfully.
>  
> I can submit a remote HDFS job from client to cluster using -conf hadoop-cluster.xml
(see below) and get data back from the cluster with no problem.
> 
> When I submit the mapreduce jobs remotely, I get an AM error:
>  
> AM fails the job with the error: 
> 
>            SecretManager$InvalidToken: appattempt_1424003606313_0001_000002 not found in AMRMTokenSecretManager
> 
> I searched /var/log/secure on the client and cluster and found no unusual messages.
> 
> Here are the contents of hadoop-cluster.xml:
> 
> <?xml version="1.0" encoding="UTF-8"?>
> 
> <!--generated by Roland-->
> <configuration>
>   <property>
>     <name>fs.defaultFS</name>
>     <value>hdfs://mycluser:8020</value>
>   </property>
>   <property>
>     <name>mapreduce.jobtracker.address</name>
>     <value>hdfs://mycluster:8032</value>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.address</name>
>     <value>hdfs://mycluster:8032</value>
>   </property>
> 
> Here is the output from the job log on the cluster:  
> 
> 2015-02-15 07:51:06,544 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created
MRAppMaster for application appattempt_1424003606313_0001_000002
> 2015-02-15 07:51:06,949 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an
attempt to override final parameter: hadoop.ssl.require.client.cert;  Ignoring.
> 2015-02-15 07:51:06,952 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an
attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
> 2015-02-15 07:51:06,952 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an
attempt to override final parameter: hadoop.ssl.client.conf;  Ignoring.
> 2015-02-15 07:51:06,954 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an
attempt to override final parameter: hadoop.ssl.keystores.factory.class;  Ignoring.
> 2015-02-15 07:51:06,957 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an
attempt to override final parameter: hadoop.ssl.server.conf;  Ignoring.
> 2015-02-15 07:51:06,973 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an
attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
> 2015-02-15 07:51:07,241 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Executing
with tokens:
> 2015-02-15 07:51:07,241 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Kind:
YARN_AM_RM_TOKEN, Service: , Ident: (org.apache.hadoop.yarn.security.AMRMTokenIdentifier@33be1aa0)
> 2015-02-15 07:51:07,332 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Using
mapred newApiCommitter.
> 2015-02-15 07:51:07,627 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an
attempt to override final parameter: hadoop.ssl.require.client.cert;  Ignoring.
> 2015-02-15 07:51:07,632 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an
attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
> 2015-02-15 07:51:07,632 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an
attempt to override final parameter: hadoop.ssl.client.conf;  Ignoring.
> 2015-02-15 07:51:07,639 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an
attempt to override final parameter: hadoop.ssl.keystores.factory.class;  Ignoring.
> 2015-02-15 07:51:07,645 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an
attempt to override final parameter: hadoop.ssl.server.conf;  Ignoring.
> 2015-02-15 07:51:07,663 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an
attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
> 2015-02-15 07:51:08,237 WARN [main] org.apache.hadoop.util.NativeCodeLoader: Unable to
load native-hadoop library for your platform... using builtin-java classes where applicable
> 2015-02-15 07:51:08,429 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter
set in config null
> 2015-02-15 07:51:08,499 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter
is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
> 2015-02-15 07:51:08,526 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering
class org.apache.hadoop.mapreduce.jobhistory.EventType for class org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler
> 2015-02-15 07:51:08,527 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering
class org.apache.hadoop.mapreduce.v2.app.job.event.JobEventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher
> 2015-02-15 07:51:08,561 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering
class org.apache.hadoop.mapreduce.v2.app.job.event.TaskEventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher
> 2015-02-15 07:51:08,562 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering
class org.apache.hadoop.mapreduce.v2.app.job.event.TaskAttemptEventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher
> 2015-02-15 07:51:08,566 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering
class org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventType for class org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler
> 2015-02-15 07:51:08,568 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering
class org.apache.hadoop.mapreduce.v2.app.speculate.Speculator$EventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$SpeculatorEventDispatcher
> 2015-02-15 07:51:08,568 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering
class org.apache.hadoop.mapreduce.v2.app.rm.ContainerAllocator$EventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter
> 2015-02-15 07:51:08,570 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering
class org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncher$EventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerLauncherRouter
> 2015-02-15 07:51:08,599 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Recovery
is enabled. Will try to recover from previous life on best effort basis.
> 2015-02-15 07:51:08,642 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Previous
history file is at hdfs://mycluster.com:8020/user/cloudera/.staging/job_1424003606313_0001/job_1424003606313_0001_1.jhist
> 2015-02-15 07:51:09,147 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Read completed tasks
from history 0
> 2015-02-15 07:51:09,193 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering
class org.apache.hadoop.mapreduce.v2.app.job.event.JobFinishEvent$Type for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler
> 2015-02-15 07:51:09,222 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded
properties from hadoop-metrics2.properties
> 2015-02-15 07:51:09,277 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl:
Scheduled snapshot period at 10 second(s).
> 2015-02-15 07:51:09,277 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl:
MRAppMaster metrics system started
> 2015-02-15 07:51:09,286 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl:
Adding job token for job_1424003606313_0001 to jobTokenSecretManager
> 2015-02-15 07:51:09,306 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl:
Not uberizing job_1424003606313_0001 because: not enabled; too much RAM;
> 2015-02-15 07:51:09,324 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl:
Input size for job job_1424003606313_0001 = 5343207. Number of splits = 5
> 2015-02-15 07:51:09,325 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl:
Number of reduces for job job_1424003606313_0001 = 1
> 2015-02-15 07:51:09,325 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl:
job_1424003606313_0001Job Transitioned from NEW to INITED
> 2015-02-15 07:51:09,327 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster
launching normal, non-uberized, multi-container job job_1424003606313_0001.
> 2015-02-15 07:51:09,387 INFO [main]