hadoop-mapreduce-user mailing list archives

From Alexander Alten-Lorenz <wget.n...@gmail.com>
Subject Re: Yarn AM is abending job when submitting a remote job to cluster
Date Thu, 19 Feb 2015 14:25:01 GMT
Daemeon,

Yes, deleting the older staging directories should help. You may also have to restart the
history server.

BR,
 Alex
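
For reference, the staging directory to delete can be read off the "Previous history file is at ..." line in the AM log quoted further down. What follows is a minimal Python sketch, assuming that log format. The function names are hypothetical, and the script only builds the `hdfs dfs -rm -r` command; it does not run it.

```python
import re
from typing import List, Optional

# Matches the MRAppMaster log line naming the previous attempt's history file
# and captures the job's staging directory (everything up to job_<ts>_<seq>).
_JHIST = re.compile(
    r"Previous history file is at (hdfs://\S+?/\.staging/job_\d+_\d+)/\S+\.jhist"
)


def staging_dir_from_log(line: str) -> Optional[str]:
    """Return the staging directory named in the log line, or None if absent."""
    match = _JHIST.search(line)
    return match.group(1) if match else None


def cleanup_command(staging_dir: str) -> List[str]:
    """Build (but do not execute) the HDFS command that removes the directory."""
    return ["hdfs", "dfs", "-rm", "-r", staging_dir]


if __name__ == "__main__":
    log_line = (
        "2015-02-15 07:51:08,642 INFO [main] "
        "org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Previous history file is at "
        "hdfs://mycluster.com:8020/user/cloudera/.staging/"
        "job_1424003606313_0001/job_1424003606313_0001_1.jhist"
    )
    directory = staging_dir_from_log(log_line)
    if directory is not None:
        print(" ".join(cleanup_command(directory)))
```

Review the printed command before running it by hand, since removing a staging directory for a job that is still running would break that job.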


> On 19 Feb 2015, at 15:12, roland.depratti <roland.depratti@cox.net> wrote:
> 
> Alex,
> 
> That sounds like a very likely situation.
> 
> I read in the first JIRA that tokens are now used in non-secure setups, which explains my earlier SSL question.
> 
> Is the solution simply to delete those staging files from the cluster?
> 
> - rd 
> 
> 
> 
> -------- Original message --------
> From: Alexander Alten-Lorenz <wget.null@gmail.com> 
> Date: 02/19/2015 7:43 AM (GMT-05:00)
> To: user@hadoop.apache.org 
> Subject: Re: Yarn AM is abending job when submitting a remote job to cluster 
> 
> Hi,
> 
> https://issues.apache.org/jira/browse/YARN-1116 <https://issues.apache.org/jira/browse/YARN-1058>
> 
> It looks like the history server had an unclean shutdown, or a previous job didn’t finish or wasn’t cleaned up after finishing. From the log:
>
> 2015-02-15 07:51:07,241 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Kind: YARN_AM_RM_TOKEN, Service: , Ident: (org.apache.hadoop.yarn.security.AMRMTokenIdentifier@33be1aa0)
> ...
> Previous history file is at hdfs://mycluster.com:8020/user/cloudera/.staging/job_1424003606313_0001/job_1424003606313_0001_1.jhist
> 
> BR,
> Alex
> 
> 
> > On 19 Feb 2015, at 13:27, Roland DePratti <roland.depratti@cox.net> wrote:
> > 
> > Daemeon,
> >  
> > Thanks for the reply.  I have about 6 months' exposure to Hadoop and am new to SSL, so I did some digging after reading your message.
> >  
> > In the HDFS config, I have hadoop.ssl.enabled at its default, which is ‘false’ (which I understand applies to all Hadoop daemons).
> >  
> > I assumed this meant that it is not in use and not a factor in job submission (SSL certs not needed).
> >  
> > Do I misunderstand? Are you saying that it needs to be set to ‘true’, with valid certs and stores set up, for me to submit a remote job? (This is a POC setup with no exposure outside my environment.)
> >  
> > -  rd
> >  
> > From: daemeon reiydelle [mailto:daemeonr@gmail.com] 
> > Sent: Wednesday, February 18, 2015 10:22 PM
> > To: user@hadoop.apache.org
> > Subject: Re: Yarn AM is abending job when submitting a remote job to cluster
> >  
> > I would guess you do not have your SSL certs set up, client or server, based on the error.
> > 
> > 
> > Daemeon C.M. Reiydelle
> >  
> > On Wed, Feb 18, 2015 at 5:19 PM, Roland DePratti <roland.depratti@cox.net> wrote:
> > I have been searching for a handle on a problem with very few clues. Any help pointing me in the right direction would be huge.
> > I have not received any input from the Cloudera Google Groups. Perhaps this is more YARN-related, and I am hoping to have more luck here.
> > Any help is greatly appreciated.
> >  
> > I am running a Hadoop cluster using CDH5.3. I also have a client machine with a standalone one-node setup (VM).
> >  
> > All environments are running CentOS 6.6.
> >  
> > I have submitted some Java MapReduce jobs locally on both the cluster and the standalone environment, and they completed successfully.
> >  
> > I can submit a remote HDFS job from client to cluster using -conf hadoop-cluster.xml (see below) and get data back from the cluster with no problem.
> > 
> > When I submit the MapReduce jobs remotely, I get an AM error:
> >  
> > AM fails the job with the error: 
> > 
> >            SecretManager$InvalidToken: appattempt_1424003606313_0001_000002 not found in AMRMTokenSecretManager
> > 
> > I searched /var/log/secure on the client and cluster and found no unusual messages.
> > 
> > Here are the contents of hadoop-cluster.xml:
> > 
> > <?xml version="1.0" encoding="UTF-8"?>
> > 
> > <!--generated by Roland-->
> > <configuration>
> >   <property>
> >     <name>fs.defaultFS</name>
> >     <value>hdfs://mycluser:8020</value>
> >   </property>
> >   <property>
> >     <name>mapreduce.jobtracker.address</name>
> >     <value>hdfs://mycluster:8032</value>
> >   </property>
> >   <property>
> >     <name>yarn.resourcemanager.address</name>
> >     <value>hdfs://mycluster:8032</value>
> >   </property>
> > 
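
Two details in the posted file can be checked mechanically: the host in fs.defaultFS is spelled "mycluser" while the other two properties use "mycluster", and the two address properties carry an hdfs:// prefix, whereas Hadoop's shipped defaults give yarn.resourcemanager.address as a plain host:port. Whether either detail is related to the reported failure is not established in this thread. The following is a minimal Python sketch of such a check, using only the standard library. The function names and the choice of which properties to inspect are assumptions made for illustration, and the sample adds the closing </configuration> tag that the posted snippet lacks so that it parses.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# The three properties from the posted file, with a closing tag added so it parses.
SAMPLE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluser:8020</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.address</name>
    <value>hdfs://mycluster:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>hdfs://mycluster:8032</value>
  </property>
</configuration>"""

# Assumption for this sketch: these are treated as plain host:port settings.
HOST_PORT_PROPS = {"yarn.resourcemanager.address", "mapreduce.jobtracker.address"}
# Properties whose host part is compared across the file.
ADDRESS_PROPS = HOST_PORT_PROPS | {"fs.defaultFS"}


def read_properties(xml_text):
    """Parse a Hadoop-style configuration document into a name -> value dict."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


def lint(props):
    """Return human-readable warnings for the address properties."""
    warnings = []
    hosts = set()
    for name in sorted(n for n in props if n in ADDRESS_PROPS):
        value = props[name] or ""
        has_scheme = "://" in value
        if name in HOST_PORT_PROPS and has_scheme:
            warnings.append(f"{name}: expected host:port, found URI {value}")
        # Prefix "//" so urlsplit treats a bare host:port as a network location.
        parsed = urlsplit(value if has_scheme else "//" + value)
        if parsed.hostname:
            hosts.add(parsed.hostname)
    if len(hosts) > 1:
        warnings.append("hosts differ: " + ", ".join(sorted(hosts)))
    return warnings


if __name__ == "__main__":
    for warning in lint(read_properties(SAMPLE)):
        print(warning)
```

The warnings are hints for a human to review, not proof of a misconfiguration.
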
> > Here is the output from the job log on the cluster:  
> > 
> > 2015-02-15 07:51:06,544 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for application appattempt_1424003606313_0001_000002
> > 2015-02-15 07:51:06,949 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.require.client.cert;  Ignoring.
> > 2015-02-15 07:51:06,952 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
> > 2015-02-15 07:51:06,952 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.client.conf;  Ignoring.
> > 2015-02-15 07:51:06,954 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.keystores.factory.class;  Ignoring.
> > 2015-02-15 07:51:06,957 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.server.conf;  Ignoring.
> > 2015-02-15 07:51:06,973 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
> > 2015-02-15 07:51:07,241 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Executing with tokens:
> > 2015-02-15 07:51:07,241 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Kind: YARN_AM_RM_TOKEN, Service: , Ident: (org.apache.hadoop.yarn.security.AMRMTokenIdentifier@33be1aa0)
> > 2015-02-15 07:51:07,332 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Using mapred newApiCommitter.
> > 2015-02-15 07:51:07,627 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.require.client.cert;  Ignoring.
> > 2015-02-15 07:51:07,632 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
> > 2015-02-15 07:51:07,632 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.client.conf;  Ignoring.
> > 2015-02-15 07:51:07,639 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.keystores.factory.class;  Ignoring.
> > 2015-02-15 07:51:07,645 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.ssl.server.conf;  Ignoring.
> > 2015-02-15 07:51:07,663 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
> > 2015-02-15 07:51:08,237 WARN [main] org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> > 2015-02-15 07:51:08,429 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter set in config null
> > 2015-02-15 07:51:08,499 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
> > 2015-02-15 07:51:08,526 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.jobhistory.EventType for class org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler
> > 2015-02-15 07:51:08,527 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.JobEventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher
> > 2015-02-15 07:51:08,561 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.TaskEventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher
> > 2015-02-15 07:51:08,562 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.TaskAttemptEventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher
> > 2015-02-15 07:51:08,566 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventType for class org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler
> > 2015-02-15 07:51:08,568 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.speculate.Speculator$EventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$SpeculatorEventDispatcher
> > 2015-02-15 07:51:08,568 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.rm.ContainerAllocator$EventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter
> > 2015-02-15 07:51:08,570 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncher$EventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerLauncherRouter
> > 2015-02-15 07:51:08,599 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Recovery is enabled. Will try to recover from previous life on best effort basis.
> > 2015-02-15 07:51:08,642 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Previous history file is at hdfs://mycluster.com:8020/user/cloudera/.staging/job_1424003606313_0001/job_1424003606313_0001_1.jhist
> > 2015-02-15 07:51:09,147 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Read completed tasks from history 0
> > 2015-02-15 07:51:09,193 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.JobFinishEvent$Type for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler
> > 2015-02-15 07:51:09,222 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
> > 2015-02-15 07:51:09,277 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
> > 2015-02-15 07:51:09,277 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MRAppMaster metrics system started
> > 2015-02-15 07:51:09,286 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Adding job token for job_1424003606313_0001 to jobTokenSecretManager
> > 2015-02-15 07:51:09,306 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Not uberizing job_1424003606313_0001 because: not enabled; too much RAM;
> > 2015-02-15 07:51:09,324 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Input size for job job_1424003606313_0001 = 5343207. Number of splits = 5
> > 2015-02-15 07:51:09,325 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Number of reduces for job job_1424003606313_0001 = 1
> > 2015-02-15 07:51:09,325 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1424003606313_0001Job Transitioned from NEW to INITED
> > 2015-02-15 07:51:09,327 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster launching normal, non-uberized, multi-container job job_1424003606313_0001.
> > 2015-02-15 07:51:09,387 INFO [main]

