hadoop-user mailing list archives

From Vinod Kumar Vavilapalli <vino...@hortonworks.com>
Subject Re: Job running on YARN gets automatically killed after 10-12 minutes
Date Mon, 05 Nov 2012 18:47:34 GMT
Is this your custom application and not, say, MapReduce or the distributed shell?

If that is the case, the ApplicationMaster needs to ping the ResourceManager regularly so
that the RM knows it is alive. It does this by making an allocate(..) call, which is part
of the scheduler API. You should make this call periodically regardless of whether you have
any new container requests.
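
A minimal sketch of such a heartbeat loop follows. It does not use the real Hadoop classes: RmLink is a hypothetical stand-in for whatever client object your AM uses to reach the scheduler API, and the 100 ms period is chosen only so the demo finishes quickly. The point is only that allocate(..) is called on a timer with zero new container requests.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class HeartbeatSketch {

    /** Hypothetical stand-in for the AM-to-RM scheduler call; not a real Hadoop type. */
    interface RmLink {
        void allocate(int newContainerRequests);
    }

    /** Starts a timer that calls allocate(0) every periodMillis until it is shut down. */
    static ScheduledExecutorService startHeartbeat(RmLink rm, long periodMillis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(() -> {
            try {
                // No new container requests: the call only tells the RM we are alive.
                rm.allocate(0);
            } catch (RuntimeException e) {
                // Swallow the error so one failed call does not cancel later heartbeats.
                System.err.println("heartbeat failed: " + e);
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
        return timer;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger beats = new AtomicInteger();
        RmLink fakeRm = requests -> beats.incrementAndGet(); // fake RM that counts calls
        ScheduledExecutorService timer = startHeartbeat(fakeRm, 100);
        Thread.sleep(550);   // stands in for the AM's real work
        timer.shutdownNow(); // stop heartbeating when the application is done
        System.out.println("heartbeats sent: " + beats.get());
    }
}
```

In a real AM the period should be far shorter than the RM's expiry interval, so that a few missed calls do not get the attempt expired.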

The default liveliness interval is 10 minutes, which matches the "Timed out after 600 secs"
line in your RM log and is why your app gets killed after roughly that much time.
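
On the question of whether this is configurable: as far as I know, the RM-side setting is yarn.am.liveness-monitor.expiry-interval-ms in yarn-site.xml, with a default of 600000 ms. Treat the property name as an assumption and check it against the yarn-default.xml shipped with your release. A fragment would look like this:

```xml
<!-- Assumed property name; verify against yarn-default.xml for your release. -->
<property>
  <name>yarn.am.liveness-monitor.expiry-interval-ms</name>
  <!-- 600000 ms = 10 minutes, the default that matches the log above -->
  <value>600000</value>
</property>
```

Raising the value only postpones the expiry, and the RM would need to be restarted to pick it up. Heartbeating through allocate(..) is still the actual fix.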

HTH,
+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/

On Nov 5, 2012, at 8:32 AM, Krishna Kishore Bonagiri wrote:

> Hi,
> 
>   My job running on the YARN framework gets killed automatically after 10-12 minutes.
> 
>   I have changed the monitoring time limit in Client.java, which comes with the distributed
> shell example, and also increased the values of a set of interval parameters in
> $HADOOP_CONF_DIR/yarn-site.xml tenfold. The same kind of error still occurs.
> 
> Note: I am not sending frequent heartbeats to the RM from the AM, nor am I sending frequent
> container requests to the RM.
> 
> Content from RM's log:
> =====================
> 
> 
> 2012-11-05 05:50:41,721 INFO  fifo.FifoScheduler (FifoScheduler.java:containerCompleted(721)) - Application appattempt_1352112580456_0001_000001 released container container_1352112580456_0001_01_000004 on node: host: isredeng:33055 #containers=2 available=4096 used=4096 with event: FINISHED
> 2012-11-05 06:03:03,855 INFO  util.AbstractLivelinessMonitor (AbstractLivelinessMonitor.java:run(111)) - Expired:appattempt_1352112580456_0001_000001 Timed out after 600 secs
> 2012-11-05 06:03:03,867 INFO  attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(483)) - appattempt_1352112580456_0001_000001 State change from RUNNING to FAILED
> 
> 
> 
> Content from NM's log:
> ======================
> 
> 
> 2012-11-05 06:03:04,364 INFO  containermanager.AuxServices (AuxServices.java:handle(160)) - Got event APPLICATION_STOP for appId application_1352112580456_0001
> 2012-11-05 06:03:04,373 INFO  application.Application (ApplicationImpl.java:handle(387)) - Application application_1352112580456_0001 transitioned from APPLICATION_RESOURCES_CLEANINGUP to FINISHED
> 
> 
> Is this behavior not controllable by any of the parameters in XML configuration files?
> 
> Thanks & Regards,
> Kishore

