hadoop-hdfs-user mailing list archives

From Nur Kholis Majid <nur.kholis.ma...@gmail.com>
Subject Re: How to set AM attempt interval?
Date Mon, 02 Mar 2015 09:33:29 GMT
Hi Vinod,

Here is the Diagnostics message from the RM Web UI page:
Application application_1424919411720_0878 failed 10 times due to
Error launching appattempt_1424919411720_0878_000010. Got exception:
java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:197)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:209)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.setupTokens(AMLauncher.java:226)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.createAMContainerLaunchContext(AMLauncher.java:198)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:108)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
. Failing the application.

The log link shows only the following message and doesn't produce any
stdout or stderr files:
Logs not available for container_1424919411720_0878_08_000001_14.
Aggregation may not be complete, Check back later or try the
nodemanager at hadoopdn01:8041

Here is the screenshot:
https://dl.dropboxusercontent.com/u/33705885/2015-03-02_163138.png

Thank you.

On Sat, Feb 28, 2015 at 2:56 AM, Vinod Kumar Vavilapalli
<vinodkv@hortonworks.com> wrote:
> That's an old JIRA. The right solution is not an AM-retry interval but
> launching the AM somewhere.
>
> Why is your AM failing in the first place? If it is due to full-disk, the
> situation should be better with YARN-1781 - can you use the configuration
> (yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage)
> added at YARN-1781?
>
> +Vinod
>
> On Feb 27, 2015, at 7:31 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>
> Looks like this is related:
> https://issues.apache.org/jira/browse/YARN-964
>
> On Fri, Feb 27, 2015 at 4:29 AM, Nur Kholis Majid
> <nur.kholis.majid@gmail.com> wrote:
>>
>> Hi All,
>>
>> I have many jobs that failed because the AM tries to rerun the job after
>> a very short interval (only 6 seconds). How can I increase the interval
>> to a larger value?
>>
>> https://dl.dropboxusercontent.com/u/33705885/2015-02-27_145104.png
>>
>> Thank you.
>
>
>

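For reference, the property Vinod mentions from YARN-1781 would normally go in yarn-site.xml on each NodeManager. The fragment below is only a sketch: the property name is taken from the thread, while the value 90.0 is an illustrative assumption and not a recommendation from anyone in this discussion. It is also not confirmed that a full disk is the cause of the EOFException above.

```xml
<!-- yarn-site.xml (NodeManager side); the value is an assumed example -->
<property>
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <!-- Percentage of disk usage above which the disk is marked unhealthy -->
  <value>90.0</value>
</property>
```

The NodeManagers would presumably need a restart to pick up the change.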