hadoop-yarn-issues mailing list archives

From "Xuan Gong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
Date Mon, 23 Sep 2013 22:09:07 GMT

    [ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775711#comment-13775711 ]

Xuan Gong commented on YARN-1229:

[~vinodkv], [~bikassaha], [~hitesh], [~sseth], [~jlowe], [~cnauroth]

The bug is an error in launch_container.sh when it tries to export NM_AUX_SERVICE_mapreduce.shuffle.
The problem is that '.' is not a valid character in an environment variable name. To
solve this, we might need to rename the service.
Three places need to be renamed (use mapreduce_shuffle instead of mapreduce.shuffle):
  public static final String MAPREDUCE_SHUFFLE_SERVICEID =
in ShuffleHandler.java.

The other two places are in yarn-site.xml:
        <description>shuffle service that needs to be set for Map Reduce to run </description>

We can simply replace all three places with mapreduce_shuffle, or we can split the shuffle
service out of the aux_services by creating a new property called, say, mapreduce_shuffle_service.
The ShuffleHandler can then read this property instead of defining MAPREDUCE_SHUFFLE_SERVICEID
itself, and AuxService#init() will need to read both mapreduce_shuffle_service and
yarn.nodemanager.aux-services to do the initialization.

An alternative is to convert all special characters to "_", with AuxServiceHelpers becoming the
public API to access this data.
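
To make the conversion idea concrete, here is a minimal sketch of what such a mapping could look like. It is not the actual AuxServiceHelpers code: the class name AuxServiceEnvNameSketch and both methods are hypothetical, and the only thing taken from the failing export is the NM_AUX_SERVICE_ prefix. It assumes the shell accepts only names made of letters, digits and underscores that do not start with a digit.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, not part of Hadoop: shows one way to turn an aux service
// name into a shell-safe environment variable name.
public class AuxServiceEnvNameSketch {
    // Prefix taken from the variable that launch_container.sh failed to export.
    private static final String PREFIX = "NM_AUX_SERVICE_";

    // Shell identifier rule assumed here: a letter or underscore, followed by
    // letters, digits or underscores.
    private static final Pattern VALID_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // Returns true if the shell would accept this name in an export statement.
    public static boolean isValidEnvName(String name) {
        return VALID_NAME.matcher(name).matches();
    }

    // Replaces every character outside [A-Za-z0-9_] with '_' and adds the prefix.
    public static String toEnvName(String serviceName) {
        return PREFIX + serviceName.replaceAll("[^A-Za-z0-9_]", "_");
    }

    public static void main(String[] args) {
        String raw = PREFIX + "mapreduce.shuffle";
        System.out.println(raw + " valid? " + isValidEnvName(raw));

        String converted = toEnvName("mapreduce.shuffle");
        System.out.println(converted + " valid? " + isValidEnvName(converted));
    }
}
```

One caveat with this approach: two service names that differ only in their special characters (for example a.b and a_b) would map to the same variable, so the lookup side has to go through the same conversion.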

Since we're renaming variables, this can be considered backward incompatible. I would
like to get in touch with folks who are already using it.
> Shell$ExitCodeException could happen if AM fails to start
> ---------------------------------------------------------
>                 Key: YARN-1229
>                 URL: https://issues.apache.org/jira/browse/YARN-1229
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.1.1-beta
>            Reporter: Tassapol Athiapinya
>            Assignee: Xuan Gong
>            Priority: Critical
>             Fix For: 2.1.1-beta
> I ran a sleep job. If the AM fails to start, this exception can occur:
> 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with state FAILED due to: Application application_1379673267098_0020 failed 1 times due to AM Container for appattempt_1379673267098_0020_000001 exited with  exitCode: 1 due to: Exception from container-launch:
> org.apache.hadoop.util.Shell$ExitCodeException: /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_000001/launch_container.sh:
> ': not a valid identifier
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
> at org.apache.hadoop.util.Shell.run(Shell.java:379)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
> at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> .Failing this attempt.. Failing the application.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
