hadoop-mapreduce-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-5850) PATH environment variable contains duplicate values in map and reduce tasks on Windows.
Date Fri, 18 Apr 2014 20:47:14 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Nauroth updated MAPREDUCE-5850:
-------------------------------------

    Attachment: MAPREDUCE-5850.1.patch

This is only a problem on Windows.  It doesn't happen on Linux.  Here is a description of
how this happens.

In {{MRJobConfig}}, the default value of {{mapreduce.admin.user.env}} is defined to set the
PATH environment variable on Windows so that tasks will be able to find and load hadoop.dll.

{code}
  public final String DEFAULT_MAPRED_ADMIN_USER_ENV = 
      Shell.WINDOWS ? 
          "PATH=%PATH%;%HADOOP_COMMON_HOME%\\bin":
          "LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native";
{code}

{{TaskAttemptImpl#createCommonContainerLaunchContext}} sets up the base environment.  As part
of that, it picks up {{mapreduce.admin.user.env}}.  This is the point where the
behavior diverges from Linux.  On Linux, the common context won't have a PATH, because the
default value sets LD_LIBRARY_PATH instead.  On Windows, the common context will have a PATH.

{code}
    // Add the env variables passed by the admin
    MRApps.setEnvFromInputString(
        environment, 
        conf.get(
            MRJobConfig.MAPRED_ADMIN_USER_ENV, 
            MRJobConfig.DEFAULT_MAPRED_ADMIN_USER_ENV), conf
        );
{code}

Then, at task launch time, we end up setting PATH again via a call to
{{TaskAttemptImpl#createContainerLaunchContext}} -> {{MapReduceChildJVM#setVMEnv}} ->
{{MRApps#setEnvFromInputString}} -> {{Apps#setEnvFromInputString}}.  This uses
{{Apps#addToEnvironment}} to set the new value in the environment, and the logic
of this method appends to existing values:

{code}
  @Public
  @Unstable
  public static void addToEnvironment(
      Map<String, String> environment,
      String variable, String value, String classPathSeparator) {
    String val = environment.get(variable);
    if (val == null) {
      val = value;
    } else {
      val = val + classPathSeparator + value;
    }
    environment.put(StringInterner.weakIntern(variable), 
        StringInterner.weakIntern(val));
  }
{code}
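
To make the double append concrete, here is a minimal, self-contained sketch. It reuses the append-or-set logic of {{addToEnvironment}} quoted above (without the string interning) and applies the Windows default of {{mapreduce.admin.user.env}} twice to the same map, once for the common context and once at task launch. The class name and the {{applySetting}} helper are hypothetical, and the sketch deliberately skips the variable expansion that the real {{setEnvFromInputString}} performs, so the literal string it produces is illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

public class PathDuplicationSketch {

    // Same append-or-set behavior as the addToEnvironment method quoted
    // above, minus the string interning.
    static void addToEnvironment(Map<String, String> environment,
            String variable, String value, String separator) {
        String val = environment.get(variable);
        environment.put(variable, val == null ? value : val + separator + value);
    }

    // Hypothetical helper: applies a single "NAME=value" setting.
    // Assumption: no %VAR% expansion is done here, unlike the real code path.
    static void applySetting(Map<String, String> environment,
            String setting, String separator) {
        int eq = setting.indexOf('=');
        addToEnvironment(environment,
                setting.substring(0, eq), setting.substring(eq + 1), separator);
    }

    // Models the two places where the admin env is applied on Windows.
    static Map<String, String> simulateWindowsLaunch() {
        String adminEnv = "PATH=%PATH%;%HADOOP_COMMON_HOME%\\bin";
        Map<String, String> environment = new HashMap<>();
        applySetting(environment, adminEnv, ";"); // common launch context
        applySetting(environment, adminEnv, ";"); // task launch time
        return environment;
    }

    public static void main(String[] args) {
        // The second call appends to the value left behind by the first.
        System.out.println(simulateWindowsLaunch().get("PATH"));
    }
}
```

Under these simplifications the program prints {{%PATH%;%HADOOP_COMMON_HOME%\bin;%PATH%;%HADOOP_COMMON_HOME%\bin}}: the second application lands in the {{else}} branch and appends a second copy.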

I haven't been able to come up with a clean fix for this.  We can't change the default value
of {{mapreduce.admin.user.env}}, because tasks are dependent on it to find the native code
(an absolute must on Windows).  We can't drop the appending behavior, because there are valid
use cases dependent on it.  Adding a special case for Windows + PATH seems hacky.  Does anyone
else have ideas?

Since this is ultimately harmless, we might consider simply relaxing the assertion in
{{TestMiniMRChildTask}}.  I'm attaching a patch that does that.  This passes on Mac and Windows.
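
For readers without the attachment at hand, the following is a rough illustration of what a relaxed check could look like. It is not the content of MAPREDUCE-5850.1.patch: the class name, the {{containsPathEntry}} helper, and its signature are all hypothetical. The idea is to accept a PATH value as long as the expected entry appears at least once among its separator-delimited entries, so a duplicated entry no longer fails the check.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class RelaxedPathCheckSketch {

    // Hypothetical helper: returns true if expectedEntry appears at least
    // once among the separator-delimited entries of pathValue.
    // Duplicates are tolerated; the comparison is exact and case-sensitive.
    static boolean containsPathEntry(String pathValue,
            String expectedEntry, String separator) {
        if (pathValue == null) {
            return false;
        }
        // Quote the separator so it is treated literally, not as a regex.
        String[] entries = pathValue.split(Pattern.quote(separator), -1);
        return Arrays.asList(entries).contains(expectedEntry);
    }

    public static void main(String[] args) {
        String duplicated = "C:\\hadoop\\bin;C:\\hadoop\\bin";
        // A strict equality check against "C:\hadoop\bin" would fail here;
        // the relaxed check passes.
        System.out.println(containsPathEntry(duplicated, "C:\\hadoop\\bin", ";"));
    }
}
```

A containment check like this trades strictness for portability: it would no longer detect the duplication itself, which is acceptable only because the duplication is harmless at runtime.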

> PATH environment variable contains duplicate values in map and reduce tasks on Windows.
> ---------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5850
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5850
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 3.0.0, 2.4.0
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>            Priority: Minor
>         Attachments: MAPREDUCE-5850.1.patch
>
>
> The value of the PATH environment variable gets appended twice before execution of a
> container for a map or reduce task.  This is ultimately harmless at runtime, but it does cause
> a failure in {{TestMiniMRChildTask}} when running on Windows.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
