hadoop-mapreduce-issues mailing list archives

From "Chuan Liu (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-4374) Fix child task environment variable config and add support for Windows
Date Tue, 26 Jun 2012 19:09:44 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chuan Liu updated MAPREDUCE-4374:
---------------------------------

    Attachment: MAPREDUCE-4374-branch-1-win.patch

In this patch, I provide a more complete implementation of child environment variable expansion
and add support for Windows. For this feature, the syntax differs between Windows and Linux;
I also added descriptions to the Javadoc. For example, users can set the child environment via
the following config on Linux:
{code}
<property>
  <name>mapred.child.env</name>
  <value>PATH=$HOME:/opt/bin</value>
</property>
{code}
On Windows, the equivalent looks like:
{code}
<property>
  <name>mapred.child.env</name>
  <value>PATH=%HOME%;C:\opt\bin</value>
</property>
{code}
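Before any expansion happens, a value like the ones above has to be split into name/value pairs. The following is a minimal sketch of that step. It assumes, hypothetically, that several assignments in one setting are separated by commas and that each assignment is split at its first '='. The class and method names are illustrative only and are not the patch's code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ChildEnvParser {
    // Hypothetical helper: splits "NAME1=v1,NAME2=v2" into an ordered map.
    // Assumption: ',' separates assignments; the first '=' separates name from value,
    // so values that themselves contain '=' are kept intact.
    static Map<String, String> parse(String setting) {
        Map<String, String> env = new LinkedHashMap<>();
        if (setting == null || setting.trim().isEmpty()) {
            return env;
        }
        for (String assignment : setting.split(",")) {
            int eq = assignment.indexOf('=');
            if (eq <= 0) {
                continue; // skip malformed entries that have no name
            }
            env.put(assignment.substring(0, eq).trim(), assignment.substring(eq + 1).trim());
        }
        return env;
    }

    public static void main(String[] args) {
        System.out.println(parse("PATH=$HOME:/opt/bin"));
        System.out.println(parse("PATH=%HOME%;C:\\opt\\bin,MY_VAR=x"));
    }
}
```

Neither setting is split on ':' or ';' at this stage. Those characters only matter once the value itself is interpreted as a path list.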
For the implementation, I followed the IEEE POSIX naming rule quoted below, except for letter case.
Based on discussion with my colleagues, both uppercase and lowercase letters are allowed. From
that discussion, it seems common for applications on both Linux and Windows to use lowercase
letters in environment variable names, and Hadoop does not need to follow the IEEE guideline
strictly. If other common use cases come up in the Hadoop community, we can expand the support
as well.
“Environment variable names used by the utilities in the Shell and Utilities volume of IEEE
Std 1003.1-2001 consist solely of uppercase letters, digits, and the '_' (underscore) from
the characters defined in Portable Character Set and do not begin with a digit.”
Every match of this pattern in the value string is treated as an environment variable reference
and is expanded to the variable's actual value.
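The naming rule and expansion described above can be sketched with a regular expression. The name pattern allows letters of either case, digits and '_', and does not allow a leading digit. References use `$NAME` on Linux and `%NAME%` on Windows, matching the config examples. This is a hedged sketch, not the patch's implementation. The class name is hypothetical, and expanding an undefined variable to the empty string is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EnvExpander {
    // POSIX-style name, relaxed to allow lowercase: letters, digits, '_', no leading digit.
    private static final String NAME = "[A-Za-z_][A-Za-z0-9_]*";

    // Linux-style reference: $NAME
    static final Pattern UNIX_REF = Pattern.compile("\\$(" + NAME + ")");
    // Windows-style reference: %NAME%
    static final Pattern WINDOWS_REF = Pattern.compile("%(" + NAME + ")%");

    // Replaces every reference matched by refPattern with its value from env.
    // Assumption: an undefined variable expands to the empty string.
    static String expand(String value, Pattern refPattern, Map<String, String> env) {
        Matcher m = refPattern.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = env.getOrDefault(m.group(1), "");
            // quoteReplacement keeps '\' and '$' in values (e.g. Windows paths) literal
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> env = new HashMap<>();
        env.put("HOME", "/home/alice");
        System.out.println(expand("$HOME:/opt/bin", UNIX_REF, env));
        env.put("HOME", "C:\\Users\\alice");
        System.out.println(expand("%HOME%;C:\\opt\\bin", WINDOWS_REF, env));
    }
}
```

Using `Matcher.quoteReplacement` matters on Windows. Without it, backslashes in an expanded path would be interpreted as escape characters in the replacement string.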

*Why not use the existing syntax, i.e. '$' and ':' (e.g. '$x=a:b'), to set environment variables
on Windows?*
Environment variables are most commonly used to hold paths for programs, e.g. LD_LIBRARY_PATH,
PATH, HOME, etc. Unlike on Linux, ':' is common in Windows paths, as in 'C:\Windows'. If we use
':' as the separator between the values of an environment variable, it will cause confusion
during parsing.
We need to either choose another separator, e.g. ';' (semicolon), or escape ':' (colon). Escaping
':' is very ugly in my opinion, and it is also not a cross-platform solution. If we take the route
of using another separator, we are already changing the existing syntax. I think using '%' instead
of '$', and ';' instead of ':', will be more natural for Windows users. Since paths are the most
common use of environment variables, and will most likely differ between Windows and Linux, it
should be fine to ask users to adopt different settings on different platforms. They likely
need to change the path settings for each OS anyway.
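The parsing problem described above is easy to demonstrate. Splitting a Windows path list on ':' tears the drive letters apart, while splitting it on ';' keeps each path intact. The class and helper names below are illustrative only.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SeparatorDemo {
    // Hypothetical helper: splits a path list on a literal separator character.
    static String[] entries(String value, char separator) {
        return value.split(Pattern.quote(String.valueOf(separator)));
    }

    public static void main(String[] args) {
        String windowsPaths = "C:\\Windows;C:\\opt\\bin";
        // ':' (the Linux separator) breaks each drive letter off its path
        System.out.println(Arrays.toString(entries(windowsPaths, ':')));
        // ';' (the Windows separator) yields the two intended paths
        System.out.println(Arrays.toString(entries(windowsPaths, ';')));
    }
}
```

The first line prints `[C, \Windows;C, \opt\bin]` and the second prints `[C:\Windows, C:\opt\bin]`. This is why escaping ':' would otherwise be needed on every Windows path.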


I also refactored two related tests so that they run on Windows. For *TestMiniMRChildTask*,
the change essentially chooses a different syntax for the child task config on each OS. For
*TestTaskEnvironment*, we removed unnecessary parts that seem to have been borrowed from
TestJvmManager.
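Choosing the syntax by OS, as the *TestMiniMRChildTask* change does, might look roughly like the sketch below. The `os.name` check and the helper names are assumptions for illustration, not the test's actual code. `java.io.File.pathSeparator` is the JDK's own platform path separator.

```java
import java.io.File;

public class ChildEnvSyntax {
    // Hypothetical helper: treats any os.name starting with "Windows" as Windows.
    static boolean isWindows(String osName) {
        return osName.startsWith("Windows");
    }

    // Builds a child env value that prepends dir to PATH,
    // using %VAR% and ';' on Windows, $VAR and ':' elsewhere.
    static String prependToPath(String dir, boolean windows) {
        return windows ? "PATH=" + dir + ";%PATH%" : "PATH=" + dir + ":$PATH";
    }

    public static void main(String[] args) {
        boolean windows = isWindows(System.getProperty("os.name"));
        String dir = windows ? "C:\\opt\\bin" : "/opt/bin";
        System.out.println("mapred.child.env = " + prependToPath(dir, windows));
        // The JVM's own separator agrees: ';' on Windows, ':' elsewhere.
        System.out.println("File.pathSeparator = " + File.pathSeparator);
    }
}
```

Keeping the OS decision in one helper lets the rest of the test stay identical on both platforms.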
                
> Fix child task environment variable config and add support for Windows
> ----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4374
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4374
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 1-win
>            Reporter: Chuan Liu
>            Priority: Minor
>         Attachments: MAPREDUCE-4374-branch-1-win.patch
>
>
> In HADOOP-2838, a new feature was introduced to set environment variables for child tasks via the Hadoop
config 'mapred.child.env'. There have been further fixes and improvements around this feature, e.g. HADOOP-5981
was a bug fix, and MAPREDUCE-478 split the config into 'mapred.map.child.env'
and 'mapred.reduce.child.env'. However, the current implementation is still incomplete.
I believe it does not match its documentation or original intent. Also, because the configuration syntax uses ‘:’
(colon) and ‘;’ (semicolon), we will have problems using it
on Windows. ‘:’ appears very often in Windows paths, as in “C:\”, and environment
variables are very often used to hold path names. This JIRA was created to fix the problem and
provide support on Windows.

--
This message is automatically generated by JIRA.

       
