hadoop-common-dev mailing list archives

From "arkady borkovsky (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-477) Streaming should execute Unix commands and scripts in well known languages without user specifying the path
Date Fri, 15 Sep 2006 17:06:23 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-477?page=comments#action_12435036 ] 
arkady borkovsky commented on HADOOP-477:

As Owen pointed out a few weeks ago, all that is necessary is to provide a standard environment
for the executables.
I suggest 3 things to be done:

(1) provide a "standard", meaningful, and documented environment to commands executed by Streaming.
   It should include at least
      -- PATH that has /usr/bin, etc. -- so that commands like grep, perl, python, awk, etc.
         are found, as well as Hadoop utilities
      -- LD_LIBRARY_PATH (to get to C and C++ shared libraries)
(2) allow the user to provide a file (similar to .cshrc) to define environment variables.

(3) allow the user to define environment variables in the Streaming command.

(1) is probably a must.
I strongly prefer (2) over (3), as I would just borrow someone else's setup that works, and I do
not know what the necessary settings actually are.
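
The three options above could be layered, with later sources overriding earlier ones. The
following is only a minimal sketch of that idea in Python, not Streaming's actual behaviour.
The default directories, the NAME=VALUE file format (a real .cshrc uses setenv), and the
names DEFAULT_ENV, parse_env_file and build_task_env are all assumptions made for illustration.

```python
# Assumed defaults for option (1); real values would be chosen per cluster.
DEFAULT_ENV = {
    "PATH": "/usr/local/bin:/usr/bin:/bin",
    "LD_LIBRARY_PATH": "/usr/local/lib:/usr/lib",
}


def parse_env_file(text):
    """Parse NAME=VALUE lines; blank lines and '#' comments are skipped."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected NAME=VALUE, got: {line!r}")
        env[name.strip()] = value.strip()
    return env


def build_task_env(env_file_text="", cmdline_defs=()):
    """Layer the three sources: defaults (1), user file (2), command line (3)."""
    env = dict(DEFAULT_ENV)                              # (1) standard environment
    env.update(parse_env_file(env_file_text))            # (2) user-supplied file
    env.update(parse_env_file("\n".join(cmdline_defs)))  # (3) per-job definitions
    return env
```

The resulting dictionary could then be passed as the env argument of subprocess.run when
launching the mapper or reducer. Because the file from (2) is plain text, a working setup can
be copied from another user unchanged, which is the reason given above for preferring it.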

> Streaming should execute Unix commands and scripts in well known languages without user
specifying the path
> -----------------------------------------------------------------------------------------------------------
>                 Key: HADOOP-477
>                 URL: http://issues.apache.org/jira/browse/HADOOP-477
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/streaming
>            Reporter: arkady borkovsky
> If the executables for -mapper or -reducer are well-known (grep, cat, awk), Streaming
should make sure that the executable is found.
> If a script for -mapper or -reducer is in a well-known language (.pl, .py), Streaming
should execute it with the correct language processor.
> Reason:
> many jobs get started from machines with a different environment from that on the cluster.
> On the other hand, different clusters may have different environments.
> Also, a user may have no access to the cluster machines.
> Because of this, a user may be unable to specify correct paths for standard commands,
and correct language processors for scripts.
> Implementation:
> Streaming may tailor the commands by prepending the path, or the name of the language processor.
> Another solution is to make sure that the commands are executed in a "meaningful" environment
(with good $PATH, and other variables Unix users are accustomed to count upon).
> Once again, Streaming is a user-facing tool -- it is not a library or a hackable example
that the users are to modify for their needs.  So it should work out of the box.
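
The first implementation idea in the description (prepending the path or the name of the
language processor) might look roughly like the sketch below. It is an illustration only: the
function name tailor_command and the extension table INTERPRETERS are hypothetical, and the
lookup is done with Python's shutil.which against a search path supplied by the caller.

```python
import os
import shlex
import shutil

# Hypothetical mapping from script extension to language processor.
INTERPRETERS = {".pl": "perl", ".py": "python"}


def tailor_command(command, search_path):
    """Return the argument list with the executable resolved on search_path."""
    argv = shlex.split(command)
    if not argv:
        raise ValueError("empty command")
    ext = os.path.splitext(argv[0])[1]
    if ext in INTERPRETERS:
        # Script in a well-known language: prepend the language processor.
        argv.insert(0, INTERPRETERS[ext])
    if os.sep not in argv[0]:
        # Bare command name: prepend its directory if it is found.
        resolved = shutil.which(argv[0], path=search_path)
        if resolved is not None:
            argv[0] = resolved
    return argv
```

A command that already contains a path, or that is not found on search_path, is returned
unchanged, so the task still fails in the usual way instead of being silently rewritten.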
