hadoop-common-dev mailing list archives

From "arkady borkovsky (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-477) Streaming should execute Unix commands and scripts in well known languages without user specifying the path
Date Fri, 15 Sep 2006 17:06:23 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-477?page=comments#action_12435036 ] 
            
arkady borkovsky commented on HADOOP-477:
-----------------------------------------

As Owen pointed out a few weeks ago, all that is necessary is to provide a standard environment for the executables.
I suggest three things:

(1) Provide a "standard", meaningful, and documented environment to commands executed by Streaming.
   It should include at least
      -- PATH that has /usr/bin, etc., so that commands like grep, perl, python, and awk are found, as well as Hadoop utilities
      -- LD_LIBRARY_PATH (to get to C and C++ shared libraries)
      
(2) Allow the user to provide a file (similar to .cshrc) to define environment variables.

(3) Allow the user to define environment variables in the Streaming command.

(1) is probably a must.
I strongly prefer (2) over (3), as I would just borrow someone's setup that works, and I do not know what the actual necessary settings are.
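
The three proposals can be read as layers: a documented default environment, overridden by a user-supplied file, overridden in turn by variables given on the command line. Below is a minimal Python sketch of that layering. It assumes a simple KEY=VALUE file format and this precedence order, neither of which is specified in this issue. The names `DEFAULT_ENV`, `parse_env_file`, `build_env`, and `run_streaming_command` are hypothetical and are not Streaming APIs.

```python
import subprocess

# Hypothetical defaults for proposal (1); real values would be cluster-specific.
DEFAULT_ENV = {
    "PATH": "/usr/local/bin:/usr/bin:/bin",
    "LD_LIBRARY_PATH": "/usr/local/lib:/usr/lib",
}


def parse_env_file(text):
    """Parse KEY=VALUE lines; blank lines, '#' comments, and lines without '=' are skipped."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip()
    return env


def build_env(env_file_text="", cmdline_vars=None):
    """Layer the sources: defaults (1), then the user file (2), then the command line (3)."""
    env = dict(DEFAULT_ENV)
    env.update(parse_env_file(env_file_text))
    env.update(cmdline_vars or {})
    return env


def run_streaming_command(argv, env):
    """Run a mapper or reducer command with exactly the given environment."""
    return subprocess.run(argv, env=env, capture_output=True, text=True)
```

With this layering, a user who borrows a colleague's working environment file gets the same settings without knowing which variables matter, which is the case made for (2) above.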

> Streaming should execute Unix commands and scripts in well known languages without user specifying the path
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-477
>                 URL: http://issues.apache.org/jira/browse/HADOOP-477
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/streaming
>            Reporter: arkady borkovsky
>
> If the executable for -mapper or -reducer is well-known (grep, cat, awk), Streaming should make sure that the executable is found.
> If a script for -mapper or -reducer is in a well-known language (.pl, .py), Streaming should execute it with the correct language processor.
> Reason:
> Many jobs get started from machines with a different environment from that on the cluster.
> On the other hand, different clusters may have different environments.
> Also, a user may have no access to the cluster machines.
> Because of this, a user may be unable to specify correct paths for standard commands, and correct language processors for scripts.
> Implementation:
> Streaming may tailor the commands by prepending the path, or the name of the language processor.
> Another solution is to make sure that the commands are executed in a "meaningful" environment (with a good $PATH, and other variables Unix users are accustomed to counting on).
> Once again, Streaming is a user-facing tool -- it is not a library or a hackable example that the users are to modify for their needs. So it should work out of the box.
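
The "tailoring" approach in the quoted description could look roughly like the Python sketch below. The extension-to-interpreter table and the function name `tailor_command` are assumptions for illustration; the issue names only .pl and .py as examples and does not say how the search path would be obtained.

```python
import os
import shutil

# Hypothetical mapping from script extension to language processor.
INTERPRETERS = {".pl": "perl", ".py": "python"}


def tailor_command(argv, search_path):
    """Return a copy of argv adjusted for execution on a cluster node.

    A script in a known language gets its interpreter prepended; the first
    word is then replaced by its full path if it is found on search_path.
    """
    argv = list(argv)
    ext = os.path.splitext(argv[0])[1]
    if ext in INTERPRETERS:
        argv.insert(0, INTERPRETERS[ext])
    resolved = shutil.which(argv[0], path=search_path)
    if resolved is not None:
        argv[0] = resolved
    return argv
```

If the executable is not found on `search_path`, the command name is left unchanged, so the lookup falls back to whatever environment the task runs in.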


        
