hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-52) mapred input and output dirs must be absolute
Date Wed, 22 Mar 2006 21:42:10 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-52?page=comments#action_12371466 ] 

Owen O'Malley commented on HADOOP-52:
-------------------------------------

I'll go ahead and update the variable names to camelCase.

The LocalJobRunner only has a problem when the user is submitting multiple jobs at the same
time, right?

I knew I was opening a can of worms with local directories with respect to threads. There
are lots of potential solutions including making the current directory thread local. (Although
the LocalFileSystem would have discontinuities, because it also sets the system property,
which is by definition global.) I think that for now we are better off following the Unix
semantics of treating the working directory as global.

For now, I'm proposing synchronizing on the file system, like:

        synchronized (fs) {
          setWorkingDirectory(job, fs);
          MapTask map = new MapTask(file, (String)mapIds.get(i), splits[i]);
        }

around each of the places where the LocalJobRunner sets the working directory.
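
To make the idea concrete, here is a rough, self-contained sketch. SketchFileSystem, SketchTask and WorkingDirLockSketch are hypothetical stand-ins invented for illustration, not the actual Hadoop classes; the only assumption carried over is that a task picks up the working directory at construction time, so setting the directory and creating the task have to happen under the same lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for a file system whose working directory is shared by all jobs.
class SketchFileSystem {
    private String workingDir = "/";
    void setWorkingDirectory(String dir) { workingDir = dir; }
    String getWorkingDirectory() { return workingDir; }
}

// Hypothetical task that captures the working directory when it is constructed.
class SketchTask {
    final String jobDir;   // directory the owning job asked for
    final String seenDir;  // directory the task actually observed
    SketchTask(String jobDir, SketchFileSystem fs) {
        this.jobDir = jobDir;
        this.seenDir = fs.getWorkingDirectory();
    }
}

class WorkingDirLockSketch {
    // Set the directory and build the task while holding the file system's monitor,
    // so another job cannot change the directory in between.
    static SketchTask createTask(SketchFileSystem fs, String jobDir) {
        synchronized (fs) {
            fs.setWorkingDirectory(jobDir);
            return new SketchTask(jobDir, fs);
        }
    }

    // Run several "jobs" at once, each creating tasks against the same file system.
    static List<SketchTask> runJobs(int jobs, int tasksPerJob) {
        SketchFileSystem fs = new SketchFileSystem();
        List<SketchTask> created = new CopyOnWriteArrayList<>();
        List<Thread> threads = new ArrayList<>();
        for (int j = 0; j < jobs; j++) {
            final String dir = "/user/job" + j;
            Thread t = new Thread(() -> {
                for (int i = 0; i < tasksPerJob; i++) {
                    created.add(createTask(fs, dir));
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return created;
    }

    public static void main(String[] args) {
        for (SketchTask task : runJobs(4, 500)) {
            if (!task.jobDir.equals(task.seenDir)) {
                throw new AssertionError("task saw another job's directory");
            }
        }
        System.out.println("every task saw its own job's working directory");
    }
}
```

Without the synchronized block, a second job could change the directory between the set and the constructor call, and a task could then start in the wrong place.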

On a side note, if we are worried about local runners with multiple jobs, we should put synchronization
in around the updates of the jobs list and other fields of the LocalJobRunner.
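
As a sketch of what that could look like, assuming the runner keeps its jobs in a plain map keyed by job id: JobRegistrySketch and its methods are hypothetical names for illustration, not the LocalJobRunner's actual fields or API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical registry of running jobs; every access to the shared map
// and the id counter goes through a synchronized method.
class JobRegistrySketch {
    private final Map<String, String> jobs = new HashMap<>();
    private int nextId = 0;

    synchronized String submit(String name) {
        String id = "job_" + (nextId++);
        jobs.put(id, name);
        return id;
    }

    synchronized void complete(String id) {
        jobs.remove(id);
    }

    synchronized int running() {
        return jobs.size();
    }

    // Submit jobs from several threads at once and report how many were registered.
    static int submitConcurrently(int threads, int perThread) {
        JobRegistrySketch registry = new JobRegistrySketch();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    registry.submit("job");
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return registry.running();
    }

    public static void main(String[] args) {
        System.out.println(submitConcurrently(8, 1000) + " jobs registered");
    }
}
```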

> mapred input and output dirs must be absolute
> ---------------------------------------------
>
>          Key: HADOOP-52
>          URL: http://issues.apache.org/jira/browse/HADOOP-52
>      Project: Hadoop
>         Type: Bug
>   Components: mapred
>     Versions: 0.1
>     Reporter: Doug Cutting
>     Assignee: Owen O'Malley
>      Fix For: 0.1
>  Attachments: cwd.patch
>
> DFS converts relative pathnames to be under /user/$USER.  But MapReduce jobs may be submitted
> by a different user than is running the jobtracker and tasktracker.  Thus relative paths must
> be resolved before a job is submitted, so that only absolute paths are seen on the job tracker
> and tasktracker.  I think the simplest way to fix this is to make JobConf.setInputDir(), setOutputDir(),
> etc. resolve relative pathnames.
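
A minimal sketch of the resolution step described above, assuming only the /user/$USER rule from the description: PathResolveSketch and toAbsolute are hypothetical names, and this is not the code in cwd.patch or in JobConf.

```java
// Hypothetical helper: resolve a DFS path on the submitting side, so that the
// job tracker and task tracker only ever see absolute paths.
class PathResolveSketch {
    static String toAbsolute(String path, String submittingUser) {
        if (path.startsWith("/")) {
            return path; // already absolute, leave untouched
        }
        // Relative paths live under the submitting user's home directory.
        return "/user/" + submittingUser + "/" + path;
    }

    public static void main(String[] args) {
        System.out.println(toAbsolute("input", "alice"));
        System.out.println(toAbsolute("/data/input", "alice"));
    }
}
```

The point is that the user name comes from the submitting side; resolving the same relative path on the tracker would use the tracker's user instead.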

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

