hadoop-common-dev mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-2116) Job.local.dir to be exposed to tasks
Date Fri, 11 Jan 2008 05:38:34 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-2116:
----------------------------------

    Status: Open  (was: Patch Available)

In light of HADOOP-2570, I'm cancelling this patch.

Reasoning:

The *-file* option works by unjar-ing the job's jar file, copying the script into it and
then jar-ing it again. (yuck!)
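
To make the round trip concrete, here is a rough Python sketch of what "unjar, copy, jar"
amounts to. It is an illustration only, not the streaming code itself: the function name
add_file_to_jar is made up for this example, and it relies on the fact that a jar is a zip
archive.

```python
import os
import shutil
import tempfile
import zipfile


def add_file_to_jar(jar_path, file_path):
    """Add file_path to the root of jar_path by unpacking and repacking.

    Hypothetical helper for illustration; it mimics the unjar/copy/jar
    round trip described above and is not taken from Hadoop streaming.
    Empty directories and the original entry order are not preserved.
    """
    scratch = tempfile.mkdtemp(prefix="rejar_")
    try:
        # 1. "unjar": extract every entry of the existing archive.
        with zipfile.ZipFile(jar_path) as jar:
            jar.extractall(scratch)
        # 2. copy the user's script next to the unpacked contents.
        shutil.copy(file_path, os.path.join(scratch, os.path.basename(file_path)))
        # 3. "jar": rebuild the whole archive from the scratch directory.
        with zipfile.ZipFile(jar_path, "w", zipfile.ZIP_DEFLATED) as jar:
            for root, _dirs, files in os.walk(scratch):
                for name in files:
                    full = os.path.join(root, name)
                    jar.write(full, os.path.relpath(full, scratch))
    finally:
        shutil.rmtree(scratch)
```

Every entry of the jar is rewritten just to add one small script, which is the cost being
complained about here.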

This means that on the TaskTracker the script has moved from jobCache/work to jobCache/job_jar_xml
(I propose we rename that to *private*, heh). Clearly user-scripts which rely on "../work/<script_name>"
will break again...

Having said that, we need to debate whether this feature is an incompatible change. What do
folks think?

If people say otherwise, we need to ensure all files in jobCache/private are symlinked into
jobCache/work... ugh!
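
For the sake of discussion, a minimal Python sketch of that symlinking step, assuming a POSIX
filesystem and a job directory that contains the private and work sub-directories named
above. The function name symlink_private_into_work is hypothetical, and the TaskTracker would
of course do this in its own code, not in a script like this.

```python
import os


def symlink_private_into_work(job_cache_dir):
    """Symlink every regular file in <job_cache_dir>/private into <job_cache_dir>/work.

    Hypothetical helper for illustration; the directory names follow the
    ones used in this message. Returns the names that were linked.
    """
    private_dir = os.path.join(job_cache_dir, "private")
    work_dir = os.path.join(job_cache_dir, "work")
    os.makedirs(work_dir, exist_ok=True)
    linked = []
    for name in sorted(os.listdir(private_dir)):
        target = os.path.join(private_dir, name)
        link = os.path.join(work_dir, name)
        if not os.path.isfile(target) or os.path.lexists(link):
            continue  # skip sub-directories and names already present in work/
        # A relative target keeps the link valid if the job directory moves.
        os.symlink(os.path.join("..", "private", name), link)
        linked.append(name)
    return linked
```

With the links in place, a script that still opens "../work/<script_name>" from its task
directory would keep resolving to the file that now lives under private.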

----

I'd like to take this opportunity to take a hard look at streaming's *-file* option too. The
unjar/jar way is completely backwards! We _should_ rework the -file option to use the DistributedCache
and the symlink option it provides.
That way, user-scripts can simply be "./<script>" rather than "../work/<script>". Yes,
the way to maintain compatibility (if we want it) is to use the previous option of also
symlinking files into jobCache/work. I'd strongly vote for this option.

Thoughts?

> Job.local.dir to be exposed to tasks
> ------------------------------------
>
>                 Key: HADOOP-2116
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2116
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.3
>         Environment: All
>            Reporter: Milind Bhandarkar
>            Assignee: Amareshwari Sri Ramadasu
>             Fix For: 0.16.0
>
>         Attachments: patch-2116.txt, patch-2116.txt
>
>
> Currently, since all task cwds are created under a jobcache directory, users that need
> a job-specific shared directory for use as scratch space, create ../work. This is hacky, and
> will break when HADOOP-2115 is addressed. For such jobs, hadoop mapred should expose job.local.dir
> via localized configuration.
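
As an illustration of what the request above would mean for a user script, here is a small
Python sketch that prefers a configured job directory and only falls back to the ../work
convention. It assumes the job.local.dir value would reach a streaming task as an environment
variable named job_local_dir (dots replaced by underscores); that variable name and the
helper job_scratch_dir are assumptions made for this example, not something the issue
specifies.

```python
import os


def job_scratch_dir(environ=None):
    """Return the job-specific scratch directory for the current task.

    Assumption: the localized job.local.dir setting is visible as the
    environment variable job_local_dir. If it is absent, fall back to the
    "../work" convention relative to the task's working directory.
    """
    environ = os.environ if environ is None else environ
    configured = environ.get("job_local_dir")
    if configured:
        return configured
    # Legacy behaviour: the hacky sibling directory of the task cwd.
    return os.path.normpath(os.path.join(os.getcwd(), "..", "work"))
```

A script written this way would not depend on where task working directories happen to sit
relative to the job directory.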

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

