hadoop-mapreduce-issues mailing list archives

From "Eric Payne (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-3975) Default value not set for Configuration parameter mapreduce.job.local.dir
Date Tue, 06 Mar 2012 23:05:59 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Payne updated MAPREDUCE-3975:
----------------------------------

    Attachment: MAPREDUCE-3975-1.txt

@Arun,

I modified YarnChild.configureLocalDirs(...) to mimic what JobLocalizer.createWorkDir(...)
did in 0.20.205: it uses the LocalDirAllocator to get a local path within one of the
"mapreduce.cluster.local.dir" locations and creates a scratch directory under it.

TESTING:
On a 10-node secure cluster, I ran manual tests to make sure that the "mapreduce.job.local.dir"
was being set and that the directory was being created.

My manual tests printed the value of "job.local.dir", which showed up in the task logs. I also
made sure that the scratch directories existed and were writable by the user.

The 0.20.205 value for "job.local.dir" looked something like this:
  ${mapred.local.dir[x]}/taskTracker/$user/jobcache/$jobid/work
  e.g.: /cluster/0/tmp/mapred-local/taskTracker/joe/jobcache/job_1234_0001/work
The 0.23 value for "mapreduce.job.local.dir" looks something like this:
  ${mapreduce.cluster.local.dir[x]}/usercache/$user/appcache/$appid/work
  e.g.: /cluster/2/tmp/mapred-local/usercache/joe/appcache/application_5678_0002/work
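
As a rough illustration of the two layouts above, here is a minimal plain-Java sketch that
builds both path shapes from their parts. The class and method names (JobLocalDirLayout,
legacyWorkDir, yarnWorkDir) are hypothetical and are not part of Hadoop or of the attached
patch; only the path segments are taken from the examples in this comment.

```java
// Hypothetical helper, not Hadoop code: it only assembles the two path shapes
// quoted in this comment so they can be compared side by side.
final class JobLocalDirLayout {

    // 0.20.205 shape: <local dir>/taskTracker/$user/jobcache/$jobid/work
    static String legacyWorkDir(String localDir, String user, String jobId) {
        return String.join("/", localDir, "taskTracker", user, "jobcache", jobId, "work");
    }

    // 0.23 shape: <local dir>/usercache/$user/appcache/$appid/work
    static String yarnWorkDir(String localDir, String user, String appId) {
        return String.join("/", localDir, "usercache", user, "appcache", appId, "work");
    }

    public static void main(String[] args) {
        // Inputs are the sample values from the examples above.
        System.out.println(legacyWorkDir("/cluster/0/tmp/mapred-local", "joe", "job_1234_0001"));
        System.out.println(yarnWorkDir("/cluster/2/tmp/mapred-local", "joe", "application_5678_0002"));
    }
}
```

Run with the sample values, the two calls reproduce the two "e.g." paths shown above.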

CONCERN:
This solution works as far as it goes. However, I do have one concern.

In 0.20.205, it appears that "job.local.dir" has the same parent dir (e.g., /cluster/0) for
all of the task attempts that run on a specific node. However, this is not the case in
0.23. That is, on 0.23, even if two task attempts run on the same node for application_5678_0002,
they could have different root directories (e.g., one task attempt would have /cluster/2 for
its root dir and the other would have /cluster/4).

Is this expected behavior, or is YarnChild the wrong place to set this value?
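
To make the concern above concrete, here is a small plain-Java model of the difference. It is
only a sketch under stated assumptions: LocalDirChoiceSketch and its methods are hypothetical,
the rotating pickDir() is a stand-in for whatever selection the real LocalDirAllocator performs
(its policy is not described in this comment), and the in-memory map merely models the
"same parent dir per job" behavior observed in 0.20.205. In a real cluster each task attempt
runs in its own process, so a map like this would not be shared between attempts.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not Hadoop code.
final class LocalDirChoiceSketch {
    private final List<String> localDirs;          // stands in for the configured local dirs
    private final Map<String, String> rootPerApp = new HashMap<>();
    private int next = 0;

    LocalDirChoiceSketch(List<String> localDirs) {
        this.localDirs = localDirs;
    }

    // Stand-in selection policy (assumption): rotate through the configured dirs.
    String pickDir() {
        String dir = localDirs.get(next);
        next = (next + 1) % localDirs.size();
        return dir;
    }

    // Models the 0.23 observation: every task attempt picks its own root.
    String workDirPerAttempt(String user, String appId) {
        return pickDir() + "/usercache/" + user + "/appcache/" + appId + "/work";
    }

    // Models the 0.20.205 observation: the root is picked once per job on a node.
    String workDirPerApp(String user, String appId) {
        String root = rootPerApp.computeIfAbsent(appId, id -> pickDir());
        return root + "/usercache/" + user + "/appcache/" + appId + "/work";
    }

    public static void main(String[] args) {
        LocalDirChoiceSketch sketch =
            new LocalDirChoiceSketch(Arrays.asList("/cluster/2", "/cluster/4"));
        // Two attempts of the same application can land under different roots.
        System.out.println(sketch.workDirPerAttempt("joe", "application_5678_0002"));
        System.out.println(sketch.workDirPerAttempt("joe", "application_5678_0002"));
        // With one choice per application, both attempts share a root.
        System.out.println(sketch.workDirPerApp("joe", "application_5678_0002"));
        System.out.println(sketch.workDirPerApp("joe", "application_5678_0002"));
    }
}
```

The point of the model is only that choosing a directory per task attempt and choosing it once
per job give different results as soon as more than one local dir is configured.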

                
> Default value not set for Configuration parameter mapreduce.job.local.dir
> -------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3975
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3975
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.1, 0.23.2
>            Reporter: Eric Payne
>            Assignee: Eric Payne
>            Priority: Blocker
>         Attachments: MAPREDUCE-3975-1.txt
>
>
> mapreduce.job.local.dir (formerly job.local.dir in 0.20) is not set by default. This
> is a regression from 0.20.205.
> In 0.20.205, JobLocalizer.createWorkDir() constructs the "$mapred.local.dir/taskTracker/$user/jobcache/$jobid/work"
> path based on $user and $jobid, and then sets TaskTracker.JOB_LOCAL_DIR in the job's JobConf.
> So far, I haven't found where this is done in 0.23. It could be that this is what should
> be done by LocalJobRunner.setupChildMapredLocalDirs(), but I am still investigating.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
