hadoop-mapreduce-issues mailing list archives

From "Hemanth Yamijala (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1186) While localizing a DistributedCache file, TT sets permissions recursively on the whole base-dir
Date Mon, 14 Dec 2009 10:53:18 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790095#action_12790095

Hemanth Yamijala commented on MAPREDUCE-1186:

Amarsri, Vinod and I discussed the trunk patch a bit. The current implementation attempts
to work as follows:
- Before task launch, the task controller is launched to secure localized cache files. Previously,
all files under $mapred-local-dir/$user/taskTracker/archive were secured. Obviously, we are
trying to fix that in the context of this JIRA.
- The patch lists the directories under $mapred-local-dir/$user/taskTracker/archive (which,
after MAPREDUCE-1098, is the list of random id directories that were localized).
- For each directory, if the path is not already secured, it secures it recursively.

This approach has a race condition that we identified:
- Say a task has localized a file and has launched the task controller to secure the path,
and the task controller is currently under operation.
- In parallel, say another task localized another file into a different random id directory.
- The task controller could pick up the random id directory created by the second task when it
is listing directories and set permissions on it. However, this directory does not yet contain
fully localized files, and hence it would end up incompletely secured.
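
As an illustration only (this is not code from the patch), the list-and-secure pass and the race can be modelled with a small sketch. It assumes a hypothetical class ListAndSecureSketch in which an in-memory set stands in for the real permission state, and temporary directories named 1 and 2 stand in for the random id directories.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Stream;

// Hypothetical model of the "list and secure" pass; not TaskTracker code.
final class ListAndSecureSketch {
    // Assumption: stands in for whatever on-disk state marks a path as secured.
    final Set<Path> secured = new HashSet<>();

    // Lists the directories under archiveDir and secures each one that is
    // not already secured. Returns how many directories were secured.
    int secureUnsecured(Path archiveDir) throws IOException {
        int count = 0;
        try (Stream<Path> dirs = Files.list(archiveDir)) {
            for (Path dir : (Iterable<Path>) dirs::iterator) {
                if (Files.isDirectory(dir) && !secured.contains(dir)) {
                    secureRecursively(dir);
                    count++;
                }
            }
        }
        return count;
    }

    // Marks dir and everything currently under it as secured.
    // Files that arrive later are not covered.
    void secureRecursively(Path dir) throws IOException {
        try (Stream<Path> tree = Files.walk(dir)) {
            tree.forEach(secured::add);
        }
    }

    // Replays the race sequentially: the second directory is listed and
    // secured while it is still empty, then its file arrives.
    static boolean lateFileLeftUnsecured() {
        try {
            Path archive = Files.createTempDirectory("archive");
            Path first = Files.createDirectories(archive.resolve("1"));
            Files.createFile(first.resolve("a.jar"));
            Path second = Files.createDirectories(archive.resolve("2"));

            ListAndSecureSketch controller = new ListAndSecureSketch();
            controller.secureUnsecured(archive);   // launched for the first task

            Path late = Files.createFile(second.resolve("b.jar"));
            controller.secureUnsecured(archive);   // skips "2": already secured

            return !controller.secured.contains(late);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("late file left unsecured: " + lateFileLeftUnsecured());
    }
}
```

In this model the second pass skips directory 2 because the directory itself is already marked as secured, so the file that arrived later is never covered.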

The key problem here is that this approach does not have a real idea of what files were localized
by a task as part of the distributed cache. One way to fix that would be to pass the paths
to the task controller, as a list of random id directories under $mapred-local-dir/$user/taskTracker/archive
that were localized in this task. This is what I suggested in the proposal above. However,
there are a few problems with this proposal as well:

- How do we get the list of these paths ? There's currently no way exposed by distributed
cache about these files.
- This could be a huge list, if several tens of files are being localized in a task. How would
we transfer all this info to the task-controller ? A huge command line of paths to the task
controller could be unmanageable, hit some command line length limits, etc. Other approaches
(like transferring the info through a file) would also be cumbersome.
- It could result in duplicate work. Say if two tasks running in parallel are sharing a file,
both of them would get the random id directory to secure, and both would try and secure the

To solve these problems, I am proposing the following:
- Change the directory structure for localized cache files as: $mapred-local-dir/$user/taskTracker/archive/$task-id,
where task-id is for the task attempt on behalf of which localization is happening. Note that
a task could use localized files that have already been localized for another task-id. Since
a cache entry stores the full path for a cache key, it can retrieve this information.
- Move securing the cache file path in the same code path as where localization of the cache
files happens.
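
A minimal sketch of the proposed layout, for illustration only. The class TaskIdLayoutSketch, its method names, and the task-id strings used with it are hypothetical; the only details taken from the proposal are the directory pattern and the fact that a cache entry stores the full path for a cache key.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed directory layout; not DistributedCache code.
final class TaskIdLayoutSketch {
    private final Path localDir;   // stands in for $mapred-local-dir
    private final String user;     // stands in for $user
    // cache key -> full localized path, as stored in the cache entry
    private final Map<String, Path> entries = new HashMap<>();

    TaskIdLayoutSketch(Path localDir, String user) {
        this.localDir = localDir;
        this.user = user;
    }

    // $mapred-local-dir/$user/taskTracker/archive/$task-id
    Path taskArchiveDir(String taskId) {
        return localDir.resolve(user)
                .resolve("taskTracker")
                .resolve("archive")
                .resolve(taskId);
    }

    // Returns the stored path if the key was already localized (possibly under
    // another task-id); otherwise records a path under this task's directory.
    Path pathFor(String cacheKey, String fileName, String taskId) {
        return entries.computeIfAbsent(
                cacheKey, k -> taskArchiveDir(taskId).resolve(fileName));
    }

    public static void main(String[] args) {
        TaskIdLayoutSketch layout =
                new TaskIdLayoutSketch(Paths.get("/local"), "alice");
        // The second attempt reuses the path recorded for the first attempt.
        System.out.println(layout.pathFor("hdfs://nn/lib.jar", "lib.jar", "attempt_1"));
        System.out.println(layout.pathFor("hdfs://nn/lib.jar", "lib.jar", "attempt_2"));
    }
}
```

Because the lookup is by cache key and the entry holds the full path, a later task attempt does not need to know which task-id directory the file was originally localized under.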

The last point is actually important in this new proposal, because without it, we might
have a situation where a task uses files that were localized under a prior task-id
but are not yet secured. And if we don't wait for that, we would have incompletely secured
cache files in use.
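
One way to picture "localize and secure in the same code path" is sketched below. This is an assumption-laden model, not the patch: the class LocalizeThenSecureSketch and its Entry type are hypothetical, and the actual copy and task-controller invocation are replaced by placeholders. It relies on ConcurrentHashMap.computeIfAbsent applying its function at most once per key, with other callers for the same key waiting until the computation finishes.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model: an entry becomes visible only after it is secured.
final class LocalizeThenSecureSketch {
    static final class Entry {
        final String path;
        final boolean secured;

        Entry(String path, boolean secured) {
            this.path = path;
            this.secured = secured;
        }
    }

    private final ConcurrentMap<String, Entry> entries = new ConcurrentHashMap<>();
    // Counts how many times the localization step actually ran.
    final AtomicInteger localizations = new AtomicInteger();

    // Localizes and secures in one step. A caller asking for a key that is
    // still being localized waits, so it never sees an unsecured entry.
    Entry localize(String cacheKey, String taskId) {
        return entries.computeIfAbsent(cacheKey, key -> {
            String path = "archive/" + taskId + "/" + key;
            localizations.incrementAndGet();   // placeholder for copying the file
            boolean secured = true;            // placeholder for the task controller
            return new Entry(path, secured);
        });
    }

    public static void main(String[] args) {
        LocalizeThenSecureSketch cache = new LocalizeThenSecureSketch();
        Entry first = cache.localize("lib.jar", "attempt_1");
        Entry second = cache.localize("lib.jar", "attempt_2");
        System.out.println(second.path + " secured=" + second.secured
                + " sameEntry=" + (first == second));
    }
}
```

The design point is only the ordering: since securing happens before the entry is published, there is no separate step for a second task to wait on or to skip.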

One drawback I can think of with this approach is that the new task-id directory in the path might
give the wrong impression that the files localized under it are all the files the task uses
from the distributed cache. But clearly, files localized under other task-ids could be used as well.

Comments on this proposal ?

> While localizing a DistributedCache file, TT sets permissions recursively on the whole
> -----------------------------------------------------------------------------------------------
>                 Key: MAPREDUCE-1186
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1186
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 0.21.0
>            Reporter: Vinod K V
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.21.0
>         Attachments: patch-1186-1.txt, patch-1186-3-ydist.txt, patch-1186-3-ydist.txt,
patch-1186-ydist.txt, patch-1186-ydist.txt, patch-1186.txt
> This is a performance problem.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
