hadoop-mapreduce-issues mailing list archives

From "Eli Collins (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-3011) TT should remove bad local dirs from conf to prevent constant disk checking
Date Thu, 15 Sep 2011 22:15:08 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105727#comment-13105727 ]

Eli Collins commented on MAPREDUCE-3011:

@Todd - Yes, to re-trigger you need to restart the TT. This is how the code currently works:
once a directory is removed from LocalStorage's "good list" it is never put back while the
TT is running, i.e. once a dir is identified as bad the TT won't use it again.
LocalDirAllocator#confChanged tries to notice when a new dir is added to the conf, but we
don't add new MR local dirs at runtime, so this feature isn't used. Per HADOOP-7551,
LocalDirAllocator (common) and LocalStorage (mr) are currently independent but should be
aware of each other.

@Ravi - LocalDirAllocator already keeps track of the valid dirs itself. Once there is a bad
dir, LocalDirAllocator#confChanged executes for every call to get a local directory, and it's
this code that calls checkDirs on each local directory. It turns out the version of checkDirs
that doesn't take a permissions parameter is not as expensive as I thought (the method that
takes a permission forks a call to ls for each directory, which is expensive). However,
confChanged creates a new DF object for each local dir, which has the side effect of resetting
the df interval. As a result, when LocalDirAllocator uses each DF it forks a call to df
instead of returning the cached result.
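
To illustrate the DF point, here is a minimal sketch of an interval-based cache. It is not
Hadoop's DF class: CachedDiskUsage, DfCacheDemo, the 60 second interval and the probe callback
(which stands in for forking df) are all hypothetical names chosen for this example. It only
shows why constructing a fresh object on every call defeats the caching.

```java
import java.util.function.LongSupplier;

/** Hypothetical stand-in for a df wrapper; NOT Hadoop's actual DF class. */
final class CachedDiskUsage {
    private final long intervalMs;
    private final LongSupplier probe; // stands in for forking `df`
    private long lastProbeMs;
    private long cachedAvailable;
    private boolean hasValue = false;

    CachedDiskUsage(long intervalMs, LongSupplier probe) {
        this.intervalMs = intervalMs;
        this.probe = probe;
    }

    /** Returns the cached value unless the interval has elapsed since the last probe. */
    long available(long nowMs) {
        if (!hasValue || nowMs - lastProbeMs >= intervalMs) {
            cachedAvailable = probe.getAsLong();
            lastProbeMs = nowMs;
            hasValue = true;
        }
        return cachedAvailable;
    }
}

final class DfCacheDemo {
    /** One long-lived object: only the first call inside the interval probes. */
    static int probesWhenReused(int calls) {
        int[] forks = {0};
        CachedDiskUsage df = new CachedDiskUsage(60_000, () -> { forks[0]++; return 1024L; });
        for (int i = 0; i < calls; i++) {
            df.available(i); // simulated clock: call i happens at time i ms
        }
        return forks[0];
    }

    /** A new object per call: the cache starts empty each time, so every call probes. */
    static int probesWhenRecreated(int calls) {
        int[] forks = {0};
        for (int i = 0; i < calls; i++) {
            CachedDiskUsage df = new CachedDiskUsage(60_000, () -> { forks[0]++; return 1024L; });
            df.available(i);
        }
        return forks[0];
    }

    public static void main(String[] args) {
        System.out.println("reused:    " + probesWhenReused(100));
        System.out.println("recreated: " + probesWhenRecreated(100));
    }
}
```

In this model the reused object probes once for 100 calls, while the recreated one probes on
every call, which is the effect described above.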

In short, I think it's expensive if the configured dirs are different from the list of valid
dirs maintained by LocalDirAllocator. If we remove bad dirs from the conf in the TT, then they
won't differ. Alternatively, we could modify LocalDirAllocator to ignore bad directories, but
that would conflict with its current design, which explicitly tries to notice a difference
between the set of valid and configured dirs.
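
A simplified model of this, assuming the allocator re-checks every configured dir whenever the
configured list differs from its own valid list. DirAllocatorSketch, the /disk1../disk3 paths
and the health predicate are hypothetical; this is not the actual LocalDirAllocator code, just
a sketch of the comparison described here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical model of an allocator that compares configured dirs to valid dirs. */
final class DirAllocatorSketch {
    private final Predicate<String> isHealthy; // stands in for a real disk check
    private List<String> validDirs = new ArrayList<>();
    private int fullChecks = 0;

    DirAllocatorSketch(Predicate<String> isHealthy) {
        this.isHealthy = isHealthy;
    }

    /** Re-checks all configured dirs only when they differ from the valid list. */
    List<String> getValidDirs(List<String> configuredDirs) {
        if (!configuredDirs.equals(validDirs)) {
            fullChecks++;
            List<String> good = new ArrayList<>();
            for (String dir : configuredDirs) {
                if (isHealthy.test(dir)) {
                    good.add(dir);
                }
            }
            validDirs = good;
        }
        return validDirs;
    }

    int fullChecks() {
        return fullChecks;
    }

    /** Counts full checks over a number of calls, with or without pruning the conf. */
    static int countFullChecks(boolean pruneConf, int calls) {
        DirAllocatorSketch alloc = new DirAllocatorSketch(d -> !d.equals("/disk2"));
        List<String> conf = new ArrayList<>(Arrays.asList("/disk1", "/disk2", "/disk3"));
        for (int i = 0; i < calls; i++) {
            List<String> valid = alloc.getValidDirs(conf);
            if (pruneConf) {
                conf = new ArrayList<>(valid); // what the TT would do: drop bad dirs from conf
            }
        }
        return alloc.fullChecks();
    }

    public static void main(String[] args) {
        System.out.println("conf left alone: " + countFullChecks(false, 10));
        System.out.println("conf pruned:     " + countFullChecks(true, 10));
    }
}
```

With one bad dir and the conf left alone, every call in this model triggers a full check;
once the bad dir is removed from the conf, only the first call does.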

> TT should remove bad local dirs from conf to prevent constant disk checking
> ---------------------------------------------------------------------------
>                 Key: MAPREDUCE-3011
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3011
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: tasktracker
>    Affects Versions:
>            Reporter: Eli Collins
>             Fix For:
> Per HADOOP-7551 the TT does not remove bad mapred.local.dirs from the conf so after a
> single disk failure *every* call to get a local path for reading or writing results in a disk
> check of *all* configured local dirs. After detecting that a local dir is bad we should remove
> it from the conf so that we don't repeatedly perform this expensive operation.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

