hadoop-mapreduce-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5661) ShuffleHandler using yarn.nodemanager.local-dirs instead of mapreduce.cluster.local.dir
Date Mon, 02 Dec 2013 18:48:35 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836779#comment-13836779 ]

Jason Lowe commented on MAPREDUCE-5661:
---------------------------------------

Yes, in a YARN cluster most of the MapReduce I/O on a node should go to the yarn.nodemanager.local-dirs
directories.  Under YARN those directories are copied into mapreduce.cluster.local.dir during
task startup, so admins don't normally have to configure mapreduce.cluster.local.dir.  I would
expect an explicit setting of mapreduce.cluster.local.dir to take effect only when running
a job in local mode.  Local-mode jobs usually aren't very big, so the default of somewhere
under /tmp is probably fine for most of those cases.

So to sum up, tasks are using mapreduce.cluster.local.dir but the directories listed there
are derived from yarn.nodemanager.local-dirs in a YARN cluster.  Setting mapreduce.cluster.local.dir
in mapred-site.xml would have no effect for most MapReduce jobs in a YARN cluster.
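
The precedence described above can be modeled with a small sketch. This is not the Hadoop
implementation: the class LocalDirSketch, the method effectiveLocalDirs, the example paths, and
the placeholder default are all hypothetical, and a plain Map stands in for the job configuration.
It only illustrates that a value from yarn.nodemanager.local-dirs, when present, replaces whatever
was set for mapreduce.cluster.local.dir.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the behavior described above; NOT the actual Hadoop code.
public class LocalDirSketch {
    static final String MR_LOCAL_DIR = "mapreduce.cluster.local.dir";
    // Hypothetical placeholder for "somewhere under /tmp"; the real default differs.
    static final String PLACEHOLDER_DEFAULT = "/tmp/placeholder/mapred/local";

    /**
     * @param siteConf    values an admin put in mapred-site.xml (modeled as a Map)
     * @param nmLocalDirs directories derived from yarn.nodemanager.local-dirs,
     *                    or null to model a job running in local mode
     * @return the value a task would see for mapreduce.cluster.local.dir in this model
     */
    static String effectiveLocalDirs(Map<String, String> siteConf, String nmLocalDirs) {
        Map<String, String> taskConf = new HashMap<>(siteConf);
        if (nmLocalDirs != null) {
            // Under YARN: task startup overwrites any explicit site setting.
            taskConf.put(MR_LOCAL_DIR, nmLocalDirs);
        }
        return taskConf.getOrDefault(MR_LOCAL_DIR, PLACEHOLDER_DEFAULT);
    }

    public static void main(String[] args) {
        Map<String, String> site = new HashMap<>();
        site.put(MR_LOCAL_DIR, "/data/mr-local");

        // YARN cluster: the explicit mapred-site.xml value is ignored.
        System.out.println(effectiveLocalDirs(site, "/disk1/nm-local,/disk2/nm-local"));
        // Local mode: the explicit value takes effect.
        System.out.println(effectiveLocalDirs(site, null));
        // Local mode with nothing configured: fall back to the default.
        System.out.println(effectiveLocalDirs(new HashMap<>(), null));
    }
}
```

In this model the first call returns the NodeManager directories, which is why an admin chasing
a disk-space problem on a YARN cluster would look at yarn.nodemanager.local-dirs rather than
mapreduce.cluster.local.dir.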

> ShuffleHandler using yarn.nodemanager.local-dirs instead of mapreduce.cluster.local.dir
> ---------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5661
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5661
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>            Reporter: Eric Sirianni
>            Priority: Trivial
>
> While debugging an issue where a MapReduce job is failing due to running out of disk
> space, I noticed that the {{ShuffleHandler}} uses {{yarn.nodemanager.local-dirs}} for its
> {{LocalDirAllocator}} whereas all of the other MapReduce classes use {{mapreduce.cluster.local.dir}}:
> {noformat}
> $ find hadoop-mapreduce-project/hadoop-mapreduce-client/*/src/main/java/ -name "*.java" | xargs grep "new LocalDirAllocator("
> hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java:    LocalDirAllocator lDirAlloc = new LocalDirAllocator(MRConfig.LOCAL_DIR);
> hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnOutputFiles.java:    new LocalDirAllocator(MRConfig.LOCAL_DIR);
> hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapred/LocalDistributedCacheManager.java:      new LocalDirAllocator(MRConfig.LOCAL_DIR);
> hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/BackupStore.java:      this.lDirAlloc = new LocalDirAllocator(MRConfig.LOCAL_DIR);
> hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MROutputFiles.java:    new LocalDirAllocator(MRConfig.LOCAL_DIR);
> hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Merger.java:    new LocalDirAllocator(MRConfig.LOCAL_DIR);
> hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Task.java:    this.lDirAlloc = new LocalDirAllocator(MRConfig.LOCAL_DIR);
> *****hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java:      new LocalDirAllocator(YarnConfiguration.NM_LOCAL_DIRS);
> {noformat}
> This inconsistency feels like something that is likely to confuse admins.  



--
This message was sent by Atlassian JIRA
(v6.1#6144)
