hadoop-mapreduce-issues mailing list archives

From "Eric Sirianni (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5661) ShuffleHandler using yarn.nodemanager.local-dirs instead of mapreduce.cluster.local.dir
Date Mon, 02 Dec 2013 18:29:36 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836758#comment-13836758 ]

Eric Sirianni commented on MAPREDUCE-5661:
------------------------------------------

The description for {{mapreduce.cluster.local.dir}} implies that the directory will receive
significant load:

{code:xml}
<property>
  <name>mapreduce.cluster.local.dir</name>
  <value>${hadoop.tmp.dir}/mapred/local</value>
  <description>
      The local directory where MapReduce stores intermediate
      data files.  May be a comma-separated list of
      directories on different devices in order to spread disk i/o.
      Directories that do not exist are ignored.
  </description>
</property>
{code}

Since you are suggesting that the default (typically under /tmp) is sufficient, perhaps that
description should be altered?  I'm observing that the shuffle, which uses
{{yarn.nodemanager.local-dirs}}, generates the majority of the disk I/O in my MapReduce jobs.
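
For illustration, a minimal sketch of how an admin might spread the shuffle's disk I/O across
devices under the current behavior, by listing several directories in {{yarn.nodemanager.local-dirs}}
in yarn-site.xml. The mount points /data1 and /data2 are hypothetical, not taken from my cluster:

{code:xml}
<!-- yarn-site.xml: hypothetical mount points, adjust to the node's actual disks -->
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data1/yarn/local,/data2/yarn/local</value>
</property>
{code}

Setting only {{mapreduce.cluster.local.dir}} this way would presumably not move the shuffle's
files, since the {{ShuffleHandler}} allocates from the YARN property.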

> ShuffleHandler using yarn.nodemanager.local-dirs instead of mapreduce.cluster.local.dir
> ---------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5661
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5661
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>            Reporter: Eric Sirianni
>            Priority: Trivial
>
> While debugging an issue where a MapReduce job is failing due to running out of disk
> space, I noticed that the {{ShuffleHandler}} uses {{yarn.nodemanager.local-dirs}} for its
> {{LocalDirAllocator}} whereas all of the other MapReduce classes use {{mapreduce.cluster.local.dir}}:
> {noformat}
> $ find hadoop-mapreduce-project/hadoop-mapreduce-client/*/src/main/java/ -name "*.java" | xargs grep "new LocalDirAllocator("
> hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java:    LocalDirAllocator lDirAlloc = new LocalDirAllocator(MRConfig.LOCAL_DIR);
> hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnOutputFiles.java:    new LocalDirAllocator(MRConfig.LOCAL_DIR);
> hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapred/LocalDistributedCacheManager.java:      new LocalDirAllocator(MRConfig.LOCAL_DIR);
> hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/BackupStore.java:      this.lDirAlloc = new LocalDirAllocator(MRConfig.LOCAL_DIR);
> hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MROutputFiles.java:    new LocalDirAllocator(MRConfig.LOCAL_DIR);
> hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Merger.java:    new LocalDirAllocator(MRConfig.LOCAL_DIR);
> hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Task.java:    this.lDirAlloc = new LocalDirAllocator(MRConfig.LOCAL_DIR);
> *****hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java:      new LocalDirAllocator(YarnConfiguration.NM_LOCAL_DIRS);
> {noformat}
> This inconsistency feels like something that is likely to confuse admins.  



--
This message was sent by Atlassian JIRA
(v6.1#6144)
