hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5299) Reducer inputs should be spilled to HDFS rather than local disk.
Date Fri, 20 Feb 2009 19:15:01 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675440#action_12675440
] 

Arun C Murthy commented on HADOOP-5299:
---------------------------------------

Caveat emptor:  

# Typical mid-to-large applications have reducers which process multiple gigabytes of data
(e.g. our friend, the sort<nodes>, has each reducer generating 5GB) which means we'll
need atleast tens of intermediate merged files per reducer (assuming a heap-size of 512M).
# io.sort.factor defaults to 100, which means we might open upto a 100 files for the final
merge which then feeds the 'reduce' call. Given that we might have every reducer in the cluster
open a 100 files at time; for a mid-size cluster of 500 nodes with 4 reduce slots that would
be 2000 * 100 ...

> Reducer inputs should be spilled to HDFS rather than local disk.
> ----------------------------------------------------------------
>
>                 Key: HADOOP-5299
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5299
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.19.0
>         Environment: All
>            Reporter: Milind Bhandarkar
>
> Currently, both map outputs and reduce inputs are stored on local disks of tasktrackers.
(Un) Availability of local disk space for intermediate data is seen as a major factor in job
failures. 
> The suggested solution is to store these intermediate data on HDFS (maybe with replication
factor of 1). However, the main blocker issue with that solution is that lots of temporary
names (proportional to total number of maps), can overwhelm the namenode, especially since
the map outputs are typically small (most produce one block output).
> Also, as we see in many applications, the map outputs can be estimated more accurately,
and thus users can plan accordingly, based on available local disk space.
> However, the reduce input sizes can vary a lot, especially for skewed data (or because
of bad partitioning.)
> So, I suggest that it makes more sense to keep map outputs on local disks, but the reduce
inputs (when spilled from reducer memory) should go to HDFS.
> Adding a configuration variable to indicate the filesystem to be used for reduce-side
spills would let us experiment and compare the efficiency of this new scheme.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message