hadoop-common-dev mailing list archives

From "Runping Qi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-590) Reducer's pass merger should utilize temporary directories on different disks
Date Tue, 10 Oct 2006 17:40:20 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-590?page=comments#action_12441219 ] 
            
Runping Qi commented on HADOOP-590:
-----------------------------------


I observed from my currently running job that the throughput of the sortPass (sorting map output
files into runs) is a bit higher than that of the mergePass. I believe three factors contribute
to that: first, the mergePass reads from one disk while the sortPass reads from 4 disks; second,
the mergePass reads from and writes to the same disk; third, the pass factor for the mergePass
is 400, which may be too high.


Instead of writing the intermediate output as one big file on one disk, sortPass and mergePass
(other than the last pass) could each write d files, where d is the number of usable disks.
The next pass would then be able to fully utilize the available disks.
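
As a rough illustration of the idea above (not the actual SequenceFile code), the sketch below plans one
output file per usable temp directory and assigns runs to those files round-robin. The class name,
method names, and file naming scheme are all hypothetical, and round-robin is only one possible
placement policy.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: spread a pass's output over d temp directories,
// one directory per usable disk. Not part of Hadoop.
class PassOutputPlanner {

    // Returns one output path per temp directory for the given pass number.
    // The "pass<N>.part<i>" naming scheme is an assumption for illustration.
    static List<Path> planOutputs(List<Path> tempDirs, int pass) {
        List<Path> outputs = new ArrayList<>();
        for (int i = 0; i < tempDirs.size(); i++) {
            outputs.add(tempDirs.get(i).resolve("pass" + pass + ".part" + i));
        }
        return outputs;
    }

    // Round-robin choice of the output file (disk) for the run with the
    // given index, where d is the number of usable disks.
    static int outputIndexForRun(int runIndex, int d) {
        return runIndex % d;
    }

    public static void main(String[] args) {
        List<Path> dirs = Arrays.asList(
                Paths.get("/disk0/tmp"), Paths.get("/disk1/tmp"),
                Paths.get("/disk2/tmp"), Paths.get("/disk3/tmp"));
        List<Path> outputs = planOutputs(dirs, 1);
        // Show where each of the first eight runs would be written.
        for (int run = 0; run < 8; run++) {
            Path target = outputs.get(outputIndexForRun(run, outputs.size()));
            System.out.println("run " + run + " -> " + target);
        }
    }
}
```

With this layout the next pass would read its inputs from d different disks instead of one, which
is the effect the proposal is after.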



> Reducer's pass merger should utilize temporary directories on different disks
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-590
>                 URL: http://issues.apache.org/jira/browse/HADOOP-590
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Runping Qi
>
> The current pass-merge implementation in the SequenceFile class uses the same temp directory
> for the pass merger's input and output files, even when multiple temp dirs are available.
> Thus, it cannot take full advantage of multiple disks during the sort.


        
