hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1965) Handle map output buffers better
Date Tue, 23 Oct 2007 15:53:50 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12537049 ]

Doug Cutting commented on HADOOP-1965:
--------------------------------------

> Sort timings increase rapidly as compared to combine+spill with an increase in io.sort.mb.

That's just the expected n*log(n) cost for sorting, no?
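A rough cost model makes the point concrete. Assuming the sort is a comparison sort costing about n*log2(n) and combine+spill is a single linear pass over the n buffered records (both are modelling assumptions, not measured Hadoop numbers), the ratio of the two costs is log2(n). That ratio keeps growing as io.sort.mb, and hence n, grows. A minimal Python sketch:

```python
import math


def sort_cost(n: int) -> float:
    """Assumed comparison-sort cost model: proportional to n * log2(n)."""
    return n * math.log2(n)


def spill_cost(n: int) -> float:
    """Assumed combine+spill cost model: one linear pass over n records."""
    return float(n)


def sort_to_spill_ratio(n: int) -> float:
    # Under these models the ratio reduces to log2(n), so sorting takes a
    # growing share of the work as the buffer (and hence n) grows.
    return sort_cost(n) / spill_cost(n)


if __name__ == "__main__":
    # Hypothetical record counts; assumes record count scales with io.sort.mb.
    for records in (1 << 20, 1 << 21, 1 << 22):
        print(records, sort_to_spill_ratio(records))
```

Doubling the buffer therefore more than doubles the sort cost but only doubles the spill cost, which matches the observation quoted above.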

> causing the overall map timings to increase with io.sort.mb. 

And reduce timings should correspondingly decrease, since more of the sorting has already
been completed.  An advantage of pushing more of the sort to the map might be that, since
there are generally more map tasks, node failure during map will affect overall job time less
than during reduce, right?
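A back-of-the-envelope sketch of the failure argument, assuming the sort work is split evenly across tasks and that losing a node means redoing one task's share. The function name and the task counts below are hypothetical, chosen only to show the direction of the effect:

```python
def rework_on_failure(total_work: float, num_tasks: int) -> float:
    """Work that must be redone when one task is lost, assuming the
    total work is split evenly across num_tasks tasks (a simplification)."""
    return total_work / num_tasks


if __name__ == "__main__":
    total_sort_work = 100.0  # arbitrary units, hypothetical
    # Hypothetical job with many more map tasks than reduce tasks.
    print("lost if a map fails:   ", rework_on_failure(total_sort_work, 1000))
    print("lost if a reduce fails:", rework_on_failure(total_sort_work, 10))
```

A real node usually runs several tasks at once, so the actual loss depends on scheduling; this only illustrates why work spread over many small tasks is cheaper to retry.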


> Handle map output buffers better
> --------------------------------
>
>                 Key: HADOOP-1965
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1965
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Devaraj Das
>            Assignee: Amar Kamat
>         Attachments: 1965_single_proc_150mb_gziped.jpeg, 1965_single_proc_150mb_gziped.pdf,
>                      1965_single_proc_150mb_gziped_breakup.png
>
>
> Today, the map task stops calling the map method while sort/spill is using the (single
> instance of) map output buffer. One improvement that can be done to improve performance of
> the map task is to have another buffer for writing the map outputs to, while sort/spill is
> using the first buffer.
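The proposed double buffering can be sketched as a small producer/consumer arrangement. This is a hypothetical Python illustration, not Hadoop's MapTask code: the class name DoubleBufferedCollector, a capacity measured in records rather than bytes, and spills kept in memory as sorted lists are all assumptions made for the sketch.

```python
import queue
import threading
from typing import List, Optional, Tuple

Record = Tuple[str, int]


class DoubleBufferedCollector:
    """Hypothetical sketch: a map output collector with two buffers, so the
    map side keeps writing while a background thread sorts and spills."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # records per buffer (stand-in for io.sort.mb)
        self.free: "queue.Queue[List[Record]]" = queue.Queue()
        self.full: "queue.Queue[Optional[List[Record]]]" = queue.Queue()
        for _ in range(2):  # exactly two buffers in the pool
            self.free.put([])
        self.spills: List[List[Record]] = []  # stand-in for on-disk spill files
        self.active = self.free.get()
        self.spiller = threading.Thread(target=self._spill_loop, daemon=True)
        self.spiller.start()

    def collect(self, key: str, value: int) -> None:
        """Called for each map output record."""
        self.active.append((key, value))
        if len(self.active) >= self.capacity:
            # Hand the full buffer to the spill thread and switch to the
            # other one; this blocks only if that one is still being spilled.
            self.full.put(self.active)
            self.active = self.free.get()

    def _spill_loop(self) -> None:
        while True:
            buf = self.full.get()
            if buf is None:  # sentinel: no more buffers will arrive
                return
            self.spills.append(sorted(buf))  # sort, then "write" the spill
            buf.clear()
            self.free.put(buf)  # return the buffer for reuse by collect()

    def close(self) -> List[List[Record]]:
        """Flush the partially filled buffer and wait for all spills."""
        if self.active:
            self.full.put(self.active)
        self.full.put(None)
        self.spiller.join()
        return self.spills


if __name__ == "__main__":
    collector = DoubleBufferedCollector(capacity=3)
    for i, word in enumerate(["b", "a", "d", "c", "f", "e", "g"]):
        collector.collect(word, i)
    for spill in collector.close():
        print(spill)
```

Handing a full buffer off through a queue, rather than locking a single buffer in place, means the map thread stalls only when it fills the second buffer before the spill thread has finished with the first.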

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

