hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1338) Improve the shuffle phase by using the "connection: keep-alive" and doing batch transfers of files
Date Tue, 08 May 2007 18:04:18 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494358
] 

Devaraj Das commented on HADOOP-1338:
-------------------------------------

"This might make more sense for cases where we have two or more waves of reduces" - should
clarify this by saying that, although the first wave of shuffles is gated by the time the
Map phase takes to complete, we should check whether the subsequent waves of shuffles would
gain by this optimization. I think the first phase of shuffle might also gain to some extent.

> Improve the shuffle phase by using the "connection: keep-alive" and doing batch transfers
of files
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1338
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1338
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Devaraj Das
>             Fix For: 0.14.0
>
>
> We should do transfers of map outputs at the granularity of  *total-bytes-transferred*
rather than the current way of transferring a single file and then closing the connection
to the server. A single TaskTracker might have a couple of map output files for a given reduce,
and we should transfer multiple of them (upto a certain total size) in a single connection
to the TaskTracker. Using HTTP-1.1's keep-alive connection would help since it would keep
the connection open for more than one file transfer. We should limit the transfers to a certain
size so that we don't hold up a jetty thread indefinitely (and cause timeouts for other clients).
> Overall, this should give us improved performance.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message