hadoop-common-dev mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4888) Use Apache HttpClient for fetching map outputs
Date Wed, 17 Dec 2008 08:06:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657310#action_12657310 ]

Chris Douglas commented on HADOOP-4888:
---------------------------------------

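With the current patch, HttpClient did not improve performance. Averaged over five runs of a
shuffle benchmark on trunk, it was slightly slower (433 vs. 422), though the difference is well
within one standard deviation for either version.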

498 nodes, 256MB/map, 495 maps, no map-side merge, half of reduce input from memory, no intermediate
compression.

|| Version || Run 1 || Run 2 || Run 3 || Run 4 || Run 5 || avg || std.d ||
| r727228 | 406 | 485 | 360 | 448 | 411 | 422 | 48 |
| r727228 + patch | 418 | 357 | 501 | 446 | 442 | 433 | 52 |
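
As a rough check on how to read the table, the sketch below recomputes the per-version average
and spread from the five runs listed above. It assumes the std.d column is a sample standard
deviation (the comment does not say which estimator was used), so the recomputed values may
differ from the table by a unit of rounding. The names `runs` and `summarize` are illustrative
only and are not part of the patch.

```python
from statistics import mean, stdev

# Per-run figures copied from the table above (units as reported there).
runs = {
    "r727228": [406, 485, 360, 448, 411],
    "r727228 + patch": [418, 357, 501, 446, 442],
}


def summarize(samples):
    """Return (mean, sample standard deviation) for one row of the table.

    Assumption: std.d in the table is the sample (n - 1) estimator.
    """
    return mean(samples), stdev(samples)


base_avg, base_sd = summarize(runs["r727228"])
patch_avg, patch_sd = summarize(runs["r727228 + patch"])

# Positive delta means the patched build had the larger average.
delta = patch_avg - base_avg

for name, samples in runs.items():
    avg, sd = summarize(samples)
    print(f"{name}: avg={avg:.1f} sd={sd:.1f}")
print(f"difference in averages: {delta:.1f}")
print(f"within one std.d of the baseline: {abs(delta) < base_sd}")
```

With only five runs per version and a gap this small relative to the spread, the numbers alone
do not separate the two versions.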

Stragglers dominated the results. In both versions, output from the final few maps held up the
reduce phase, so neither version could distinguish itself through better throughput, connection
reuse, protocol efficiency, etc. Larger benchmarks that might compensate for these effects, such
as gridmix, cannot be run on the nodes currently available.

> Use Apache HttpClient for fetching map outputs
> ----------------------------------------------
>
>                 Key: HADOOP-4888
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4888
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Chris Douglas
>            Assignee: Chris Douglas
>         Attachments: 4888-0.patch
>
>
> It's worth experimenting with the [HttpClient|http://hc.apache.org/httpclient-3.x/] library to speed up the shuffle.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

