hadoop-common-dev mailing list archives

From "Paul Sutter" <sut...@gmail.com>
Subject Re: silly question: why http for map output?
Date Thu, 01 Jun 2006 15:44:25 GMT

The map output files are not stored in DFS. They sit on the local disks of
the node whose mapper creates them, which avoids the 3X replication overhead
of DFS.

Previously, the files were transferred using the RPC mechanism built for
status updates among the nodes. This mechanism multiplexed multiple RPCs
through a single TCP connection, and there were various logjams within it.

The most straightforward fix was to use HTTP instead of trying to resolve
the logjam(s). It's a quick solution to a problem that was a real
bottleneck for some of us.
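
To illustrate the idea, here is a minimal Python sketch, not Hadoop's actual
code: a node serves files from a local directory over HTTP, and a fetcher
pulls one file over its own TCP connection instead of sharing a multiplexed
RPC channel. The names serve_map_outputs, fetch_map_output, and the file
name map_0.out are hypothetical, chosen only for this example.

```python
import http.client
import http.server
import os
import tempfile
import threading
from functools import partial


class QuietHandler(http.server.SimpleHTTPRequestHandler):
    """Static file handler that does not log each request to stderr."""

    def log_message(self, format, *args):
        pass


def serve_map_outputs(directory):
    """Serve files in `directory` over HTTP on an ephemeral local port."""
    handler = partial(QuietHandler, directory=directory)
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_map_output(host, port, name):
    """Fetch one file over a dedicated HTTP connection."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("GET", "/" + name)
        resp = conn.getresponse()
        if resp.status != 200:
            raise IOError(f"fetch of {name} failed with HTTP {resp.status}")
        return resp.read()
    finally:
        conn.close()


def demo():
    """Write a fake map output to local disk, serve it, and fetch it back."""
    with tempfile.TemporaryDirectory() as local_dir:
        # Hypothetical file name standing in for a mapper's local output.
        with open(os.path.join(local_dir, "map_0.out"), "wb") as f:
            f.write(b"key1\tvalue1\n")
        server = serve_map_outputs(local_dir)
        host, port = server.server_address
        try:
            return fetch_map_output(host, port, "map_0.out")
        finally:
            server.shutdown()
            server.server_close()


if __name__ == "__main__":
    print(demo())
```

Because each fetch opens its own connection, a slow transfer in this sketch
does not hold up other requests queued behind it on a shared channel.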

Do you see a drawback to HTTP?


On 6/1/06, Stefan Groschupf <sg@media-style.com> wrote:
> Hi Owen, Hi All,
> a silly question, please give me a clue.
> Why do we now use HTTP for map output transfer instead of TCP or the DFS
> itself?
> Sorry, but the issue HADOOP-254 doesn't give much information,
> just that it is faster, which surprises me a little bit.
> Thanks.
> Stefan
