hadoop-common-dev mailing list archives

From "Paul Sutter" <sut...@gmail.com>
Subject Re: silly question: why http for map output?
Date Thu, 01 Jun 2006 16:39:36 GMT
Stefan

The logjams were not TCP-related; they were in the Hadoop RPC code and had
to do with the way multiple requests were multiplexed over a single socket.

I don't think there is anything about HTTP that makes it better or worse for
binary files.

Paul

On 6/1/06, Stefan Groschupf <sg@media-style.com> wrote:
>
> > The mapoutput files are not located in DFS, they are on the local disks of
> > the mapper that creates them, avoiding the 3X replication overhead of DFS.
> Wasn't there an issue about allowing replication to be defined at a
> per-file level?
>
> >
> > Previously, the files were transferred using the RPC mechanism built for
> > status updates among the nodes. This mechanism multiplexed multiple RPCs
> > through a single TCP connection, and there were various logjams within it.
>
> That is what I was missing. What logjams that occur with TCP are solved
> by HTTP?
>
> >
> > The most straightforward fix was to use HTTP instead of trying to resolve
> > the logjam(s). It's a quick solution to a problem that was really a
> > bottleneck for some of us.
> >
> > Do you see a drawback to HTTP?
>
> Well, just wondering, since HTTP is not really intended for
> transferring large binary files.
> I understand that this is a quick fix, but adding a new mechanism for how
> Hadoop transfers data (RPC, DFS, HTTP) surprises me.
>
>
> Thanks,
> Stefan
>
> >
> > Paul
> >
> > On 6/1/06, Stefan Groschupf <sg@media-style.com> wrote:
> >>
> >> Hi Owen, Hi All,
> >>
> >> a silly question, please give me a clue.
> >> Why do we now use HTTP for mapoutput transfer instead of TCP or the DFS
> >> itself?
> >> Sorry, but the issue HADOOP-254 doesn't give very much information,
> >> just that it is faster, which surprises me a little bit.
> >>
> >>
> >> Thanks.
> >> Stefan
