hadoop-common-dev mailing list archives

From Eric Baldeschwieler <eri...@yahoo-inc.com>
Subject Re: silly question: why http for map output?
Date Thu, 01 Jun 2006 17:40:57 GMT
It's well tested on large binary files!

On Jun 1, 2006, at 9:39 AM, Paul Sutter wrote:

> Stefan
>
> The logjams were not TCP-related; they were in the Hadoop RPC code and had
> to do with the way multiple requests were multiplexed over a single socket.
>
> I don't think there is anything about HTTP that makes it better or worse for
> binary files.
>
> Paul
>
> On 6/1/06, Stefan Groschupf <sg@media-style.com> wrote:
>>
>> > The map output files are not located in DFS; they are on the local disks
>> > of the mapper that creates them, avoiding the 3X replication overhead of
>> > DFS.
>> Wasn't there an issue about allowing replication to be defined at a
>> per-file level?
>>
>> >
>> > Previously, the files were transferred using the RPC mechanism built for
>> > status updates among the nodes. This mechanism multiplexed multiple RPCs
>> > through a single TCP connection, and there were various logjams within it.
>>
>> That is what I was missing. Which logjams that occur with TCP are solved
>> by HTTP?
>>
>> >
>> > The most straightforward fix was to use HTTP instead of trying to resolve
>> > the logjam(s). It's a quick solution to a problem that was really a
>> > bottleneck for some of us.
>> >
>> > Do you see a drawback to HTTP?
>>
>> Well, I was just wondering, since HTTP was not really designed for
>> transferring large binary files.
>> I understand that this is a quick fix, but adding a new mechanism for how
>> Hadoop transfers data (RPC, DFS, HTTP) surprises me.
>>
>>
>> Thanks,
>> Stefan
>>
>> >
>> > Paul
>> >
>> > On 6/1/06, Stefan Groschupf <sg@media-style.com> wrote:
>> >>
>> >> Hi Owen, Hi All,
>> >>
>> >> A silly question; please give me a clue.
>> >> Why do we now use HTTP for map output transfer instead of TCP or the DFS
>> >> itself?
>> >> Sorry, but the issue HADOOP-254 doesn't give very much information, just
>> >> that it is faster, which surprises me a little bit.
>> >>
>> >>
>> >> Thanks.
>> >> Stefan
>> >>
>>
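
The thread does not say exactly what the logjams in the RPC code were. One way multiplexing many requests over a single socket can stall is head-of-line blocking, where small requests queue behind a large transfer. The sketch below is only a toy timing model of that effect, not Hadoop's RPC or HTTP code. The function names, the strict FIFO ordering, and the fair bandwidth sharing between separate connections are all assumptions made for illustration.

```python
def completion_times_multiplexed(sizes, bandwidth):
    """Assumed model: one shared connection that sends responses strictly
    in FIFO order, so each response waits for every earlier one."""
    times = []
    elapsed = 0.0
    for size in sizes:
        elapsed += size / bandwidth
        times.append(elapsed)
    return times


def completion_times_separate(sizes, bandwidth):
    """Assumed model: one connection per transfer, with the link bandwidth
    split evenly among the transfers that are still active."""
    order = sorted(range(len(sizes)), key=lambda i: sizes[i])
    times = [0.0] * len(sizes)
    elapsed = 0.0
    sent = 0       # units already sent by every still-active transfer
    active = len(sizes)
    for i in order:
        remaining = sizes[i] - sent
        # Each active transfer gets bandwidth / active, so finishing
        # `remaining` more units takes remaining * active / bandwidth.
        elapsed += remaining * active / bandwidth
        sent = sizes[i]
        times[i] = elapsed
        active -= 1
    return times


if __name__ == "__main__":
    # One large map output followed by two small status messages.
    sizes = [1000, 1, 1]
    print(completion_times_multiplexed(sizes, bandwidth=1))
    print(completion_times_separate(sizes, bandwidth=1))
```

In this model the two small messages finish at times 1001 and 1002 on the shared connection, but at time 3 when each transfer has its own connection. The large transfer finishes slightly later in the second case (1002 instead of 1000).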

