hadoop-common-dev mailing list archives

From Stefan Groschupf ...@media-style.com>
Subject Re: silly question: why http for map output?
Date Thu, 01 Jun 2006 15:55:20 GMT
> The map output files are not located in DFS; they are on the local disks of
> the mapper that creates them, avoiding the 3X replication overhead of DFS.
Wasn't there an issue about allowing the replication factor to be set on a
per-file basis?

>
> Previously, the files were transferred using the RPC mechanism built for
> status updates among the nodes. This mechanism multiplexed multiple RPCs
> through a single TCP connection, and there were various logjams within it.

That is what I was missing. Which logjams that occur with TCP are solved by
HTTP?

>
> The most straightforward fix was to use HTTP instead of trying to resolve
> the logjam(s). It's a quick solution to a problem that was really a
> bottleneck for some of us.
>
> Do you see a drawback to HTTP?

Well, I was just wondering, since HTTP is not really intended for
transferring large binary files.
I understand that this is a quick fix, but adding a new mechanism for how
Hadoop transfers data (RPC, DFS, HTTP) surprises me.


Thanks,
Stefan

>
> Paul
>
> On 6/1/06, Stefan Groschupf <sg@media-style.com> wrote:
>>
>> Hi Owen, Hi All,
>>
>> a silly question, please give me a clue.
>> Why do we now use HTTP for map output transfer instead of TCP or the DFS
>> itself?
>> Sorry, but the issue HADOOP-254 doesn't give very much information,
>> just that it is faster, which surprises me a little bit.
>>
>>
>> Thanks.
>> Stefan

