hadoop-common-user mailing list archives

From <dar...@ontrenet.com>
Subject Re: Why inter-rack communication in mapreduce slow?
Date Mon, 06 Jun 2011 13:34:56 GMT

Yeah, that's a good point.

I wonder, though, what the load on the tracker nodes (port et al.) would
be if an inter-rack fiber switch at 10s of Gbps is getting maxed out.

Seems to me that if there is that much traffic being migrated across
racks, the tracker node (or whatever node it is) would overload
first?

If I recall correctly, in order for an intra-rack node to be
selected, the tracker service has to be consulted, putting further
load on it...

On Mon, 06 Jun 2011 09:28:57 -0400, John Armstrong
<john.armstrong@ccri.com> wrote:
> On Mon, 06 Jun 2011 09:26:11 -0400, <darren@ontrenet.com> wrote:
>> I'm not a hadoop jedi, but in that case, wouldn't one of the hadoop
>> "trackers" get bottlenecked to resolve those dependencies?
>> 
>> Again, this exposes the oddity of hadoop IMO: it tries to NOT
>> be I/O bound, but it seems it's very I/O bound...
> 
> I'm not a jedi either, but I think that the key is in the word "tries". 
> Distributed computing is extremely I/O bound, and Hadoop tries to bring
> that down to just very.
