hadoop-mapreduce-dev mailing list archives

From Rekha Joshi <rekha...@yahoo-inc.com>
Subject Re: communication protocols in hadoop mapreduce
Date Wed, 21 Apr 2010 11:48:56 GMT
A quick answer: the JobTracker (JT) and the TaskTrackers (TTs) communicate through a
heartbeat mechanism, a poll-like flow between the JT and the TTs. Underneath, the
communication is RPC. It does not use the default Java serialization but a Hadoop-specific
serialization implementation, chosen for some performance gains.
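
To make the two points above concrete, here is a minimal sketch of a heartbeat
payload serialized by hand-written field writes instead of default Java
serialization. It is not the actual Hadoop code. SimpleWritable is a local
stand-in that copies the two method signatures of org.apache.hadoop.io.Writable
so the example runs without Hadoop on the classpath. TrackerStatus, its fields,
handleHeartbeat and the reply strings are hypothetical names made up for
illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class HeartbeatSketch {

    // Local stand-in with the same two methods as org.apache.hadoop.io.Writable.
    interface SimpleWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical heartbeat payload; the real status object carries far more.
    static class TrackerStatus implements SimpleWritable {
        String trackerName = "";
        int freeMapSlots;
        int freeReduceSlots;

        TrackerStatus() {}

        TrackerStatus(String trackerName, int freeMapSlots, int freeReduceSlots) {
            this.trackerName = trackerName;
            this.freeMapSlots = freeMapSlots;
            this.freeReduceSlots = freeReduceSlots;
        }

        // Fields are written one by one: no class descriptors, no reflection.
        @Override
        public void write(DataOutput out) throws IOException {
            out.writeUTF(trackerName);
            out.writeInt(freeMapSlots);
            out.writeInt(freeReduceSlots);
        }

        // Fields must be read back in exactly the order they were written.
        @Override
        public void readFields(DataInput in) throws IOException {
            trackerName = in.readUTF();
            freeMapSlots = in.readInt();
            freeReduceSlots = in.readInt();
        }
    }

    // Turn a payload into the bytes that would travel over the RPC connection.
    static byte[] serialize(SimpleWritable w) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            w.write(out);
        }
        return bytes.toByteArray();
    }

    static TrackerStatus deserialize(byte[] data) throws IOException {
        TrackerStatus status = new TrackerStatus();
        status.readFields(new DataInputStream(new ByteArrayInputStream(data)));
        return status;
    }

    // Stand-in for the JT side of one heartbeat: the TT initiates the call,
    // and the reply is the JT's only chance to hand back work (poll-like flow).
    static String handleHeartbeat(byte[] wire) throws IOException {
        TrackerStatus status = deserialize(wire);
        return status.freeMapSlots > 0 ? "LAUNCH_MAP_TASK" : "NO_ACTION";
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = serialize(new TrackerStatus("tracker_host1", 2, 1));
        System.out.println(wire.length + " bytes on the wire, reply: "
                + handleHeartbeat(wire));
    }
}
```

The payload here is only the string plus two ints, which is where the size and
speed advantage over java.io.Serializable comes from. In a real cluster the TT
would repeat the call on a timer and the reply would carry a list of actions
rather than a single string.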

Avro is in strong contention to be used in Hadoop for serialization. You might also like to
look into Thrift and Google Protocol Buffers.


On 4/21/10 4:37 PM, "Ahmad Shahzad" <ashahzad4@gmail.com> wrote:

Hey everyone,

I wanted to know which communication protocols Hadoop MapReduce uses under
the hood, if any. For example, for the shuffle process it uses HTTP to
shuffle the values to the reducers. The job tracker has to talk to the task
trackers, and the task trackers have to report back to the job tracker. And
what if the data is not available on the same node and the slave node has to
fetch it from another node? In all of these cases, which communication
mechanisms are used? Is it HTTP only?

I would really appreciate it if someone could tell me about this, or if
someone has a link that could help me with this issue.

Ahmad Shahzad
