hadoop-mapreduce-user mailing list archives

From Druilhe Remi <remi.drui...@ens-lyon.fr>
Subject Re: Separate communications of HDFS and MapReduce
Date Tue, 27 Apr 2010 13:01:45 GMT
Thanks for your answer :)

Allen Wittenauer wrote:
> On Apr 26, 2010, at 6:23 AM, Druilhe Remi wrote:
>> For example, when I run the "wordcount" example, there are both HDFS and MapReduce
>> communications, and I am not able to distinguish which packets belong to HDFS and which to MapReduce.
> This shouldn't be too surprising, given that the MapReduce job needs to talk to HDFS to
> determine input and to write output.
You're right, there is a link between HDFS and MapReduce, but I hoped to
capture each kind of communication in a separate file so I could work
with them independently.
>> A way could be to use odd port numbers for HDFS and even port numbers for MapReduce,
>> but I think I would have to modify the source code.
> The ports for the services are already separated out.  
> In general, client -> server connections map out as:
> MR -> MR, HDFS
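Since the daemon ports are already separated out, one way to get separate capture files is a port filter per service. A sketch, assuming the stock 0.20.x defaults (NameNode RPC 8020, DataNode data transfer 50010 and IPC 50020, JobTracker RPC 9001, TaskTracker HTTP shuffle 50060); check fs.default.name, mapred.job.tracker, and the dfs.datanode.* properties in your *-site.xml, since any of these may be overridden:

```shell
# Build a tcpdump filter expression from a list of ports, then capture
# each service's traffic into its own file.
# Port numbers below are the 0.20.x defaults; verify against *-site.xml.

ports_to_filter() {            # "8020 50010" -> "port 8020 or port 50010"
    echo "$*" | sed -e 's/ / or port /g' -e 's/^/port /'
}

# On a live node (needs root; eth0 is a placeholder interface name):
# tcpdump -i eth0 -w hdfs.pcap   "$(ports_to_filter 8020 50010 50020)" &
# tcpdump -i eth0 -w mapred.pcap "$(ports_to_filter 9001 50060)" &
```

The tcpdump lines are commented out because they need root and a real interface; the helper just builds the BPF filter string.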
But is there an easy way to determine which port belongs to which
process once the sockets are open? Because every Hadoop daemon runs in a
JVM, netstat only tells me that a port is owned by a java process, not
which Hadoop daemon inside the JVM uses it.
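For what it's worth, each daemon (NameNode, DataNode, JobTracker, TaskTracker) runs in its own JVM, so joining `jps` output (pid plus main class) with `netstat -tlnp` output (listening port plus owning pid) should be enough to tell them apart. A minimal sketch, assuming Linux; `map_ports` is a hypothetical helper name, and the sample pids in the comments are made up:

```shell
#!/bin/sh
# Map each listening port to the Hadoop daemon (JVM) that owns it.
# `jps` prints lines like "2301 DataNode"; `netstat -tlnp` puts the
# owning pid in column 7 as "<pid>/java". Join the two on the pid.

map_ports() {   # $1 = jps output, $2 = netstat -tlnp output
    echo "$1" | grep -v ' Jps$' | while read pid name; do
        echo "$2" | awk -v pid="$pid" -v name="$name" \
            '$7 == pid"/java" { print name, $4 }'
    done
}

# On a real node (run as the user owning the daemons, or as root):
# map_ports "$(jps)" "$(netstat -tlnp 2>/dev/null)"
```

`jps` ships with the JDK, so no Hadoop change is needed; the `grep -v` just drops jps's own entry from the listing.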

Hadoop uses log4j; maybe there is a logging property that can give me
what I am looking for.
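If the log4j route works, one option might be to raise the log level of Hadoop's IPC layer, which logs client connections with their remote addresses. An illustrative sketch for conf/log4j.properties; the logger names below are assumptions based on Hadoop's package layout, so verify them against your version:

```properties
# Assumed logger names -- check against your Hadoop version's packages.
# Log IPC connections (RPC to the NameNode / JobTracker) at DEBUG:
log4j.logger.org.apache.hadoop.ipc=DEBUG
# Log DataNode activity (the HDFS block-transfer path) at DEBUG:
log4j.logger.org.apache.hadoop.hdfs.server.datanode.DataNode=DEBUG
```

The daemons need a restart to pick up the change, and DEBUG on the IPC layer can be verbose on a busy cluster.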
> Given a small 3 node grid, a dump of what processes open what ports, and what connections
> are made between all the machines, it should be trivial to make a more complex connection
> map.  [You can probably even do it as a map reduce job. :) ]

Rémi Druilhe
