hadoop-mapreduce-user mailing list archives

From Druilhe Remi <remi.drui...@ens-lyon.fr>
Subject Re: Separate communications of HDFS and MapReduce
Date Wed, 28 Apr 2010 08:32:32 GMT
Ok, I just found what I was looking for. I hadn't read the
documentation all the way through :-/

Druilhe Remi wrote:
> Thanks for your answer :)
>
> Allen Wittenauer wrote:
>   
>> On Apr 26, 2010, at 6:23 AM, Druilhe Remi wrote:
>>> For example, when I run the "wordcount" example, there are HDFS communications and
>>> MapReduce communications, and I am not able to distinguish which packets belong to
>>> HDFS and which to MapReduce.
>> This shouldn't be too surprising, given that the MapReduce job needs to talk to HDFS
>> to determine input and to write output.
> You're right, there is a link between HDFS and MapReduce, but I hoped to
> capture each set of communications in a separate file so I could deal with
> each independently.
>   
>>> A way could be to use odd port numbers for HDFS and even port numbers for MapReduce,
>>> but I think I would have to modify the source code.
>> The ports for the services are already separated out.  
>>
>> In general, client -> server connections map out as:
>>
>> MR -> MR, HDFS
>> HDFS -> HDFS
> But is there an easy way to determine which port belongs to which
> process once the sockets are open? Because Hadoop runs everything inside a
> JVM, I can't use netstat on its own: I can see which port is connected, but
> not which process in the JVM uses it.
>
> Hadoop uses log4j; maybe there is a property that can give me what I am
> looking for.
>   
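One way around the netstat limitation is to combine the JDK's jps tool (which prints each JVM's pid and main class, e.g. NameNode, DataNode, JobTracker, TaskTracker) with netstat's -p flag, which shows the owning pid per socket. A minimal sketch, assuming a Linux net-tools netstat (-tlnp generally needs root to show other users' pids) and jps on the PATH:

```shell
# Attribute each listening port to a Hadoop daemon JVM.
# jps prints "<pid> <MainClass>"; netstat -tlnp prints "<pid>/<program>"
# in its last column, so joining the two maps ports to daemons.
jps | awk '$2 != "Jps" {print $1, $2}' | while read pid name; do
  ports=$(netstat -tlnp 2>/dev/null \
    | awk -v p="$pid" '$7 ~ "^"p"/" {n=split($4, a, ":"); print a[n]}' \
    | sort -un | tr '\n' ' ')
  echo "$name (pid $pid) listens on: $ports"
done
```

Dropping the `-l` flag (or using `-tnp` with a state filter) shows established connections instead of listeners, which answers "which process in the JVM uses this port" for outbound traffic too.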
>> Given a small 3-node grid, a dump of which processes open which ports, and which
>> connections are made between all the machines, it should be trivial to make a more
>> complex connection map.  [You can probably even do it as a map reduce job. :) ]
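Once the port-to-daemon map is known, the original goal — capturing HDFS and MapReduce traffic into separate files — can be done with plain tcpdump port filters. A sketch, assuming 0.20-era defaults: the NameNode and JobTracker RPC ports come from fs.default.name and mapred.job.tracker (9000 and 9001 are just common choices, so verify them against your *-site.xml files), while the 500xx data/HTTP ports are the stock defaults:

```shell
# Hypothetical port lists -- check core-site.xml, hdfs-site.xml and
# mapred-site.xml on your cluster before trusting these values.
HDFS_FILTER="port 9000 or port 50010 or port 50020 or port 50070 or port 50075"
MR_FILTER="port 9001 or port 50030 or port 50060"

# One capture file per service (tcpdump needs root; adjust -i to your NIC).
tcpdump -i eth0 -w hdfs.pcap   "tcp and ($HDFS_FILTER)" &
tcpdump -i eth0 -w mapred.pcap "tcp and ($MR_FILTER)" &
```

Each pcap file can then be analyzed independently, which avoids having to tell the packets apart after the fact.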
> Regards,
>
> Rémi Druilhe
>   

